GPUs¶
Working with GPUs is something many deep learning packages support by default. Simply pick a GPU pony on the about page, and your code will run a lot faster. At the time of writing there are enough GPUs for everyone, so there is no formal distribution system.
Useful commands:

- `nvidia-smi`: like `htop`, but for the GPUs.
- `top | grep PID`: gives more information on a process running on the GPU (replace PID with the ID of the process you see in `nvidia-smi`).
- `ls -d /usr/local/cuda-*`: gives an overview of which GPU drivers (CUDA versions) are installed on the machine.
- `printf "\n\nmistmane\n";ssh mistmane nvidia-smi;printf "\n\nrarity\n";ssh rarity nvidia-smi;printf "\n\nthunderlane\n";ssh thunderlane nvidia-smi;printf "\n\nsnips\n";ssh snips nvidia-smi`: gives an overview of what is going on on all GPUs; should be run from a non-GPU pony (thanks, Martijn Bentum).
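The multi-host one-liner above can also be written as a loop, which is easier to extend when ponies are added (host names as in the command above; the connect timeout is an added precaution, not part of the original command):

```shell
# Same overview as the printf/ssh one-liner above, written as a loop.
for host in mistmane rarity thunderlane snips; do
    printf '\n\n%s\n' "$host"
    ssh -o ConnectTimeout=5 "$host" nvidia-smi || echo "could not reach $host"
done
```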
Example output of `nvidia-smi`:

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:03:00.0 Off |                    0 |
| N/A   34C    P0    26W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            On   | 00000000:41:00.0 Off |                    0 |
| N/A   30C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla T4            On   | 00000000:81:00.0 Off |                    0 |
| N/A   68C    P0    84W /  70W |   9866MiB / 15109MiB |     99%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    2   N/A  N/A     34931      C   python                           9863MiB |
+-----------------------------------------------------------------------------+
```
Sharing GPUs with other Ponyland users¶
We do not have a formal system for distributing GPU time among users, so we kindly ask you to keep others in mind when using the GPUs. Don't occupy all GPUs on Ponyland at once, and try not to use all 8 GPUs on Wildfire, because someone else might need those particular GPUs.
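One practical way to keep to a single GPU is to restrict which devices your job can see at all: CUDA-based frameworks such as TensorFlow respect the `CUDA_VISIBLE_DEVICES` environment variable. The index `1` below is just an example, taken from the `nvidia-smi` numbering:

```shell
# Make only GPU 1 visible to this shell and everything started from it;
# the framework will then see it as its only (first) GPU.
export CUDA_VISIBLE_DEVICES=1
```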
If you need to use a particular GPU and someone else is using it, you can always send them an email and ask nicely. To get their email address:

- Look at the output of `nvidia-smi` and note the PID of the process running on the GPU you desire. Using the example output above: you want to use GPU 2 on Thunderlane. Under `Processes` in the `nvidia-smi` output you can see that a process with PID 34931 is running on that GPU.
- Get the username that started the process: `ps -p 34931 -o user`. This will output something like `timzee`.
- Send an email to username@science.ru.nl (so timzee@science.ru.nl in the example), asking if they can free up the GPUs that you want to use.
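The PID-to-owner lookup can be done in one step. As a runnable illustration we use the shell's own PID (`$$`) here; on a GPU pony you would substitute the PID from `nvidia-smi` (e.g. 34931 in the example above):

```shell
# Print only the username; the trailing '=' after 'user' suppresses the
# USER header line, leaving just the name.
pid=$$                          # substitute the GPU process PID from nvidia-smi
owner=$(ps -p "$pid" -o user=)
echo "$owner"
```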
Benchmark: GPU speed and memory¶
| Pony | GPU | RAM | Simple script | Deep speech recognition |
|---|---|---|---|---|
| Fancypants | none | n/a | 1m51s | |
| Snips | M40 | 12GB | 1m15s | 18m13s |
| Thunderlane | T4 | 16GB | 35s | 18m48s |
| Rarity | V100 | 16GB | 35s | 7m46s |
The TensorFlow test script used for this can be found below; the results are the average of 3 attempts. The deep speech recognition numbers are a single epoch with the same random seed, performed by Danny Merkx (thanks!).
TensorFlow tricks¶
A simple example script that trains on fake data:

```python
from numpy import zeros, array
from random import randrange
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# Generate fake one-hot data; the target is identical to the input.
instances = []
target = []
for i in range(100000):
    n = zeros(500)
    n[randrange(500)] = 1
    instances.append(n)
    target.append(n)
instances = array(instances)
target = array(target)

model = Sequential(
    [
        Dense(500, activation="relu", name="layer1"),
        Dense(510, activation="relu", name="layer2"),
        Dense(500, activation="softmax", name="layer3"),
    ]
)
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(instances, target, epochs=10)
```
For an overview of which GPUs TensorFlow detects:

```python
from tensorflow.config.experimental import list_physical_devices
print("Num GPUs Available: ", len(list_physical_devices('GPU')))
# Possible output: Num GPUs Available:  3
```
TensorFlow will by default run on the first GPU, so you might want to manually select another one by putting all your code within a `with tensorflow.device(...)` block. This way, you can also force CPU usage. Note that it will then use all available CPUs, not just the first one.
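A minimal sketch of the device scope, assuming TensorFlow 2.x (the `/GPU:1` line is only valid on a machine with at least two GPUs, so it is shown commented out; `/CPU:0` works everywhere):

```python
import tensorflow as tf

# Run on the second GPU instead of the default first one:
# with tf.device('/GPU:1'):
#     model.fit(instances, target, epochs=10)

# The same mechanism forces CPU usage:
with tf.device('/CPU:0'):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.matmul(a, a)
print(b.numpy())
```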
TensorFlow on Ponyland¶
The main CUDA install on Ponyland is version 12. However, TensorFlow is not yet compatible with CUDA 12, so if you try to use TensorFlow on Ponyland you will get an error like this:

```
2023-03-02 15:03:44.904322: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'
```

As a workaround, we can use the CUDA 11 install that ships with Matlab. You can tell TensorFlow where to find CUDA 11 by editing the `LD_LIBRARY_PATH` environment variable.
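A sketch of what that could look like; the Matlab path below is hypothetical, so first check where CUDA 11's `libcudart.so.11.0` actually lives on your pony (e.g. with `find / -name 'libcudart.so.11*' 2>/dev/null`):

```shell
# Hypothetical example path: prepend Matlab's bundled CUDA 11 libraries so
# TensorFlow finds libcudart.so.11.0 before the system CUDA 12 install.
export LD_LIBRARY_PATH="/usr/local/MATLAB/R2022b/bin/glnxa64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```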
More TensorFlow Config¶
Make sure you install the TensorFlow version that is compatible with the installed CUDA version; see this table.

To check the installed CUDA versions: `ls /usr/local/ | grep cuda`
If you get the following error:

```
2023-05-08 15:05:58.427605: W tensorflow/compiler/xla/service/gpu/nvptx_helper.cc:56] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
```

try the following command before running your script:

```
export XLA_FLAGS=--xla_gpu_cuda_data_dir="$(find /usr/local/ -name nvvm | awk -F '/' -v OFS='/' '{$NF=""}1' | head -n 1)"
```
If you still get an error, for instance something like:

```
2023-05-09 18:05:26.759467: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:433] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2023-05-09 18:05:26.759578: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:442] Possibly insufficient driver version: 510.108.3
```
Try: