GPUs¶
Working with GPUs is something many deep learning packages support by default. Simply pick a GPU pony on the about page, and your code will run a lot faster. At the time of writing there are enough GPUs for everyone, so there is no formal distribution system.
Useful commands:

- `nvidia-smi`: like `htop`, but for the GPUs.
- `top | grep PID`: gives more information on a process running on the GPU (replace PID with the ID of the process you see in `nvidia-smi`).
- `ls -d /usr/local/cuda-*`: gives an overview of which GPU drivers (CUDA versions) are installed on the machine.
- `printf "\n\nmistmane\n";ssh mistmane nvidia-smi;printf "\n\nrarity\n";ssh rarity nvidia-smi;printf "\n\nthunderlane\n";ssh thunderlane nvidia-smi;printf "\n\nsnips\n";ssh snips nvidia-smi`: gives an overview of what is going on on all GPUs; should be run from a non-GPU pony (thanks, Martijn Bentum).
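The multi-host one-liner above can also be written as a loop, which is easier to extend when ponies are added (host names as in the command above; the connect timeout is an added precaution, not part of the original command):

```shell
# Same overview as the printf/ssh one-liner above, written as a loop.
for host in mistmane rarity thunderlane snips; do
    printf '\n\n%s\n' "$host"
    ssh -o ConnectTimeout=5 "$host" nvidia-smi || echo "could not reach $host"
done
```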
Example output of `nvidia-smi`:

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:03:00.0 Off |                    0 |
| N/A   34C    P0    26W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            On   | 00000000:41:00.0 Off |                    0 |
| N/A   30C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla T4            On   | 00000000:81:00.0 Off |                    0 |
| N/A   68C    P0    84W /  70W |   9866MiB / 15109MiB |     99%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    2   N/A  N/A     34931      C   python                           9863MiB |
+-----------------------------------------------------------------------------+
```
Sharing GPUs with other Ponyland users¶
We do not have a formal system for distributing GPU time among users, so we kindly ask you to keep others in mind when using the GPUs. Don't occupy all GPUs on Ponyland at once, and try not to use all 8 GPUs on Wildfire, because someone else might need those particular GPUs.
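One practical way to keep to a single GPU is to restrict which devices your job can see at all: CUDA-based frameworks such as TensorFlow respect the `CUDA_VISIBLE_DEVICES` environment variable. The index `1` below is just an example, taken from the `nvidia-smi` numbering:

```shell
# Make only GPU 1 visible to this shell and everything started from it;
# the framework will then see it as its only (first) GPU.
export CUDA_VISIBLE_DEVICES=1
```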
If you need to use a particular GPU and someone else is using it, you can always send them an email and ask nicely. To get their email address:

- Look at the output of `nvidia-smi` and note the PID of the process running on the GPU you desire. Using the example output above: you want to use GPU 2 on Thunderlane. Under `Processes` in the `nvidia-smi` output you can see that a process with PID 34931 is running on that GPU.
- Get the username that started the process: `ps -p 34931 -o user`. This will output something like `timzee`.
- Send an email to username@science.ru.nl (so timzee@science.ru.nl in the example), asking if they can free up the GPUs that you want to use.
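The PID-to-owner lookup can be done in one step. As a runnable illustration we use the shell's own PID (`$$`) here; on a GPU pony you would substitute the PID from `nvidia-smi` (e.g. 34931 in the example above):

```shell
# Print only the username; the trailing '=' after 'user' suppresses the
# USER header line, leaving just the name.
pid=$$                          # substitute the GPU process PID from nvidia-smi
owner=$(ps -p "$pid" -o user=)
echo "$owner"
```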
Benchmark: GPU speed and memory¶
| Pony | GPU | RAM | Simple script | Deep speech recognition |
|---|---|---|---|---|
| Fancypants | none | n/a | 1m51s | |
| Snips | M40 | 12GB | 1m15s | 18m13s |
| Thunderlane | T4 | 16GB | 35s | 18m48s |
| Rarity | V100 | 16GB | 35s | 7m46s |
The TensorFlow test script used for this can be found below; the results are the average of 3 attempts. The deep speech recognition numbers are a single epoch with the same random seed, performed by Danny Merkx (thanks!).
TensorFlow tricks¶
A simple example script that trains on fake data:

```python
from numpy import zeros, array
from random import randrange
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# Generate fake one-hot data; the target is identical to the input.
instances = []
target = []
for i in range(100000):
    n = zeros(500)
    n[randrange(500)] = 1
    instances.append(n)
    target.append(n)
instances = array(instances)
target = array(target)

model = Sequential(
    [
        Dense(500, activation="relu", name="layer1"),
        Dense(510, activation="relu", name="layer2"),
        Dense(500, activation="softmax", name="layer3"),
    ]
)
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(instances, target, epochs=10)
```
For an overview of which GPUs TensorFlow detects:

```python
from tensorflow.config.experimental import list_physical_devices
print("Num GPUs Available: ", len(list_physical_devices('GPU')))
# Possible output: Num GPUs Available:  3
```
TensorFlow will by default run on the first GPU, so you might want to manually select another one by putting all your code within a `with tensorflow.device(...)` block. This way, you can also force CPU usage. Note that it will then use all available CPUs, not just the first one.
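A minimal sketch of the device scope, assuming TensorFlow 2.x (the `/GPU:1` line is only valid on a machine with at least two GPUs, so it is shown commented out; `/CPU:0` works everywhere):

```python
import tensorflow as tf

# Run on the second GPU instead of the default first one:
# with tf.device('/GPU:1'):
#     model.fit(instances, target, epochs=10)

# The same mechanism forces CPU usage:
with tf.device('/CPU:0'):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.matmul(a, a)
print(b.numpy())
```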
TensorFlow on Ponyland¶
The main CUDA install on Ponyland is version 12. However, TensorFlow is not yet compatible with CUDA 12, so if you try to use TensorFlow on Ponyland you will get an error like this:

```
2023-03-02 15:03:44.904322: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'
```

As a workaround, we can use the CUDA 11 install that ships with Matlab. You can tell TensorFlow where to find CUDA 11 by editing the `LD_LIBRARY_PATH` environment variable.
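A sketch of what that could look like; the Matlab path below is hypothetical, so first check where CUDA 11's `libcudart.so.11.0` actually lives on your pony (e.g. with `find / -name 'libcudart.so.11*' 2>/dev/null`):

```shell
# Hypothetical example path: prepend Matlab's bundled CUDA 11 libraries so
# TensorFlow finds libcudart.so.11.0 before the system CUDA 12 install.
export LD_LIBRARY_PATH="/usr/local/MATLAB/R2022b/bin/glnxa64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```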
More TensorFlow Config¶
Make sure you install the TensorFlow version that is compatible with the installed CUDA version; see this table.

To check the installed CUDA versions: `ls /usr/local/ | grep cuda`
If you get the following error:

```
2023-05-08 15:05:58.427605: W tensorflow/compiler/xla/service/gpu/nvptx_helper.cc:56] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
```

try the following command before running your script:

```
export XLA_FLAGS=--xla_gpu_cuda_data_dir="$(find /usr/local/ -name nvvm | awk -F '/' -v OFS='/' '{$NF=""}1' | head -n 1)"
```
If you still get an error, for instance something like:

```
2023-05-09 18:05:26.759467: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:433] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2023-05-09 18:05:26.759578: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:442] Possibly insufficient driver version: 510.108.3
```
Try: