TensorFlow

GPU?

👉 Matching versions of TF and CUDA.

# check whether a GPU is available
import tensorflow as tf
tf.config.list_physical_devices('GPU')

# prevent TF from using the GPU
# add the lines below before any tf import
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
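
The ordering matters because the variable has to be set before TensorFlow touches CUDA. A minimal sketch of a guard around the two lines above, assuming a hypothetical helper name `hide_gpus` (it is not part of TensorFlow), that sets the variable and reports whether TensorFlow had already been imported:

```python
import os
import sys


def hide_gpus() -> bool:
    """Hide all CUDA devices from libraries that are imported later.

    Returns True if TensorFlow had not been imported yet (the setting should
    take effect) and False if it was already imported (it may be too late).
    hide_gpus is a hypothetical helper name, not a TensorFlow API.
    """
    already_imported = "tensorflow" in sys.modules
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
    return not already_imported


in_time = hide_gpus()
print("CUDA_VISIBLE_DEVICES =", os.environ["CUDA_VISIBLE_DEVICES"])
print("set before TensorFlow import:", in_time)
```

If the second line prints False, move the call to the very top of the script or notebook and restart the interpreter.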

Installation with docker

👉 Official guide.
👉 Note: Docker & GPU.

The advantage of this method is that you only have to install the NVIDIA GPU driver (plus Docker's NVIDIA runtime support) on the host machine; the CUDA libraries ship inside the image.

Without docker-compose

👉 Different types of images for tensorflow.

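# pull the image
docker pull tensorflow/tensorflow:latest-gpu-jupyter

# run a container
# -p creates the parent folders if they do not exist yet
mkdir -p ~/Downloads/test/notebooks
# --gpus all exposes the host GPUs to the container (Docker 19.03+ with the NVIDIA Container Toolkit)
docker run --gpus all --name docker_thi_test -it --rm -v $(realpath ~/Downloads/test/notebooks):/tf/notebooks -p 8888:8888 tensorflow/tensorflow:latest-gpu-jupyter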

# check whether the GPU is available
nvidia-smi

# check whether TF2 works: open a shell in the running container, then start Python
docker exec -it docker_thi_test bash
python

import tensorflow as tf
tf.config.list_physical_devices('GPU')
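
The `nvidia-smi` check above can also be scripted, which helps when you want a notebook or a startup script to report the GPUs before TensorFlow is imported. A minimal sketch, assuming `nvidia-smi` is on the `PATH`; `list_nvidia_gpus` is a hypothetical helper name, not part of TensorFlow or the NVIDIA tools:

```python
import shutil
import subprocess
from typing import List


def list_nvidia_gpus() -> List[str]:
    """Return the GPU names reported by nvidia-smi, or [] if it is unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []  # driver tools are not installed or not on PATH
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=30, check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []  # nvidia-smi exists but could not talk to the driver
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_nvidia_gpus()
print(gpus)
```

An empty list here while `nvidia-smi` works on the host usually means the container was started without GPU access.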

With docker-compose?

👉 Read Docker & GPU instead.

On Windows WSL2

Update later…

Install directly on Linux (without docker)

On my computer, Dell XPS 15 7590 - NVIDIA® GeForce® GTX 1650 Mobile.

🚨

This section is incomplete; the guide does not work yet!

Installation

👉 GPU support: TensorFlow

This guide is specific to:

pip show tensorflow # 2.3.1
pip show tensorflow-gpu # 2.3.1
nvidia-smi # NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0

👉 Note: PyTorch.
👉 Note: Fresh Ubuntu / Pop!_OS Installation.
👉 Note: Linux.

Errors?

🐞 Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory

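The missing file name shows which CUDA runtime this TensorFlow build looks for (here 10.1), so the installed CUDA and cuDNN libraries do not match it. You need to install CUDA, cuDNN and TensorFlow versions that match each other and make sure the libraries are on the library path. (This note was written with tensorflow==2.3.1 and CUDA 11.1.) (ref)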
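
To see quickly which library and version TensorFlow is asking for, the error message itself can be parsed. A minimal sketch; `missing_cuda_library` is a hypothetical helper name, and the regular expression only assumes the message layout shown in the error above:

```python
import re
from typing import Optional, Tuple


def missing_cuda_library(message: str) -> Optional[Tuple[str, str]]:
    """Extract (library, version) from a "Could not load dynamic library" message."""
    match = re.search(
        r"Could not load dynamic library '(lib\w+)\.so\.([\d.]+)'", message
    )
    if match is None:
        return None
    return match.group(1), match.group(2)


error = (
    "Could not load dynamic library 'libcudart.so.10.1'; dlerror: "
    "libcudart.so.10.1: cannot open shared object file: No such file or directory"
)
print(missing_cuda_library(error))  # ('libcudart', '10.1')
```

The returned version is the CUDA runtime the installed TensorFlow wheel was built against, which is the one to install or to match with a different TensorFlow release.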

# update path
export PATH=/usr/local/cuda-11.1/bin${PATH:+:${PATH}}
# use lib64 on 64-bit systems; keep this on one line so the value is not split
export LD_LIBRARY_PATH=/usr/local/cuda-11.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

# quickly test cuda version
nvcc --version
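
After updating the paths, it can help to confirm that a CUDA runtime library is actually reachable through `LD_LIBRARY_PATH`. A minimal sketch, assuming the runtime files are named `libcudart.so*` as in the error message; `find_cudart` is a hypothetical helper name:

```python
import os
from pathlib import Path
from typing import List


def find_cudart(library_path: str) -> List[str]:
    """List the libcudart.so* files found in the directories of a path string."""
    found: List[str] = []
    for entry in library_path.split(os.pathsep):
        directory = Path(entry)
        # skip empty entries and directories that do not exist
        if entry and directory.is_dir():
            found.extend(sorted(str(p) for p in directory.glob("libcudart.so*")))
    return found


found = find_cudart(os.environ.get("LD_LIBRARY_PATH", ""))
print(found)
```

Run it in the same shell where the `export` lines were executed. An empty list means the directories on the path do not contain the runtime, or the exports were not applied to this shell.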

🐞 WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 2000 batches). You may need to use the repeat() function when building your dataset.

The problem is that you don't have enough images for the number of steps you requested!

train_generator = train_datagen.flow_from_directory(..., batch_size=20)
validation_generator = test_datagen.flow_from_directory(..., batch_size=20)

# Found 1027 images belonging to 2 classes.
# Found 256 images belonging to 2 classes.

model.fit(
    train_generator,
    validation_data=validation_generator,
    steps_per_epoch=100,  # too many: 100 * 20 = 2000 > 1027 images
    epochs=20,
    validation_steps=50,  # too many: 50 * 20 = 1000 > 256 images
    verbose=2)

We must have steps_per_epoch * batch_size <= number of images. Here 100 * 20 = 2000 > 1027 for training (and 50 * 20 = 1000 > 256 for validation). Check this answer for more information.

# correct
model.fit(
    ...
    steps_per_epoch = 50, # batch size is 20, so 1027 images give 1027 // 20 = 51 full batches; 50 stays within that
    ...
    validation_steps = 12, # batch size is 20, so 256 images give 256 // 20 = 12 full batches
    ...)
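
The step counts can be computed instead of typed by hand, which avoids the warning when the dataset size changes. A minimal sketch of that arithmetic; `safe_steps` is a hypothetical helper name, and the image counts are the ones reported above:

```python
import math


def safe_steps(num_images: int, batch_size: int, full_batches_only: bool = True) -> int:
    """Number of batches that one pass over the data can provide."""
    if full_batches_only:
        return num_images // batch_size  # ignore the last partial batch
    return math.ceil(num_images / batch_size)  # count the last partial batch too


print(safe_steps(1027, 20))  # 51
print(safe_steps(256, 20))   # 12
```

Any `steps_per_epoch` up to the first value and any `validation_steps` up to the second stay within the available data for a batch size of 20.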

🐞 Not found: No algorithm worked! OR This is probably because cuDNN failed to initialize

nvidia-smi
# find and kill the process that is using most of the GPU memory
# then restart the task

# OR: add the following to your code
from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)
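
The snippet above goes through the TF1 compatibility API. A sketch of the same idea with the TF2 configuration API, assuming TensorFlow 2.1 or newer; `enable_memory_growth` is a hypothetical helper name, and TensorFlow is imported inside the function so that defining it does not require TensorFlow to be loaded yet:

```python
def enable_memory_growth() -> int:
    """Turn on memory growth for every visible GPU and return how many were set.

    This has to run before TensorFlow initializes the GPUs, so call it
    before building a model or running any op.
    """
    import tensorflow as tf  # lazy import, see the note above

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # allocate GPU memory on demand instead of reserving it all up front
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)
```

Call `enable_memory_growth()` once at the top of the script or notebook. A return value of 0 means TensorFlow does not see any GPU.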