Docker + GPUs

01-12-2020

👉 Docker note.

WSL + Windows #

Make WSL2 recognize the GPU on Windows 10 👉 Check this tutorial.

If you get the error "Your insider preview build settings need attention" and restarting several times doesn't solve the problem 👉 go to Account settings, then choose "Verify".

With Tensorflow or PyTorch #

👉 Official doc for TF + docker
👉 My note for docker + TF.
👉 An example of docker pytorch with gpu support.

Basic installation #

The following works perfectly on Pop!_OS 20.04:

sudo apt update
sudo apt install -y nvidia-container-runtime
sudo apt install -y nvidia-container-toolkit
sudo apt install -y nvidia-cuda-toolkit
# restart required

Check info #

# verify that your computer has a graphics card
lspci -nn | grep '\[03'
# First, install drivers and check
nvidia-smi
# output: NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0
# this is the maximum CUDA version that your driver supports
# check the currently installed version of cuda
nvcc --version
# If nvcc is not found, it may be in /usr/local/cuda/bin/
# Add this location to PATH
# by adding this line to ~/.zshrc or ~/.bashrc
export PATH=/usr/local/cuda/bin:$PATH
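
The two version numbers above are easy to mix up: nvidia-smi reports the maximum CUDA version the driver supports, while nvcc reports the toolkit that is actually installed. A minimal sketch that compares them; the helper names (smi_cuda_version, nvcc_cuda_version, version_le) are made up for this note, and the parsing assumes the usual "CUDA Version: X.Y" and "release X.Y," wording in the two outputs.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any NVIDIA tool.

smi_cuda_version() {
    # Read `nvidia-smi` output on stdin, print the "CUDA Version" field.
    sed -n 's/.*CUDA Version: \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

nvcc_cuda_version() {
    # Read `nvcc --version` output on stdin, print the "release X.Y" field.
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

version_le() {
    # Succeed if version $1 <= version $2 (major, then minor).
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, "."); split(b, y, ".")
        if (x[1] + 0 != y[1] + 0) exit !(x[1] + 0 < y[1] + 0)
        exit !(x[2] + 0 <= y[2] + 0)
    }'
}

# Only run the comparison when both tools are installed.
if command -v nvidia-smi >/dev/null 2>&1 && command -v nvcc >/dev/null 2>&1; then
    max=$(nvidia-smi | smi_cuda_version)
    cur=$(nvcc --version | nvcc_cuda_version)
    if [ -n "$max" ] && [ -n "$cur" ]; then
        if version_le "$cur" "$max"; then
            echo "toolkit $cur is within the driver's maximum ($max)"
        else
            echo "toolkit $cur is newer than what the driver supports ($max)"
        fi
    fi
fi
```

If one of the two tools is missing, the script prints nothing, so it is safe to run before the installation is complete.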

# You may need to install
sudo apt install -y nvidia-cuda-toolkit
# check that nvidia-docker is installed
dpkg -l | grep nvidia-docker
# or
nvidia-docker version
# Verify that docker run supports the --gpus option
docker run --help | grep -i gpus
# output: --gpus gpu-request GPU devices to add to the container ('all' to pass all GPUs)
# Listing out GPU devices
docker run -it --rm --gpus all ubuntu nvidia-smi -L
# output: GPU 0: GeForce GTX 1650 (...)
# Verifying again with nvidia-smi
docker run -it --rm --gpus all ubuntu nvidia-smi
# test a working container-toolkit setup
docker run --rm --gpus all nvidia/cuda nvidia-smi
# test a working container-runtime setup
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

# If you get "Error response from daemon: Unknown runtime specified nvidia.",
# search below for "/etc/docker/daemon.json";
# it may help.
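
When the "Unknown runtime specified nvidia" error shows up, a first thing to look at is whether /etc/docker/daemon.json registers a runtime named nvidia at all. A rough sketch; has_nvidia_runtime is a hypothetical helper, and it is only a textual check for an "nvidia" key (it does not parse the JSON).

```shell
#!/bin/sh
# Hypothetical helper: textual check, not a JSON parser.

has_nvidia_runtime() {
    # $1: path to a daemon.json file (default: /etc/docker/daemon.json).
    # Matches a key such as  "nvidia": { ... }  but not the value in
    # "default-runtime": "nvidia".
    grep -q '"nvidia"[[:space:]]*:' "${1:-/etc/docker/daemon.json}" 2>/dev/null
}

if has_nvidia_runtime; then
    echo "daemon.json registers an nvidia runtime"
else
    echo "no nvidia runtime found in /etc/docker/daemon.json"
fi
```

On a machine where docker is running, `docker info | grep -i runtime` shows the runtimes the daemon actually loaded, which is the more reliable check.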

Install nvidia-docker2 #

More information (ref)

This package is the only docker-specific package of any of them. It takes the script associated with the nvidia-container-runtime and installs it into docker's /etc/docker/daemon.json file for you. This then allows you to run (for example) docker run --runtime=nvidia ... to automatically add GPU support to your containers. It also installs a wrapper script around the native docker CLI called nvidia-docker which lets you invoke docker without needing to specify --runtime=nvidia every single time. It also lets you set an environment variable on the host (NV_GPU) to specify which GPUs should be injected into a container.

👉 Official installation guide.

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update
sudo apt-get install -y nvidia-docker2

# restart docker
sudo systemctl restart docker
# check version
nvidia-docker version
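
Once nvidia-docker version works, the NV_GPU variable described in the quote above can be tried out. A small sketch, not taken from the official guide: run_on_gpus is a hypothetical helper, and DOCKER_WRAPPER is an invented variable that only exists here so the wrapper can be swapped for something else when testing.

```shell
#!/bin/sh
# Hypothetical helper around the nvidia-docker wrapper script.

run_on_gpus() {
    # $1: comma-separated GPU indices on the host, e.g. "0" or "0,1"
    # remaining arguments: passed to `nvidia-docker run --rm`
    gpus=$1
    shift
    # NV_GPU is read on the host by the nvidia-docker wrapper.
    NV_GPU="$gpus" "${DOCKER_WRAPPER:-nvidia-docker}" run --rm "$@"
}

# Assumed usage (needs nvidia-docker2 and a working driver):
#   run_on_gpus 0 ubuntu nvidia-smi -L
```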

Difference: nvidia-container-toolkit vs nvidia-container-runtime #

👉 What's the difference between the lastest nvidia-docker and nvidia container runtime?

In that thread, the author explains that, with Docker 19.03+ (check with docker --version), nvidia-container-toolkit is what the --gpus flag of docker run relies on, while nvidia-container-runtime is what --runtime=nvidia relies on (the latter can also be used in a docker-compose file).

However, if you want to use Kubernetes with Docker 19.03, you actually need to continue using nvidia-docker2 because Kubernetes doesn't support passing GPU information down to docker through the --gpus flag yet. It still relies on the nvidia-container-runtime to pass GPU information down the stack via a set of environment variables.
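
To make the two paths concrete, here is a sketch of the flags each one needs to expose a single GPU. gpu_flags is a hypothetical helper written for this note; NVIDIA_VISIBLE_DEVICES is the environment variable the nvidia runtime reads, and the example assumes GPU index 0 exists.

```shell
#!/bin/sh
# Hypothetical helper: prints the docker run flags for one GPU.

gpu_flags() {
    # $1: "gpus"    -> Docker 19.03+ with nvidia-container-toolkit
    #     "runtime" -> nvidia-container-runtime registered in daemon.json
    # $2: GPU index, e.g. 0
    case "$1" in
        gpus)    echo "--gpus device=$2" ;;
        runtime) echo "--runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=$2" ;;
        *)       echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Assumed usage (the unquoted $(...) is intentional, the flags must split):
#   docker run --rm $(gpu_flags gpus 0) ubuntu nvidia-smi -L
#   docker run --rm $(gpu_flags runtime 0) ubuntu nvidia-smi -L
```

The second form is the one that passes GPU information through environment variables, which is the mechanism the paragraph above says Kubernetes still relies on.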

👉 Installation Guide - NVIDIA Cloud Native Technologies documentation

Using docker-compose? #

Purpose?

# instead of using
docker run \
    --gpus all \
    --name docker_thi_test \
    --rm \
    -v abc:abc \
    -p 8888:8888
# we use this with docker-compose.yml
docker-compose up
# check version of docker-compose
docker-compose --version
# If "version" in docker-compose.yml < 2.3
# Modify: /etc/docker/daemon.json
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
# restart our docker daemon
sudo pkill -SIGHUP dockerd
# If "version" in docker-compose.yml >=2.3
# docker-compose.yml => able to use "runtime"
version: '2.3' # MUST BE >=2.3 AND <3
services:
  testing:
    ports:
      - "8000:8000"
    runtime: nvidia
    volumes:
      - ./object_detection:/object_detection
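
Which of the two cases applies can be read off the compose file itself. A minimal sketch, assuming the file has a top-level "version:" line as in the examples above; compose_file_version and supports_runtime_key are hypothetical helper names, and the ">= 2.3 and < 3" rule is the one stated in the comment above.

```shell
#!/bin/sh
# Hypothetical helpers for choosing between "runtime:" and daemon.json.

compose_file_version() {
    # $1: path to a compose file. Prints the value of its "version:" key.
    sed -n "s/^version:[[:space:]]*[\"']\{0,1\}\([0-9][0-9.]*\).*/\1/p" "$1" | head -n 1
}

supports_runtime_key() {
    # Succeed when the file format version is >= 2.3 and < 3.
    awk -v v="$1" 'BEGIN {
        split(v, p, ".")
        exit !(p[1] + 0 == 2 && p[2] + 0 >= 3)
    }'
}

# Only inspect a compose file if one exists in the current directory.
if [ -f docker-compose.yml ]; then
    v=$(compose_file_version docker-compose.yml)
    if supports_runtime_key "$v"; then
        echo "version $v: 'runtime: nvidia' can be used"
    else
        echo "version $v: set default-runtime in /etc/docker/daemon.json instead"
    fi
fi
```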

👉 Check more in my repo my-dockerfiles on Github.

Run the test,

docker pull tensorflow/tensorflow:latest-gpu-jupyter
mkdir -p ~/Downloads/test/notebooks

Without using docker-compose.yml (tensorflow) (cf. this note for more)

docker run --gpus all --name docker_thi_test -it --rm -v $(realpath ~/Downloads/test/notebooks):/tf/notebooks -p 8888:8888 tensorflow/tensorflow:latest-gpu-jupyter

With docker-compose.yml?

# ~/Downloads/test/Dockerfile
FROM tensorflow/tensorflow:latest-gpu-jupyter
# ~/Downloads/test/docker-compose.yml
version: '2'
services:
  jupyter:
    container_name: 'docker_thi_test'
    build: .
    volumes:
      - ./notebooks:/tf/notebooks # notebook directory
    ports:
      - 8888:8888 # exposed port for jupyter
    environment:
      - NVIDIA_VISIBLE_DEVICES=0 # which gpu do you want to use for this container
      - PASSWORD=12345

Then run,

docker-compose run --rm jupyter
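
To confirm that the service really sees a GPU, one option is to count the devices that nvidia-smi -L lists inside it. A minimal sketch; count_gpus is a hypothetical helper, and the usage line assumes nvidia-smi is available inside the container (the NVIDIA runtime normally injects it).

```shell
#!/bin/sh
# Hypothetical helper: counts devices in `nvidia-smi -L` output.

count_gpus() {
    # Read `nvidia-smi -L` output on stdin and print how many
    # "GPU <n>: ..." lines it contains.
    grep -c '^GPU [0-9][0-9]*:'
}

# Assumed usage with the "jupyter" service defined above
# (-T disables the pseudo-tty so the output can be piped):
#   docker-compose run --rm -T jupyter nvidia-smi -L | count_gpus
```

With NVIDIA_VISIBLE_DEVICES=0 as in the compose file above, a working setup should report a single device.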

Make NVIDIA work in docker (Linux) #

The method in this section still worked on 26-Oct-2020, but it is outdated compared to the newer methods above.

Idea: use the NVIDIA driver of the base machine; don't install any driver inside docker!

Detail of steps

  1. First, make sure your base machine has an NVIDIA driver.

    # list all gpus
    lspci -nn | grep '\[03'

    # check nvidia & cuda versions
    nvidia-smi
  2. Install nvidia-container-runtime

    curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | sudo apt-key add -
    distribution=$(. /etc/os-release;echo $ID$VERSION_ID)

    curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list

    sudo apt-get update

    sudo apt-get install nvidia-container-runtime
  3. Note that we cannot use docker-compose.yml in this case!

  4. Create an image img_datas from the following Dockerfile:

    FROM nvidia/cuda:10.2-base

    RUN apt-get update && \
    apt-get -y upgrade && \
    apt-get install -y python3-pip python3-dev locales git

    # install dependencies
    COPY requirements.txt requirements.txt
    RUN python3 -m pip install --upgrade pip && \
    python3 -m pip install -r requirements.txt
    COPY . .

    # default command
    CMD [ "jupyter", "lab", "--no-browser", "--allow-root", "--ip=0.0.0.0" ]
  5. Create a container,

    docker run --name docker_thi --gpus all -v /home/thi/folder_1/:/srv/folder_1/ -v /home/thi/folder_1/git/:/srv/folder_2 -dp 8888:8888 -w="/srv" -it img_datas

    # -v: volumes
    # -w: working dir
    # --gpus all: using all gpus on base machine

This article is also very interesting and helpful in some cases.

References #

  1. Difference between base, runtime and devel in Dockerfile of CUDA.
  2. Dockerfile on Github of Tensorflow.