
Incus containers on Lightning

On our webserver Lightning, it is possible to request your own Incus container. Within this container, you have full permissions to install anything you want. These containers are primarily used for serving web applications. It's a similar experience to managing a virtual machine, without actually depending on virtualization.

An Incus container (also called an LXC container, Linux container, system container, or 'fat container') provides you with the means to manage an entire Linux system yourself. Incus is a fork of LXD.

Requesting a container

Contact us with the following information:

  1. Why do you need an Incus container?
  2. A preferred subdomain name for your application (your_application.cls.ru.nl); a normal domain name also works if you already own one and control its nameserver.
  3. The Linux distribution you'd like to run in the container (typically the latest Ubuntu LTS release).
  4. Which ponyland users should have access to the container.
  5. Whether you will use Docker containers within the Incus container (nesting).
  6. Whether you need a GPU (also read the section on GPUs below!)
  7. OPTIONAL: URL to the source git repository where your application's source code resides. This will be used to extract metadata for the portal page. Can be omitted if you do not desire to be listed on https://webservices.cls.ru.nl.
  8. OPTIONAL: If you use CLARIAH's authentication infrastructure (recommended), pass us your OAuth2 endpoints and we will provide you with a client ID and client secret. If you use CLAM, you can skip this, as we can derive it automatically for you.

Accessing your container

From Lightning

You can enter the container by first going to Lightning and then executing into it (where x is the name of your container):

ssh lightning
sudo /etc/cncz/bin/incusexec x

Directly from your local machine

If you want to SSH directly into the container, for example so you can use an SFTP client to move files around, follow these steps:

  1. Figure out the container's IP address by running sudo /etc/cncz/bin/incusexec on Lightning, outside the container.

  2. Inside the container, install SSH: sudo apt install openssh-server and create a user with a name identical to your ponyland login: sudo adduser yourponylandlogin. You should now be able to SSH into the container from Lightning with the container IP address. The rest of the instructions are about making sure you can SSH into the container from places other than Lightning itself.

  3. If you are at home, set up a VPN so you have access to Lightning directly.

  4. On your local machine, create an SSH tunnel (this even works on Windows): ssh -L 6789:yourcontainerip:22 yourponylandname@lightning.science.ru.nl. This makes sure that everything you send to port 6789 on your local machine (like files over SFTP) is first sent to Lightning and then forwarded to yourcontainerip:22 (the SSH port on your container).

  5. In your SFTP client, connect to sftp://localhost on port 6789.

  6. If you want to directly SSH into the container, the command is: ssh -t yourponylandname@lightning.science.ru.nl ssh yourcontainerip

  7. To avoid having to enter your password twice, you can set up authentication using SSH keys, as sketched after this list. You have to do this once from your local machine to Lightning, and once from Lightning to your container.
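
Putting steps 4 to 7 together, here is a minimal sketch of the commands involved (assuming the default SSH key location; yourcontainerip and port 6789 are the placeholders from the steps above):

# On your local machine: set up key-based login to Lightning (step 7, first half).
ssh-keygen -t ed25519
ssh-copy-id yourponylandname@lightning.science.ru.nl

# Open the tunnel (step 4) and, in a second terminal, connect over SFTP (step 5).
ssh -L 6789:yourcontainerip:22 yourponylandname@lightning.science.ru.nl
sftp -P 6789 yourponylandlogin@localhost

# On Lightning: generate a key there as well and copy it into the container (step 7, second half).
ssh-keygen -t ed25519
ssh-copy-id yourponylandlogin@yourcontainerip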

Development versus production containers

In most cases, when you request a container, you will actually get two: x (the production container) and x-dev (the development container). The idea is that you can freely develop and experiment in the dev container, and only deploy to the production container once you are ready. Ideally, you do not set up the production container by hand, but instead use an installation script that you developed and tested inside the development container. This workflow has a number of advantages:

  • You have a way to quickly recreate the containers in case they get lost. Unless communicated otherwise, we keep no backups of the containers, so you cannot assume they are safe (see this page for more details).
  • Upgrading the OS is just a matter of running your installation script in a new container with the new OS and checking, at your own pace, whether everything still works.
  • You are always ready to migrate to another machine if needed.

While this installation script could be in any format, we have very good experience with Ansible; see the next section.

Quickly creating installation scripts with Ansible

Ansible allows you to simply describe the end state of your container, like this:

    - name: Create the repo dir
      ansible.builtin.file:
        path: /var/www/repo
        state: directory

    - name: Clone the repository
      ansible.builtin.git:
        repo: 'https://github.com/x'
        dest: /var/www/repo
        update: no

    - name: Install the dependencies into the virtualenv
      ansible.builtin.pip:
        requirements: /var/www/repo/requirements.txt
        virtualenv: /var/www/env

When you run the ansible-playbook command, Ansible checks whether everything is as you described it and, if not, makes it so. This way, you can be sure that your development and production containers are identical. We have a template here that you can use as a starting point for your own Ansible script. To get a better grasp of what our template is doing, you can take a look at this page, which describes the manual steps for deploying an application within an LXD container.
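
For reference, a typical invocation, assuming the playbook is saved as playbook.yml (the file name is just an example) and, as in our current setup, is run from inside the container against localhost:

# Install Ansible inside the container and apply the playbook to the container itself
# (local connection, inline inventory consisting only of localhost).
sudo apt install ansible
ansible-playbook --connection=local --inventory localhost, playbook.yml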

Todo:

  • Actually test the template
  • Make it so the Ansible script runs from your local machine instead of from within the container

GPU

Lightning has a GPU and your container can make use of it. However, there is only one GPU, which is shared between all services on Lightning, so it is essential that your service only loads models into GPU memory when they are being used, and frees them again afterwards! Your service must also take into account that another user or application may have already claimed the GPU, so loading a model can fail. The GPU is claimed on a first come, first served basis, with the understanding that applications are not allowed to hold it for long periods of time and must free GPU memory after serving user requests.
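
As a quick check before loading a model, you can see how much GPU memory is currently in use from inside your container (a minimal sketch; it assumes the NVIDIA driver utilities are available in the container):

# Show used versus total GPU memory; if most of it is taken,
# another service on Lightning is probably holding the GPU.
nvidia-smi --query-gpu=memory.used,memory.total --format=csv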

Docker with GPU

If you use nested containers and want to pass GPU access to a Docker container, you must install nvidia-container-toolkit inside the Incus container, e.g. as follows (on a Debian/Ubuntu system):

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    tee /etc/apt/sources.list.d/nvidia-container-toolkit.list &&\
apt-get update &&\
apt-get upgrade &&\
apt-get install nvidia-container-toolkit &&\
nvidia-ctk runtime configure --runtime=docker &&\
sed -i 's/#no-cgroups = false/no-cgroups = true/' /etc/nvidia-container-runtime/config.toml 

The Docker containers are best derived from base images like nvidia/cuda:12.3.2-base-ubuntu22.04 (adapt the CUDA and Ubuntu versions accordingly if needed).
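
To verify that GPU passthrough into a nested Docker container works, something like the following should print the GPU details (a sketch; adapt the image tag to the CUDA and Ubuntu versions you actually use):

# Run nvidia-smi inside a throwaway CUDA container; this requires the
# nvidia-container-toolkit setup from the instructions above.
docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi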

Troubleshooting

If you get something like:

Running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: mount error: failed to add device rules: write
/sys/fs/cgroup/devices/docker/355ca7157c15c853f12eee8cc3f628858fc1d3f7fc32c820f74633108f8a9107/devices.allow: operation not permitted: unknown

Then make sure to set no-cgroups = true in /etc/nvidia-container-runtime/config.toml (as per the last sed line in the above instructions).
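
A quick way to check the current value (a sketch):

grep no-cgroups /etc/nvidia-container-runtime/config.toml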
