Sidharth Shambu

How to share Nvidia GPUs that don’t support MIG and when vGPU isn’t an option

I recently had to build a system to share two Nvidia L40S GPUs (each with 48 GB of memory) among a group of users. The L40S does not support MIG (Multi-Instance GPU), so hardware-slicing an individual GPU was not an option. Nvidia vGPU was another route, but it would have required separate per-GPU or per-user licensing, which did not make sense for a small academic setup.

So the question became: how do I share a couple of GPUs with a small group of users without MIG or vGPU, and without giving everyone root access on the host?

The TLDR answer is: per-user Docker containers with genv environments capping GPU memory, XFS project quotas capping disk usage, and a genv enforce watchdog on the host to keep everyone within their limits.

This post walks you through what I built, what worked, and what still needs improvement.

What is genv and why did I use it?

Genv is an open source GPU environment and cluster manager. You can think of it as "virtual environments for GPUs".

It allows you to define environments with a specific set of GPUs and a GPU memory cap, and attach those environments to a shell or a container.

This was perfect for running training/inference scripts on GPUs. As an admin, I had a way to enforce the memory capacity allocated to each environment without needing MIG or vGPU licenses, and the soft partitioning was good enough for a small set of trustworthy users.
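To make the "virtual environment" idea concrete, a shell session looks roughly like this. I am writing these subcommands from memory of the genv docs rather than from this setup, so treat the exact names and flags as assumptions and check genv --help:

genv activate --id alice      # create/enter an environment called "alice"
genv config gpus 1            # the environment gets one GPU...
genv config gpu-memory 6g     # ...with a 6 GB memory cap
genv attach                   # attach devices; CUDA_VISIBLE_DEVICES now reflects the env
genv envs                     # list the environments currently active on the machine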

High-level architecture

From the user's perspective, it looks like a VM they can SSH into. In reality, they land inside their own GPU-ready container, with the necessary libraries/tools pre-installed and the host's external drives mounted so they can reach their larger files and data.

Host side setup

On the host, I started with the basic Nvidia container stack:

  1. Install GPU drivers and CUDA - I used ubuntu-drivers autoinstall to get an appropriate driver and then followed Nvidia's guide to install the container toolkit and hook it into Docker (the exact commands are sketched at the end of this section).

  2. Install genv and genv-docker - Following the genv docs, I installed the core genv tool and the Docker integration, and added the genv runtime into Docker's daemon.json instead of passing it via dockerd flags.

  3. Build a base GPU image - Below is a stripped-down version of my Dockerfile that gives you a ~12 GB base image.

FROM nvidia/cuda:12.9.1-runtime-ubuntu24.04
# 1. Install OS packages & SSH server
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
    python3 python3-venv python3-pip python3-dev \
    build-essential git curl ca-certificates \
    libsm6 libxext6 libnss-sss sssd-common \
    sudo vim wget \
 && rm -rf /var/lib/apt/lists/*


# 2. Optimize pip usage
ENV PIP_DISABLE_PIP_VERSION_CHECK=1 \
    PIP_NO_CACHE_DIR=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PIP_BREAK_SYSTEM_PACKAGES=1


# 3. Install ML stack
RUN python3 -m pip install --no-cache-dir \
    numpy pandas scipy scikit-learn matplotlib seaborn \
    scikit-image transformers \
 && python3 -m pip install --no-cache-dir \
    torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 \
    --index-url https://download.pytorch.org/whl/cu124 \
 && python3 -m pip install --no-cache-dir tensorflow==2.16.*


WORKDIR /workspace
CMD ["sleep", "infinity"]

The goal is to provide users with a "batteries included" ML environment, so they do not need to install additional software (they can still use apt-get install).
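For reference, the host-side commands behind steps 1 and 2 boil down to roughly the following. This is a sketch that assumes Ubuntu 24.04 and that Nvidia's apt repository for the container toolkit is already configured; the genv-docker integration and the runtime entry in daemon.json follow the genv docs:

# Step 1: GPU driver + container toolkit
sudo ubuntu-drivers autoinstall                      # picks an appropriate driver for the L40S cards
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker   # wires the nvidia runtime into Docker
sudo systemctl restart docker
nvidia-smi                                           # sanity check: both GPUs should be listed

# Step 2: genv core (genv-docker and its Docker runtime entry are set up per the genv docs)
python3 -m pip install genv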

Disk quotas with XFS project quotas

Since all the containers run on a single host, a hard cap on each container's disk usage is non-negotiable; otherwise a single user can fill up the host's entire disk.

You can define per-container disk quotas using Docker's overlay2.size storage option. However, this only works if the filesystem backing /var/lib/docker is XFS mounted with project quotas. So I created a 1 TB loopback image, formatted it as XFS (ftype=1), and mounted it on /var/lib/docker with the pquota option.

Below are the relevant commands:

sudo fallocate -l 1T /home/docker-data.img
LOOP=$(sudo losetup -f --show /home/docker-data.img)
sudo mkfs.xfs -n ftype=1 "$LOOP"
sudo losetup -d "$LOOP"

echo '/home/docker-data.img /var/lib/docker xfs loop,pquota 0 0' | sudo tee -a /etc/fstab
sudo mount -a
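A quick sanity check that the new mount really has project quotas enabled:

findmnt -no FSTYPE,OPTIONS /var/lib/docker   # expect xfs with prjquota (the kernel's name for pquota)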

Then, in /etc/docker/daemon.json:

{
  "storage-driver": "overlay2",
  "storage-opts": ["overlay2.size=100G"],
  "log-driver": "json-file",
  "log-opts": { "max-size": "50m", "max-file": "3" }
}

This gives each container a hard 100 GB cap on its writable layer, enforced through XFS project quotas inside the 1 TB loopback image, plus rotated JSON logs so runaway logging cannot eat the disk either.

You can also attach XFS project quotas to specific named volumes to set per-volume caps.
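As a sketch, capping a named volume looks roughly like this. The volume name, project ID, and limit are made up for illustration, and depending on your xfs_quota version you may need /etc/projects and /etc/projid entries instead of the -p path form:

docker volume create johndoe-data
sudo xfs_quota -x -c 'project -s -p /var/lib/docker/volumes/johndoe-data/_data 100' /var/lib/docker
sudo xfs_quota -x -c 'limit -p bhard=200g 100' /var/lib/docker
sudo xfs_quota -x -c 'report -p' /var/lib/docker     # verify usage against the caps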

Auto-launching a personal container on SSH

I wanted the user experience to be: SSH to the host, and you are automatically dropped into your personal GPU container. To do that, you need a small launcher script that runs whenever the user logs in over SSH.

The script checks whether the user's personal container already exists. If it does, it starts and/or attaches to it instead of recreating it. If it has never been created, it runs genv-docker run to spin up a fresh one. A minimal sketch of such a launcher is at the end of this section.

From the user's point of view, this feels like a personal VM, but in reality, they are inside a container with a GPU slice and disk limit.
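My real launcher has more plumbing, but a minimal sketch looks something like the following. The container naming, the data mount, and the genv-docker flags (--gpus, --gpu-memory) are assumptions on my part, so check genv-docker run --help before copying anything:

#!/usr/bin/env bash
# Hypothetical per-user launcher: drop the SSH user into their personal GPU container.
set -euo pipefail

USER_NAME="$(whoami)"
CONTAINER="gpu-${USER_NAME}"   # naming convention is an assumption
IMAGE="gpu-cont"               # the base image built earlier

if ! docker inspect "$CONTAINER" >/dev/null 2>&1; then
    # First login: create the container with a GPU slice, a disk cap, and the data mount.
    # --storage-opt size= relies on the XFS + pquota setup from the previous section.
    genv-docker run -d --name "$CONTAINER" \
        --gpus 1 --gpu-memory 6g \
        --storage-opt size=40G \
        -v /mnt/data:/mnt/data \
        "$IMAGE"
elif [ "$(docker inspect -f '{{.State.Running}}' "$CONTAINER")" != "true" ]; then
    docker start "$CONTAINER" >/dev/null
fi

# Land the user in an interactive shell inside their container.
exec docker exec -it -w /workspace "$CONTAINER" bash

The important property is that the script is idempotent: every SSH session lands the user in the same container.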

Enforcing GPU memory limits with genv-enforce

Everything so far is about wiring. The enforcement itself comes from genv enforce.

There are two key facts here: the memory capacity configured on a genv environment is soft, so nothing at the driver level stops a process from allocating past it, and genv ships an enforcement loop (genv enforce) that watches actual usage and terminates offenders.

To make the limits real, I run genv enforce --env-memory --non-env-processes --interval 1 on the host as a systemd service, with a unit file like:

[Unit]
Description=genv enforce watchdog
After=network-online.target local-fs.target

[Service]
Type=simple
ExecStart=/usr/local/bin/genv enforce --env-memory --non-env-processes --interval 1
StandardOutput=append:/var/log/genv-enforce.log
StandardError=append:/var/log/genv-enforce.log
Restart=always
RestartSec=2s

[Install]
WantedBy=multi-user.target

This loop looks at GPU usage and kills two kinds of processes: those pushing a genv environment over its configured memory cap (--env-memory), and GPU processes that are not running inside any genv environment at all (--non-env-processes).
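Assuming the unit file is saved as /etc/systemd/system/genv-enforce.service (the name is my choice), it is enabled like any other service:

sudo systemctl daemon-reload
sudo systemctl enable --now genv-enforce.service
systemctl status genv-enforce.service    # confirm the watchdog is running
tail -f /var/log/genv-enforce.log        # watch what it kills and why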

Provisioning and housekeeping

To create a new user environment, I wrote a small provisioning script:

./provision_gpu_container.sh <username> <gpu_memory_gb> <gpus> [disk_gb] [image]

For example:

./provision_gpu_container.sh johndoe 6 1 40 gpu-cont

This creates johndoe's personal container from the gpu-cont image, attached to one GPU with a 6 GB GPU memory cap and a 40 GB disk quota.

On the maintenance side, I keep a few small housekeeping scripts.

What worked well

A few things turned out nicely: the SSH-straight-into-a-container experience feels like a personal VM to users, the XFS project quotas give real hard caps on disk usage, and genv enforce keeps the GPU memory limits honest without MIG or vGPU.

What was painful and what I would fix

You can feel the rough edges in the system. A few examples:

  1. Memory debugging is hard - Users cannot easily see how close they are to their genv quota. They only see "process killed" and have to guess it was an OOM kill. Environment-level monitoring, or a simple CLI that displays "you are currently using X of Y MiB", would help.

  2. File movement - Copying data in and out of the container is not straightforward, even with the external drive mounted. A small helper script wrapping rsync or scp would probably be enough.

  3. No scheduling - Right now this is first-come, first-served; no scheduler decides who gets a GPU slice and when. For a small group this is fine, but if more users show up I would need a real scheduler, or a move to Kubernetes or Slurm with GPU support.

  4. No comprehensive observability or monitoring - I currently only have genv envs, which shows me which environments are actively in use. A small service in each container could monitor GPU usage and report it back, and that data could feed a control plane for managing environments without the CLI or ad-hoc scripts.

When this system makes sense

This setup is not a universal pattern, but it can work well if you have a small group of reasonably trustworthy users, a couple of GPUs on a single host that cannot do MIG, no budget or appetite for vGPU licensing, and no need (yet) for real scheduling or hard isolation.

Core takeaway

If you are stuck with GPUs that cannot do MIG and vGPU is not practical, you can still get a long way with per-user Docker containers, genv environments to cap GPU memory, XFS project quotas to cap disk, and a genv enforce watchdog to keep everyone honest.

["It ain't much, but it's honest work" meme: a GPU environment that runs on scripts and hope]

#containers #docker #genv #gpu #nvidia