This implementation adds NVIDIA GPU support to Concourse CI workers, enabling ML/AI workloads, GPU-accelerated builds, and compute-intensive tasks.
Configuration options:
- enable-gpu (boolean, default: false) - Enable GPU support for the worker
- gpu-device-ids (string, default: "all") - Specify which GPUs to expose

New helper functions (a minimal sketch of the detection helper follows the environment list below):
- detect_nvidia_gpus() - Detect NVIDIA GPUs using nvidia-smi
- verify_nvidia_container_runtime() - Verify nvidia-container-runtime is available
- configure_containerd_for_gpu() - Configure containerd with the NVIDIA runtime
- _get_gpu_tags() - Generate worker tags based on GPU configuration
- get_gpu_status_message() - Get GPU status for unit status display

Tested environment:
- NVIDIA GPU: RTX A500 Laptop GPU
- NVIDIA Driver: 580.95.05
- nvidia-container-runtime: /usr/bin/nvidia-container-runtime
- Juju model on local machine
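As a rough illustration of what the detection step involves, a minimal sketch of a helper like detect_nvidia_gpus() could query nvidia-smi as shown below; the return shape and error handling are assumptions, not the actual implementation in lib/concourse_common.py.

# Hypothetical sketch only; the real detect_nvidia_gpus() in lib/concourse_common.py
# may differ in signature, return shape, and error handling.
import subprocess

def detect_nvidia_gpus():
    """Return a list of {'index', 'name', 'driver'} dicts, or [] when no GPU is usable."""
    try:
        out = subprocess.run(
            ["nvidia-smi",
             "--query-gpu=index,name,driver_version",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=True, timeout=30,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError, subprocess.TimeoutExpired):
        return []  # nvidia-smi missing or failing means "no GPUs to expose"

    gpus = []
    for line in out.strip().splitlines():
        index, name, driver = (field.strip() for field in line.split(",", 2))
        gpus.append({"index": int(index), "name": name, "driver": driver})
    return gpus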
IMPORTANT: LXD GPU Passthrough Required
When using Juju with LXD (localhost cloud), GPU devices must be passed through to containers. See LXD GPU Setup below.
# Deploy PostgreSQL
juju deploy postgresql --channel 14/stable
# Deploy web server
juju deploy ./concourse-ci-machine_ubuntu-22.04-amd64.charm web \
--config mode=web
# Deploy GPU-enabled worker
juju deploy ./concourse-ci-machine_ubuntu-22.04-amd64.charm worker \
--config mode=worker \
--config enable-gpu=true
# Create relations
juju relate web:postgresql postgresql:db
juju relate web:web-tsa worker:worker-tsa
# Wait for deployment
juju status --watch 1s
# Check worker status (should show GPU info)
juju status worker
# Expected: "Worker ready (GPU: 1x NVIDIA)"
# Check workers have GPU tags in Concourse
fly -t <target> workers
# Expected tags: gpu, gpu-type=nvidia, gpu-count=1
# Check automation logs if needed
juju debug-log --include worker/0 | grep -i gpu
GPU workers automatically support dataset mounting for ML training pipelines. The charm includes an OCI runtime wrapper that injects dataset mounts into task containers.
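The wrapper itself is not reproduced in this document. The sketch below illustrates the general idea under assumed paths and argument handling (the real runtime location, mount source, and CLI parsing in the charm may differ): intercept the OCI create call, add a read-only bind mount for /srv/datasets to the bundle's config.json, then exec the real runtime.

#!/usr/bin/env python3
# Illustrative sketch of an OCI runtime wrapper that injects a dataset mount.
# The runtime path, mount source, and CLI handling are assumptions, not the
# charm's actual wrapper.
import json
import os
import sys

REAL_RUNTIME = "/usr/local/concourse/bin/runc"  # assumed location of the wrapped runtime
DATASET_MOUNT = {
    "destination": "/srv/datasets",
    "type": "bind",
    "source": "/srv/datasets",
    "options": ["rbind", "ro"],
}

def inject_mount(bundle_dir):
    """Add the dataset bind mount to the OCI spec if it is not already present."""
    config_path = os.path.join(bundle_dir, "config.json")
    with open(config_path) as f:
        spec = json.load(f)
    mounts = spec.setdefault("mounts", [])
    if not any(m.get("destination") == DATASET_MOUNT["destination"] for m in mounts):
        mounts.append(DATASET_MOUNT)
        with open(config_path, "w") as f:
            json.dump(spec, f)

def main():
    args = sys.argv[1:]
    # Only the "create --bundle <dir> <id>" invocation carries a spec to rewrite.
    if "create" in args and "--bundle" in args:
        inject_mount(args[args.index("--bundle") + 1])
    os.execv(REAL_RUNTIME, [REAL_RUNTIME] + args)

if __name__ == "__main__":
    main()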
# 1. Find your GPU worker's LXC container
juju status gpu-worker
# Note the machine number (e.g., "4")
# 2. Identify the container name
CONTAINER=$(lxc list | grep "juju-.*-4" | awk '{print $2}')
# 3. Mount your dataset directory
lxc config device add $CONTAINER datasets disk \
source=/path/to/your/datasets \
path=/srv/datasets \
readonly=true
# 4. Verify the mount
lxc exec $CONTAINER -- ls -lah /srv/datasets/
Once mounted via LXC, datasets are automatically available in all GPU tasks at /srv/datasets:
jobs:
- name: train-model
  plan:
  - task: training
    tags: [gpu]
    config:
      platform: linux
      image_resource:
        type: registry-image
        source:
          repository: pytorch/pytorch
          tag: latest
      run:
        path: python
        args:
          - -c
          - |
            # /srv/datasets automatically available!
            import os
            print(f"Datasets: {os.listdir('/srv/datasets')}")
No pipeline modifications required! The OCI wrapper automatically injects the mount.
For detailed instructions, examples, and troubleshooting, see dataset-mounting.md.
Create gpu-test.yaml:
jobs:
- name: gpu-check
  plan:
  - task: nvidia-smi
    tags: [gpu]
    config:
      platform: linux
      image_resource:
        type: registry-image
        source:
          repository: nvidia/cuda
          tag: 13.1.0-base-ubuntu24.04
      run:
        path: nvidia-smi
Deploy and run:
# Get web IP
WEB_IP=$(juju status web/0 --format=json | jq -r '.applications.web.units["web/0"]["public-address"]')
# Get admin password
ADMIN_PASS=$(juju run web/leader get-admin-password --format=json | jq -r '."unit-web-0".results.password')
# Login to Concourse
fly -t gpu login -c http://$WEB_IP:8080 -u admin -p "$ADMIN_PASS"
# Set pipeline
fly -t gpu set-pipeline -p gpu-test -c gpu-test.yaml
fly -t gpu unpause-pipeline -p gpu-test
# Trigger job
fly -t gpu trigger-job -j gpu-test/gpu-check -w
Expected output: nvidia-smi output showing your RTX A500 GPU
When enable-gpu=true, the charm:
- Uses nvidia-smi to detect GPU count, models, and driver version
- Registers the NVIDIA runtime in /etc/containerd/config.toml
- Adds worker tags: gpu, gpu-type=nvidia, gpu-count=N

enable-gpu=true
    ↓
Detect GPUs (nvidia-smi)
    ↓
Verify nvidia-container-runtime
    ↓
Configure containerd with nvidia runtime
    ↓
Add GPU tags to worker config
    ↓
Start/restart worker with GPU support
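As a rough sketch of the "Configure containerd with nvidia runtime" step, a helper like configure_containerd_for_gpu() could append an NVIDIA runtime entry to /etc/containerd/config.toml, as below. The exact TOML section and restart handling are assumptions; the charm may instead template the whole file or rely on nvidia-ctk runtime configure.

# Illustrative sketch; the charm's actual configure_containerd_for_gpu() may
# template the full config or use nvidia-ctk instead of appending a snippet.
import subprocess
from pathlib import Path

CONTAINERD_CONFIG = Path("/etc/containerd/config.toml")

NVIDIA_RUNTIME_SNIPPET = """
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
  runtime_type = "io.containerd.runc.v2"
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
    BinaryName = "/usr/bin/nvidia-container-runtime"
"""

def configure_containerd_for_gpu():
    """Register an 'nvidia' runtime in containerd's config and restart the service."""
    config = CONTAINERD_CONFIG.read_text() if CONTAINERD_CONFIG.exists() else ""
    if "runtimes.nvidia" not in config:
        CONTAINERD_CONFIG.write_text(config + NVIDIA_RUNTIME_SNIPPET)
        subprocess.run(["systemctl", "restart", "containerd"], check=True)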
Modified files:
- config.yaml - Added GPU configuration options
- lib/concourse_common.py - Added GPU detection functions
- lib/concourse_worker.py - Added GPU configuration and tagging
- src/charm.py - Integrated GPU setup in install/config events
- README.md - Added GPU support documentation

All GPUs (default):
juju config worker gpu-device-ids=all
Worker tags: gpu, gpu-type=nvidia, gpu-count=1
Specific GPUs:
juju config worker gpu-device-ids=0,2
Worker tags: gpu, gpu-type=nvidia, gpu-count=2, gpu-devices=0,2
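The tag behaviour above is simple to express in code. A minimal sketch of _get_gpu_tags(), assuming it receives the charm config and the list returned by detect_nvidia_gpus(), could look like this (the signature is an assumption):

# Illustrative sketch; the real _get_gpu_tags() in lib/concourse_worker.py may
# take different arguments.
def _get_gpu_tags(config, gpus):
    """Derive Concourse worker tags from charm config and detected GPUs."""
    if not config.get("enable-gpu") or not gpus:
        return []

    tags = ["gpu", "gpu-type=nvidia"]
    device_ids = config.get("gpu-device-ids", "all")
    if device_ids == "all":
        tags.append(f"gpu-count={len(gpus)}")            # e.g. gpu-count=1
    else:
        exposed = [d.strip() for d in device_ids.split(",") if d.strip()]
        tags.append(f"gpu-count={len(exposed)}")          # e.g. gpu-count=2
        tags.append(f"gpu-devices={','.join(exposed)}")   # e.g. gpu-devices=0,2
    return tags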
# Deploy 3 GPU workers
juju deploy ./concourse-ci-machine_ubuntu-22.04-amd64.charm worker -n 3 \
--config mode=worker \
--config enable-gpu=true
# 2 GPU workers
juju deploy ./concourse-ci-machine_ubuntu-22.04-amd64.charm gpu-worker -n 2 \
--config mode=worker \
--config enable-gpu=true
# 4 regular workers
juju deploy ./concourse-ci-machine_ubuntu-22.04-amd64.charm cpu-worker -n 4 \
--config mode=worker
Pipelines can target specific workers via tags:
- task: train-model
  tags: [gpu]    # Only on GPU workers
- task: build-app
  tags: []       # Any worker
TensorFlow Training:
- task: train
  tags: [gpu, gpu-count=1]
  config:
    platform: linux
    image_resource:
      type: registry-image
      source:
        repository: tensorflow/tensorflow
        tag: latest-gpu
    run:
      path: python
      args: [train.py]
PyTorch Training:
- task: train
  tags: [gpu]
  config:
    platform: linux
    image_resource:
      type: registry-image
      source:
        repository: pytorch/pytorch
        tag: latest
    run:
      path: python
      args: ["-c", "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"]
CUDA Compilation:
- task: build
  tags: [gpu]
  config:
    platform: linux
    image_resource:
      type: registry-image
      source:
        repository: nvidia/cuda
        tag: 13.1.0-devel-ubuntu24.04
    run:
      path: sh
      args:
        - -c
        - |
          nvcc --version
          nvcc my_cuda_program.cu -o my_program
# Check GPU hardware
nvidia-smi
# Check driver
modinfo nvidia
# Check container runtime
which nvidia-container-runtime
nvidia-container-runtime --version
# Check containerd config
sudo cat /etc/containerd/config.toml | grep -A 5 nvidia
# Restart containerd
sudo systemctl restart containerd
sudo systemctl status containerd
# Check logs
sudo journalctl -u containerd -n 100
# Check worker logs
sudo journalctl -u concourse-worker -n 100 -f
# Check worker config
sudo cat /var/lib/concourse/config.env
# Restart worker
sudo systemctl restart concourse-worker
# Test GPU access manually
# Pull the image first (ctr run does not pull automatically)
sudo ctr image pull docker.io/nvidia/cuda:13.1.0-base-ubuntu24.04
sudo ctr run --rm --runtime io.containerd.runc.v2 \
  docker.io/nvidia/cuda:13.1.0-base-ubuntu24.04 \
  test-gpu nvidia-smi
# If this fails, check:
# 1. nvidia-container-runtime is installed
# 2. containerd config has nvidia runtime
# 3. containerd has been restarted
Relevant charm helpers (a minimal sketch of the runtime check follows the deploy command below):
- GPU detection (detect_nvidia_gpus())
- Runtime verification (verify_nvidia_container_runtime())
- Tag generation (_get_gpu_tags())

juju deploy ./concourse-ci-machine_ubuntu-22.04-amd64.charm worker \
--config mode=worker \
--config enable-gpu=true
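As referenced above, a minimal sketch of the runtime check, verify_nvidia_container_runtime(), could be as simple as a PATH lookup; the real helper may also check the runtime version or the containerd wiring.

# Illustrative sketch only; the real helper may perform additional checks.
import shutil

def verify_nvidia_container_runtime():
    """Return the path to nvidia-container-runtime, or None if it is not installed."""
    return shutil.which("nvidia-container-runtime")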
When deploying on LXD (Juju localhost cloud), GPU devices must be passed through to containers.
# Find the worker container name
lxc list | grep worker
# Add GPU device (container will automatically restart)
lxc config device add <container-name> gpu0 gpu
# Example:
lxc config device add juju-abc123-0 gpu0 gpu
# Verify GPU in container (check status)
juju status worker
# Should show: "Worker ready (GPU: 1x NVIDIA)"
# If status doesn't show GPU, trigger reconfiguration
juju config worker enable-gpu=false
juju config worker enable-gpu=true
# Create LXD profile with GPU
lxc profile create gpu-profile
lxc profile device add gpu-profile gpu0 gpu
# Use Juju constraints to apply profile (requires manual LXD config)
# This is more complex and requires Juju 3.x features
For production GPU workloads, deploy workers on bare metal or VMs instead of LXD:
# Add a MAAS machine or manual cloud
juju add-machine ssh:user@gpu-host
# Deploy worker to specific machine
juju deploy ./concourse-ci-machine_ubuntu-22.04-amd64.charm worker \
--to 1 \
--config mode=worker \
--config enable-gpu=true
# Check worker status shows GPU
juju status worker
# Expected: "Worker ready (GPU: 1x NVIDIA)"
# Check Concourse workers list
fly -t <target> workers
# Should show GPU tags: gpu, gpu-type=nvidia, gpu-count=1
# Run a test pipeline with nvidia-smi to verify GPU access
fly -t <target> trigger-job -j <pipeline>/<gpu-job> -w
Planned future enhancements include AMD GPU support (gpu-driver=amd) and Intel GPU support (gpu-driver=intel).

For issues or questions:
- Check the logs: juju debug-log --include worker | grep -i gpu
- Check worker status: juju status worker (should show GPU count)
- Check Concourse worker tags: fly -t <target> workers (should show GPU tags)