# Configure GPU Workers

Enable NVIDIA CUDA or AMD ROCm GPU support on workers.
## Enable NVIDIA GPU

```bash
# Deploy with CUDA enabled
juju deploy concourse-ci-machine --channel edge worker \
  --config mode=worker \
  --config compute-runtime=cuda

# Or enable on an existing worker
juju config worker compute-runtime=cuda
```
## Add GPU to LXC Container

```bash
# Find the container name (look for the "juju-" prefix)
lxc list

# Add the GPU device (replace with your actual container name)
lxc config device add juju-abc123-0 gpu0 gpu

# Verify
lxc exec juju-abc123-0 -- nvidia-smi
```
## Enable AMD GPU (ROCm)

```bash
# Deploy with ROCm enabled
juju deploy concourse-ci-machine --channel edge worker \
  --config mode=worker \
  --config compute-runtime=rocm

# Or enable on an existing worker
juju config worker compute-runtime=rocm
```
## Add AMD GPU to LXC Container

```bash
# Query available GPUs (important for multi-GPU systems)
lxc query /1.0/resources | jq '.gpu.cards[] | {id: .drm.id, driver, driver_version, vendor_id, product_id}'

# Add a specific AMD GPU by ID (recommended)
lxc config device add <container-name> gpu1 gpu id=1

# Add /dev/kfd for compute workloads (required)
lxc config device add <container-name> kfd unix-char \
  source=/dev/kfd \
  path=/dev/kfd

# Verify
lxc exec <container-name> -- rocm-smi
```
⚠️ Multi-GPU systems: always use `id=N` to target a specific AMD GPU when GPUs from multiple vendors are present. Without an ID, all GPUs are passed through, which causes conflicts.
## Disable GPU

```bash
juju config worker compute-runtime=none
```
## Configure GPU Device Selection

Control which GPUs are exposed to tasks:

```bash
# Expose all GPUs (default)
juju config worker gpu-device-ids=all

# Expose specific GPUs
juju config worker gpu-device-ids="0,1"
```
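The value is a plain comma-separated string. As a minimal sketch of how such a value splits into individual device IDs (pure shell, illustrative only — not the charm's actual parsing logic):

```shell
# Sketch: split a comma-separated gpu-device-ids value into individual IDs,
# the way a worker-side script might consume it. Illustrative only.
gpu_device_ids="0,1"
IFS=',' read -ra ids <<< "$gpu_device_ids"
for id in "${ids[@]}"; do
  echo "exposing GPU $id"
done
```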
## Verify GPU Configuration

```bash
# Check worker status
juju status worker
# Should show: "Worker ready (GPU: 1x NVIDIA)" or "Worker ready (GPU: 1x AMD)"

# Check Concourse CI worker tags
juju ssh web/0
fly -t local workers
# Should show tags: cuda (or rocm), gpu-count=1
```
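The tag check can be scripted. A small sketch that filters `fly workers` output for GPU-tagged workers — the sample output below is an assumption for illustration, not captured from a real deployment:

```shell
# Illustrative sample of `fly -t local workers` output (column layout and
# values are assumptions for this sketch).
workers="name      containers  platform  tags
worker-0  3           linux     cuda,gpu-count=1
worker-1  5           linux     none"

# Print only the workers carrying a cuda or rocm tag.
echo "$workers" | awk 'NR > 1 && $4 ~ /cuda|rocm/ {print $1, $4}'
```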
## Test GPU in Pipeline

### NVIDIA Test

```yaml
jobs:
- name: test-nvidia
  plan:
  - task: gpu-test
    tags: [cuda]
    config:
      platform: linux
      image_resource:
        type: registry-image
        source:
          repository: nvidia/cuda
          tag: 13.1.0-base-ubuntu24.04
      run:
        path: nvidia-smi
```
### AMD Test

```yaml
jobs:
- name: test-amd
  plan:
  - task: gpu-test
    tags: [rocm]
    config:
      platform: linux
      image_resource:
        type: registry-image
        source:
          repository: rocm/dev-ubuntu-24.04
          tag: latest
      run:
        path: rocm-smi
```
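To run either test, save the job to a file and push it with `fly`. A sketch of the workflow — the pipeline name (`gpu-test`) and target (`local`) are placeholders, and the file content is abbreviated to the fields that matter for scheduling:

```shell
# Save the NVIDIA test job to a file (abbreviated; see the full example above).
cat > test-nvidia.yml <<'EOF'
jobs:
- name: test-nvidia
  plan:
  - task: gpu-test
    tags: [cuda]
EOF

# Then push and trigger it (requires a logged-in fly target, here "local"):
#   fly -t local set-pipeline -p gpu-test -c test-nvidia.yml
#   fly -t local unpause-pipeline -p gpu-test
#   fly -t local trigger-job -j gpu-test/test-nvidia --watch
```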
## Common Issues

### NVIDIA: "GPU enabled but no GPU detected"

```bash
# Check that the host has a GPU
nvidia-smi

# Check the LXC device
lxc config device show <container-name>

# Check inside the container
lxc exec <container-name> -- nvidia-smi
```
### AMD: "CUDA (ROCm) available: False" in PyTorch

```bash
# 1. Verify /dev/kfd exists
lxc exec <container-name> -- ls -la /dev/kfd

# 2. If missing, add it
lxc config device add <container-name> kfd unix-char \
  source=/dev/kfd path=/dev/kfd

# 3. For integrated GPUs (Phoenix1/gfx1103), set an override in your pipeline:
export HSA_OVERRIDE_GFX_VERSION=11.0.0
```
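In a Concourse pipeline, an environment variable like this is usually set through the task's `params` mapping rather than an inline export, since Concourse injects `params` entries into the task's environment. A sketch, reusing the ROCm test task from this page:

```yaml
# Sketch: pass the override via task params (task name and image are the
# illustrative values used elsewhere on this page).
- task: gpu-test
  tags: [rocm]
  params:
    HSA_OVERRIDE_GFX_VERSION: "11.0.0"
  config:
    platform: linux
    image_resource:
      type: registry-image
      source:
        repository: rocm/dev-ubuntu-24.04
        tag: latest
    run:
      path: rocm-smi
```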
### GPU Not Showing in Task

- Ensure the task uses `tags: [cuda]` or `tags: [rocm]`
- Verify the task uses a GPU-enabled image (`nvidia/cuda` or a `rocm/*` base)
- Check that the worker is registered: `fly -t local workers`
## Related Documentation

- GPU Workers Tutorial - Complete walkthrough with examples
- Mount Datasets - Add training data to GPU tasks
- Troubleshooting - Fix common GPU issues