Troubleshooting
Fix common issues with the Concourse CI Machine Charm.
Charm Shows "Blocked" Status
Cause: On web units, this usually means the PostgreSQL relation is missing.
# Fix: Create PostgreSQL relation
juju integrate concourse-ci:postgresql postgresql:database
Web Server Won't Start
Check Logs
# View charm logs
juju debug-log --include concourse-ci/0 --replay --no-tail | tail -50
# SSH to unit and check systemd
juju ssh concourse-ci/0
sudo journalctl -u concourse-server -f
Common Causes
- Database not configured: Check that the PostgreSQL relation exists
- Auth configuration missing: sudo cat /var/lib/concourse/config.env
- Port already in use: Change the web-port config
Workers Not Connecting
# Check worker status
juju ssh concourse-ci/1 # Worker unit
sudo systemctl status concourse-worker
sudo journalctl -u concourse-worker -f
Common Causes
- TSA keys not generated: Check /var/lib/concourse/keys/
- Containerd not running: sudo systemctl status containerd
- Network connectivity: Ensure workers can reach the web server
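The network connectivity cause can be checked from the worker unit with a small port probe. The following is a sketch under stated assumptions: it relies on nc (netcat) being installed on the unit, the function name can_reach is made up for illustration, and 2222 is the upstream Concourse default TSA port, which this charm may or may not use. Replace <web-address> with the address of the web unit.

```shell
# Hypothetical helper: report whether HOST:PORT accepts TCP connections.
# Assumes nc (netcat) is available on the worker unit.
can_reach() {
    probe_host=$1
    probe_port=$2
    # -z: probe only, send no data; -w 3: give up after 3 seconds
    if nc -z -w 3 "$probe_host" "$probe_port"; then
        echo "reachable: $probe_host:$probe_port"
    else
        echo "unreachable: $probe_host:$probe_port"
        return 1
    fi
}

# Example (run on the worker unit). Port 2222 is the upstream Concourse
# TSA default and is an assumption here, not confirmed for this charm:
# can_reach <web-address> 2222
```

A non-zero exit status makes the helper usable in scripts, for example before restarting concourse-worker.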
GPU Not Detected
NVIDIA GPU
# Check GPU on host
nvidia-smi
# Check LXC device added
lxc config device show <container-name>
# Check inside container
lxc exec <container-name> -- nvidia-smi
# If missing, add GPU device
lxc config device add <container-name> gpu0 gpu
AMD GPU
# Verify GPU device AND /dev/kfd exist
lxc exec <container-name> -- ls -la /dev/dri/
lxc exec <container-name> -- ls -la /dev/kfd
# Add missing /dev/kfd (REQUIRED for ROCm compute)
lxc config device add <container-name> kfd unix-char \
source=/dev/kfd path=/dev/kfd
# For integrated GPUs, use override in pipeline
export HSA_OVERRIDE_GFX_VERSION=11.0.0
Shared Storage Issues
"Waiting for shared storage mount"
# Run setup script to mount storage
./scripts/setup-shared-storage.sh <app-name> /path/to/shared
# Verify mount exists
juju ssh <unit> -- mount | grep concourse
Permission Denied in Shared Storage
# Ensure LXC device uses shift=true for UID/GID mapping
lxc config device show <container-name>
# Should show: shift: "true"
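The manual check above can be scripted. The sketch below is illustrative only: has_shift_true is a hypothetical helper name, and it does a plain text match on the device listing, so it reports success if any device on the container sets shift, not specifically the shared storage device.

```shell
# Hypothetical helper: read `lxc config device show` output on stdin and
# report whether at least one device sets shift: "true".
has_shift_true() {
    if grep -q 'shift: "true"'; then
        echo "shift enabled"
    else
        echo "shift missing"
        return 1
    fi
}

# Example (run on the LXD host):
# lxc config device show <container-name> | has_shift_true
```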
Tasks Failing to Start
Missing Image Resource
This charm uses the containerd runtime, so every task must include an image_resource:
# ❌ Wrong - no image_resource
config:
  platform: linux
  run:
    path: echo
    args: ["hello"]
# ✅ Correct - with image_resource
config:
  platform: linux
  image_resource:
    type: registry-image
    source:
      repository: busybox
  run:
    path: echo
    args: ["hello"]
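To catch this before a build fails, a task or pipeline file can be scanned for the key. This is a rough sketch, not a YAML parser: check_task_file is a hypothetical helper that only looks for a line starting with image_resource:, so it cannot tell which task in a multi-task pipeline is missing it.

```shell
# Hypothetical helper: report whether FILE contains an image_resource key.
# Plain text match only; it does not validate the YAML structure.
check_task_file() {
    if grep -Eq '^[[:space:]]*image_resource:' "$1"; then
        echo "ok: $1"
    else
        echo "missing image_resource: $1"
        return 1
    fi
}

# Example:
# check_task_file task.yml
```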
Container Pull Failures
# Check containerd status
juju ssh <worker-unit>
sudo systemctl status containerd
# Check DNS configuration
cat /etc/resolv.conf
# Enable containerd DNS proxy if needed
juju config <app> containerd-dns-proxy-enable=true
juju config <app> containerd-dns-server="8.8.8.8,1.1.1.1"
Folder Mounts Not Working
Folders Not Visible in Tasks
# 1. Path MUST be under /srv
# ✅ Correct
path=/srv/datasets
# ❌ Wrong - not under /srv
path=/mnt/datasets
# 2. Verify LXC mount exists
lxc config device show <container-name>
lxc exec <container-name> -- ls -la /srv/
Cannot Write to Writable Folder
# Folder name MUST end with _writable or _rw
# ✅ Correct
path=/srv/outputs_writable
# ❌ Wrong - missing suffix
path=/srv/outputs
# LXC device must not be readonly
lxc config device show <container-name>
# Should NOT show: readonly: "true" for writable folders
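The two naming rules in this section (path under /srv, writable folders ending in _writable or _rw) can be checked before adding the LXC device. The helper below is a minimal sketch with a made-up name, classify_mount_path; it only inspects the path string and does not look at the container.

```shell
# Hypothetical helper: classify a folder mount path by the naming rules above.
classify_mount_path() {
    case "$1" in
        /srv/*_writable|/srv/*_rw)
            echo "writable" ;;
        /srv/?*)
            echo "no writable suffix" ;;
        *)
            echo "invalid: not under /srv"
            return 1 ;;
    esac
}

# Examples:
# classify_mount_path /srv/outputs_writable
# classify_mount_path /mnt/datasets
```

A path with a trailing slash, such as /srv/outputs_rw/, is not recognized as writable by this sketch, so pass paths without one.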
Upgrade Failed
# Roll back to the previous version
juju config concourse-ci version=<previous-version>
# Check error logs
juju debug-log --include concourse-ci --replay --no-tail | grep -i error
# Verify database connectivity
juju ssh concourse-ci/0
sudo journalctl -u concourse-server | grep -i database
View Configuration
# Check all config values
juju config concourse-ci
# View runtime config file
juju ssh concourse-ci/0
sudo cat /var/lib/concourse/config.env
Check Service Status
# Web server
juju ssh concourse-ci/0
sudo systemctl status concourse-server
sudo journalctl -u concourse-server -n 100
# Worker
juju ssh concourse-ci/1
sudo systemctl status concourse-worker
sudo journalctl -u concourse-worker -n 100
# Containerd (on workers)
sudo systemctl status containerd
Database Connection Issues
# Verify PostgreSQL relation
juju status --relations
# Check database credentials (stored in Juju secrets)
juju ssh concourse-ci/0
sudo grep DATABASE /var/lib/concourse/config.env
# Test connection manually
sudo -u concourse psql <connection-string>
Get Help
If issues persist:
- GitHub Issues: Report bugs
- Logs to include:
juju debug-log --include <app> --replay --no-tail
juju status --relations --storage
juju config <app>
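The three commands listed above can be gathered into one file for attaching to an issue. This is a sketch under assumptions: collect_diagnostics and the default file name concourse-diagnostics.txt are made up for illustration, and the juju commands are exactly the ones listed above.

```shell
# Hypothetical helper: write the three diagnostic outputs into one file.
# Usage: collect_diagnostics <app> [output-file]
collect_diagnostics() {
    diag_app=$1
    diag_file=${2:-concourse-diagnostics.txt}
    {
        echo "== juju debug-log =="
        juju debug-log --include "$diag_app" --replay --no-tail
        echo "== juju status =="
        juju status --relations --storage
        echo "== juju config =="
        juju config "$diag_app"
    } > "$diag_file" 2>&1
    # Print the file name so callers can capture it
    echo "$diag_file"
}

# Example:
# collect_diagnostics concourse-ci
```

Review the file before sharing it, since config output may include values you do not want to publish.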
Related Documentation
- Architecture Reference - Understand system components
- Configuration Reference - All config options
- Deployment Guide - Start fresh