Concourse CI Machine Charm - Documentation

Understanding Container Runtime

Why containerd (not Docker), why workers run as root, and the container isolation model

Why Containerd, Not Docker?

The charm uses containerd as the container runtime, not Docker. This is a deliberate choice made by Concourse CI upstream, not by this charm. Understanding why requires a brief look at how the container ecosystem has evolved:

The Container Stack

┌────────────────────────┐
│  Concourse Worker      │ ← Our charm installs this
├────────────────────────┤
│  containerd            │ ← Container runtime (charm installs)
├────────────────────────┤
│  runc                  │ ← OCI runtime (spawns containers)
├────────────────────────┤
│  Linux Kernel          │ ← Namespaces, cgroups
│  (namespaces, cgroups) │
└────────────────────────┘

Docker, by contrast, adds additional layers:

┌────────────────────────┐
│  docker CLI            │ ← User-facing tool
├────────────────────────┤
│  dockerd               │ ← Docker daemon (image management, networking)
├────────────────────────┤
│  containerd            │ ← Actually runs containers
├────────────────────────┤
│  runc                  │ ← OCI runtime
└────────────────────────┘
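Both stacks bottom out in the same kernel primitives: namespaces and cgroups. A container is simply a process whose namespace IDs differ from the host's, which you can see directly in `/proc`. A minimal illustrative sketch (Linux only, not part of the charm):

```python
import os

def process_namespaces(pid="self"):
    """Return the namespace IDs of a process, read from /proc (Linux only).

    Every process belongs to one namespace of each type; a "container"
    is just a process whose namespace IDs differ from the host's.
    """
    ns_dir = f"/proc/{pid}/ns"
    return {name: os.readlink(os.path.join(ns_dir, name))
            for name in sorted(os.listdir(ns_dir))}

if __name__ == "__main__":
    for name, ident in process_namespaces().items():
        print(f"{name:8s} {ident}")   # e.g. "pid      pid:[4026531836]"
```

Comparing this output for a task process and for PID 1 on the worker shows exactly which isolation layers are in effect.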

Why Containerd is Better for Concourse

| Requirement | containerd | Docker |
|---|---|---|
| Lightweight | ✅ ~50MB binary, minimal daemon | ❌ ~200MB, heavyweight daemon with many features Concourse doesn't need |
| OCI-compliant | ✅ Native OCI support | ✅ Via containerd backend |
| No unnecessary features | ✅ Just container lifecycle management | ❌ Docker Swarm, Docker Compose, legacy image formats |
| Kubernetes-compatible | ✅ Kubernetes' standard runtime since Docker's deprecation | ❌ Deprecated as a K8s runtime |
| Direct control | ✅ Concourse talks directly to containerd | ❌ Extra layer of indirection |
💡 Industry Trend: Kubernetes deprecated Docker runtime support in v1.20 (2020) and removed it entirely in v1.24 (2022), standardizing on containerd. Concourse CI follows the same principle—use the simplest, most direct runtime possible.

What Concourse Doesn't Need from Docker

Why Workers Run as Root

The concourse-worker systemd service runs as root, not an unprivileged user. This seems surprising given security best practices, but it's necessary for several reasons:

Technical Requirements

| Capability Needed | Why Root is Required |
|---|---|
| Create user namespaces | Containers need isolated UID/GID spaces; mapping arbitrary host UID/GID ranges requires privilege (CAP_SETUID/CAP_SETGID). |
| Mount filesystems | Task containers need bind mounts for caches, inputs, and outputs. Requires CAP_SYS_ADMIN. |
| Manage cgroups | Resource limits (CPU, memory) are enforced via cgroups. Requires root or CAP_SYS_ADMIN. |
| Network namespaces | Isolated networking per container. Requires CAP_NET_ADMIN. |
| Device access | GPUs and block devices need /dev access. Requires root or device ownership. |
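As an illustration (not something the charm itself does), you can check which of these capabilities a running process actually holds: the kernel reports the effective capability set as a hex bitmask in the CapEff field of /proc/&lt;pid&gt;/status. A minimal Python sketch, with bit numbers taken from linux/capability.h:

```python
# Decode the effective capability bitmask (CapEff) from /proc/<pid>/status.
# Capability bit numbers are from linux/capability.h.
CAP_NET_ADMIN = 12
CAP_SYS_ADMIN = 21

def has_capability(cap_eff_hex: str, cap_bit: int) -> bool:
    """True if the given capability bit is set in a CapEff hex mask."""
    return bool(int(cap_eff_hex, 16) >> cap_bit & 1)

def read_cap_eff(status_text: str) -> str:
    """Extract the CapEff hex value from /proc/<pid>/status content."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return line.split()[1]
    raise ValueError("no CapEff line found")

if __name__ == "__main__":
    with open("/proc/self/status") as f:
        cap_eff = read_cap_eff(f.read())
    # A root process typically reports the full mask; an unprivileged one, zero.
    print("CAP_SYS_ADMIN:", has_capability(cap_eff, CAP_SYS_ADMIN))
    print("CAP_NET_ADMIN:", has_capability(cap_eff, CAP_NET_ADMIN))
```

Run under the concourse-worker service this reports both capabilities present; run as an ordinary user it reports neither.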

The Rootless Containers Myth

You might have heard of "rootless Docker" or "rootless Podman." These technologies allow launching containers without root, but with significant trade-offs:

| Feature | Rootless Mode | Impact on Concourse |
|---|---|---|
| Network modes | ❌ No bridge networking | Concourse tasks need network isolation |
| Port binding <1024 | ❌ Privileged ports blocked | Some tasks need to bind ports 80/443 |
| Cgroup limits | ⚠️ Limited enforcement | Can't reliably limit task resources |
| GPU passthrough | ❌ No device access | GPU workers would be impossible |
| Overlay filesystems | ⚠️ fuse-overlayfs (slow) | Performance degradation for image layers |

Verdict: Rootless mode sacrifices too many features Concourse relies on. The security benefits don't outweigh the functional limitations.

Security Mitigations

Running as root doesn't mean "no security." The charm implements several layers of protection:

  1. Container isolation: Task containers run in namespaces with limited capabilities
  2. AppArmor/SELinux profiles: Kernel-level MAC (Mandatory Access Control)
  3. Seccomp filters: Restrict syscalls available to containers
  4. Network policies: Firewall rules limit worker attack surface
  5. Read-only root filesystem: Worker binary directories mounted read-only
⚠️ Attack Surface: If a worker host is compromised (root access gained), an attacker can access all containers on that worker. Best practice: Run workers in isolated VMs or containers, not directly on sensitive hosts.
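Whether mitigation 3 (seccomp) is active for a given process can be read from the same /proc/&lt;pid&gt;/status interface used above. An illustrative sketch, not charm code:

```python
# The kernel reports a process's seccomp state in /proc/<pid>/status.
# Mode meanings: 0 = disabled, 1 = strict, 2 = filter (BPF syscall filter).
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}

def seccomp_mode(status_text: str) -> str:
    """Return the seccomp mode recorded in /proc/<pid>/status content."""
    for line in status_text.splitlines():
        if line.startswith("Seccomp:"):
            return SECCOMP_MODES[int(line.split()[1])]
    raise ValueError("no Seccomp field (kernel built without CONFIG_SECCOMP?)")

# A task process confined by a syscall filter reports mode 2:
assert seccomp_mode("Name:\ttask\nSeccomp:\t2\n") == "filter"
# An unconfined host process reports mode 0:
assert seccomp_mode("Seccomp:\t0\n") == "disabled"
```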

Container Isolation Model

Concourse uses Linux namespaces and cgroups to isolate task containers. Understanding this model explains what tasks can and cannot do:

Namespace Isolation

| Namespace | What's Isolated | Concourse Usage |
|---|---|---|
| PID | Process IDs (containers see own PID 1) | Tasks can't see other task processes |
| Mount | Filesystem mounts | Each task has own rootfs from image |
| Network | Network stack (IP, routes, firewall) | Tasks have isolated networking (bridge mode) |
| UTS | Hostname and domain name | Each task has unique hostname |
| IPC | Inter-process communication (shared memory, semaphores) | Tasks can't IPC with other tasks |
| User | UID/GID mappings | Task UID 0 maps to host UID 100000+ (non-privileged) |
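The User-namespace row is plain arithmetic over the kernel's /proc/&lt;pid&gt;/uid_map table, whose lines have the form `<inside-start> <outside-start> <count>`. A sketch of the translation, assuming the 100000+ range mentioned above:

```python
def to_host_uid(container_uid: int, uid_map: str) -> int:
    """Translate a UID inside a user namespace to the host UID.

    uid_map follows the /proc/<pid>/uid_map format: each line reads
    '<inside-start> <outside-start> <count>'.
    """
    for line in uid_map.strip().splitlines():
        inside, outside, count = map(int, line.split())
        if inside <= container_uid < inside + count:
            return outside + (container_uid - inside)
    raise ValueError(f"UID {container_uid} is not mapped")

# The mapping described in the table: container UIDs 0..65535 -> host 100000+.
EXAMPLE_MAP = "0 100000 65536"
assert to_host_uid(0, EXAMPLE_MAP) == 100000     # task "root" is unprivileged on the host
assert to_host_uid(1000, EXAMPLE_MAP) == 101000
```

This is why a task running as "root" (UID 0 inside its namespace) owns files as UID 100000 on the worker and has no special privileges there.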

What Tasks CAN Do

What Tasks CANNOT Do

Privileged Containers: The Exception

Concourse supports privileged containers via the privileged: true flag in task configs. This disables most isolation:

task: build-docker-image
privileged: true  # ⚠️ Dangerous!
config:
  platform: linux
  image_resource:
    type: registry-image
    source: {repository: docker}
  run:
    path: docker
    args: [build, -t, myimage, .]

What Privileged Mode Grants

| Access | Security Impact |
|---|---|
| All Linux capabilities | ❌ Container can load kernel modules, change network config |
| Host device access | ❌ Can access /dev/sda (host disks), /dev/mem (physical memory) |
| AppArmor/SELinux bypass | ❌ MAC policies not enforced |
| Cgroup manipulation | ❌ Can escape resource limits |
⚠️ Security Warning: Privileged containers can escape to the host. Only use for tasks that absolutely require it (Docker-in-Docker builds, kernel testing). Never run untrusted code in privileged mode.

When Privileged Mode is Necessary

Containerd Configuration in the Charm

The charm configures containerd with Concourse-specific settings:

/etc/containerd/config.toml

# Snapshotter for image layers
[plugins."io.containerd.grpc.v1.cri".containerd]
  snapshotter = "overlayfs"  # Faster than fuse-overlayfs
  # When compute-runtime=cuda, the charm also sets:
  #   default_runtime_name = "nvidia"

# Sandbox image and CNI networking
[plugins."io.containerd.grpc.v1.cri"]
  sandbox_image = "registry.k8s.io/pause:3.10"
[plugins."io.containerd.grpc.v1.cri".cni]
  bin_dir = "/opt/cni/bin"
  conf_dir = "/etc/cni/net.d"

# GPU runtime definition (when compute-runtime=cuda)
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
  runtime_type = "io.containerd.runc.v2"
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
    BinaryName = "/usr/bin/nvidia-container-runtime"

Key Configuration Choices

Comparison with Other CI Systems

| CI System | Container Runtime | Isolation Model |
|---|---|---|
| Concourse CI | containerd + runc | Full namespace isolation, user namespacing |
| GitLab Runner | Docker (default) or Kubernetes | Docker-in-Docker or Kubernetes pods |
| Jenkins | Docker plugin (optional) | Varies (can run without containers) |
| GitHub Actions | Docker (self-hosted) or VM (cloud) | Full VMs for cloud runners |
| Drone CI | Docker | Docker containers |

Concourse's advantage: By using containerd directly, Concourse avoids Docker's overhead while maintaining strong isolation. This makes workers lighter and more efficient.

LXD Compatibility: Nested Containers

When workers run inside LXD containers (common for Juju localhost deployments), we have nested containerization:

Host (bare metal)
  ↓
LXD Container (Juju unit)
  ↓
containerd (Concourse worker)
  ↓
Task Container (Concourse task)

This works because:

Automatic Configuration: When deploying to LXD (localhost cloud), Juju automatically sets security.nesting=true. No manual LXD configuration needed.
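If you want to confirm from inside a unit that it is running under LXD, one heuristic (an assumption about LXD defaults, not something the charm does) is to look for the LXD guest API socket at its standard path:

```python
import os

def inside_lxd() -> bool:
    """Heuristic: LXD exposes its guest API socket at /dev/lxd/sock.

    This path is the LXD default; it is absent on bare metal and in
    most other container runtimes, so its presence is a strong hint.
    """
    return os.path.exists("/dev/lxd/sock")

if __name__ == "__main__":
    where = "an LXD container" if inside_lxd() else "not an LXD container"
    print(f"This process appears to be running in {where}.")
```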

Related Topics