Concourse CI Machine Charm - Documentation

Understanding Shared Storage

Why shared storage matters, how it works, and the design trade-offs

The Problem: Binary Duplication

In a typical multi-worker Concourse CI deployment, each worker unit independently downloads and stores the Concourse binaries:

For a deployment with 1 web server + 5 workers:

Without shared storage: 6 units × 115 MB = 690 MB total disk usage
With shared storage: 115 MB (roughly 83% disk savings!)

More critically, upgrades become slow and wasteful. Each worker must independently download the new version, multiplying network bandwidth usage by the number of workers.
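The savings are simple arithmetic; a quick sketch using the unit count and binary size from the example above:

```python
# Disk usage with and without shared storage, using the figures above.
UNITS = 6          # 1 web server + 5 workers
BINARY_MB = 115    # approximate size of the Concourse binaries

without_sharing = UNITS * BINARY_MB   # every unit keeps its own copy
with_sharing = BINARY_MB              # one copy on the shared volume
savings = 1 - with_sharing / without_sharing

print(f"{without_sharing} MB vs {with_sharing} MB ({savings:.0%} saved)")
```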

The Solution: LXC Disk Device Passthrough

Shared storage leverages LXC's disk device feature to mount a single host directory into multiple containers simultaneously. This is not NFS, not Ceph, not a distributed filesystem; it's simpler and faster.

How LXC Disk Devices Work

When you run:

lxc config device add juju-abc123-0 concourse-shared disk \
    source=/tmp/concourse-shared \
    path=/var/lib/concourse \
    shift=true

LXC performs these operations:

  1. Bind mount: The host directory /tmp/concourse-shared is bind-mounted into the container at /var/lib/concourse
  2. UID/GID mapping (shift=true): Container UIDs are automatically mapped to avoid permission conflicts
  3. Shared access: Multiple containers can mount the same source directory simultaneously
💡 Key Insight: This is not a network filesystem. All units are on the same physical host, sharing a local directory. Performance is identical to local disk: no network latency, no distributed locking overhead.
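Attaching the same source directory to every container is what makes the volume shared. A sketch of how a charm or setup script might build the `lxc config device add` invocation shown above for each unit (the `juju-abc123-*` container names are illustrative):

```python
import subprocess  # used only if the commands are actually executed

def share_device_cmd(container: str,
                     source: str = "/tmp/concourse-shared",
                     path: str = "/var/lib/concourse") -> list[str]:
    """Build the `lxc config device add` invocation for one container."""
    return ["lxc", "config", "device", "add", container, "concourse-shared",
            "disk", f"source={source}", f"path={path}", "shift=true"]

# Hypothetical Juju machine container names; each one mounts the same
# host directory, so all units see identical contents.
for name in ("juju-abc123-0", "juju-abc123-1", "juju-abc123-2"):
    print(" ".join(share_device_cmd(name)))
    # subprocess.run(share_device_cmd(name), check=True)  # on a real LXD host
```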

Architecture: Leader-Writes, Workers-Read

The charm implements a simple coordination model:

Role        | Responsibilities                                        | Writes To
------------|---------------------------------------------------------|--------------------------------------
Web/Leader  | Downloads binaries, writes keys, manages version marker | /var/lib/concourse/bin/
            |                                                         | /var/lib/concourse/keys/
            |                                                         | /var/lib/concourse/.installed_version
Workers     | Read binaries, create isolated work directories         | /var/lib/concourse/worker/<unit-name>/

Why This Model Works

With exactly one writer (the leader) and any number of readers, there is no write contention on the shared files. Workers only ever write inside their own worker/<unit-name>/ subdirectory, so they never conflict with the leader or with each other.

Data Flow: Initial Deployment

┌─────────────────┐
│ Web/Leader Unit │
└────────┬────────┘
         │
         ├─ 1. Mount shared storage
         ├─ 2. Acquire exclusive lock (.install.lock)
         ├─ 3. Download binaries to bin/
         ├─ 4. Write .installed_version marker
         └─ 5. Start concourse-server.service

┌──────────────┐
│ Worker Units │ (added later)
└──────┬───────┘
       │
       ├─ 1. Mount shared storage (same volume)
       ├─ 2. Check .installed_version (exists!)
       ├─ 3. Verify binaries in bin/
       ├─ 4. Create worker/{unit}/ directory
       └─ 5. Start concourse-worker.service

Key observation: Workers don't download anything. They immediately find ready-to-use binaries and start within seconds.

Upgrade Coordination

Upgrades introduce a challenge: the leader must download new binaries while workers are using the old ones. The charm solves this with a coordinated upgrade protocol via Juju peer relations:

┌─────────────────┐
│ Web/Leader Unit │
└────────┬────────┘
         │
         ├─ 1. Set upgrade-state=prepare in peer relation
         ├─ 2. Wait for worker acknowledgments (2min timeout)
         ├─ 3. Acquire exclusive lock
         ├─ 4. Download new binaries
         ├─ 5. Write new .installed_version
         ├─ 6. Restart concourse-server.service
         └─ 7. Set upgrade-state=complete in peer relation

┌──────────────┐
│ Worker Units │
└──────┬───────┘
       │
       ├─ 1. Detect upgrade-state=prepare
       ├─ 2. Stop concourse-worker.service
       ├─ 3. Set upgrade-ready=true in peer relation
       ├─ 4. Poll for upgrade-state=complete (5min timeout)
       └─ 5. Start concourse-worker.service
⚠️ Important: Workers must stop before the leader replaces binaries. Otherwise, workers would crash mid-execution when their binary is replaced.
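A minimal simulation of this protocol, modeling the peer relation as a plain dictionary. The `Worker` class, method names, and synchronous acknowledgments are illustrative; the real charm reacts to Juju relation events and honors the 2- and 5-minute timeouts rather than calling workers inline:

```python
class Worker:
    """Simulated worker unit reacting to peer-relation data."""
    def __init__(self, name: str):
        self.name = name
        self.running = True

    def on_prepare(self, relation: dict) -> None:
        self.running = False                              # stop the service
        relation[f"{self.name}.upgrade-ready"] = "true"   # acknowledge

    def on_complete(self, relation: dict) -> None:
        if relation.get("upgrade-state") == "complete":
            self.running = True                           # restart the service

def leader_upgrade(relation: dict, workers: list, install) -> None:
    """Leader side: announce, collect acks, swap binaries, announce done."""
    relation["upgrade-state"] = "prepare"                 # step 1
    for w in workers:                                     # step 2 (the real
        w.on_prepare(relation)                            # charm waits 2 min)
    assert all(relation.get(f"{w.name}.upgrade-ready") == "true"
               for w in workers), "a worker failed to acknowledge"
    install()                                             # steps 3-6
    relation["upgrade-state"] = "complete"                # step 7
    for w in workers:
        w.on_complete(relation)
```

The invariant the protocol enforces is visible here: `install()` runs only after every worker has stopped, so no process is executing the binary while it is replaced.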

Trade-offs: When Shared Storage Makes Sense

✅ Advantages

  1. One copy of the binaries on disk instead of one per unit
  2. Upgrades download the new version once, not once per worker
  3. New workers start within seconds; they never download binaries
  4. Local-disk performance: no network latency or distributed locking

❌ Disadvantages

  1. All units must run on the same LXD host
  2. All units must run the same Concourse version
  3. Not available for bare metal or multi-host deployments
  4. The shared directory is a single point of failure for every unit

When to Use Shared Storage

Scenario                                | Recommendation
----------------------------------------|------------------------------------------
Multi-worker LXD deployment (same host) | ✅ Highly Recommended
Frequent Concourse version upgrades     | ✅ Recommended
Bare metal deployment                   | ❌ Not Supported
Multi-host deployment                   | ❌ Not Supported
Mixed Concourse versions needed         | ❌ Use mode=worker without shared storage

Why Not NFS or Ceph?

You might wonder: why not use a "real" distributed filesystem like NFS or Ceph? Several reasons:

  1. Overkill for this use case: Shared storage is only beneficial when all units are on the same host. If units are on different hosts, there's no point; they can't share local disk anyway.
  2. Added complexity: NFS/Ceph require separate infrastructure, configuration, and maintenance.
  3. Performance overhead: Network filesystems introduce latency. For frequently-accessed binaries, this adds up.
  4. Single-host optimization: LXC disk devices are optimized for the single-host case (which is exactly our scenario).
Design Philosophy: The charm's shared storage feature is intentionally simple and targeted. It solves a specific problem (binary duplication on single-host LXD deployments) with the simplest possible mechanism. It doesn't attempt to be a general-purpose distributed storage solution.

Permission Handling: The shift=true Parameter

A subtle but critical detail: LXC containers use UID/GID namespacing. A process running as UID 1000 inside the container is actually UID 101000 on the host (with default LXC mapping).
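The default mapping is a fixed offset. A small sketch of the translation (100000 is LXD's usual default base for unprivileged containers, but the actual range comes from /etc/subuid):

```python
SUBUID_BASE = 100000  # typical default base for unprivileged LXD containers

def container_to_host(uid: int, base: int = SUBUID_BASE) -> int:
    """Translate a UID as seen inside the container to the host UID."""
    return base + uid

def host_to_container(uid: int, base: int = SUBUID_BASE) -> int:
    """Inverse translation; rejects UIDs below the mapped range."""
    if uid < base:
        raise ValueError(f"host UID {uid} is not in the container's map")
    return uid - base

print(container_to_host(1000))   # a container UID 1000 process appears as 101000
```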

[Diagram: LXC disk device with shift=true (UID/GID mapping). On the host, /tmp/concourse-shared/bin/concourse is owned by UID/GID 101000, mapped from container UID 1000. LXC transparently remaps container UID 1000 ↔ host UID 101000, so both containers (concourse-ci/0 and concourse-ci/1) see the file at /var/lib/concourse/ as UID 1000 and can read and write it. ❌ Without shift=true, container UID 1000 ≠ host UID 101000: permission denied.]

Without shift=true, you'd see permission errors:

# Inside container (UID 1000):
$ touch /var/lib/concourse/test.txt

# On host:
$ ls -l /tmp/concourse-shared/test.txt
-rw-r--r-- 1 101000 101000 0 Feb 4 10:00 test.txt
# UID mismatch causes permission errors when web unit tries to read

The shift=true parameter tells LXC to remap UIDs/GIDs automatically, so every container sees consistent ownership on the shared files.

Best Practice: Always use shift=true when setting up shared storage with LXC disk devices. The setup-shared-storage.sh script does this automatically.

Related Topics