Concourse CI Machine Charm - Documentation

Understanding Key Distribution

Why TSA uses SSH keys, how peer and TSA relations work, and the security model

The Challenge: Workers Must Connect to Web

Concourse CI's architecture is fundamentally distributed: workers execute tasks remotely while the web server (ATC) schedules and coordinates. But how do workers authenticate to the web server securely?

Traditional approaches to this problem each have drawbacks, so Concourse chose a simpler, more elegant solution: SSH public key authentication.

Why SSH Keys?

SSH keys provide several advantages for Concourse's worker-to-web connection:

| Advantage | Why It Matters |
| --- | --- |
| Asymmetric cryptography | Public key can be freely distributed; private key never leaves the worker |
| Per-worker identity | Each worker has a unique keypair, enabling granular access control |
| Battle-tested protocol | SSH is mature, well-understood, widely supported |
| No external dependencies | No need for external CA, LDAP, or OAuth providers |
| Secure key exchange | SSH host key verification prevents man-in-the-middle attacks |

TSA: The "SSH Gateway" for Workers

TSA (Transportation Security Administration) is Concourse's SSH server component. It runs on the web server and acts as the authentication gateway for workers:

[Diagram: the worker, holding its private key and a stored copy of the TSA host key, initiates an SSH connection over TCP to the TSA (the SSH gateway on the web node), which holds the TSA host key and the authorized worker keys. The worker authenticates with its public key. The ATC (scheduler) controls the worker by sending commands through the resulting reverse SSH tunnel. Workers initiate the outbound connection, which is firewall-friendly.]

The TSA doesn't just authenticate workers; it also establishes a reverse SSH tunnel that ATC uses to send commands back to workers.

💡 Key Insight: Workers initiate the SSH connection to TSA (outbound from the worker's perspective). The web server never needs to connect to workers; it uses the reverse tunnel. This makes firewall configuration much simpler.

The Three Types of Keys

The charm manages three categories of SSH keys:

| Key Type | Purpose | Owner | Distribution |
| --- | --- | --- | --- |
| TSA Host Key | Identifies the TSA server (prevents MITM attacks) | Web/Leader | Shared with all workers |
| TSA Public Key | Public half of the TSA host key; workers use it to verify the TSA's identity | Web/Leader | Shared with all workers |
| Worker Private/Public Key | Worker authenticates itself to TSA | Each worker | Public key sent to web; private key stays on the worker |

Key Storage Locations

# On web/leader unit:
/var/lib/concourse/keys/
├── tsa_host_key            # TSA's SSH server private key
├── tsa_host_key.pub        # TSA's SSH server public key
├── session_signing_key     # For signing session tokens
└── authorized_worker_keys  # Worker public keys (authorized_keys format)

# On worker unit:
/var/lib/concourse/keys/
├── worker_key            # Worker's SSH private key
├── worker_key.pub        # Worker's SSH public key
└── tsa_host_key.pub      # TSA's public key (for host verification)
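
The threat model later on this page states that private keys are stored with 0600 permissions. As a rough way to check that on a unit, the sketch below lists every file in a key directory that looks like a private key but has a different mode. The function name audit_key_perms is made up for illustration, it assumes GNU stat (as shipped on Ubuntu), and it treats every file that is neither a .pub file nor authorized_worker_keys as a private key.

```shell
# Report private key files whose mode is not 600.
# Assumption: anything that is not *.pub or authorized_worker_keys is private.
audit_key_perms() {
    local dir="$1" f mode status=0
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        case "${f##*/}" in
            *.pub|authorized_worker_keys) continue ;;
        esac
        mode=$(stat -c '%a' "$f")   # GNU stat: octal permission bits
        if [ "$mode" != "600" ]; then
            echo "$f has mode $mode, expected 600"
            status=1
        fi
    done
    return "$status"
}
```

Run as root on a unit, for example audit_key_perms /var/lib/concourse/keys. No output and exit status 0 mean every file treated as a private key has mode 600.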

Distribution Mechanisms: Peer vs TSA Relations

The charm uses two different Juju relations to distribute keys, depending on the deployment mode:

Peer Relation (mode=auto)

In mode=auto deployments, all units belong to the same Juju application. They automatically share data via the peer relation:

[Diagram: mode=auto, peer relation (automatic key distribution). concourse-ci/0 (leader, web server) generates the TSA host key, the TSA public key, and the session signing key, and stores them in the Juju peer relation (encrypted by Juju). concourse-ci/1, concourse-ci/2, and concourse-ci/3 (workers) each read the TSA keys from the peer data.]

How it works:

  1. Leader generates TSA keys on first install
  2. Leader stores keys in peer relation data (encrypted by Juju)
  3. Workers read keys from peer relation automatically
  4. Workers generate their own worker keypair
  5. Workers store their public key in peer relation
  6. Leader collects all worker public keys and writes to authorized_worker_keys

Zero Configuration: In mode=auto, all key distribution is automatic. No manual key copying, no configuration needed. Just deploy and scale.
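
Step 6 of the list above can be pictured as a small merge step. The following is only a sketch, not the charm's actual code: the hypothetical function rebuild_authorized_keys reads worker public keys on stdin (standing in for whatever the leader reads from peer relation data), drops blank lines and duplicates, and replaces the target file with a single rename.

```shell
# Rebuild an authorized_keys-style file from public keys read on stdin,
# one key per line. Illustrative helper; not part of the charm.
rebuild_authorized_keys() {
    local out="$1" tmp
    tmp=$(mktemp "${out}.XXXXXX") || return 1
    # Keep the first occurrence of each non-empty line, preserving order.
    awk 'NF && !seen[$0]++' > "$tmp" || { rm -f "$tmp"; return 1; }
    chmod 600 "$tmp"
    # A rename within one filesystem is atomic, so readers never see a
    # half-written file.
    mv "$tmp" "$out"
}
```

For example, cat collected/*.pub | rebuild_authorized_keys /var/lib/concourse/keys/authorized_worker_keys, where collected/ is a hypothetical directory of gathered worker public keys.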

TSA + Flight Relations (mode=web + mode=worker)

When deploying separate web and worker applications, they use explicit tsa and flight relations:

[Diagram: mode=web + mode=worker, explicit TSA/flight relations. web/0 (web server application) provides the tsa relation, which carries the TSA host key, the TSA public key, and the TSA endpoint (IP:2222). worker/0, worker/1, and worker/2 require the flight relation and send their worker public keys back to the web application. The two applications are connected with juju integrate web:tsa worker:flight.]

How it works:

  1. Web application generates TSA keys on install
  2. When tsa relation is created, web shares keys with worker application
  3. Worker application receives TSA keys via flight relation
  4. Workers generate their own worker keypair
  5. Workers send their public key back via flight relation
  6. Web application collects worker public keys and updates authorized_worker_keys

💡 Benefit of Separate Applications: You can scale workers independently from the web server. Want 10 more workers? Run juju add-unit worker -n 10; keys are exchanged automatically via the existing relation.
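
Put together, a split deployment might look like the sketch below. The application names web and worker, the mode values, and the juju integrate endpoints come from the text above. The charm name is passed in as an argument because it is not stated here, and deploy_split and the JUJU variable are illustrative names only. Setting JUJU=echo prints the commands instead of running them.

```shell
# Command used to talk to Juju; override with JUJU=echo for a dry run.
JUJU="${JUJU:-juju}"

# Deploy separate web and worker applications from the same charm and
# relate them. CHARM is whatever name or path the charm is published under.
deploy_split() {
    local charm="$1" workers="${2:-1}"
    [ -n "$charm" ] || { echo "usage: deploy_split CHARM [WORKER_COUNT]" >&2; return 2; }
    "$JUJU" deploy "$charm" web --config mode=web &&
    "$JUJU" deploy "$charm" worker --config mode=worker -n "$workers" &&
    "$JUJU" integrate web:tsa worker:flight
}
```

A dry run such as JUJU=echo deploy_split my-charm 3 (my-charm is a placeholder) shows the three commands in order without touching a model.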

Security Model

Threat Model: What We're Protecting Against

| Threat | Protection Mechanism |
| --- | --- |
| Rogue worker registration | Worker public key must be in authorized_worker_keys |
| Man-in-the-middle attack | Workers verify TSA host key (prevents impersonation) |
| Key theft from worker | Private keys stored with 0600 permissions, owned by root |
| Relation data interception | Juju encrypts relation data in transit and at rest |
| Unauthorized task execution | ATC controls task assignment; workers cannot self-assign |

Why Workers Don't Need Mutual TLS

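You might wonder: why not use mutual TLS instead of SSH keys? The advantages listed under "Why SSH Keys?" are the main reasons. In particular, mutual TLS typically requires a certificate authority, while SSH keys need no external dependencies.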

Attack Scenario: Compromised Worker

What happens if a worker's private key is compromised?

  1. Attacker gains worker access: Can connect to TSA as that worker
  2. Limited blast radius: Attacker can only execute tasks assigned to that worker (cannot access other workers' tasks)
  3. No ATC access: Worker keys don't grant access to the web UI or API
  4. Revocation: Remove worker's public key from authorized_worker_keys to block access

⚠️ Security Best Practice: Treat worker private keys as sensitive credentials. They're stored as files on disk, not in Juju secrets, so ensure workers run in isolated environments (VMs/containers with restricted access).
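
The revocation step amounts to deleting one line from authorized_worker_keys. Here is a minimal sketch, assuming each line has the plain "type blob comment" layout with no options prefix. revoke_worker_key is a hypothetical helper, not part of the charm or of Concourse.

```shell
# Remove the key found in PUB_FILE from AUTH_FILE.
# Matching uses the base64 key blob (second field), so a different
# trailing comment on either side does not prevent a match.
revoke_worker_key() {
    local auth_file="$1" pub_file="$2" blob tmp
    blob=$(awk 'NR == 1 { print $2 }' "$pub_file")
    [ -n "$blob" ] || { echo "no key found in $pub_file" >&2; return 1; }
    tmp=$(mktemp "${auth_file}.XXXXXX") || return 1
    # Copy every line whose key blob differs from the revoked one.
    awk -v blob="$blob" '$2 != blob' "$auth_file" > "$tmp"
    chmod 600 "$tmp"
    mv "$tmp" "$auth_file"
}
```

The text above does not say whether the charm later rewrites this file from relation data, or whether the web service must be restarted to pick up the change, so verify both before relying on this.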

Key Rotation

The charm does not currently support automatic key rotation. Keys are generated once on initial deployment and reused indefinitely. To rotate keys:

Rotating TSA Keys (Requires Downtime)

  1. Stop all workers: juju exec --unit concourse-ci/1,concourse-ci/2,concourse-ci/3 -- systemctl stop concourse-worker
  2. SSH to leader: juju ssh concourse-ci/0
  3. Delete old keys: sudo rm /var/lib/concourse/keys/tsa_*
  4. Regenerate: sudo ssh-keygen -t rsa -b 4096 -f /var/lib/concourse/keys/tsa_host_key -N ''
  5. Update peer relation data with new keys (manual Juju operation)
  6. Restart web: sudo systemctl restart concourse-server
  7. Restart workers: juju exec --unit concourse-ci/1,concourse-ci/2,concourse-ci/3 -- systemctl start concourse-worker

Rotating Worker Keys (Per-Worker)

  1. Stop worker: juju ssh concourse-ci/1 -- sudo systemctl stop concourse-worker
  2. Delete old key: sudo rm /var/lib/concourse/keys/worker_key*
  3. Regenerate: sudo ssh-keygen -t rsa -b 4096 -f /var/lib/concourse/keys/worker_key -N ''
  4. Update peer relation with new public key
  5. Restart worker: sudo systemctl start concourse-worker

Future Enhancement: Automated key rotation with zero downtime is planned for a future version. It would involve generating new keys in parallel, updating TSA to accept both old and new keys temporarily, then phasing out old keys.

Debugging Key Issues

Worker Can't Connect to TSA

Symptom: Worker logs show "Permission denied (publickey)"

Check:

# On web/leader, verify worker's public key is authorized:
juju ssh concourse-ci/0
sudo cat /var/lib/concourse/keys/authorized_worker_keys
# Should contain: ssh-rsa AAAA...worker/1's key...

# On worker, verify it has TSA host key:
juju ssh concourse-ci/1
sudo cat /var/lib/concourse/keys/tsa_host_key.pub
# Should match web's tsa_host_key.pub
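
Comparing long key lines by eye is error-prone. The first check above can be scripted roughly as follows, assuming both files have been copied to one machine. worker_key_authorized is an illustrative name, and the sketch assumes authorized_worker_keys lines carry no options prefix.

```shell
# Succeed if the public key in PUB_FILE is listed in AUTH_FILE.
# Only the base64 key blob (second field) is compared; carriage returns
# that a terminal session may have added are stripped first.
worker_key_authorized() {
    local auth_file="$1" pub_file="$2" blob
    blob=$(tr -d '\r' < "$pub_file" | awk 'NR == 1 { print $2 }')
    [ -n "$blob" ] || return 2
    tr -d '\r' < "$auth_file" |
        awk -v blob="$blob" '$2 == blob { found = 1 } END { exit !found }'
}
```

For example, worker_key_authorized authorized_worker_keys worker_key.pub && echo authorized. A non-zero exit status means the worker's key is missing from the file, which matches the "Permission denied (publickey)" symptom.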

Worker Shows "Host Key Verification Failed"

Cause: Worker's tsa_host_key.pub doesn't match web's actual host key

Fix:

# Compare keys:
juju ssh concourse-ci/0 -- sudo cat /var/lib/concourse/keys/tsa_host_key.pub
juju ssh concourse-ci/1 -- sudo cat /var/lib/concourse/keys/tsa_host_key.pub
# They must match exactly
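
The comparison can also be automated. This is a small sketch with hypothetical helper names (key_identity, same_public_key) that compares only the key type and key blob, so a differing comment field or stray carriage return does not cause a false mismatch.

```shell
# Print "type blob" for the first key in a .pub file, dropping the comment
# and any carriage returns added by a terminal session.
key_identity() {
    tr -d '\r' < "$1" | awk 'NR == 1 && NF >= 2 { print $1, $2 }'
}

# Succeed only if both files hold the same non-empty public key.
same_public_key() {
    local a b
    a=$(key_identity "$1")
    b=$(key_identity "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage, with the two commands above redirected to local files:
#   juju ssh concourse-ci/0 -- sudo cat /var/lib/concourse/keys/tsa_host_key.pub > web.pub
#   juju ssh concourse-ci/1 -- sudo cat /var/lib/concourse/keys/tsa_host_key.pub > worker.pub
#   same_public_key web.pub worker.pub && echo match || echo MISMATCH
```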

Related Topics