Concourse CI Machine Charm

Monitor with Prometheus

Expose the Concourse CI Prometheus metrics endpoint, scrape it with Prometheus, and visualize and alert on the results.

Enable Metrics

Metrics are enabled by default on port 9391:

# Verify metrics are enabled
juju config concourse-ci enable-metrics
# Should show: true

# To disable (not recommended)
juju config concourse-ci enable-metrics=false

Access Metrics Endpoint

# Get web server IP
juju status concourse-ci

# Fetch metrics
curl http://<web-ip>:9391/metrics
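The endpoint returns every metric in the Prometheus text format, including `# HELP` and `# TYPE` comment lines, so the raw output can be long. A minimal sketch of a filter that keeps only the sample lines for one metric prefix is shown here; `filter_metrics` is a hypothetical helper name, not part of the charm.

```shell
# Hypothetical helper: keep only sample lines whose metric name starts
# with the given prefix. Reads Prometheus text format on stdin.
filter_metrics() {
  prefix="$1"
  # Drop "# HELP" / "# TYPE" comment lines, then match the prefix.
  # "|| true" keeps the exit status zero when nothing matches.
  grep -v '^#' | grep "^${prefix}" || true
}

# Usage (replace <web-ip> with the address shown by juju status):
#   curl -s http://<web-ip>:9391/metrics | filter_metrics concourse_workers
```

An empty result most likely means the prefix does not match any exported metric name; run the unfiltered curl command to see which names are actually exposed.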

Integrate with Prometheus Charm

# Deploy Prometheus (machine charm)
juju deploy prometheus-machine prometheus --channel edge

# Create relation
juju integrate concourse-ci:monitoring prometheus:target

# Verify relation
juju status --relations

Result: Prometheus automatically scrapes metrics from the Concourse CI web server.

Note: This guide uses the prometheus-machine and grafana-machine charms. Both are machine charms and are compatible with this Concourse CI machine charm deployment.

Key Metrics to Monitor

Worker Metrics

Build Metrics

Database Metrics

Sample Prometheus Queries

# Number of running workers
concourse_workers_running

# Build success rate (last hour)
# sum() aggregates away the status label so both sides of the division match
sum(rate(concourse_builds_finished_total{status="succeeded"}[1h]))
/ sum(rate(concourse_builds_finished_total[1h]))

# Average build duration
rate(concourse_builds_duration_seconds_sum[5m])
/ rate(concourse_builds_duration_seconds_count[5m])

# Worker utilization (average containers per running worker)
sum(concourse_containers) / sum(concourse_workers_running)
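The average build duration query above divides the growth of the `_sum` series by the growth of the `_count` series. The same arithmetic can be checked by hand against the raw metrics endpoint. The sketch below computes the average over the whole lifetime of the process rather than a 5-minute window. It assumes the metric names used in this guide and that each sample line ends with its value (no trailing timestamp); `avg_build_duration` is a hypothetical helper name.

```shell
# Hypothetical helper: lifetime average build duration in seconds,
# computed from raw Prometheus text format on stdin.
avg_build_duration() {
  awk '
    # Add up the value (last field) of every matching series.
    /^concourse_builds_duration_seconds_sum/   { sum   += $NF }
    /^concourse_builds_duration_seconds_count/ { count += $NF }
    END {
      if (count > 0) printf "%.2f\n", sum / count
      else print "no builds recorded"
    }
  '
}

# Usage (replace <web-ip> with the address shown by juju status):
#   curl -s http://<web-ip>:9391/metrics | avg_build_duration
```

For example, a `_sum` of 1200 seconds across a `_count` of 10 builds gives an average of 120 seconds per build.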

Set Up Grafana Dashboard

# Deploy Grafana (machine charm)
juju deploy grafana-machine grafana --channel edge

# Integrate with Prometheus
juju integrate grafana:grafana-source prometheus:grafana-source

# Get the Grafana admin password (the Grafana IP is shown by: juju status grafana)
juju run grafana/leader get-admin-password

Import a Concourse CI dashboard from Grafana.com.

Alert Rules Example

groups:
  - name: concourse
    interval: 30s
    rules:
      - alert: ConcourseNoWorkersRunning
        expr: concourse_workers_running == 0
        for: 5m
        annotations:
          summary: "No Concourse workers running"
      
      - alert: ConcourseStalledWorkers
        expr: concourse_workers_stalled > 0
        for: 10m
        annotations:
          summary: "{{ $value }} stalled workers detected"
      
      - alert: ConcourseHighBuildDuration
        expr: rate(concourse_builds_duration_seconds_sum[5m]) / rate(concourse_builds_duration_seconds_count[5m]) > 300
        for: 15m
        annotations:
          summary: "Average build duration exceeds 5 minutes"

Verify Metrics Are Working

# Check Prometheus targets
# Open Prometheus UI: http://<prometheus-ip>:9090/targets
# Look for concourse-ci endpoint with status "UP"

# Run a test query in the Prometheus UI
concourse_workers_running

✅ Success: The query returns the current number of workers, which should match the output of fly workers.
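To compare the query result with fly, the running workers in the fly output can be counted rather than read by eye. This is a rough sketch that assumes `fly workers` prints one worker per line with a state column containing the word `running`; `count_running_workers` is a hypothetical helper name and `<target>` stands for your own fly target.

```shell
# Hypothetical helper: count lines of "fly workers" output (on stdin)
# whose state column reads "running".
count_running_workers() {
  awk '
    {
      # Look for "running" as a whole whitespace-separated field.
      for (i = 1; i <= NF; i++) {
        if ($i == "running") { n++; break }
      }
    }
    END { print n + 0 }
  '
}

# Usage:
#   fly -t <target> workers | count_running_workers
```

If the number differs from the Prometheus query result, check for workers in other states, such as stalled, before assuming the metrics are wrong.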

Related Documentation