Health & Telemetry

Endpoints for monitoring system status, integration health, and runtime performance.

Quick Reference

| Endpoint | Auth required | Purpose |
| --- | --- | --- |
| `GET /api/v1/ping` | No | Connectivity check |
| `GET /api/v1/health` | No | Basic health for load balancers |
| `GET /api/v1/health/system` | Yes | Detailed system status |
| `GET /api/v1/telemetry` | Yes | Runtime performance metrics |

Ping

Simple connectivity check. No authentication required.

GET /api/v1/ping

Response

{
  "message": "pong",
  "timestamp": "2024-01-15T10:30:00Z"
}

Basic Health Check

Minimal status information for load balancers and uptime monitors. No authentication required.

GET /api/v1/health

Response

{
  "status": "healthy",
  "version": "1.0.0",
  "environment": "production",
  "timestamp": "2024-01-15T10:30:00Z"
}

Status Values

| Status | HTTP Code | Meaning |
| --- | --- | --- |
| `healthy` | 200 | All systems operational |
| `degraded` | 200 | Operational with issues |
| `unhealthy` | 503 | Service unavailable |

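Configure load balancer health checks against `/api/v1/health` and expect a 200 status. Any 5xx response means the instance should be removed from the pool. Because `degraded` also returns 200, a degraded instance stays in the pool; inspect the `status` field in the response body if degraded instances need different handling.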
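The pool decision described above can be sketched in Python. This is a minimal illustration, not part of the API: the function names `keep_in_pool` and `check_instance` and the `base_url` parameter are hypothetical, and the 5-second default timeout is borrowed from the best-practice guidance on this page.

```python
import urllib.error
import urllib.request


def keep_in_pool(http_status: int) -> bool:
    """Return True if the instance should keep receiving traffic."""
    # Per the status table: healthy and degraded answer 200, unhealthy answers 503.
    return http_status == 200


def check_instance(base_url: str, timeout: float = 5.0) -> bool:
    """Probe GET /api/v1/health and apply keep_in_pool to the result."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/v1/health", timeout=timeout) as resp:
            return keep_in_pool(resp.status)
    except urllib.error.HTTPError as exc:
        # urllib raises on 4xx/5xx responses; the status code is still available.
        return keep_in_pool(exc.code)
    except OSError:
        # Connection refused, DNS failure, or timeout: treat the instance as out of pool.
        return False
```

Treating a network failure the same way as a 503 is a design choice in this sketch; a real load balancer would usually require several consecutive failures before removing an instance.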

System Health (Detailed)

Full system status including integration health, component metrics, and uptime. Requires authentication.

GET /api/v1/health/system

Response

{
  "status": "healthy",
  "version": "1.0.0",
  "environment": "production",
  "uptime_seconds": 864000.5,
  "system_health": {
    "status": "healthy",
    "initialized": true,
    "started": true,
    "timestamp": "2024-01-15T10:30:00Z",
    "total_apps": 5,
    "active_apps": 5,
    "manager_metrics": {
      "active_instances": 3,
      "total_instances": 5,
      "total_organizations": 2
    },
    "registry_metrics": {
      "total_factories": 12,
      "total_created": 2871,
      "total_errors": 3,
      "cache_hit_rate": 0.94
    },
    "scheduler_metrics": {
      "active_workers": 4,
      "total_executions": 1543,
      "successful_runs": 1540,
      "failed_runs": 3,
      "success_rate": 0.998
    },
    "staging_metrics": {
      "total_processed": 50000,
      "total_failed": 12,
      "total_cleaned_up": 49988
    }
  }
}

System Health Fields

| Field | Description |
| --- | --- |
| `initialized` | Whether all components started successfully |
| `started` | Whether the system is actively processing |
| `total_apps` / `active_apps` | Configured vs. active integrations |
| `manager_metrics` | Instance and organization counts |
| `registry_metrics` | Factory/handler performance and cache stats |
| `scheduler_metrics` | Background job execution stats |
| `staging_metrics` | Data pipeline processing stats |
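As a rough illustration of how these fields might feed an alerting rule, the sketch below inspects a parsed `/api/v1/health/system` response. The function name `find_alerts` and the `min_success_rate` threshold of 0.99 are assumptions made for the example; the API does not define alert thresholds.

```python
def find_alerts(payload: dict, min_success_rate: float = 0.99) -> list[str]:
    """Return human-readable alerts for a parsed /api/v1/health/system response."""
    health = payload.get("system_health", {})
    alerts = []

    if payload.get("status") != "healthy":
        alerts.append(f"overall status is {payload.get('status')}")
    if not health.get("initialized", False):
        alerts.append("system not initialized")
    if not health.get("started", False):
        alerts.append("system not started")

    # Configured vs. active integrations.
    inactive = health.get("total_apps", 0) - health.get("active_apps", 0)
    if inactive > 0:
        alerts.append(f"{inactive} integration(s) inactive")

    # Hypothetical threshold: flag the scheduler when too many background jobs fail.
    rate = health.get("scheduler_metrics", {}).get("success_rate")
    if rate is not None and rate < min_success_rate:
        alerts.append(f"scheduler success rate {rate:.3f} below {min_success_rate}")

    return alerts
```

Applied to the sample response above, this returns an empty list: the status is `healthy`, all five apps are active, and the scheduler success rate of 0.998 is above the assumed threshold.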

Kubernetes Probes

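# Liveness: the kubelet restarts the container after 3 consecutive failures.
livenessProbe:
  httpGet:
    path: /api/v1/health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 30
  failureThreshold: 3
# Readiness: the pod receives traffic only while this probe succeeds.
readinessProbe:
  httpGet:
    path: /api/v1/ping
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10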

Best Practices

Use `/api/v1/health` for load balancers: it is fast and unauthenticated.
Use `/api/v1/health/system` for alerting: it gives a detailed component breakdown.
Use `/api/v1/telemetry` for dashboards: it provides rich performance data.
Poll at 30-second intervals; more frequent polling adds unnecessary load.
Set 5-second timeouts for health checks and 10-second timeouts for telemetry.
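The interval and timeout guidance above could be wired into a poller along these lines. This is a sketch under stated assumptions: `poll`, `TIMEOUTS`, and the injected `fetch` callable are hypothetical names, and `/api/v1/health/system` is assumed to count as a health check for timeout purposes.

```python
import time
from typing import Any, Callable

# Values taken from the best-practice list above.
POLL_INTERVAL_SECONDS = 30.0
TIMEOUTS = {
    "/api/v1/health": 5.0,
    "/api/v1/health/system": 5.0,  # assumption: treated as a health check
    "/api/v1/telemetry": 10.0,
}


def poll(
    path: str,
    fetch: Callable[[str, float], Any],
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Call fetch(path, timeout) `rounds` times, waiting 30 seconds between calls."""
    timeout = TIMEOUTS[path]
    results = []
    for n in range(rounds):
        if n > 0:
            sleep(POLL_INTERVAL_SECONDS)  # no wait before the first request
        results.append(fetch(path, timeout))
    return results
```

Passing `fetch` and `sleep` in as parameters keeps the loop independent of any particular HTTP client and lets it be exercised without real network calls or delays.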

Error Responses

401 Unauthorized

{
  "code": 401,
  "error": "Unauthorized",
  "details": "Valid API key required for this endpoint"
}

503 Service Unavailable

{
  "code": 503,
  "error": "Service unavailable",
  "details": { "status": "unhealthy", "reason": "Database connection lost" }
}
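In the two samples above, `details` is a string in the 401 body and an object in the 503 body, so client code may need to handle both shapes. A minimal sketch, assuming only the fields shown in those samples (the helper name `describe_error` is hypothetical):

```python
import json


def describe_error(body: str) -> str:
    """Turn an error response body into a one-line summary."""
    err = json.loads(body)
    details = err.get("details")

    if isinstance(details, dict):
        # Object form, as in the 503 sample: {"status": ..., "reason": ...}
        reason = details.get("reason", "unknown reason")
        status = details.get("status")
        detail_text = f"{reason} (status: {status})" if status else reason
    else:
        # String form, as in the 401 sample, or missing entirely.
        detail_text = str(details) if details is not None else ""

    summary = f"{err.get('code')} {err.get('error')}"
    return f"{summary}: {detail_text}" if detail_text else summary
```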

Telemetry

Detailed runtime performance metrics.

Integrations

Platform sync status.
