Health & Telemetry
System health checks, uptime monitoring, and runtime metrics
Endpoints for monitoring system status, integration health, and runtime performance.
Quick Reference
| Endpoint | Auth | Purpose |
|---|---|---|
| GET /api/v1/ping | No | Connectivity check |
| GET /api/v1/health | No | Basic health for load balancers |
| GET /api/v1/health/system | Yes | Detailed system status |
| GET /api/v1/telemetry | Yes | Runtime performance metrics |
Ping
Simple connectivity check. No authentication required.
GET /api/v1/ping

Response

```json
{
  "message": "pong",
  "timestamp": "2024-01-15T10:30:00Z"
}
```

Basic Health Check
Minimal status information for load balancers and uptime monitors. No authentication required.
GET /api/v1/health

Response

```json
{
  "status": "healthy",
  "version": "1.0.0",
  "environment": "production",
  "timestamp": "2024-01-15T10:30:00Z"
}
```

Status Values
| Status | HTTP Code | Meaning |
|---|---|---|
| healthy | 200 | All systems operational |
| degraded | 200 | Operational with issues |
| unhealthy | 503 | Service unavailable |
Configure load balancer health checks against /api/v1/health and expect a 200 status. Because a degraded instance also returns 200, it stays in the pool; any 5xx response means the instance should be removed from the pool.
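The load-balancer rule above comes down to a check on the HTTP status code, with the body useful only for logging. The following is a minimal Python sketch, assuming the response body has already been decoded into a dict. The function names are hypothetical. Treating non-200 codes that are not 5xx as failures is an assumption; the docs only specify the 5xx case.

```python
from typing import Any


def in_pool(http_status: int) -> bool:
    """Keep the instance only on 200 (healthy and degraded both return 200).

    The docs say any 5xx removes the instance; treating other non-200
    codes as failures as well is an assumption of this sketch.
    """
    return http_status == 200


def reported_status(body: dict[str, Any]) -> str:
    """Pull the status string out of a /api/v1/health response body.

    A 200 body has "status" at the top level; the documented 503 error
    body nests it under "details". Anything else yields "unknown".
    """
    if "status" in body:
        return str(body["status"])
    details = body.get("details")
    if isinstance(details, dict) and "status" in details:
        return str(details["status"])
    return "unknown"


degraded = {"status": "degraded", "version": "1.0.0"}
down = {"code": 503, "error": "Service unavailable",
        "details": {"status": "unhealthy", "reason": "Database connection lost"}}
print(in_pool(200), reported_status(degraded))  # True degraded
print(in_pool(503), reported_status(down))      # False unhealthy
```

The pool decision deliberately ignores the body. A degraded instance is still serving traffic, so it should trigger an alert rather than a removal.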
System Health (Detailed)
Full system status including integration health, component metrics, and uptime. Requires authentication.
GET /api/v1/health/system

Response

```json
{
  "status": "healthy",
  "version": "1.0.0",
  "environment": "production",
  "uptime_seconds": 864000.5,
  "system_health": {
    "status": "healthy",
    "initialized": true,
    "started": true,
    "timestamp": "2024-01-15T10:30:00Z",
    "total_apps": 5,
    "active_apps": 5,
    "manager_metrics": {
      "active_instances": 3,
      "total_instances": 5,
      "total_organizations": 2
    },
    "registry_metrics": {
      "total_factories": 12,
      "total_created": 2871,
      "total_errors": 3,
      "cache_hit_rate": 0.94
    },
    "scheduler_metrics": {
      "active_workers": 4,
      "total_executions": 1543,
      "successful_runs": 1540,
      "failed_runs": 3,
      "success_rate": 0.998
    },
    "staging_metrics": {
      "total_processed": 50000,
      "total_failed": 12,
      "total_cleaned_up": 49988
    }
  }
}
```

System Health Fields
| Field | Description |
|---|---|
| initialized | Whether all components started successfully |
| started | Whether the system is actively processing |
| total_apps / active_apps | Configured vs. active integrations |
| manager_metrics | Instance and organization counts |
| registry_metrics | Factory/handler performance and cache stats |
| scheduler_metrics | Background job execution stats |
| staging_metrics | Data pipeline processing stats |
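To turn these fields into alerts, one approach is to reduce the payload to a list of human-readable reasons and alert when the list is non-empty. The following is a minimal Python sketch. The function name and both thresholds (`min_success_rate`, `max_staging_failure_ratio`) are hypothetical illustrative defaults, not documented values.

```python
from typing import Any


def alert_reasons(payload: dict[str, Any],
                  min_success_rate: float = 0.99,
                  max_staging_failure_ratio: float = 0.01) -> list[str]:
    """List alert-worthy conditions in a /api/v1/health/system response.

    Both thresholds are illustrative defaults, not documented values.
    An empty list means nothing to alert on.
    """
    reasons: list[str] = []
    health = payload.get("system_health", {})

    if payload.get("status") != "healthy":
        reasons.append(f"overall status is {payload.get('status')!r}")
    if not health.get("initialized", False):
        reasons.append("not all components initialized")
    if not health.get("started", False):
        reasons.append("system is not processing")

    # Configured vs. active integrations.
    total, active = health.get("total_apps", 0), health.get("active_apps", 0)
    if active < total:
        reasons.append(f"{total - active} of {total} integrations inactive")

    # Background job health from scheduler_metrics.
    rate = health.get("scheduler_metrics", {}).get("success_rate")
    if rate is not None and rate < min_success_rate:
        reasons.append(f"scheduler success rate {rate:.3f} below {min_success_rate}")

    # Pipeline failures as a fraction of everything processed.
    staging = health.get("staging_metrics", {})
    processed = staging.get("total_processed", 0)
    if processed:
        ratio = staging.get("total_failed", 0) / processed
        if ratio > max_staging_failure_ratio:
            reasons.append(f"staging failure ratio {ratio:.4f} above {max_staging_failure_ratio}")

    return reasons
```

With the example response above, every check passes. The scheduler success rate is 0.998 (1540 of 1543 runs), and the staging failure ratio is 12 / 50000 = 0.00024, so `alert_reasons` returns an empty list.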
Kubernetes Probes
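Point the readiness probe at /api/v1/health so that a 503 takes the pod out of the Service's endpoints without restarting it. Point the liveness probe at /api/v1/ping so that a dependency outage (for example, a lost database connection) does not trigger restart loops.

```yaml
livenessProbe:
  httpGet:
    path: /api/v1/ping
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 30
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /api/v1/health
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
```

Best Practices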
- Use `/health` for load balancers: fast and unauthenticated
- Use `/health/system` for alerting: detailed component breakdown
- Use `/telemetry` for dashboards: rich performance data
- Poll at 30-second intervals; more frequent polling adds unnecessary load
- Set 5-second timeouts for health checks, 10 seconds for telemetry
Error Responses
401 Unauthorized
```json
{
  "code": 401,
  "error": "Unauthorized",
  "details": "Valid API key required for this endpoint"
}
```

503 Service Unavailable

```json
{
  "code": 503,
  "error": "Service unavailable",
  "details": { "status": "unhealthy", "reason": "Database connection lost" }
}
```
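The `details` field has a different shape in the two examples: a string for 401 and an object for 503. Client code should handle both. The following is a minimal Python sketch that condenses either body into a one-line message. The function name is hypothetical.

```python
import json
from typing import Any


def error_summary(body_text: str) -> str:
    """Condense an error body into one line.

    `details` is a plain string in the 401 example and an object in the
    503 example, so both shapes are handled.
    """
    body: dict[str, Any] = json.loads(body_text)
    head = f"{body.get('code')} {body.get('error', 'Unknown error')}"
    details = body.get("details")
    if isinstance(details, dict):
        # Prefer the human-readable reason, fall back to the status value.
        extra = details.get("reason") or details.get("status")
    else:
        extra = details
    return f"{head}: {extra}" if extra else head
```

For the 503 example above, this yields `503 Service unavailable: Database connection lost`.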