Linkit

Health & Telemetry

System health checks, uptime monitoring, and runtime metrics

Endpoints for monitoring system status, integration health, and runtime performance.


Quick Reference

| Endpoint | Auth | Purpose |
| --- | --- | --- |
| GET /api/v1/ping | No | Connectivity check |
| GET /api/v1/health | No | Basic health for load balancers |
| GET /api/v1/health/system | Yes | Detailed system status |
| GET /api/v1/telemetry | Yes | Runtime performance metrics |

Ping

Simple connectivity check. No authentication required.

GET /api/v1/ping

Response

{
  "message": "pong",
  "timestamp": "2024-01-15T10:30:00Z"
}

Basic Health Check

Minimal status information for load balancers and uptime monitors. No authentication required.

GET /api/v1/health

Response

{
  "status": "healthy",
  "version": "1.0.0",
  "environment": "production",
  "timestamp": "2024-01-15T10:30:00Z"
}

Status Values

| Status | HTTP Code | Meaning |
| --- | --- | --- |
| healthy | 200 | All systems operational |
| degraded | 200 | Operational with issues |
| unhealthy | 503 | Service unavailable |

Point load balancer health checks at /api/v1/health and treat a 200 status as passing. A degraded instance still returns 200, so it stays in the pool. Any 5xx response means the instance should be removed from the pool.
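The status table and pool rule above can be sketched as a small decision helper. This is a minimal illustration, not part of the API: the function name `evaluate_health` and the shape of its return value are hypothetical, and the choice to flag degraded instances with a warning is an assumption about how a monitor might use the body.

```python
import json


def evaluate_health(status_code: int, body: str) -> dict:
    """Turn a /api/v1/health response into a pool decision (illustrative helper)."""
    # The pool decision depends only on the HTTP code: 200 passes, anything else fails.
    in_pool = status_code == 200
    try:
        status = json.loads(body).get("status", "unknown")
    except (ValueError, AttributeError, TypeError):
        # Body was missing, not JSON, or not a JSON object.
        status = "unknown"
    # "degraded" still returns 200, so surface it separately for alerting.
    return {"in_pool": in_pool, "status": status, "warn": in_pool and status == "degraded"}


if __name__ == "__main__":
    print(evaluate_health(200, '{"status": "degraded"}'))
    print(evaluate_health(503, '{"status": "unhealthy"}'))
```

Keeping the pool decision on the status code alone means the check still works if the body cannot be parsed.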


System Health (Detailed)

Full system status including integration health, component metrics, and uptime. Requires authentication.

GET /api/v1/health/system

Response

{
  "status": "healthy",
  "version": "1.0.0",
  "environment": "production",
  "uptime_seconds": 864000.5,
  "system_health": {
    "status": "healthy",
    "initialized": true,
    "started": true,
    "timestamp": "2024-01-15T10:30:00Z",
    "total_apps": 5,
    "active_apps": 5,
    "manager_metrics": {
      "active_instances": 3,
      "total_instances": 5,
      "total_organizations": 2
    },
    "registry_metrics": {
      "total_factories": 12,
      "total_created": 2871,
      "total_errors": 3,
      "cache_hit_rate": 0.94
    },
    "scheduler_metrics": {
      "active_workers": 4,
      "total_executions": 1543,
      "successful_runs": 1540,
      "failed_runs": 3,
      "success_rate": 0.998
    },
    "staging_metrics": {
      "total_processed": 50000,
      "total_failed": 12,
      "total_cleaned_up": 49988
    }
  }
}

System Health Fields

| Field | Description |
| --- | --- |
| initialized | Whether all components started successfully |
| started | Whether the system is actively processing |
| total_apps / active_apps | Configured vs. active integrations |
| manager_metrics | Instance and organization counts |
| registry_metrics | Factory/handler performance and cache stats |
| scheduler_metrics | Background job execution stats |
| staging_metrics | Data pipeline processing stats |
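For alerting, the fields above can be reduced to a few derived numbers. The sketch below works on an already-parsed response body. The helper name `summarize_system_health` and the derived keys (`ready`, `inactive_apps`, and the two failure rates) are hypothetical and not part of the API. The sample values are copied from the response shown earlier.

```python
def summarize_system_health(payload: dict) -> dict:
    """Derive alert-friendly values from a /api/v1/health/system body (illustrative)."""
    health = payload.get("system_health", {})
    scheduler = health.get("scheduler_metrics", {})
    staging = health.get("staging_metrics", {})

    def ratio(numerator: float, denominator: float) -> float:
        # Avoid dividing by zero on a freshly started system.
        return numerator / denominator if denominator else 0.0

    return {
        "ready": bool(health.get("initialized") and health.get("started")),
        "inactive_apps": health.get("total_apps", 0) - health.get("active_apps", 0),
        "scheduler_failure_rate": ratio(
            scheduler.get("failed_runs", 0), scheduler.get("total_executions", 0)
        ),
        "staging_failure_rate": ratio(
            staging.get("total_failed", 0), staging.get("total_processed", 0)
        ),
    }


# Trimmed copy of the sample response from this page.
sample = {
    "system_health": {
        "initialized": True,
        "started": True,
        "total_apps": 5,
        "active_apps": 5,
        "scheduler_metrics": {"total_executions": 1543, "failed_runs": 3},
        "staging_metrics": {"total_processed": 50000, "total_failed": 12},
    }
}

if __name__ == "__main__":
    print(summarize_system_health(sample))
```

Thresholds for these values are left to the caller, since this page does not define any.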

Kubernetes Probes

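# Liveness: the container is restarted after failureThreshold consecutive failures.
livenessProbe:
  httpGet:
    path: /api/v1/health
    port: 8080
  initialDelaySeconds: 10   # wait 10 s after start before the first check
  periodSeconds: 30         # check every 30 s
  failureThreshold: 3
# Readiness: the pod receives traffic only while this check succeeds.
readinessProbe:
  httpGet:
    path: /api/v1/ping
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10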

Best Practices

  • Use /api/v1/health for load balancers: it is fast and unauthenticated
  • Use /api/v1/health/system for alerting: it returns a detailed component breakdown
  • Use /api/v1/telemetry for dashboards: it returns rich performance data
  • Poll at 30-second intervals; more frequent polling adds unnecessary load
  • Set 5-second timeouts for health checks, 10 seconds for telemetry

Error Responses

401 Unauthorized

{
  "code": 401,
  "error": "Unauthorized",
  "details": "Valid API key required for this endpoint"
}

503 Service Unavailable

{
  "code": 503,
  "error": "Service unavailable",
  "details": { "status": "unhealthy", "reason": "Database connection lost" }
}
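In the two samples above, `details` is a string for the 401 response and an object for the 503 response, so a client may want to handle both shapes. The helper below is a minimal sketch. The name `describe_error` and the output format are hypothetical, and it assumes only the fields shown in those samples.

```python
def describe_error(body: dict) -> str:
    """Build a one-line message from an error body (illustrative helper)."""
    code = body.get("code")
    error = body.get("error", "Unknown error")
    details = body.get("details")
    if isinstance(details, dict):
        # Object form, as in the 503 sample: prefer the reason, fall back to the status.
        detail_text = details.get("reason") or details.get("status") or ""
    else:
        # String form, as in the 401 sample, or missing entirely.
        detail_text = details or ""
    return f"{code} {error}: {detail_text}" if detail_text else f"{code} {error}"


if __name__ == "__main__":
    print(describe_error({
        "code": 503,
        "error": "Service unavailable",
        "details": {"status": "unhealthy", "reason": "Database connection lost"},
    }))
```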