Introduction
Every edge deployment needs observability. The Pyvorin Edge Agent exposes two built-in HTTP
endpoints—/health and /metrics—that provide a real-time view of
pipeline health, resource utilisation, and cloud sync state. This article explains the
structure of both endpoints, shows how to convert the JSON output into Prometheus exposition
format, provides a complete Grafana dashboard, and supplies ready-to-use Alertmanager rules.
The /health Endpoint
The /health endpoint is served by _HealthHandler in
edge_runtime/pyv_edge_agent/main.py. It returns a single JSON document with
nested objects for agent state, system metrics, cloud queue depth, privacy configuration,
and ingest adapters.
curl -s http://localhost:8080/health | python3 -m json.tool
A typical response looks like this:
{
  "status": "healthy",
  "timestamp": 1717000000.0,
  "metrics": {
    "cpu_percent": 12.5,
    "ram_percent": 34.0,
    "disk_percent": 45.2,
    "thermal_celsius": 42.0,
    "uptime_seconds": 86400.0,
    "timestamp": 1717000000.0
  },
  "agent": {
    "running": true,
    "buffer_count": 4,
    "readings_processed": 150000,
    "events_triggered": 23
  },
  "cloud": {
    "queue_depth": 12,
    "last_flush_time": 1716999900.0,
    "messages_sent_today": 1440,
    "endpoint": "https://api.pyvorin.com/v1/ingest"
  },
  "privacy": {
    "enabled": true,
    "rules_active": 3,
    "fields_redacted": ["patient_id"],
    "fields_hashed": ["device_uuid"]
  },
  "ingest": {
    "adapters_connected": ["simulator", "mqtt"],
    "devices_configured": 4
  }
}
| Key | Source | Description |
|---|---|---|
| status | EdgeAgent.is_running | "healthy" if the agent loop is active. |
| metrics | SystemMetrics.to_dict() | CPU, RAM, disk, thermal, and uptime. |
| agent.buffer_count | len(self._buffers) | Number of active ring buffers. |
| agent.readings_processed | self._readings_processed | Lifetime counter of ingested readings. |
| agent.events_triggered | self._events_triggered | Lifetime counter of fired rule events. |
| cloud.queue_depth | CloudSyncQueue.pending_count() | Items waiting for upstream upload. |
| cloud.messages_sent_today | self._cloud.messages_sent_today | Daily egress counter (resets at midnight). |
| privacy.rules_active | len(self._privacy.rules) | Number of privacy rules currently loaded. |
| ingest.adapters_connected | self._adapter_types.values() | List of active adapter type names. |
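Because the table refers to fields by dotted path, a small helper that flattens the nested /health document into the same dotted keys makes individual fields easy to log or compare. This is a minimal sketch, assuming the agent listens on localhost:8080 as in the curl example; `flatten`, `fetch_health` and `check_health` are hypothetical helper names, and treating any status other than "healthy" as a failure is an assumption, since only the healthy value is documented above.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumption: same host and port as the curl example above.
HEALTH_URL = "http://localhost:8080/health"


def flatten(doc: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted keys, e.g. {"agent": {"running": True}} -> {"agent.running": True}.

    Lists (such as ingest.adapters_connected) are kept as values, not expanded.
    """
    flat: Dict[str, Any] = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat


def fetch_health(url: str = HEALTH_URL) -> Dict[str, Any]:
    """Fetch and decode the /health JSON document."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def check_health(doc: Dict[str, Any]) -> bool:
    """Return True only when status is exactly "healthy" (assumed rule)."""
    return doc.get("status") == "healthy"
```

For example, `flatten(fetch_health())["cloud.queue_depth"]` returns the number of items waiting for upload, using the same key name as the table.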
The /metrics Endpoint
The /metrics endpoint returns the raw output of
SystemMetrics().to_dict() from
edge_runtime/pyv_edge_agent/health_monitor/metrics.py. This is the lowest-overhead
way to pull system telemetry because it bypasses the agent state object entirely.
curl -s http://localhost:8080/metrics | python3 -m json.tool
{
  "cpu_percent": 12.5,
  "ram_percent": 34.0,
  "disk_percent": 45.2,
  "thermal_celsius": 42.0,
  "uptime_seconds": 86400.0,
  "timestamp": 1717000000.0
}
Prometheus Metrics Export Format
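Prometheus does not natively understand JSON, so you need a small bridge script that polls
/metrics, translates the dictionary into the Prometheus text exposition format, and writes the
result into node_exporter's textfile collector directory. The script below performs one poll per
invocation, so schedule it with cron or a systemd timer; to run it as a long-lived sidecar instead,
wrap the call in a loop.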
#!/usr/bin/env python3
"""Prometheus bridge for Pyvorin Edge /metrics.

Polls the agent once and writes a node_exporter textfile-collector file.
"""
import json
import os
import urllib.request
from pathlib import Path

METRICS_URL = "http://localhost:8080/metrics"
OUTPUT_PATH = Path("/var/lib/node_exporter/textfile_collector/pyvorin_edge.prom")

# (JSON key, HELP text, TYPE). Uptime is exported as a gauge: it resets on
# reboot, and Prometheus counters are conventionally suffixed with _total.
METRICS = [
    ("cpu_percent", "CPU utilisation percentage.", "gauge"),
    ("ram_percent", "RAM utilisation percentage.", "gauge"),
    ("disk_percent", "Disk utilisation percentage.", "gauge"),
    ("thermal_celsius", "SoC temperature in Celsius.", "gauge"),
    ("uptime_seconds", "System uptime in seconds.", "gauge"),
]


def fetch() -> dict:
    """Return the /metrics JSON document as a dict."""
    with urllib.request.urlopen(METRICS_URL, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def render(data: dict) -> str:
    """Render the text exposition format, skipping absent or null values."""
    lines = []
    for key, help_text, metric_type in METRICS:
        value = data.get(key)
        if value is None:  # e.g. no readable thermal sensor; never emit a fake 0.0 or "None"
            continue
        name = f"pyvorin_edge_{key}"
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {metric_type}")
        lines.append(f"{name} {float(value)}")
    return "\n".join(lines) + "\n"


def write_prom(data: dict) -> None:
    OUTPUT_PATH.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name, then rename atomically so node_exporter never
    # reads a half-written file (it only collects files ending in .prom).
    tmp_path = OUTPUT_PATH.with_name(OUTPUT_PATH.name + ".tmp")
    tmp_path.write_text(render(data), encoding="utf-8")
    os.replace(tmp_path, OUTPUT_PATH)


if __name__ == "__main__":
    write_prom(fetch())
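The dashboard and the EdgeSyncBacklog alert also rely on pyvorin_edge_queue_depth, which comes from cloud.queue_depth in /health rather than from /metrics. The following is a minimal sketch of that second bridge, following the same textfile-collector pattern. The output file name pyvorin_edge_health.prom is a hypothetical choice, and the agent is assumed to listen on localhost:8080.

```python
import json
import os
import urllib.request
from pathlib import Path
from typing import Any, Dict

HEALTH_URL = "http://localhost:8080/health"
# Hypothetical file name; node_exporter collects any *.prom file in this directory.
OUTPUT_PATH = Path("/var/lib/node_exporter/textfile_collector/pyvorin_edge_health.prom")


def render_health(doc: Dict[str, Any]) -> str:
    """Render cloud.queue_depth from a /health document as exposition text."""
    depth = doc.get("cloud", {}).get("queue_depth")
    if depth is None:
        return ""  # emit nothing rather than a misleading zero
    lines = [
        "# HELP pyvorin_edge_queue_depth Items waiting for upstream upload.",
        "# TYPE pyvorin_edge_queue_depth gauge",
        f"pyvorin_edge_queue_depth {float(depth)}",
    ]
    return "\n".join(lines) + "\n"


def write_health_prom(url: str = HEALTH_URL, path: Path = OUTPUT_PATH) -> None:
    """Poll /health once and atomically replace the .prom file."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        doc = json.loads(resp.read().decode("utf-8"))
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")  # not *.prom, so never collected mid-write
    tmp.write_text(render_health(doc), encoding="utf-8")
    os.replace(tmp, path)
```

Schedule `write_health_prom()` on the same cron or timer cadence as the /metrics bridge so the two files stay roughly in step.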
Complete Grafana Dashboard JSON
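Import the following dashboard into Grafana. It assumes Prometheus scrapes a node_exporter whose
textfile collector serves the metrics written by the bridge above, plus a second bridge that reads
/health and exposes pyvorin_edge_queue_depth in the same way. The JSON is wrapped in a top-level
"dashboard" key, which is the shape Grafana's HTTP API (POST /api/dashboards/db) expects; when
pasting into the Import dialog of the Grafana UI, use only the inner object.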
{
  "dashboard": {
    "id": null,
    "title": "Pyvorin Edge Health",
    "tags": ["edge", "pyvorin"],
    "timezone": "utc",
    "panels": [
      {
        "id": 1,
        "title": "CPU %",
        "type": "stat",
        "targets": [
          {
            "expr": "pyvorin_edge_cpu_percent",
            "legendFormat": "CPU"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "unit": "percent",
            "thresholds": {
              "steps": [
                {"color": "green", "value": null},
                {"color": "yellow", "value": 70},
                {"color": "red", "value": 85}
              ]
            }
          }
        },
        "gridPos": {"h": 4, "w": 6, "x": 0, "y": 0}
      },
      {
        "id": 2,
        "title": "RAM %",
        "type": "stat",
        "targets": [
          {
            "expr": "pyvorin_edge_ram_percent",
            "legendFormat": "RAM"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "unit": "percent",
            "thresholds": {
              "steps": [
                {"color": "green", "value": null},
                {"color": "yellow", "value": 70},
                {"color": "red", "value": 85}
              ]
            }
          }
        },
        "gridPos": {"h": 4, "w": 6, "x": 6, "y": 0}
      },
      {
        "id": 3,
        "title": "SoC Temperature",
        "type": "stat",
        "targets": [
          {
            "expr": "pyvorin_edge_thermal_celsius",
            "legendFormat": "°C"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "unit": "celsius",
            "thresholds": {
              "steps": [
                {"color": "green", "value": null},
                {"color": "yellow", "value": 60},
                {"color": "red", "value": 75}
              ]
            }
          }
        },
        "gridPos": {"h": 4, "w": 6, "x": 12, "y": 0}
      },
      {
        "id": 4,
        "title": "Cloud Queue Depth",
        "type": "timeseries",
        "targets": [
          {
            "expr": "pyvorin_edge_queue_depth",
            "legendFormat": "Pending items"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "custom": {"drawStyle": "line"}
          }
        },
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 4}
      },
      {
        "id": 5,
        "title": "Disk Usage %",
        "type": "gauge",
        "targets": [
          {
            "expr": "pyvorin_edge_disk_percent",
            "legendFormat": "Disk"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "unit": "percent",
            "min": 0,
            "max": 100,
            "thresholds": {
              "steps": [
                {"color": "green", "value": null},
                {"color": "yellow", "value": 70},
                {"color": "red", "value": 85}
              ]
            }
          }
        },
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 4}
      }
    ]
  }
}
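To load the dashboard through Grafana's HTTP API rather than the UI, a minimal standard-library sketch follows. It assumes Grafana at http://localhost:3000, a service-account token in a GRAFANA_TOKEN environment variable, and the JSON above saved as pyvorin_edge_dashboard.json. Those three values are placeholders, not part of the Edge Agent.

```python
import json
import os
import urllib.request
from pathlib import Path
from typing import Any, Dict

GRAFANA_URL = "http://localhost:3000"                  # assumption: default Grafana port
DASHBOARD_FILE = Path("pyvorin_edge_dashboard.json")   # assumption: the JSON above, saved to disk


def build_request(payload: Dict[str, Any], token: str, base_url: str = GRAFANA_URL) -> urllib.request.Request:
    """Build a POST /api/dashboards/db request for a {"dashboard": {...}} payload."""
    body = dict(payload)
    body.setdefault("overwrite", True)  # replace an existing dashboard with the same title or uid
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/dashboards/db",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def push_dashboard(path: Path = DASHBOARD_FILE) -> Dict[str, Any]:
    """Upload the saved dashboard and return Grafana's JSON response."""
    payload = json.loads(path.read_text(encoding="utf-8"))
    req = build_request(payload, token=os.environ["GRAFANA_TOKEN"])
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Because "id" is null in the JSON above, the first push creates a new dashboard; `overwrite` lets later pushes update it in place.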
Alertmanager Rules
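The following Prometheus alerting rules fire on resource exhaustion, thermal throttling risk, and
cloud sync backlog; Prometheus evaluates them and forwards firing alerts to Alertmanager for
routing. Save them as /etc/prometheus/alerts/pyvorin_edge.yml and list that path under rule_files
in prometheus.yml.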
groups:
  - name: pyvorin_edge
    rules:
      - alert: EdgeHighCPU
        expr: pyvorin_edge_cpu_percent > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU on {{ $labels.instance }}"
          description: "CPU has been above 85% for more than 5 minutes."
      - alert: EdgeHighRAM
        expr: pyvorin_edge_ram_percent > 90
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High RAM on {{ $labels.instance }}"
          description: "RAM usage is above 90%. OOM kills are likely."
      - alert: EdgeHighThermal
        expr: pyvorin_edge_thermal_celsius > 75
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Thermal throttling risk on {{ $labels.instance }}"
          description: "SoC temperature is above 75 °C. Performance will degrade."
      - alert: EdgeDiskFull
        expr: pyvorin_edge_disk_percent > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Disk filling on {{ $labels.instance }}"
          description: "Disk usage is above 85%. SQLite WAL may fail to grow."
      - alert: EdgeSyncBacklog
        expr: pyvorin_edge_queue_depth > 1000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Cloud sync backlog on {{ $labels.instance }}"
          description: "More than 1000 items are queued. Check connectivity."
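Each rule fires only after its expression has been true at every evaluation for the whole `for` window. Before that the alert is pending, and a single false evaluation resets it. The sketch below replays that logic over timestamped samples, which helps when checking how a threshold and window would behave on recorded data. It is a simplified model: it ignores evaluation-interval alignment and staleness handling, and `first_firing_time` is a hypothetical helper.

```python
from typing import Iterable, Optional, Tuple


def first_firing_time(
    samples: Iterable[Tuple[float, float]],
    threshold: float,
    for_seconds: float,
) -> Optional[float]:
    """Return when an `expr > threshold` rule with `for: for_seconds` would first fire.

    samples: (unix_timestamp, value) pairs in ascending time order, one per rule evaluation.
    Returns None if the rule never reaches the firing state.
    """
    active_since: Optional[float] = None  # start of the current pending period
    for ts, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = ts  # condition just became true: alert is pending
            if ts - active_since >= for_seconds:
                return ts  # condition held for the full window: alert fires
        else:
            active_since = None  # any false evaluation resets the pending state
    return None
```

With EdgeHighCPU's parameters (`threshold=85`, `for_seconds=300`) and one-minute samples, a single dip below 85% restarts the five-minute window. That restart is why short spikes do not page anyone.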
SystemMetrics API Usage
If you need to collect metrics inside your own Python script rather than via HTTP, use the
SystemMetrics class directly.
from pyv_edge_agent.health_monitor.metrics import SystemMetrics, MetricsSnapshot
metrics = SystemMetrics()
# Individual accessors
print(f"CPU: {metrics.cpu_percent():.1f}%")
print(f"RAM: {metrics.ram_percent():.1f}%")
print(f"Disk: {metrics.disk_percent('/var/lib/pyvorin'):.1f}%")
print(f"Thermal: {metrics.thermal_celsius()}°C")
print(f"Uptime: {metrics.uptime_seconds():.0f}s")
# Full snapshot
snapshot: MetricsSnapshot = metrics.snapshot()
print(snapshot.to_dict())
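In-process collection often means sampling on an interval rather than once, for example to smooth a noisy CPU reading. The sketch below averages one field over several snapshots. It assumes that `snapshot().to_dict()` returns the same keys as the /metrics output. `average_metric` is a hypothetical helper. It accepts any object with a `snapshot()` method, so it can be exercised without the agent installed.

```python
import time
from typing import Any, Dict, Protocol


class _Snapshot(Protocol):
    def to_dict(self) -> Dict[str, Any]: ...


class _MetricsSource(Protocol):
    def snapshot(self) -> _Snapshot: ...


def average_metric(source: _MetricsSource, key: str, samples: int = 5, interval_s: float = 1.0) -> float:
    """Average `key` (e.g. "cpu_percent") over `samples` snapshots taken `interval_s` apart."""
    if samples < 1:
        raise ValueError("samples must be >= 1")
    total = 0.0
    for i in range(samples):
        total += float(source.snapshot().to_dict()[key])
        if i < samples - 1:
            time.sleep(interval_s)  # no sleep after the final sample
    return total / samples
```

For example, `average_metric(SystemMetrics(), "cpu_percent", samples=5, interval_s=2.0)` samples for about eight seconds and returns the mean CPU percentage.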
Summary
You now have full visibility into the Edge Agent's health. The /health endpoint
gives you operational state, /metrics gives you system telemetry, the Prometheus
bridge converts JSON into scrapable text format, and the Grafana dashboard plus Alertmanager
rules turn raw numbers into actionable alerts.