The Cloud is Not the Beginning
Most IoT architectures treat the cloud as the first stage of processing: devices ship raw telemetry to AWS IoT Core or Azure IoT Hub, and only then do Lambda functions or Stream Analytics jobs filter, aggregate, and route the data. This pattern is simple to reason about, but it is catastrophically expensive at scale. Every raw reading is paid for several times over: as a message when it arrives, as compute when it is filtered, and as storage for as long as it is retained. It also consumes bandwidth on the site's uplink.
Pyvorin Edge inverts this model. It preprocesses, filters, and aggregates data before it ever reaches the cloud boundary. The cloud receives only the events, summaries, and anomalies that have survived the privacy firewall and the reduction pipeline. This article explains how to bridge Pyvorin Edge to AWS IoT Greengrass and Azure IoT Hub, how to configure the MQTT bridge, and how to quantify the cost savings in British pounds.
AWS IoT Greengrass Integration
AWS IoT Greengrass extends AWS cloud services to edge devices, allowing local Lambda execution, ML inference, and secure MQTT communication. Pyvorin Edge does not replace Greengrass; it complements it. The recommended architecture is:
- Pyvorin Edge runs as a local preprocessor on the gateway (e.g., Raspberry Pi 5). It ingests sensor data, applies privacy rules, computes windowed aggregates, and generates events.
- Greengrass core runs on the same device or on a dedicated gateway. It provides device authentication, local secret management, and over-the-air deployment of Lambda functions.
- MQTT bridge forwards the reduced output from Pyvorin Edge to Greengrass's local MQTT broker, which then syncs to AWS IoT Core in the cloud.
```python
import json

from paho.mqtt.publish import single
from pyv_edge_agent.ingest.mqtt_adapter import MQTTAdapter

# Step 1: Ingest from local sensors via MQTT
adapter = MQTTAdapter(
    broker_host="localhost",
    broker_port=1883,
    topic_subscriptions=["sensors/+/+"],
)

# Step 2: Process through pipeline (privacy, windows, rules)
# ... (pipeline code) ...
# The pipeline yields `event` objects that expose to_dict().

# Step 3: Publish reduced output to Greengrass local broker.
# QoS 1 gives at-least-once delivery to the local broker.
single(
    topic="pyvorin/reduced/events",
    payload=json.dumps(event.to_dict()),
    qos=1,
    hostname="localhost",  # adjust to wherever the Greengrass local broker listens
    port=1883,
)
```
In this architecture, raw sensor MQTT messages never leave the device. Only the privacy-filtered, aggregated events are forwarded to the cloud via Greengrass. The AWS IoT Core rules engine can then route these events to Amazon S3, DynamoDB, or Lambda without ever touching the high-volume raw stream.
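The following sketch illustrates that cloud-side routing. It builds the `topicRulePayload` dictionary accepted by boto3's `iot.create_topic_rule`. The rule selects only the reduced topic and invokes a Lambda function. The rule name, account ID and Lambda ARN are placeholders. The AWS call is commented out so the sketch runs without credentials.

```python
import json


def build_reduced_events_rule(topic: str, lambda_arn: str) -> dict:
    """Build an AWS IoT topic rule payload that forwards only reduced events.

    The SQL selects from the reduced topic alone, so no raw sensors/# topic
    can ever match this rule.
    """
    return {
        "sql": f"SELECT * FROM '{topic}'",
        "awsIotSqlVersion": "2016-03-23",
        "ruleDisabled": False,
        "actions": [
            # Invoke a Lambda function for every reduced event.
            {"lambda": {"functionArn": lambda_arn}},
        ],
    }


# Placeholder ARN for illustration only.
payload = build_reduced_events_rule(
    topic="pyvorin/reduced/events",
    lambda_arn="arn:aws:lambda:eu-west-2:123456789012:function:pyvorin-events",
)
print(json.dumps(payload, indent=2))

# With AWS credentials configured, the rule could be registered like this:
# import boto3
# boto3.client("iot").create_topic_rule(
#     ruleName="pyvorin_reduced_events", topicRulePayload=payload
# )
```

Keeping the topic filter in the rule SQL narrow gives a second safeguard. If a raw topic were ever bridged by mistake, this rule would still ignore it.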
Azure IoT Hub Integration
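Azure IoT Hub supports device-to-cloud (D2C) messaging via MQTT, AMQP, and HTTPS. For Pyvorin Edge deployments targeting Azure, the simplest integration is to point the `HTTPCloudUploader` at the IoT Hub REST endpoint. It authenticates with a shared access signature (SAS) token derived from the device's symmetric key. This assumes the uploader sends its `api_key` value unchanged in the `Authorization` header, which is where IoT Hub expects the token: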
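```python
import base64
import hashlib
import hmac
import os
import time
import urllib.parse

from pyv_edge_agent.cloud_sync.uploader import HTTPCloudUploader


def generate_sas_token(uri, key, policy_name=None, expiry=3600):
    """Return an IoT Hub SAS token for `uri`, valid for `expiry` seconds."""
    ttl = int(time.time()) + expiry
    encoded_uri = urllib.parse.quote_plus(uri)
    # IoT Hub signs the URL-encoded resource URI, a newline, and the expiry.
    string_to_sign = f"{encoded_uri}\n{ttl}".encode("utf-8")
    digest = hmac.new(base64.b64decode(key), string_to_sign, hashlib.sha256).digest()
    signature = urllib.parse.quote_plus(base64.b64encode(digest).decode("utf-8"))
    token = f"SharedAccessSignature sr={encoded_uri}&sig={signature}&se={ttl}"
    # skn names a shared access policy; device-scoped tokens omit it.
    if policy_name:
        token += f"&skn={policy_name}"
    return token


hub_name = "pyvorin-hub"
device_id = "pi5-warehouse-07"
# Device's base64 symmetric key; the environment variable name is hypothetical.
device_key = os.environ["IOTHUB_DEVICE_KEY"]
uri = f"{hub_name}.azure-devices.net/devices/{device_id}"
token = generate_sas_token(uri, device_key)  # device key, so no policy name

# Assumes HTTPCloudUploader sends api_key verbatim in the Authorization header.
# The token expires after one hour, so regenerate it before then.
uploader = HTTPCloudUploader(
    endpoint=f"https://{hub_name}.azure-devices.net/devices/{device_id}/messages/events?api-version=2021-04-12",
    api_key=token,
    timeout=30.0,
)
```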
The IoT Hub message body is the same JSON batch envelope described in the Custom Endpoints article. Azure Stream Analytics or Azure Functions can parse the envelope and expand the `items` array into individual rows in Azure Data Explorer or SQL Database.
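Whichever service does the parsing, the expansion step is small. The Python sketch below mirrors what an Azure Function might do with the message body. The envelope's exact schema is defined in the Custom Endpoints article, so apart from the `items` array the field names here (such as `device_id`) are hypothetical.

```python
import json
from typing import Any


def expand_envelope(body: str) -> list[dict[str, Any]]:
    """Flatten a batch envelope into one row per item.

    Assumes the envelope is a JSON object with an "items" list. Every other
    top-level field is copied onto each row so rows stay self-describing.
    """
    envelope = json.loads(body)
    shared = {k: v for k, v in envelope.items() if k != "items"}
    rows = []
    for item in envelope.get("items", []):
        row = dict(shared)
        row.update(item)  # item fields win if a key appears in both
        rows.append(row)
    return rows


example = json.dumps({
    "device_id": "pi5-warehouse-07",  # hypothetical envelope-level field
    "items": [
        {"sensor": "temp-01", "mean": 21.4, "count": 60},
        {"sensor": "temp-02", "mean": 22.9, "count": 60},
    ],
})
for row in expand_envelope(example):
    print(row)
```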
MQTT Bridge Setup
For deployments that prefer MQTT over HTTPS, configure Mosquitto or EMQX as a bridge between the local Pyvorin Edge pipeline and the cloud broker. Below is a Mosquitto bridge configuration for AWS IoT Core. It forwards only the `pyvorin/reduced/#` topic tree and keeps raw sensor topics (`sensors/#`) local.
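```
# /etc/mosquitto/conf.d/bridge.conf
connection pyvorin-cloud
# Placeholder: use your account's endpoint (aws iot describe-endpoint --endpoint-type iot:Data-ATS)
address a3xyzabc.iot.eu-west-1.amazonaws.com:8883
bridge_cafile /etc/ssl/certs/AmazonRootCA1.pem
# AWS IoT Core requires mutual TLS; example paths to the thing's certificate and key
bridge_certfile /etc/mosquitto/certs/device.pem.crt
bridge_keyfile /etc/mosquitto/certs/private.pem.key
# Example client ID; it must be allowed by the certificate's AWS IoT policy
clientid pyvorin-bridge
topic pyvorin/reduced/# out 1
# Do NOT bridge raw topics
# topic sensors/# out 1   <-- intentionally omitted
bridge_protocol_version mqttv311
# Don't publish Mosquitto's $SYS bridge-state messages to AWS IoT Core
notifications false
keepalive_interval 60
cleansession false
```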
Key configuration points:
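- `out 1` means "bridge outbound with QoS 1," which gives at-least-once delivery. AWS IoT Core supports only QoS 0 and 1, so QoS 1 is the strongest guarantee available on this link.
- `cleansession false` keeps the bridge's session alive across disconnects. QoS 1 messages published locally while the link is down are queued, up to Mosquitto's `max_queued_messages` limit, and forwarded when connectivity returns.
- Raw topics are not bridged. This is the entire point of the architecture: the cloud sees only what the edge has explicitly approved.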
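Before deploying the bridge, it can help to check locally which topics a filter will forward. The sketch below is a plain-Python implementation of the MQTT 3.1.1 wildcard rules. paho-mqtt offers an equivalent helper, `paho.mqtt.client.topic_matches_sub`. `BRIDGED_OUT` and `is_bridged` are hypothetical names that mirror the single `topic ... out 1` line in the config.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic matches a subscription filter.

    MQTT 3.1.1 rules: '+' matches exactly one level; '#' (last level only)
    matches the parent level and everything below it. Wildcards in the
    first level do not match topics that begin with '$' (such as $SYS).
    """
    if topic.startswith("$") and topic_filter[:1] in ("+", "#"):
        return False
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    for i, f in enumerate(f_levels):
        if f == "#":
            return True  # matches this level and all deeper levels
        if i >= len(t_levels):
            return False
        if f != "+" and f != t_levels[i]:
            return False
    return len(f_levels) == len(t_levels)


BRIDGED_OUT = ["pyvorin/reduced/#"]  # mirrors 'topic pyvorin/reduced/# out 1'


def is_bridged(topic: str) -> bool:
    """True if the bridge config above would forward this topic to the cloud."""
    return any(topic_matches(f, topic) for f in BRIDGED_OUT)


for t in ["pyvorin/reduced/events", "sensors/line1/temp"]:
    print(t, "->", "cloud" if is_bridged(t) else "local only")
```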
Cost Comparison: Raw vs Filtered Egress
The financial case for edge filtering is compelling. Consider a mid-sized manufacturing plant with 2,000 sensors polling every second. Each reading is approximately 200 bytes of JSON including metadata. Over a 30-day month, the raw data volume is:
```python
sensors = 2000
polls_per_second = 1
seconds_per_month = 30 * 24 * 3600
bytes_per_reading = 200
raw_bytes = sensors * polls_per_second * seconds_per_month * bytes_per_reading
raw_gb = raw_bytes / (1024 ** 3)  # binary gigabytes (GiB)
print(f"Raw data per month: {raw_gb:.1f} GB")
# Output: Raw data per month: 965.6 GB
```
Now apply the Pyvorin Edge pipeline with a typical reduction profile:
- Window aggregation: 1-second raw readings are aggregated into 60-second windows (mean, min, max, count). This reduces volume by 60x.
- Rule-based event extraction: Only windows that breach a threshold generate an event. Assume 2% of windows trigger. This reduces volume by another 50x.
- Privacy filtering: Metadata fields are redacted, reducing average payload size from 200 bytes to 80 bytes.
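The three stages can be sketched in plain Python. They appear in the order the Greengrass example's pipeline comment lists them: privacy, windows, rules. This illustrates the reduction profile under assumed parameters and is not Pyvorin Edge's implementation. `redact`, `aggregate`, `extract_events`, the field allowlist and the 80.0 threshold are all hypothetical.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical parameters for illustration.
WINDOW_SECONDS = 60
THRESHOLD = 80.0
READING_FIELDS = {"sensor", "ts", "value"}  # anything else is redacted


def redact(reading: dict) -> dict:
    """Privacy step: keep only allowlisted fields."""
    return {k: v for k, v in reading.items() if k in READING_FIELDS}


def aggregate(readings) -> list[dict]:
    """Window step: tumbling windows per sensor with mean/min/max/count."""
    windows = defaultdict(list)
    for r in readings:
        start = int(r["ts"]) // WINDOW_SECONDS * WINDOW_SECONDS
        windows[(r["sensor"], start)].append(r["value"])
    return [
        {"sensor": sensor, "window_start": start, "mean": fmean(values),
         "min": min(values), "max": max(values), "count": len(values)}
        for (sensor, start), values in sorted(windows.items())
    ]


def extract_events(windows: list[dict]) -> list[dict]:
    """Rule step: keep only windows whose maximum breaches the threshold."""
    return [w for w in windows if w["max"] > THRESHOLD]


# Two minutes of 1 Hz readings from one sensor, with a single spike at t=75 s.
raw = [
    {"sensor": "press-04", "ts": t, "value": 90.0 if t == 75 else 50.0,
     "operator_id": "op-117"}  # metadata that should not leave the edge
    for t in range(120)
]
events = extract_events(aggregate(redact(r) for r in raw))
print(f"{len(raw)} readings -> {len(events)} event(s): {events}")
```

In this toy run, 120 readings collapse into two windows. Only the window containing the spike survives as an event, and the operator metadata is gone before aggregation.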
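```python
window_reduction = 60.0    # 60 one-second readings -> one window
event_reduction = 50.0     # only 2% of windows trigger an event
payload_ratio = 80 / 200   # redacted payload is 40% of the original size
filtered_gb = raw_gb / (window_reduction * event_reduction) * payload_ratio
print(f"Filtered data per month: {filtered_gb:.3f} GB")
# Output: Filtered data per month: 0.129 GB
```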
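By message count the reduction is 3,000:1 (60 × 50). The smaller payloads raise the byte reduction to roughly 7,500:1. Now let us price this on AWS IoT Core and Azure IoT Hub, using approximate GBP figures based on published pricing as of early 2024.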
AWS IoT Core Pricing (London Region)
- Connectivity: £0.076 per million minutes of connection.
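- Messaging: £0.92 per million messages at the first volume tier. Messages are metered in 5 KB increments, up to the 128 KB maximum message size. AWS discounts monthly volumes above 1 billion messages, so the flat-rate raw figure below is an upper bound.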
- Device Registry & Shadows: negligible at this scale.
For 2,000 devices sending one message per second (raw):
```python
raw_messages = sensors * seconds_per_month
raw_messages_million = raw_messages / 1e6
raw_messaging_cost_gbp = raw_messages_million * 0.92  # flat £0.92 per million
print(f"Raw messaging cost: £{raw_messaging_cost_gbp:,.2f}")
# Output: Raw messaging cost: £4,769.28
```
For the filtered stream (one message per minute per sensor, 2% event rate, so effectively one message per 3,000 seconds per sensor):
```python
filtered_messages = sensors * seconds_per_month / (window_reduction * event_reduction)
filtered_messages_million = filtered_messages / 1e6
filtered_messaging_cost_gbp = filtered_messages_million * 0.92
print(f"Filtered messaging cost: £{filtered_messaging_cost_gbp:,.2f}")
# Output: Filtered messaging cost: £1.59
```
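The flat rate above ignores AWS IoT Core's volume tiers. A tiered charge can be computed by filling each price band in turn, as in the sketch below. Only the £0.92 first-tier rate comes from the text above. The band sizes and lower rates in `EXAMPLE_TIERS` are hypothetical placeholders for your region's current price list.

```python
def tiered_cost(units: float, tiers: list[tuple[float, float]]) -> float:
    """Charge `units` against (band_size, price_per_unit) bands in order.

    Use float("inf") as the size of the final, open-ended band.
    """
    cost, remaining = 0.0, units
    for size, price in tiers:
        used = min(remaining, size)
        cost += used * price
        remaining -= used
        if remaining <= 0:
            break
    return cost


# Units are millions of messages; band sizes and the lower rates are hypothetical.
EXAMPLE_TIERS = [(1_000, 0.92), (4_000, 0.74), (float("inf"), 0.64)]

raw_million = 2_000 * 30 * 24 * 3600 / 1e6  # 5,184 million messages per month
print(f"Flat:   £{raw_million * 0.92:,.2f}")
print(f"Tiered: £{tiered_cost(raw_million, EXAMPLE_TIERS):,.2f}")
```

With these placeholder tiers the raw bill falls noticeably below the flat-rate figure. The filtered stream, at about 1.7 million messages, stays entirely within the first band, so its cost is unchanged.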
Azure IoT Hub Pricing (UK South, S1 Tier)
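- S1 tier: each unit allows 400,000 messages per day and costs about £16.43 per month.
- Messages are metered in 4 KB chunks. A 200-byte message still counts as one 4 KB chunk.
- An S1 hub can scale to at most 200 units, so the 432-unit raw figure below is a notional S1-equivalent cost. A real deployment at that volume would need several hubs or a higher tier; an S3 unit covers 300 million messages per day.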
```python
import math

# Raw: 2,000 devices * 86,400 messages/day = 172.8 million messages/day
# Requires 432 S1 units (172.8M / 400K per unit)
raw_units = math.ceil(172_800_000 / 400_000)
raw_cost_gbp = raw_units * 16.43
print(f"Raw IoT Hub cost: £{raw_cost_gbp:,.2f}/month")
# Output: Raw IoT Hub cost: £7,097.76/month

# Filtered: 2,000 devices * 28.8 messages/day = 57,600 messages/day
# Needs 0.144 S1 units, rounded up to one; the free tier (8,000 messages/day) is too small
filtered_units = math.ceil(57_600 / 400_000)
filtered_cost_gbp = filtered_units * 16.43
print(f"Filtered IoT Hub cost: £{filtered_cost_gbp:,.2f}/month")
# Output: Filtered IoT Hub cost: £16.43/month
```
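IoT Hub meters each message in 4 KB chunks, so sending small events one at a time wastes most of each chunk. The sketch below counts billed messages for 40 synthetic events of about 60 bytes each. It compares sending them individually with sending them as one batch using an `items` envelope. The event fields and batch size are illustrative assumptions.

```python
import json
import math

CHUNK_BYTES = 4 * 1024  # IoT Hub paid tiers meter messages in 4 KB chunks


def billed_messages(payload: bytes) -> int:
    """Number of 4 KB chunks a single device-to-cloud message is billed as."""
    return max(1, math.ceil(len(payload) / CHUNK_BYTES))


# 40 hypothetical reduced events of roughly 60 bytes each.
events = [{"s": f"sensor-{i:04d}", "w": 1700000000, "max": 91.2, "n": 60}
          for i in range(40)]

individual = sum(billed_messages(json.dumps(e).encode()) for e in events)
batched = billed_messages(json.dumps({"items": events}).encode())
print(f"sent individually: {individual} billed messages; batched: {batched}")
```

Here, 40 individual sends are billed as 40 messages, while the batched payload (about 2.4 KB) fits in a single chunk. Batching trades latency for cost, so it suits routine summaries better than anomalies.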
Savings Calculation Summary
| Cost Component | Raw Stream (GBP/month) | Filtered Stream (GBP/month) | Savings (GBP/month) |
|---|---|---|---|
| AWS IoT Core Messaging | £4,769 | £1.59 | £4,768 |
| Azure IoT Hub (S1) | £7,098 | £16.43 | £7,081 |
| Cloud Storage (S3/Blob, 3-month retention) | £1,930 | £0.64 | £1,929 |
| Total on AWS (messaging + storage) | £6,699 | £2.23 | £6,697 |
| Total on Azure (IoT Hub + storage) | £9,028 | £17.07 | £9,011 |
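These figures assume a 2,000-sensor deployment with one-second polling and a flat first-tier message price. Costs scale roughly linearly with sensor count, so a 20,000-sensor deployment would save about ten times as much each month. In practice the saving is somewhat smaller, because AWS volume discounts and Azure's higher-capacity tiers both lower the raw bill. Widening the aggregation window, at the same trigger rate, lowers the filtered cost further. At that scale, the avoided cloud messaging and storage costs alone can justify a dedicated edge compute cluster.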
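To see how the figures move with deployment size and window length, the messaging part of the model can be wrapped in a function. This sketch reuses the article's flat £0.92-per-million AWS rate and 2% trigger rate. Like the calculations above, it ignores volume tiers, connectivity and rules-engine charges. `monthly_messaging_gbp` is a hypothetical helper name.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600
AWS_GBP_PER_MILLION = 0.92  # flat first-tier rate used in this article


def monthly_messaging_gbp(sensors: int, window_s: int = 60,
                          trigger_rate: float = 0.02) -> tuple[float, float]:
    """Return (raw, filtered) AWS IoT Core messaging cost in GBP per month.

    Raw: one message per sensor per second.
    Filtered: one message per window that triggers an event.
    """
    raw_msgs = sensors * SECONDS_PER_MONTH
    filtered_msgs = raw_msgs / window_s * trigger_rate
    gbp_per_msg = AWS_GBP_PER_MILLION / 1e6
    return raw_msgs * gbp_per_msg, filtered_msgs * gbp_per_msg


for sensors in (2_000, 20_000):
    for window_s in (60, 300):
        raw, filtered = monthly_messaging_gbp(sensors, window_s)
        print(f"{sensors:>6} sensors, {window_s:>3}s windows: raw £{raw:,.2f}, "
              f"filtered £{filtered:,.2f}, saving £{raw - filtered:,.2f}")
```

Holding the trigger rate constant is a simplification. A wider window sees more readings, so in practice it is more likely to contain a threshold breach.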
Operational Best Practices
- Use Greengrass Local Shadows for configuration. Store privacy policy versions and aggregation window sizes in AWS IoT Device Shadow documents. The edge device can update its pipeline parameters without a full redeployment.
- Route anomalies to SNS/SQS immediately. Routine summaries can be batched and uploaded hourly. Anomaly events should bypass the batch queue and be published over MQTT (QoS 1) as soon as they are detected, so that the cloud rules engine can react within seconds. Setting `retain=True` does not speed this up; the retain flag only makes the broker keep the latest message for clients that subscribe later.
- Monitor cloud-side DLQs and edge drop counts. If your edge filter is too aggressive, you may drop data that later proves valuable. Route dropped-item counts to CloudWatch or Azure Monitor so that operators can tune the policy without guessing.
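- Compress payloads. For HTTPS uploads, gzip the request body and make sure your receiver accepts gzip-encoded bodies. A batch of 100 JSON readings compresses to roughly 15-20% of its original size, further reducing egress charges. IoT Core and IoT Hub bill in 5 KB and 4 KB chunks respectively, so on those services compression saves money only when it reduces the number of chunks a message occupies. The uploader does not compress by default (to keep dependencies minimal), but you can subclass `HTTPCloudUploader` and override `post_batch()` to gzip the payload.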
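The compression step itself needs only Python's standard library. The sketch below gzips a synthetic 100-item batch and reports the resulting size. `gzip_batch` is a hypothetical helper, not part of the uploader. The sketch does not show the `post_batch()` override, because that method's signature is not given here.

```python
import gzip
import json
import random


def gzip_batch(items: list[dict]) -> bytes:
    """Serialise a batch as JSON and gzip it for an HTTPS upload.

    The receiver must decompress it, e.g. by honouring a
    'Content-Encoding: gzip' request header.
    """
    return gzip.compress(json.dumps({"items": items}).encode("utf-8"))


rng = random.Random(7)  # fixed seed so the synthetic batch is reproducible
batch = [
    {"sensor": f"temp-{i % 10:02d}", "window_start": 1_700_000_000 + 60 * i,
     "mean": round(20 + rng.random() * 5, 2), "count": 60}
    for i in range(100)
]
plain = json.dumps({"items": batch}).encode("utf-8")
packed = gzip_batch(batch)
print(f"{len(plain)} bytes -> {len(packed)} bytes "
      f"({100 * len(packed) / len(plain):.0f}% of original)")
```

Repetitive keys and sensor names are what make JSON batches compress well. The actual ratio depends on how varied your field values are.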
Summary
Pyvorin Edge is not a replacement for AWS IoT Greengrass or Azure IoT Hub; it is a preprocessor that makes these platforms affordable at scale. By aggregating windows, extracting events, and filtering privacy-sensitive fields before egress, the Edge Runtime reduces cloud messaging volumes by three orders of magnitude. The MQTT bridge configuration is a matter of a few lines in Mosquitto, and the HTTPS uploader integrates directly with IoT Hub's REST API. The result is a hybrid architecture where the edge does what the edge does best (real-time filtering and reduction) and the cloud does what the cloud does best: long-term storage, global dashboards, and machine learning at scale. With savings of thousands of pounds per month for a modest 2,000-sensor deployment, the business case writes itself.