Camera-vision-enrichment — vendor-neutral scene labels
A Viseron-shaped service that watches any RTSP stream and emits structured labels (people, vehicles, animals, parcels) for downstream automation.
Customers had cameras from 4–5 vendors, and each vendor's analytics were locked to its own app. There was no way to write a single rule like 'tell me when anyone enters Zone B at night, regardless of which camera saw them.'
Pull RTSP from every camera, run a shared inference pipeline, and expose state as a JSON document that other services can query or subscribe to. This treats vision as a substrate, not a product.
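A minimal sketch of that ingest loop, assuming OpenCV for RTSP capture. The `detect()` stub and the stdout "publish" are placeholders for the real shared pipeline and pub/sub transport, which aren't shown here; the camera ID, URL, and field names are illustrative.

```python
# Minimal sketch of the vendor-neutral ingest loop. Assumptions: OpenCV
# handles RTSP capture; detect() is a placeholder for the shared inference
# pipeline; printing stands in for publishing to whatever bus or endpoint
# downstream services subscribe to.
import json
import time

import cv2  # pip install opencv-python


def detect(frame):
    """Placeholder for shared inference; returns structured label dicts."""
    return []  # e.g. [{"label": "person", "bbox": [x, y, w, h], "conf": 0.91}]


def run(rtsp_url: str, camera_id: str) -> None:
    cap = cv2.VideoCapture(rtsp_url)  # any RTSP source, vendor unknown
    while True:
        ok, frame = cap.read()
        if not ok:                    # stream hiccup: back off and reopen
            cap.release()
            time.sleep(1.0)
            cap = cv2.VideoCapture(rtsp_url)
            continue
        state = {
            "camera": camera_id,
            "ts": time.time(),
            "detections": detect(frame),
        }
        print(json.dumps(state))      # stand-in for publish/subscribe


if __name__ == "__main__":
    run("rtsp://10.0.0.12:554/stream1", "dock-east")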
Standard detection + tracking + per-zone occupancy.
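One way per-zone occupancy can fall out of tracking, sketched under stated assumptions: take an anchor point per track (bottom-center of the box, roughly where an object "stands") and run a ray-casting point-in-polygon test against each named zone. Zone names and the track shape here are illustrative, not the service's real types.

```python
# Hedged sketch of per-zone occupancy: count tracked objects whose anchor
# point falls inside each named polygon. The anchor choice (bottom-center
# of the bounding box) and zone/track shapes are illustrative assumptions.
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def point_in_polygon(p: Point, poly: List[Point]) -> bool:
    """Standard ray-casting test: odd number of edge crossings => inside."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def zone_occupancy(tracks, zones: Dict[str, List[Point]]) -> Dict[str, int]:
    counts = {name: 0 for name in zones}
    for t in tracks:
        x, y, w, h = t["bbox"]
        anchor = (x + w / 2, y + h)  # bottom-center of the box
        for name, poly in zones.items():
            if point_in_polygon(anchor, poly):
                counts[name] += 1
    return counts


zones = {"zone_b": [(0, 0), (400, 0), (400, 300), (0, 300)]}
tracks = [{"id": 7, "label": "person", "bbox": (180, 120, 60, 140)}]
print(zone_occupancy(tracks, zones))  # {'zone_b': 1}
```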
Defined the schema (the contract that downstream services depend on) and the operator-facing rule grammar.
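The actual schema and grammar aren't reproduced here. As an illustration only, the Zone B rule quoted earlier might compile to a predicate over scene-state events along these lines; every field name and the compiled form are hypothetical.

```python
# Illustrative only: a hypothetical compiled form of the operator rule
# "when any person enters zone_b between 22:00 and 06:00, alert security".
# The real rule grammar is not published here.
from datetime import time as clock

RULE = {
    "when": "enter",
    "label": "person",
    "zone": "zone_b",
    "between": ("22:00", "06:00"),
    "then": {"notify": "security"},
}


def in_window(now: clock, start: str, end: str) -> bool:
    s = clock.fromisoformat(start)
    e = clock.fromisoformat(end)
    return s <= now < e if s <= e else (now >= s or now < e)  # wraps midnight


def matches(rule, event, now: clock) -> bool:
    """event: a scene-state transition, e.g. a track entering a zone."""
    return (
        event["type"] == rule["when"]
        and event["label"] == rule["label"]
        and event["zone"] == rule["zone"]
        and in_window(now, *rule["between"])
    )


event = {"type": "enter", "label": "person", "zone": "zone_b", "track": 7}
print(matches(RULE, event, clock(23, 15)))  # True: person in Zone B at night
```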
A single 'scene state' that drives signage, alerts, dashboards, and the broadcast-studio cinematic camera switcher.
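Shape-wise, such a scene-state document might look like the following. Every field name and value here is illustrative, not the real contract; the point is that signage, alerts, dashboards, and the camera switcher all read the same document rather than talking to cameras directly.

```json
{
  "site": "store-14",
  "ts": "2024-06-01T22:14:03Z",
  "cameras": ["dock-east", "lobby-1"],
  "zones": {
    "zone_b": {"people": 1, "vehicles": 0}
  },
  "tracks": [
    {"id": 7, "label": "person", "zone": "zone_b", "conf": 0.91}
  ]
}
```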
The first version was coupled too tightly to one camera vendor's RTSP quirks; the second assumes nothing about the source.
Treat AI output as data, not as features. Ship it as a contract; let the apps that consume it be small and replaceable.
1. If you're going to use vision in more than one product, build the labelling layer once and let everything subscribe to it.
2. JSON schemas age better than ML model versions.
3. Vendor-neutral substrates are the highest-leverage AI investment for multi-site operators.