## Story
WearEdge Pro is a prototype industrial edge AI system that connects wearable first-person capture to a local Jetson inference node. The goal is to support frontline workflows such as predictive maintenance, quality inspection, changeover, work-instruction assistance, and hazard review without sending sensitive factory images to a cloud model.
The prototype uses a Jetson-class edge node running local multimodal inference. A wearable or browser client captures an image and sends it to the gateway. The model response is not treated as a final uncontrolled answer. It is validated into route-specific fields and converted into an operator action card, follow-up plan, and audit record.
## Why This Project Exists
Industrial AI has a different risk profile from consumer AI. A factory operator needs more than a fluent answer. They need a traceable recommendation:
- What image or evidence was used?
- Which workflow route was selected?
- Which fields were required?
- Is this a maintenance, quality, changeover, WI, or EHS action?
- Does the action require human confirmation?
- Can the event be audited later?
WearEdge Pro is built around that idea.
## Hardware
- Jetson-class 8GB edge node
- NVMe model storage
- Wearable first-person capture device or browser capture client
- Local network connection between capture device and Jetson
## Software
- llama.cpp multimodal endpoint
- GGUF model artifacts and multimodal projectors
- FastAPI gateway
- route-specific prompt contracts
- action-card and audit runtime
- PowerShell benchmark harness for Windows-to-Jetson testing
## Five-Model Jetson Benchmark
To choose the current baseline, we ran five compact multimodal candidates through the same Jetson endpoint path:
| Model | Result | Average latency | Role |
| --- | ---: | ---: | --- |
| Gemma 4 E2B | 5/5 agents | 37.51s raw | Current baseline |
| Qwen2.5-VL-3B | 5/5 at 560 and 1024 image tokens | 39.72s / 63.48s | OCR/IQC challenger |
| SmolVLM2-2.2B | 5/5 | 12.84s | Fast triage candidate |
| InternVL3-2B | 2/5 at ctx2048, 5/5 at ctx4096 | 80.35s at ctx4096 | Deferred |
| Qwen2.5-Omni-3B | 5/5 | 50.09s | Future audio/video branch |
## What We Learned
Qwen2.5-VL was the most compelling challenger. It read a changeover machine and SKU exactly as `LABELER-FL1` and `SKU-C500`, and it produced useful defect-score details for quality inspection.
SmolVLM2 was the fastest, but the answers were too generic for trusted industrial guidance.
InternVL3 needed a larger context window to complete the full matrix, and the resulting latency was too high for the current product baseline.
Qwen2.5-Omni ran successfully and is interesting for future speech/video workflows.
Gemma 4 E2B stayed the baseline because it best fit the complete runtime: local inference, multimodal evidence, structured contracts, action cards, guards, and audit logs.
## Build Notes
The practical architecture is:
```text
wearable image + operator context
- > local Jetson gateway
- > llama.cpp multimodal endpoint
- > route-specific contract
- > action card / follow-up plan
- > audit log
```
The model comparison uses one VLM endpoint at a time to avoid memory pressure and to keep results comparable.
## Future Work
- Put Qwen2.5-VL behind the same WearEdge guards for IQC/changeover A/B tests.
- Add better runtime telemetry capture during each model run.
- Evaluate native audio/video branches with Qwen2.5-Omni or future Gemma runtime support.
- Reduce latency for high-detail maintenance workflows.
- Prepare a fully public reproducibility package.
Public artifact note: the measured benchmark report and reproducible harness summary are ready locally; a public artifact mirror can be attached after the repository/artifact is opened.


Comments