NavCap: Wearable AI Navigation Hat for the Blind
Gemini Vision + Haptic Feedback on Raspberry Pi 5
The Problem
The Solution
How It Works
Step 1 — Voice Capture
Step 2 — Speech-to-Text
Step 3 — Ring Buffer Camera
Step 4 — Gemini Object Detection
Step 5 — Sprint Catch-Up
Step 6 — Live Tracking
Step 7 — Zone Engine
Step 8 — Motor Pulse
The Ring Buffer Trick — Solving API Latency
The Problem
The Solution
Why KCF for the Sprint
This Pattern is Reusable
Hardware Build
Components
Motor Placement
Motor Driver PCB
Camera Mount
Trigger Button
GPIO Pin Map
Software & Code
1. Voice Capture + Speech-to-Text
2. Gemini Object Detection
3. Ring Buffer Camera Thread
4. Motor Pulse Thread
State Machine
Setup & Installation
Requirements
Install Dependencies
Set API Key
Run
Controls
Challenges & Lessons Learned
Pi 5 GPIO is a Breaking Change
KCF is Fast Enough — CSRT is Not
Gemini's Coordinate Space is 0-1000, Not 0.0-1.0
The Ring Buffer Solves a Problem You Don't Know You Have
A Dedicated Motor Thread is Non-Negotiable
Single Gemini Call Per Session is Intentional
What's Next
Team & Context

Akash • Ashish Joy • Sushant S Nair

Published May 13, 2026 © CC BY-NC-SA

NavCap

A wearable AI hat that guides blind users to any object they name.

Expert • Full instructions provided • 24 hours

Things used in this project

Hardware components

Raspberry Pi 5

Raspberry Pi Camera Module 3

Seeed Studio Grove - Haptic Motor

USB Microphone

Story

Gemini Vision + Haptic Feedback on Raspberry Pi 5

Built in under 24 hours at the TinkerHub Physical AI Hackathon 2026, Kochi — Accessibility Track.

GitHub: https://github.com/akash2000e/NavCap

The Problem

wearing NavCap at the hackathon

A white cane is one of humanity's oldest assistive tools. It tells you what's on the ground — a curb, a step, a puddle. It tells you nothing about what's in the room.

Blind individuals navigating an unfamiliar indoor space have no affordable, wearable way to find a specific object. There's no device that lets a person say "find the switch board" and get guided toward it — hands-free, without pulling out a phone, without asking someone nearby.

Screen-based apps don't work if you can't see. GPS turn-by-turn only works outdoors. Smart glasses exist, but they cost thousands and speak back to you — adding noise to an already demanding sensory environment.

We wanted to build something different. Something that communicates through touch instead of sound. Something worn, not held. Something a person could use in a crowded room without anyone else noticing.

NavCap is a black cowboy hat with one camera, six vibration motors, and a Raspberry Pi inside — built in under 24 hours.

The Solution

Full NavCap assembly on table

Speak a query. Feel your way there.

NavCap is a wearable navigation assistant built into a cowboy hat. The user presses a button on the brim, speaks the name of an object, and the hat finds it — then guides them toward it through vibration alone.

No screen. No audio instructions. No hands required after the button press.

"Find the switch board." The front-left motor pulses gently. Walk left. The pulses continue. You arrive.

The entire system runs on a Raspberry Pi 5 mounted inside the crown. One Pi Camera Module 3 watches the scene. Six coin vibration motors sewn around the crown band tell you where to go. One Gemini API call identifies the object. Everything after that (tracking, guiding, pulsing) runs fully on-device.

Hat side view

This build uses off-the-shelf hardware intentionally — to prove the concept fast with parts anyone can source. The same system can be rebuilt with a custom PCB and purpose-made enclosure, making it completely indistinguishable from a normal hat. The tech is already there. The form factor is just waiting to catch up.

How It Works

Interior showing Pi 5, camera, motors

The pipeline has 8 steps. Once the 4-second recording window closes, the first motor pulse follows within roughly 2-3 seconds. From then on, guidance is continuous and real-time.

Step 1 — Voice Capture

The user presses the button on the hat brim and speaks a query. A USB microphone records 4 seconds of audio on-device using sounddevice, saved as a temporary WAV file.

Step 2 — Speech-to-Text

The WAV file is transcribed locally using Faster-Whisper (tiny model, CPU, float32). No cloud, no internet needed for this step. On the Pi 5, transcription takes 1-2 seconds for a 4-second clip. Output: a plain text string — "find the switch board".

Step 3 — Ring Buffer Camera

While the user was speaking, the camera wasn't idle. A background thread continuously fills a 900-frame ring buffer at 30 FPS using Picamera2. This is the key trick that solves the API latency problem — explained in full below.

Step 4 — Gemini Object Detection

The text query and one camera frame are sent to Gemini 2.5 Flash Lite via the REST API. Gemini returns a bounding box in a 0-1000 coordinate space identifying where the object is in the frame. This is the only API call in the entire session. Once the object is found, Gemini is done.
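A rough sketch of what the request for this step could look like. The endpoint path and field names follow the public Gemini REST API as we understand it; the prompt wording and the helper names `build_detection_request` and `parse_box` are illustrative assumptions, not taken from the NavCap repository.

```python
import base64
import json

# Assumed endpoint for the model named above; check the current Gemini docs.
GEMINI_URL = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    "gemini-2.5-flash-lite:generateContent"
)


def build_detection_request(query, jpeg_bytes):
    """Build the JSON body: one JPEG frame plus a text prompt (hypothetical wording)."""
    prompt = (
        f"Locate the object described by: {query!r}. "
        'Reply with JSON of the form {"box_2d": [ymin, xmin, ymax, xmax]} '
        "using coordinates normalized to 0-1000."
    )
    return {
        "contents": [{
            "parts": [
                {"inline_data": {
                    "mime_type": "image/jpeg",
                    "data": base64.b64encode(jpeg_bytes).decode("ascii"),
                }},
                {"text": prompt},
            ]
        }],
        # Ask for a JSON reply so the box can be parsed without text cleanup.
        "generationConfig": {"responseMimeType": "application/json"},
    }


def parse_box(response_json):
    """Pull the box out of the first candidate's text part."""
    text = response_json["candidates"][0]["content"]["parts"][0]["text"]
    return json.loads(text)["box_2d"]


# Sending it (not executed here) would look roughly like:
#   requests.post(GEMINI_URL, json=payload, timeout=10,
#                 headers={"x-goog-api-key": os.environ["GOOGLE_API_KEY"]})
```

Keeping the payload builder separate from the network call makes it easy to test the request shape without spending API quota.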

Step 5 — Sprint Catch-Up

By the time Gemini responds, the live camera has moved 30-90 frames ahead of the anchor frame. A KCF tracker is initialized on the anchor frame with Gemini's bounding box, then fast-forwarded through buffered frames at 5x speed until it catches up to live. No object is lost during the API wait.

Step 6 — Live Tracking

Live CV tracking demo

The tracker now runs on every new frame from the camera. tracker.update(live_frame) returns an updated bounding box each cycle. No more API calls. KCF (Kernelized Correlation Filters) runs at 30+ FPS on the Pi 5 without a GPU.

Step 7 — Zone Engine

The object's horizontal center pixel is mapped to one of 5 direction zones: BACK-LEFT | FRONT-LEFT | FRONT-CENTER | FRONT-RIGHT | BACK-RIGHT. The center zone is wider — being centered matters more than being slightly off-edge. Back zones are narrow — if the object reaches the frame edge, the user has overshot and needs to turn back.
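The zone mapping can be sketched as a simple threshold lookup. The boundary fractions below are assumptions chosen only to match the description (wide center, narrow back zones); the write-up does not give the real values, and `zone_for_bbox` is a hypothetical name.

```python
# Upper edge of each zone as a fraction of frame width (assumed values).
ZONE_EDGES = [
    (0.10, "BACK-LEFT"),
    (0.35, "FRONT-LEFT"),
    (0.65, "FRONT-CENTER"),
    (0.90, "FRONT-RIGHT"),
    (1.00, "BACK-RIGHT"),
]


def zone_for_bbox(bbox, frame_width):
    """Map a tracker box (x, y, w, h) in pixels to one of the 5 direction zones."""
    x, _, w, _ = bbox
    center = (x + w / 2) / frame_width
    center = min(max(center, 0.0), 1.0)  # clamp boxes that spill past the frame
    for upper, name in ZONE_EDGES:
        if center <= upper:
            return name
    return ZONE_EDGES[-1][1]
```

For a 1280-pixel-wide frame, a box centered at x = 640 lands in FRONT-CENTER, while one hugging either edge lands in a back zone.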

Step 8 — Motor Pulse

Trigger button on the brim

A dedicated background thread runs haptic output independently of the main loop. It pulses the GPIO pin for the active zone at 100 ms ON, 400 ms OFF: a clear 2 Hz rhythm the user can feel and interpret without thinking about it. When the zone changes, the pulse moves to the matching motor. Motor timing is never disrupted by frame processing or tracking latency.

If the tracker loses the object — occlusion, fast movement, leaving the frame — all motors go silent and the system returns to idle, ready for the next button press.

The Ring Buffer Trick — Solving API Latency

This is the most interesting engineering decision in the whole build, and the one most worth stealing for your own projects.

The Problem

Gemini is a cloud API. Even on a fast connection, a round-trip takes 1-3 seconds. During that time, the camera keeps running and the user keeps moving. By the time Gemini returns a bounding box, the frame it analyzed is ancient history — the object has moved, the tracker has nothing to initialize on.

The naive fix is to freeze the camera while waiting for Gemini. But then if the user moves during the API call, the bounding box points to where the object was, not where it is.

The Solution

Instead of freezing, the camera runs continuously and writes every frame into a fixed-size ring buffer — 900 frames, ~30 seconds of rolling history at 30 FPS.

When the user presses the button:

The anchor frame — the exact frame at button-press — is saved with its buffer index
The voice is recorded and transcribed
That anchor frame is sent to Gemini
While Gemini processes, the camera keeps filling the buffer
Gemini returns a bounding box for the anchor frame — a frame that still exists in the buffer
The KCF tracker is initialized on the anchor frame and fast-forwarded to catch up to live

The API latency disappears from the user's perspective. By the time the first motor pulse fires, the tracker is locked onto the live feed.
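The buffer and anchor steps above can be sketched with a small thread-safe class. This is a minimal illustration, not the project's code: `FrameRingBuffer` and its method names are assumptions, and frames are tracked by a running sequence number so an anchor stays addressable until it is overwritten.

```python
import threading


class FrameRingBuffer:
    """Fixed-size circular frame store addressed by sequence number."""

    def __init__(self, capacity=900):
        self.capacity = capacity
        self._frames = [None] * capacity
        self._count = 0  # total frames ever written
        self._lock = threading.Lock()

    def push(self, frame):
        """Called by the camera thread for every captured frame."""
        with self._lock:
            self._frames[self._count % self.capacity] = frame
            self._count += 1

    def latest_seq(self):
        with self._lock:
            return self._count - 1

    def mark_anchor(self):
        """Call at button press: returns the sequence number of the newest frame."""
        with self._lock:
            if self._count == 0:
                raise RuntimeError("no frames captured yet")
            return self._count - 1

    def get(self, seq):
        """Fetch a buffered frame; fails if it was never written or already overwritten."""
        with self._lock:
            if seq < 0 or seq >= self._count or seq < self._count - self.capacity:
                raise IndexError(f"frame {seq} is not in the buffer")
            return self._frames[seq % self.capacity]


# Camera thread (not executed here), assuming a started Picamera2 instance:
#   while running:
#       buf.push(picam2.capture_array())
```

With 900 slots at 30 FPS, an anchor stays retrievable for about 30 seconds, comfortably longer than recording, transcription, and the API call combined.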

Why KCF for the Sprint

KCF was chosen specifically because it is fast enough to sprint. During catch-up it steps through every 5th buffered frame, trading a little localization precision for the speed needed to close a 30-90 frame gap in well under a second. Once live, it updates every frame at full 30 FPS.

CSRT is more accurate but runs at ~8 FPS on Pi 5 — too slow to sprint through a buffer in time. KCF is the right tradeoff for constrained hardware.
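A minimal sketch of the catch-up loop, assuming a tracker object with OpenCV's `init(frame, bbox)` / `update(frame) -> (ok, bbox)` shape. The function name `sprint_catch_up` and the `get_frame` / `latest_seq` callables are hypothetical stand-ins for whatever the ring buffer exposes.

```python
def sprint_catch_up(tracker, get_frame, latest_seq, anchor_seq, bbox, stride=5):
    """Initialize on the anchor frame, then skip ahead `stride` frames at a time.

    tracker    -- e.g. cv2.TrackerKCF_create() where the OpenCV build provides it
    get_frame  -- callable(seq) returning a buffered frame
    latest_seq -- callable() returning the newest sequence number (keeps growing)
    Returns (ok, bbox, seq): seq is the last frame the tracker saw.
    """
    tracker.init(get_frame(anchor_seq), bbox)
    seq = anchor_seq
    while True:
        live = latest_seq()  # re-read each pass: the camera is still writing
        if seq >= live:
            return True, bbox, seq
        seq = min(seq + stride, live)  # never step past the live frame
        ok, bbox = tracker.update(get_frame(seq))
        if not ok:
            return False, None, seq
```

A 90-frame gap with `stride=5` costs 18 tracker updates, which is why a fast tracker can close it quickly.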

This Pattern is Reusable

Ring buffer + anchor frame + sprint catch-up applies anywhere you have:

A slow external inference call (any cloud API, LLM, heavy local model)
A continuous data stream you don't want to freeze
A downstream consumer that needs to start on the exact frame the API saw

If you are building anything that mixes real-time streaming with cloud AI, this pattern is worth knowing.

Hardware Build

Top-down diagram: components, motor positions, wiring, camera FOV

Everything in NavCap is mounted on or inside a standard black cowboy hat. The wide brim is a natural camera mount. The crown holds the Pi and driver board. The motors are sewn around the crown band.

Components

Raspberry Pi 5: main compute for vision, voice, AI, and motor control
Pi Camera Module 3 (wide-angle): object detection and live tracking input
USB Microphone: voice query capture
ULN2803AG Darlington array: drives all 6 motors from 3.3V GPIO signals
6x Coin vibration motors (5V): directional haptic feedback on the hat crown
Micro-switch button: triggers voice query on press
20,000 mAh USB-C power bank: powers the Pi for extended use

Motor Placement

Motors and wiring sewn around the brim

Six motors are sewn around the inside of the crown band in a compass layout. The 5-zone guidance logic uses FL, FC (front-center), FR, BL, BR. The sixth motor, B (back), is wired and reserved for future use: arrived detection and reacquisition signals.

Motor Driver PCB

ULN2803AG PCB with GPIO ribbon and motor wires

GPIO pins on the Pi 5 can only source ~8mA. A coin vibration motor draws 60-100mA. The ULN2803AG is an 8-channel Darlington transistor array — each input takes a 3.3V GPIO signal and switches up to 500mA from the 5V rail. One IC drives all six motors with no additional transistors.

Camera Mount

Camera on 3D-printed bracket with ribbon cable

The Pi Camera Module 3 is mounted on a 3D-printed bracket at the front of the hat crown, angled slightly downward to capture the space directly ahead of the user.

Trigger Button

A micro-switch is mounted on the hat brim. Pressing it starts the voice recording cycle. Wired to GPIO 12 with a pull-up resistor — active LOW, 50ms software debounce.
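One way to implement a 50 ms software debounce for an active-LOW input is sketched below. `DebouncedButton` is a hypothetical helper, written so the raw level and timestamp are passed in; on the hat the level would come from a GPIO read and the time from `time.monotonic()`.

```python
class DebouncedButton:
    """Active-LOW button: raw level 0 means pressed, 1 means released (pull-up)."""

    def __init__(self, debounce_s=0.05):
        self.debounce_s = debounce_s
        self._stable = 1        # last accepted level
        self._last_raw = 1      # last raw reading
        self._last_change = 0.0 # time the raw reading last flipped

    def update(self, raw_level, now):
        """Feed one reading; returns True exactly once per debounced press."""
        if raw_level != self._last_raw:
            # Reading flipped: restart the settle timer and wait.
            self._last_raw = raw_level
            self._last_change = now
            return False
        if raw_level != self._stable and now - self._last_change >= self.debounce_s:
            # Reading has held steady long enough to accept.
            self._stable = raw_level
            return raw_level == 0
        return False
```

Contact bounce shorter than 50 ms keeps resetting the timer, so only a held press registers.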

GPIO Pin Map

FRONT-CENTER → GPIO 23
FRONT-LEFT → GPIO 27
FRONT-RIGHT → GPIO 24
BACK-LEFT → GPIO 22
BACK-RIGHT → GPIO 25
B (spare) → GPIO 18
BUTTON → GPIO 12

Pi 5 note: The Pi 5 uses gpiochip4, not the legacy GPIO chip. RPi.GPIO does not work on Pi 5. Use lgpio. This trips up almost everyone building with Pi 5 for the first time.
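A short sketch of claiming these pins with lgpio. The pin numbers come from the pin map; the helper names `open_gpio` and `close_gpio` are assumptions, and the chip number is taken from the note above and may differ on other OS releases. The lgpio import is deferred so the pin table can be loaded on a machine without the library.

```python
MOTOR_PINS = {
    "FRONT-CENTER": 23,
    "FRONT-LEFT": 27,
    "FRONT-RIGHT": 24,
    "BACK-LEFT": 22,
    "BACK-RIGHT": 25,
    "B": 18,  # spare motor
}
BUTTON_PIN = 12
GPIO_CHIP = 4  # gpiochip4, per the note above (assumed; verify on your OS image)


def open_gpio():
    """Claim motor pins as outputs (initially LOW) and the button as a pulled-up input."""
    import lgpio  # deferred: only available on the Pi

    handle = lgpio.gpiochip_open(GPIO_CHIP)
    for pin in MOTOR_PINS.values():
        lgpio.gpio_claim_output(handle, pin, 0)
    lgpio.gpio_claim_input(handle, BUTTON_PIN, lgpio.SET_PULL_UP)
    return handle


def close_gpio(handle):
    """Silence every motor, then release the chip."""
    import lgpio

    for pin in MOTOR_PINS.values():
        lgpio.gpio_write(handle, pin, 0)
    lgpio.gpiochip_close(handle)
```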

Software & Code

All logic lives in a single file: CV-Integrated/cv_integrated.py — 457 lines from startup to shutdown.

1. Voice Capture + Speech-to-Text

Records 4 seconds from the USB mic, transcribes locally with Faster-Whisper, returns plain text. The tiny Whisper model is deliberately chosen for speed — transcription completes in 1-2 seconds for a 4-second clip, entirely on CPU with no internet.
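A hedged sketch of this step using sounddevice and Faster-Whisper. The 16 kHz mono int16 format and the function names are assumptions; the model size, device, and compute type follow the description above. Hardware-dependent imports are deferred so the WAV helper can be exercised anywhere.

```python
import wave

SAMPLE_RATE = 16000  # assumed: 16 kHz mono
RECORD_SECONDS = 4


def save_wav(path, pcm_bytes, sample_rate=SAMPLE_RATE):
    """Write raw 16-bit mono PCM to a WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes = int16 samples
        wf.setframerate(sample_rate)
        wf.writeframes(pcm_bytes)


def record_query(path):
    """Record RECORD_SECONDS from the default input device into `path`."""
    import sounddevice as sd  # deferred: needs PortAudio and a microphone

    audio = sd.rec(RECORD_SECONDS * SAMPLE_RATE, samplerate=SAMPLE_RATE,
                   channels=1, dtype="int16")
    sd.wait()  # block until the recording finishes
    save_wav(path, audio.tobytes())


def transcribe(path, model=None):
    """Return the transcript as lowercase plain text."""
    from faster_whisper import WhisperModel  # deferred: heavy import

    # In a real loop, build the model once at startup and pass it in.
    model = model or WhisperModel("tiny", device="cpu", compute_type="float32")
    segments, _info = model.transcribe(path)
    return " ".join(seg.text.strip() for seg in segments).lower()
```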

2. Gemini Object Detection

The only API call in the whole system. Sends one frame + query, gets a bounding box back.

Important: Gemini returns coordinates in a 0-1000 space, not 0.0-1.0. If you skip dividing by 1000 before scaling to pixel coordinates, your bounding boxes will be wildly off-screen.
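The conversion is small enough to show in full. This sketch assumes the box arrives as `[ymin, xmin, ymax, xmax]`, the order described in Gemini's documentation (worth verifying against the response you actually get), and returns the `(x, y, w, h)` tuple OpenCV trackers expect. The function name is hypothetical.

```python
def gemini_box_to_pixels(box_2d, frame_width, frame_height):
    """Convert a 0-1000 normalized [ymin, xmin, ymax, xmax] box to pixel (x, y, w, h)."""
    ymin, xmin, ymax, xmax = box_2d
    # Integer math: scale by the frame size, then divide by 1000.
    x = xmin * frame_width // 1000
    y = ymin * frame_height // 1000
    w = (xmax - xmin) * frame_width // 1000
    h = (ymax - ymin) * frame_height // 1000
    return x, y, w, h
```

For example, on a 1280x720 frame the box `[250, 500, 750, 1000]` covers the right half of the image between one quarter and three quarters of its height.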

3. Ring Buffer Camera Thread

Runs as a daemon thread from startup. Continuously captures frames into a 900-frame circular buffer. At button press, the current write_ptr is saved as the anchor frame sent to Gemini. All frames after that are the sprint catch-up path back to live.

4. Motor Pulse Thread

A dedicated daemon thread fires GPIO independently of the main loop — guaranteeing a consistent 2 Hz pulse rhythm regardless of frame processing speed. The main loop calls set_motor(zone) — the only interface between tracking logic and haptic output.
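A minimal sketch of such a thread. The GPIO write is injected as a callable so the class runs without hardware; on the hat it could be something like `lambda pin, level: lgpio.gpio_write(handle, pin, level)`. Only `set_motor(zone)` is named in the write-up; the class name and everything else here are assumptions.

```python
import threading


class MotorPulser:
    """Pulses the motor for the active zone: on_s ON, off_s OFF, repeating."""

    def __init__(self, write_pin, pins, on_s=0.1, off_s=0.4):
        self._write = write_pin  # callable(pin, level)
        self._pins = pins        # zone name -> GPIO number
        self._on_s, self._off_s = on_s, off_s
        self._zone = None        # None means all motors silent
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def set_motor(self, zone):
        """The only call the tracking loop makes; pass None to go silent."""
        with self._lock:
            self._zone = zone

    def stop(self):
        self._stop.set()
        self._thread.join()
        for pin in self._pins.values():  # leave every motor off
            self._write(pin, 0)

    def _run(self):
        while not self._stop.is_set():
            with self._lock:
                zone = self._zone
            if zone is None:
                self._stop.wait(0.01)  # idle poll
                continue
            pin = self._pins[zone]
            self._write(pin, 1)
            self._stop.wait(self._on_s)   # interruptible sleep
            self._write(pin, 0)
            self._stop.wait(self._off_s)
```

With the defaults, one cycle is 0.1 s + 0.4 s = 0.5 s, which gives the 2 Hz rhythm. Using `Event.wait` rather than `time.sleep` lets shutdown interrupt a pulse immediately.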

State Machine

SELECTING (idle, waiting for button press)
→ button press → voice recorded → Gemini returns bbox → sprint catch-up
TRACKING (KCF locked on, motors pulsing)
→ tracker lost or press r → back to SELECTING
→ press q → EXITED (GPIO cleanup, process ends)
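The transitions can be written as a small pure function. The state names come from the description above; the event names (`locked`, `lost`, `reset`, `quit`) are illustrative assumptions.

```python
from enum import Enum


class State(Enum):
    SELECTING = "selecting"
    TRACKING = "tracking"
    EXITED = "exited"


def next_state(state, event):
    """Return the state that follows `event`; unknown events leave the state unchanged."""
    if state is State.EXITED:
        return State.EXITED          # terminal
    if event == "quit":              # q key
        return State.EXITED
    if state is State.SELECTING and event == "locked":
        return State.TRACKING        # sprint catch-up finished
    if state is State.TRACKING and event in ("lost", "reset"):
        return State.SELECTING       # tracker lost, or r key
    return state
```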

Setup & Installation

Requirements

Raspberry Pi 5 (4GB or 8GB)
Pi Camera Module 3 connected via CSI ribbon cable
USB microphone
ULN2803AG wired to GPIO pins per the pin map above
WiFi with internet access (for Gemini API)
Google Gemini API key — free tier is sufficient for testing

Install Dependencies

sudo apt update
sudo apt install python3-pip python3-opencv portaudio19-dev libcamera-dev -y
pip3 install picamera2 faster-whisper lgpio numpy sounddevice scipy requests opencv-python

Set API Key

export GOOGLE_API_KEY="your-api-key-here"

Add this to ~/.bashrc to make it permanent.

Run

python3 CV-Integrated/cv_integrated.py

On startup: Whisper model loads (~5 seconds on first run), camera thread starts and fills the buffer, then "Press button to speak & search" appears on screen. Press the button, speak your query, wait 2-3 seconds, then follow the vibration.

Controls

Press button: start voice query and detection
r key: reset to idle, keep camera running
q key: clean shutdown (GPIO cleanup, camera close)

Challenges & Lessons Learned

Pi 5 GPIO is a Breaking Change

The Pi 5 uses a completely new GPIO chip (gpiochip4). The RPi.GPIO library — used in virtually every Pi GPIO tutorial — does not work on Pi 5. You must use lgpio. This is not well documented and caused several hours of debugging early in the build.

KCF is Fast Enough — CSRT is Not

CSRT is more accurate but runs at ~8 FPS on Pi 5 without a GPU — too slow for real-time guidance and completely unable to sprint through a 90-frame buffer in time. KCF runs at 30+ FPS and sprint-catches in under a second. For constrained hardware, KCF is the right choice.

Gemini's Coordinate Space is 0-1000, Not 0.0-1.0

Gemini returns bounding boxes in a 0-1000 integer space per axis. The API documentation mentions this but it's easy to miss. If you don't divide by 1000 before scaling to pixel coordinates, your bounding boxes will be wildly off-screen.

The Ring Buffer Solves a Problem You Don't Know You Have

In early testing without the buffer, detection felt broken — Gemini would return a valid bounding box but the tracker would immediately fail. The reason: by the time Gemini responded, the scene had changed and the bounding box no longer matched the live frame. The ring buffer fixes this completely.

A Dedicated Motor Thread is Non-Negotiable

In early builds, motor pulses were fired inline in the main tracking loop. Any slowdown in frame processing caused the haptic feedback to stutter or freeze. A stutter in guidance is worse than silence — it's misleading. Moving motor control to a dedicated thread completely solved this.

Single Gemini Call Per Session is Intentional

The original plan called for re-querying Gemini every few seconds. In practice this was unnecessary — KCF handles gradual appearance changes well, and users re-trigger with a button press when they want a new object. One API call per query keeps latency low and cost zero.

What's Next

Arrived detection — fire all six motors simultaneously when the object center reaches the middle zone for N consecutive frames. Currently the system guides but never confirms arrival.
Reacquisition on loss — when the tracker loses the object, automatically re-query Gemini using the last known bounding box region as a crop hint, rather than returning to idle.
Offline fallback — replace the Gemini call with a local model (YOLOv8-nano or MobileNetSSD) for environments without internet. Accuracy drops but the core system remains functional.
IMU head-pose correction — a BMX160 IMU was explored in the Arduino branch. If the user turns their head, the zone boundaries should rotate to compensate. Not implemented in this build.
Custom enclosure — the current build is intentionally rough: off-the-shelf hat, exposed wiring, visible Pi. A purpose-made hat with a custom PCB and flex cable routing would be indistinguishable from a normal hat. The software is already production-ready.

Team & Context

NavCap was built in under 24 hours at the TinkerHub Physical AI Hackathon 2026, Kochi, April 18-19, Accessibility Track.

GitHub: https://github.com/akash2000e/NavCap
