Built in under 24 hours at the TinkerHub Physical AI Hackathon 2026, Kochi — Accessibility Track.
GitHub: https://github.com/akash2000e/NavCap
The Problem
A white cane is one of humanity's oldest assistive tools. It tells you what's on the ground: a curb, a step, a puddle. It tells you nothing about what's in the room.
Blind individuals navigating an unfamiliar indoor space have no affordable, wearable way to find a specific object. There's no device that lets a person say "find the switch board" and get guided toward it — hands-free, without pulling out a phone, without asking someone nearby.
Screen-based apps don't work if you can't see. GPS turn-by-turn only works outdoors. Smart glasses exist, but they cost thousands and speak back to you — adding noise to an already demanding sensory environment.
We wanted to build something different. Something that communicates through touch instead of sound. Something worn, not held. Something a person could use in a crowded room without anyone else noticing.
NavCap is a black cowboy hat with one camera, six vibration motors, and a Raspberry Pi inside — built in under 24 hours.
The Solution
Speak a query. Feel your way there.
NavCap is a wearable navigation assistant built into a cowboy hat. The user presses a button on the brim, speaks the name of an object, and the hat finds it — then guides them toward it through vibration alone.
No screen. No audio instructions. No hands required after the button press.
"Find the switch board." The front-left motor pulses gently. Walk left. The pulses continue. You arrive.
The entire system runs on a Raspberry Pi 5 mounted inside the crown. One Pi Camera Module 3 watches the scene. Six coin vibration motors sewn around the crown band tell you where to go. One Gemini API call identifies the object. Everything after that (tracking, guiding, pulsing) runs fully on-device.
This build uses off-the-shelf hardware intentionally — to prove the concept fast with parts anyone can source. The same system can be rebuilt with a custom PCB and purpose-made enclosure, making it completely indistinguishable from a normal hat. The tech is already there. The form factor is just waiting to catch up.
How It Works
The pipeline has 8 steps. From the end of the spoken query to the first motor pulse: roughly 2-3 seconds. After that, guidance is continuous and real-time.
Step 1: Voice Capture
The user presses the button on the hat brim and speaks a query. A USB microphone records 4 seconds of audio on-device using sounddevice, saved as a temporary WAV file.
Step 2: Speech-to-Text
The WAV file is transcribed locally using Faster-Whisper (tiny model, CPU, float32). No cloud, no internet needed for this step. On the Pi 5, transcription takes 1-2 seconds for a 4-second clip. Output: a plain text string such as "find the switch board".
Step 3: Ring Buffer Camera
While the user was speaking, the camera wasn't idle. A background thread continuously fills a 900-frame ring buffer at 30 FPS using Picamera2. This is the key trick that solves the API latency problem, explained in full below.
Step 4: Gemini Object Detection
The text query and one camera frame are sent to Gemini 2.5 Flash Lite via the REST API. Gemini returns a bounding box in a 0-1000 coordinate space identifying where the object is in the frame. This is the only API call in the entire session. Once the object is found, Gemini is done.
Step 5: Sprint Catch-Up
By the time Gemini responds, the live camera has moved 30-90 frames ahead of the anchor frame. A KCF tracker is initialized on the anchor frame with Gemini's bounding box, then fast-forwarded through buffered frames at 5x speed until it catches up to live. No object is lost during the API wait.
Step 6: Live Tracking
The tracker now runs on every new frame from the camera. tracker.update(live_frame) returns an updated bounding box each cycle. No more API calls. KCF (Kernelized Correlation Filters) runs at 30+ FPS on the Pi 5 without a GPU.
Step 7: Zone Engine
The object's horizontal center pixel is mapped to one of 5 direction zones: BACK-LEFT | FRONT-LEFT | FRONT-CENTER | FRONT-RIGHT | BACK-RIGHT. The center zone is wider, because being centered matters more than being slightly off-edge. Back zones are narrow: if the object reaches the frame edge, the user has overshot and needs to turn back.
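The zone mapping can be sketched as a lookup over fractions of the frame width. This is a minimal illustration, not the project's code: the boundary values (10% / 25% / 30% / 25% / 10%) and the function name zone_for_bbox are assumptions chosen only to match the description of a wide center zone and narrow back zones.

```python
# Upper edge of each zone as a fraction of frame width, left to right.
# These exact numbers are assumptions for illustration; the write-up only
# says the center zone is wide and the back zones are narrow.
ZONE_EDGES = [
    (0.10, "BACK-LEFT"),     # leftmost 10% of the frame
    (0.35, "FRONT-LEFT"),    # next 25%
    (0.65, "FRONT-CENTER"),  # middle 30%, the widest zone
    (0.90, "FRONT-RIGHT"),   # next 25%
    (1.00, "BACK-RIGHT"),    # rightmost 10%
]


def zone_for_bbox(bbox, frame_width):
    """Map a tracker box (x, y, w, h) in pixels to a direction zone."""
    x, _y, w, _h = bbox
    center_x = x + w / 2.0
    # Clamp so a box partly outside the frame still lands in an edge zone.
    frac = min(max(center_x / frame_width, 0.0), 1.0)
    for upper_edge, name in ZONE_EDGES:
        if frac <= upper_edge:
            return name
    return ZONE_EDGES[-1][1]
```

Only the horizontal center is used, so the box height and vertical position do not affect the chosen zone.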
Step 8: Motor Pulse
A dedicated background thread runs haptic output independently of the main loop. When the active zone changes, the correct GPIO pin fires: 100ms ON, 400ms OFF, a clear 2 Hz pulse the user can feel and interpret without thinking about it. Motor timing is never disrupted by frame processing or tracking latency.
If the tracker loses the object — occlusion, fast movement, leaving the frame — all motors go silent and the system returns to idle, ready for the next button press.
The Ring Buffer Trick: Solving API Latency
This is the most interesting engineering decision in the whole build, and the one most worth stealing for your own projects.
The Problem
Gemini is a cloud API. Even on a fast connection, a round-trip takes 1-3 seconds. During that time, the camera keeps running and the user keeps moving. By the time Gemini returns a bounding box, the frame it analyzed is ancient history: the object has moved, and the tracker has nothing to initialize on.
The naive fix is to freeze the camera while waiting for Gemini. But then if the user moves during the API call, the bounding box points to where the object was, not where it is.
The Solution
Instead of freezing, the camera runs continuously and writes every frame into a fixed-size ring buffer: 900 frames, ~30 seconds of rolling history at 30 FPS.
When the user presses the button:
- The anchor frame — the exact frame at button-press — is saved with its buffer index
- The voice is recorded and transcribed
- That anchor frame is sent to Gemini
- While Gemini processes, the camera keeps filling the buffer
- Gemini returns a bounding box for the anchor frame — a frame that still exists in the buffer
- The KCF tracker is initialized on the anchor frame and fast-forwarded to catch up to live
The API latency disappears from the user's perspective. By the time the first motor pulse fires, the tracker is locked onto the live feed.
Why KCF for the Sprint
KCF was chosen specifically because it is fast enough to sprint. During catch-up it steps through every 5th buffered frame, trading a little localization precision for the speed needed to close a 60-90 frame gap in under a second. Once live, it updates every frame at full 30 FPS.
CSRT is more accurate but runs at ~8 FPS on Pi 5 — too slow to sprint through a buffer in time. KCF is the right tradeoff for constrained hardware.
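A minimal sketch of the sprint, assuming the tracker exposes the update(frame) -> (ok, bbox) interface that OpenCV trackers use. The function name sprint_catch_up and its parameters are hypothetical, and it assumes the tracker was already initialized on the anchor frame with the box Gemini returned.

```python
def sprint_catch_up(tracker, buffered_frames, anchor_bbox, step=5):
    """Fast-forward a tracker through buffered frames to reach live.

    tracker: any object with update(frame) -> (ok, bbox), e.g. an OpenCV
             KCF tracker already initialized via tracker.init(frame, bbox).
    buffered_frames: frames captured after the anchor frame, oldest first.
    Returns the newest bbox, or None if the tracker lost the object.
    """
    if not buffered_frames:
        return anchor_bbox  # already live, nothing to replay
    bbox = anchor_bbox
    last_index = len(buffered_frames) - 1
    # Visit every step-th frame, and always finish on the newest frame
    # so the tracker state matches the live feed.
    indices = list(range(step - 1, last_index, step)) + [last_index]
    for i in indices:
        ok, bbox = tracker.update(buffered_frames[i])
        if not ok:
            return None
    return bbox
```

With step=5, a 90-frame gap needs about 18 tracker updates. Skipping frames means the object moves further between updates, which is the precision tradeoff described above.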
This Pattern is Reusable
Ring buffer + anchor frame + sprint catch-up applies anywhere you have:
- A slow external inference call (any cloud API, LLM, heavy local model)
- A continuous data stream you don't want to freeze
- A downstream consumer that needs to start on the exact frame the API saw
If you are building anything that mixes real-time streaming with cloud AI, this pattern is worth knowing.
Hardware Build
Everything in NavCap is mounted on or inside a standard black cowboy hat. The wide brim is a natural camera mount. The crown holds the Pi and driver board. The motors are sewn around the crown band.
Components
- Raspberry Pi 5: main compute (vision, voice, AI, motor control)
- Pi Camera Module 3 (wide-angle): object detection and live tracking input
- USB microphone: voice query capture
- ULN2803AG Darlington array: drives all 6 motors from 3.3V GPIO signals
- 6x coin vibration motors (5V): directional haptic feedback on the hat crown
- Micro-switch button: triggers voice query on press
- 20,000 mAh USB-C power bank: powers the Pi for extended use
Motor Placement
Six motors are sewn around the inside of the crown band in a compass layout. The 5-zone guidance logic uses FL, FC (front-center), FR, BL, and BR. The sixth motor, B, is wired as a spare and reserved for future use: arrived detection and reacquisition signals.
Motor Driver PCB
GPIO pins on the Pi 5 can only source ~8mA. A coin vibration motor draws 60-100mA. The ULN2803AG is an 8-channel Darlington transistor array: each input takes a 3.3V GPIO signal and switches up to 500mA from the 5V rail. One IC drives all six motors with no additional transistors.
Camera Mount
The Pi Camera Module 3 is mounted on a 3D-printed bracket at the front of the hat crown, angled slightly downward to capture the space directly ahead of the user.
Trigger Button
A micro-switch is mounted on the hat brim. Pressing it starts the voice recording cycle. It is wired to GPIO 12 with a pull-up resistor: active LOW, with a 50ms software debounce.
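One way to implement a 50ms software debounce for an active-LOW button is to accept a level change only after it has stayed stable for the debounce window. The class below is an illustrative sketch (DebouncedButton and its method names are hypothetical); it takes raw readings and timestamps as arguments so the logic can be exercised without hardware.

```python
class DebouncedButton:
    """Software debounce for an active-LOW button (pressed reads 0)."""

    def __init__(self, debounce_s=0.050):
        self.debounce_s = debounce_s
        self._stable = 1            # pull-up: the idle level is HIGH
        self._candidate = 1         # most recent raw level seen
        self._candidate_since = 0.0

    def update(self, level, now):
        """Feed one raw reading; return True once per confirmed press."""
        if level != self._candidate:
            # Raw level changed (possibly a bounce): restart the timer.
            self._candidate = level
            self._candidate_since = now
            return False
        held_long_enough = now - self._candidate_since >= self.debounce_s
        if level != self._stable and held_long_enough:
            self._stable = level
            return level == 0       # confirmed falling edge is a press
        return False
```

On the Pi, level would come from something like lgpio.gpio_read(handle, 12) and now from time.monotonic(), polled every few milliseconds.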
GPIO Pin Map
- FRONT-CENTER → GPIO 23
- FRONT-LEFT → GPIO 27
- FRONT-RIGHT → GPIO 24
- BACK-LEFT → GPIO 22
- BACK-RIGHT → GPIO 25
- B (spare) → GPIO 18
- BUTTON → GPIO 12
Pi 5 note: The Pi 5 uses gpiochip4, not the legacy GPIO chip. RPi.GPIO does not work on Pi 5. Use lgpio. This trips up almost everyone building with Pi 5 for the first time.
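A hedged sketch of claiming these pins with lgpio, using the pin map above. The helper names open_gpio and close_gpio are hypothetical, and enabling the internal pull-up on the button is an assumption (the build may rely on an external resistor). The chip number 4 follows the note above; it may differ on your OS image.

```python
MOTOR_PINS = {
    "FRONT-CENTER": 23,
    "FRONT-LEFT": 27,
    "FRONT-RIGHT": 24,
    "BACK-LEFT": 22,
    "BACK-RIGHT": 25,
    "B": 18,  # spare motor
}
BUTTON_PIN = 12


def open_gpio(chip=4):
    """Claim motor pins as outputs (LOW) and the button as an input."""
    import lgpio  # imported lazily so the pin map can be used off-device

    handle = lgpio.gpiochip_open(chip)
    for pin in MOTOR_PINS.values():
        lgpio.gpio_claim_output(handle, pin, 0)
    # Assumption: use the internal pull-up; the button reads 0 when pressed.
    lgpio.gpio_claim_input(handle, BUTTON_PIN, lgpio.SET_PULL_UP)
    return handle


def close_gpio(handle):
    """Drive every motor LOW and release the chip."""
    import lgpio

    for pin in MOTOR_PINS.values():
        lgpio.gpio_write(handle, pin, 0)
    lgpio.gpiochip_close(handle)
```

Driving all outputs LOW before closing avoids leaving a motor running if the program exits mid-pulse.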
Software & Code
All logic lives in a single file: CV-Integrated/cv_integrated.py, 457 lines from startup to shutdown.
1. Voice Capture + Speech-to-Text
Records 4 seconds from the USB mic, transcribes locally with Faster-Whisper, returns plain text. The tiny Whisper model is deliberately chosen for speed: transcription completes in 1-2 seconds for a 4-second clip, entirely on CPU with no internet.
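A sketch of how this step might look with sounddevice and Faster-Whisper. It is not the project's code: the function names, the 16 kHz sample rate, the language="en" hint, and the lowercasing of the result are all assumptions.

```python
import os
import tempfile

SAMPLE_RATE = 16000   # Hz; assumed value, not stated in the write-up
RECORD_SECONDS = 4    # recording length given in the write-up


def record_wav(seconds=RECORD_SECONDS, rate=SAMPLE_RATE):
    """Record mono audio from the default input; return a temp WAV path."""
    import sounddevice as sd
    from scipy.io import wavfile

    audio = sd.rec(int(seconds * rate), samplerate=rate,
                   channels=1, dtype="int16")
    sd.wait()  # block until the recording has finished
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    wavfile.write(path, rate, audio)
    return path


def load_model():
    """Load the tiny Whisper model on CPU, as described above."""
    from faster_whisper import WhisperModel

    return WhisperModel("tiny", device="cpu", compute_type="float32")


def transcribe(model, wav_path):
    """Join all transcribed segments into one lowercase query string."""
    segments, _info = model.transcribe(wav_path, language="en")
    return " ".join(seg.text.strip() for seg in segments).strip().lower()
```

Loading the model once at startup and reusing it for every query keeps the per-query cost to the transcription itself.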
2. Gemini Object Detection
The only API call in the whole system. Sends one frame + query, gets a bounding box back.
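A sketch of building the request body for the Gemini generateContent REST endpoint. The prompt wording and the helper name build_detection_request are hypothetical; the write-up does not show the prompt the project actually uses.

```python
import base64

# REST endpoint for the model named in the write-up.
GEMINI_URL = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    "gemini-2.5-flash-lite:generateContent"
)


def build_detection_request(query, jpeg_bytes):
    """Build a JSON payload holding one JPEG frame and a text prompt."""
    # Hypothetical prompt; tune the wording for your own use.
    prompt = (
        f"Locate the object described by: '{query}'. "
        'Reply with JSON only: {"box_2d": [ymin, xmin, ymax, xmax]} '
        "with each value normalized to 0-1000."
    )
    return {
        "contents": [{
            "parts": [
                {"inline_data": {
                    "mime_type": "image/jpeg",
                    "data": base64.b64encode(jpeg_bytes).decode("ascii"),
                }},
                {"text": prompt},
            ]
        }]
    }

# Sending it (needs network access and the requests package):
#   requests.post(GEMINI_URL, json=payload, timeout=10,
#                 headers={"x-goog-api-key": os.environ["GOOGLE_API_KEY"]})
```

A request timeout matters here: if the call hangs, the anchor frame eventually ages out of the ring buffer.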
Important: Gemini returns coordinates in a 0-1000 space, not 0.0-1.0. If you skip dividing by 1000 before scaling to pixel coordinates, your bounding boxes will be wildly off-screen.
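The conversion can be sketched as below, assuming the box arrives in Gemini's documented [ymin, xmin, ymax, xmax] order and that the tracker wants (x, y, w, h) in pixels, as OpenCV trackers do. The function name is hypothetical.

```python
def gemini_box_to_pixels(box_2d, frame_w, frame_h):
    """Convert [ymin, xmin, ymax, xmax] in 0-1000 space to (x, y, w, h)."""
    ymin, xmin, ymax, xmax = box_2d
    # Divide by 1000 first, then scale to the frame size in pixels.
    x1 = int(round(xmin / 1000 * frame_w))
    y1 = int(round(ymin / 1000 * frame_h))
    x2 = int(round(xmax / 1000 * frame_w))
    y2 = int(round(ymax / 1000 * frame_h))
    return (x1, y1, x2 - x1, y2 - y1)
```

For example, [250, 500, 750, 1000] on a 640x480 frame becomes (320, 120, 320, 240): the right half of the frame, vertically centered.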
3. Ring Buffer Camera Thread
Runs as a daemon thread from startup. Continuously captures frames into a 900-frame circular buffer. At button press, the current write_ptr is saved as the anchor frame sent to Gemini. All frames after that are the sprint catch-up path back to live.
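A minimal sketch of such a buffer, with the camera left out so it runs anywhere. The class and method names are hypothetical; only the 900-frame capacity and the write_ptr idea come from the description above.

```python
import threading


class FrameRingBuffer:
    """Fixed-size circular frame store with anchor-frame lookup."""

    def __init__(self, capacity=900):
        self.capacity = capacity
        self._frames = [None] * capacity
        self._write_ptr = 0           # total frames written so far
        self._lock = threading.Lock()

    def push(self, frame):
        """Store one frame, overwriting the oldest when full."""
        with self._lock:
            self._frames[self._write_ptr % self.capacity] = frame
            self._write_ptr += 1

    def mark_anchor(self):
        """Return (index, frame) of the newest frame, e.g. at button press."""
        with self._lock:
            if self._write_ptr == 0:
                raise RuntimeError("no frames captured yet")
            idx = self._write_ptr - 1
            return idx, self._frames[idx % self.capacity]

    def frames_since(self, anchor_idx):
        """Return the frames written after the anchor, oldest first."""
        with self._lock:
            oldest = max(self._write_ptr - self.capacity, 0)
            if anchor_idx < oldest:
                raise LookupError("anchor frame was overwritten")
            return [self._frames[i % self.capacity]
                    for i in range(anchor_idx + 1, self._write_ptr)]
```

On the device, a daemon thread would call push() with each frame from Picamera2 (for example the array returned by capture_array()), and the main loop would pass frames_since(anchor_idx) to the catch-up step.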
4. Motor Pulse Thread
A dedicated daemon thread fires GPIO independently of the main loop, guaranteeing a consistent 2 Hz pulse rhythm regardless of frame processing speed. The main loop calls set_motor(zone), the only interface between tracking logic and haptic output.
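A sketch of a pulse thread with the same set_motor(zone) interface. The class name MotorPulser and the injected write_pin callable are assumptions made so the example runs without GPIO hardware; on the Pi, write_pin would wrap lgpio.gpio_write.

```python
import threading


class MotorPulser:
    """Pulse the motor for the active zone on its own daemon thread."""

    def __init__(self, pins, write_pin, on_s=0.100, off_s=0.400):
        self._pins = pins              # zone name -> GPIO number
        self._write = write_pin        # callable(pin, level)
        self._on_s, self._off_s = on_s, off_s   # 0.1 + 0.4 s = 2 Hz
        self._zone = None
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def set_motor(self, zone):
        """Called by the tracking loop; zone=None silences all motors."""
        with self._lock:
            self._zone = zone

    def stop(self):
        """Stop the thread, then drive every motor pin LOW."""
        self._stop.set()
        self._thread.join()
        for pin in self._pins.values():
            self._write(pin, 0)

    def _run(self):
        while not self._stop.is_set():
            with self._lock:
                pin = self._pins.get(self._zone)
            if pin is None:
                self._stop.wait(0.01)   # idle: nothing to pulse
                continue
            self._write(pin, 1)
            self._stop.wait(self._on_s)
            self._write(pin, 0)
            self._stop.wait(self._off_s)
```

Waiting on the stop event instead of calling time.sleep lets stop() interrupt a pulse immediately. The zone is read once per cycle, so a zone change takes effect on the next pulse.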
State Machine
- SELECTING (idle, waiting for button press) → button press → voice recorded → Gemini returns bbox → sprint catch-up → TRACKING
- TRACKING (KCF locked on, motors pulsing) → tracker lost or press r → back to SELECTING
- Press q → EXITED (GPIO cleanup, process ends)
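These transitions can be sketched as a lookup table. The event names are hypothetical labels for the triggers in the state machine, and the sketch assumes q exits from either active state.

```python
from enum import Enum


class State(Enum):
    SELECTING = "selecting"
    TRACKING = "tracking"
    EXITED = "exited"


# Event names are hypothetical labels, not identifiers from the project.
TRANSITIONS = {
    (State.SELECTING, "target_locked"): State.TRACKING,  # after catch-up
    (State.TRACKING, "tracker_lost"): State.SELECTING,
    (State.TRACKING, "key_r"): State.SELECTING,
    (State.SELECTING, "key_q"): State.EXITED,  # assumed: q works in any state
    (State.TRACKING, "key_q"): State.EXITED,
}


def next_state(state, event):
    """Return the new state, or stay put if the event does not apply."""
    return TRANSITIONS.get((state, event), state)
```

Keeping the table separate from the main loop makes it easy to check that every path out of TRACKING also silences the motors.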
Setup & Installation
Requirements
- Raspberry Pi 5 (4GB or 8GB)
- Pi Camera Module 3 connected via CSI ribbon cable
- USB microphone
- ULN2803AG wired to GPIO pins per the pin map above
- WiFi with internet access (for Gemini API)
- Google Gemini API key — free tier is sufficient for testing
sudo apt update
sudo apt install python3-pip python3-opencv portaudio19-dev libcamera-dev -y
pip3 install picamera2 faster-whisper lgpio numpy sounddevice scipy requests opencv-python
Set API Key
export GOOGLE_API_KEY
Add this to ~/.bashrc to make it permanent.
Run
python3 CV-Integrated/cv_integrated.py
On startup: Whisper model loads (~5 seconds on first run), camera thread starts and fills the buffer, then "Press button to speak & search" appears on screen. Press the button, speak your query, wait 2-3 seconds, then follow the vibration.
Controls
- Press button: start voice query and detection
- r key: reset to idle, keep camera running
- q key: clean shutdown (GPIO cleanup, camera close)
Challenges & Lessons Learned
Pi 5 GPIO is a Breaking Change
The Pi 5 uses a completely new GPIO chip (gpiochip4). The RPi.GPIO library, used in virtually every Pi GPIO tutorial, does not work on Pi 5. You must use lgpio. This is not well documented and caused several hours of debugging early in the build.
KCF is Fast Enough, CSRT is Not
CSRT is more accurate but runs at ~8 FPS on Pi 5 without a GPU: too slow for real-time guidance and completely unable to sprint through a 90-frame buffer in time. KCF runs at 30+ FPS and sprint-catches in under a second. For constrained hardware, KCF is the right choice.
Gemini's Coordinate Space is 0-1000, Not 0.0-1.0
Gemini returns bounding boxes in a 0-1000 integer space per axis. The API documentation mentions this but it's easy to miss. If you don't divide by 1000 before scaling to pixel coordinates, your bounding boxes will be wildly off-screen.
The Ring Buffer Solves a Problem You Don't Know You Have
In early testing without the buffer, detection felt broken: Gemini would return a valid bounding box but the tracker would immediately fail. The reason: by the time Gemini responded, the scene had changed and the bounding box no longer matched the live frame. The ring buffer fixes this completely.
A Dedicated Motor Thread is Non-Negotiable
In early builds, motor pulses were fired inline in the main tracking loop. Any slowdown in frame processing caused the haptic feedback to stutter or freeze. A stutter in guidance is worse than silence because it is misleading. Moving motor control to a dedicated thread completely solved this.
Single Gemini Call Per Session is Intentional
The original plan called for re-querying Gemini every few seconds. In practice this was unnecessary: KCF handles gradual appearance changes well, and users re-trigger with a button press when they want a new object. One API call per query keeps latency low and cost zero.
What's Next
- Arrived detection: fire all six motors simultaneously when the object center reaches the middle zone for N consecutive frames. Currently the system guides but never confirms arrival.
- Reacquisition on loss: when the tracker loses the object, automatically re-query Gemini using the last known bounding box region as a crop hint, rather than returning to idle.
- Offline fallback: replace the Gemini call with a local model (YOLOv8-nano or MobileNetSSD) for environments without internet. Accuracy drops but the core system remains functional.
- IMU head-pose correction: a BMX160 IMU was explored in the Arduino branch. If the user turns their head, the zone boundaries should rotate to compensate. Not implemented in this build.
- Custom enclosure: the current build is intentionally rough, with an off-the-shelf hat, exposed wiring, and a visible Pi. A purpose-made hat with a custom PCB and flex cable routing would be indistinguishable from a normal hat. The software is already production-ready.
NavCap was built in under 24 hours at the TinkerHub Physical AI Hackathon 2026, Kochi, April 18-19, Accessibility Track.






