
Published April 26, 2026 © GPL3+

Voice Assistant Using Arduino Nano ESP32

A voice assistant on your Arduino Nano ESP32 that answers your questions on a tiny 0.96" OLED display, with an optional Pong game.

Intermediate · Full instructions provided · 2 hours


Things used in this project

Hardware components

Arduino Nano ESP32

Adafruit Microphone Amplifier Breakout

ElectroPeak 0.96" OLED 64x128 Display Module

Jumper wires (generic)

SparkFun Pushbutton switch 12mm

Software apps and online services

Arduino IDE

MicroPython

Hand tools and fabrication machines

Soldering iron (generic)

Solder Wire, Lead Free

Story


A couple of days ago, a simple thought struck me 🤔: what if we had an actual pocket voice assistant—something that could answer our questions instantly, without pulling out a phone, unlocking it, opening an app, and typing or tapping around?

We all carry smartphones 📱, but interacting with a voice assistant on a dedicated device feels very different. It feels more direct, more natural—almost like talking to a tiny machine that actually listens 🎙️. That idea kept nagging me, so I decided to turn it into a real, working project.

The result is a compact voice assistant built on the Arduino Nano ESP32 ⚙️. You ask your question through a microphone, and the answer appears directly on a small OLED display 🖥️. No phone screen. No distractions. Just a button, your voice, and a response.

Components I used-

OLED Display 0.96" (I²C) x 1

Arduino Nano ESP32 x 1

Tactile Push Buttons x 2

MAX4466 Microphone x 1

How I built it-

The words you speak are captured by a MAX4466 microphone module, converted to text by a speech-to-text model (Whisper), and then sent to an AI text-generation model that produces a concise answer 🤖. The ESP32 handles Wi-Fi, audio recording, and display control, while speech recognition and text generation run on Hugging Face's servers.
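
To make the capture step concrete, here is a minimal sketch of how raw 12-bit ADC readings can be turned into a WAV file that a speech-to-text endpoint accepts. It runs in plain Python. The helper names `adc_to_pcm16` and `wav_from_samples` are made up for illustration, and the 12 kHz mono 16-bit format is taken from the project code.

```python
import struct

SAMPLE_RATE = 12000  # same rate the project code uses

def adc_to_pcm16(raw):
    """Map a 12-bit ADC reading (0..4095) to a signed 16-bit sample."""
    v = (raw - 2048) << 4          # centre on zero, then scale 12 -> 16 bits
    return max(-32768, min(32767, v))

def wav_from_samples(samples, sample_rate=SAMPLE_RATE):
    """Wrap mono 16-bit samples in a 44-byte RIFF/WAVE header."""
    pcm = b"".join(struct.pack("<h", s) for s in samples)
    header = struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + len(pcm), b"WAVE",
        b"fmt ", 16, 1, 1,              # PCM format, one channel
        sample_rate, sample_rate * 2,   # byte rate = rate * 2 bytes per sample
        2, 16,                          # block align, bits per sample
        b"data", len(pcm),
    )
    return header + pcm
```

On the board, the readings would come from the microphone's ADC pin, and the resulting bytes would be posted to the Whisper URL with a `Content-Type: audio/wav` header.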

Two push buttons make interaction straightforward 🔘: one to record your question and another to scroll through longer answers on the OLED. Long questions scroll across the top line as a marquee, so nothing feels cramped despite the small screen.
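
The scrolling idea can be sketched in a few lines of plain Python. This is a simplified illustration, not the exact routine from the listing: the function names are hypothetical, and I assume 16 characters per line (128 pixels divided by the 8-pixel font) with three rows left for the answer below the question.

```python
CHARS_PER_LINE = 16   # 128 px wide / 8 px per character
ANSWER_ROWS = 3       # top row is reserved for the question

def wrap_words(text, width=CHARS_PER_LINE):
    """Greedy word wrap; words longer than the width are cut."""
    lines, cur = [], ""
    for word in text.split():
        word = word[:width]
        if not cur:
            cur = word
        elif len(cur) + 1 + len(word) <= width:
            cur += " " + word
        else:
            lines.append(cur)
            cur = word
    if cur:
        lines.append(cur)
    return lines

def visible_rows(lines, offset, rows=ANSWER_ROWS):
    """Return the slice of wrapped lines shown at a given scroll offset."""
    return lines[offset:offset + rows]

def next_offset(lines, offset, rows=ANSWER_ROWS):
    """Advance one line per button press and wrap back to the top."""
    max_offset = max(0, len(lines) - rows)
    return 0 if offset >= max_offset else offset + 1
```

Each press of the scroll button would call `next_offset` and then redraw whatever `visible_rows` returns.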

What makes this project exciting is its flexibility ✨. It uses Hugging Face APIs, which means you’re not locked into a single AI model. You can experiment, swap models, and choose whichever text-generation AI best fits your needs 🧠. The same idea can evolve into a study assistant, a technical helper, or even a portable AI companion for experiments and demos.
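
Swapping models mostly comes down to changing one string in the request. The sketch below builds the headers and JSON body for the chat-completions URL used in the project code. The function name `build_chat_request` is hypothetical, and it uses the standard `json` module where the device code uses `ujson`.

```python
import json

# URL taken from the project code
LLM_URL = "https://router.huggingface.co/v1/chat/completions"

def build_chat_request(model, question, token, max_tokens=120):
    """Return (headers, body) for an OpenAI-style chat completion call."""
    headers = {
        "Authorization": "Bearer " + token,
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,   # the only field that changes when you swap models
        "messages": [
            {"role": "system",
             "content": "Answer very concisely in 2-4 short sentences."},
            {"role": "user", "content": question},
        ],
        "max_tokens": max_tokens,
    })
    return headers, body
```

For example, `build_chat_request("google/gemma-2-9b-it", "What is I2C?", token)` gives the request for the model used here. Passing a different model ID that your Hugging Face account can access should be the only change needed.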

I also added a Pong 🏓 game. On the mode selection screen you choose whether to play Pong with the two buttons or use the AI assistant. The difficulty of the Pong game gradually increases as your score goes up.
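
The difficulty ramp can be sketched like this. The constants (1.5, 0.15, 2.8) and the 4 to 60 pixel clamp come from the project code. The function names are hypothetical, and the score is passed in as a parameter so the sketch does not assume which player's score drives it.

```python
BASE_SPEED = 1.5   # starting paddle speed in pixels per frame
STEP = 0.15        # extra speed per point
MAX_SPEED = 2.8    # cap so the game stays winnable

def opponent_speed(score):
    """Opponent paddle speed for a given score, capped at MAX_SPEED."""
    return min(MAX_SPEED, BASE_SPEED + STEP * score)

def move_towards(paddle_y, target_y, speed, lo=4, hi=60):
    """Step the paddle towards the target and clamp it to the screen."""
    if target_y < paddle_y - 1:
        paddle_y -= speed
    elif target_y > paddle_y + 1:
        paddle_y += speed
    return max(lo, min(hi, paddle_y))
```

With these numbers the opponent reaches its top speed after nine points, which a first-to-five game never gets to, so the cap mostly acts as a safety limit.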

The Kill Switch 🔴- Holding both buttons for 4 seconds acts as a kill switch that returns you to the mode selection page. Use it to exit AI or Pong mode.
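
The hold detection behind the kill switch can be modelled as a small state machine. This is a sketch under a few assumptions: the class name `HoldDetector` is made up, time is passed in as plain milliseconds so it runs on a PC, and on the board the subtraction would be done with `time.ticks_diff()` instead.

```python
HOLD_MS = 4000      # both buttons must stay down this long
WARN_MS = 2000      # the countdown appears after this long

class HoldDetector:
    def __init__(self):
        self.start = None   # time both buttons went down, or None

    def update(self, both_down, now_ms):
        """Return 'idle', 'holding', 'warning' or 'fired' for this poll."""
        if not both_down:
            self.start = None      # any release cancels the hold
            return "idle"
        if self.start is None:
            self.start = now_ms
        held = now_ms - self.start
        if held >= HOLD_MS:
            self.start = None
            return "fired"
        return "warning" if held > WARN_MS else "holding"
```

Resetting `start` on every release matters: without it, a short accidental press followed by a later one could be counted as a single long hold.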

The Information Page 📃- An information page appears after the IoT HUB startup animation. It summarizes the features and explains what each button does, so the controls are easy to pick up.

Pin Connections-

Arduino Nano ESP32 Connections

Power & Ground

3.3V → OLED VDD

3.3V → MAX4466 VCC

GND → OLED GND

GND → MAX4466 GND

GND → One side of both push buttons (common ground line)

OLED Display (SSD1306 – I²C)

SDA → A4 (Nano ESP32)

SCK / SCL → A5 (Nano ESP32)

MAX4466 Microphone Module

OUT → A7 (Nano ESP32)

Push Buttons (2 Buttons)

Button 1 (Record Button)

One pin → D8 (Nano ESP32)

Other pin → GND

Button 2 (Scroll Button)

One pin → D6 (Nano ESP32)

Other pin → GND
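
Because each button connects its pin to GND and relies on the internal pull-up, the pin reads 1 when idle and 0 when pressed. Here is a minimal sketch of detecting a single press from that signal. The class name `ActiveLowButton` is hypothetical, and the level is passed in so the logic can be tried without hardware.

```python
class ActiveLowButton:
    """Tracks one button wired between a GPIO pin and GND with a pull-up."""

    def __init__(self):
        self.last = 1   # idle level is high because of the pull-up

    def pressed_edge(self, level):
        """Return True only on the poll where the level goes 1 -> 0."""
        edge = (level == 0 and self.last == 1)
        self.last = level
        return edge
```

On the board, `level` would be the result of `Pin(n, Pin.IN, Pin.PULL_UP).value()`, polled every few milliseconds.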

How you can build it-

For simplicity, I can provide a pre-compiled, customized binary file that you can upload using any ESP32 flasher tool, such as esptool.

So, if you're interested in the complete binary or want to build this project yourself, you can email me at garageiot98@gmail.com 📩.

Alternatively, you can flash the attached code using Arduino Lab for MicroPython, which works the same way.

This project is not about replacing your phone 🚫📱. It’s about exploring a different way of interacting with intelligence—one that’s tactile, focused, and built with your own hands 🔧.

Sometimes the best ideas start small—small enough to fit in your pockets.

Code

MicroPython code

# Forged with passion by IoT HUB

import network
import urequests as requests
import ujson
import time
from machine import Pin, ADC, I2C
import struct
import ssd1306
import random

SSID = " *WiFi SSID* "
PASSWORD = " *WiFi Password* "

HF_TOKEN = " *Hugging Face API KEY* "

WHISPER_URL = "https://router.huggingface.co/hf-inference/models/openai/whisper-large-v3"

LLM_MODEL = "google/gemma-2-9b-it"
LLM_URL = "https://router.huggingface.co/v1/chat/completions"

# Faster, shorter answers
WORD_LIMIT = 60  # was 120

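# Pin values below are passed straight to machine.Pin; check them against the
# Nano ESP32 pinout, since GPIO numbers can differ from the A7/D8/D6 board labels.
MIC_PIN = 4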
SAMPLE_RATE = 12000
MAX_DURATION_SEC = 3  # was 6, shorter clip = faster STT
MAX_SAMPLES = SAMPLE_RATE * MAX_DURATION_SEC

I2C_SDA = 11
I2C_SCL = 12
OLED_WIDTH = 128
OLED_HEIGHT = 64
OLED_ADDR = 0x3C

BTN_REC_PIN = 8
BTN_SCROLL_PIN = 6

i2c = I2C(0, scl=Pin(I2C_SCL), sda=Pin(I2C_SDA))
oled = ssd1306.SSD1306_I2C(OLED_WIDTH, OLED_HEIGHT, i2c, addr=OLED_ADDR)

oled.write_cmd(0xC8)
oled.write_cmd(0xA1)

oled.fill(0)
oled.show()

btn_rec = Pin(BTN_REC_PIN, Pin.IN, Pin.PULL_UP)
btn_scroll = Pin(BTN_SCROLL_PIN, Pin.IN, Pin.PULL_UP)

CHARS_PER_LINE = 16
LINES_ON_SCREEN = 4

all_lines = []
answer_start_idx = 1
scroll_offset = 0
q_full = ""
q_marquee_idx = 0

current_mode = "HOME"
menu_sel = 0
kill_switch_cooldown = 0
kill_switch_triggered = False

def check_kill_switch():
    return btn_rec.value() == 0 and btn_scroll.value() == 0

def activate_kill_switch():
    global current_mode, kill_switch_cooldown, kill_switch_triggered
    print("Kill switch activated -> MENU")
    current_mode = "MENU"
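    kill_switch_cooldown = time.ticks_add(time.ticks_ms(), 2000)
    kill_switch_triggered = True
    while btn_rec.value() == 0 or btn_scroll.value() == 0:
        time.sleep_ms(50)
    time.sleep_ms(500)

def is_in_cooldown():
    # ticks_ms() wraps around, so compare with ticks_diff() rather than "<"
    global kill_switch_triggered
    if kill_switch_triggered and time.ticks_diff(kill_switch_cooldown, time.ticks_ms()) > 0:
        return True
    kill_switch_triggered = False
    return False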

def show_kill_timer(start_ms):
    elapsed = time.ticks_diff(time.ticks_ms(), start_ms)
    remaining = max(0, 4000 - elapsed)
    remaining_s = (remaining + 999) // 1000
    
    oled.fill(0)
    oled.text("Hold 4s...", 30, 20)
    oled.text("Exit in: %d" % remaining_s, 35, 40)
    oled.show()

def startup_splash():
    print("Startup splash: IoT HUB see-saw")
    duration_ms = 2500
    start = time.ticks_ms()

    x_center = OLED_WIDTH // 2
    x_iot = x_center - 40
    x_hub = x_center + 8
    dx = 3

    base_y = 30
    amp = 6

    while time.ticks_diff(time.ticks_ms(), start) < duration_ms:
        if check_kill_switch():
            time.sleep_ms(500)
            return "INFO"
        
        oled.fill(0)
        x_iot += dx
        x_hub -= dx
        
        if x_iot <= 5 or x_hub >= OLED_WIDTH - 24:
            dx = -dx
        
        phase = (x_iot - (x_center - 40)) / 40
        offset = int(amp * phase)
        yi = base_y - offset
        yh = base_y + offset
        
        oled.text("IoT", int(x_iot), yi)
        oled.text("HUB", int(x_hub), yh)
        
        oled.show()
        time.sleep_ms(60)
    
    return "INFO"

def show_info_page():
    print("Showing info page")
    info_lines = [
        "BTN A: Scroll",
        "      Move",
        "BTN B: Record",
        "      Select",
        "Both 4s: Menu",
    ]
    
    start = time.ticks_ms()
    
    while time.ticks_diff(time.ticks_ms(), start) < 4000:
        if check_kill_switch():
            time.sleep_ms(500)
            print("Info exit via kill switch")
            return "MENU"
        
        oled.fill(0)
        oled.rect(0, 0, 128, 64, 1)
        oled.rect(1, 1, 126, 62, 1)
        
        for i, line in enumerate(info_lines):
            oled.text(line[:16], 6, 6 + i*9)
        
        oled.show()
        time.sleep_ms(100)
    
    return "MENU"

def show_menu():
    global menu_sel
    print("Menu: sel=", menu_sel)
    
    oled.fill(0)
    oled.text("Choose mode:", 0, 0)
    
    options = ["PONG", "AI   "]
    for i in range(2):
        y = 20 + i * 12
        if i == menu_sel:
            oled.text("> " + options[i], 0, y)
        else:
            oled.text("  " + options[i], 0, y)
    
    oled.text("- IoT HUB", 54, 56)
    oled.show()

def menu_loop():
    global menu_sel, current_mode
    
    print("Entering menu loop")
    last_scroll = 1
    last_rec = 1
    both_pressed_start = None
    timer_shown = False
    
    while True:
        if is_in_cooldown():
            time.sleep_ms(50)
            continue
        
        now = time.ticks_ms()
        
        rec_val = btn_rec.value()
        scroll_val = btn_scroll.value()
        
        if rec_val == 0 and scroll_val == 0:
            if both_pressed_start is None:
                both_pressed_start = now
                timer_shown = False
                print("Both buttons pressed in menu")
            
            elapsed = time.ticks_diff(now, both_pressed_start)
            
            if elapsed > 2000 and elapsed < 4000:
                if not timer_shown or elapsed % 1000 < 100:
                    show_kill_timer(both_pressed_start)
                    timer_shown = True
            elif elapsed >= 4000:
                activate_kill_switch()
                show_menu()
                both_pressed_start = None
                timer_shown = False
        else:
            both_pressed_start = None
            timer_shown = False
        
        if scroll_val == 0 and last_scroll == 1:
            menu_sel = 1 - menu_sel
            print("Scroll: menu_sel now", menu_sel)
            show_menu()
        last_scroll = scroll_val
        
        if rec_val == 0 and last_rec == 1:
            if menu_sel == 0:
                current_mode = "PONG"
                print("PONG selected")
            else:
                current_mode = "AI"
                print("AI selected")
            return
        last_rec = rec_val
        
        time.sleep_ms(50)

def pong_game():
    global current_mode
    
    print("Starting pong game")
    score1 = 0
    score2 = 0
    paddle1_y = 24
    paddle2_y = 24
    ball_x = 64.0
    ball_y = 32.0
    ball_dx = random.choice([-3, 3])
    ball_dy = random.choice([-2, 2])
    
    paddle_size = 10
    paddle_speed = 4
    
    both_pressed_start = None
    timer_shown = False
    last_update = time.ticks_ms()
    
    while True:
        if is_in_cooldown():
            time.sleep_ms(50)
            continue
        
        now = time.ticks_ms()
        
        rec_val = btn_rec.value()
        scroll_val = btn_scroll.value()
        
        if rec_val == 0 and scroll_val == 0:
            if both_pressed_start is None:
                both_pressed_start = now
                timer_shown = False
            
            elapsed = time.ticks_diff(now, both_pressed_start)
            
            if elapsed > 2000 and elapsed < 4000:
                if not timer_shown or elapsed % 1000 < 100:
                    show_kill_timer(both_pressed_start)
                    timer_shown = True
                    continue
            elif elapsed >= 4000:
                activate_kill_switch()
                return
        else:
            both_pressed_start = None
            timer_shown = False
        
        if scroll_val == 0:
            paddle1_y = max(4, min(60, paddle1_y - paddle_speed))
        
        if rec_val == 0:
            paddle1_y = max(4, min(60, paddle1_y + paddle_speed))
        
        ai_speed = 1.5 + (score1 * 0.15)  # opponent speeds up as the player scores
        ai_speed = min(2.8, ai_speed)
        
        ai_error = random.randint(-3, 3) if random.random() < 0.4 else 0
        
        # Track the ball with a small random error so the opponent can miss
        target_y = ball_y + ai_error
        if target_y < paddle2_y - 1:
            paddle2_y = max(4, min(60, paddle2_y - ai_speed))
        elif target_y > paddle2_y + 1:
            paddle2_y = max(4, min(60, paddle2_y + ai_speed))
        
        ball_x += ball_dx
        ball_y += ball_dy
        
        if ball_x <= 4:
            score2 += 1
            print("AI scores:", score2)
            if score2 >= 5:
                print("AI wins!")
                oled.fill(0)
                oled.text("AI WINS!", 40, 28)
                oled.show()
                time.sleep(2)
                current_mode = "MENU"
                return
            ball_x = 64.0
            ball_y = 32.0
            ball_dx = 3
            ball_dy = random.choice([-2, 2])
        elif ball_x >= OLED_WIDTH - 5:
            score1 += 1
            print("Player scores:", score1)
            if score1 >= 5:
                print("Player wins!")
                oled.fill(0)
                oled.text("YOU WIN!", 35, 28)
                oled.show()
                time.sleep(2)
                current_mode = "MENU"
                return
            ball_x = 64.0
            ball_y = 32.0
            ball_dx = -3
            ball_dy = random.choice([-2, 2])
        
        if ball_y <= 0 or ball_y >= OLED_HEIGHT - 1:
            ball_dy = -ball_dy
        
        if (ball_x <= 8 and abs(ball_y - paddle1_y) <= paddle_size):
            ball_dx = -ball_dx + random.uniform(-0.5, 0.5)
            ball_x = 10
        elif (ball_x >= OLED_WIDTH - 8 and abs(ball_y - paddle2_y) <= paddle_size):
            ball_dx = -ball_dx + random.uniform(-0.5, 0.5)
            ball_x = OLED_WIDTH - 10
        
        oled.fill(0)
        
        oled.text("%d  %d" % (score1, score2), 45, 0)
        
        for y in range(max(0, int(paddle1_y - paddle_size)), min(OLED_HEIGHT, int(paddle1_y + paddle_size + 1))):
            for x in range(1, 4):
                oled.pixel(x, y, 1)
        
        for y in range(max(0, int(paddle2_y - paddle_size)), min(OLED_HEIGHT, int(paddle2_y + paddle_size + 1))):
            for x in range(OLED_WIDTH - 4, OLED_WIDTH - 1):
                oled.pixel(x, y, 1)
        
        for bx in range(max(0, int(ball_x) - 1), min(OLED_WIDTH, int(ball_x) + 2)):
            for by in range(max(0, int(ball_y) - 1), min(OLED_HEIGHT, int(ball_y) + 2)):
                oled.pixel(bx, by, 1)
        
        oled.show()
        
        time.sleep_ms(30)

def oled_show_current_view():
    global q_full, q_marquee_idx
    
    oled.fill(0)

    if all_lines:
        prefix = "You: "
        if q_full:
            pad = " " * CHARS_PER_LINE
            base = q_full + pad

            if q_marquee_idx >= len(base):
                q_marquee_idx = 0

            window = base[q_marquee_idx:q_marquee_idx + CHARS_PER_LINE]
            if len(window) < CHARS_PER_LINE:
                window = window + base[:CHARS_PER_LINE - len(window)]

            q_line = prefix + window
        else:
            q_line = all_lines[0]

        oled.text(q_line[:CHARS_PER_LINE], 0, 0)

    for i in range(1, LINES_ON_SCREEN):
        line_idx = answer_start_idx + scroll_offset + (i - 1)
        y = i * 16
        if 0 <= line_idx < len(all_lines):
            oled.text(all_lines[line_idx][:CHARS_PER_LINE], 0, y)

    oled.show()

def word_wrap_to_lines(text):
    text = text.replace("\r", " ").replace("\n", " ")
    words = text.split()

    lines = []
    cur = ""
    for w in words:
        if not cur:
            if len(w) <= CHARS_PER_LINE:
                cur = w
            else:
                lines.append(w[:CHARS_PER_LINE])
                cur = ""
        elif len(cur) + 1 + len(w) <= CHARS_PER_LINE:
            cur += " " + w
        else:
            lines.append(cur)
            if len(w) <= CHARS_PER_LINE:
                cur = w
            else:
                lines.append(w[:CHARS_PER_LINE])
                cur = ""
    if cur:
        lines.append(cur)

    if not lines:
        lines = [""]
    return lines

def build_display_lines(question_text, answer_text):
    global all_lines, answer_start_idx, scroll_offset, q_full, q_marquee_idx

    print("Question transcript:", question_text)
    print("LLM answer:", answer_text)

    q_display = "You: " + question_text
    q_lines = word_wrap_to_lines(q_display)
    q0 = q_lines[0]

    prefix = "You: "
    if q_display.startswith(prefix):
        q_full = q_display[len(prefix):]
    else:
        q_full = q_display
    q_marquee_idx = 0

    a_lines = word_wrap_to_lines("AI: " + answer_text)

    all_lines = [q0] + a_lines
    answer_start_idx = 1
    scroll_offset = 0

def show_home():
    global all_lines, answer_start_idx, scroll_offset, q_full, q_marquee_idx
    all_lines = [
        "Gemma Voice",
        "Assistant"
    ]
    answer_start_idx = 1
    scroll_offset = 0
    q_full = ""
    q_marquee_idx = 0

    oled.fill(0)
    oled.text("Gemma Voice", 0, 0)
    oled.text("Assistant", 0, 16)
    oled.text("- IoT HUB", 54, 46)
    oled.show()
    print("Home screen shown")

def wifi_connect():
    print("WiFi: connecting to", SSID)
    wlan = network.WLAN(network.STA_IF)
    wlan.active(True)
    if not wlan.isconnected():
        all_lines[:] = ["WiFi...", "", "", ""]
        oled_show_current_view()
        wlan.connect(SSID, PASSWORD)
        while not wlan.isconnected():
            time.sleep(0.25)  # was 0.5
            print(".", end="")
        print()
    cfg = wlan.ifconfig()
    print("WiFi connected:", cfg)
    all_lines[:] = ["WiFi OK", "", "", ""]
    oled_show_current_view()
    time.sleep(0.4)  # was 0.7
    show_home()

def record_while_button():
    print("Waiting for record button...")
    adc = ADC(Pin(MIC_PIN))
    adc.atten(ADC.ATTN_11DB)
    adc.width(ADC.WIDTH_12BIT)

    buf = bytearray(MAX_SAMPLES * 2)
    idx = 0

    show_home()

    last = btn_rec.value()
    both_pressed_start = None
    timer_shown = False
    
    while True:
        if is_in_cooldown():
            time.sleep_ms(50)
            continue
        
        now = time.ticks_ms()
        rec_val = btn_rec.value()
        scroll_val = btn_scroll.value()
        
        if rec_val == 0 and scroll_val == 0:
            if both_pressed_start is None:
                both_pressed_start = now
                timer_shown = False
            
            elapsed = time.ticks_diff(now, both_pressed_start)
            
            if elapsed > 2000 and elapsed < 4000:
                if not timer_shown or elapsed % 1000 < 100:
                    show_kill_timer(both_pressed_start)
                    timer_shown = True
                    continue
            elif elapsed >= 4000:
                activate_kill_switch()
                return None
        else:
            both_pressed_start = None
            timer_shown = False
        
        v = btn_rec.value()
        if v == 0 and last == 1:
            print("Record button pressed, starting recording")
            break
        last = v
        time.sleep_ms(10)

    all_lines[:] = ["Recording...", "Release button", "to stop", ""]
    oled_show_current_view()
    start = time.ticks_ms()
    both_pressed_start = None
    timer_shown = False

    while btn_rec.value() == 0 and idx < len(buf):
        if is_in_cooldown():
            time.sleep_ms(50)
            continue
        
        now = time.ticks_ms()
        if btn_rec.value() == 0 and btn_scroll.value() == 0:
            if both_pressed_start is None:
                both_pressed_start = now
                timer_shown = False
            
            elapsed = time.ticks_diff(now, both_pressed_start)
            
            if elapsed > 2000 and elapsed < 4000:
                if not timer_shown or elapsed % 1000 < 100:
                    show_kill_timer(both_pressed_start)
                    timer_shown = True
                    continue
            elif elapsed >= 4000:
                activate_kill_switch()
                return None
        else:
            both_pressed_start = None
            timer_shown = False
        
        v = adc.read()
        v16 = (v - 2048) << 4
        if v16 < -32768:
            v16 = -32768
        if v16 > 32767:
            v16 = 32767
        struct.pack_into("<h", buf, idx, v16)
        idx += 2
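        # Note: this delay ignores the time spent reading the ADC and buttons,
        # so the real sample rate ends up somewhat below SAMPLE_RATE.
        time.sleep_us(1000000 // SAMPLE_RATE)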

    dur = time.ticks_diff(time.ticks_ms(), start)
    print("Recording done, duration ms:", dur, "bytes:", idx)

    if idx == 0:
        print("No audio captured")
        return None

    all_lines[:] = ["Processing audio", "", "", ""]
    oled_show_current_view()
    return memoryview(buf)[:idx]

def make_wav(pcm_bytes, sample_rate=SAMPLE_RATE, num_channels=1, bits_per_sample=16):
    byte_rate = sample_rate * num_channels * bits_per_sample // 8
    block_align = num_channels * bits_per_sample // 8
    subchunk2_size = len(pcm_bytes)
    chunk_size = 36 + subchunk2_size

    header = struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", chunk_size, b"WAVE", b"fmt ", 16, 1, num_channels,
        sample_rate, byte_rate, block_align, bits_per_sample, b"data", subchunk2_size,
    )
    print("WAV built, size:", len(header) + len(pcm_bytes))
    return header + pcm_bytes

def whisper_stt(wav_bytes):
    # single quick attempt, no retries
    headers = {"Authorization": "Bearer " + HF_TOKEN, "Content-Type": "audio/wav"}

    print("STT: sending to Whisper, len:", len(wav_bytes))
    all_lines[:] = ["Sending to STT", "", "", ""]
    oled_show_current_view()
    try:
        r = requests.post(WHISPER_URL, headers=headers, data=wav_bytes)
        print("STT HTTP status:", r.status_code)
        txt = r.text
        print("STT raw response:", txt)

        transcript = None
        try:
            js = ujson.loads(txt)
            if isinstance(js, list) and js and isinstance(js[0], dict) and "text" in js[0]:
                transcript = js[0]["text"]
            elif isinstance(js, dict) and "text" in js:
                transcript = js["text"]
        except Exception as e:
            print("STT JSON error:", e)

        r.close()
        print("STT transcript:", transcript)
        return transcript
    except Exception as e:
        print("STT HTTP error:", e)
        return None

def limit_words(text, max_words):
    words = text.split()
    if len(words) > max_words:
        return " ".join(words[:max_words]) + "..."
    return text

def llm_answer(question_text):
    payload = {
        "model": LLM_MODEL,
        "messages": [
            {"role": "system", "content": "You are a helpful chatbot running on an ESP32 Nano voice assistant. Answer very concisely in 2-4 short sentences."},
            {"role": "user", "content": question_text}
        ],
        "max_tokens": 120,   # was 200
        "temperature": 0.6   # slightly lower
    }

    headers = {"Authorization": "Bearer " + HF_TOKEN, "Content-Type": "application/json"}

    print("LLM: sending question to Gemma:", question_text)
    all_lines[:] = ["Sending to LLM", "", "", ""]
    oled_show_current_view()

    try:
        r = requests.post(LLM_URL, headers=headers, data=ujson.dumps(payload))
        print("LLM HTTP status:", r.status_code)
        raw = r.text
        print("LLM raw response:", raw)

        answer = None
        try:
            js = ujson.loads(raw)
            if "choices" in js and js["choices"]:
                msg = js["choices"][0].get("message", {})
                content = msg.get("content", "")
                if content:
                    answer = limit_words(content, WORD_LIMIT)
        except Exception as e:
            print("LLM JSON error:", e)

        r.close()
        print("LLM final answer:", answer)
        return answer
    except Exception as e:
        print("LLM HTTP error:", e)
        return None

def wait_scroll_mode():
    global scroll_offset, q_marquee_idx, current_mode

    last_scroll = btn_scroll.value()
    last_rec = btn_rec.value()

    max_offset = max(0, len(all_lines) - answer_start_idx - (LINES_ON_SCREEN - 1))
    last_anim = time.ticks_ms()
    last_marquee_step = time.ticks_ms()
    both_pressed_start = None
    timer_shown = False

    print("Entering scroll mode, max_offset:", max_offset)

    while True:
        if is_in_cooldown():
            time.sleep_ms(50)
            continue
        
        now = time.ticks_ms()
        
        rec_val = btn_rec.value()
        scroll_val = btn_scroll.value()
        
        if rec_val == 0 and scroll_val == 0:
            if both_pressed_start is None:
                both_pressed_start = now
                timer_shown = False
            
            elapsed = time.ticks_diff(now, both_pressed_start)
            
            if elapsed > 2000 and elapsed < 4000:
                if not timer_shown or elapsed % 1000 < 100:
                    show_kill_timer(both_pressed_start)
                    timer_shown = True
                    continue
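            elif elapsed >= 4000:
                activate_kill_switch()
                return
        else:
            # Reset the hold timer on release, as the other loops do
            both_pressed_start = None
            timer_shown = False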

        if time.ticks_diff(now, last_anim) > 50:
            oled_show_current_view()
            last_anim = now

        if time.ticks_diff(now, last_marquee_step) > 250:
            if q_full:
                q_marquee_idx += 1
            last_marquee_step = now

        v_scroll = btn_scroll.value()
        if v_scroll == 0 and last_scroll == 1:
            scroll_offset += 1
            if scroll_offset > max_offset:
                scroll_offset = 0
            print("Scroll offset:", scroll_offset)
            oled_show_current_view()
        last_scroll = v_scroll

        v_rec = btn_rec.value()
        if v_rec == 0 and last_rec == 1:
            print("Exit scroll mode button pressed")
            while btn_rec.value() == 0:
                time.sleep_ms(10)
            break
        last_rec = v_rec

        time.sleep_ms(10)

def ai_mode_loop():
    global current_mode
    print("AI mode started")
    wifi_connect()
    
    last_rec = 1
    last_scroll = 1
    both_pressed_start = None
    timer_shown = False
    
    while True:
        if is_in_cooldown():
            time.sleep_ms(50)
            continue
        
        now = time.ticks_ms()
        
        rec_val = btn_rec.value()
        scroll_val = btn_scroll.value()
        
        if rec_val == 0 and scroll_val == 0:
            if both_pressed_start is None:
                both_pressed_start = now
                timer_shown = False
            
            elapsed = time.ticks_diff(now, both_pressed_start)
            
            if elapsed > 2000 and elapsed < 4000:
                if not timer_shown or elapsed % 1000 < 100:
                    show_kill_timer(both_pressed_start)
                    timer_shown = True
                    continue
            elif elapsed >= 4000:
                activate_kill_switch()
                return
        else:
            both_pressed_start = None
            timer_shown = False
        
        if rec_val == 0 and last_rec == 1:
            pcm = record_while_button()
            if current_mode == "MENU":
                return
            if not pcm:
                all_lines[:] = ["No audio", "Hold BTN8", "to record", ""]
                oled_show_current_view()
                print("Loop: no audio, back to idle")
                time.sleep(0.6)  # shorter pause
                last_rec = rec_val
                continue

            wav = make_wav(pcm, sample_rate=SAMPLE_RATE)

            transcript = whisper_stt(wav)
            if current_mode == "MENU":
                return
            if not transcript:
                all_lines[:] = ["STT failed", "", "", ""]
                oled_show_current_view()
                print("Loop: STT failed, back to idle")
                time.sleep(0.6)
                last_rec = rec_val
                continue

            answer = llm_answer(transcript)
            if current_mode == "MENU":
                return
            if not answer:
                all_lines[:] = ["LLM failed", "", "", ""]
                oled_show_current_view()
                print("Loop: LLM failed, back to idle")
                time.sleep(0.6)
                last_rec = rec_val
                continue

            build_display_lines(transcript, answer)
            oled_show_current_view()

            wait_scroll_mode()
            if current_mode == "MENU":
                return
            show_home()
            print("Loop: finished one Q&A cycle\n")
        
        last_rec = rec_val
        last_scroll = scroll_val
        time.sleep_ms(40)  # was 50

def main():
    global current_mode
    print("Boot: starting multi-mode assistant")
    mode = startup_splash()
    print("After splash, mode:", mode)
    
    mode = show_info_page()
    print("After info, mode:", mode)
    
    current_mode = "MENU"
    show_menu()
    
    while True:
        if current_mode == "MENU":
            menu_loop()
        elif current_mode == "PONG":
            pong_game()
            if current_mode == "MENU":
                show_menu()
        elif current_mode == "AI":
            ai_mode_loop()
            if current_mode == "MENU":
                show_menu()
        elif current_mode == "HOME":
            show_home()
            time.sleep(2)
            current_mode = "MENU"
            show_menu()
        
        time.sleep_ms(100)

main()