AI Reveals Your Innermost Thoughts
Mind captioning uses AI to turn brain activity into text that can reveal your inner thoughts, including visual imagery.
We are all entitled to our own private thoughts, but that may not be the case for much longer. A Japanese researcher at the Communication Science Laboratories of NTT, Inc. has just demonstrated a brain decoder that translates visual thoughts into text. By measuring brain activity, the system has the potential to expose your innermost thoughts, even the ones you would rather keep to yourself.
The new method, called mind captioning, describes what a person is looking at or remembering without requiring them to speak, or even attempt to speak, as earlier systems have. It offers exciting possibilities for assistive technology, but it also raises serious privacy concerns for the future.
The technique works by combining functional magnetic resonance imaging (fMRI) — which tracks changes in blood flow to measure brain activity — with deep language models, the same kind of AI systems that power tools like ChatGPT. When a person watches a video or recalls it later, fMRI captures their brain’s responses. These signals are then translated into semantic features (numerical representations of meaning) derived from a pretrained language model.
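To make the idea of semantic features concrete, here is a minimal sketch of how a sentence can be turned into a meaning vector with a pretrained language model. The model name (`bert-base-uncased`), the mean-pooling step, and the `semantic_features` helper are illustrative assumptions, not details reported from the study.

```python
# Minimal sketch: turn a caption into a numerical "meaning" vector using a
# pretrained language model. The model choice and pooling are illustrative only.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def semantic_features(sentence: str):
    """Return a (1, dim) array capturing the sentence's meaning."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state   # shape (1, n_tokens, dim)
    return hidden.mean(dim=1).numpy()                # mean-pool over tokens -> (1, dim)

print(semantic_features("A person opens a door and walks outside.").shape)  # (1, 768)
```

Vectors like this serve as the numerical stand-in for meaning that the decoder is trained to predict from brain activity.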
Using these features as a bridge between brain signals and words, the system generates descriptions that mirror the person’s mental content. Rather than pulling from a database of prewritten sentences, the researcher built a linear decoding model that maps patterns of brain activity directly to these semantic features. Then, an AI language model iteratively refines the text, replacing and adjusting words so that the meaning of the evolving sentence aligns as closely as possible with what the brain data suggests.
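The two-stage pipeline described above, a linear decoder followed by iterative refinement, might look roughly like the sketch below. Everything in it is a simplifying assumption: the data are random placeholders, scikit-learn's Ridge stands in for whatever linear model was actually used, the greedy word-swap loop is only a toy version of the real candidate search, and `embed` is a hypothetical function (like the one sketched earlier) that returns a `(1, n_dims)` feature array.

```python
# Toy sketch of the two stages: a linear decoder from brain patterns to
# semantic features, then word-by-word refinement toward the decoded features.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics.pairwise import cosine_similarity

# --- Stage 1: linear decoding, brain activity -> semantic features ---
rng = np.random.default_rng(0)
n_trials, n_voxels, n_dims = 200, 2000, 768
X_brain = rng.normal(size=(n_trials, n_voxels))   # placeholder fMRI voxel patterns
Y_feats = rng.normal(size=(n_trials, n_dims))     # placeholder caption features

decoder = Ridge(alpha=100.0)                      # regularized linear map
decoder.fit(X_brain, Y_feats)
decoded = decoder.predict(X_brain[:1])            # decoded features for one trial, shape (1, n_dims)

# --- Stage 2: iterative refinement, adjust words to match the decoded features ---
def refine(sentence, decoded_feats, embed, vocabulary, steps=5):
    """Greedily swap single words whenever the swap raises similarity to the target features."""
    words = sentence.split()
    for _ in range(steps):
        best = cosine_similarity(embed(" ".join(words)), decoded_feats)[0, 0]
        improved = False
        for i in range(len(words)):
            for candidate in vocabulary:
                trial = words[:i] + [candidate] + words[i + 1:]
                score = cosine_similarity(embed(" ".join(trial)), decoded_feats)[0, 0]
                if score > best:
                    best, words, improved = score, trial, True
        if not improved:
            break
    return " ".join(words)
```

The real system's search over candidate sentences is far more sophisticated than this vocabulary sweep, but the objective is the same: maximize agreement between the sentence's features and the features decoded from the brain.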
This results in coherent, detailed sentences that accurately describe what the subject is seeing or remembering. In tests, the AI-generated text captured the essence of short video clips, including objects, actions, and interactions, even when specific details were slightly off. For instance, if the model failed to identify an object correctly, it still conveyed the relationships between multiple elements, such as one object acting upon another.
This process didn’t rely on the brain’s language network (the regions responsible for speech and writing), meaning the method can tap into visual and conceptual thought directly. This opens the door to a potential new form of communication for individuals who cannot speak, such as people with aphasia or severe paralysis.
The researcher also showed that when the words in a generated sentence were shuffled, accuracy dropped sharply, confirming that the brain-decoded features carry genuinely structured semantic information: not just a list of objects, but an understanding of how those objects relate within a scene. This suggests the system captures something deeper than recognition alone; it reflects the relational and contextual structure of human thought.
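That shuffle control can be illustrated in a few lines: compare how well the intact sentence's features match the decoded features against shuffled rearrangements of the same words. This reuses the hypothetical `embed` function and decoded feature vector from the sketches above and is only a schematic of the check, not the study's actual scoring procedure.

```python
# Schematic of the word-shuffle control: score the intact sentence against the
# decoded features, then score randomly shuffled versions of the same words.
import random
from sklearn.metrics.pairwise import cosine_similarity

def shuffle_control(sentence, decoded_feats, embed, n_shuffles=20, seed=0):
    """Return the intact sentence's similarity score and the average over shuffled baselines."""
    rng = random.Random(seed)
    intact = cosine_similarity(embed(sentence), decoded_feats)[0, 0]
    shuffled = []
    for _ in range(n_shuffles):
        words = sentence.split()
        rng.shuffle(words)
        shuffled.append(cosine_similarity(embed(" ".join(words)), decoded_feats)[0, 0])
    return intact, sum(shuffled) / n_shuffles
```

If the intact sentence consistently scores higher than its shuffled versions, the decoded features must encode word order and relations, not merely which objects are present.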
If machines can reconstruct mental content with this level of precision, questions of mental privacy and consent will become more pressing than ever. For now, mind captioning remains a research tool, not a mind reader, but it is a significant step toward decoding the human imagination itself.