What’s on Your Mind?

AI can reconstruct a person's private mental images by analyzing their brain activity.

Nick Bild
1 year ago · Machine Learning & AI
Original images (top) and images predicted from brain activity (bottom) (📷: Y. Takagi et al.)

In recent groundbreaking studies, researchers have shown that it is possible to use machine learning algorithms to reconstruct a person’s visual experiences from their brain activity. These developments could have far-reaching implications for our understanding of how the human visual system works and may also help to tease out the connection between computer vision models and our visual system.

While the results of these studies have been fairly impressive, they have been plagued by issues that limit their ability to produce accurate images. Brain activity measured by functional Magnetic Resonance Imaging (fMRI) is not the expected input for existing generative models. This means that in order to leverage brain activity as an input, networks need to be trained from scratch or, at a minimum, fine-tuned for the specific stimuli used in the fMRI experiment.

Both of these options are prohibitively expensive for most applications due to the massive size of these generative models. Moreover, when working with fMRI data, the number of available samples is quite small — far smaller than is needed to produce a model that performs with acceptable accuracy.

A duo of researchers from Osaka University and the Center for Information and Neural Networks recently devised a new approach that sidesteps the issues that have hindered progress in past efforts to reconstruct mental images from brain activity. They have shown that it is possible to develop a framework that requires no training or fine-tuning of complex deep learning models. Moreover, they have demonstrated that their approach can produce very impressive results using the popular latent diffusion model called Stable Diffusion.

The team’s approach begins by showing a person an image. While the person views the image, measurements are captured from an fMRI machine. A small linear model is used to predict the latent representation of the image (a compressed version of the image containing only its most relevant and informative features) from the fMRI data. Decoding this latent yields an image that coarsely reflects what the person saw. The coarse image is then passed through the encoder of an autoencoder, and noise is added to it via a diffusion process.
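The linear decoding step can be pictured with a toy sketch. This is not the authors' code: the data below is synthetic, and ridge regression is one plausible choice of regularized linear model. The idea is simply that a lightweight model, trained on (fMRI scan, image latent) pairs, can predict a latent vector for a new scan.

```python
# Toy sketch (assumed setup, synthetic data): predict an image's latent
# representation from fMRI voxel activity with a simple linear model.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_samples, n_voxels, latent_dim = 200, 1000, 64

# Synthetic "fMRI" responses and the image latents they correspond to,
# linked by an unknown linear mapping plus noise.
true_W = rng.normal(size=(n_voxels, latent_dim))
fmri = rng.normal(size=(n_samples, n_voxels))
latents = fmri @ true_W + 0.1 * rng.normal(size=(n_samples, latent_dim))

# A single ridge model maps voxel activity to the full latent vector.
model = Ridge(alpha=1.0)
model.fit(fmri[:150], latents[:150])

# For held-out scans, the model emits one latent vector per scan,
# which a generative model's decoder could turn into a coarse image.
predicted = model.predict(fmri[150:])
print(predicted.shape)
```

Because the trainable part is just a linear map, it can be fit on the small sample sizes typical of fMRI experiments, which is the point of the framework: the heavy generative model itself is never retrained.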

A latent text representation is then decoded from the fMRI signals captured from the higher visual cortex. Both the coarse image and the text representation are fed into a denoising U-Net, and the result is finally forwarded to the decoding module of the autoencoder to produce the final image. The fidelity of the sample predictions presented in the work is in many cases incredibly impressive — definitely worthy of a tip of the tin foil hat.
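The denoising stage can be sketched in miniature. The stand-in U-Net below is a dummy function, and the step sizes are arbitrary; this only illustrates the img2img-style structure described above: noise the coarse latent partway, then iteratively denoise it while conditioning on the text latent decoded from the higher visual cortex.

```python
# Toy sketch (assumed structure, not the authors' code) of the
# conditioned denoising loop described in the article.
import numpy as np

rng = np.random.default_rng(1)
latent_dim, text_dim, n_steps = 64, 32, 10

coarse_latent = rng.normal(size=latent_dim)  # from the linear fMRI decoder
text_latent = rng.normal(size=text_dim)      # decoded from higher visual cortex

def dummy_unet(z, t, cond):
    """Stand-in for the denoising U-Net: 'predicts' the noise in z at
    step t, conditioned (here, trivially) on the text latent."""
    return 0.1 * z + 0.01 * cond.mean()

# Forward diffusion: partially noise the coarse latent (img2img-style).
noise = rng.normal(size=latent_dim)
z = 0.7 * coarse_latent + 0.3 * noise

# Reverse diffusion: repeatedly subtract the predicted noise.
for t in reversed(range(n_steps)):
    z = z - dummy_unet(z, t, text_latent)

# z would now be handed to the autoencoder's decoder for the final image.
final_latent = z
print(final_latent.shape)
```

In the real system, the U-Net and autoencoder come from a pretrained Stable Diffusion model and are used as-is, which is what lets the method avoid any training or fine-tuning of the large network.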

In the future, this work could play a crucial role in improving the computer vision technologies that we increasingly rely on for a wide range of applications, from facial recognition technology to self-driving cars. Perhaps these systems will become more efficient and resilient in the face of unexpected data, as the human visual system is. More importantly, women may finally be able to get an adequate answer from their husbands when they ask them what they are thinking about.

But the idea of reconstructing the private visual experiences of an individual also comes with some serious implications for privacy. It is not difficult to imagine serious abuses of a perfected version of this technology. It is an important technological advancement, but should also serve as a reminder that we need to watch where we are going.

Nick Bild
R&D, creativity, and building the next big thing you never knew you wanted are my specialties.