This project's goal was to build a magic-mirror-style device whose display renders Stable Diffusion-modified images back to the user. This is accomplished by having a Jetson AGX Orin run an instance of ComfyUI for generating Stable Diffusion XL rendered images. A script on the device has a thread that updates the rendered image based on a captured frame and the current prompt. The prompt can also be updated at will via a recorded voice message.
To capture the recording, a Xiao ESP32S3 (from Seeed Studio) is used to trigger an API endpoint surfaced by the script.
Mirror Physical Body
For the mirror we opted to reuse the mirror we created for an earlier Hackster project, the Jetson MagnaMirror. It has an acrylic sheet that lets it act as a mirror when the screen is off, but as a display (with slightly darkened visuals) when the screen behind it is showing an image.
I've gone ahead and added the steps here from the original article to make it easier to follow along with the setup. In general it's fairly straightforward; you'll need to: 1) open the shadow box, 2) remove the bevel holding the clear acrylic, 3) cut black foam core for the area surrounding the monitor, 4) insert foam bumpers, 5) cut and place the back piece, 6) add additional bumpers as offsets, and 7) use a cord to ensure the back is secure.
With that in place you have a mirror with the monitor hidden inside of it. To someone entering your house it normally looks like a generic wall mirror, but once you turn it on you can see that it's actually a working screen.
Components
The two major components of this device are the Jetson AGX Orin Developer Kit, acting as the server and machine learning hub, and the Xiao ESP32S3 Sense, a powerful yet very tiny microcontroller from Seeed Studio. While both are rather small in size, together they act as quite the combination, delivering fast renders to the mirror screen.
Jetson AGX Orin:
- Running ComfyUI as a server in a background process
- Runs the main script providing API endpoints, communicating with ComfyUI using a JSON payload, and updating the display
- API endpoints include one for updating the underlying image being transformed and one for processing voice content
- Provides power to the Xiao ESP32S3 via a USB cord
- Has the ability to output to a display for rendering
- Has a Logitech webcam attached for taking photos
- Plenty of USB inputs for keyboard / mouse as needed during development
Xiao ESP32S3 Sense:
- Has an ESP32S3 as the onboard SoC
- Small form factor / Xiao sizing
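The component list above mentions the API endpoints the main script exposes for image and voice updates. As a minimal sketch of what such a server could look like in Flask (the route names, payload handling, and port here are my assumptions for illustration, not the project's actual routes):

```python
# Hypothetical sketch of the script's Flask API surface. Route names and
# payload fields are assumptions; the real endpoints live in the project repo.
from flask import Flask, request, jsonify

app = Flask(__name__)

current_prompt = "a watercolor painting"  # placeholder default prompt


@app.route("/status")
def status():
    # The Xiao polls an endpoint like this to confirm the server is up.
    return jsonify(ok=True)


@app.route("/image", methods=["POST"])
def image():
    # Receive a captured frame (raw JPEG bytes) to feed into the workflow.
    frame = request.get_data()
    return jsonify(received=len(frame))


@app.route("/audio", methods=["POST"])
def audio():
    # Receive a voice recording; the real script transcribes it and swaps
    # the transcription in as the new prompt.
    recording = request.get_data()
    return jsonify(received=len(recording))


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```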
Jetson AGX Orin - Setup:
- Install Jetpack (I used JetPack 6.2.1) following guides (I used the Jetson SDK Manager)
- Install and setup an NVME SSD for additional storage space (one guide I found may be helpful here) as otherwise you may run out of storage downloading models
- On the SSD make a folder as your workspace to store everything, upload the code here to that folder
Jetson AGX Orin - ComfyUI:
- Inside of that folder, fetch the ComfyUI code from their git repository
- Inside of the ComfyUI folder create a Python virtual environment (the sh file expects one named venv, so unless you change it there, follow this): `python3 -m venv venv`
- You can now activate that venv from inside your ComfyUI folder via `source venv/bin/activate`
- Once activated, install PyTorch: `pip install torch torchvision torchaudio --index-url https://pypi.jetson-ai-lab.io/jp6/cu126`
- Next install the requirements from ComfyUI: `pip install -r requirements.txt`
- With that set you should be able to run ComfyUI inside your virtual environment via `python main.py`
- If you'd like to test this over SSH, ensure you are using the flag `-L localhost:8188:localhost:8188` to forward ComfyUI's port to your connecting machine
- The next step is to prepare ComfyUI for use with this workflow, so you'll need to install some model files and some custom nodes.
Jetson AGX Orin - ComfyUI - Custom Nodes:
- ComfyUI Manager - makes it easier to install nodes
- ComfyUI ControlNet Aux - used for ControlNet nodes
- ComfyUI Tooling Nodes - provides API based nodes such as loading base64 image and sending images over a websocket
Additionally, I made a small change to my local version of the LoadBase64Image node from ComfyUI Tooling Nodes, adding this logic:

```python
@classmethod
def IS_CHANGED(s):
    return float("NaN")
```

IIRC the reason I added this was that, before the change, the websocket was only sent an image if it was a new one, making the development process frustrating as duplicate images would not go through the full flow. This likely is not needed for normal use, but I'm documenting it here.
Jetson AGX Orin - ComfyUI - Model Files:
- SDXL - should be placed in the ComfyUI/models/checkpoints folder
- Hyper SDXL 2 Steps LoRA - should be placed in the ComfyUI/models/loras folder
- ControlNet++ SDXL - should be placed in the ComfyUI/models/controlnet folder
Note: DepthAnythingV2S should automatically download from the ControlNet nodes on your first use of this workflow. You do not need to download it yourself ahead of time, but be prepared for a slower first render and the need for internet connectivity initially.
Jetson AGX Orin - ComfyUI - Testing:
Attached to this project is the JSON I used for testing. It's very similar to the final JSON used by the script, although that one drops the preview image and load image nodes in favor of purely API-based nodes. This one can be used for demoing the flow I used, though, and for making modifications if you can come up with a better approach for isolating the target.
Once you've downloaded the demo JSON file, you can open it in ComfyUI via the top left menu -> Open.
The workflow is fairly simple, although it looks complicated at first glance. It loads the base model, applies the Hyper SDXL 2-step LoRA, and then uses the positive and negative prompts along with the ControlNet to shape the conditioning. The output from the ControlNet passes into a KSampler node, which handles the generation, and then into the VAE decode to create the final image. For the demo the load image node is used, while the API version uses the base64 image node, which avoids needing to upload the image to the machine first. For generation, the model takes the input image, VAE encodes it, and uses it as the latent image. To further control the generation process, DepthAnythingV2S creates a depth map which is used as the input image for the ControlNet. The depth map is additionally put through a threshold check to remove the background, then shrunk, feathered, inverted, and used as a latent mask to prevent noise from affecting the underlying image. This was my approach for modifying the background while keeping the core subject mostly the same. Given the shrinking and feathering you can still do fun things like turn a dog into a bear or give yourself some headwear, but it maintains the feel of the user's look rather than a completely new version of yourself staring back.
Jetson AGX Orin - ComfyUI - Notes:
- SDXL and Hyper SDXL were used here for speed; my goal was a very fast render on the Jetson AGX Orin. With these two combined and only 2 steps involved, I see renders of around 2s. This allows me to rapidly update the screen and give a better user experience. I'm a big fan of Flux Dev/Schnell and other models, but their speed is just too slow for this sort of project. SD 1.5 would have been faster, but I find the larger image size better given I'm using a monitor screen and don't want it to become blurry. It was a tradeoff.
- ComfyUI has made the process of iterating on the API a lot easier for me in general than Stable Diffusion Web UI, which I used with the Infinite Sands project. That other UI can work as well with the API extension, but ComfyUI just feels a lot more natural when layering effects.
- The node structure used here may be confusing at first, but once you've worked with ComfyUI a bit you'll get the hang of it, as it's really about thinking iteratively through the refinement process and the direction of the data through the nodes. For example, one benefit is that you can hand off to another model for additional refinement, allowing complex processes that would be a lot harder to support in some other UIs without custom code.
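For a feel of what driving ComfyUI from a script looks like, here is a hedged sketch of queueing a workflow against its HTTP API. The `/prompt` endpoint is ComfyUI's documented queue route, but the node ID and helper names below are placeholders; the project's real payload lives in infinite-mirror-api.json:

```python
# Sketch of queueing a workflow via ComfyUI's HTTP API. Node IDs and helper
# names are placeholders; the real payload is infinite-mirror-api.json.
import base64
import json
import urllib.request

COMFY_URL = "http://127.0.0.1:8188"


def queue_workflow(workflow: dict) -> dict:
    """POST a workflow to ComfyUI's /prompt endpoint and return its reply."""
    data = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        COMFY_URL + "/prompt", data=data,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def set_base64_image(workflow: dict, node_id: str, image_bytes: bytes) -> None:
    """Drop a captured frame into a Load Image (Base64) node's input."""
    workflow[node_id]["inputs"]["image"] = base64.b64encode(image_bytes).decode()
```

Feeding the camera frame in as base64 is what lets the script skip ComfyUI's separate image-upload step entirely.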
Seeed Xiao ESP32S3 Sense:
The Xiao line of microcontrollers happens to be my favorite so I was happy to integrate it as a component used by this project. They are super tiny and the team at Seeed Studio has been great about documenting various use cases, example projects, and have spent a lot of time thinking about end users and how they'd integrate with their projects. Many of my custom boards (Hive Helper, Glowbug Mini, Level-Up Board) use both a Xiao as the microcontroller board along with JST-HY 2.0/Grove connectors as they make the whole process of building out a hobby project easy.
For this project I'm mostly using the board without modifications. I am using a button and a buzzer from Seeed, as well as a Grove Shield to make the connections secure. The Grove Shield is nice as it has the headers to connect to the Xiao-sized pins and then multiple Grove connectors for use with those pins, without needing to solder or worry about loose headers. Additionally I am using double-sided sticky tack (not sure of the real name) to fix the Grove Shield below the mirror box. One could mount it above, but you may need some longer USB cables for powering your device. For the button I'm using the port labeled "0, 1, 3V3, GND" on the Grove Shield, which corresponds with A0 and A1 on the board. For the buzzer I'm using the port labeled "SCL - 5, SDA - 4, 3V3, GND" on the shield, but only utilizing pin A5 as the Grove buzzer uses only three wires.
Seeed Xiao ESP32S3 Sense - Code:
The code for this project is in the attached GitHub repository in the associated xiao-nrf52840-sense folder. It's a PlatformIO project. The project expects a src/secrets.h file to be constructed with a structure like this (with values according to your specific network and the internal IP address associated with your Jetson):
```cpp
#pragma once
#define WIFI_SSID "SSID"
#define WIFI_PASS "Password"
#define JETSON_SERVER "192.168.1.189"
```

Once that has been built and the firmware has been deployed, the device will try once every minute to connect to the status endpoint on the API the Jetson AGX Orin is running. After a connection is made successfully, the device will continue by trying to send a photo to it over the network. If the user presses the button, the buzzer plays a tone indicating it's listening, allowing the user to then record their message. Once the user presses the button again, another signal is played via the buzzer and the device submits its sound recording to the Jetson's API endpoint for processing with Whisper and updating the prompt.
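On the Jetson side, the handling of that recording can be sketched roughly as: save the uploaded bytes, transcribe them, and adopt the text as the new prompt. The function names and the model size here are my assumptions; only the `whisper.load_model`/`transcribe` calls follow the openai-whisper library's documented usage:

```python
# Hedged sketch of server-side handling of the Xiao's voice recording.
# Function names and the "base" model choice are assumptions.
import tempfile


def save_recording(data: bytes, suffix: str = ".wav") -> str:
    """Write the uploaded audio bytes to a temp file and return its path."""
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as f:
        f.write(data)
        return f.name


def transcribe_to_prompt(path: str) -> str:
    """Transcribe the recording; the text becomes the new rendering prompt."""
    import whisper  # heavyweight import, deferred until actually needed
    model = whisper.load_model("base")  # model size is an assumption
    result = model.transcribe(path)
    return result["text"].strip()
```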
Running the Code
The final step after preparing the Jetson AGX Orin and Xiao ESP32S3 Sense firmware is to run the server logic on the Jetson. I set up this logic such that I'd be able to set prompts manually as well as via the aforementioned API-based approach, so having the terminal set up for the connection allows me to do that easily.
- Go into the code location (the parent of the ComfyUI directory) and create a venv there like you did earlier with ComfyUI, just at the higher level: `python3 -m venv venv`
- Every time you use this project you'll need to activate it, so do so now: `source venv/bin/activate`
- Install the requirements below (I believe I captured them all here, but feel free to reach out to me if you run into any issues or find you're missing a requirement and I can assist):
sudo apt-get install libgtk2.0-dev
sudo apt-get install portaudio19-dev
sudo apt install screen
sudo apt install git
sudo apt install ffmpeg
pip install soundfile
pip install screeninfo
pip install websocket-client
pip install flask
pip install openai-whisper sounddevice numpy
pip install --pre torch torchvision torchaudio --index-url https://pypi.jetson-ai-lab.io/jp6/cu126
pip install transformers
pip install opencv-python-headless
With the requirements in place you can run the scripts now.
interactive-table.sh - Used to start ComfyUI; it assumes there's a venv virtual environment inside of the ComfyUI folder
infinite-mirror.py - Used to start the main server logic and handle all of the prompting to the ComfyUI server
infinite-mirror-api.json - JSON payload for the ComfyUI request; includes the nodes needed for input via a base64 image and for outputting an image via a websocket
To start a session you begin by SSHing into the Jetson, switch to the interactive-table folder, and then:
nohup ./interactive-mirror.sh &
screen -S display_session
source venv/bin/activate
export DISPLAY=:1
python3 interactive-mirror.py
nohup with the shell script and the ampersand allows the script to run in the background. This shell script starts ComfyUI, so it effectively prepares that to run for all requests needed by the project.
Screen is then used, the virtual env is activated, display 1 is selected (for the Jetson monitor), and the script for the API is started. This completes the setup: once the script is running, its API endpoints are available and the Xiao ESP32S3 can connect, beginning the display process.
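Once the session is up, the finished render reaches the script back over ComfyUI's websocket rather than a plain HTTP response. A rough sketch of that receive loop follows; the 8-byte binary header skip reflects ComfyUI's binary preview framing, but treat that, along with the host and client_id placeholders, as assumptions to verify against your ComfyUI version:

```python
# Sketch of receiving a rendered frame over ComfyUI's websocket output.
# The 8-byte header (event type + image format) is an assumption about
# ComfyUI's binary framing; host and client_id are placeholders.

def strip_binary_header(frame: bytes) -> bytes:
    """Drop the assumed 8-byte binary header, leaving the image bytes."""
    return frame[8:]


def receive_image(host: str = "127.0.0.1:8188", client_id: str = "mirror") -> bytes:
    import websocket  # from the websocket-client package
    ws = websocket.WebSocket()
    ws.connect(f"ws://{host}/ws?clientId={client_id}")
    try:
        while True:
            frame = ws.recv()
            if isinstance(frame, bytes):  # binary frame = rendered image
                return strip_binary_header(frame)
            # text frames carry status/progress JSON; keep waiting
    finally:
        ws.close()
```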
Journey / Background Information
My initial approach for this project was very different from the end result. I had hoped to repurpose some logic from my Infinite Sands project and create a unique tabletop for generating images based on the content placed on the table. This had a major blocker though, as the reflective surface proved to cause problems with the depth map generation, since images from the ceiling of the room were visible in it. After that I tried reusing the clear acrylic from the shadow box as I still had it around, but that proved to also cause reflections, albeit a bit less so. The final issue with my attempt came down to how it behaved in practice: while sand rests flat against the background, here I was trying to place 3D objects, so the generated image included the top view of those objects rendered under those objects. You can imagine having a monster truck on the table and under its image seeing the car from most angles as a result; it just didn't look very good.
As such I felt the need to seriously reconsider or even scrap the project. That said, after some thinking, I realized that the core of this setup would still work well if I used it in a mounted manner and abandoned the idea of having the camera synced to the surface, as was done with the sand project and my initial idea here. I had to significantly alter my Python code, but it mostly was just removing functionality like the ChArUco board and the need to clear the content each render.
Acknowledgements / License Information
ComfyUI - GNU General Public License v3.0
SDXL - CreativeML Open RAIL++-M License
Hyper SDXL - Bytedance Inc. License
DepthAnything-V2-Small - Apache license 2.0
ControlNet++ for SDXL - Apache license 2.0
Fullscreen - MIT License
@article{depth_anything_v2,
title={Depth Anything V2},
author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
journal={arXiv:2406.09414},
year={2024}
}