My five-year-old is just starting to learn to spell, using letters to form words and sounding them out. She is also learning Spanish! I thought it would be fun to combine all of that with my computer vision learning path!
So I created Spelling Challenge for Kids! The video below shows a quick demo:
I set up an initial bank of 6 words for my daughter to solve, but the code can be expanded to more.
Here's an overall summary of how I went about doing this:
1. Find a trained model that identifies written English letters, similar to the MNIST model for numerical digits.
2. I couldn't find such a model (D'oh!), so I had to build my own. Crack open Colab!
3. Download the Kaggle A-Z dataset so I at least had a dataset to start with.
4. Develop, train, and save the model as a TensorFlow Lite model (see the Jupyter notebook on GitHub). I also tried quantizing the model, but the TFLite model was already small enough.
5. Start coding on my Jetson Nano Development Kit (with RPi camera and audio hat).
6. Lots of trial and error testing!
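The training-and-export step above can be sketched roughly as follows. This is a minimal illustration, not my actual notebook: the architecture, the random stand-in data, and the file name `az_model.tflite` are all placeholders (the real code loads the Kaggle A-Z dataset).

```python
# Sketch: train a small A-Z letter classifier and export it as TensorFlow Lite.
# The dataset here is random stand-in data; the real project uses the Kaggle
# A-Z handwritten-letter dataset (28x28 grayscale images).
import numpy as np
import tensorflow as tf

NUM_CLASSES = 26  # letters A-Z

# Placeholder data in the same shape as the A-Z dataset images.
x_train = np.random.rand(64, 28, 28, 1).astype("float32")
y_train = np.random.randint(0, NUM_CLASSES, size=64)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(x_train, y_train, epochs=1, verbose=0)

# Convert the trained Keras model to a TFLite flatbuffer for the Jetson Nano.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
with open("az_model.tflite", "wb") as f:
    f.write(tflite_model)
```

Quantization would be one extra line on the converter (`converter.optimizations = [tf.lite.Optimize.DEFAULT]`), but as noted, the unquantized model was already small enough for this project.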
Please take a look at my GitHub for the full source code. This was a lot of fun because I usually do transfer learning on pre-trained models; this was my first time building a new model from scratch in Colab.
The time I spent coding was a good exercise in making sure the input data has the same dimensions (and the same format) as the data the model was trained on.
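For example, if the model expects a batched `(1, 28, 28, 1)` float32 tensor (an assumption here; on a real TFLite model you would query `interpreter.get_input_details()` to confirm), a cropped grayscale frame has to be reshaped and cast before inference:

```python
import numpy as np

# A 28x28 grayscale letter crop, as it might come out of preprocessing.
letter_28x28 = np.zeros((28, 28), dtype=np.uint8)

# Match the assumed model input: float32, scaled to [0, 1], batched with
# a trailing channel dimension: (28, 28) -> (1, 28, 28, 1).
x = letter_28x28.astype("float32") / 255.0
x = np.expand_dims(x, axis=(0, -1))

assert x.shape == (1, 28, 28, 1) and x.dtype == np.float32
```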
My initial model runs were not impressive. It turned out the training data assumed white text on a black background, while I was passing in dark text on a white background. I had to invert the colors of the image before passing it into the model. Also be sure you are scaling!
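The inversion and scaling amount to two lines. A minimal sketch with a toy 2x2 crop standing in for the camera image (with OpenCV, `cv2.bitwise_not` does the same inversion):

```python
import numpy as np

# Toy stand-in for a grayscale camera crop: dark ink on bright paper.
gray = np.array([[250, 20],
                 [240, 30]], dtype=np.uint8)

# Invert so the letter is bright on a dark background, matching the
# training data (equivalent to cv2.bitwise_not(gray)).
inverted = 255 - gray

# Scale to [0, 1] to match the scaling used at training time.
scaled = inverted.astype("float32") / 255.0
```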
I further improved the model's performance by filtering the pixel values of the input image. If a pixel's value was below a certain intensity (I picked 160), it was likely background rather than part of the letter, so I set it to 0 (black), producing strong contrast between the letter and the background.
Next Steps
So now that I have it working pretty well, I would like to expand the dataset to include more objects/animals for my child to spell out. I would also like to make it more portable. Right now my Jetson Nano is hooked up to my monitor via HDMI. I would either get a small HDMI display, or port the project back to my Raspberry Pi 4 (I started on the RPi, but moved over to the Jetson Nano) and potentially use the Adafruit machine learning HAT that I have (which comes with a small display). Also, as you can see, I am not a UI/UX person; I cobble something together to get it working. If you have any suggestions on how to create nice UIs in OpenCV, please let me know!
If you have any ideas or questions, feel free to reach out! Thanks!