Published February 22, 2020 © CC BY

speakEZ

speakEZ uses USB MIDI and the i.MX RT1010 to enable vocoder solutions with polyphony in low-cost, small form factor devices.

Beginner | Full instructions provided | 1 hour | 1,886 views

Grand Prizes: 1st Place

Crossover Code Challenge

Things used in this project

Hardware components

100W, Hi-Z Dedicated Input

NXP MIMXRT1010-EVK Evaluation Board

USB Micro-B Male to USB A Female Adapter

NXP FRDM Board

3.5mm TRS Headphones

Any consumer headphones with a 3.5mm jack will do. One of my pairs, which has an integrated mic, sounded muffled unless its talk button was held down. Another set was fine. YMMV. Avoid headphones that require an amplifier.

USB MIDI Keyboard and Cable

I used an Arturia Keystep sequencer during development and for the video demos. Also tested a Nektar Panorama T6. I recommend trying anything you have!

Story

I love making music. But gear often gets big, expensive, opaque, and functionally limited. The i.MX RT series gives makers and manufacturers opportunities to craft their own flexible synthesizer applications into small, inexpensive, and power-efficient devices.

With speakEZ, a MIDI controller, a $4 USB adapter, and the MIMXRT1010-EVK, you have the building blocks for an interactive vocoder synth application that will empower creative musical techniques. This only scratches the surface of the musical and DSP applications for these MCUs.

For those without a MIDI keyboard/controller, have no fear! Hold the User Button (SW4) when you reset the EVK, and you can play through several prebuilt chords to try out the sound of "robot you". Change chords by pressing the same switch.

speakEZ demo on YouTube

speakEZ includes a bare-bones MIDI driver built on the framework of a USB host CDC example. Pending updates will include MIDI 2.0 class compliance. The standard has just been adopted by the MIDI Manufacturers Association. Future work is laid out in the "Limitations" section.

[UPDATE 11/13/2020: As I lack hardware to test MIDI 2.0, I'm deferring further work on this driver for the foreseeable future. Definitely leave a comment if you have tried such devices and think it's valuable for the open source community!]

Features Summary:

Wavetable synthesizer framework (selectable tables, full polyphony)
Headphone jack for audio out (3.5mm TRS)
Built-in microphone with vocoder modulation
USB OTG connection for USB MIDI controllers

i.MX RT1010 Peripherals Used:

LPI2C (1x)
SAI (1x)
LPUART (1x)
GPIOs (2x)
PIT (periodic interrupt timer) (1x)
USB OTG as Host (1x)

Terminology

Vocoder - A “voice coder” encodes human voice into frequency spectra for compression, transformation or encryption. In music, this transformed data can be used to modulate a carrier waveform with spoken formants, which makes it sound as though it is "talking".

Formants - The frequency components in spoken voice that distinguish vowel sounds. Ignoring the fundamental pitch of a word, the first three or four loudest frequency peaks are what distinguish an "uh" sound from an "æ" or a "u", etc.

MIDI - An abbreviation for “Musical Instrument Digital Interface”, which is a hardware and communication standard for exchanging musical performance data (note events, velocities, control changes) between electronic instruments. It describes what to play and carries no audio itself. The USB interpretation of MIDI is its own beast.

Polyphony - Used here to describe sound generation with multiple simultaneous pitches and velocities. This translates to piano chords, a singing chorus, etc.

Sibilance - High-frequency voice content useful for identifying consonants (s, t, f, etc.) in words and for identifying unique human voices. This energy is captured poorly by vocoders.

Under The Hood

First, some more background. A musical vocoder listens to an audio input, usually a human voice, and analyzes the frequency content. It measures the sound intensity at various frequency bands, forming a power envelope. This envelope essentially maps the formants that make each spoken phoneme sound the way it does. In other words, this is the skeleton of the vowel sounds we use to talk.

This envelope is used to scale band-pass-filtered audio in real time. In this case, the band-passed carrier is a custom wavetable. Scaling these bands transfers the voice onto any audio you want, including polyphonic sounds controlled by MIDI! This process isn’t good at transferring sibilance. For that, we need to extract higher-frequency sounds from our mic and mix them in with the vocoded audio directly.
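The band-scaling step can be sketched in a few lines. This is an illustration under stated assumptions, not the speakEZ implementation: it assumes the voice and carrier have already been split into matching bands, and it uses a simple one-pole attack/release envelope follower. The EnvFollower struct and both function names are hypothetical.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical one-pole envelope follower for a single analysis band. */
typedef struct {
    float level;    /* current envelope estimate */
    float attack;   /* smoothing factor (0..1) used while the level rises */
    float release;  /* smoothing factor (0..1) used while the level falls */
} EnvFollower;

/* Rectify the band sample and move the envelope part of the way toward it. */
static float env_update(EnvFollower *e, float band_sample)
{
    float mag   = fabsf(band_sample);
    float coeff = (mag > e->level) ? e->attack : e->release;
    e->level += coeff * (mag - e->level);
    return e->level;
}

/* One vocoder output sample: scale each carrier band by the envelope of the
 * matching voice band, then sum all bands. */
static float vocode_sample(EnvFollower *env, const float *voice_bands,
                           const float *carrier_bands, size_t num_bands)
{
    float out = 0.0f;
    for (size_t i = 0; i < num_bands; ++i) {
        out += env_update(&env[i], voice_bands[i]) * carrier_bands[i];
    }
    return out;
}
```

Small attack and release factors give a smoother but slower envelope. Larger factors track the voice more closely at the cost of a rougher sound.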

With speakEZ on the MIMXRT1010-EVK, we use a WM8960 CODEC to receive voice audio and transmit our transformed synth audio.

Our synth is a wavetable synthesizer that is updated once per CODEC sample. It uses a pre-initialized array of points that represent a single cycle of our chosen waveform. Each time the synth is updated, it interpolates the appropriate points in the waveform depending on the sample rate and the desired note(s). This is how an array of fixed values can generate any frequency wave of any shape! The synthesizer sums any number of different keys at their assigned velocity (amplitude). And that's what makes it polyphonic.
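Here is a minimal sketch of that idea: a phase accumulator steps through the table, linear interpolation fills in between points, and the voices are summed. It is not the speakEZ source. The table size, the Voice struct, and the function names are assumptions made for this example.

```c
#include <stddef.h>

#define TABLE_SIZE 256  /* assumed table length, one waveform cycle */

/* Hypothetical per-key state for a polyphonic wavetable synth. */
typedef struct {
    float phase;      /* read position in the table, 0 <= phase < TABLE_SIZE */
    float increment;  /* table steps advanced per output sample */
    float amplitude;  /* 0 = silent; otherwise velocity scaled to 0..1 */
} Voice;

/* Fill the table with one cycle of a naive rising saw in [-1, 1). */
static void table_init_saw(float *table)
{
    for (size_t i = 0; i < TABLE_SIZE; ++i) {
        table[i] = 2.0f * (float)i / (float)TABLE_SIZE - 1.0f;
    }
}

/* Start a voice. freq_hz must be below sample_rate_hz. */
static void voice_start(Voice *v, float freq_hz, float sample_rate_hz, float amplitude)
{
    v->phase     = 0.0f;
    v->increment = freq_hz * (float)TABLE_SIZE / sample_rate_hz;
    v->amplitude = amplitude;
}

/* Produce one output sample by summing every active voice. */
static float synth_next_sample(Voice *voices, size_t num_voices, const float *table)
{
    float mix = 0.0f;
    for (size_t i = 0; i < num_voices; ++i) {
        Voice *v = &voices[i];
        if (v->amplitude <= 0.0f) {
            continue;  /* skip silent voices */
        }
        size_t i0   = (size_t)v->phase;
        size_t i1   = (i0 + 1) % TABLE_SIZE;  /* wrap at the end of the cycle */
        float  frac = v->phase - (float)i0;
        mix += v->amplitude * (table[i0] + frac * (table[i1] - table[i0]));

        v->phase += v->increment;
        while (v->phase >= (float)TABLE_SIZE) {
            v->phase -= (float)TABLE_SIZE;
        }
    }
    return mix;
}
```

The increment is the only thing that changes between notes, which is why one fixed table can produce any pitch. With several loud voices the sum can exceed the -1 to 1 range, so a real design would scale or limit the mix before sending it to the CODEC.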

Lastly, we are using MIDI-over-USB. The USB OTG peripheral is configured as a host to accept signals from our MIDI controller (device). Because a MIDI host traditionally has a full-size USB-A female port, we need our adapter to interface with our OTG jack. (Of course, if you have a cable with the appropriate ends, an adapter is not needed.) A custom driver is used to receive and parse the bulk packets for MIDI. This is a simplified explanation. I encourage you to explore the usbmidi.h and usbmidi.c files provided.

Instructions for Use

WARNING: This software has the potential to produce harmful sounds if amplified. Use caution when inserting headphones and powering on the device. Loud sounds into the microphone and clipping distortion can cause additional noise. For your safety, monitor your exposure to loud sounds.

A little voice goes a long way. Listen to your room acoustics to place the microphone and determine the appropriate speaking volume to achieve the desired effect.

First, make sure you have your MIMXRT1010-EVK board (with its USB cable) and a pair of headphones available. Download the most up-to-date commit archive at https://github.com/wandering-sounds/speakEZ/archive/master.zip.

1. Plug your EVK into your PC. The Micro-B end should go to the debug USB port (J41).
2. Open MCUXpresso IDE. Select "Import project(s) from file system..." in the Quickstart Panel.
3. Under "Project archive (zip)", click the "Browse..." button. Locate and select the archive ZIP you downloaded.
4. Click "Next" and then "Finish". The speakEZ project should appear in Project Explorer.
5. Click on the project folder. Then, click on the GUI Flash Tool in the top bar.
6. A window should display the debug probe for the EVK. Select "OK".
7. Under "Target Operations" > "Program" > "Options", find "File to program". Click on the "Workspace..." button.
8. Find and double-click on ../Release/speakEZ.axf. Click "Run" to flash the chip.

Connect your equipment as shown in the diagram below, then press the Reset Switch:

MIMXRT1010-EVK Connection Diagram for speakEZ

To try several demo chords without MIDI control, press and hold the User Button (SW4) while you press the POR Pin Reset (SW9) to restart the MCU. This will enter the no-MIDI demo. In this mode, pressing SW4 again will toggle the active chord being played. Any USB device connected to the OTG connector will be ignored. If you wish to exit this mode, press the Reset Switch by itself to enable the regular mode.

Connect your MIDI controller's USB cable to the adapter, and then attach the adapter to the USB OTG connector (J9). Pressing a note on the keyboard will play the corresponding note with the speakEZ wavetable synthesizer. Pressing multiple keys will play multiple notes simultaneously. Pressing SW4 in this mode will toggle the wavetable being used. The tables are:

Saw (default, standard vocoder sound)
Novel Waveform (harsh, gritty, good for single notes)
Sine (nearly silent, for demonstration purposes)
Triangle (slightly thicker than the sine wave)

Try out different methods of interacting with the vocoder. Try speaking into it, then singing into it. Try overemphasizing your vowels. Whispers can have a neat effect.

If you are willing to do some coding, try editing the band pass center frequencies (bandpassBiquadF0), bandwidths (analysisBiquadBWs, shapingBiquadBWs), and sibilance cutoff (kResample_Sibilance_HP) in speakEZ.h. These can all be changed to suit your needs. I used logarithmic spacing for the analysis/shaping bands. What if you made the bands denser below 1kHz? What if you added more bands? I found ~20 bands started causing performance bottlenecks.

A saw wave produces traditional sounds from vocoders, which is why it is the default tone. I have added one more unique wavetable shape to test. Why not add some more to the table bank? Rich tones full of harmonics create the most interesting sounds with vocoders.

Limitations

Even a simple saw wave has very high frequency components. These can alias on relatively low notes and result in unwanted harsh sounds. The wavetables used in any consumer or professional product should be band-limited to prevent this, either at initialization or on the fly.
Although this application works with MIDI devices, it does not incorporate control updates or modulation. MIDI is capable of so much more! When MIDI 2.0 is released this year, I intend to build a full driver for my own use and update the project repository accordingly. [Please see the related update in the introductory section.]
The vocoder should have an efficient way of normalizing the envelope follower to prevent peaking. Maybe smoothing a transient max with a cheap ADSR. I haven't landed on an algorithm yet, but that's in the cards.
Adaptations of this logic in solid professional products would benefit from line-level audio outputs. Unused peripherals on this EVK should be removed, and higher-grade components can be substituted where the added cost is justified.
There are plenty more GPIO and analog inputs available on unused pins. The possibilities with a custom PCB are numerous. Why not add potentiometer control, a quadrature encoder, or external control voltages? The core structs and signal chain with speakEZ could be augmented with additional control methods, so experiment away!

License

The contents of this project are licensed under the 3-Clause BSD license, shown below:

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.