What Color Are Your Vowels?

One of the most exciting trends in technology is the ability to hook up our sensory and motor systems in splendid new ways. One example is Soundself, a virtual reality “technodelic” that puts you into an audiovisual feedback loop with your own voice. Soundself promised to turn your voice into a psychedelic experience, and I wrote a review explaining my experience with it and why I don’t think it quite lived up to the promise.

In this post I introduce a little demo I made that I call “What Color Are Your Vowels?”, and my hope is that it can illustrate what I think is possible with technodelics. Without further ado, you can try the demo out here. I have only tested it on macOS (Safari/Chrome/Firefox), Windows 10 (Internet Explorer/Firefox), and iOS (Safari), so no promises it’ll work elsewhere. You start with a calibration step: sustain three vowels, then capture the background noise level (this works best in total silence). After that you’re free to make any other sounds you want and see what color they make.

How it Works

The color space that humans see is three-dimensional because we have three types of cones in our retinas. If you fix brightness, however, then color space is two-dimensional, and you can call it chromaticity space.

Fig 1. Chromaticity space is two-dimensional because it fits on an XY plane (Wikipedia)
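To make "fixing brightness" concrete, here is a minimal sketch that collapses a three-number color into two numbers by dividing out its total intensity. It uses simple RGB chromaticity rather than the CIE coordinates that chromaticity diagrams are usually drawn in, and the function name rgbToChromaticity is hypothetical, not something from the demo.

```javascript
// Hypothetical helper: project an [R, G, B] triple onto a 2D chromaticity plane.
// Dividing by the total intensity removes the brightness dimension, so two
// colors that differ only in brightness land on the same point.
function rgbToChromaticity([r, g, b]) {
  const total = r + g + b;
  // Black has no defined chromaticity; this sketch returns the neutral point.
  if (total === 0) return [1 / 3, 1 / 3];
  return [r / total, g / total]; // the third coordinate is 1 - x - y, so it is redundant
}

const dim = rgbToChromaticity([10, 20, 30]);
const bright = rgbToChromaticity([20, 40, 60]);
console.log(dim, bright); // both print the same pair of coordinates
```

The third coordinate is always determined by the other two, which is why two numbers are enough once brightness is out of the picture.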

The dimensionality of vowel space has more caveats, but it’s more or less two-dimensional. Acoustically, the two dimensions correspond to the frequencies of the first and second formants in the frequency spectrum. Anatomically, they correspond to how high or low and how far back or forward the tongue is.

Fig 2. Vowel space is also two-dimensional (Wikipedia)

What I do with my demo is overlay these two spaces, mapping each point in vowel space to a point in chromaticity space.

It took me ~15 hours and ~200 lines of JavaScript to make this demo. This is the kind of thing that I wanted from Soundself. Take my voice, do some linguistically-aware processing, and turn it into compelling visuals that represent it faithfully even in a radically different medium. To do this well you need to have some idea of the parameters that are generated in speech (formants, aspiration, sibilation, rhoticity, etc.), some of the parameters that are used in sound visualizers (symmetry, repetition, color, shape, motion, etc.), and an artistic flourish in mapping the former set of parameters to the latter. There is a lot of potential here to make something truly splendid!

[Footnote] A slightly more technical “How it Works”: I record frequency spectra for your /u/, /a/, and /i/ phonemes, and then take the dot product of each future spectrum with those three recordings to get the R, G, and B levels, respectively. Feel free to dive into the source code, also on my GitHub.
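The dot-product idea in the footnote can be sketched roughly as follows. This is not the demo's actual source: the function names (dot, normalize, subtractNoise, makeVowelColorizer) are hypothetical, and the noise subtraction and unit-length normalization are assumptions I've added so that loudness alone doesn't drive the color. Spectra are assumed to be equal-length arrays of non-negative bin magnitudes.

```javascript
// Hypothetical sketch of the footnote's approach, not the demo's real code.

// Dot product of two equal-length spectra.
function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Subtract the calibrated background noise from each bin, clamping at zero.
// Array.from returns a plain array even if the input is a typed array.
function subtractNoise(spectrum, noise) {
  return Array.from(spectrum, (x, i) => Math.max(0, x - noise[i]));
}

// Scale a spectrum to unit length (assumption: we compare shape, not loudness).
function normalize(spectrum) {
  const norm = Math.sqrt(dot(spectrum, spectrum));
  return norm === 0 ? spectrum.slice() : spectrum.map((x) => x / norm);
}

// calibration: { u, a, i, noise }, each a spectrum recorded during calibration.
// Returns a function mapping a live spectrum to [R, G, B] in the 0..255 range.
function makeVowelColorizer(calibration) {
  const { noise } = calibration;
  // Template order /u/, /a/, /i/ matches R, G, B as described in the footnote.
  const templates = ['u', 'a', 'i'].map((vowel) =>
    normalize(subtractNoise(calibration[vowel], noise))
  );
  return function spectrumToRgb(spectrum) {
    const live = normalize(subtractNoise(spectrum, noise));
    return templates.map((t) => Math.round(255 * dot(live, t)));
  };
}

// Toy usage with three-bin "spectra" standing in for real recordings.
const toRgb = makeVowelColorizer({
  u: [1, 0, 0],
  a: [0, 1, 0],
  i: [0, 0, 1],
  noise: [0, 0, 0],
});
console.log(toRgb([0, 2, 0])); // a sound shaped like the /a/ recording comes out green
```

Because every spectrum is non-negative and scaled to unit length, each dot product stays between 0 and 1, which is what lets it be used directly as a color channel level.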
