Detecting pitch with the Web Audio API and autocorrelation

fxtentacle · on March 22, 2022

As someone who has worked in pitch recognition for a piano app, I'm doubtful that autocorrelation can work. Typically, the energy in overtones is much larger than the energy in the base pitch, which is also why phantom bass works so well in EDM.

Also, the plucking sound when the hammer hits the string is much closer to pink noise than to its actual pitch. I'd expect an autocorrelation to have plenty of false positives there.

FFT with remapping between bins works well. The remapping can accumulate the energy of overtones into the frequency bins of possible base pitches, thereby resolving the ambiguity.
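A rough sketch of what such a remapping could look like, here in the form usually called harmonic summation. This is not fxtentacle's implementation: the function names, the harmonic count, and the synthetic test tone are all assumptions made for illustration, and the naive DFT is only there to keep the example dependency-free.

```python
import cmath
import math

def magnitude_spectrum(samples):
    """Naive O(n^2) DFT magnitudes for bins 0..n/2 (fine for a short demo)."""
    n = len(samples)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]

def harmonic_sum(mags, num_harmonics=5):
    """Accumulate the energy found at bins k, 2k, 3k, ... into candidate bin k."""
    scores = [0.0] * len(mags)
    for k in range(1, len(mags)):
        for h in range(1, num_harmonics + 1):
            if h * k < len(mags):
                scores[k] += mags[h * k]
    return scores

# Synthetic tone (assumed): weak fundamental at bin 8, strong overtones at 16, 24, 32.
n = 256
tone = [
    0.1 * math.sin(2 * math.pi * 8 * i / n)
    + 1.0 * math.sin(2 * math.pi * 16 * i / n)
    + 0.8 * math.sin(2 * math.pi * 24 * i / n)
    + 0.6 * math.sin(2 * math.pi * 32 * i / n)
    for i in range(n)
]
mags = magnitude_spectrum(tone)
raw_peak = max(range(1, len(mags)), key=lambda k: mags[k])          # loudest single bin
scores = harmonic_sum(mags)
summed_peak = max(range(1, len(scores)), key=lambda k: scores[k])  # best base-pitch candidate
```

In this toy case the loudest bin is the first overtone, while the summed score peaks at the weak fundamental. A real detector would likely also need to weight the harmonics and guard against subharmonic candidates.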

omnicognate · on March 22, 2022

Autocorrelation is the simplest and worst pitch detection algorithm. It's highly prone to "octave errors" (not necessarily off by an actual octave, it's just the term for detecting a harmonic/subharmonic of the "true" pitch).

Pitch detection algorithms are a fascinating rabbit hole, and designing a good one for a given set of requirements is a real art.

Edit: One thing autocorrelation is quite effective for is autotune. Here, you need to snap to the nearest (12-tone equal temperament) note, and it turns out the ratio you calculate to perform that correction is unaffected by the most common octave errors. E.g., if I detect your slightly flat D4 as a slightly flat D5, the correction to get to the nearby D is the same.
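The arithmetic behind that claim can be checked with a small sketch. The helper name `snap_ratio`, the A4 = 440 Hz reference, and the 290 Hz example are assumptions for illustration only.

```python
import math

A4_HZ = 440.0  # assumed reference tuning

def snap_ratio(freq_hz):
    """Ratio to multiply freq_hz by so it lands on the nearest 12-TET note."""
    semitones = 12 * math.log2(freq_hz / A4_HZ)  # distance from A4 in semitones
    nearest = round(semitones)                   # nearest equal-tempered note
    return 2 ** ((nearest - semitones) / 12)

flat_d4 = 290.0             # D4 is about 293.66 Hz, so this is slightly flat
octave_error = 2 * flat_d4  # the same note misdetected one octave up

ratio_true = snap_ratio(flat_d4)
ratio_wrong = snap_ratio(octave_error)
corrected = flat_d4 * ratio_true
```

Doubling the frequency shifts the semitone count by exactly 12, so the fractional distance to the nearest note, and therefore the correction ratio, does not change.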

lostmsu · on March 22, 2022

Any good place to get a summary of pitch detection algorithms?

omnicognate · on March 22, 2022

Not that I know of, unfortunately. When I needed one I read a bunch of papers, tried out various open-source ones, and ended up designing one myself.

kotxig · on March 22, 2022

You can compute an autocorrelation with FFTs by applying the convolution theorem, and IIRC the Web Audio API can do the FFTs for you. I also found that the YIN estimator is a lot better as a time-domain estimator (http://audition.ens.fr/adc/pdf/2002_JASA_YIN.pdf), and some years ago I worked out how to compute that estimator with FFTs as well.
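A minimal sketch of the FFT route to autocorrelation (transform, take the power spectrum, transform back). The hand-rolled radix-2 FFT and all function names are assumptions chosen to keep the example self-contained. It does not show kotxig's FFT formulation of YIN, which is not described in the thread.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(x):
    """Inverse FFT via the conjugate trick."""
    n = len(x)
    return [v.conjugate() / n for v in fft([u.conjugate() for u in x])]

def autocorrelation_fft(samples):
    """Linear autocorrelation: inverse FFT of the power spectrum."""
    n = len(samples)
    size = 1
    while size < 2 * n:
        size *= 2
    padded = list(samples) + [0.0] * (size - n)  # zero-pad so lags do not wrap around
    power = [abs(v) ** 2 for v in fft(padded)]
    return [v.real for v in ifft(power)][:n]

def autocorrelation_direct(samples):
    """O(n^2) reference implementation, for comparison only."""
    n = len(samples)
    return [sum(samples[i] * samples[i + lag] for i in range(n - lag)) for lag in range(n)]

# Assumed test signal: a sine with a period of 16 samples.
signal = [math.sin(2 * math.pi * i / 16) for i in range(64)]
fast = autocorrelation_fft(signal)
slow = autocorrelation_direct(signal)
period = max(range(8, 25), key=lambda lag: fast[lag])  # strongest lag near one period
```

The zero-padding matters: without it the inverse transform yields a circular autocorrelation, in which late samples wrap around and correlate with early ones.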

joren- · on March 22, 2022

A JavaScript YIN implementation is available at https://github.com/peterkhayes/pitchfinder, and I have used it in a live spectrogram visualisation: https://0110.be/phd/presentation/spectrogram/live.html
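For readers who want to see the shape of the YIN estimator before reading the library source, here is a simplified time-domain sketch covering the difference function, the cumulative mean normalised difference, and the absolute threshold. It is not the pitchfinder code; the function name, the 0.1 threshold, and the test tone are assumptions, and the interpolation and best-local-estimate refinements are left out.

```python
import math

def yin_pitch(samples, sample_rate, threshold=0.1):
    """Simplified YIN: returns an estimated pitch in Hz, or None if no dip qualifies."""
    max_lag = len(samples) // 2
    window = len(samples) - max_lag

    # Squared difference between the signal and a copy shifted by tau samples.
    diff = [0.0] * (max_lag + 1)
    for tau in range(1, max_lag + 1):
        diff[tau] = sum((samples[i] - samples[i + tau]) ** 2 for i in range(window))

    # Cumulative mean normalised difference: divide by the running average so far.
    cmnd = [1.0] * (max_lag + 1)
    running = 0.0
    for tau in range(1, max_lag + 1):
        running += diff[tau]
        cmnd[tau] = diff[tau] * tau / running if running > 0 else 1.0

    # Take the first dip below the threshold and follow it down to its local minimum.
    for tau in range(2, max_lag):
        if cmnd[tau] < threshold:
            while tau + 1 <= max_lag and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return sample_rate / tau
    return None

# Assumed test tone: 100 Hz fundamental that is weaker than its 200 Hz overtone.
sample_rate = 8000
tone = [
    0.3 * math.sin(2 * math.pi * 100 * i / sample_rate)
    + 1.0 * math.sin(2 * math.pi * 200 * i / sample_rate)
    for i in range(800)
]
pitch = yin_pitch(tone, sample_rate)
```

The normalisation step is what keeps the dip at the overtone's period (lag 40 here) from being accepted: it is shallow relative to the running average, whereas the dip at the true period (lag 80) goes nearly to zero.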

dspig · on March 22, 2022

> One way to get around this would be to decrease bucket size by increasing the FFT size.

A better way is parabolic interpolation, which is in the source code but not mentioned in the article. It works for finding the fractional position of peaks in either the FFT or the autocorrelation.
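A small sketch of parabolic interpolation around a discrete peak. The function name and the sampled test curve are assumptions; the article's own source may differ in detail.

```python
def parabolic_peak(values, i):
    """Fit a parabola through values[i-1], values[i], values[i+1].

    Returns (fractional_index, interpolated_peak_value).
    """
    left, mid, right = values[i - 1], values[i], values[i + 1]
    denom = left - 2 * mid + right
    if denom == 0:  # the three points are collinear, so there is no vertex to find
        return float(i), mid
    offset = 0.5 * (left - right) / denom
    return i + offset, mid - 0.25 * (left - right) * offset

# Assumed test data: a parabola whose true peak sits between samples, at x = 5.3.
curve = [-(x - 5.3) ** 2 + 2.0 for x in range(11)]
coarse = max(range(1, len(curve) - 1), key=lambda k: curve[k])  # integer peak index
fine_index, fine_value = parabolic_peak(curve, coarse)
```

For an FFT peak, the fractional index times the bin width gives the refined frequency; for an autocorrelation peak, it gives a fractional lag, and the sample rate divided by that lag is the pitch.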

An even better way is to compare the phase of the peak in two successive FFTs: if the signal phase has changed by X degrees after T seconds, what is the frequency nearest the bin centre that is consistent with that change? (This is the main thing a "phase vocoder" does.)
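The phase comparison can be sketched as follows. The function names, the Hann window, the frame size, and the hop are assumptions made for illustration; only a single DFT bin is computed to keep the example short.

```python
import cmath
import math

def windowed_bin(frame, k):
    """DFT bin k of a Hann-windowed frame."""
    n = len(frame)
    total = 0j
    for i, s in enumerate(frame):
        hann = 0.5 - 0.5 * math.cos(2 * math.pi * i / n)
        total += s * hann * cmath.exp(-2j * math.pi * k * i / n)
    return total

def refine_frequency(signal, sample_rate, frame_size, hop, k):
    """Estimate the frequency near bin k from the phase advance between two frames."""
    first = windowed_bin(signal[:frame_size], k)
    second = windowed_bin(signal[hop:hop + frame_size], k)
    measured = cmath.phase(second) - cmath.phase(first)
    expected = 2 * math.pi * k * hop / frame_size  # advance of a tone exactly at bin centre
    deviation = measured - expected
    deviation -= 2 * math.pi * round(deviation / (2 * math.pi))  # wrap into [-pi, pi]
    bin_hz = sample_rate / frame_size
    return (k + deviation * frame_size / (2 * math.pi * hop)) * bin_hz

# Assumed test signal: a 440 Hz sine at 8 kHz; bin 56 of a 1024-point frame is centred on 437.5 Hz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 440 * i / sample_rate) for i in range(2048)]
estimate = refine_frequency(signal, sample_rate, frame_size=1024, hop=256, k=56)
```

The hop size sets the unambiguous range: with a hop of a quarter frame, the wrapped phase deviation can represent offsets of up to two bins either side of the bin centre.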

PianoGym · on March 22, 2022

This is fantastic! I've been looking for a way to do this myself on my website https://pianogym.com!

Right now we rely heavily on MIDI input from the Web MIDI API, and it's been my dream to make it so any instrument can use the website!

You are so cool! Thank you for sharing this!

dr_dshiv · on March 22, 2022

How do you measure how dissonant a piece of music is, either at a given time or as a whole?

No one knows!

Assumptions:

1. Real music. I.e., should demonstrate that a punk song is more dissonant than a recording of Vivaldi.

2. Dissonance defined based on published literature.