Detecting pitch with the Web Audio API and autocorrelation (alexanderell.is)
52 points by otras on March 22, 2022 | hide | past | favorite | 9 comments


As someone who has worked in pitch recognition for a piano app, I'm doubtful that autocorrelation can work. Typically, the energy in overtones is much larger than the energy in the base pitch, which is also why phantom bass works so well in EDM.

Also, the attack transient when the hammer strikes the string is much closer to pink noise than to the note's actual pitch. I'd expect an autocorrelation to produce plenty of false positives there.

FFT with remapping between bins works well. The remapping can accumulate the energy of overtones into the frequency bins of possible base pitches, thereby resolving the ambiguity.
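The remapping idea can be sketched as a harmonic summation over a magnitude spectrum: for each candidate fundamental bin, accumulate the magnitude at its integer multiples, so a note whose energy mostly sits in overtones still wins at its base bin. This is a toy illustration, not the app's actual code; the spectrum and bin numbers below are made up.

```python
def harmonic_sum(mag, n_harmonics=4):
    """For each bin k, sum mag[k] + mag[2k] + ... up to n_harmonics."""
    scores = [0.0] * len(mag)
    for k in range(1, len(mag)):
        for h in range(1, n_harmonics + 1):
            if h * k < len(mag):
                scores[k] += mag[h * k]
    return scores

# Toy spectrum: fundamental at bin 10 is weak, overtones at 20/30/40 are strong.
mag = [0.0] * 50
mag[10] = 0.2
mag[20] = 1.0
mag[30] = 0.8
mag[40] = 0.5

scores = harmonic_sum(mag)
best = max(range(1, len(scores)), key=lambda k: scores[k])
print(best)  # 10 -- the base-pitch bin wins despite the weak fundamental
```

Note that bin 20 (the strongest single peak) only accumulates its own energy plus bin 40's, so the true fundamental still comes out on top.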


Autocorrelation is the simplest and worst pitch detection algorithm. It's highly prone to "octave errors" (not necessarily off by an actual octave, it's just the term for detecting a harmonic/subharmonic of the "true" pitch).

Pitch detection algorithms are a fascinating rabbit hole, and designing a good one for a given set of requirements is a real art.

Edit: One thing autocorrelation is quite effective for is autotune. There, you need to snap to the nearest (12-tone equal temperament) note, and it turns out the ratio you calculate to perform that correction is unaffected by the most common octave errors. E.g., if I detect your slightly flat D4 as a slightly flat D5, the correction to the nearest D is the same.
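A quick sketch of why the correction ratio is octave-invariant (illustrative Python, using A4 = 440 Hz as the 12-TET reference; the function name is made up):

```python
import math

def correction_ratio(f_detected, a4=440.0):
    """Ratio to multiply f_detected by to land on the nearest 12-TET note."""
    semitones = 12 * math.log2(f_detected / a4)
    nearest = round(semitones)
    target = a4 * 2 ** (nearest / 12)
    return target / f_detected

flat_d4 = 290.0        # slightly flat D4 (D4 is ~293.66 Hz)
flat_d5 = 2 * flat_d4  # the same note mis-detected an octave up

r1 = correction_ratio(flat_d4)
r2 = correction_ratio(flat_d5)
print(abs(r1 - r2) < 1e-12)  # True: the correction ratio is the same
```

Doubling the input shifts the semitone distance by exactly 12, which doubles the snap target too, so the ratio cancels out.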


Any good place to get a summary of pitch detection algorithms?


Not that I know of unfortunately. When I needed one I read a bunch of papers, tried out various open source ones and ended up designing one myself.


You can compute an autocorrelation with FFTs by applying the convolution theorem, and IIRC the Web Audio API can do the FFTs for you. I also found that the YIN estimator http://audition.ens.fr/adc/pdf/2002_JASA_YIN.pdf is a lot better as a time-domain estimator, and some years ago I worked out how to compute that estimator with FFTs as well.
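The convolution-theorem route can be sketched in a few lines of NumPy (a standalone illustration, not the Web Audio API; the 200 Hz test tone and the 50-500 Hz search range are arbitrary choices):

```python
import numpy as np

def autocorr_fft(x):
    """Autocorrelation via the Wiener-Khinchin theorem: inverse FFT of the
    power spectrum. Zero-padding to 2N turns circular correlation linear."""
    n = len(x)
    X = np.fft.rfft(x, 2 * n)
    return np.fft.irfft(X * np.conj(X))[:n]

sr = 8000
t = np.arange(1024) / sr
x = np.sin(2 * np.pi * 200.0 * t)  # 200 Hz tone: period of 40 samples

ac = autocorr_fft(x)
lo, hi = sr // 500, sr // 50       # only consider pitches in 50-500 Hz
lag = lo + int(np.argmax(ac[lo:hi]))
print(sr / lag)                     # 200.0
```

Restricting the lag search range matters: near lag 0 the autocorrelation is always large, so an unconstrained argmax would report a spuriously high pitch.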


For a JavaScript YIN implementation, see https://github.com/peterkhayes/pitchfinder. I have used it in a live spectrogram visualisation: https://0110.be/phd/presentation/spectrogram/live.html


> One way to get around this would be to decrease bucket size by increasing the FFT size.

A better way is parabolic interpolation, which is in the source code but not mentioned in the article, and it works for finding the fractional position of peaks in either the FFT or the autocorrelation.
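The three-point parabolic fit can be sketched as follows (a generic illustration, not the article's source code): fit a parabola through the peak bin and its two neighbours, and read off the fractional offset of the vertex.

```python
def parabolic_peak(a, b, c):
    """Fractional offset (-0.5..0.5) of the true peak relative to the centre
    bin, given magnitudes a, b, c at bins k-1, k, k+1 with b the largest."""
    return 0.5 * (a - c) / (a - 2 * b + c)

# A parabola sampled at integer bins: y(x) = 5 - (x - 10.3)**2, bins 9..11.
y = lambda x: 5 - (x - 10.3) ** 2
offset = parabolic_peak(y(9), y(10), y(11))
print(10 + offset)  # ~10.3: recovers the fractional peak position
```

For a true parabola the recovery is exact; for FFT magnitude peaks it's an approximation, but a far cheaper one than growing the FFT size.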

An even better way is comparing the phase of the peak across two successive FFTs: if the signal's phase has advanced by X degrees after T seconds, what's the nearest frequency to the bin centre for which that can be true? (This is the main thing a "phase vocoder" does.)
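That phase-based refinement can be sketched like this (illustrative numbers and function name; it assumes the true pitch is close enough to the bin centre that the wrapped phase deviation stays under half a cycle):

```python
import math

def refine_frequency(f_bin, phase1, phase2, hop_seconds):
    """Nearest true frequency to bin centre f_bin, from two frame phases."""
    expected = 2 * math.pi * f_bin * hop_seconds  # predicted phase advance
    measured = phase2 - phase1
    deviation = measured - expected
    # wrap the deviation into (-pi, pi]
    deviation -= 2 * math.pi * round(deviation / (2 * math.pi))
    return f_bin + deviation / (2 * math.pi * hop_seconds)

# A 442 Hz tone seen in a bin centred at 440 Hz, frames 10 ms apart:
f_true, f_bin, T = 442.0, 440.0, 0.010
phase1 = 0.3
phase2 = (phase1 + 2 * math.pi * f_true * T) % (2 * math.pi)
print(refine_frequency(f_bin, phase1, phase2, T))  # ~442.0
```

The wrap step is what makes this work with the modulo-2-pi phases a real FFT reports: only the fractional-cycle deviation from the bin-centre prediction survives, and that maps directly to a frequency offset.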


This is fantastic! I've been looking for a way to do this myself on my website https://pianogym.com!

Right now we rely heavily on MIDI input from the Web Audio API and it's been my dream to make it so any instrument can use the website!

You are so cool! Thank you for sharing this!


How do you measure how dissonant a piece of music is, either at a given time or as a whole?

No one knows!

Assumptions:

1. Real music. I.e., the measure should show that a punk song is more dissonant than a recording of Vivaldi.

2. Dissonance defined based on published literature.



