As someone who has worked in pitch recognition for a piano app, I'm doubtful that autocorrelation can work. Typically, the energy in overtones is much larger than the energy in the base pitch, which is also why phantom bass works so well in EDM.
Also, the percussive sound when the hammer strikes the string is much closer to pink noise than to a tone at the note's actual pitch. I'd expect autocorrelation to produce plenty of false positives there.
FFT with remapping between bins works well. The remapping can accumulate the energy of overtones into the frequency bins of possible base pitches, thereby resolving the ambiguity.
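One simple way to do that kind of remapping is harmonic summation. For every candidate fundamental bin, add up the weighted magnitudes at its integer multiples, then pick the best-scoring candidate. The sketch below assumes NumPy. The function name, the 5-harmonic limit, the 0.8 per-harmonic decay weight and the search range are my own illustrative choices, not necessarily what the piano app above does.

```python
import numpy as np

def harmonic_sum_pitch(x, sr, fmin=50.0, fmax=2000.0, n_harmonics=5, decay=0.8):
    """Pick the candidate f0 bin whose first few harmonics carry the most energy."""
    n = len(x)
    mag = np.abs(np.fft.rfft(x * np.hanning(n)))
    bin_hz = sr / n
    lo = max(1, int(np.ceil(fmin / bin_hz)))
    hi = min(len(mag) - 1, int(fmax / bin_hz))
    # Weight harmonic k by decay**(k-1) so a subharmonic candidate (which only
    # lines up with every 2nd partial) scores below the true fundamental
    # instead of tying with it.
    weights = decay ** np.arange(n_harmonics)
    best_bin, best_score = lo, -np.inf
    for b in range(lo, hi + 1):
        idx = b * np.arange(1, n_harmonics + 1)  # bins of harmonics 1..K
        keep = idx < len(mag)
        score = np.dot(mag[idx[keep]], weights[keep])
        if score > best_score:
            best_bin, best_score = b, score
    return best_bin * bin_hz

# Demo: a 220 Hz tone whose fundamental is much weaker than its 2nd harmonic.
sr = n = 8192  # 1 Hz bins, so every partial lands exactly on a bin
t = np.arange(n) / sr
amps = [0.2, 1.0, 0.8, 0.6, 0.4]
tone = sum(a * np.sin(2 * np.pi * 220 * (k + 1) * t) for k, a in enumerate(amps))
estimate = harmonic_sum_pitch(tone, sr)
```

The loudest single peak here is at 440 Hz, but the summed score still points at 220 Hz. The result is snapped to whole bins, so in practice you'd pair it with a longer FFT or a peak-refinement step.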
Autocorrelation is the simplest and worst pitch detection algorithm. It's highly prone to "octave errors" (not necessarily off by an actual octave; it's just the term for detecting a harmonic or subharmonic of the "true" pitch).
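For reference, the "simplest" detector looks roughly like this: compute r[lag] for every candidate lag and take the largest. The NumPy sketch below uses a function name and search range of my own choosing. Note that an exactly periodic signal also produces nearly-as-tall peaks at two and three times the period. That is why a bit of noise or an inharmonic attack can tip it into a subharmonic error.

```python
import numpy as np

def autocorr_pitch(x, sr, fmin=50.0, fmax=1000.0):
    """Naive autocorrelation pitch estimate: the lag with the largest r[lag]."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    min_lag = int(sr / fmax)
    max_lag = min(int(sr / fmin), n - 1)
    # r[lag] = sum_i x[i] * x[i + lag]; the sum has fewer terms as lag grows
    r = np.array([np.dot(x[: n - lag], x[lag:]) for lag in range(max_lag + 1)])
    best_lag = min_lag + int(np.argmax(r[min_lag : max_lag + 1]))
    return sr / best_lag, r

# Demo: a 200 Hz sine, i.e. a period of 40 samples at 8 kHz.
sr = 8000
x = np.sin(2 * np.pi * 200 * np.arange(2048) / sr)
f0, r = autocorr_pitch(x, sr)
# r[80] / r[40] and r[120] / r[40] come out at roughly 0.98 and 0.96
```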
Pitch detection algorithms are a fascinating rabbit hole, and designing a good one for a given set of requirements is a real art.
Edit: One thing autocorrelation is quite effective for is autotune. There you only need to snap to the nearest (12-tone equal temperament) note, and it turns out the ratio you calculate to perform that correction is unaffected by the most common octave errors. E.g. if I detect your slightly flat D4 as a slightly flat D5, the correction needed to reach the nearby D is the same.
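Here's why a factor-of-2 error drops out. Going from f to 2f adds exactly 12 to the semitone distance from the reference note, so the nearest note also moves up by exactly one octave, and target / detected stays the same. The small sketch below assumes A4 = 440 Hz and rounds to the nearest semitone. The names are illustrative.

```python
import math

A4_HZ = 440.0  # assumed reference tuning

def correction_ratio(f_detected):
    """Pitch-shift ratio that moves f_detected onto the nearest 12-TET note."""
    semitones = 12 * math.log2(f_detected / A4_HZ)  # signed distance from A4
    target = A4_HZ * 2 ** (round(semitones) / 12)   # nearest equal-tempered note
    return target / f_detected

# A slightly flat D4 (true D4 is about 293.66 Hz) ...
right = correction_ratio(290.0)
# ... and the same note mis-detected one octave up, as a slightly flat D5.
octave_error = correction_ratio(580.0)
```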
You can compute an autocorrelation with FFTs by applying the convolution theorem, and IIRC the audio API can do the FFTs for you. I also found that the YIN estimator (http://audition.ens.fr/adc/pdf/2002_JASA_YIN.pdf) is a lot better as a time-domain estimator, and some years ago I worked out how to compute that estimator with FFTs too.
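The sketch below shows one way to combine both ideas; it is not necessarily the method the commenter worked out. YIN's difference function d(tau) = sum_j (x[j] - x[j+tau])^2 expands into two energy terms, which are just running sums of x^2, minus twice a cross-correlation. That cross-correlation is exactly what the convolution theorem lets you compute with zero-padded FFTs. The code assumes NumPy. The 0.1 threshold, the window length and the search range are illustrative defaults, and it stops at an integer lag, with no interpolation.

```python
import numpy as np

def xcorr_fft(a, x):
    """c[tau] = sum_j a[j] * x[j + tau] for tau = 0 .. len(x) - len(a), via FFT."""
    # Zero-pad to a power of two >= len(a) + len(x) - 1 so the circular
    # correlation computed by the FFT does not wrap into the lags we keep.
    size = 1 << (len(a) + len(x) - 1).bit_length()
    spec = np.conj(np.fft.rfft(a, size)) * np.fft.rfft(x, size)
    return np.fft.irfft(spec, size)[: len(x) - len(a) + 1]

def yin_pitch(x, sr, window, fmin=50.0, fmax=1000.0, threshold=0.1):
    """Integer-lag YIN sketch: difference function, CMND, absolute threshold."""
    x = np.asarray(x, dtype=float)
    tau_max = min(int(sr / fmin), len(x) - window)
    tau_min = max(2, int(sr / fmax))
    # d(tau) = sum_{j<W} (x[j] - x[j+tau])^2
    #        = energy(x[0:W]) + energy(x[tau:tau+W]) - 2 * cross(tau)
    sq = np.concatenate(([0.0], np.cumsum(x * x)))  # prefix sums of x^2
    taus = np.arange(tau_max + 1)
    energy_lag = sq[taus + window] - sq[taus]
    cross = xcorr_fft(x[:window], x[: window + tau_max])
    d = sq[window] + energy_lag - 2.0 * cross
    # Cumulative mean normalized difference: d'(0) = 1,
    # d'(tau) = d(tau) / ((1 / tau) * sum_{j=1..tau} d(j)).
    cmnd = np.ones_like(d)
    running = np.cumsum(d[1:])
    cmnd[1:] = d[1:] * taus[1:] / np.maximum(running, 1e-12)
    # Absolute threshold: first lag that dips below it, then slide to the
    # bottom of that dip; fall back to the global minimum if none does.
    tau = tau_min
    while tau <= tau_max and cmnd[tau] >= threshold:
        tau += 1
    if tau > tau_max:
        tau = tau_min + int(np.argmin(cmnd[tau_min:]))
    else:
        while tau < tau_max and cmnd[tau + 1] < cmnd[tau]:
            tau += 1
    return sr / tau

# Demo: 200 Hz sine at 8 kHz with a 400-sample integration window.
sr = 8000
sig = np.sin(2 * np.pi * 200 * np.arange(1000) / sr)
f0 = yin_pitch(sig, sr, window=400)
```

For a plain autocorrelation, the same xcorr_fft trick applies directly: correlate a window of the signal against the signal itself.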
> One way to get around this would be to decrease bucket size by increasing the FFT size.
A better way is parabolic interpolation, which is in the source code but not mentioned in the article; it works for finding the fractional position of peaks both in the FFT and in the autocorrelation.
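I haven't seen the source code referred to here, so this is a generic version of three-point parabolic interpolation. Fit a parabola through the peak bin and its two neighbours, then take the vertex. Fitting the log magnitude of a Hann-windowed spectrum, rather than the linear magnitude, is my own choice. The function names are hypothetical, and NumPy is assumed.

```python
import numpy as np

def parabolic_peak(y, i):
    """Vertex of the parabola through (i-1, y[i-1]), (i, y[i]), (i+1, y[i+1])."""
    a, b, c = y[i - 1], y[i], y[i + 1]
    denom = a - 2.0 * b + c
    if denom == 0:  # flat top: no curvature to fit
        return float(i), float(b)
    p = 0.5 * (a - c) / denom  # fractional offset, within ±0.5 if y[i] is a local max
    return i + p, b - 0.25 * (a - c) * p

def fft_peak_freq(x, sr):
    """Frequency of the strongest FFT peak, refined by parabolic interpolation."""
    n = len(x)
    mag = np.abs(np.fft.rfft(x * np.hanning(n)))
    i = 1 + int(np.argmax(mag[1:-1]))  # skip DC and Nyquist so i±1 exist
    frac_bin, _ = parabolic_peak(np.log(mag + 1e-12), i)  # fit log magnitude
    return frac_bin * sr / n

# Demo: 440.7 Hz falls between bins (bin spacing sr / n ≈ 1.95 Hz).
sr, n = 8000, 4096
tone = np.cos(2 * np.pi * 440.7 * np.arange(n) / sr)
f_est = fft_peak_freq(tone, sr)
```

parabolic_peak works unchanged on an autocorrelation array. There it returns a fractional lag, and sr / lag gives the frequency.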
An even better way is to compare the phase of the peak in two successive FFTs: if the signal phase has changed by X degrees after T seconds, what is the frequency nearest the bin centre for which that can be true? (This is the main thing a "phase vocoder" does.)
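Here is a minimal sketch of that phase-difference estimate for the strongest bin, assuming NumPy. The two frames are hop samples apart, so T = hop / sr. Wrapping the deviation from the phase advance expected at the bin centre gives the offset in bins. The names, the Hann window, n_fft = 1024 and hop = 256 are my assumptions. The estimate is only unambiguous while the true frequency lies within n_fft / (2 * hop) bins of the centre (±2 bins here), and it assumes a single steady sinusoid dominates that bin.

```python
import numpy as np

def phase_vocoder_freq(x, sr, n_fft=1024, hop=256):
    """Refine the strongest bin's frequency from its phase advance over `hop` samples."""
    win = np.hanning(n_fft)
    frame1 = np.fft.rfft(x[:n_fft] * win)
    frame2 = np.fft.rfft(x[hop : hop + n_fft] * win)  # T = hop / sr seconds later
    k = int(np.argmax(np.abs(frame1)))
    expected = 2 * np.pi * k * hop / n_fft  # advance for a tone exactly at bin k
    measured = np.angle(frame2[k]) - np.angle(frame1[k])
    # Wrap the difference to (-pi, pi]: pick the frequency nearest the bin centre
    deviation = np.angle(np.exp(1j * (measured - expected)))
    bin_offset = deviation * n_fft / (2 * np.pi * hop)
    return (k + bin_offset) * sr / n_fft

# Demo: a 440.7 Hz tone analysed with 7.8 Hz bins (sr / n_fft).
sr = 8000
tone = np.cos(2 * np.pi * 440.7 * np.arange(2048) / sr)
f_est = phase_vocoder_freq(tone, sr)
```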