Because of noise in general, "best case" has always seemed like the best metric to me. Over a large number of runs, you're likely to hit a near-"perfect" measurement on a microbenchmark.
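A minimal sketch of that idea: run the code many times and keep the minimum. Here `workload` is a hypothetical stand-in for whatever you're measuring (and note a real harness would also guard against the compiler optimizing the call away):

    #include <algorithm>
    #include <chrono>
    #include <cstdio>

    // Hypothetical function under test -- replace with your own code.
    static void workload() { /* ... */ }

    int main() {
        using clock = std::chrono::steady_clock;
        auto best = clock::duration::max();
        // Many repetitions: the minimum converges toward the
        // least-noisy ("best case") measurement.
        for (int i = 0; i < 10000; ++i) {
            auto start = clock::now();
            workload();
            best = std::min(best, clock::now() - start);
        }
        std::printf("best case: %lld ns\n",
            static_cast<long long>(
                std::chrono::duration_cast<std::chrono::nanoseconds>(best).count()));
    }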
Otherwise, for an "adaptive" number of runs until enough time has been spent to have some "confidence" in the measurement, I've been fairly happy with: https://github.com/google/benchmark/
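For reference, registering a benchmark with that library looks like this (essentially the hello-world from its README); the library picks the iteration count adaptively until the timing stabilizes:

    #include <benchmark/benchmark.h>
    #include <string>

    // The library runs the loop body as many times as needed
    // for a statistically stable measurement.
    static void BM_StringCreation(benchmark::State& state) {
        for (auto _ : state) {
            std::string empty_string;
        }
    }
    BENCHMARK(BM_StringCreation);

    BENCHMARK_MAIN();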