Thanks. In that case my conclusion is that all the people claiming these models are "distilling SOTA models" are, by extension, also speculating. How can you distill what you don't have?
The only way I can think of is paying to synthesize training data using SOTA models yourself. But yeah, I'm not aware of anyone publicly sharing that they did this, so that's also speculation.
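For what it's worth, that approach is mechanically simple. A minimal sketch (with `query_teacher` as a hypothetical stand-in for a real provider API call, e.g. a chat-completion request to a SOTA model) might look like:

```python
import json

def query_teacher(prompt: str) -> str:
    # Placeholder: a real run would call the SOTA provider's API here
    # and return the model's completion. Stubbed so the sketch runs offline.
    return f"(teacher response to: {prompt})"

def build_distillation_set(prompts, path="distill.jsonl"):
    # Write (prompt, completion) pairs in JSONL, a common fine-tuning format.
    with open(path, "w") as f:
        for p in prompts:
            record = {"prompt": p, "completion": query_teacher(p)}
            f.write(json.dumps(record) + "\n")
    return path

build_distillation_set(["Explain KV caching.", "What is RLHF?"])
```

The resulting JSONL file is then used as supervised fine-tuning data for the smaller student model.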
The economics probably work out, though: collecting, cleaning, and preparing original datasets is very cumbersome.
What we do know for sure is that SOTA providers distill their own models; I remember reading about this at least for Gemini (Flash is distilled) and Meta.