Reasons for using Python: it is easier to find code on github for reuse and tweaking, most novel research publishes in PyTorch, there is a significant network effect if you follow cutting edge.
Second reason - to fail fast. No sense in sculpting novel ideas in C++ while you can muddle with Python 3x faster, that's code intended to be used just a few times, on a single computer or cluster. That was an era dominated by research, not deployments.
Llama.cpp was only possible after the neural architecture stabilized and they could focus on a narrow subset of basic functions needed by LLMs for inference.
Second reason - to fail fast. No sense in sculpting novel ideas in C++ while you can muddle with Python 3x faster, that's code intended to be used just a few times, on a single computer or cluster. That was an era dominated by research, not deployments.
Llama.cpp was only possible after the neural architecture stabilized and they could focus on a narrow subset of basic functions needed by LLMs for inference.