Pandas is terrific, yet even its original author has noted inherent shortcomings [1], and alternatives exist.
Polars seems to be the most prominent competitor in the Python DataFrame space, and DuckDB appears to pursue an approach similar to SQLite, but columnar.
I am personally working on a solution to a broader problem, which can also be viewed as an alternative to Pandas [2].
I've used it, but I definitely ran into issues where Ibis couldn't handle a transformation and I had to move back into Polars or DuckDB to do it. I eventually just stripped it out.
Built this as an interactive tool for our popular 101 Pandas exercises. The code runs entirely locally in your browser. Would love feedback on the ease of use and the editor UX.
Dope. I've just started using Pandas in some personal projects, and am quickly hitting my knowledge ceiling. I think this will be useful. I'll check it out properly after work.
If I were investing effort into acquiring knowledge in this domain, I'd skip straight to Polars. Before I made the switch, I had been using Pandas on and off for more than a decade. I'm not sure how representative this is, but most of the people I know who were Pandas users have also made the switch. I initially did it for the performance improvements, but the API is (in my subjective opinion) much more logical and has far fewer surprises than Pandas'. At this stage, it would be my default choice for that reason alone, despite my years of Pandas experience.
I'd second this, especially if it's just for personal use!
The data world owes a lot to pandas, but it has plenty of sharp edges and using it can sometimes involve pretty close knowledge of how things like indexing/slicing/etc work under the hood.
If I get stuck in polars, it's almost always just a "what's the name of the function to use?" type problem rather than needing lots of knowledge about how things are working under the hood.
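As a small illustration of the kind of indexing knowledge mentioned above, here is a sketch that uses only pandas' documented `loc`/`iloc` accessors (the Series itself is made up). When a Series has integer labels that differ from the row positions, plain `[]` indexing follows the labels, not the positions:

```python
import pandas as pd

# Hypothetical Series whose integer labels do not match the row positions.
s = pd.Series(["x", "y", "z"], index=[2, 1, 0])

by_label = s.loc[0]      # label-based: the row labelled 0, i.e. "z"
by_position = s.iloc[0]  # position-based: the first row, i.e. "x"
plain_brackets = s[0]    # with an integer index, [] is label-based: "z"

print(by_label, by_position, plain_brackets)  # z x z
```

Being explicit with `loc` or `iloc` avoids having to remember which rule plain brackets apply for a given index type.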
Ehhh, there are still plenty of pandas idioms for which there is only a clunky polars equivalent. Sure, I like the strictness of polars, but it is missing some day-to-day functionality. That might never change: the polars team is trying to be disciplined about having a consistent and performant API, which means some functionality may be left behind.
It's less about performance and more about ecosystem lock-in. It's a bit like imperial vs metric units. Why would you ever choose to learn imperial if you had the option to only ever use metric to begin with?
That's exactly why I am reluctant to do anything with Polars. They are actively running a company and trying to sell a product. At any point they could be acquired and change the license for new releases. Sure you could fork it, or stay on an older version, but if what they offer isn't compelling enough for you then why take the risk?
Pandas on the other hand has been open source for almost two decades, and is supported by many companies. They have a governance board, and an active community. The risk of it going off the rails into corporate nonsense is much lower.
- Pandas is interwoven into downstream projects. So it will be here to stay for a long time. This is good for maintenance and stability. Advantage: Pandas.
- OTOH, the Pandas experience is awful; this was obvious to many from the outset, and yet it persisted. I haven't tracked the history. But my guess would be the competition from Polars was a key pressure for improvement. Edge: Polars.
- Lots of Python projects are moving to Rust-backed tooling: uv, Polars, etc. Front-end users get the convenience of Python and tool-developers get the confidence & capabilities of Rust. Edge: Polars.
- Pandas has a governance structure not tied to one company. Polars does not. (comment above said this) Advantage: Pandas.
But this could change. Polars users could (and may already be?) pressing for company-independent governance.
Because these are silly personal scripts. I'm not going to make sensible architectural decisions on something I run every now and then on my laptop. That's optimising too early.
For short scripts and interactive research work, pandas is still much better than polars. Polars works well when you know what you want.
When you are still figuring out things step by step, pandas does a lot of heavy lifting for you so you don't have to think about it.
E.g. I don't have to think about timeseries alignment, pandas handles that for me implicitly because dataframes can be indexed by timestamps. Polars has timeseries support, but I need to write a paragraph of extra code to deal with it.
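To make "timeseries alignment" concrete, here is a pandas-only sketch with two hypothetical daily series. The first result relies on the DatetimeIndex; the second spells out the same alignment as an outer join on an ordinary timestamp column, which is roughly the extra work needed when a frame has no index (as in Polars). It is not Polars code; Polars would express the join with its own API.

```python
import pandas as pd

# Hypothetical daily series whose timestamps only partly overlap.
a = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
)
b = pd.Series(
    [10.0, 20.0],
    index=pd.to_datetime(["2024-01-02", "2024-01-03"]),
)

# Implicit: pandas aligns both operands on their DatetimeIndex first,
# so 2024-01-01 (present only in `a`) becomes NaN.
implicit = a + b  # NaN, 12.0, 23.0

# Explicit: with timestamps as an ordinary column (no index), the alignment
# has to be written out as an outer join on that column.
left = a.rename("a").rename_axis("ts").reset_index()
right = b.rename("b").rename_axis("ts").reset_index()
joined = left.merge(right, on="ts", how="outer").sort_values("ts")
explicit = pd.Series(
    (joined["a"] + joined["b"]).to_numpy(),
    index=pd.DatetimeIndex(joined["ts"].to_numpy()),
)
```

Both versions produce the same values; the difference is how much of the bookkeeping the user writes by hand.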
> [Polars] is much more logical and has far fewer surprises compared to Pandas
A kind understatement imo. For me, the following experiences are highly coupled in my brain: "I'm using Pandas" + "I'm feeling a weird combination of confusion and pain" + "This is a dumpster fire".
I second this blog post. I worked with Tom on a project several years ago and he's brilliant. Started doing python more frequently after that project and I found his blog to be very helpful in finding a good way to conceptualize pandas and python data structures in general.
You'll get a lot of responses saying Polars is better than Pandas. I argue those people are missing the point and don't understand Pandas' real strength or why people choose Pandas today.
Pandas was never meant to be a technologist's tool. It was meant to be a researcher's tool and was unfortunately co-opted as a technical solution as well. It has not fully escaped its roots.
Pandas is fantastic for doing iterative and interactive research on semi-structured data. It has a lot of QoL facilities and utility functions for seamlessly dealing with exploratory timeseries analytics on in-core data, i.e. data that fits into memory.
For example, I can take two time series and calculate their product:
ts3 = ts1 * ts2
This one line does a huge amount of heavy lifting by automatically aligning the timestamps and columns of the two inputs, so that I'm not accidentally multiplying two entries that have the same ordinal position but not the same timestamp or column label.
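A small sketch of what that one line does, using two hypothetical DataFrames whose timestamps and columns only partly overlap. The product of the raw NumPy arrays is included only to show what purely positional arithmetic would compute instead.

```python
import numpy as np
import pandas as pd

idx1 = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"])
idx2 = pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"])

ts1 = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]}, index=idx1)
ts2 = pd.DataFrame({"b": [10.0, 20.0, 30.0], "c": [7.0, 8.0, 9.0]}, index=idx2)

# Label-aligned product: rows are matched by timestamp and columns by name.
# The result spans the union of both (4 dates x columns a, b, c); any cell
# present in only one input becomes NaN.
ts3 = ts1 * ts2

# Positional product of the raw arrays: the shapes match, so it runs without
# error, but row 0 of ts1 (2024-01-01) is multiplied with row 0 of ts2
# (2024-01-02), and column "a" with column "b".
positional = ts1.to_numpy() * ts2.to_numpy()
```

Only column `b` on 2024-01-02 and 2024-01-03 yields real numbers in `ts3` (50.0 and 120.0), whereas the positional version silently returns a fully populated but meaningless array.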
Can I do the same with Polars? Yes, but it comes with exponentially more cognitive overhead. And this is just one example.
Pandas is ultimately a flawed product, as its origins go back more than a decade, to when R's data frame was cutting edge. A lot of innovation has happened since then, and the API and internals of Pandas mean that certain choices made early on are nontrivial to change.
This doesn't change the fact that Pandas is still immensely useful. Eventually Polars may come close to it, but so far its focus has unfortunately not been on ergonomics for interactive use.
As it stands, I use pandas for research and polars for production systems.
[1] https://wesmckinney.com/blog/apache-arrow-pandas-internals/
[2] https://github.com/ronfriedhaber/autark