Real talk. Polars is all the rage. People love Spark. People use Spark for data that is small, but still too big for Pandas. Spark runs on a local machine. Polars runs on a local machine. What do I choose, Spark or Polars? Does it matter?
I’ve written about Polars at different points, here, and here, when discussing wider topics. I mean, honestly, I think Polars is the best tool to come out of Data Engineering in the last 5 years. But I find it unwaveringly boring, which is exactly why it’s so popular.
It’s boring for anyone who has used Pandas, Spark, or other DataFrame tools a lot. Sure, it can be a cool breeze in the face of some poor sap who’s been chained to Pandas by a boss left over from a bygone era. You know what I’m talking about.
But honestly, overall, if you’re just an average engineer piddling around with datasets on your machine, what should you choose: Spark or Polars? Let’s talk some real talk.