July 2024 - Confessions of a Data Guy

PyArrow vs Polars (vs DuckDB) for Data Pipelines.

I’ve had something rattling around in the old noggin for a while; it’s just another strange idea that I can’t quite shake out. We all keep hearing about Arrow this and Arrow that … it seems every new tool built today for Data Engineering is at least partly based on Arrow’s in-memory format.

So, today we are going to do an experiment.

What if instead of writing a Data Pipeline in Polars, or another tool … that uses Arrow under the hood … what if we actually write a data pipeline with Arrow?

July 25, 2024

Big Data, Data, Data Engineering, Ramblings

The Abstractions Are Making You Dumb (rise of the Shallow Expert)

When I was young and full of myself, writing Perl and PHP, while your ma was still reading you a bedtime story and giving you a stuffy to fall asleep with, I had to program uphill, both ways, in the rain and snow. Not like you milquetoast Data Engineers clickety-clicking around Databricks and Snowflake UIs.

You want a server? Spin up your own Apache. Need a database? MySQL was the only game in town. Need a backend language? Perl was the cat’s meow.

July 10, 2024
