It is a Brave New World out there these days. The new tools and features come out faster than your mom on Sunday morning getting you ready for church. The same goes for the context and advice being produced on a myriad of platforms, the ole’ Like and Subscribe, and all that bit. It does make you wonder after a while, what you can trust, who has your best interest in mind, and who is selling you a bottle of snake oil, doesn’t it?

Today we talk about Data Modeling. Specifically Data Modeling in the new world we all live in christened The Lakehouse by our benevolent Vender Overlords.

Read more

Did you know there are only 3 types of Data Engineers? It’s true. I hope you are the right one.

Over the many years I’ve been pounding my keyboard … Perl, PHP, Python, C#, Rust … whatever … I, like most programmers, built up a certain disdain for what is called Low Code / No Code solutions. In my rush to worship at the feet of the code we create, I failed, in the beginning, to recognize some important axioms …

Read more

Polars is the hot new Rust based Python Dataframe tool that is taking over the world and destryoing Pandas even as we speak. You want the quick and dirty introduction to Polars? Look no farther.

I still remember the good ole days when Apache Spark was fresh and hot, hardly anyone was using it, except a few poor AWS Glue and EMR users … Lord have mercy on their ragged souls. It’s funny how that GOAT of a tool went from being used by a few companies for extremely large datasets … to today’s world, with Databricks, where Pandas-sized data is crunched with Spark.

Read more

Recently, for some unknown reason, I was pursuing the new Stackoverflow … called Reddit, for Data Engineering … and I ran across an interesting question … more or less it was related to “what makes a good Software Engineer … in a Data Engineering context.

Read more

One thing I find myself doing these days (I am unsure how I feel about this), is teaching others to solve problems … Data Engineering problems to be specific. It’s not a hard stretch for most to imagine that what a person does at Senior+ software-type levels is just write good code all day.

I assure you, this is not the case typically.

Read more

I’ve had something rattling around in the old noggin for a while; it’s just another strange idea that I can’t quite shake out. We all keep hearing about Arrow this and Arrow that … seems every new tool built today for Data Engineering seems to be at least partly based on Arrow’s in-memory format.

So, today we are going to do an experiment.

What if instead of writing a Data Pipeline in Polars, or another tool … that uses Arrow under the hood … what if we actually write a data pipeline with Arrow?

Read more