Data Engineering Archives - Page 3 of 23 - Confessions of a Data Guy

Data, Data Engineering, Data Warehousing

Data Modeling in the Brave New Lakehouse World

It is a Brave New World out there these days. The new tools and features come out faster than your mom on Sunday morning getting you ready for church. The same goes for the content and advice being produced on a myriad of platforms, the ole’ Like and Subscribe, and all that bit. It does make you wonder after a while what you can trust, who has your best interest in mind, and who is selling you a bottle of snake oil, doesn’t it?

Today we talk about Data Modeling. Specifically, Data Modeling in the new world we all live in, christened The Lakehouse by our benevolent Vendor Overlords.

September 19, 2024

Big Data, Data, Data Engineering

The 3 Types of Data Engineers.

Did you know there are only 3 types of Data Engineers? It’s true. I hope you are the right one.

September 13, 2024

Big Data, Data, Data Engineering

Streaming Postgres data to Databricks Delta Lake in Unity Catalog

Over the many years I’ve been pounding my keyboard … Perl, PHP, Python, C#, Rust … whatever … I, like most programmers, built up a certain disdain for what are called Low Code / No Code solutions. In my rush to worship at the feet of the code we create, I failed, in the beginning, to recognize some important axioms …

September 4, 2024

Data, Data Engineering, Python, Rust

Introduction to Polars in 2 Minutes

Polars is the hot new Rust-based Python Dataframe tool that is taking over the world and destroying Pandas even as we speak. You want the quick and dirty introduction to Polars? Look no further.

September 4, 2024

Big Data, Data, Data Engineering

Apache Spark’s Most Annoying Use Case

I still remember the good ole days when Apache Spark was fresh and hot, hardly anyone was using it, except a few poor AWS Glue and EMR users … Lord have mercy on their ragged souls. It’s funny how that GOAT of a tool went from being used by a few companies for extremely large datasets … to today’s world, with Databricks, where Pandas-sized data is crunched with Spark.

August 29, 2024

Data, Data Engineering, Ramblings

What is a “Good” Data or Software Engineer?

Recently, for some unknown reason, I was perusing the new Stack Overflow … called Reddit, for Data Engineering … and I ran across an interesting question … more or less it was related to “what makes a good Software Engineer … in a Data Engineering context.”

August 20, 2024

Big Data, Data, Data Engineering, Ramblings

How to Solve Data Engineering Problems

One thing I find myself doing these days (I am unsure how I feel about this), is teaching others to solve problems … Data Engineering problems to be specific. It’s not a hard stretch for most to imagine that what a person does at Senior+ software-type levels is just write good code all day.

I assure you, this is not the case typically.

August 7, 2024

Big Data, Data, Data Engineering, Python

Snowflake is Dying??!! Data Breach!!

August 2, 2024

Big Data, Data, Data Engineering, Python

Daft: Distributed Dataframes with Python.

August 1, 2024

Data, Data Engineering, Python

PyArrow vs Polars (vs DuckDB) for Data Pipelines.

I’ve had something rattling around in the old noggin for a while; it’s just another strange idea that I can’t quite shake out. We all keep hearing about Arrow this and Arrow that … every new tool built today for Data Engineering seems to be at least partly based on Arrow’s in-memory format.

So, today we are going to do an experiment.

What if instead of writing a Data Pipeline in Polars, or another tool … that uses Arrow under the hood … what if we actually write a data pipeline with Arrow?

July 25, 2024
