Home - Confessions of a Data Guy

Big Data, Data, Data Engineering, Data Warehousing, Python

Why Data Engineers should use AWS Lambda Functions.

When I used to think of Lambda functions on AWS, my eyes would glaze over. I would roll my eyes and say, “I work with big data, what in the world can a silly little AWS Lambda function offer me?” I’ve had to eat my own words: those little suckers come in handy in my […]

June 2, 2021

Big Data, Data, Data Engineering, Data Warehousing

The Elusive Idempotent Data Load/ETL

This is a topic I’ve been musing about lately. The idempotent data load has been a source of much pain and suffering in the lives of many a data engineer and data warehouse developer. Apparently some things don’t change with the passage of time. My first job in tech was working on a data warehouse team […]

May 24, 2021

Data, Data Engineering, Data Warehousing

Data Modeling in DeltaLake (DataBricks)

Time to open a can of worms. I’ve recently been working with DataBricks, specifically DeltaLake (which I wrote about here). DeltaLake is an amazing tool that, when paired with Apache Spark, is like the juggernaut of Big Data. The old is new, the new is old. The rise of DataBricks and DeltaLake is proof of […]

May 10, 2021

Big Data, Data, Data Engineering, Python

Airflow vs Dagster

Dagster. The first few times I read the name, I just couldn’t take the tech stack seriously … it’s still kinda hard. Today I want to compare Airflow vs Dagster, but mostly explore what Dagster is and does. I want to compare it to the popular Apache Airflow project so people have some context for […]

April 26, 2021

Data, Data Engineering, Ramblings

What the Marooned Ben Gunn Teaches us about Solo Data Engineers

I always envied Ben Gunn in Treasure Island a little bit. Alone all those years, digging up gold and treasure, hunting wild goat, and living in a nice little cave. Living off the land, king of his island, gone half mad, but somewhat still there. Happy to see other people, but always a little bit […]

April 25, 2021

Big Data, Data, Data Engineering, Python

Introduction to Apache Flink for Data Engineers

Not going to lie. I’ve been trying to figure out for a while now where Apache Flink fits in the Data Engineering world. A year or two ago I didn’t see much content posted about it, but it seems to be picking up steam. I’ve mostly managed to avoid understanding what Flink is or […]

April 21, 2021

Ramblings

Contract to Hire (Take a Chance With Us …. We Won’t With You).

Life’s no fun if you don’t keep things interesting. It’s time to ruffle a few feathers.

April 17, 2021

Big Data, Data, Data Engineering, Data Warehousing, Ramblings

The 3 Types of Data Engineers, Which One Are You?

Every good story starts with a few different characters, right? It’s like the spice of life, a little bit of this, a little bit of that. It’s the way of the world. In all my data wandering I’ve come across lots of different types of data engineers. I can usually put them into three different categories, somewhat […]

April 7, 2021

Data, Data Engineering, Ramblings

A Piece of DevOps that most Data Engineers Ignore.

I am always amused by the apparently contradictory nature of working in the world of data. There are always bits and pieces that come and go, the popular, the out of style … new technology driving new approaches and practices. One of the hot topics the last decade has produced is DevOps, a now staple […]

April 2, 2021

Data, Data Engineering, Python

Introduction to Unit Testing with PySpark.

There are few things in life that are worse than cracking open some serious PySpark pipeline code and then realizing there isn’t a single function written to encapsulate logic … wondering if some change you are about to make will bring down the whole pipeline. When you are new to a codebase, you don’t know […]

March 26, 2021
