There are many a day when I find myself scrolling through the subreddit for r/dataengineerg, it’s a fun place to stalk. Lot’s of people with lots of opinions make for interesting times. I see one question or a variation of it come up over and over again. How do I learn data engineering skills, how […]

What is the standard for most data engineers these days? Turns out SQL and Python are still running the show pretty much across the board. There’s always a variety of skills in those areas, some better, some worse, although with a little work and repetition it’s pretty easy to master both SQL and Python. I’ve […]

Good ole’ string slicing. That’s one thing that never changes in Data Engineering, working with strings. You would think we would all get to row up some day and do the complicated stuff, but apparently you can’t outrun your past. I blame this mostly on the data and old schools companies. Plain text and flat […]

Data Lake, Data Warehouse, Lake House, Data Mart, it’s always something isn’t it? Don’t get me started on Data Mesh. Yikes, it’s hard to keep up these days. I want to explore the Data Lake vs the Data Warehouse and what it really all boils down to, what is the real difference. Is it data […]

As someone who worked around the classic Data Warehouses back in the day, before s3 took over and SQL Server and Oracle ruled the day … I love sitting on the sidelines watching new … yet old battle-lines being re-drawn. I could probably scroll back in StackOverflow 12 years and find the same arguments and […]

One of the reoccurring complaints you always see being parroted by the smarter-then-anyone-else-on-the-internet Reddit lurkers is the slowness of Python. I mean I understand the complaint …. but I don’t understand the complaint. Python is what is is, and usually is the best at what it is, hence its ubiquitous nature. I’ve been dabbling with […]

The two coolest kids in class … I mean seriously … every other post in Data Engineering world these days is about Apache Airflow or DataBricks. It’s hard to kick against the goad. Just jump on the band wagon before you get left in the dust. I’ve used both DataBricks and Apache Airflow, they both […]

Apache Druid, kinda like that second cousin you know about … but don’t really know. When you see them for the first time in 10 years you kinda look at them out of the corner of your eye. That’s how I feel about Apache Druid, I’ve always known it has been there, lurking around in […]

When I used to think of lambda functions on AWS my eyes would glaze over, I would roll my eyes and say, “I work with big data, what in the world can a silly little AWS lambda function offer me?” I’ve had to eat my own words, those little suckers come in handy in my […]

This is a topic I’ve been musing about lately. The idempotent data load has been a source of much pain and suffering in the lives of many a data engineer and data warehouse developers. Apparently somethings don’t change with the passage of time. My first job in tech was working on a data warehouse team […]