Home - Confessions of a Data Guy

What is a Data Platform?

You know, for all the hordes of content, books, and videos produced in the “Data Space” over the last few years, famous or otherwise, it seems there are volumes of information on the pieces and parts of working in Data. It could be Data Quality, Data Modeling, Data Pipelines, Data Storage, Compute, and […]

January 8, 2025

Uncategorized

Building a Fast, Light, and CHEAP Lake House with DuckDB, Delta Lake, and AWS Lambda

Building fun things is a real part of Data Engineering. Using your creative side when building a Lake House is possible, and using tools that are outside the normal box can sometimes be preferable. Check out this video where I dive into how I build just such a Lake House using Modern Data Stack tools like […]

January 8, 2025

Uncategorized

Using DuckDB to read JSON files in S3

I’ve been playing around more and more lately with DuckDB. It’s a popular SQL-based tool that is lightweight and easy to use, probably one of the easiest tools to install and use. I mean, who doesn’t know how to pip install something and write SQL? Probably the very first thing you learn when cutting your […]

January 7, 2025

Data, Data Engineering

Simplicity in the Modern Data Stack

We have all come to live in the Modern Data Stack, and whether we like it or not, our lives are no longer as simple as they were in the days of SQL Server and SSIS. Things have changed A LOT. There are good and bad sides to that coin. The Modern Data Stack has […]

January 1, 2025

Data, Data Engineering, DuckDB, Python

DuckDB reading CSVs from S3.

Recently, I was working on a little learning around DuckDB and AWS Lambda, which included some work with S3. It had been some time since I had tried working with files in S3, and it was kinda clunky the last time I tried it; whether that was DuckDB’s fault or mine, I was unsure. It […]

December 31, 2024

Big Data, Data, Data Engineering

Data Contracts were a LIE!

Today we talk about what is really going on with Data Contracts. They came in like a rocket a few years ago, but then died on the vine. What’s the deal?

December 13, 2024

Big Data, Data, Data Engineering

AWS S3 Tables. Technical Introduction.

Well, everyone is abuzz with the recently announced S3 Tables that came out of AWS re:Invent this year. I’m going to call fool’s gold on this one right out of the gate. I tried them out, in real life that is, not just some marketing buzz, and it will leave most people, not all, be […]

December 7, 2024

Uncategorized

Turkey Day Is Here – Black Friday Sale – 50% Off

Well, another turkey day has come upon us all. I trust you are getting at least a day or two off from your overlords from writing code and taking names. While the rest of you will be slicing up that turkey with your friends and family, clinking your glasses and giving toasts to each other, […]

November 29, 2024

Big Data, Data, Data Engineering, Python, SQL

DuckDB … reading from s3 … with AWS Credentials and more.

In my never-ending quest to plumb the most boring depths of every single data tool on the market, I found myself annoyed when recently using DuckDB for a benchmark that was reading Parquet files from S3. What was not clear, or easy, was trying to figure out how DuckDB would LIKE to read default AWS […]

November 18, 2024

Data, Data Engineering, Python

Testing DuckDB’s Larger-Than-Memory Processing Capabilities.

I am a glutton for punishment, a harbinger of tidings, a storm crow, a prophet of the data land. My sole purpose is to plumb the depths of the tools we use every day in Data Engineering. I find the good, the bad, the ugly, and splay them out before you, string ’em up and […]

October 31, 2024
