Simplicity in the Modern Data Stack
We have all come to live in the Modern Data Stack, and whether we like it or not, our lives are no longer as simple as they were in the days of SQL Server and SSIS. Things have changed A LOT. There are good and bad sides to that coin. The Modern Data Stack has brought us amazing innovations and tools and made things possible that were simply unheard of before.
One of the main downsides, and one that has seen some backlash, is the complexity of some Data Platforms. One could argue that the cause is the plethora of new tools all mixed and mashed together like your Grandma’s potatoes.
Sometimes, it’s just that the tools themselves become one-stop shops and, therefore, unmanageable and unusable. Either way, to be cool, smart, and hip now as a data builder, you need to find an easy and simple solution.
Simplicity in AWS Lambda, DuckDB, and Delta Lake
I recently did a little project to prove a point: it is possible to accomplish things these days in a simple fashion, and to get back to the old days, where Snowflake and Databricks aren’t required for every single thing.
I took a normal task of …
- a CSV file landing in S3
- needing to process that file into a Lake House
Now, today, if you say Lake House, then before you finish the sentence someone is spinning up an EMR, Databricks, or Snowflake cluster to burn some compute moving that tiny CSV file into the Lake House in Delta Lake or Iceberg. Not me.
I go against the grain.
No fancy infrastructure needed here. An AWS Lambda combined with some DuckDB is more than able to do the job. The file hits S3, the Lambda kicks off and picks up the file, inserts the data into one Delta Lake table, calculates some analytics, and writes those metrics to another Delta Lake table.
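To make that flow concrete, here is a minimal sketch of what such a Lambda handler could look like. It is not the code from my project. The table locations, the `/tmp` path, and the single row-count metric are all made up for illustration, and it assumes the `duckdb`, `deltalake`, and `pyarrow` packages are bundled with the function and that the Lambda role can read the landing bucket and write to the table locations. Credential setup and any S3 write-safety settings for Delta Lake are left out.

```python
import urllib.parse

# Hypothetical Delta Lake table locations; swap in your own.
RAW_TABLE = "s3://example-lakehouse/raw"
METRICS_TABLE = "s3://example-lakehouse/metrics"


def bucket_and_key(event):
    """Pull the bucket name and object key out of an S3 event notification."""
    s3 = event["Records"][0]["s3"]
    # S3 URL-encodes object keys in notifications (a space arrives as '+').
    key = urllib.parse.unquote_plus(s3["object"]["key"])
    return s3["bucket"]["name"], key


def handler(event, context):
    # Imported here so the helper above can be used without these packages.
    import boto3
    import duckdb
    from deltalake import write_deltalake

    bucket, key = bucket_and_key(event)

    # Lambda only lets you write under /tmp, so land the CSV there first.
    local_path = "/tmp/incoming.csv"
    boto3.client("s3").download_file(bucket, key, local_path)

    con = duckdb.connect()  # in-memory database, nothing to manage

    # Read the CSV with DuckDB and append it to the first Delta table.
    raw = con.read_csv(local_path).fetch_arrow_table()
    write_deltalake(RAW_TABLE, raw, mode="append")

    # Calculate a (deliberately tiny) metric and append it to the second table.
    con.register("raw", raw)
    metrics = con.execute(
        "SELECT ? AS source_key, count(*) AS row_count FROM raw", [key]
    ).fetch_arrow_table()
    write_deltalake(METRICS_TABLE, metrics, mode="append")

    return {"rows_loaded": raw.num_rows}
```

Wire the function to an S3 "object created" trigger on the landing bucket and that is the whole pipeline. Downloading to `/tmp` first is just one way to do it; DuckDB can also read straight from S3 through its `httpfs` extension if you would rather skip the copy.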
The best part about solutions like this is that you don’t have complicated infrastructure to manage and maintain. Sure, this is just a toy example, but the whole idea is to show you what is possible: that simplicity is within reach, and that you can think outside the box you have been thinking in for years now.
Simple is good. It gets the job done, is usually cheaper, and requires less effort to maintain, debug, and develop.