Interesting links
Here are some interesting links for you! Enjoy your stay :)Pages
Categories
Archive
- March 2025
- February 2025
- January 2025
- December 2024
- November 2024
- October 2024
- September 2024
- August 2024
- July 2024
- June 2024
- May 2024
- April 2024
- March 2024
- February 2024
- January 2024
- December 2023
- November 2023
- October 2023
- September 2023
- August 2023
- July 2023
- June 2023
- May 2023
- April 2023
- March 2023
- February 2023
- January 2023
- December 2022
- November 2022
- October 2022
- September 2022
- August 2022
- July 2022
- June 2022
- May 2022
- April 2022
- March 2022
- February 2022
- January 2022
- December 2021
- November 2021
- October 2021
- September 2021
- August 2021
- July 2021
- June 2021
- May 2021
- April 2021
- March 2021
- February 2021
- January 2021
- December 2020
- November 2020
- October 2020
- September 2020
- August 2020
- July 2020
- June 2020
- May 2020
- April 2020
- March 2020
- January 2020
- December 2019
- November 2019
- October 2019
- September 2019
- August 2019
- July 2019
- May 2019
- March 2019
- February 2019
- January 2019
- December 2018
- November 2018
- October 2018
- September 2018
- July 2018
- June 2018
- May 2018
- April 2018
- March 2018
- February 2018
Reducing Complexity with Databricks + Delta Lake COPY INTO
As the years drag by in Data Engineering, there are a few things that I have come to appreciate more and more. One of those topics that is close to number one on the list is complexity reduction. Today’s modern data stacks are filled to the brim with technologies and tools, full to the brim, […]
Data Pipelines 101 – The Basics.
I’ve been getting a lot of questions lately about data pipelines, how to design them, what to think about, and what patterns to follow. I get it, if you’re new to Data Engineering it can be hard to know what you don’t know. There is a lot of content specific to certain technologies, but not […]
Golang – Useful for everyday Data Engineering?
I periodically try to pick up a new programming language on my journey through Data Engineering life. There are many reasons to do that, personal growth, boredom, seeing what others like, and helping me think differently about my code. Golang has been on my list for at least a year. I don’t hear much about […]
Data Quality – Great Expectations for Data Engineers
Mmmm … Data Quality … it is a thing these days. I look forlornly back to the ancient days of SQL Server when nobody cared about such things. Alas, we live in a different world, where hundreds of terabytes of data are the norm, and Data Quality becomes a thing. I’ve been meaning to give […]
Databricks Access Control – The 3 Most Important Steps
It’s not often I yearn for the good old days of SQL Server, but I’ve had a few of those moments lately. Some things I miss, some I don’t, and it’s probably because I’m getting old and crusty, stuck in my ways, by permissioning is one of those topics where I think about the good […]
boto3 or aws cli for s3 … Python vs Bash and other thoughts.
For any Data Engineer working on aws for any length of time, there is one task that always seems to come up and never go away. Manipulating files on s3 a bucket on aws is something I’ve had to do for years, it just never goes away. It’s always something … listing files, moving files, […]
Part 4 – Keys To Success – Idempotency and Partitioning.
As the road winds on we come to Part 4, of our 5 Part Series on Data Warehouses, Lakes, and Lake Houses. Finally, we are getting to some fun topics after all the boring stuff. Today I want to talk about the two keys to success in your Data Lakes … Idempotency and Partitioning. I […]
Databricks + Delta Lake MERGE duplicates – Deterministic vs Non-Deterministic ETL.
Is there any problem more classic to the Data Lakes and Data Warehouses than duplicate records? You would think after doing the same ETL for over a decade I could avoid the issue, apparently not. It’s good never to think too highly of one’s self, the duplicates can get us all. Today I want to […]
End-to-End Pipeline Integration Testing for Databricks + Delta Lake.
The testing never ends. Tests tests tests, and more tests. When it comes to data engineering and data pipelines it seems good practices are finally catching up after years. In the past, the data engineering community took a lot of heat, and rightly so, for not adopting good software engineering principles, especially in data pipelines. […]
Part 3 – Data Modeling in Data Warehouses, Data Lakes, and Lake Houses.
Now we are getting to the crux of the matter. I would say Data Modeling is probably one of the most unaddressed, yet important parts of Data Warehousing, Data Lakes, and Lake Houses. It raises the most questions and concerns and is responsible for the rise and fall of many Data Engineers. This is what […]