Home - Confessions of a Data Guy

Big Data, Data, Data Engineering, Ramblings

Real-Life Example of Big O(n) Notation (and other such nonsense) for Data Engineering.

In the beginning, I always thought the humdrum Big O Notation discussions should be reserved for Software Engineers who enjoyed working on such things. I mean, what could it possibly have to do with Data Engineering? I mean, if you are the person writing the Spark application, by all means, have at it, but if […]

August 15, 2022

Data, Data Engineering, Data Warehousing

5 things I wish I knew about Databricks … before I started.

How many times in your life, which is but a mist, have you thought, “If I had only known that in the beginning?” I feel as if I’ve committed that cardinal sin as a developer and Data Engineer … falling in love with a tool to the exclusion of all else. I mean truly, Databricks […]

August 5, 2022

Big Data, Data, Data Engineering, Data Quality, SQL

You Only Need 2 Data Validations, That’s It.

I mean, I’m sort of being facetious and sort of not. I mean there is some truth that rings out in those words. I’m sure someone selling Data Observability tools, or writing Great Expectations all day, will not like the idea of relying on only 2 data validations. But honestly, these two are probably more than […]

August 1, 2022

Data, Data Engineering, Golang, Ramblings, Rust

Thoughts on Saint Augustine, Rust vs Golang. Complexity, verbosity, and other matters.

I’ve always enjoyed reading Mr. Augustine of Hippo, particularly “Confessions.” Ahead of his time in many ways. Although, you have to be into that sort of thing to find such topics interesting. It can be sort of dry, drawn out, verbose, and not for the faint of heart. Much like learning new programming languages. I’ve […]

July 15, 2022

Big Data, Data, Data Engineering, Data Warehousing

Exploring Delta Lake’s ZORDER, and Performance. On Databricks.

I think Delta Lake is here to stay. With the recent news that Databricks is open-sourcing the full feature-set of Delta Lake, instead of keeping the best stuff for themselves, it probably has the most potential to be the number one go-to for the future of Data Lakes, especially within those organizations that are heavy […]

July 7, 2022

Data, Data Engineering, Golang

Thoughts on HTTP and JSON with Golang. And other Headaches.

I’ve been playing with Golang off and on for a few weeks, when I find the time, which is every few weeks between kids and fishing. I have become a little bit of a fan, wishing for more projects to take on with Go. It seems like a fairly straightforward language to pick up, the […]

July 5, 2022

Big Data, Data, Data Engineering, Python

A Few Wonderful PySpark Features.

Just when I think it cannot get more popular, it does. I have to admit, PySpark is probably the best thing that ever happened to Big Data. It made what was once a myth approachable to the average person. No need for esoteric Java skills, no more MapReduce, just plain old Python. Another amazing thing […]

June 24, 2022

Big Data, Data, Data Engineering, Ramblings

Quick Guide to Data Engineering on AWS

June 9, 2022

Big Data, Data, Data Engineering, Data Quality, Data Warehousing, Python

Great Expectations with Databricks and Apache Spark. A Tale of Data Quality.

It still seems like the wild west of Data Quality these days. Tools like Deequ are just too much for most folks, and Data Quality is still new enough to the scene as a serious topic that most tools haven’t matured much, and companies dropping money on some tool is still a […]

June 2, 2022

Big Data, Data, Data Engineering, Ramblings

My Journey from Data Analyst to Senior Data Engineer

My newsfeed is chock-full of “how to break into Data Engineering” these days. It’s made me a bit nostalgic, to say the least. I’ve been dreaming about those days gone by when I started out in the data world. I would say my experience was not so much “breaking in”, but more of […]

May 12, 2022
