I’ve always been a firm believer in using the right tool for the job. Sometimes I look at a piece of code … and ask … why? I mean just because you can do something doesn’t mean that you should. I see a lot of my job as someone who writes code … as not […]

In the beginning, I always thought the humdrum Big O Notation discussions should be reserved for Software Engineers who enjoyed working on such things. I mean, what could it possibly have to do with Data Engineering? I mean, if you are the person writing the Spark application, by all means, have at it, but if […]

How many times in your life, that is but a mist, have you thought, “If I had only known that in the beginning?” I feel as if I’ve committed that cardinal sin as a developer and Data Engineer … falling in love with a tool to the exclusion of all else. I mean truly, Databricks […]

I mean, I’m sort of being facetious and sort of not. I mean there is some truth that rings out in those words. I’m sure someone selling Data Observability tools, or writing Great Expectations all-day will not like the idea of relying on only 2 data validations. But honestly, these two are probably more than […]

I’ve always enjoyed reading Mr. Augustine of Hippo, particularly “Confessions.” Ahead of his time in many ways. Although, you have to be into that sort of thing to find such topics interesting. It can be sort of dry, drawn out, verbose, and not for the faint of heart. Much like learning new programming languages. I’ve […]

I think Delta Lake is here to stay. With the recent news that Databricks is open-sourcing the full feature-set of Delta Lake, instead of keeping the best stuff for themselves, it probably has the most potential to be the number one go-to for the future of Data Lakes, especially within those organizations that are heavy […]

I’ve been playing with Golang off and on for a few weeks, when I find the time, which is every few weeks between kids and fishing. I have become a little bit of a fan, wishing for more projects to take on with Go. It seems like a fairly straightforward language to pick up, the […]

Just when I think it cannot get more popular, it does. I have to admit, PySpark is probably the best thing that ever happened to Big Data. It made what was once a myth, approachable to the average person. No need for esoteric Java skills, no more MapReduce, just plain old Python. Another amazing thing […]

It still seems like the wild west of Data Quality these days. Tools like Apache Deque are just too much for most folks, and Data Quality is still new enough to the scene as a serious thought topic that most tools haven’t matured that much, and companies dropping money on some tool is still a […]