If you’ve ever been in the market for a Data Engineering job, or you’re alive and on Linkedin, you’ve probably been constantly inundated with job postings and requests pounding on your emails like a constant mountain stream even bubbling down a hill.

If that’s not the case, then head over to the quarterly salary discussion on r/dataengineering and cruise around the comments for an hour or two. One thing that will become clear quickly is that there is a huge range in pay scales, for apparently the same jobs.

One person is making 190K base and another one on the same stack is making 80K, it’s enough to make you pull your hair out. Heck, I get a constant stream of messages for “great opportunities”  for a good 50K less than I make at this moment.

What’s the deal?

Sure, there is always going to be some disparity depending on the person who is the Data Engineer, some people simply deliver twice as much as others and are compensated accordingly. Some have over a decade of experience, some only a few.

Straight to the point.

I will tell you what the deal is.  It’s the company(s).

The truth is that this sort of pay disparity exists all across the board, not only in Data Engineering, it’s a human phenomenon, and the Data field is no exception. Companies are simply different and act differently towards employees. Sometimes you’re a widget no one cares about, easily replaceable, and sometimes you end up in a place where people are core to what happens, and are paid accordingly.

  • Companies that pay well below market value don’t care and probably have bad cultures.
  • Companies that pay at or above market value are investing in people and have good cultures.
  • Companies that don’t care about data don’t pay well and have crappy infrastructure and tooling.
  • Companies that care about data pay well and have great infrastructure and tooling.

A lot of it boils down to two things.

  1. Does this company care about people generally, or not so much? Do they invest in people or see them as liabilities that are easily replaceable?
  2. Does this company care about “data” in a real way? Do they invest in their data?

If you’re a Data Engineer you should find a place to work that cares about people and data, the perfect cross-section. This will maximize your earnings. Although we should not complain as a whole as Data Engineers, salaries generally speaking on average or more than enough to live a good life on.

Overall, data engineers can expect to earn between $96,673 and $130,026 on average annually, with higher earnings possible in certain locations and for those with specialized skills and experience. – AI

How to make more money.

Do you want to make more money, are you underpaid? The answer is simple. Find a new job.

Humans don’t like change, many people get stuck and afraid to move on, and this is what keeps them operating in poor cultures where they are treated poorly, overworked, and underpaid. I’m here to tell you data is still the new oil, even more so with the rise of AI.

You can double you income simply by changing jobs every 1.5 years or so.

Nothing will give you a 20-30% pay bump quicker than changing jobs. You can work hard, even at a good company, and only make that sort of increase after working for a decade. Don’t do it.

Data Engineering pays well AT GOOD Companies. Go get paid.

I recently did a challenge. The results were clear. DuckDB CANNOT handle larger-than-memory datasets. OOM Errors.  See link below for more details.

DuckDB vs Polars – Thunderdome. 16GB on 4GB machine Challenge. 

 

Are lambdas one of those tools that everyone uses and no one talks about? I guess I’ve taken them for granted over the years, even though they are incredibly useful. For a lot of my Data Engineering career I didn’t really think about or use AWS lambdas, I just saw them as little annoying flies on the wall, incapable of “real” use in Data Engineering pipelines.

But, I’ve changed my evil ways, and come to love the little buggers. Easy to use, cheap, and no infrastructure to worry about, I mean, they are little jewels in the rough.

I guess now that I look back and reflect, I’m surprised that I don’t see more content produced singing the praises and glories of AWS lambdas, why is the world silent? Those workhorses chug along in the background, millions and millions of times every second.

Today will not be earth-shattering, just a 10,000-foot view of AWS lambdas to inspire you to use them. With a focus on their use in Data Engineering.

Read more

There are a few things in life I both love and hate. Let’s see …. hot weather, cold weather, working for a living, and …. LeetCode. I mean it is totally fun to push yourself and try to solve hard problems, but then the other side of me is like … well I’ve been writing code for years and 80% of this stuff is nothing like writing code in real life. I think the LeetCode platform itself is an amazing tool, and has provided both people and companies with an elegant way to showcase and practice skills. But is there too much of a good thing? Of course.

Read more
Apache Beam for Data Engineers.

What is this thing? What’s it good for? Who’s using it and why? That’s pretty much what I ask myself once a month when I actually see the name Apache Beam pop up in some feed I’m scrolling through. I figured it has to be legit to be Apache incubated, but I’ve never run across anyone in the wild using it yet. On the surface it appears to be semi-pointless since it runs on-top of other distributed systems like Spark, but I’m sure there is more to it. Today, I’m going to run through an overview of Apache Beam and then try installing and running some data through it, kick the tires as it were. And see if my mind changes about the pointless bit.

Read more

I want to interrupt your semi-regularly scheduled technical blog post for this public service announcement. I mean the url does say “confessions” does it not? For better or worse I’ve been thinking a lot lately about what it means to be a Data Engineer, what’s like to be a Data Engineer, and what makes a good Data Engineer. Just the life of a Data Engineer in general. The Battlefield of the Data Engineer is fought in the labyrinth of nested SQL queries. It rages to the depths of distributed computing clusters. It vies for victory on the crags and peaks of DevOps. It attacks for precious ground amid the chaos of the perfect OOP and Functional code bases. Phew… and all that just to keep your head above water.

Read more
Apache Parquet vs Apache Avro

There comes a point in the life of every data person that we have to graduate from csv files. At a certain point the data becomes big enough or we hear talk on the street about other file formats. Apache Parquet and Apache Avro are two of those formats that been coming up more with the rise of distributed data processing engines like Spark.

Read more