The Future of Data Engineering, and the Tale of Two Data Warehouses
I am going to peer into the crystal ball, the seeing stone, and look into the murky future of Data Engineering to see what mysteries it holds. I’ve seen a story, a tale of two Data Warehouses; I’ve seen Machine Learning, Streams, Distributed Systems, Storage, and the eternal SQL. A lot has changed in the world of Data Engineering in the last few years, but a lot has not changed as well. There are articles about the end of ETL and the rise of ELT, Hadoop being dead, new data paradigms, no-code data flows, and managed services, yet very little has actually changed, or it changes at a snail’s pace. Inevitably, the story and future of Data Engineering can be told through the tale of two Data Warehouses.
The Tale of Two Data Warehouses
Why does data warehousing tell the future of Data Engineering? Because in the end, 80%+ of what data engineers do is wrangle data in and out of storage systems: Data Warehouse, Data Mart, Data Lake, blah blah blah. Of course, there are Kimball dimensional model Kool-Aid drinkers who think that anything else isn’t a Data Warehouse, but we are past that now; let them live in the past. And that, my friend, is where the Tale of Two Data Warehouses begins…
This tells the future of Data Engineering, doesn’t it? Data Warehousing … storage, Data Lakes, Data Marts … call them what you want … the data storage solutions that act as the source of truth for the business have the greatest influence on Data Engineering and where it’s going. Ye Ole’ Data Warehouse and the Hot New Stuff Data Warehouse each have personalities of their own, with their own problems and dependencies.
We have to step back and remember something, though … Data Engineers work in data systems to do essentially a few things:
- Provide business and product value with analytics.
- Be the source of truth.
- Enforce governance, security, schema across data and time.
- Make data accessible and processable.
But part of the Tale of Two Data Warehouses is about the characters on this journey. I guess you could call it a Fellowship of sorts … a fellowship of data characters, locked in the same eternal struggle of pulling data and insights together to provide value.
I think something else becomes evident here: the future of the Data Engineer is to absorb more and more of the old ways, along with the new. Organizations that move toward the Hot New Stuff Data Warehouse inevitably tend to have a different team structure than the old ways. It’s common now for Data Engineering and Data Science teams to work in tandem to provide the needed data and value to the business.
The Golden Age of the Database Administrator is waning. Where Data Analysts and Business Intelligence Engineers once reigned supreme, the Data Engineer is now expected to pick up the slack.
These new team members and new technologies make different skill sets more and more important. It’s usually a different mindset, and although there is of course some overlap, many of the skills and design/architecture thought processes are just different.
Surely your mother taught you that not all that is shiny is gold, and not to always do what your friends are doing? One of the greatest pitfalls of Data Engineering and architecture is believing that the “new technology” will solve all our problems. Ye Ole’ Data Warehouse and the Hot New Stuff Data Warehouse both come with their own baggage; they are fundamentally different and bring different challenges. Every tale has its hardships … and the grass is always greener on the other side.
The collision of the New and Old worlds has led to the rise of managed technologies that provide the performance of massive Distributed Systems with the simplicity of SQL from the Old World. The future of the Data Engineer lies down this road as well, whether they like it or not. Running your own Spark and Hadoop clusters, even on EMR, can turn into the same nightmare that teams of DBAs have dealt with for decades.
Once you have a fleet of Data Engineers working on a Spark cluster, someone will eventually ask if there is an easier way. Below are some of the technologies and companies that have stepped in to fill that gap.
- Databricks (managed Spark)
- Astronomer (managed Airflow)
- Snowflake (managed distributed SQL)
Conclusion
Of course, now we can see which skills are important in the future of Data Engineering: distributed systems, distributed file systems, architecture design, and the ability to debug complicated systems. The future for Data Engineers is all about being part Software Engineer, DevOps, Architect, DBA, and Analyst. Some of these skills will become more important over time as more organizations start the journey from Ye Ole’ Data Warehouse to the Hot New Stuff Data Warehouse.
Someone who writes SQL and SSIS all day long has nothing to fear; that job will be there for a long time to come. But it’s probably going to be a shrinking arena. It’s inevitable and unstoppable: technology changes, the data follows … and so do the skills and jobs.
The Data Engineer of the Future should learn …
- Python, Scala, Java … at the Software Engineer level.
- Distributed Systems, design and debugging.
- Architecture, how to do it well.
- Data Types / Structures / Storage Systems … data partitioning.
- Data Science / Machine Learning / Analytics.
There are a lot of Data Engineers already living in the future … and a lot who are not. It usually depends on the businesses they have worked with and the data architecture that is present. It has a lot to do with the people in charge and their willingness to cast off the old and try on the new. There will always be a place for a Kimball data warehouse, but the appetite for expensive SQL Server licenses, with gobs of DBAs and Data Engineers writing SSIS packages, is clearly starting to wane.
If you want to know where Data Engineering is going … follow the data; that is why the Tale of Two Data Warehouses is so important. Data will always flow into some storage system, and how people and companies choose to interact with that data dictates the future skills and technology that will be used.