The two coolest kids in class … I mean seriously … every other post in the Data Engineering world these days is about Apache Airflow or DataBricks. It’s hard to kick against the goads. Just jump on the bandwagon before you get left in the dust. I’ve used both DataBricks and Apache Airflow, and both are important, integral tools for data engineers these days. Apache Airflow makes complex pipeline dependencies, orchestration, and management intuitive and easy. DataBricks has delivered what AWS and EMR could not: easy-to-use Spark and DeltaLake functionality without the management and config nightmares of running Spark yourself.
Recently I worked on an Airflow and DataBricks/DeltaLake integration, so it’s time to talk about what it looks like and the options you have when doing this type of integration.