Running dbt on Databricks has never been easier. The integration between dbt Core and Databricks could not be simpler to set up and run. Wondering how to approach running dbt models on Databricks with Spark SQL? Watch the tutorial below.
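For context on the setup: connecting dbt Core to Databricks mostly comes down to one profiles.yml entry for the dbt-databricks adapter. Here is a minimal sketch, where the project name, host, HTTP path, catalog, and schema are placeholders for your own workspace values:

```yaml
# ~/.dbt/profiles.yml -- minimal sketch for the dbt-databricks adapter
my_project:            # placeholder; must match the profile name in dbt_project.yml
  target: dev
  outputs:
    dev:
      type: databricks
      catalog: main                          # placeholder Unity Catalog name
      schema: analytics                      # placeholder target schema
      host: dbc-xxxx.cloud.databricks.com    # placeholder workspace host
      http_path: /sql/1.0/warehouses/xxxx    # placeholder SQL warehouse path
      token: "{{ env_var('DATABRICKS_TOKEN') }}"  # access token read from the environment
```

With that in place, dbt debug verifies the connection and dbt run executes your models against the warehouse.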

There are things in life that are satisfying—like a clean DAG run, a freshly brewed cup of coffee, or finally deleting 400 lines of YAML. Then there are things that make you question your life choices. Enter: setting up Apache Polaris (incubating) as an Apache Iceberg REST catalog.

Let’s get one thing out of the way—I didn’t want to do this.
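A taste of where this all lands: once Polaris is up, Spark talks to it through Iceberg's REST catalog support. Below is a minimal PySpark sketch, assuming a local Polaris endpoint; the URI, credential, warehouse name, and iceberg-spark-runtime version are placeholders, and Polaris-specific auth settings may differ in your setup:

```python
from pyspark.sql import SparkSession

# Minimal sketch: register an Iceberg REST catalog named "polaris".
# The endpoint, credential, warehouse, and jar version below are placeholders.
spark = (
    SparkSession.builder.appName("polaris-rest-demo")
    .config(
        "spark.jars.packages",
        "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2",
    )
    .config(
        "spark.sql.extensions",
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    )
    .config("spark.sql.catalog.polaris", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.polaris.type", "rest")
    .config("spark.sql.catalog.polaris.uri", "http://localhost:8181/api/catalog")
    .config("spark.sql.catalog.polaris.credential", "<client-id>:<client-secret>")
    .config("spark.sql.catalog.polaris.warehouse", "my_catalog")
    .getOrCreate()
)

# Smoke test: create a namespace and an Iceberg table through the new catalog.
spark.sql("CREATE NAMESPACE IF NOT EXISTS polaris.demo")
spark.sql(
    "CREATE TABLE IF NOT EXISTS polaris.demo.events (id BIGINT, ts TIMESTAMP) USING iceberg"
)
```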

I make it my duty in life to never have to open an Excel file (.xlsx); if I do, I feel like I've made a critical error somewhere in my career trajectory. But I recently had no choice: I had to open (or at least try to open) an Excel file on a Mac to look at some sample data from a client.

Context and Motivation

  • dbt (Data Build Tool): A popular open-source framework that organizes SQL transformations in a modular, version-controlled, and testable way.
  • Databricks: A platform that unifies data engineering and data science pipelines, typically with Spark (PySpark, Scala) or Spark SQL.

The post explores whether a Databricks environment—often used for Lakehouse architectures—benefits from dbt, especially if a team heavily uses SQL-based transformations.
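To make the SQL-transformation case concrete: a dbt model is just a SELECT statement with a config header, and dbt handles the DDL, dependency ordering, and lineage. A hypothetical model, with source and column names invented for illustration:

```sql
-- models/marts/daily_orders.sql (hypothetical model name)
-- dbt materializes this query as a Delta table on Databricks.
{{ config(materialized='table', file_format='delta') }}

select
    order_date,
    count(*)    as order_count,
    sum(amount) as total_amount
from {{ source('raw', 'orders') }}  -- declared in a sources .yml file
group by order_date
```

Because each model is a plain SQL file under version control, it can be code-reviewed, covered by schema tests, and rebuilt with dbt run like the rest of the project.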

The blog post reviews Apache XTable, an Apache Incubating project that aims to provide cross-format interoperability among Delta Lake, Apache Hudi, and Apache Iceberg. The post gives a concise breakdown from some time I spent playing around with this new tool, plus some technical observations.
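For orientation: XTable runs as a standalone utility driven by a small YAML config that names a source format, the target formats to generate, and the tables to sync. A sketch of the shape, with placeholder paths and table names based on my reading of the project docs:

```yaml
# my_config.yaml -- sketch of an XTable dataset config (placeholder values)
sourceFormat: DELTA        # format of the existing table
targetFormats:             # metadata formats to generate alongside it
  - ICEBERG
  - HUDI
datasets:
  - tableBasePath: s3://my-bucket/path/to/table   # placeholder base path
    tableName: my_table                           # placeholder table name
```

The sync itself is a single java -jar invocation of XTable's bundled utilities jar with --datasetConfig pointing at that file; notably, XTable translates table metadata only and leaves the underlying data files untouched.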
