
How I (Barely) Survived Setting Up Polaris as an Iceberg REST Catalog

There are things in life that are satisfying—like a clean DAG run, a freshly brewed cup of coffee, or finally deleting 400 lines of YAML. Then there are things that make you question your life choices. Enter: setting up Apache Polaris (incubating) as an Apache Iceberg REST catalog.

Let’s get one thing out of the way—I didn’t want to do this.

I’m a Delta Lake fan. Loyal to a fault. But when every LinkedIn post, conference talk, and tech bro tweet started chanting “Iceberg! Iceberg!”, I cracked. I caved. I decided to stop yelling at clouds and give it a real shot.

But if I was going to do this, I was going to do it my way—production-like, remote server, secure, AWS Lambda integration, S3-backed tables. No localhost sorcery. No toy data on my laptop. I wanted pain. I got it.

Polaris: Just Add Suffering

The plan was simple on paper:

  • Set up Polaris on a remote Ubuntu server (I used Linode).

  • Lock down the firewall with UFW.

  • Create a REST catalog.

  • Hook it up to S3.

  • Use PyIceberg, Polars, or Daft to actually read/write data.

  • Wrap it in a Lambda that responds to S3 file drops.
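
For the last bullet, here is a minimal sketch of what the Lambda side could look like. It only shows the part I am sure about: pulling bucket and key out of a standard S3 event notification. The function names and the ingest_into_iceberg placeholder are hypothetical, not code from my actual setup.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record.get("s3")
        if not s3_info:
            continue  # skip records that are not S3 notifications
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded (spaces become '+'), so decode them.
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    return objects


def ingest_into_iceberg(bucket, key):
    """Hypothetical placeholder for the step that appends the file to a table."""
    print(f"would ingest s3://{bucket}/{key}")


def handler(event, context):
    """Lambda entry point: called once per batch of S3 notifications."""
    objects = extract_s3_objects(event)
    for bucket, key in objects:
        ingest_into_iceberg(bucket, key)
    return {"processed": len(objects)}
```

Keeping the event parsing separate from the ingest step means you can test the handler locally with a hand-written event dict before any AWS wiring exists.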

The Polaris docs gave me two options: install locally or use Docker.

Cool. Docker it is.

Except… the Docker instructions lied.

bash
docker run -p 8181:8181 apache/polaris:latest

That image? Doesn’t exist. It’s vaporware. A beautiful dream.

I scoured the GitHub repo for a working docker-compose.yml and ran every one I found. Same story: a broken image reference to apache/polaris. Nothing worked.

So I rolled up my sleeves, installed Java 21, and built the whole thing from source with Gradle. Pro tip: 2GB RAM won’t cut it. Polaris chokes. Go with 4GB minimum. (You’re welcome.)

Security Theater, But Make It Real

Polaris runs as a web service, so naturally I locked it down with UFW. Only my IP could hit the server.
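
A rough sketch of the rule set, written as a small Python helper that builds the ufw commands rather than running them. Port 8181 comes from the Docker command above. Port 22 for SSH and the 203.0.113.7 address are assumptions for illustration, so swap in your own.

```python
import shlex


def ufw_commands(admin_ip, ports=(22, 8181)):
    """Build ufw commands that allow only admin_ip to reach the given TCP ports."""
    commands = [
        ["ufw", "default", "deny", "incoming"],
        ["ufw", "default", "allow", "outgoing"],
    ]
    for port in ports:
        commands.append(
            ["ufw", "allow", "from", admin_ip, "to", "any",
             "port", str(port), "proto", "tcp"]
        )
    # --force skips the interactive confirmation prompt.
    commands.append(["ufw", "--force", "enable"])
    return commands


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for cmd in ufw_commands("203.0.113.7"):
        print("sudo " + shlex.join(cmd))
```

Add the SSH rule before enabling the firewall, or you may lock yourself out of the server.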

The docs were vague, but eventually I got the Polaris Quarkus server up and running. It even exposed a REST endpoint. Things were looking good.

Until they weren’t.

PyIceberg vs Polaris: Who Will Break First?

I figured once Polaris was running, PyIceberg would connect easily. Wrong.

Even with the token copied directly from the Polaris logs, I kept getting 401s. No matter how I passed it—header, credential field, magic incantation—Polaris refused me.

Turns out, you need a very specific combo of uri, credential, token type, scope, and warehouse path. None of this is clearly documented. You’re just expected to “know.”
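
For the curious, here is a sketch of the kind of property set involved, not a record of my exact winning combination. The /api/catalog path, the PRINCIPAL_ROLE:ALL scope, and the idea that warehouse is the Polaris catalog name are assumptions based on common Polaris setups, and every value is a placeholder. Token type is left out on purpose, since I would only be guessing at the key.

```python
def polaris_catalog_properties(base_url, client_id, client_secret, warehouse):
    """Build PyIceberg REST catalog properties for a Polaris server (assumed layout)."""
    return {
        "type": "rest",
        # Assumption: Polaris serves the Iceberg REST API under /api/catalog.
        "uri": base_url.rstrip("/") + "/api/catalog",
        # PyIceberg expects the credential as "client_id:client_secret".
        "credential": f"{client_id}:{client_secret}",
        # Assumption: this scope asks for all of the principal's roles.
        "scope": "PRINCIPAL_ROLE:ALL",
        # Assumption: the name of the catalog created inside Polaris.
        "warehouse": warehouse,
    }


def connect(name, properties):
    """Open the catalog; pyiceberg is imported lazily so the helper above stays testable."""
    from pyiceberg.catalog import load_catalog

    return load_catalog(name, **properties)
```

Usage would look like connect("polaris", polaris_catalog_properties("http://polaris.example.internal:8181", "my-id", "my-secret", "demo_catalog")). If that still returns a 401, the credential and the scope are the first things I would check.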

Eventually, I got it working. I have no idea how many combinations I tried. Somewhere between “too many” and “my soul left my body.”

Creating a Table: The Final Boss

Okay, now we’re in. Polaris is running. PyIceberg is talking. Let’s create a table and write some data.

LOL.

Access control came back to bite me. I didn’t have the right roles. Polaris spit out a giant error about principals, roles, and grants. It felt like I was configuring IAM, but with less guidance and more despair.

Eventually, I found a notebook in the Polaris quickstart that walked through role setup. Copy-paste salvation. Finally, I had a catalog, a namespace, and permission to create a table.
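
What finally made the error message click for me was seeing the chain: a principal gets a principal role, the principal role gets a catalog role, and the catalog role holds the privileges. Below is a toy in-memory model of that chain. It is not Polaris code and makes no API calls. All names are hypothetical, and TABLE_CREATE is just an example privilege string.

```python
from dataclasses import dataclass, field


@dataclass
class AccessModel:
    """Toy model: principal -> principal role -> catalog role -> privilege."""
    principal_roles: dict = field(default_factory=dict)  # principal -> {principal roles}
    catalog_roles: dict = field(default_factory=dict)    # principal role -> {(catalog, catalog role)}
    grants: dict = field(default_factory=dict)           # (catalog, catalog role) -> {privileges}

    def assign_principal_role(self, principal, principal_role):
        self.principal_roles.setdefault(principal, set()).add(principal_role)

    def assign_catalog_role(self, principal_role, catalog, catalog_role):
        self.catalog_roles.setdefault(principal_role, set()).add((catalog, catalog_role))

    def grant(self, catalog, catalog_role, privilege):
        self.grants.setdefault((catalog, catalog_role), set()).add(privilege)

    def is_allowed(self, principal, catalog, privilege):
        """Walk the whole chain; any missing link means no access."""
        for principal_role in self.principal_roles.get(principal, ()):
            for cat, catalog_role in self.catalog_roles.get(principal_role, ()):
                if cat == catalog and privilege in self.grants.get((cat, catalog_role), ()):
                    return True
        return False


model = AccessModel()
model.assign_principal_role("etl_user", "etl_principal_role")
model.assign_catalog_role("etl_principal_role", "demo_catalog", "writer")
# Until this grant exists, etl_user cannot create tables in demo_catalog.
model.grant("demo_catalog", "writer", "TABLE_CREATE")
print(model.is_allowed("etl_user", "demo_catalog", "TABLE_CREATE"))
```

Miss any one of the three links and the answer is no, which is roughly what that giant error was telling me.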

Writing to the Iceberg Table

I tried using Polars to write data. No dice.

So I turned to Daft, which handled Iceberg writes like a champ.

  • Read a CSV from S3 or local disk.

  • Write to Iceberg via Polaris.

  • Read it back.
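
Those three steps can be sketched roughly like this. It assumes a PyIceberg catalog object that is already connected to Polaris and a table that already exists with a schema matching the CSV. The function names, the CSV path, and the raw.events identifier are hypothetical.

```python
def table_identifier(namespace, name):
    """Join namespace and table name into the dotted form PyIceberg accepts."""
    return f"{namespace}.{name}"


def csv_to_iceberg_roundtrip(catalog, csv_path, identifier):
    """Read a CSV with Daft, append it to an Iceberg table, then read the table back."""
    import daft  # imported lazily so the helper above works without Daft installed

    table = catalog.load_table(identifier)
    df = daft.read_csv(csv_path)
    # Append the CSV rows to the existing table through the catalog.
    df.write_iceberg(table, mode="append")
    # Reload the table so the read sees the snapshot that was just committed.
    return daft.read_iceberg(catalog.load_table(identifier))


if __name__ == "__main__":
    print(table_identifier("raw", "events"))
```

Calling csv_to_iceberg_roundtrip(catalog, "data/events.csv", table_identifier("raw", "events")) returns a lazy Daft DataFrame, so nothing is actually read back until you call .show() or .collect() on it.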

It worked. It finally worked. The circle was complete.

Closing Thoughts from the Wreckage

If you’re thinking of standing up Polaris to use as an Iceberg REST catalog, just know this:

  • You can do it.

  • It will hurt.

  • The docs are bad.

  • The tooling is half-baked.

  • But when it finally works, you’ll feel like you climbed Everest… without oxygen.

And for all my Delta Lake folks out there—don’t worry, I haven’t abandoned the faith. But I’ve got to admit, Iceberg + Polaris showed up when others didn’t.

Still, OSS catalogs have a long way to go before they’re ready for the casual data engineer. They’re not plug-and-play. They’re not “five minute setup.” But if you’re serious about scaling a lakehouse, Polaris might just be worth the pain.

And hey—if you made it this far, pour yourself a drink. You earned it.