Apache XTable. Delta vs Iceberg vs Hudi.
This blog post reviews an Apache Incubating project called Apache XTable, which aims to provide cross-format interoperability among Delta Lake, Apache Hudi, and Apache Iceberg. Below is a concise breakdown from some time I spent playing around with this new tool, along with a few technical observations:
1. What is Apache XTable?
- Not a New Format: It’s explicitly not another “table format.” Instead, it translates existing Lakehouse table metadata so that one physical dataset can be recognized as Delta, Hudi, or Iceberg.
- “Omni-directional Interop”: The goal is to let you read/write the same physical data files from any of the three major table formats without duplicating data.
2. How It Works at a High Level
- Reads your existing metadata (e.g., the Delta log or the Hudi `.hoodie` folder).
- Generates metadata for the other table formats (e.g., it creates a `metadata/` folder for Iceberg or a `.hoodie/` folder for Hudi).
- No Physical Copy: the actual data files are never duplicated; only the metadata gets translated or rewritten.
- Query using the familiar `spark.read.format("delta|hudi|iceberg").load("...")` syntax. In theory, you can point to the same data location and pick a format, as sketched below.
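To make that intent concrete, here is a minimal PySpark sketch. The table path is hypothetical, and it assumes a Spark session with the Delta, Hudi, and Iceberg runtime JARs on the classpath; as the notes further down show, this did not work everywhere for me.

```python
from pyspark.sql import SparkSession

# Assumes the Delta, Hudi, and Iceberg Spark runtime JARs are on the classpath.
spark = SparkSession.builder.appName("xtable-read-demo").getOrCreate()

# Hypothetical location: one physical set of Parquet files with
# _delta_log/, .hoodie/, and metadata/ folders sitting next to each other.
base_path = "s3://my-bucket/tables/orders"

df_delta = spark.read.format("delta").load(base_path)      # uses _delta_log/
df_hudi = spark.read.format("hudi").load(base_path)        # uses .hoodie/
df_iceberg = spark.read.format("iceberg").load(base_path)  # uses metadata/
```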
3. Setup and Configuration
- Requires Java 11: The project builds with Maven and carries various version constraints (for example, around Spotless, a code-formatting plugin).
- Configuration via YAML: a small dataset config file tells XTable the source format, the target formats, and the table path(s); see the sketch after this list.
- Command-Line Invocation: the sync runs via the bundled utilities JAR produced by the Maven build, also shown below.
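A minimal sketch of the dataset YAML looks roughly like this; the field names follow the project's sample config as I remember it and may differ between versions, and the bucket and table names are made up:

```yaml
# my_config.yaml - hypothetical dataset config for the XTable sync utility
sourceFormat: DELTA        # format the table currently uses
targetFormats:
  - HUDI
  - ICEBERG                # formats whose metadata XTable should generate
datasets:
  - tableBasePath: s3://my-bucket/tables/orders   # hypothetical table location
    tableName: orders
```

The conversion itself is invoked through the bundled utilities JAR; something along these lines worked for me, though the exact JAR name and path depend on the version you build:

```bash
# Run the XTable sync against the YAML config above
java -jar xtable-utilities/target/xtable-utilities-*-bundled.jar \
  --datasetConfig my_config.yaml
```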
4. Building from Source (Pain Points)
- Maven & Java Versions: The blog highlights issues with Java 17 or later vs. Java 11, causing Spotless or plugin mismatches.
- Dockerfile: The provided Dockerfile in the project apparently had syntax or versioning issues. I had to create a custom Dockerfile with Java 11 + Maven to build the project successfully.
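For reference, here is a minimal sketch of the kind of build image I ended up with; the base image tag is one plausible choice for Java 11 + Maven, not the one the project ships:

```dockerfile
# Hypothetical build container: Java 11 + Maven
FROM maven:3.9-eclipse-temurin-11

WORKDIR /build
COPY . .

# Skip tests to keep the build quick; the bundled utilities JAR lands under */target/
RUN mvn -B clean package -DskipTests
```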
5. Trying It Out
- I tested XTable with:
- A Delta Lake table in S3 (both a Unity Catalog–managed table and a regular “unmanaged” table).
- Ran the XTable conversion to produce Hudi (`.hoodie/`) and Iceberg (`metadata/`) folders.
- Reading the Converted Tables:
- The Databricks Spark environment often complained about “overlapping” or “incorrect path” errors when trying to read the newly created Iceberg/Hudi metadata from the same base directory.
- I had better luck with Polars locally, though I had to point directly to the Iceberg metadata JSON file (e.g., `v2.metadata.json`) rather than just the parent directory; see the snippet below.
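Here is roughly what the Polars read looked like. It is a sketch with hypothetical local paths, and it assumes the optional deltalake and pyiceberg dependencies that Polars needs for these readers are installed:

```python
import polars as pl

# Hypothetical local copy of the converted table
table_path = "/data/orders"

# Delta works against the table root (reads _delta_log/)
df_delta = pl.scan_delta(table_path).collect()

# For Iceberg, pointing at the table root failed for me; pointing directly
# at the metadata JSON that XTable wrote did work.
df_iceberg = pl.scan_iceberg(f"{table_path}/metadata/v2.metadata.json").collect()
```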
6. Current Observations
- Early-Stage or “Incubating”: This is an incubating Apache project, so friction and bugs are expected:
- Building can be finicky.
- Reading with Spark or other engines may require extra steps or fail silently.
- Docs are still sparse, especially around whether (and how) incremental sync is supposed to work in production.
- Daily/Continuous Sync?: XTable offers “incremental” and “full” sync, implying you might need to re-run these conversions regularly to keep each format’s metadata up to date.
- I find the concept worthwhile—XTable could solve real problems for teams who need to maintain multiple table formats without duplicating large data sets.
- However, the execution is not yet smooth:
- Confusing build process.
- Docs are missing or incomplete on how to read the newly created metadata in Spark or other engines.
- Possibly incompatible or untested with certain Databricks or Unity Catalog configurations out of the box.
Apache XTable is a promising idea: keep the data in one physical location and read it as Delta, Hudi, or Iceberg. However, it is still rough around the edges: building requires exact tooling, reading the converted metadata can fail depending on how you do it, and the documentation leaves gaps. If you’re interested in cross-format interoperability and you’re comfortable with early-stage, Java-based open-source tools, it’s worth experimenting with, but it’s probably not production-ready in its current form.