New Feature
21 days ago

Project Nessie catalog (versioned, Git-like data catalog)

A new, versioned data catalog you can use instead of (or alongside) Hive Metastore. Nessie adds branches/tags and Git-like operations for tables, so you can isolate changes, test safely, and roll back if needed.
  • What you can do
    • Create branches (e.g., feature_x, qa) to develop pipelines without touching main.
    • Switch active branch per workspace/job and run SQL against that branch.
    • Merge a branch back to main once validated, and tag important points for reproducibility.
    • Use it directly from the UI (branch selector/management) and from the SQL editor (run queries against the selected branch).
    • Keep Hive Metastore for legacy jobs while moving new/changed tables to Nessie incrementally.
  • Why it helps
    • Safe, zero-copy experimentation on datasets.
    • Repeatable runs (pin to a tag) and quick rollback if a faulty job or regression slips through.
    • Cleaner dev→test→prod promotion with explicit merges instead of ad-hoc table swaps.
  • Compatibility
    • Recommended with Iceberg tables; other formats depend on engine support.
    • Can run side by side with Hive Metastore; choose the catalog per job/workspace.
  • How to start
    • Enable the Nessie module and add a Nessie catalog entry in ILUM.
    • In the UI, pick your active branch before running SQL or jobs.
    • Update orchestrated jobs to reference the intended catalog/branch.
  • Operational notes
    • Lineage & versions record the catalog + branch for every read/write.
    • Include the Nessie metadata store in your backup/DR plan (same RPO/RTO targets as your tables).
    • If you switch the default catalog from Hive to Nessie, review jobs that assume Hive paths or Hive catalog names.
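
For orchestrated Spark jobs, the catalog entry and branch workflow described above might look roughly like the following sketch. It is an illustration under stated assumptions, not Ilum's own configuration: the property keys and SQL statements follow the upstream Iceberg/Nessie Spark integration, while the catalog name `nessie`, the URI, the warehouse path, and the helper names `nessie_spark_conf` and `branch_workflow_sql` are hypothetical placeholders. Ilum may set some or all of these properties for you once the Nessie module is enabled, so check the linked documentation before copying values.

```python
from typing import Dict, List


def nessie_spark_conf(uri: str, warehouse: str, ref: str = "main",
                      catalog: str = "nessie") -> Dict[str, str]:
    """Build Spark properties for an Iceberg catalog backed by Nessie.

    Keys follow the upstream Iceberg/Nessie Spark integration; the
    catalog name, URI and warehouse are caller-supplied placeholders.
    """
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Iceberg extensions plus Nessie's branch/tag SQL extensions.
        "spark.sql.extensions": ",".join([
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
            "org.projectnessie.spark.extensions.NessieSparkSessionExtensions",
        ]),
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "org.apache.iceberg.nessie.NessieCatalog",
        f"{prefix}.uri": uri,              # Nessie REST endpoint (assumed)
        f"{prefix}.ref": ref,              # branch or tag the job runs against
        f"{prefix}.warehouse": warehouse,  # table storage location (assumed)
    }


def branch_workflow_sql(branch: str, tag: str,
                        catalog: str = "nessie") -> List[str]:
    """Return Nessie Spark SQL statements for a branch, merge, tag cycle."""
    return [
        f"CREATE BRANCH IF NOT EXISTS {branch} IN {catalog} FROM main",
        f"USE REFERENCE {branch} IN {catalog}",
        # ... run and validate pipeline SQL against the branch here ...
        f"MERGE BRANCH {branch} INTO main IN {catalog}",
        f"CREATE TAG IF NOT EXISTS {tag} IN {catalog} FROM main",
    ]


if __name__ == "__main__":
    # Placeholder endpoint and bucket; replace with your deployment's values.
    conf = nessie_spark_conf(
        uri="http://nessie.example.internal:19120/api/v1",
        warehouse="s3a://example-bucket/warehouse",
        ref="feature_x",
    )
    for key, value in conf.items():
        print(f"{key}={value}")
    for statement in branch_workflow_sql("feature_x", "release_candidate"):
        print(statement)
```

In a real job, each property would be passed to the Spark session (for example as `--conf key=value`) and each statement executed with `spark.sql(statement)`. Pinning `ref` to a tag instead of a branch is one way to get the repeatable runs mentioned above.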
https://ilum.cloud/docs/features/catalogs/nessie
https://projectnessie.org/

Available in version: 6.6.0