New Feature
Announcing Mage.ai Integration for Ilum Data Lakehouse
We’re excited to unveil a new integration that makes building and operating data workflows on Ilum even faster.
Meet Mage.ai: the open-source, Python-first platform for building, running, and monitoring modern data pipelines.
Mage brings a notebook-style development experience, modular code “blocks,” and first-class support for batch, integration, and real-time streaming pipelines. It also includes native triggers, backfills, and reusable data quality checks (via Great Expectations).
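To make the block model concrete, here is a minimal, framework-free sketch of the loader → transformer → exporter pattern. The function names and sample data are illustrative only, not Mage's actual API; in a real Mage project each block lives in its own file and is wired together through the UI:

```python
# Illustrative sketch of Mage's modular-block idea: each pipeline stage
# is a small, independently testable Python function.

def load_orders():
    """Loader block: in a real pipeline this would read from an API,
    file, or lakehouse table."""
    return [
        {"order_id": 1, "amount": 120.0},
        {"order_id": 2, "amount": -5.0},   # bad row, filtered downstream
        {"order_id": 3, "amount": 80.5},
    ]

def transform_orders(rows):
    """Transformer block: keep only valid orders."""
    return [r for r in rows if r["amount"] > 0]

def export_orders(rows):
    """Exporter block: in a real pipeline this would write to a
    lakehouse table; here it just summarizes."""
    return {"exported": len(rows), "total": sum(r["amount"] for r in rows)}

result = export_orders(transform_orders(load_orders()))
```

Because each stage is just a function, blocks stay easy to unit-test and reuse across pipelines.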
Why this matters
Ilum gives you a unified, multi-cluster Spark platform with observability for logs and metrics. Pairing it with Mage’s developer-friendly orchestration lets your teams:
- Ship pipelines faster: Build pipelines as readable Python blocks in a notebook-like UI, then run them reliably on Ilum-managed Spark.
- Handle real-time and batch in one place: Orchestrate Spark batch jobs alongside Kafka-driven streaming workloads, all monitored through Ilum.
- Reprocess safely: Use backfills to rerun historical windows after schema changes or late-arriving data.
- Automate with confidence: Configure cron, event, API, or webhook triggers in the UI or in code; promote from dev to prod with repeatable configs.
- Trust your data: Attach Great Expectations-powered validations to pipeline blocks for consistent, auditable data quality.
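As an example of configuring triggers in code rather than the UI, Mage lets a pipeline declare its triggers in a `triggers.yaml` file kept under version control. The trigger name and exact values below are illustrative:

```yaml
# triggers.yaml (illustrative): a daily time-based trigger for one pipeline
triggers:
  - name: daily_batch
    schedule_type: time
    schedule_interval: "@daily"
    status: active
```

Keeping triggers in a versioned file is what makes the dev-to-prod promotion repeatable.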
What you can do now
- Streaming analytics: Ingest from Kafka with Mage and process on Ilum’s Spark clusters for real-time dashboards and alerts.
- Batch ELT at scale: Use Mage to orchestrate PySpark transformations that land in your lakehouse tables, with run visibility through Ilum's logs and metrics.
- Data quality gates: Add expectations to loader/transformer/exporter blocks so bad data fails fast before it reaches consumers.
- Catch-up processing: Backfill a month of sessions after fixing a UDF; Mage spins up windowed runs against Ilum and tracks results.
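The catch-up scenario above boils down to splitting a historical period into time windows and running each one. A small stdlib-only sketch of that windowing (the function name and daily step are illustrative, not Mage's API):

```python
from datetime import date, timedelta

def backfill_windows(start, end, step_days=1):
    """Yield (window_start, window_end) pairs covering [start, end),
    the way a backfill schedules one run per window."""
    cur = start
    while cur < end:
        nxt = min(cur + timedelta(days=step_days), end)
        yield cur, nxt
        cur = nxt

# A month of daily windows, e.g. after fixing a UDF:
windows = list(backfill_windows(date(2024, 5, 1), date(2024, 6, 1)))
```

Each window then becomes one tracked pipeline run, so a failed day can be retried without reprocessing the whole month.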
Highlights at a glance
- Python-first pipelines with modular blocks
- Batch, integration, and streaming pipeline types
- Triggers (cron, event, API, webhook) and backfills, configurable in the UI or in code
- Reusable Great Expectations test suites for data quality
- Runs cleanly alongside Airflow or Kestra if you’re hybrid today
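To illustrate what a reusable data-quality suite looks like conceptually, here is a plain-Python stand-in. In Mage these checks would be Great Expectations suites attached to loader/transformer/exporter blocks; every name and rule below is illustrative:

```python
# Conceptual sketch of a reusable expectation suite: named row-level
# checks that can be attached to any block's output.

EXPECTATIONS = [
    ("amount_non_negative", lambda r: r["amount"] >= 0),
    ("order_id_present", lambda r: r.get("order_id") is not None),
]

def run_expectations(rows, expectations=EXPECTATIONS):
    """Return a list of (check_name, failing_row_count) for failed checks;
    an empty list means the data passed the suite."""
    failures = []
    for name, check in expectations:
        bad = [r for r in rows if not check(r)]
        if bad:
            failures.append((name, len(bad)))
    return failures

sample = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": None, "amount": -1.0},
]
failures = run_expectations(sample)
```

Running the same suite at every stage is what makes bad data fail fast, before it reaches consumers.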