New Feature
Announcing Mage.ai Integration for Ilum Data Lakehouse
We’re excited to unveil a new integration that makes building and operating data workflows on Ilum even faster.
Meet Mage.ai: the open-source, Python-first platform for building, running, and monitoring modern data pipelines.
Mage brings a notebook-style development experience, modular code “blocks,” and first-class support for batch, integration, and real-time streaming pipelines. It also includes native triggers, backfills, and reusable data quality checks (via Great Expectations).
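To make the block model concrete, here is a minimal, framework-free sketch of the loader → transformer → exporter pattern. The function names and sample data are illustrative only, not Mage's actual API; in a real Mage project each block lives in its own file and is wired together through the UI:

```python
# Illustrative sketch of Mage's modular-block idea: each pipeline stage
# is a small, independently testable Python function.

def load_orders():
    """Loader block: in a real pipeline this would read from an API,
    file, or lakehouse table."""
    return [
        {"order_id": 1, "amount": 120.0},
        {"order_id": 2, "amount": -5.0},   # bad row, filtered downstream
        {"order_id": 3, "amount": 80.5},
    ]

def transform_orders(rows):
    """Transformer block: keep only valid orders."""
    return [r for r in rows if r["amount"] > 0]

def export_orders(rows):
    """Exporter block: in a real pipeline this would write to a
    lakehouse table; here it just summarizes."""
    return {"exported": len(rows), "total": sum(r["amount"] for r in rows)}

result = export_orders(transform_orders(load_orders()))
```

Because each stage is just a function, blocks stay easy to unit-test and reuse across pipelines.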
Why this matters
Ilum gives you a unified, multi-cluster Spark platform with observability for logs and metrics. Pairing it with Mage’s developer-friendly orchestration lets your teams:
- Ship pipelines faster: Build pipelines as readable Python blocks in a notebook-like UI, then run them reliably on Ilum-managed Spark.
- Handle real-time and batch in one place: Orchestrate Spark batch jobs alongside Kafka-driven streaming workloads, all monitored through Ilum.
- Reprocess safely: Use backfills to rerun historical windows after schema changes or late-arriving data.
- Automate with confidence: Configure cron, event, API, or webhook triggers in the UI or in code; promote from dev to prod with repeatable configs.
- Trust your data: Attach Great Expectations-powered validations to pipeline blocks for consistent, auditable data quality.
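As an example of configuring triggers in code rather than the UI, Mage lets a pipeline declare its triggers in a `triggers.yaml` file kept under version control. The trigger name and exact values below are illustrative:

```yaml
# triggers.yaml (illustrative): a daily time-based trigger for one pipeline
triggers:
  - name: daily_batch
    schedule_type: time
    schedule_interval: "@daily"
    status: active
```

Keeping triggers in a versioned file is what makes the dev-to-prod promotion repeatable.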
What you can do now
- Streaming analytics: Ingest from Kafka with Mage and process on Ilum’s Spark clusters for real-time dashboards and alerts.
- Batch ELT at scale: Use Mage to orchestrate PySpark transformations that land in your lakehouse tables, with run visibility through Ilum's logs and metrics.
- Data quality gates: Add expectations to loader/transformer/exporter blocks so bad data fails fast before it reaches consumers.
- Catch-up processing: Backfill a month of sessions after fixing a UDF; Mage spins up windowed runs against Ilum and tracks results.
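The catch-up scenario above boils down to splitting a historical period into time windows and running each one. A small stdlib-only sketch of that windowing (the function name and daily step are illustrative, not Mage's API):

```python
from datetime import date, timedelta

def backfill_windows(start, end, step_days=1):
    """Yield (window_start, window_end) pairs covering [start, end),
    the way a backfill schedules one run per window."""
    cur = start
    while cur < end:
        nxt = min(cur + timedelta(days=step_days), end)
        yield cur, nxt
        cur = nxt

# A month of daily windows, e.g. after fixing a UDF:
windows = list(backfill_windows(date(2024, 5, 1), date(2024, 6, 1)))
```

Each window then becomes one tracked pipeline run, so a failed day can be retried without reprocessing the whole month.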
Highlights at a glance
- Python-first pipelines with modular blocks
- Batch, integration, and streaming pipeline types
- Triggers (cron, event, API, webhook) and backfills, configurable in the UI or in code
- Reusable Great Expectations test suites for data quality
- Runs cleanly alongside Airflow or Kestra if you’re hybrid today
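To illustrate what a reusable data-quality suite looks like conceptually, here is a plain-Python stand-in. In Mage these checks would be Great Expectations suites attached to loader/transformer/exporter blocks; every name and rule below is illustrative:

```python
# Conceptual sketch of a reusable expectation suite: named row-level
# checks that can be attached to any block's output.

EXPECTATIONS = [
    ("amount_non_negative", lambda r: r["amount"] >= 0),
    ("order_id_present", lambda r: r.get("order_id") is not None),
]

def run_expectations(rows, expectations=EXPECTATIONS):
    """Return a list of (check_name, failing_row_count) for failed checks;
    an empty list means the data passed the suite."""
    failures = []
    for name, check in expectations:
        bad = [r for r in rows if not check(r)]
        if bad:
            failures.append((name, len(bad)))
    return failures

sample = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": None, "amount": -1.0},
]
failures = run_expectations(sample)
```

Running the same suite at every stage is what makes bad data fail fast, before it reaches consumers.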