New Feature
9 days ago

Announcing Mage.ai Integration for Ilum Data Lakehouse

We’re excited to unveil a new integration that makes building and operating data workflows on Ilum even faster: Mage.ai, the open-source, Python-first platform for building, running, and monitoring modern data pipelines.

Mage brings a notebook-style development experience, modular code “blocks,” and first-class support for batch, integration, and real-time streaming pipelines. It also includes native triggers, backfills, and reusable data quality checks (via Great Expectations).

Why this matters
Ilum gives you a unified, multi-cluster Spark platform with observability for logs and metrics. Pairing it with Mage’s developer-friendly orchestration lets your teams:
  • Ship pipelines faster: Build pipelines as readable Python blocks in a notebook-like UI, then run them reliably on Ilum-managed Spark.
  • Handle real-time and batch in one place: Orchestrate Spark batch jobs alongside Kafka-driven streaming workloads, all monitored through Ilum.
  • Reprocess safely: Use backfills to rerun historical windows after schema changes or late-arriving data.
  • Automate with confidence: Configure cron, event, API, or webhook triggers in the UI or in code; promote from dev to prod with repeatable configs.
  • Trust your data: Attach Great Expectations-powered validations to pipeline blocks for consistent, auditable data quality.
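
To make the "readable Python blocks" idea concrete, here is a minimal sketch of a loader, a transformer, and a data quality check written as plain Python functions. It assumes Mage's block decorators can be imported from `mage_ai.data_preparation.decorators` and falls back to no-op stand-ins when Mage is not installed. The function names, sample rows, and the `block_test` alias are hypothetical, and the check is a bare assertion rather than a Great Expectations suite. In a real Mage project each function would normally live in its own block file, with Mage passing one block's output to the next.

```python
# Standalone sketch of Mage-style blocks; not a drop-in Mage project file.
try:
    from mage_ai.data_preparation.decorators import (
        data_loader,
        transformer,
        test as block_test,
    )
except ImportError:
    # Assumption: no-op stand-ins so the sketch also runs without Mage installed.
    def _passthrough(function):
        return function

    data_loader = transformer = block_test = _passthrough


@data_loader
def load_sessions(*args, **kwargs):
    """Hypothetical loader: returns hard-coded rows instead of reading a real source."""
    return [
        {"user_id": "a", "duration_s": 120},
        {"user_id": "b", "duration_s": None},
        {"user_id": "c", "duration_s": 45},
    ]


@transformer
def drop_incomplete(rows, *args, **kwargs):
    """Keep only the rows that have a recorded duration."""
    return [row for row in rows if row["duration_s"] is not None]


@block_test
def check_no_missing_durations(output, *args) -> None:
    """Raise AssertionError if any row still lacks a duration."""
    assert all(row["duration_s"] is not None for row in output), "Missing duration"


if __name__ == "__main__":
    cleaned = drop_incomplete(load_sessions())
    check_no_missing_durations(cleaned)
    print(f"{len(cleaned)} rows passed the check")
```

Run outside Mage, the functions simply chain together. The point of the block layout is that each step stays small enough to read, test, and reuse on its own.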

What you can do now
  • Streaming analytics: Ingest from Kafka with Mage and process on Ilum’s Spark clusters for real-time dashboards and alerts.
  • Batch ELT at scale: Use Mage to orchestrate PySpark transformations that land in your lakehouse tables, with lineage via Ilum’s logs/metrics.
  • Data quality gates: Add expectations to loader/transformer/exporter blocks so bad data fails fast before it reaches consumers.
  • Catch-up processing: Backfill a month of sessions after fixing a UDF; Mage spins up windowed runs against Ilum and tracks results.
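
The "windowed runs" in the catch-up example can be pictured as splitting a historical date range into consecutive intervals and running the pipeline once per interval. The sketch below shows only that splitting step. `backfill_windows` is a hypothetical helper written for illustration, not Mage's internal implementation, and the half-open interval convention is an assumption.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def backfill_windows(
    start: date, end: date, days: int = 1
) -> Iterator[Tuple[date, date]]:
    """Yield half-open [window_start, window_end) intervals covering [start, end)."""
    if days < 1:
        raise ValueError("days must be at least 1")
    step = timedelta(days=days)
    cursor = start
    while cursor < end:
        # Clamp the final window so it never runs past the requested end date.
        window_end = min(cursor + step, end)
        yield cursor, window_end
        cursor = window_end


if __name__ == "__main__":
    # A month of sessions, reprocessed one week at a time.
    for window_start, window_end in backfill_windows(
        date(2024, 1, 1), date(2024, 2, 1), days=7
    ):
        print(f"run pipeline for {window_start} .. {window_end}")
```

Half-open intervals keep adjacent windows from overlapping, so a record stamped exactly on a boundary is processed once rather than twice.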

Highlights at a glance
  • Python-first pipelines with modular blocks
  • Batch, integration, and streaming pipeline types
  • Triggers (cron/events/API/webhooks) and backfills in the UI or in code
  • Reusable Great Expectations test suites for data quality
  • Runs cleanly alongside Airflow or Kestra if you’re hybrid today

https://ilum.cloud/docs/features/mage/
https://www.mage.ai/
