Give feedback and suggest new ideas for Ilum.

SQL Editor: Redesigned UI, Tab Management & Cross-Engine SQL Translation

The SQL Editor has received a major UI/UX overhaul, making it faster to navigate multi-query workflows and easier to move between engines without rewriting SQL.

Tab management

You can now open multiple queries and notebooks as tabs across the top of the editor. Switch between active work, reference queries, and tutorials without losing your place. The left sidebar organizes all saved queries by engine, so you can search, browse, and open anything in one click.

SQL Translate: convert between engines instantly

This is the big one. Hit Translate and Ilum will convert your SQL from one dialect to another, across all three supported engines: DuckDB, Trino, and Spark SQL. Moving a quick prototype from DuckDB to a production Spark job? Translate handles the syntax differences so you don't have to. It works in every direction: DuckDB → Spark, Trino → DuckDB, Spark → Trino, and so on.

What else is new in the editor

Format - auto-format your SQL for readability with one click.
Explain - view the query execution plan before running, useful for catching expensive operations early.
Engine picker - switch between DuckDB, Trino, and Spark SQL per query tab. Different tabs can target different engines simultaneously.
Inline results - query output appears directly below the editor with row counts, execution time, and an Export option. Multiple result sets are tabbed so you can compare runs side by side.

Why it matters

Teams working across engines no longer need to context-switch between tools or manually translate SQL dialects. Write exploratory queries in DuckDB for speed, translate to Spark SQL when the workload scales, and keep everything in one place, same editor, same catalog, same lineage.

Introduced in 6.7.0

New Feature

duckdb-query-profile-ilum.png 89.29 KB

Inline Query Profiling: Column Stats & Data Distribution on Every Result

Run a query, get a full data profile, automatically. The new Profile view turns any query result into an instant statistical breakdown with per-column distributions, completeness checks, and issue detection. No extra queries, no separate profiling tools.

What you see at a glance

Every column in your result set is listed with its data type, value range, a mini distribution chart, and a completeness indicator, all in one scrollable view. A summary bar at the top shows overall completeness, flags columns with potential issues, and breaks down column types (numeric, string, date, etc.) so you can spot problems before they reach downstream.

Drill into any column

Expand a column to get the full picture: non-null and null counts, completeness percentage, distinct values, min/max, mean, median, quartiles (Q1, Q3), IQR, standard deviation, skewness, sum, and an interactive histogram showing value distribution. For categorical columns, you can load top values to see frequency breakdowns.

Why it matters

Data profiling usually means writing dedicated queries or exporting to a separate tool. Now it's built into the query workflow, run your SQL, click Profile, and immediately understand the shape of your data. Catch missing values, skewed distributions, or unexpected ranges before they become pipeline bugs. It works with every engine: DuckDB, Trino, and Spark SQL.

Where to find it

Run any query in the SQL Editor, then switch to the Profile tab in the results panel (next to the table and chart views).

Introduced in 6.7.0

duckdb-query-profile-ilum2.png 130.97 KB

Visual Query Plan & Optimization Hints: Understand and Fix Queries Before They Run

The Explain view now renders your query's execution plan as a fully interactive visual graph, paired with a Plan Insights summary and Optimization Hints that point to specific problems and tell you how to fix them.

Plan Insights: the big picture in seconds

The right-hand panel gives you a structural overview of your query: total operators, tree depth, max fan-out, CTE count, exchanges, filters, aggregates, and columns read. Below that, a Join Types breakdown shows exactly how many of each join strategy the planner chose, hash joins, nested loop joins, cross products, and others, so you can spot expensive patterns immediately.

Interactive plan graph

The full execution plan renders as a zoomable, navigable tree. Each node shows its operator type (scan, filter, projection, window, CTE, aggregate), the expression or columns involved, and the estimated cardinality. Click any node to inspect it. Zoom out to see the full shape of a 100+ operator plan, or zoom in to trace a specific branch.

Optimization Hints: actionable, not generic

This is where it gets practical. Ilum analyzes the plan and surfaces concrete issues with severity indicators:

Duplicate scan detection - flags when a table is scanned multiple times (e.g., 7×) and suggests caching or CTE restructuring to eliminate redundant reads.
Deep plan warning - alerts when the plan tree is unusually deep, recommending you break the query into stages with temporary tables or cached CTEs.
Each hint includes a "Go to node" link that jumps directly to the problematic operator on the graph, so you don't have to hunt through a complex plan manually.

Why it matters

Most SQL users run EXPLAIN and get a wall of text. This turns that into a visual debugging tool with built-in intelligence - you see the plan, understand the shape, and get told what to fix, all in one view. Catch redundant scans, expensive joins, and over-complex plans before they waste compute. Works with DuckDB, Trino, and Spark SQL.

Where to find it

Write your query in the SQL Editor and click Explain. The plan graph opens in the results panel with Plan Insights and Optimization Hints visible on the right.

Introduced in 6.7.0

Query & File Management: Context Menus, Organization, and Quick Actions Everywhere

Managing queries and files in Ilum is now faster and more consistent. Every item across the platform, queries, notebooks, folders, storage files, now has a right-click context menu with the full set of actions available in one place. No more hunting through toolbars or modal dialogs.

Query management

The left sidebar organizes all your saved queries and notebooks with folder grouping (e.g., by engine) and a Pinned section for frequently used items. Right-click any query to Open, Rename, Duplicate, Move to…, Copy to…, Download, or Delete, all from a single menu. Search across all queries from the top of the sidebar.

File browser sidebar

A new compact file tree gives you direct access to your S3 storage hierarchy from inside the SQL Editor. Browse buckets and folders, right-click any file to Download, Copy Path, or Create SQL Query directly from the file, turning a raw data file into a queryable starting point in one click.

File Manager improvements

The full File Manager view now supports Create File alongside existing Upload and Create Folder actions, giving you a way to create files directly in storage. Right-click any file for the complete set of operations: Preview, Edit, Download, Copy Path, Copy Name, Rename, Move to…, Delete — plus data-oriented actions like Create Table from File, Analyze with SQL, and Profile Data. When files are selected, a bottom action bar appears with the most common bulk actions for quick access.

Why it matters

Small workflow friction adds up. Context menus mean fewer clicks, fewer page switches, and a more natural way to work with your data assets. Whether you're reorganizing queries into folders, copying a storage path into a notebook, or profiling a CSV you just uploaded, the action is always one right-click away.

Introduced in 6.7.0

Data Lineage: Richer Dataset Cards, Inline Lineage Preview & Version History

The lineage experience gets a major upgrade: more context visible upfront, version tracking built into every dataset, and a redesigned dataset catalog that lets you assess data assets without clicking into each one.

Dataset catalog with inline schema & lineage preview

The new Datasets view presents every tracked dataset as a card showing its storage location, table format (Delta, Parquet, etc.), and the full schema spec with column names and types, all visible without opening the dataset. Each card also includes a Lineage Preview mini-diagram showing upstream and downstream connections at a glance, so you can see a dataset's position in the pipeline before drilling in. Jump straight to Details or Column Lineage from any card.

Version History: schema, execution, and storage in one timeline

Open any dataset and switch to the Versions tab to see a complete version history. Each version entry shows the event type (CREATE, OVERWRITE, etc.), a timestamp, a Change Log describing what happened, the full Schema at that point in time, and Execution details, including state, duration, and run ID. Storage metadata (format, size) is tracked per version too. You can now answer "what changed, when, and what job caused it" from a single panel.

Lineage graph improvements

The lineage canvas itself gets more controls for navigating complex pipelines: Smart Job Clustering groups related jobs to reduce visual noise (with configurable min/max thresholds), Merge Edges and Orthogonal Edges toggles clean up the layout, and Depth controls let you expand or collapse how many hops are visible. Switch between Standard Pipeline and Smart layouts depending on the complexity of the graph. Layer badges (Bronze, Silver, Gold) and field-level schema remain visible on every node.

Why it matters

Lineage is only useful if you can navigate it quickly. The dataset cards eliminate the "click into everything to understand anything" problem. Version history closes the loop between lineage and governance, you know not just where data flows, but how it changed over time and which job execution caused each change. Together, these improvements make lineage a practical daily tool rather than a diagram you look at during audits.

Introduced in 6.7.0

New Feature

Edit Files Directly in Storage: From File Manager and Jupyter Notebooks

You can now edit files in place on your object storage - both from Ilum's built-in File Manager and from Jupyter Notebooks using our new open-source extension. No more downloading, editing locally, and re-uploading.

Edit in File Manager

Select any text-based file (JSON, CSV, SQL, config files, etc.) in the File Manager and open it in a full code editor modal with syntax highlighting, line numbers, and code folding. The editor shows file metadata (size, last modified, editing state) at the top, and you can Save changes directly back to storage or Cancel to discard. The file stays in place, same path, same bucket, with the modification timestamp updated automatically.

Edit in Jupyter Notebooks

The same storage files are now accessible from within Jupyter through our open-source JupyterLab Bucket Explorer extension. A file tree in the left sidebar shows your S3 buckets and folders with file metadata (name, path, writable status) on hover. Open any file as a tab in JupyterLab's editor alongside your notebooks, edit, save, and keep working without switching back to the Ilum UI. The extension connects to the same storage backend, so changes are immediately visible across both interfaces.

Open source: github.com/ilum-cloud/jupyterlab-bucket-explorer

Our JupyterLab extension is fully open source. Use it standalone with any S3-compatible storage, or let Ilum configure it automatically when Jupyter is enabled as a module.

Why it matters

Small edits to config files, schema definitions, or seed data shouldn't require a download-edit-upload cycle. Whether you're fixing a JSON schema in the File Manager or tweaking an OpenLineage spec alongside a notebook in Jupyter, the file is editable right where you found it.

Introduced in 6.7.0

file-edit-ilum-jupyter-notebook.png 225.1 KB

New Feature

DuckDB SQL Engine Lightweight Analytics with Full Lineage & Catalog Integration

Ilum now supports DuckDB as a third SQL engine alongside Spark and Trino, giving you a fast, in-process analytics engine that's fully wired into the platform, not a standalone toy.

What makes this different from standalone DuckDB

DuckDB in Ilum isn't just an embedded database running in isolation. Every query you run is tracked through data lineage, connected to the internal data catalog, and backed by the new DuckLake catalog (created automatically when you enable the module). Nothing lives only in memory, your tables, schemas, and metadata are persisted and visible across the platform, just like Spark and Trino workloads.

What you can do

Run ad-hoc analytical queries with sub-second response times - aggregations, joins, window functions - without spinning up a cluster.
Query existing tables from Ilum's data catalog directly. No need to re-register or recreate anything - if it's in the catalog, DuckDB can read it.
Track every DuckDB operation in data lineage, the same way Spark and Trino jobs are tracked. Full visibility into what was read, written, and transformed.
Use DuckDB as a query engine in Apache Superset and other BI tools connected to Ilum, dramatically speeding up dashboard queries on catalog data.
Work with the DuckLake catalog out of the box - it's provisioned automatically and integrated with the platform's metadata layer.

Three engines, one platform

With this release, Ilum offers a clear engine strategy: DuckDB for lightweight, interactive workloads, Trino for federated and medium-scale queries, Spark for heavy batch processing and ETL.

Pick the right tool per query, all from the same SQL editor, same catalog, same lineage graph. In upcoming releases, DuckDB will become the default engine for smaller workloads, making everyday data exploration even faster.

Where to find it

Open the SQL Editor, select DuckDB from the engine picker, and start querying. Your existing catalog tables are already available.

Introduced in 6.7.0

New Feature

File Analytics (from preview → quick checks → generated SQL)

You can now run lightweight analytics on a file directly from the preview modal and promote it to a table without leaving ILUM.

What you get

Create Table (generated SQL):
- Infers schema and sample values.
- Generates CREATE TABLE + INSERT … SELECT SQL (Delta/Iceberg/Hudi or plain Parquet/CSV).
- One click to Open in SQL Editor for execution and scheduling.
Analyze Data:
- Quick row count and column sniffing to confirm delimiter, header, and basic types.
- Shows the exact SQL used so results are reproducible.
Data Profile:
- Column-level stats: distincts, nulls, min/max, quartiles/IQR, skewness/kurtosis, top values, simple histograms.
- All profiling queries are visible as SQL and can be copied or opened in the editor.
Quality Checks:
- Summary dashboard (issues by type/severity, completeness by column, type distribution).
- Flags typical problems (missing values, mixed types, constant columns, weak IDs) and surfaces suggestions.
- Outputs the SQL used for checks so you can version it alongside your pipelines.
Spark Optimizations:
- Contextual hints and config snippets for faster loads (e.g., repartition/coalesce, AQE).

How to use

Open File Manager → select a file → Preview.
Use the top tabs: Create Table, Analyze Data, Data Profile, Quality Checks.
When ready, click Open in SQL Editor to run or schedule the generated statements.

Operational notes

Profiling/quality checks default to sampling to keep runs fast.

Available in version: 6.6.0

Improved Feature

File Manager (uploads, preview, profiling, “create table”)

The File Manager now covers the day-to-day work you do with objects without leaving ILUM. You can browse buckets and folders with clear breadcrumbs, quickly filter or sort by name, size, or modified time, and create new folders where you have write access. Uploads support multiple files at once with visible progress, and downloads work for single files or multi-select from the action bar.

Browse & context
- Clear breadcrumbs for cluster → storage → bucket → folder.
- Filter & Sort by name/extension, size, or modified time; quick search box.
Upload
- Multi-file upload from the toolbar (Upload).
- Progress indicator per file, safe server-side size checks.
- Works in any bucket/folder you have write access to.
Create folder
- Create new folders anywhere you have permissions (Create Folder).
Preview (no download needed)
- Inline Preview badges next to files; opens a modal viewer.
- View CSV / JSON / XML / Text content directly in the UI. Parquet/Delta support in next release.
- Streamed, size-limited preview to avoid browser timeouts, shows basic file metadata (size, modified, path).
Download
- Download single files from the list or from the preview modal.
- Bulk download from the bottom action bar when multiple items are selected.
Copy path / name
- One click to copy the fully qualified object path (for notebooks, SQL, Airflow, NiFi).
- Copy just the file name when you need it for scripts or configs.
Delete
- Delete single or multiple items via the bottom action bar.
- Requires write permissions, confirmation dialog to prevent mistakes.
Bulk actions bar
- When you select files, a bottom bar appears with: Download, Preview, Copy Path, Copy Name, Delete.
- Works across mixed selections (where the action applies).
Create Table (from file)
- From the preview modal, Create Table to register a file or folder in a chosen catalog (Hive or Nessie) and table format (e.g., Iceberg/Delta/Hudi).
- Useful for quick data onboarding without leaving the File Manager.
Permissions & safety
- All actions honor ILUM RBAC and underlying storage ACLs.
- Preview is read-only, destructive operations (delete) are gated by permissions and confirmation.

Available in version: 6.6.0

New Feature

Apache NiFi module (managed data flows)

A managed Apache NiFi runtime inside ILUM for building and operating data flows (ingest, route, transform) with NiFi’s visual canvas. Includes cluster orchestration on Kubernetes, multi-user access, and optional NiFi Registry for versioned flows.

What you can do
- Build ingestion pipelines (files, HTTP, SFTP, JDBC, Kafka, MQTT, etc.) with back-pressure, retries, and dead-letter queues.
- Parameterize flows with Parameter Contexts; promote the same flow across dev → test → prod.
- Use NiFi Registry for versioning and rollbacks of flow definitions.
- Emit operational metrics to ILUM/Prometheus; view logs and bulletins in one place.
- Land data to HDFS/S3/MinIO/Ceph; hand off to Spark/SQL jobs for table writes (Iceberg/Delta/Hudi).
- (Optional) Capture read/write events from NiFi processors for ILUM lineage (per your processors/targets).
Why it helps
- Standardizes edge and bulk ingestion without writing glue code.
- Makes operational controls (retry, rate limit, back-pressure) a first-class part of pipelines.
- Shortens time to wire up sources → landing zones → curated tables.
Compatibility
- NiFi 2.x with optional NiFi Registry.
- Works with ILUM storage backends (HDFS / object storage).
- For lakehouse table formats, use the pattern: NiFi → landing zone → Spark job → Iceberg/Delta/Hudi.
How to start
- Enable the module (Helm): --set nifi.enabled=true (adjust values for sizing, TLS, auth).
- (Recommended) Deploy NiFi Registry and connect it to the cluster.
- Set up Parameter Contexts, controller services (JDBC, SSLContext), and secrets.
- Point outputs to your landing buckets/paths, schedule downstream Spark/SQL jobs from ILUM.
Operational notes
- NiFi state and Registry need to be included in backup/DR, align RPO/RTO with your data zones.
- Configure back-pressure thresholds and bulletin alerts to avoid overload.
- Multi-tenant RBAC via NiFi/SSO, restrict controller services by environment.

https://ilum.cloud/docs/features/nifi/
https://nifi.apache.org/

Available in version: 6.6.0

New Feature

Project Nessie catalog (versioned, Git-like data catalog)

A new, versioned data catalog you can use instead of (or alongside) Hive Metastore. Nessie adds branches/tags and Git-like operations for tables, so you can isolate changes, test safely, and roll back if needed.

What you can do
- Create branches (e.g., feature_x, qa) to develop pipelines without touching main.
- Switch active branch per workspace/job and run SQL against that branch.
- Merge a branch back to main once validated, tag important points for reproducibility.
- Use it directly from the UI (branch selector/management) and from the SQL editor (run queries against the selected branch).
- Keep Hive Metastore for legacy jobs while moving new/changed tables to Nessie incrementally.
Why it helps
- Safe, zero-copy experimentation on datasets.
- Repeatable runs (pin to a tag) and quick rollback if a job/regression slips through.
- Cleaner dev→test→prod promotion with explicit merges instead of ad-hoc table swaps.
Compatibility
- Recommended with Iceberg tables, other formats depend on engine support.
- Can run side-by-side with Hive Metastore, choose catalog per job/workspace.
How to start
- Enable the Nessie module and add a Nessie catalog entry in ILUM.
- In the UI, pick your active branch before running SQL or jobs.
- Update orchestrated jobs to reference the intended catalog/branch.
Operational notes
- Lineage & versions record the catalog + branch for every read/write.
- Include the Nessie metadata store in your backup/DR plan (same RPO/RTO targets as your tables).
- If you switch the default catalog from Hive to Nessie, review jobs that assume hive paths/catalog names.

https://ilum.cloud/docs/features/catalogs/nessie
https://projectnessie.org/

Available in version: 6.6.0

New Feature

Dataset Versioning & Schema Diff (Hive/Delta)

ILUM now tracks version history for datasets and shows field-level diffs so you can audit changes and catch breaking schema updates quickly.

What’s new

Version history timeline with event labels (e.g., CREATE, DROP) and timestamps.
Schema change summary per version: counts of additions / removals / modifications.
Field-level diff chips (e.g., FIELD REMOVED product_id (long), FIELD ADDED price (long)).
Format & engine tags surfaced with each version (e.g., PARQUET, DELTA).
Compare action to diff any two versions side-by-side.
Works for tables registered in the Hive Metastore (including Delta Lake tables).

Why it’s useful

Immediate visibility into breaking changes before downstream jobs run.
Clean audit trail for governance reviews and incident retros.
Faster root-cause analysis when KPIs or pipelines regress after a change.

Where to find it

Open a dataset → Versions tab.

Improved Feature

Data Lineage Search (find datasets, columns, and jobs instantly)

Navigating big lineage graphs is painful. We’ve added a search bar to Lineage (and unified it with Jobs / Datasets views) so you can jump straight to what you need.

What’s new

Global lineage search, type a table/dataset name, column (e.g., AccountID), job name/ID, or storage path (e.g., s3://ilum-data/...) and we’ll locate and highlight the node(s) on the lineage graph.
Namespace aware - toggle Search all namespaces or limit to the current namespace; results respect your scope.
Quick actions - from results: View Details, Open SQL, or Jump to Job.
Fresh index - we index the Hive Metastore, column metadata, and lineage edges after each successful run or schema change, so search stays current.

Where: open any dataset’s Lineage tab (search at the top).

Improved Feature

Improved Spark Job Statistics

We’ve rebuilt the Job Statistics view in ILUM to expose the key Apache Spark runtime metrics, useful for debugging, capacity planning, and post-run reviews on Kubernetes.

What’s new

Top-line job status
Completion %, total tasks, active executors, and allocated memory, with manual Refresh.
Resource gauges (driver/executor/total)
- Total: memory utilization vs. cluster allocation (e.g., 14.78% of 24 GB), total cores, total executors.
- Driver: memory utilization (e.g., 4.44% of 12 GB), driver cores.
- Executors: memory utilization (e.g., 25.13% of 12 GB), memory/cores per executor, active/dead executor count, and aggregate cores/memory.
Task Health Monitor
Clear outcome summary with counts for Completed / Failed / Skipped tasks (e.g., 108 completed, 98 skipped (optimized)) and visual split. Includes Avg Task Time and Total Task Time for quick SLA checks.
Shuffle operations panel
Read/Write records and bytes with a simple chart and a read↔write ratio indicator-handy for identifying skew and unnecessary shuffle I/O.

Why it helps

Fast detection of over-allocation (low memory/CPU utilization vs. requested resources).
Immediate visibility into optimizer effects (e.g., many skipped tasks/stages).
Quick sizing and tuning decisions without scraping Spark UI pages.

Where to find it

Job → Overview.

New Feature

Smart Performance Insights for Apache Spark

ILUM now surfaces actionable run-time insights for Spark jobs, so you can spot regressions, memory pressure, and planner optimizations without digging through logs.

What’s new

Job health summary
- Job Running Efficiently when no critical issues are detected.
- Slow Job Detected when a run is N× slower than its rolling baseline (e.g., 4.0× longer than average), with a pointer to investigate data skew, resource constraints, or inefficient operations.
Stage-level memory hotspots
Flags memory-intensive stages (e.g., “4 stages used significant memory; peak 1640 MB in stage 73”) so you can tune executor memory or operators before spills/OOM.
Planner/Optimizer signal
Stage Optimization Detected when Spark (e.g., AQE) skips stages or coalesces partitions (“42 stages skipped”)—evidence that your query plan is being optimized.
Inline recommendations
Each finding ships with a short, concrete recommendation (monitor, check skew, consider memory tuning). No fluff.

How it works (brief)

Builds a per-job baseline from recent runs and raises deviations using robust statistics (not single-run variance).
Reads stage/task metrics to locate peaks (memory, CPU time) and to count skipped/coalesced stages from the optimizer.
Keeps noise low with thresholds and minimum observation windows.

Where to find it

Job → Analytics (appears after the first complete run with metrics).

Why it helps

Faster triage of Spark performance regressions (latency spikes vs. baseline).
Early warning on memory-intensive stages that precede spills/OOM.
Confirmation that Adaptive Query Execution is producing real wins (skipped stages), not just toggled on.

New Feature

Savings Opportunities - right-sizing CPU & memory for Spark on Kubernetes

ILUM now recommends vCPU and memory reductions per job based on real utilization, with clear cost impact estimates.

What’s new

Savings Opportunities panel
Detects under-utilization (CPU %, memory %) and proposes an optimal resource set (vCPUs, memory, executors). Shows the exact delta, e.g. “reduce from 12 cores / 24 GB → 10 cores / 7 GB.”
Configuration Comparison
Side-by-side Current → Optimal for:
- vCPUs (per job or preset)
- Memory (executor/container)
- Executors (kept constant when parallelism requires it)
Cost impact math
Displays hourly and projected monthly savings from your rate card; highlights the driver (CPU vs memory). Uses observed runtime + price inputs; all figures are reproducible.
Actionable output
- Copy the exact settings (Spark/K8s)
- Apply to virtual-cluster preset or open a change PR for the pipeline
- Notes include headroom assumptions and the utilization window used

How it decides

Aggregates recent runs’ CPU active time, memory peaks, and task parallelism.
Picks the smallest config that avoids spill/GC pressure and preserves recorded concurrency.
Enforces guardrails (min cores/memory, executor floor) and won’t downsize below observed peaks.

Where to find it

Job → Analytics → Savings Opportunities

Notes

Estimates depend on your price model (per-vCPU/per-GB). Update rates to keep projections accurate.

Savings Opportunities in spark.png 59.76 KB

New Feature

Resource Efficiency & Config Recommendations for Apache Spark on Kubernetes

We’ve added a new optimization suite in ILUM that converts runtime telemetry into actionable Spark configuration, useful for data engineering teams tuning Spark on Kubernetes for performance and cost.

What’s new

Resource Efficiency Score (0–100)
Radar across Memory / Executor / I/O / Tasks with per-dimension panels (e.g., memory utilization %, executor active %, I/O data-processing ratio, task balance).
Live cost signals
- Accumulated Cost (runtime × cluster rate) with wall-clock duration.
- Cost Efficiency score with waste breakdown by Memory / CPU / Executors to spot over-provisioning.
Configuration Recommendations (modes: Balanced, Performance, Cost)
Each recommendation includes a rationale, expected impact (e.g., -15% execution time), and confidence. One-click copy of the exact Spark config.

Example recommendations (auto-derived from workload patterns)

# Adaptive Query Execution
spark.sql.adaptive.enabled = true

# Adaptive partition coalescing
spark.sql.adaptive.coalescePartitions.enabled = true

# Executor memory sizing (value depends on observed usage)
spark.executor.memory = 5g

Why it’s suggested: We correlate stage metrics, skew, shuffle sizes, and executor idleness to pick safe defaults. AQE reduces expensive shuffles; coalescing prevents tiny partitions; memory sizing balances spill risk vs. idle RAM.

Where to find it

Jobs → Optimization tab: left side shows Resource/Cost scores; right side lists recommendations with rationale and impact estimates.

Why this helps (practical outcomes)

Faster iteration on Spark tuning without trawling logs.
Concrete levers for cost optimization (reduce idle executors, right-size memory, coalesce partitions).
Transparent reasoning you can review and commit via your normal workflow.

Resource Efficiency & Config Recommendations.png 238.35 KB

New Feature

9 months ago

Announcing the ERD View in Ilum

Meet ERD View, a brand-new way to model your lakehouse in Ilum. It gives teams a clean, interactive entity-relationship diagram with live operational context, so you can design schemas, reason about joins, and communicate contracts without leaving the platform.

What’s new (based on the example above)

Layer-aware modeling (Bronze → Silver → Gold)
Tables carry clear layer badges, so you can see how models progress across refinement stages.
Keys, relations, and cardinality
Primary/foreign keys are visualized on each table, and connectors display 1:N (and other) relationships right on the canvas.
Field-level schema at a glance
Each table shows columns with data types (e.g., id: long → integer, birth_date: string → date) to make type standardization and evolution obvious.
Operational context in-place
Every node includes freshness and status (e.g., OPERATIONAL, “1 hour ago”), helping you validate that your model reflects what’s actually running.
Crystal-clear joins
Relationship anchors snap to the exact key fields, reducing guesswork when authoring queries.
Zoomable, uncluttered canvas
Smooth pan/zoom and tidy connectors make larger models easy to navigate.
One-click Lineage toggle
Switch to Lineage view to follow the same entities through jobs and runs, design in ERD, verify in Lineage.

Improved Feature

9 months ago

Announcing Next-Gen Data Lineage in Ilum

We’ve shipped a major upgrade to Ilum’s lineage experience, purpose-built to help data teams see how data moves across bronze → silver → gold layers, understand job health at a glance, and trace the blast radius of any change in seconds.

What’s new

Smart Job Clustering
No more spaghetti graphs. Ilum automatically groups similar jobs into compact clusters, so complex pipelines stay readable while preserving drill-down to the underlying runs.
Layer-aware lineage (Bronze / Silver / Gold)
Instantly track how entities evolve across refinement layers. Each table card shows key fields and types, with clear upstream/downstream edges.
Operational overlays on the graph
Every job node now surfaces last run, avg duration, and success rate right on the canvas—perfect for spotting hot paths and bottlenecks during incident review.
Version-aware datasets
Open any dataset and jump to the Versions tab to see schema changes and lifecycle events (e.g., OVERWRITE) before they surprise downstream consumers.
ERD ↔ Lineage toggle
Switch between an entity-relationship view for model design and a lineage view for runtime flow—two perspectives, one source of truth.
Faster navigation
Mini-map, zoom/pan, multi-select, and clean badges (e.g., OPERATIONAL) make large graphs effortless to explore.

New Feature

9 months ago

Announcing Mage.ai Integration for Ilum Data Lakehouse

We’re excited to unveil a new integration that makes building and operating data workflows on Ilum even faster.
Mage.ai - the open-source, Python-first platform for building, running, and monitoring modern data pipelines.

Mage brings a notebook-style development experience, modular code “blocks,” and first-class support for batch, integration, and real-time streaming pipelines. It also includes native triggers, backfills, and reusable data quality checks (via Great Expectations).

Why this matters

Ilum gives you a unified, multi-cluster Spark platform with observability for logs and metrics. Pairing it with Mage’s developer-friendly orchestration lets your teams:

Ship pipelines faster: Build pipelines as readable Python blocks in a notebook-like UI, then run them reliably on Ilum-managed Spark.
Handle real-time and batch in one place: Orchestrate Spark batch jobs alongside Kafka-driven streaming workloads, all monitored through Ilum.
Reprocess safely: Use backfills to rerun historical windows after schema changes or late-arriving data.
Automate with confidence: Configure cron, event, API, or webhook triggers in UI or code; promote from dev to prod with repeatable configs.
Trust your data: Attach Great Expectations-powered validations to pipeline blocks for consistent, auditable data quality.

What you can do now

Streaming analytics: Ingest from Kafka with Mage and process on Ilum’s Spark clusters for real-time dashboards and alerts.
Batch ELT at scale: Use Mage to orchestrate PySpark transformations that land in your lakehouse tables, with lineage via Ilum’s logs/metrics.
Data quality gates: Add expectations to loader/transformer/exporter blocks so bad data fails fast before it reaches consumers.
Catch-up processing: Backfill a month of sessions after fixing a UDF; Mage spins up windowed runs against Ilum and tracks results.

Highlights at a glance

Python-first pipelines with modular blocks
Batch, integration, and streaming pipeline types
Triggers (cron/events/API/webhooks) & backfills in UI or code
Reusable Great Expectations test suites for data quality
Runs cleanly alongside Airflow or Kestra if you’re hybrid today

https://ilum.cloud/docs/features/mage/
https://www.mage.ai/

New Feature

11 months ago

Announcing Kestra Integration in Ilum Data Lakehouse Platform

We're thrilled to announce a transformative upgrade to Ilum Data Lakehouse, the integration of Kestra, the modern, open-source workflow orchestration platform! 🎉 This powerful integration significantly enhances your data management capabilities, driving greater efficiency, automation, and ease of use.

Ilum already supports workflow scheduling with Apache Airflow, but integrating Kestra introduces a dynamic new approach to handling complex data pipelines. Kestra simplifies orchestration through its declarative syntax, real-time workflow monitoring, native event-driven triggers, and comprehensive data lineage tracking capabilities.

Benefits of Kestra Integration with Ilum:

Simplified management of complex data pipelines
Enhanced real-time workflow monitoring and alerting
Efficient handling of event-driven, high-frequency workflows
Improved data lineage visibility and traceability
Seamless multi-cluster workflow management

The Kestra integration perfectly complements Ilum’s advanced architecture and operational framework, offering users a robust alternative for efficient and scalable workflow automation. Optimize your data workflows, enhance operational control, and streamline your processes with Ilum and Kestra.

Discover the full potential of the Ilum-Kestra integration and revolutionize your workflow orchestration capabilities. Learn more at Kestra.io and stay tuned for continuous updates as Ilum evolves to meet your enterprise data management needs!

https://ilum.cloud/docs/features/kestra/
https://kestra.io/

New Feature

Say Hello to n8n in Ilum! 🤖

We’re excited to announce that n8n, the powerful automation tool, is now available as a module in Ilum!

With n8n, you can build custom workflows, automate data tasks, and connect external services, all through a simple, visual interface right inside the Ilum platform.

What You Can Do with n8n in Ilum:

⚙️ Automate Data Pipelines – Trigger Spark jobs, move data between systems, or schedule SQL queries without writing scripts.
🔌 Integrate with Anything – Connect Ilum to external APIs, messaging systems, cloud storage, Slack, Git, and more.
🧠 Build Logic Visually – Drag, drop, and configure nodes to create advanced automation workflows with ease.
📅 Schedule and Monitor – Set up workflows to run on cron schedules, webhook triggers, or conditionally based on data states.

This new module brings no-code/low-code automation to your data platform, giving teams the power to create dynamic, event-driven pipelines and integrations in minutes.

Explore the n8n module today and bring your workflows to life! 🚀

New Feature

Trino SQL Support in Ilum! 🔥

We’ve added Trino SQL engine support to Ilum’s SQL interface, giving you even more flexibility and power when querying your data lakehouse.

From now on, you can use both Apache Spark SQL and Trino SQL, even within the same SQL notebook, making it easier to choose the right engine for the job without switching tools.

Key Highlights:

🧠 Dual SQL Engine Support – Mix and match Spark SQL and Trino SQL inside a single workspace.
⚡ Faster Interactive Queries – Use Trino for low-latency SQL queries over your distributed data sources.
🔌 Same UI, More Power – No need to learn a new interface. Trino is fully integrated into Ilum’s existing SQL editor.
📁 Query External Data – Trino makes it easy to connect to external sources like Hive, Iceberg, S3, and more.
🧪 Flexible Execution – Choose the best engine per query or per cell, depending on performance or data access needs.

This feature unlocks new possibilities for real-time exploration, federated querying, and fine-tuned performance optimization—right inside Ilum.
Try it today and experience multi-engine querying done right! 🚀

Improved Feature

Improved SQL Editor with Selective Query Execution! 🎯

We’ve upgraded the Ilum SQL Editor with a powerful new feature: select and run specific queries directly from a single view—no need to execute the entire notebook or clean up your workspace first.

Key Enhancements:

🧩 Query Picker – Quickly choose which query to execute from a list of all defined queries in your notebook.
🎯 Focused Execution – Run only the SQL block you need, saving time and avoiding unnecessary data scans.
🧭 Better Navigation – Easily jump between queries and manage your workflow more efficiently.
💡 Cleaner Debugging – Isolate and test specific parts of your SQL logic without affecting the rest of the notebook.

This improvement brings more control and flexibility to your data exploration and debugging workflows. Whether you're refining a single query or testing logic step-by-step, Ilum makes it faster and easier.

Give it a try and streamline your SQL journey! 🚀

New Release

Version 6.3.1 Live ! 🎉😊

This release focuses on quality-of-life improvements, platform integrations, and powerful new features to help you work smarter with Apache Spark and your data lakehouse stack.

🚀 What’s New

✅ Trino now in AIO Stack

You asked, we listened, Trino is now included in the default All-in-One (AIO) Ilum deployment. Run blazing-fast distributed SQL queries across your data lake right out of the box.

✅ New Spark Session Form for Jupyter (Sparkmagic)

No more hand-crafted JSON. Create Spark sessions in Jupyter using a streamlined form:

Select cluster
Define memory & dynamic allocation
Add Python packages
Enable auto-pause on idle
This is our most user-friendly Spark session setup yet—and we’ll continue improving it!

✅ Markdown Support in SQL Editor

Turn your SQL editor into a lightweight data notebook. Add headers, notes, and explanations with Markdown for clear, documented analysis in-line with your queries.

✅ Resource Quotas & Limits for Spark Jobs

Gain fine-grained control over Spark resources:

Set CPU/memory limits per job
Apply namespace-level quotas
Prevent overuse and ensure team fairness across shared clusters

✅ Grafana in UI Modules

Monitor your workloads in real time. Grafana dashboards are now integrated into Ilum’s UI, providing insights into Spark usage, resource consumption, and system health.

✅ Extra Buckets for Spark Storage

Now you can define additional storage buckets to separate workloads, support multi-tenant setups, or expand available storage for Spark jobs.

🛠️ Under the Hood

📦 MinIO Chart Updated — Now using ^15.x version for better performance and Helm compatibility
🧪 Jupyter SessionConfig Adjustments — Minor tweaks to improve startup behavior and flexibility

This is a minor release, but it brings major improvements to everyday workflows. We’re continuously evolving Ilum to support modern data teams—from infrastructure to analysis.

Thanks for being part of the journey. More coming soon! 🚀💡

New Feature

New Spark Session Creation Form for Jupyter Notebooks! 🚀

We’ve completely redesigned the Spark session creation form for Jupyter Notebooks (Sparkmagic), making it faster and easier than ever to configure your sessions without dealing with raw JSON.

Key Improvements:

📝 Simplified Configuration – Set all session parameters through an intuitive form—no more manual JSON editing!
⏸ Auto-Pause Checkbox – Enable automatic session pausing during idle time to optimize resource usage.
🔄 Cluster Selection – Choose which cluster your Spark session should run on, right from the form.
⚙️ Memory & Dynamic Allocation – Fine-tune resource settings, including memory limits and dynamic allocation.
🐍 Python Extra Packages – Specify dependencies to install before your Spark job starts.

This is just the beginning! 🎉 We’ll be enhancing the form with every release, making Spark session management in Jupyter even more smooth.

spark magic new session form.png 59.03 KB

New Feature

Markdown Support in Ilum SQL Section is Here! 🚀

We’ve enhanced Ilum’s SQL section with full Markdown support, transforming it into a fully capable notebook solution and a powerful alternative to our Jupyter Notebooks module.

Now, you can mix SQL queries with Markdown annotations, making it easier to document insights, explain query logic, and create interactive data reports, all within Ilum.

Key Improvements:

📝 Markdown Integration – Add rich text, headings, lists, and code snippets alongside your SQL queries.
📊 Notebook-Style Experience – Use Ilum SQL as a structured, interactive workspace for data exploration.
🔍 Better Documentation – Enhance query readability with inline comments, explanations, and formatted outputs.
⚡ Alternative to Jupyter – Enjoy a lightweight, SQL-native notebook without switching environments.

ilum spark markdown support.png 188.15 KB

New Feature

Introducing Resource Quotas & Limits for Spark Jobs! 🚀

We’ve added Kubernetes Resource Quotas, CPU Limit Range, and Memory Limit Range to Ilum, giving you granular control over resource allocation within your cluster or namespace. Now, you can isolate workloads, prevent resource overuse, and ensure fair distribution across teams.

With this update, you can set per-job CPU and memory limits, so no more worrying about a single user consuming 1TB of RAM and starving other workloads. Instead, each team can have its own dedicated and controlled resource space, improving stability, efficiency, and cost management.

Key Improvements:

Namespace-Level Resource Quotas – Define team-specific resource limits to ensure fair usage across the cluster.
CPU & Memory Limit Ranges – Set per-job CPU and memory caps to prevent a single workload from overwhelming the system.
Isolated Workspaces – Assign dedicated resource pools per team, avoiding conflicts and ensuring predictable performance.
Better Cost Control & Stability – Keep Spark workloads optimized and prevent unexpected cluster slowdowns.

This enhancement brings more flexibility and governance to your Ilum-powered Spark environment, allowing you to manage big data workloads with confidence. Try it out today and experience smarter, more efficient resource management! 🔥

New Release

Version 6.3.0 Live ! 🎉😊

Ilum 6.3.0 is here, a robust update designed to supercharge your data workflows and elevate your Spark experience. This release brings a host of new features, performance enhancements, and essential integrations that empower advanced data processing and governance for Spark users.

New Features in This Release

1. Introducing Gitea Integration: Enhancing Version Control Inside Ilum
We’re excited to announce that Gitea—a lightweight, self-hosted Git repository manager—is now built into Ilum. Manage version control for Jupyter notebooks, Spark job scripts, Airflow DAGs, and more, all without leaving the platform. This seamless integration simplifies collaboration for data engineers, data scientists, and DevOps teams.

2. Internal Security System Improved
Our new internal security system strengthens authorization and authentication. User data is securely stored in a dedicated database, with Role-Based Access Control (RBAC) enabling you to organize users into roles and groups, ensuring every action is properly permissioned.

3. Enhanced SQL UI with Notebook-Style Execution
Experience a more interactive SQL Editor that now supports notebook-style operations. Write and execute SQL queries in structured cells—just like in Jupyter Notebook—so you can explore and analyze big data more efficiently.

Other Improvements

Changed ilum-ui Service Type:
Due to issues with kubectl port-forward, we now expose a NodePort by default.
Security Configuration Enhancements:
Security-related settings in ilum-core have been moved from the config map to a dedicated Kubernetes Secret, enhancing the protection of sensitive data.
Ilum Submit Configurations for Spark SQL Engines:
Launch Spark SQL engines via the Ilum Web Application or JDBC endpoint with automatic configuration application from the selected cluster—eliminating manual configuration steps.
Embedded Git Repository:
With Gitea integrated as a module, Ilum now provides a built-in Git server for seamless version control.
Kafka Address Configuration for ilum-core:
Added the ability to set a dedicated Kafka address for the ilum-core pod, separate from the global Kafka configuration for Spark jobs.

Happy Data Processing! 🚀✨

Improved Feature

Enhanced SQL UI with Notebook-Style Execution

We've upgraded Ilum’s SQL Editor to support notebook-style operations, making it easier than ever to explore, query, and analyze big data at scale. Starting now, you can write & execute SQL in structured cells, just like in Jupyter Notebook, instead of running your Apache Spark SQL queries one by one. This enables you to run SQL or SQLite queries similar to a Jupyter Notebook. So instead of running Apache Spark SQL queries one by one, you can write and execute SQL in structured cells.

Key Improvements:
- Notebook-style SQL execution – Run and organize queries in cells for a more interactive SQL analytics experience.
- Persistent query history – Save and revisit SQL jobs for improved workflow management.
- Multi-cell execution – Execute individual cells or entire SQL notebooks for fast, scalable data exploration.
- Integrated with BI & Data Platforms – Query data stored in cloud data lakes, data warehouses, and on-prem storage via JDBC integration.

With this update, it is easier to query SQL inside Ilum and manage data lakehouse, business intelligence and big data analysis. Check it out today and meet the better way to manage your structured and semi-structured data! 🚀

New Feature

Internal Security System Improved

Simply said, this version includes a new internal security system that aims to give Ilum better authorization and authentication. This feature keeps all user data within a dedicated database and servers make use of Role-Based Access Control (RBAC) which makes sure every action made on the server is permissioned. You can now organize Users into Roles and Groups which allow you to consistently assign permissions to them across the platform.

The new framework provides default roles such as ADMIN, USER, DATA_ENGINEER and DATA_SCIENTIST to define defaults for common users. Each role has its own set of permissions that allow users to perform jobs and run services. Access is also provided to various modules that include File Explorer, Jupyter/ Zeppelin Notebooks, History Server, etc. Admin can also see the logs which capture all the activity done in the app with details.

https://ilum.cloud/docs/security/internal_security/

New Feature

Introducing Gitea Integration: Enhancing Version Control Inside Ilum

We're excited to share that we've added Gitea right inside Ilum. Gitea is a lightweight and self-hosted Git repository manager. Our new module lets teams work together freely on projects with Jupyter notebooks, Spark job scripts, Airflow DAGs and more!

By adding Gitea into Ilum means you can now keep the version control of all your data and code artifacts within Ilum itself without relying on any other Git services. Consequently, it provides easy workflows for Data Engineers, Data Scientists and DevOps staff since they can now conveniently execute their work without having to switch contexts.

Utilizing GitOps principles allows you to automate your deployments, enforce approvals and manage your infra as code. With these capabilities, you can easily monitor updates, commit changes, and revert versions quickly, ensuring reproducibility of your experiments and smooth collaboration with your team.

Get started with this integration that adds more security and compliance and enhances your workflows in Ilum. Discover how Gitea integration allows you to track updates, commit changes, and revert to older versions quickly for reproducibility of experiments.

New Release

Version 6.2.1 Live ! 🎉😊

Ilum 6.2.1 is a robust update that will supercharge your data workflows and enhance your experience with Spark.

We’re excited to bring you one of our strongest releases to date, with new features, performance enhancements, and essential integrations that enable advanced data processing and governance for Spark users.

SQL API/UI Section: Making Data Easier with Ilum.
Initiate a New Era of Querying Data! Execute SQL queries directly on your cluster! Full SQL dialect support allows you to work natively with any modern open formats like Delta Lake, Apache Iceberg, and Apache Hudi. Also, you can also use multiple engines SparkSQL, Trino(Comming) and Apache Flink(Comming).

Advanced Spark Metrics Integration.
With enhanced metrics, you can analyze your spark jobs better. Check the resources, jobs, and health of running applications quickly and troubleshoot the issues quickly. Whether you're a data engineer or a developer or a scientist, you can drive efficiency with these metrics.

Auto-Scaling for Interactive Sessions.
Efficient Resource Utilization Is The Key! Inactive interactive jobs will automatically scale down to zero after inactivity, freeing up resources for other jobs. When the activity resumes, the sessions smoothly scale back up by themselves.

Automated Spark job scheduling with cron
No need to trigger jobs by hand! With our new cron-based scheduling, you can automate your Spark job executions using a familiar cron-like syntax. Our system is integrated with Kubernetes so that your jobs run on time with automated retries and notifications.

Integration with Superset.
Get better visualisation and analytics through Superset integration. You can check the insights in your data by using Analytics, and you can share them too.

Spark Memory Settings Section.
Customize Spark's settings for memory for better performance tweaks. Adjust Spark memory settings for better resource use and improved job performance.

Integration with MLflow.
Speed up your machine learning! Using MLflow Integration, manage your ML experiments, track result and help streamline model deployment within Ilum.

Prepare to take your data journey to the next level with Ilum 6.2.1 – enhancing your analytics, Spark job management, and data interactions like never before. Get the latest version and see!

Happy Data Processing! 🚀✨

New Feature

Introducing SQL API/UI Section: Simplifying Data Interaction with Ilum

We're excited to introduce a game-changer in the world of data interaction - the brand new SQL API/UI section in Ilum! 🚀 Now, you can effortlessly run SQL queries directly on the cluster, revolutionizing the way you interact with your data.

With full SQL dialect capabilities at your fingertips, including seamless integration with modern data formats like Delta Lake, Apache Iceberg, and Apache Hudi, data querying and manipulation have never been more straightforward. Plus, beyond just SparkSQL, Ilum now supports Trino and Apache Flink, offering you a versatile toolkit to meet your diverse data processing needs.

Say goodbye to complex data interactions and hello to streamlined analytics and data management tasks directly from our platform. Whether you're diving into different data formats or exploring various processing engines, Ilum's SQL API/UI section empowers you to navigate your data environment with ease.

Join us in embracing this new era of data interaction and experience firsthand how Ilum is reshaping the landscape of data processing. Get ready to elevate your data game and unlock new possibilities with our latest SQL API/UI feature. Your data journey is about to get a whole lot smoother - start exploring today!

https://ilum.cloud/docs/features/sql_viewer/

New Feature

New Feature: Advanced Spark Metrics Integration

We are thrilled to announce the latest update to Ilum - the addition of more advanced spark metrics! 🚀 Now, you can dive even deeper into monitoring the performance and efficiency of your Spark jobs with enhanced metrics at your fingertips.

With these new metrics, you can gain valuable insights into the resource utilization, task execution, and overall health of your Spark applications. Understanding these metrics will empower you to optimize your jobs, troubleshoot any issues more effectively, and ultimately boost your productivity.

Whether you are a data engineer, a developer, or a data scientist, these advanced spark metrics will provide you with the visibility you need to make informed decisions and drive better outcomes. Say goodbye to guesswork and hello to data-driven insights that will take your Spark experience to the next level!

Keep an eye out for these new metrics in the Ilum interface, and start harnessing the power of data to supercharge your Spark jobs today. We can't wait to see how these advanced metrics will elevate your Spark experience and help you achieve even greater success in your data projects. Happy Sparking! ✨

New Feature

Introducing Auto-Scaling for Interactive Sessions

We're excited to introduce a new feature that will streamline your interactive sessions on Ilum! Say goodbye to inactive sessions taking up unnecessary resources. With our latest update, we've implemented a smart solution to automatically pause interactive sessions that have been inactive for over a week.

Here's how it works: if an interactive session remains idle for a specified amount of time, Ilum will scale it down to zero, freeing up resources for other tasks. But don't worry about losing your progress or data. As soon as ilum-core detects any activity or a request to resume the session, it will promptly scale it back to one session, allowing you to pick up right where you left off.

This enhancement not only optimizes resource utilization but also ensures a smoother and more efficient experience for all Ilum users. No more manual intervention required to manage inactive sessions. Ilum takes care of it for you, so you can focus on your work without interruptions.

Experience the convenience of seamless session management with Ilum's latest update. Stay productive, stay efficient, and let Ilum handle the rest. We're committed to enhancing your Spark session management experience, making your workflow even more effortless and enjoyable. Try it out today and see the difference!

Introducing Advanced Spark Job Scheduling with Cron-Based Automation

We're thrilled to introduce a game-changing feature to Ilum that will revolutionize the way you schedule and automate your Spark jobs! Say hello to our brand-new "Advanced Spark Jobs Scheduling" functionality. 🎉

With Ilum's cron-based scheduling, you can now effortlessly set up and manage automated Spark job executions at specific times or intervals using familiar cron syntax. This seamless integration with Kubernetes not only ensures scalability but also offers the flexibility you need to streamline your data processing tasks effectively.

Say goodbye to manual job triggering and hello to a more efficient workflow. Our intuitive interface empowers you to configure, monitor, and handle errors in your scheduled Spark jobs with ease. Plus, with features like retries and notifications in place, you can rest assured that your data processing will run smoothly and on time, every time!

Experience the convenience of automated job scheduling like never before with Ilum's latest feature. Simplify your Spark job management, boost productivity, and stay ahead of your data processing needs effortlessly. Get started today and unlock a new level of efficiency with Ilum! ✨
https://ilum.cloud/docs/features/scheduler/

New Feature

Introducing New `tag` Field for Jobs Object

We are excited to announce a new enhancement to Ilum that will take your job management experience to the next level! Introducing the new `tag` field for jobs, designed to make job identification and organization a breeze.

With the addition of the `tag` field, you can now add custom tags to your jobs, helping you easily categorize and filter them based on your specific needs. Whether you want to label jobs by project, team, priority, or any other criteria, the `tag` field allows for seamless organization and quick retrieval of information.

New Feature

Ilum Update: Hive Metastore Integration Now Live!

We are excited to announce a major enhancement to Ilum that will revolutionize how you manage your big data environments - the official release of our Hive Metastore integration feature! 🚀 Say goodbye to the beta phase. This essential tool is now a standard part of Ilum, ready to supercharge your data management capabilities.

With the new Hive Metastore integration, you can efficiently handle metadata for your big data projects, ensuring a seamless way to manage schema and metadata across different storage solutions and formats. This means less time spent wrestling with data organization and more time focusing on what truly matters - deriving valuable insights from your data.

But that's not all - the Ilum team is already looking ahead to the future. We have ambitious plans to expand our data catalog capabilities further by integrating with Unity Catalog, Polaris, and Nessie. These upcoming additions will take Ilum to the next level, empowering you to manage and govern large-scale data across distributed environments with ease. Our goal? To keep Ilum at the forefront of data lakehouse technology, supporting your complex data operations and governance needs every step of the way.

Join us on this exciting journey as we continue to innovate and elevate your data management experience with Ilum. Stay tuned for more updates, and get ready to unleash the full potential of your big data projects like never before!

New Feature

almost 2 years ago

New Feature: Integration with Apache Sedona

We are excited to announce the latest feature update for Ilum - Integration with Apache Sedona! 🚀

With this new integration, you can now leverage the power of Apache Sedona within your Spark sessions on Ilum. Apache Sedona, formerly known as GeoSpark, is a cluster computing system specifically designed for processing large-scale spatial data. The combination of Apache Sedona and Ilum's interactive Spark sessions enables seamless analysis and visualization of spatial data.

Whether you're working with geospatial datasets, location-based services, or any other spatial data applications, the Integration with Apache Sedona on Ilum empowers you to handle complex spatial queries and operations with ease. Say goodbye to the hassle of managing spatial data separately and welcome a more integrated and efficient workflow.

Ready to take your spatial data analysis to the next level? Experience the Integration with Apache Sedona on Ilum now and discover new opportunities for spatial data processing and analysis in your Spark tasks.

Improved Fixed

about 2 years ago

Version 6.1.1

Added replica count to Helm charts.
Added node affinities for architectures to helm charts
Added the ability to browse attached cluster storages
Fixed a bug with Kafka consumer configuration

New Release

about 2 years ago

Version 6.1.0

Data Lineage Support Added: Enhanced data management with new lineage tracking capabilities.
Base Storage Support for HDFS/Azure/GCS: Expanded storage options to include HDFS, Azure Blob Storage, and Google Cloud Storage.
LDAP/OAuth2 Support: Strengthened security with added support for LDAP and OAuth2 authentication methods.
YARN Web UI Access: Improved monitoring and management with access to the YARN web user interface.
Prometheus Metrics in Spark Jobs: Enabled Prometheus metrics for detailed monitoring within Spark jobs.
Hadoop Config in Config Maps: Simplified configuration by allowing Hadoop settings to be passed to Spark jobs via Kubernetes config maps.
Cluster Storages Spark Config to Spark Jobs: Streamlined job configuration with the ability to pass cluster storage settings directly to Spark jobs.
Detecting Spark Driver Pod Failure: Increased reliability with new mechanisms to detect Spark driver pod failures.
Fetching Metrics from the History Server: Enhanced analytics and troubleshooting with the ability to fetch metrics from the Spark history server.
Tags for Jobs and Groups: Improved organization and management with tagging capabilities for Jobs and Groups.
Exposing YARN Jobs for Ilum's Prometheus: Expanded monitoring options by making YARN jobs visible to Ilum's Prometheus setup.
Tons of Small Improvements: Implemented numerous minor enhancements across the platform to boost performance and user experience.

New Feature

about 2 years ago

Introducing Data Lineage Integration with OpenLineage and Marquez

With the integration of open lineage and Marquez, every Spark job in our system now comes equipped with built-in Data Lineage capabilities. This means you can easily track the journey of your data, understand its origins, transformations, and destinations, all within the familiar environment of Ilum.

Understanding the flow of your data is crucial for making informed decisions and ensuring data integrity. With Data Lineage seamlessly integrated into Ilum, you can now visualize and trace the path of your data with ease, empowering you to make data-driven choices confidently.

Take advantage of this powerful feature today and gain deeper insights into your data processing workflows. Enhance your data management experience with Ilum’s Data Lineage, setting a new standard for transparency and control in your Spark jobs! ✨

New Feature

Enhanced Security in Ilum: LDAP Integration and OAuth2 Support

We understand the importance of strong authentication mechanisms. That's why we've introduced both internal and LDAP authentication methods. With internal authentication, you can now create a static list of users and grant them access to Ilum. On the other hand, LDAP authentication enables you to leverage your existing user directory, making user management a breeze.

We've also added OAuth2 support. With Enhanced Security, we've put your data protection first. We believe that security shouldn't be complicated, and with Ilum, it doesn't have to be. So go ahead, explore the new features, and experience the peace of mind that comes with knowing your data is in safe hands.

Planned for release in 6.1.0

New Feature

New Storage Options Added: HDFS/Azure/GCS

Hey there, Ilum users!

We have some awesome news to share with you today. We've just rolled out a new feature that we know you've all been eagerly waiting for, The support for other storage options!

Up until now, Ilum has been tightly integrated with S3 interface, and we've received numerous requests from our incredible user community to expand our horizons. We've listened, and we're excited to announce that Ilum now supports not only S3, but also GCS, HDFS and Azure Blob Storage!

What does this mean for you? Well, it means you now have the flexibility to choose the storage option that best suits your needs and seamlessly connect it with Ilum. Whether you're a fan of HDFS, use Azure for your data storage, or prefer Google Cloud Storage, Ilum has got you covered.

Planned for release in 6.1.0

New Improved Fixed Release

https://www.youtube.com/watch?v=6d27js-FNHU

Version 6.0.0

UI v3
Embedded Jupyter Notebook
Embedded Airflow
Encapsulate the spark-submit processes within separate kubernetes pod
Tons of small improvements

New Feature

Introducing Ilum UI v3: An Enhanced User Interface Experience

With this update, we've taken user feedback to heart and made significant improvements to the Ilum interface, making it even easier and more intuitive to manage your Spark sessions and jobs. We believe that a seamless user experience is the key to unlocking the full potential of Apache Spark and Kubernetes, and that's exactly what we're delivering with our enhanced UI.

Our team has been hard at work, refining every aspect of the user interface to ensure a smooth and efficient workflow. We've revamped the layout, simplified navigation, and added new features that will make interacting with Ilum a breeze. Whether you're a seasoned data scientist or just getting started with Spark, UI v3 will empower you to unleash the full power of your data with ease and confidence.

New Feature