New Feature

Smart Performance Insights for Apache Spark

ILUM now surfaces actionable runtime insights for Spark jobs, so you can spot regressions, memory pressure, and planner optimizations without digging through logs.

What’s new
  • Job health summary
    • Job Running Efficiently when no critical issues are detected.
    • Slow Job Detected when a run is N× slower than its rolling baseline (e.g., 4.0× longer than average), with a pointer to investigate data skew, resource constraints, or inefficient operations.
  • Stage-level memory hotspots
    Flags memory-intensive stages (e.g., “4 stages used significant memory; peak 1640 MB in stage 73”) so you can tune executor memory or operators before spills/OOM.
  • Planner/Optimizer signal
    Stage Optimization Detected when Spark (e.g., Adaptive Query Execution) skips stages or coalesces partitions (“42 stages skipped”), evidence that your query plan is being optimized.
  • Inline recommendations
    Each finding ships with a short, concrete recommendation (monitor, check skew, consider memory tuning). No fluff.
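To make the stage-level hotspot check concrete, here is a minimal sketch of flagging memory-intensive stages. The field names (`stage_id`, `peak_memory_mb`), the helper name, and the 1024 MB threshold are illustrative, not ILUM's actual implementation; in practice the per-stage peaks would come from Spark's metrics (e.g., the `peakExecutionMemory` task metric exposed by the History Server REST API).

```python
def flag_memory_hotspots(stages, threshold_mb=1024):
    """Return stages whose peak memory meets or exceeds threshold_mb,
    sorted so the biggest hotspot comes first. Illustrative sketch."""
    hot = [s for s in stages if s["peak_memory_mb"] >= threshold_mb]
    hot.sort(key=lambda s: s["peak_memory_mb"], reverse=True)
    return hot

# Hypothetical per-stage records, mirroring the example in the release note
stages = [
    {"stage_id": 73, "peak_memory_mb": 1640},
    {"stage_id": 12, "peak_memory_mb": 1200},
    {"stage_id": 5,  "peak_memory_mb": 300},
]
hot = flag_memory_hotspots(stages)
# hot[0] is stage 73 with the 1640 MB peak, matching the sample finding above
```

A surfaced finding would then summarize `hot` ("2 stages used significant memory; peak 1640 MB in stage 73") instead of dumping raw metrics.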

How it works (brief)
  • Builds a per-job baseline from recent runs and flags deviations using robust statistics (e.g., medians rather than single-run variance).
  • Reads stage/task metrics to locate peaks (memory, CPU time) and to count skipped/coalesced stages from the optimizer.
  • Keeps noise low with thresholds and minimum observation windows.
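The baseline logic above can be sketched as follows. This is a minimal illustration under stated assumptions, not ILUM's actual code: durations are in seconds, the robust statistic is a rolling median (so one outlier run cannot skew the baseline), and the 4.0× ratio and 5-run minimum window are example thresholds.

```python
from statistics import median

def is_slow(run_seconds, history, ratio_threshold=4.0, min_runs=5):
    """Flag a run as slow when it takes ratio_threshold x the rolling
    median of recent runs. Returns (flagged, ratio). Illustrative sketch."""
    if len(history) < min_runs:      # minimum observation window keeps noise low
        return False, None
    baseline = median(history)       # robust to a single outlier run
    ratio = run_seconds / baseline
    return ratio >= ratio_threshold, round(ratio, 1)

history = [100, 110, 95, 105, 102]   # recent run durations, seconds
print(is_slow(420, history))         # (True, 4.1): 4.1x the 102 s median
```

Gating on a minimum window and a ratio threshold is what keeps the insight from firing on ordinary run-to-run variance.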

Where to find it
Job → Analytics (appears after the first complete run with metrics).

Why it helps
  • Faster triage of Spark performance regressions (latency spikes vs. baseline).
  • Early warning on memory-intensive stages that precede spills/OOM.
  • Confirmation that Adaptive Query Execution is producing real wins (skipped stages), not just toggled on.