New Feature
Smart Performance Insights for Apache Spark
ILUM now surfaces actionable run-time insights for Spark jobs, so you can spot regressions, memory pressure, and planner optimizations without digging through logs.
What’s new
- Job health summary
  - Job Running Efficiently when no critical issues are detected.
  - Slow Job Detected when a run is N× slower than its rolling baseline (e.g., 4.0× longer than average), with a pointer to investigate data skew, resource constraints, or inefficient operations.
- Stage-level memory hotspots
  Flags memory-intensive stages (e.g., “4 stages used significant memory; peak 1640 MB in stage 73”) so you can tune executor memory or operators before spills/OOM.
- Planner/Optimizer signal
  Stage Optimization Detected when Spark (e.g., AQE) skips stages or coalesces partitions (“42 stages skipped”): evidence that your query plan is being optimized. A sketch of scanning stage metrics for both of these signals follows this list.
- Inline recommendations
  Each finding ships with a short, concrete recommendation (monitor, check skew, consider memory tuning). No fluff.
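As a rough illustration of how a stage scan can surface the memory and optimizer signals, here is a minimal Python sketch against Spark's documented monitoring REST API. The endpoint host, the 1024 MB threshold, the `stage_signals` function name, and the message strings are assumptions for illustration, not ILUM's actual implementation; the field names follow Spark 3.x's `StageData`.

```python
import requests

SPARK_API = "http://localhost:18080/api/v1"  # assumed History Server endpoint
MEM_THRESHOLD_MB = 1024                      # illustrative threshold, not ILUM's

def stage_signals(app_id: str) -> list[str]:
    """Scan per-stage metrics for memory hotspots and optimizer-skipped stages.

    Field names (status, stageId, peakExecutionMemory) follow Spark 3.x's
    StageData as returned by GET /applications/{app-id}/stages.
    """
    stages = requests.get(f"{SPARK_API}/applications/{app_id}/stages").json()
    findings = []

    # Planner/Optimizer signal: AQE marks bypassed stages as SKIPPED.
    skipped = sum(1 for s in stages if s["status"] == "SKIPPED")
    if skipped:
        findings.append(f"Stage Optimization Detected: {skipped} stages skipped")

    # Memory hotspots: report stages whose peak execution memory is large.
    hot = [(s["stageId"], s.get("peakExecutionMemory", 0) // 2**20)
           for s in stages]
    for sid, mb in sorted(hot, key=lambda h: -h[1]):
        if mb >= MEM_THRESHOLD_MB:
            findings.append(f"Memory hotspot: stage {sid} peaked at {mb} MB")
    return findings
```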
How it works (brief)
- Builds a per-job baseline from recent runs and raises deviations using robust statistics rather than single-run variance (a minimal sketch follows this list).
- Reads stage/task metrics to locate peaks (memory, CPU time) and to count skipped/coalesced stages from the optimizer.
- Keeps noise low with thresholds and minimum observation windows.
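To make the baseline idea concrete, here is a minimal, self-contained Python sketch of a median/MAD-style robust check combined with an N× ratio gate and a minimum observation window. The function name, the 3×MAD rule, the 4.0× ratio, and the 5-run window are hypothetical thresholds, not ILUM's actual statistics.

```python
from statistics import median

def assess_run(latest_s: float, history_s: list[float],
               min_runs: int = 5, slow_ratio: float = 4.0) -> str:
    """Compare the latest run's duration (seconds) to a robust baseline.

    Uses median + MAD instead of mean + variance so one outlier run
    cannot distort the baseline. All thresholds are illustrative.
    """
    if len(history_s) < min_runs:             # minimum observation window
        return "Collecting baseline (not enough runs yet)"

    base = median(history_s)
    mad = median(abs(d - base) for d in history_s) or 1e-9  # avoid zero MAD

    ratio = latest_s / base
    # Flag only when the run is both a statistical outlier (3 * MAD) and
    # far off in absolute ratio, which keeps noise low on jittery jobs.
    if (latest_s - base) > 3.0 * mad and ratio >= slow_ratio:
        return f"Slow Job Detected: {ratio:.1f}x longer than baseline"
    return "Job Running Efficiently"
```

For example, `assess_run(480.0, [115.0, 120.0, 118.0, 122.0, 119.0])` reports a run roughly 4.0× longer than the 119 s median baseline.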
Where to find it
Job → Analytics (appears after the first complete run with metrics).
Why it helps
- Faster triage of Spark performance regressions (latency spikes vs. baseline).
- Early warning on memory-intensive stages that precede spills/OOM.
- Confirmation that Adaptive Query Execution is producing real wins (skipped stages), not just toggled on.
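For the optimizer signal to have anything to report, AQE must actually be enabled. A minimal PySpark session sketch using standard Spark 3.x settings (the app name is illustrative; AQE is on by default since Spark 3.2, but setting it explicitly makes the intent auditable):

```python
from pyspark.sql import SparkSession

# Standard Spark 3.x configs; "aqe-demo" is an illustrative app name.
spark = (SparkSession.builder
         .appName("aqe-demo")
         .config("spark.sql.adaptive.enabled", "true")
         .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
         .getOrCreate())
```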