New Feature
Apache NiFi module (managed data flows)
A managed Apache NiFi runtime inside ILUM for building and operating data flows (ingest, route, transform) with NiFi’s visual canvas. Includes cluster orchestration on Kubernetes, multi-user access, and an optional NiFi Registry for versioned flows.
-
What you can do
- Build ingestion pipelines (files, HTTP, SFTP, JDBC, Kafka, MQTT, etc.) with back-pressure, retries, and dead-letter queues.
- Parameterize flows with Parameter Contexts; promote the same flow across dev → test → prod.
- Use NiFi Registry for versioning and rollbacks of flow definitions.
- Emit operational metrics to ILUM/Prometheus; view logs and bulletins in one place.
- Land data to HDFS/S3/MinIO/Ceph; hand off to Spark/SQL jobs for table writes (Iceberg/Delta/Hudi).
- (Optional) Capture read/write events from NiFi processors for ILUM lineage (coverage depends on your processors and targets).
-
Why it helps
- Standardizes edge and bulk ingestion without writing glue code.
- Makes operational controls (retries, rate limiting, back-pressure) a first-class part of pipelines.
- Shortens time to wire up sources → landing zones → curated tables.
-
Compatibility
- NiFi 2.x with optional NiFi Registry.
- Works with ILUM storage backends (HDFS / object storage).
- For lakehouse table formats, use the pattern: NiFi → landing zone → Spark job → Iceberg/Delta/Hudi.
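A minimal sketch of the downstream step in that pattern, assuming NiFi lands Parquet files and the ILUM Spark session already has an Iceberg catalog configured; the path and table name below are placeholders:

    from pyspark.sql import SparkSession

    # Placeholders: adjust the landing path and catalog/table name to your setup.
    LANDING_PATH = "s3a://landing/nifi/orders/"   # written by NiFi (e.g. PutS3Object/PutHDFS)
    TARGET_TABLE = "lakehouse.raw.orders"         # Iceberg table in a configured catalog

    spark = SparkSession.builder.appName("nifi-landing-to-iceberg").getOrCreate()

    # Read whatever NiFi has landed; incremental tracking (by subfolder, file
    # age, or a manifest) is left to the job.
    df = spark.read.parquet(LANDING_PATH)

    # Append into the curated table (DataFrameWriterV2, Spark 3+). Assumes the
    # table already exists; use .createOrReplace() for the first load.
    df.writeTo(TARGET_TABLE).append()

The same shape works for Delta or Hudi by swapping the write step for the corresponding writer.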
-
How to start
- Enable the module via Helm: --set nifi.enabled=true (adjust chart values for sizing, TLS, and auth); see the sketch after this list.
- (Recommended) Deploy NiFi Registry and connect it to the cluster.
- Set up Parameter Contexts, controller services (JDBC, SSLContext), and secrets.
- Point outputs to your landing buckets/paths, then schedule downstream Spark/SQL jobs from ILUM.
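A hedged example of the Helm step; the release name, chart reference, and namespace are placeholders from a typical ILUM install, and only nifi.enabled comes from this note (look up sizing, TLS, and auth keys in the chart's values.yaml):

    # Substitute the release, chart, and namespace from your ILUM installation.
    helm upgrade --install ilum ilum/ilum \
      --namespace ilum \
      --reuse-values \
      --set nifi.enabled=true

Switch from --set to a values file once you start managing sizing, TLS, and auth settings.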
-
Operational notes
- NiFi state and the Registry need to be included in backup/DR; align RPO/RTO with your data zones.
- Configure back-pressure thresholds and bulletin alerts to avoid overload (a defaults sketch follows this list).
- Enforce multi-tenant RBAC via NiFi/SSO; restrict controller services by environment.
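For the back-pressure item above, a sketch of cluster-wide defaults in nifi.properties; the property names below are the ones documented for recent NiFi releases, and per-connection thresholds set on the canvas take precedence, so confirm them against your version:

    # Defaults applied to newly created connections; existing connections keep
    # the thresholds configured on the canvas.
    nifi.queue.backpressure.count=10000
    nifi.queue.backpressure.size=1 GB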
