Data platforms hold the crown jewels of every organization: customer records, financial transactions, personally identifiable information (PII), health records, and proprietary analytics. If you can query it, someone in compliance is worrying about who else can query it.
In regulated industries — banking, government, healthcare, telecom — security is not optional. It is a mandatory checkbox that gets evaluated before features, performance, or price. Three things drive this reality:
- CISOs and compliance teams have veto power. They may not choose the platform, but they can block any platform that fails their review. If a system can't answer their questions, it doesn't make it through procurement — regardless of how fast its queries run.
- Regulated industries are where enterprise-scale deployments live. Banks, government agencies, and telecoms require the highest standard of security as a baseline condition.
- Security breaches are career-ending events. Nobody gets fired for choosing the more secure option. Decision-makers are risk-averse on security for very personal reasons.
A platform that can't demonstrate security controls loses the deal regardless of features.
Security Concepts in Plain Language
Here are the core security terms that come up in any serious data platform evaluation.
Authentication
Proving your identity before you get access. Think of it as the security badge at the building entrance — you swipe your card, and the system confirms you're a real employee.
In practice: Keycloak with OIDC/OAuth2, integrating with Active Directory, LDAP, and any corporate SSO.
Authorization (RBAC)
Once you're in the building, which rooms can you enter? Role-Based Access Control means your access is defined by your role ("analyst," "data engineer," "admin") rather than configured person by person.
In practice: Apache Ranger policies, centrally managed and applied across all query engines.
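As a rough illustration of the role-based idea, here is a minimal sketch. The role names, the `resource:action` permission strings, and the `is_allowed` helper are all hypothetical and are not taken from Ranger or any specific product.

```python
# Hypothetical role-to-permission mapping; names are illustrative only.
ROLE_PERMISSIONS = {
    "analyst": {"sales.orders:select"},
    "data_engineer": {"sales.orders:select", "sales.orders:insert"},
    "admin": {"sales.orders:select", "sales.orders:insert", "sales.orders:drop"},
}


def is_allowed(user_roles, resource, action):
    """Return True if any of the user's roles grants the action on the resource."""
    needed = f"{resource}:{action}"
    return any(needed in ROLE_PERMISSIONS.get(role, set()) for role in user_roles)
```

Access is decided from the roles a user holds, so onboarding a new analyst means assigning a role rather than writing a new rule.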
Row-Level Security
Different users see different rows in the same table. A regional manager in one city only sees that city's data; a manager in another city sees only theirs. Same table, different views — automatically enforced.
In practice: Ranger row-level filters applied consistently across StarRocks, Impala, Trino, and Spark.
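Conceptually, a row-level filter is a predicate tied to the user that is applied before any rows are returned. The sketch below assumes a made-up `User` type and sample rows; a real engine would push the filter into the query plan rather than filter in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    city: str


# Illustrative sample data, not from any real system.
ROWS = [
    {"order_id": 1, "city": "Dubai", "amount": 120},
    {"order_id": 2, "city": "Riyadh", "amount": 75},
    {"order_id": 3, "city": "Dubai", "amount": 40},
]


def visible_rows(user, rows):
    """Apply the row filter 'city = <user's city>' before returning any data."""
    return [row for row in rows if row["city"] == user.city]
```

Two managers calling `visible_rows` on the same table get different results, and neither has to remember to add the filter themselves.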
Column-Level Masking
Sensitive columns — SSN, salary, email, phone number — are masked or hidden based on role. An analyst sees ***-**-1234 while an authorized HR user sees the full value.
In practice: Ranger column masking policies, with the same policy enforced across all engines.
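The masking behavior described above can be sketched in a few lines. The `UNMASKED_ROLES` set and both function names are assumptions made for this example, not part of any product's API.

```python
# Hypothetical policy: only these roles may see the raw value.
UNMASKED_ROLES = {"hr"}


def mask_ssn(ssn):
    """Keep only the last four characters visible."""
    return "***-**-" + ssn[-4:]


def read_ssn(ssn, role):
    """Return the raw SSN for authorized roles and a masked value otherwise."""
    return ssn if role in UNMASKED_ROLES else mask_ssn(ssn)
```

For example, `read_ssn("123-45-1234", "analyst")` yields `***-**-1234`, while the same call with the `hr` role returns the full value.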
Encryption at Rest
Data files stored on disk are encrypted. If someone steals a hard drive, they get gibberish. This is a compliance checkbox in virtually every regulation.
In practice: Storage-level encryption via S3/MinIO server-side encryption and Kubernetes Secrets encryption.
Encryption in Transit (TLS/mTLS)
Data moving between components — browser to server, engine to storage, service to service — is encrypted. Nobody can intercept it by sniffing the network.
In practice: TLS everywhere via cert-manager with auto-rotating certificates; mTLS between internal services.
Audit Trail
Every query, every data access attempt, every login is logged. When an auditor asks "who accessed this table in the last 90 days?" — you have the answer in seconds.
In practice: Full audit trail written to OpenSearch — every query and access attempt, searchable and exportable.
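The auditor's question boils down to a filter over structured audit events. The sketch below runs that filter in memory over invented sample events. The field names (`user`, `table`, `action`, `allowed`, `time`) are assumptions, and a real deployment would express the same filter as a search query against the audit store.

```python
from datetime import datetime, timedelta, timezone

# Fixed "now" so the example is reproducible; all events are invented.
NOW = datetime(2024, 6, 30, tzinfo=timezone.utc)
EVENTS = [
    {"user": "alice", "table": "customers", "action": "select",
     "allowed": True, "time": NOW - timedelta(days=3)},
    {"user": "bob", "table": "customers", "action": "select",
     "allowed": False, "time": NOW - timedelta(days=120)},
    {"user": "carol", "table": "orders", "action": "select",
     "allowed": True, "time": NOW - timedelta(days=1)},
    {"user": "dave", "table": "customers", "action": "select",
     "allowed": True, "time": NOW - timedelta(days=30)},
]


def accesses_since(events, table, days, now):
    """Return audit events for `table` from the last `days` days, newest first."""
    cutoff = now - timedelta(days=days)
    hits = [e for e in events if e["table"] == table and e["time"] >= cutoff]
    return sorted(hits, key=lambda e: e["time"], reverse=True)
```

Denied attempts are logged alongside successful ones, so the same query also surfaces who tried and failed.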
Data Sovereignty
Many jurisdictions restrict where data may be stored or transferred. EU personal data can leave the EU only under specific GDPR safeguards, and GCC data must comply with local sovereignty requirements. The platform must be able to guarantee that data never leaves the jurisdiction.
In practice: On-premise Kubernetes — data never leaves the customer's data center. No cloud call-home.
Air-Gap Deployment
The system operates with absolutely zero internet connectivity. No outbound network access. No license phone-home. No telemetry. No update checks. The platform runs completely isolated from the outside world. This is mandatory for government, defense, and critical infrastructure.
In practice: Air-gapped by default. Helm charts plus a private container registry are enough for a fully offline installation. No special tooling needed.
The Regulatory Landscape
Security requirements don't emerge in a vacuum — they are driven by specific regulations. Here is what's driving the conversation across industries.
Banking and Finance — Basel III, PCI DSS, Central Bank Regulations
Basel III mandates operational risk management, including data security. PCI DSS governs any system touching payment card data and requires strict access control, encryption, and audit trails. Each country's central bank adds local requirements on top.
Healthcare — HIPAA and Local Equivalents
HIPAA requires "minimum necessary" access to patient data, full audit trails, encryption, and breach notification. Other countries have similar medical data protection laws. Row-level security is critical — a doctor should only see their own patients' records.
Government and Defense — Air-Gap Mandatory, Sovereign Data
Classified and sensitive government systems must be air-gapped. No internet connectivity, period. Data sovereignty is non-negotiable. This immediately eliminates all SaaS vendors and most cloud-dependent platforms.
EU / International — GDPR
Europe's General Data Protection Regulation covers restrictions on transferring personal data outside the EU, the right to erasure (organizations must be able to find and delete a person's data), consent tracking, and breach notification to the supervisory authority within 72 hours. Fines reach up to €20 million or 4% of global annual turnover, whichever is higher.
GCC / Middle East — Data Localization and Sovereignty
Countries across the GCC region are enacting strict data sovereignty laws. The UAE's Federal Decree-Law on Data Protection, Saudi Arabia's PDPL, and similar frameworks restrict the transfer of personal data outside national borders and mandate specific technical security controls. Air-gap capability and on-premise deployment are frequently required for government and financial sector systems.
Telecom — Subscriber Data Protection
Telecom operators hold subscriber data (call records, location, usage patterns) under strict regulation. Lawful intercept readiness is mandatory in most jurisdictions. Access control and audit trail on subscriber data are non-negotiable.
Cross-Industry — SOX (Sarbanes-Oxley)
Publicly traded companies subject to SOX (primarily those listed in the US) must ensure the integrity of their financial reporting data. Access controls on financial data, an audit trail of who changed what, and segregation of duties are required. This applies to any data platform that feeds financial reports.
Cross-Industry — ISO 27001
The international standard for information security management systems (ISO/IEC 27001). Certification applies to an organization's management system rather than to a product, so many enterprises require that the platforms they deploy support its controls: access control, encryption, monitoring, and incident response.
The Multi-Engine Security Challenge
Modern data platforms often run multiple query engines: a fast OLAP engine for dashboards, a SQL engine for ad-hoc queries, a distributed engine for federation, a big data engine for ETL. Each engine was built by a different project, with a different security model.
When a deployment runs multiple data tools, security must be maintained separately in each one:
- Define a row-level policy in StarRocks — does it apply in Trino? No. Different policy engine.
- Create a role in Impala — does it exist in Spark? No. Separate user management.
- Mask a column in one engine — a user queries the same data through another engine and sees the unmasked value.
- Audit trail? Four separate logs in four different formats.
The result: security policies inevitably drift. Engine A has the updated policy, Engine B still has last month's version. A compliance auditor finds the gap.
Consider a building with four entrances, each managed by a different security company. They each have their own badge system, their own access lists, their own visitor logs. Somebody gets terminated? You have to call all four companies to revoke access. Miss one, and you have a breach. That's what multi-engine security looks like without a unified policy layer.
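Drift of this kind can at least be detected mechanically. The sketch below fingerprints each engine's copy of a policy and reports the engines that disagree with a reference copy. The policy dictionary layout and both function names are assumptions for illustration, since real engines store policies in their own formats.

```python
import hashlib
import json


def policy_fingerprint(policy):
    """Stable hash of a policy dict; key order does not affect the result."""
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_drift(policies_by_engine, reference_engine):
    """Return the engines whose policy differs from the reference engine's copy."""
    expected = policy_fingerprint(policies_by_engine[reference_engine])
    return sorted(
        engine
        for engine, policy in policies_by_engine.items()
        if policy_fingerprint(policy) != expected
    )
```

Detection only tells you that the copies have diverged. Preventing the divergence requires a single source of policy, which is the subject of the next section.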
Unified Security Architecture
The key architectural principle for securing a multi-engine data platform is simple: one security policy, enforced across all engines. Define a policy once and it is automatically applied everywhere.
A robust implementation has three layers that work together:
Layer 1 — Authentication: Keycloak
Keycloak serves as the central identity provider. It handles login, SSO, and multi-factor authentication, and it integrates with whatever the organization already uses: Active Directory, LDAP, OIDC, SAML. One login gives access to the entire platform, with no separate credentials per engine.
Layer 2 — Authorization: Apache Ranger
Apache Ranger is the central policy engine. All access control policies — table access, row-level security, column masking — are defined once in Ranger's UI and enforced everywhere.
Layer 3 — Engine Integration: Profile Parsers and Policy Mappers
Open-source Ranger doesn't natively understand every query engine. The gap is bridged with custom components:
| Component | What It Does | Why It Matters |
|---|---|---|
| Profile Parsers | Read engine-specific settings and configuration, convert them into Ranger-compatible policy definitions | Ranger always has an accurate picture of what resources exist in each engine |
| Policy Mappers | Convert Ranger policies into engine-specific ACLs for StarRocks, Impala, Trino, and Spark | One policy in Ranger = four engine-specific enforcement points. No drift, no gaps. |
| Keycloak–Ranger User Sync | Automatically syncs users, groups, and roles from Keycloak to Ranger | New user in Active Directory automatically appears in Ranger with correct group membership. No manual sync. |
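The policy mapper row can be pictured as a fan-out: one central policy in, one enforcement record per engine out. The sketch below shows only that shape. The `Policy` fields and the record layout are placeholders, not the real ACL syntax of StarRocks, Impala, Trino, or Spark, and a real mapper would translate into each engine's own grant and filter mechanisms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    role: str
    table: str
    row_filter: str
    masked_columns: tuple


ENGINES = ("starrocks", "impala", "trino", "spark")


def map_policy(policy, engines=ENGINES):
    """Fan one central policy out to one enforcement record per engine.

    The record layout is a placeholder, not any engine's real ACL format.
    """
    return {
        engine: {
            "engine": engine,
            "role": policy.role,
            "table": policy.table,
            "row_filter": policy.row_filter,
            "masked_columns": list(policy.masked_columns),
        }
        for engine in engines
    }
```

Because every record is derived from the same `Policy` object, the four enforcement points cannot disagree unless the mapper itself is wrong.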
Encryption and Certificates
TLS everywhere via Kubernetes cert-manager. Certificates are automatically generated, rotated, and renewed — no manual certificate management. mTLS between internal services means even internal network traffic is encrypted.
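From a service's point of view, mTLS means the server refuses any client that cannot present a certificate signed by a trusted CA. The sketch below uses Python's standard `ssl` module. The function name and the file-path parameters are assumptions, and in a cert-manager setup the certificate, key, and CA bundle would typically be mounted from a Kubernetes Secret.

```python
import ssl


def make_mtls_server_context(cert_file=None, key_file=None, ca_file=None):
    """Build a server-side TLS context that requires a client certificate."""
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    # Reject any client that does not present a certificate we can verify.
    context.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        # The server's own identity (paths are supplied by the caller).
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if ca_file:
        # The CA that signed the certificates of trusted internal clients.
        context.load_verify_locations(cafile=ca_file)
    return context
```

Since certificates rotate automatically, a long-running service would need to rebuild this context (or restart) when the mounted files change.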
Unified Audit Trail
Every query from every engine flows to a single destination — one place to search, one place to export for auditors. A query like "show me every access to the customers table in the last 90 days across all engines" returns a single, coherent result.
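A single destination only helps if every engine's records share one schema. The sketch below normalizes records from two engines into a common shape. The per-engine field names in `FIELD_MAP` are invented for illustration and do not reflect the actual audit log formats of these engines.

```python
# Invented per-engine field names: (user field, table field, timestamp field).
FIELD_MAP = {
    "starrocks": ("user", "tbl", "ts"),
    "trino": ("principal", "table_name", "query_time"),
}


def normalize(engine, record):
    """Map an engine-specific audit record onto one shared schema."""
    user_key, table_key, time_key = FIELD_MAP[engine]
    return {
        "engine": engine,
        "user": record[user_key],
        "table": record[table_key],
        "time": record[time_key],
    }


def unified_trail(records_by_engine):
    """Merge all engines' records into one list sorted by timestamp."""
    merged = [
        normalize(engine, record)
        for engine, records in records_by_engine.items()
        for record in records
    ]
    return sorted(merged, key=lambda event: event["time"])
```

Once records are normalized, a cross-engine question becomes one query instead of one per log format.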
Air-Gap by Default
A properly architected on-premise deployment requires zero outbound network access. No license servers to phone home to. No telemetry. No cloud dependencies. This isn't a special configuration — it should be the default deployment model.
Security Capability Comparison
How common data platform options compare on security capabilities:
| Capability | Unified Kubernetes Platform (e.g., Alphyn) | Oracle | Teradata | Snowflake | Cloudera | Starburst |
|---|---|---|---|---|---|---|
| Unified Auth Across Engines | Yes — single IdP + policy engine across 4 engines | Single engine only | Single engine only | Single engine only | Ranger, but per-service config | Trino only |
| RBAC | Yes — centralized | Yes | Yes | Yes | Yes | Yes |
| Row-Level Security | Yes — unified across engines | Yes (VPD) | Yes | Yes | Yes — Ranger | Yes |
| Column Masking | Yes — unified across engines | Yes (Redaction) | Limited | Yes | Yes — Ranger | Yes |
| Encryption at Rest | Yes | Yes (TDE) | Yes | Yes | Yes | Yes |
| Encryption in Transit | Yes — TLS/mTLS everywhere, auto-rotating certs | Yes | Yes | Yes | Configurable, not default | Yes |
| Audit Trail | Yes — unified, single destination | Yes (Unified Audit) | Yes | Yes | Per-service logs | Query log only |
| Air-Gap Deployment | Yes — default model | Yes — very high cost | Yes — very high cost | No — SaaS only | Partial — some components need connectivity | Possible, not default |
| SSO / OIDC | Yes — native | Yes | Yes | Yes | Yes | Yes |
| Data Sovereignty (On-Prem) | Yes — Kubernetes on-prem | Yes — appliance on-prem | Yes — appliance on-prem | No — cloud regions only | Yes — on-prem | On-prem option available |
The key observation: most vendors can check individual security boxes. The differentiator for multi-engine platforms is unified enforcement — one policy, one audit trail, one identity provider. When a deployment uses multiple tools managed independently, it has multiple security silos.
Air-Gap Deployments
Air-gap is often the first filter in government, defense, critical infrastructure, and banking in certain jurisdictions. If a platform can't operate air-gapped, it is out of the conversation before it starts.
Who requires air-gap:
- Government and defense — classified systems, national security data
- Critical infrastructure — power grids, transportation, water systems
- Banking — in regulated jurisdictions (GCC, parts of Asia, Central Asia), core banking data platforms must be air-gapped
- Healthcare — some hospital networks and research systems
What air-gap means for vendor selection:
| Vendor | Air-Gap Capability | Reality |
|---|---|---|
| Snowflake | Impossible | SaaS-only. No on-premise deployment option. Eliminated immediately. |
| Databricks | Impossible | Cloud-native SaaS. No air-gap path. Eliminated immediately. |
| On-Prem Kubernetes Platforms | Default | Helm charts + private container registry. Standard Kubernetes air-gap pattern. No license server call-home. Air-gap is the default, not an add-on. |
| Oracle Exadata | Possible, expensive | On-prem appliance can be air-gapped. But proprietary hardware, massive cost, and Oracle licensing in air-gap require special arrangements. |
| Teradata VantageCore | Possible, expensive | On-prem hardware appliance can be air-gapped. Same story as Oracle — very high cost, proprietary infrastructure. |
| Cloudera CDP Private | Partial | Can run on-prem, but some components expect internet access for updates, license validation, or management console connectivity. Not air-gap by default. |
| Starburst / Dremio | Possible, not default | Can be deployed on-prem on Kubernetes. Air-gap is possible but not the designed-for deployment model. License management may need special handling. |
A Kubernetes-based platform doesn't need special "air-gap mode" because there's nothing to turn off. There is no telemetry to disable, no license server to redirect, no cloud dependency to work around. It's just Kubernetes, Helm charts, and container images. If you can run Kubernetes, you can run it — air-gapped or not.
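The private-registry half of that pattern mostly comes down to rewriting image references so that nothing is pulled from a public host. The sketch below shows one way to do the rewrite. The function name, the registry address, and the image names are hypothetical, and it deliberately ignores details such as Docker Hub's implicit `library/` prefix.

```python
def to_private_registry(image, registry):
    """Rewrite an image reference so it is pulled from a private registry."""
    first, _, rest = image.partition("/")
    # A first path component containing '.' or ':' (or equal to 'localhost')
    # is treated as a registry host and replaced.
    has_registry = bool(rest) and ("." in first or ":" in first or first == "localhost")
    path = rest if has_registry else image
    return f"{registry}/{path}"
```

For example, `to_private_registry("example.com/team/engine:1.0", "registry.internal:5000")` gives `registry.internal:5000/team/engine:1.0`. The images themselves still have to be copied into the private registry before the cluster goes offline.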
Key Questions for Evaluating Any Data Platform on Security
When evaluating a data platform's security posture, these questions surface the most important gaps:
On data sovereignty and air-gap: Does the organization have air-gap or data sovereignty requirements? Are there specific jurisdictions where data must physically reside? A yes to either immediately filters the competitive field: SaaS-only vendors are eliminated.
On policy consistency across tools: How are security policies managed across different data tools? If an access policy changes in one tool, how does that change propagate to other tools? Most environments will reveal that policies are managed separately per tool — the multi-engine drift problem described above.
On compliance history: When was the last compliance audit? Were there findings related to data access controls? Audit findings are pain points with remediation budget attached. A unified audit trail and consistent policy enforcement directly address the most common findings.
On data segmentation: Do different teams need to see different subsets of the same data? How is that enforced today? Most organizations handle this through application logic or manual views — fragile, error-prone, and hard to audit. Policy-driven, engine-agnostic row-level security is the robust alternative.