Architecture

StarRocks Instead of Oracle for Mixed Analytical Workloads

Can StarRocks replace Oracle for mixed OLTP+OLAP workloads? Real load-test results: sub-second OLTP latency, resource groups, compaction tuning, and migration methodology.

by Alphyn.AI Team·18 min read

StarRocks is well known for its strong analytical performance, and the Alphyn.AI team regularly publishes load-testing results for the Alphyn Lakehouse platform. But today we want to explore a less obvious use case: OLTP workloads.

To be clear about what "OLTP" means in this context: we're not talking about classical transactional throughput in the tens of thousands of TPS typical of operational systems. We mean sub-second latency on both reads and — critically — key-based writes under concurrent analytical load. That is precisely what StarRocks claims to deliver:

Deliver sub-second latency at scale with an engine purpose-built for modern CPUs and complex SQL.


Background

One of our clients runs a production system built on Oracle. The workload is genuinely mixed: OLAP queries from ELT processes, transactional load from analytical applications, and access to pre-aggregated data and reports.

That's everything we love about Oracle — one engine that handles it all. Unfortunately, the current solution has two serious problems:

  1. The system cannot scale;

  2. Migration off Oracle is required.

The obvious answer seems to be a heterogeneous stack — for example, Greenplum + Postgres + optionally ClickHouse, which is a common recommendation in the market.

It works, but comes with clear drawbacks:

Drawback

Impact

Data duplicated across multiple databases

Higher total cost of ownership, additional key synchronization, duplicated business logic, etc.

Data synchronization between databases

Reduced SLA due to replication lag, increased operational overhead

Different SQL dialects

Higher skill requirements, adapting applications for each database

Increased architectural complexity

Multiple ops teams, three hardware footprints, and consequently very high TCO

The client and our team wanted to stay within the paradigm of a single universal platform while gaining horizontal scalability. That's how the evaluation of StarRocks for this demanding, non-trivial use case began.

The success criteria were:

  1. Concurrent processing of tens of OLAP-profile Read/Write SQL queries;

  2. Concurrent processing of hundreds of OLTP-profile Read/Write SQL queries with sub-second response time;

  3. Concurrent processing of mixed-profile (OLAP+OLTP) queries with sub-second response time.

Concurrent processing here means simultaneous read, write, update, and delete operations on the same tables by ETL processes, applications, and BI tools — that is, a true OLTP/OLAP mixed workload.

Test Setup

  1. Alphyn Lakehouse StarRocks 3.5.9 in a shared-nothing architecture;

  2. A data model of 25 tables closely mirroring the Oracle DDL:

    • Primary Key tables (same PK structure as Oracle);

    • Replication factor of 3;

    • Columnar storage format;

    • zstd(3) compression.

  3. Anonymized data, closely matched to production characteristics:

    • Matching cardinality;

    • Matching partition distribution;

    • Referential integrity consistent with the original Oracle model.

  4. SQL queries:

    • Several dozen real OLTP/OLAP queries adapted to StarRocks syntax and parameterized with random values;

    • Result set cardinality comparable to the original Oracle model.

  5. Test scenarios:

    • OLTP-only: concurrent read/write using only OLTP queries;

    • Mixed profile: concurrent read/write using all query types (OLTP+OLAP).

  6. Both database-level and OS-level caching fully disabled;

  7. No per-query optimizations permitted (no materialized views, pre-sorted materializations, or similar);

  8. Cluster configuration, DDL, and SQL set once and not changed during testing;

  9. Scenarios executed from JMeter with automatic logging;

  10. Results verified twice: by the client and by an independent auditor.

Cluster Specifications

Infrastructure: cloud IaaS, virtualization: Kubernetes.

Test Environment

Host

CPU

RAM

Disks

Network

1×FE (front end)

16

32 GB

2 × SSD 838 GB

10 Gbps

4×BE (backend)

40

200 GB

4 × SSD 3576 GB

10 Gbps

It is worth noting that the disk configuration (particularly the disk count) is far from optimal for an MPP platform, and a cloud environment with 10 Gbps networking always imposes certain constraints. That said, getting ahead of ourselves slightly — the results are impressive nonetheless.

JMeter Scenario Description

  1. A specified number of sessions is started; each session randomly selects a query of the appropriate type (OLTP/OLAP, read/write);

  2. Each query is assigned one or more random parameters (ID, date, etc.) that target an arbitrary portion of the table;

  3. The query executes;

  4. The cycle repeats continuously for 120 minutes.

Load Test Results

1. Classic OLTP Profile

The first question was: what is the maximum QPS for queries in the "classic operational OLTP" scenario? In other words, key-based access patterns like:

select * from TABLE  where id=<pk>
insert into TABLE values(...)
update TABLE set atr=<> where id =<pk>
delete from TABLE where id=<pk>

Reads and writes tested separately, 100 sessions each:

Concurrency

Read/Write

Avg. query time, s

QPS

100 sessions

Read

0.025

3,036

100 sessions

Write

0.332

573

Concurrent reads and writes to the same table, 100 sessions:

Concurrency

Read/Write

Avg. query time, s

QPS

100 sessions

Read — 50 sessions

0.152

302

Write — 50 sessions

0.587

85

A reminder: "100 sessions" means 100 open connections to StarRocks, each continuously firing arbitrary parameterized queries — picking up new queries as soon as the previous ones complete — for 120 minutes straight. All with caching disabled.

FrontEnd node load
FrontEnd node load
BackEnd node load
BackEnd node load
  • The bottleneck is the FE coordinator, which saturates its 16 available vCPUs;

  • BE node resource utilization (CPU, RAM, I/O) is low across the board.

In this experiment we hit the current coordinator configuration limit.

Key point: the StarRocks coordinator scales horizontally. This bottleneck can therefore be addressed through standard means without any architectural changes.

One additional feature worth highlighting here is prepared statements.

More detail

A Prepared Statement is a pre-compiled SQL query stored on the server that can be executed repeatedly with different parameters.

When a prepared statement is executed, the SQL parsing and CBO plan preparation stages are skipped, resulting in roughly 30% better performance compared to the same query run with varying parameter values each time.

In StarRocks, Prepared Statements are implemented via the MySQL Binary Protocol. This splits the query lifecycle into three stages:

  1. Prepare
    The client sends a SQL query template (e.g., SELECT * FROM users WHERE id = ?). The Frontend (FE) parses the query, performs basic validation, and caches the structure internally;

  2. Execute
    The client sends only the parameter values. The FE binds them to the cached plan and dispatches the query for execution;

  3. Close (Deallocate)
    The prepared statement is removed from FE memory.

Server-side vs. Client-side

This distinction matters. Many drivers (especially JDBC) emulate prepared statements on the client side by default — simply interpolating strings. To get real performance gains in StarRocks, you need Server-side Prepared Statements (e.g., via useServerPrepStmts=true in JDBC).

Benefits of Prepared Statements

Prepared statements reduce FE load by:

  • Skipping parse and plan stages
    For simple point-lookup queries, these stages can account for 30–50% of total processing time;

  • Binary Protocol
    Transmitting data in binary format is more efficient than parsing text, especially for large parameter sets or specialized types (Date, Decimal).

Prepared statements are also an effective defense against SQL injection, since parameters are transmitted separately from the query logic and are never interpreted as SQL.

StarRocks 3.x introduced a dedicated optimization for Prepared Statements targeting Point Select (primary-key lookups).

When a query matches the criteria (a simple PK lookup on a Primary Key model table), StarRocks takes a "short path":

  • The query bypasses the standard heavyweight optimizer;

  • The FE communicates directly with the BE node holding the relevant data.

This enables tens of thousands of QPS per node, bringing StarRocks performance close to that of key-value stores.

While API drivers are most commonly used in practice, at the SQL level this looks like:

PREPARE select_orders FROM
    'SELECT
        order_id,
        customer_id,
        amount,
        status
     FROM orders
     WHERE customer_id = ?
       AND order_date >= ?
       AND status = ?';

-- 2. Set parameters
SET @customer_id = 12345;
SET @order_date  = '2024-01-01';
SET @status      = 'completed';

-- 3. Execute
EXECUTE select_orders USING @customer_id, @order_date, @status;

A few practical considerations when adopting this feature:

  • The largest gains appear on Primary Key tables. On Duplicate or Aggregate models, the benefit is limited to saved CPU on parsing;

  • Unlike classical OLTP databases (PostgreSQL/Oracle), StarRocks caches the plan only within a session. If the connection drops, PREPARE must be re-executed;

  • For complex analytical queries with many joins and aggregations, saving 1–2 ms on parsing is negligible compared to seconds of actual query execution.

In the version tested, prepared statements in StarRocks were supported only for SELECT queries.

2. OLTP Profile — Real-World Case

The key difference from the previous scenario is that, beyond classical key-based OLTP queries, this scenario also includes queries that look like OLTP but are structurally more like OLAP — highly selective analytical queries.

For example:

select * from TABLE
where region_id = 6
and object_id = 12345
and timestamp('2025-06-30') between date_from and date_to

None of the filter attributes is a primary key, yet the query returns a single row. In the production Oracle system, this query runs in tens to hundreds of milliseconds — and only because of a composite index.

Results after 2 hours of continuous operation:

Concurrent sessions

Read/Write

Avg. query time, s

QPS

25 sessions

Read (70%)

Write (30%)

0.039

556

50 sessions

0.049

903

75 sessions

0.064

1,060

100 sessions

0.083

1,129

Breakdown by DML type:

Concurrency

Read/Write

Avg. query time, s

Avg. p95, s

QPS

100 sessions

Read

0.061

0.037

1,085

Update

0.636

0.290

6

Insert

0.592

0.250

33

Delete

0.789

0.420

5

DML execution times by type
DML execution times by type
FrontEnd node
FrontEnd node
BackEnd node
BackEnd node

In this scenario the cluster load profile shifted: the coordinator (FE) is running at roughly one-third capacity, while the BE nodes are at 100% CPU utilization.

This means the constraint is BE resources — and BE nodes scale horizontally.

Note: OLAP-style queries that look like OLTP (non-key-based) do not use the PK index and run longer, holding CPU for a greater period and reducing overall throughput.

How much does query performance differ between key-based access and non-key OLAP-like queries?

The table below shows average execution times for both types of queries — key-based SELECT and non-key SELECT — against the same tables from the same test run:

Example

Query type

Avg. execution time, ms

Table 1

Read by key

30.15

Non-key read

84.16

Table 2

Read by key

29.81

Non-key read

138.26

Table 3

Read by key

30.57

Non-key read

96.77

The latency increase on non-key reads is real, but not dramatic. How many systems do you know that can search a non-indexed, non-pre-sorted column at sub-200 ms response times?

This performance is delivered by ZoneMap indexes. StarRocks implements two-level ZoneMap indexes that are highly effective at eliminating data blocks from scans — dramatically reducing the volume read, which is why millisecond response times hold even for these patterns.

Reference: A ZoneMap index stores per-chunk statistics including Min, Max, HasNull, and HasNotNull. At query time, StarRocks uses this metadata to determine which chunks can be skipped entirely before reading any data.

More detail

A ZoneMap index stores per-chunk statistics — Min, Max, HasNull, and HasNotNull — for each data block. At query execution time, StarRocks uses this metadata to determine which blocks can be skipped based on filter predicates, reducing the volume of data scanned and improving query speed. Each block can represent a segment or a column data page, so there are two tiers of ZoneMap indexes: one stores statistics per segment, the other per column page. The developers followed best practices that have been established in the Parquet file format for over a decade. And just like Parquet, ZoneMap indexes are created automatically on write — no manual index creation or maintenance (unlike BRIN in Greenplum), and with significantly higher effectiveness.

How ZoneMap indexes work
How ZoneMap indexes work

3. Mixed Profile

This scenario tests how different query groups behave under a workload that closely resembles real production conditions. We run three groups of queries simultaneously: ETL, OLAP, and OLTP. Does sub-second response time hold?

The test mechanics are unchanged: all query types read from and write to the same tables simultaneously. ETL queries — which involve bulk write/update operations on target tables with analytical joins in the SELECT — are added to the mix:

insert into TARGET
select ……… from T1
join T2
join T3

For concurrent mixed workloads, the recommended design practice is to use resource group settings.

Resource group configuration used:

OLAP+ETL

ALTER RESOURCE GROUP olap_etl WITH (
    'concurrency_limit' = '24',
    'cpu_weight' = '40',
    'mem_limit' = '0.9',
    'spill_mem_limit_threshold' = '0.9'
);

OLTP

ALTER RESOURCE GROUP oltp WITH (
    'cpu_weight' = '10',
    'exclusive_cpu_cores' = '0',
    'mem_limit' = '0.1'
);

Concurrent sessions

Query type

Avg. query time, s

QPS

100 sessions

OLTP

0.122

150.92

OLAP

39.520

1.26

ETL

43.800

0.68

TOTAL

0.643

152.87

Resource group configuration can be changed dynamically:

ALTER RESOURCE GROUP olap_etl WITH (
    'concurrency_limit' = '24',
    'cpu_weight' = '20', -- <--- reduced from 40 to 20
    'mem_limit' = '0.9',
    'spill_mem_limit_threshold' = '0.9'
);

By reducing cpu_weight for the olap_etl group, CPU is reallocated to a 1/3 (OLTP) vs. 2/3 (OLAP+ETL) split.

Results after rebalancing:

Concurrent sessions

Query type

Avg. query time, s

QPS

100 sessions

OLTP

0.070

291.56

OLAP

44.120

0.91

ETL

47.520

0.69

TOTAL

0.320

293.17

FrontEnd node
FrontEnd node
BackEnd node
BackEnd node

BE node resource utilization: resource groups allow the cluster to approach the CPU ceiling in a controlled manner.

Execution times across all query types are stable and predictable.

Workload can be adjusted dynamically via resource groups — for example, applying different configurations during different time windows to match the expected workload profile. Sub-second OLTP response time is maintained throughout; the CPU budget allocated to the OLTP group directly determines its achievable QPS.

4. The Impact of Compaction

Nothing is perfect.

During extended continuous operation in our load test, StarRocks exhibits occasional periodic spikes in OLTP query latency — sometimes reaching several seconds, when the normal time is under 100 ms.

This behavior is not visible under OLAP-only load.

Compaction

QPS

OLTP

OLAP, s

Read, s

Modifications, s

P95, s

Max, s

Disabled

152

0.093

0.414

0.055

9.110

39.520

Every 5 minutes

219

0.061

0.342

0.049

5.440

47.050

Every 15 minutes

201

0.069

0.364

0.053

7.990

40.810

Default (1 minute)

86

0.140

4.538

0.098

20.220

55.100

The spikes are caused by the Compaction process (StarRocks storage format maintenance), which holds locks on tablet metadata during execution. Queries are blocked until compaction completes.

Compaction performance depends on disk subsystem throughput and the CPU/memory resources allocated to it.

Disabling Compaction on frequently mutated data is harmful to performance — version accumulation grows and degrades query latency over time. Compacting too frequently causes too many lock events. The default settings are completely unsuitable for production use.

For production environments, you need to select an optimal maintenance strategy that accounts for your hardware capabilities and the resources allocated to the compaction process.

Impact of Compaction frequency
Impact of Compaction frequency

Conclusions

  • OLTP query throughput exceeds 1,000 queries/s, with average response time under 100 ms;

  • StarRocks handles mixed workloads effectively, keeping OLTP responses under 1 second (122 ms average) even alongside concurrent OLAP and ETL activity;

  • In a properly sized production configuration, performance is expected to be significantly better.

Let's summarize the key takeaways.

Technical Considerations

  1. StarRocks can handle mixed workloads combining short read/write queries with OLAP queries typical of ETL transformations and analytical tasks;

  2. For high-frequency OLTP with minimal latency, use Primary Key table models;

  3. The FE coordinator can become a bottleneck under heavy query concurrency. Production sizing should include multiple FE nodes not just for high availability, but for load balancing;

  4. Prepared statements improve OLTP performance by roughly 30% through soft parsing of repeated query templates. Note that the plan cache is session-scoped (in the tested version, prepared statements are supported only for SELECT);

  5. Timely Compaction must be configured to match the expected OLTP write rate. The right frequency balances acceptable tail-latency deviation from your SLA target against the resources allocated to the process;

  6. In mixed workload scenarios, resource groups are mandatory;

  7. A StarRocks cluster can be dynamically reconfigured to meet performance requirements as load and data volumes change: cluster parameters controlling compute resources, concurrency, and resource group assignments do not require a cluster restart. This requires either implementing the orchestration logic yourself or using a complete platform like Alphyn Lakehouse, which provides dynamic scaling out of the box.

Migration Methodology

Every client running a complex analytical system naturally wants to move to new technology with minimal changes to application logic and services. The goal is always to preserve accumulated expertise and established development patterns, reduce operational risk, reuse existing integration tests and data quality frameworks, and so on.

This is where the real challenge begins. StarRocks performed excellently in our testing across concrete practical scenarios — but to fully cover every Oracle "we want it to work exactly like this" requirement, there is still work to be done.

The Alphyn.AI team, building the Alphyn Lakehouse platform, has developed additional StarRocks capabilities to fill these gaps, including:

  • Support for procedural code closely aligned with Oracle PL/SQL notation;

  • Local-disk materialization of compute-deduplication for CTE evaluation;

  • A mechanism for collecting and persisting extended query history from Alphyn Lakehouse StarRocks into a unified logging and audit system, including visual query plan rendering for AI agents;

  • Specialized high-speed direct import/export connectors for extending Alphyn Lakehouse StarRocks federation capabilities.

With these additions in place, our formula for a successful migration looks like this:

  • Migrate the data model as-is, accounting for type mapping without losing precision or semantics;

  • Pre-refactor cursor-based procedural logic on the Oracle side before migration (cursor-heavy patterns are a poor fit for analytical systems, and fortunately there are very few of them in practice);

  • Eliminate trigger-based processing logic (quick fixes that found a permanent home in a few places);

  • Only then migrate Oracle PL/SQL procedural code to Alphyn Lakehouse StarRocks LPSQL, with minimal changes to the procedural logic itself.

The Alphyn.AI engineering team is currently researching the effectiveness of new StarRocks capabilities developed in-house: covering btree indexes with additional included fields beyond the PK, and prepared statements for INSERT, UPDATE, and DELETE operations.


See it on your own data

If you're weighing how this would handle your workloads, we'd be glad to walk you through Alphyn Lakehouse on a real scenario. Book a sovereign-lakehouse walkthrough →


About Alphyn.AI

We build the Alphyn Lakehouse, a Kubernetes-native, high-performance, multi-engine lakehouse for any enterprise data and analytical workload — from agentic AI and BI to structured and unstructured data. Built entirely on open standards and an open architecture, Alphyn Lakehouse is a sovereign, on-premises solution for regulated enterprises across the GCC and the wider MENA region.

Learn more at alphyn.ai and follow us on LinkedIn.

starrocksoracleoltpolapmixed-workloadmigrationlakehouseload-testing

Get the latest posts in your inbox

Subscribe to our blog and get the latest posts delivered to your inbox.

By clicking "Subscribe" you agree to receive Alphyn communications. We respect your privacy.