Benchmarking MPP Systems and Engines: TPC-DS Results

Head-to-head TPC-DS benchmark at 10 TB scale: Impala vs Trino under single- and multi-stream concurrency, plus a comparison with GreenPlum.

by Alphyn.AI Team · 21 min read

This post continues our series on comparing massively parallel processing systems and engines. In a previous article I laid out the testing principles our team follows and shared results from both real-world production scenarios and synthetic benchmarks. That piece sparked discussion — some found the evidence convincing, others questioned whether the results were objective. As promised, here are the results of a benchmark run under the widely accepted TPC-DS standard. Today you'll find out whether switching methodologies changed anything.

Introduction

TPC-DS (Transaction Processing Performance Council – Decision Support) is the industry-standard benchmark for measuring the performance of Decision Support Systems (DSS). In plain terms, it tests how well a given system — a database or a big data platform — handles complex analytical queries under conditions that resemble real production workloads.

The standard was developed by the Transaction Processing Performance Council (TPC) to evaluate systems that process large data volumes and execute complex analytical tasks. TPC-DS models the operations of a retail company — sales, inventory, and marketing — and uses that model to generate a dataset and a query set.

Key characteristics of TPC-DS

Realistic workload: TPC-DS is considered to simulate an analytical workload close to real production use, making its results more meaningful for assessing how systems behave in practice.

Varied, complex queries: The benchmark includes 99 distinct analytical SQL queries that span a wide range of complexity — reporting, OLAP, and data mining. They exercise a system's ability to handle joins, aggregations, and filtering.

Scalable data volumes: TPC-DS can generate datasets ranging from a few gigabytes to hundreds of terabytes, making it possible to test systems at whatever scale factor appropriately matches the hardware under test.

Why TPC-DS matters

TPC-DS is regarded as an objective, standardized way to compare the performance of data analysis platforms. Many database and platform vendors publish TPC-DS results to demonstrate what their products can do. The variety of queries in the standard frequently exposes strengths and weaknesses not just of the system as a whole, but of individual subsystems such as the query optimizer and the cardinality estimator. This is why many vendors also use TPC-DS internally when working on optimizer strategies or other engine changes. The wider the adoption of a benchmark, the more credible its results become.

Limitations of the benchmark

TPC-DS models a retail business — sales, inventory, and marketing. Some organizations, particularly in finance, consider it inapplicable to their context and develop their own approaches. That was precisely the situation I described in the previous article.

Another limitation, in my view, is that the standard is weighted toward BI, ROLAP, ad hoc, and light-to-medium ETL scenarios (relative to the scale factor's data volume). It is less representative of heavy ETL workloads, which are characterized not only by query complexity but also by the materialization of results.

Test environment

  • Tests were run on cloud IaaS infrastructure

  • Managed services used:

    • S3 Storage

    • Managed Kubernetes, where the lakehouse platform's compute engines were deployed

  • TPC-DS scale-factor 10,000 (~10 TB uncompressed)

  • Data generation performed via Spark

  • File format: Parquet, ZSTD compression (ratio ~3), table format: Iceberg

  • Compressed data volume: ~2 TB

  • Data was partitioned

  • Impala engine version: 4.5

  • Trino engine version: 459

All queries were run as-is from the benchmark specification without any modification to query text. Both processing engines were tuned for maximum performance and maximum utilization of all available hardware resources. Between iterations the systems were restarted to clear local caches. Each engine collected and analyzed statistics independently.

Compute resources were allocated according to the principles established in the first article: the total RAM of the compute cluster must be significantly less than the dataset size, since that configuration reflects real production conditions. It bears repeating: never trust benchmarks where the dataset size is comparable to or smaller than available RAM.

Worker node configuration: 32 vCores, 256 GB RAM, local 100 GB SSD for spill and cache operations. Four worker nodes in total: 128 vCores, 1,024 GB RAM. Trino also had a dedicated coordinator node, but we did not count its resources. Impala operates perfectly well without a dedicated coordinator up to a certain load level — one we did not approach in this test.
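As a quick sanity check of the sizing rule, the sketch below totals the worker resources and compares cluster RAM with the dataset volumes quoted above. All figures are taken from this article; the variable names are illustrative only.

```python
# Worker sizing from the article: four identical worker nodes.
WORKERS = 4
VCORES_PER_WORKER = 32
RAM_GB_PER_WORKER = 256

# Approximate dataset volumes from the article.
RAW_DATASET_GB = 10_000        # ~10 TB uncompressed
COMPRESSED_DATASET_GB = 2_000  # ~2 TB as Parquet with ZSTD

total_vcores = WORKERS * VCORES_PER_WORKER
total_ram_gb = WORKERS * RAM_GB_PER_WORKER

# How many times larger the dataset is than the cluster's total RAM.
raw_to_ram = RAW_DATASET_GB / total_ram_gb
compressed_to_ram = COMPRESSED_DATASET_GB / total_ram_gb

print(f"Cluster: {total_vcores} vCores, {total_ram_gb} GB RAM")
print(f"Raw data is {raw_to_ram:.1f}x cluster RAM")
print(f"Compressed data is {compressed_to_ram:.1f}x cluster RAM")
```

Even in compressed form the dataset is about twice the size of the cluster's memory, so the engines cannot simply hold the working set in RAM.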

Iteration 1

The first iteration ran in single-stream mode: all 99 queries executed sequentially. Results from this mode cannot be used to assess real production performance, but they are useful as an initial calibration pass and for smoothing out configuration rough edges.

The full table with all queries in order appears in the appendix at the end of the article. For easier analysis we split the queries into three groups by Impala execution time: "Simple" (up to 10 seconds), "Medium" (11–100 seconds), and "Heavy" (over 100 seconds).
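The grouping can be expressed as a small helper. This is a sketch only: the boundaries are inferred from the tables that follow, where queries that take exactly 10 seconds in Impala are listed as "Simple", and the function name `complexity_group` is hypothetical.

```python
def complexity_group(impala_seconds: float) -> str:
    """Return the complexity group for an Impala execution time in seconds."""
    if impala_seconds <= 10:
        return "Simple"
    if impala_seconds <= 100:
        return "Medium"
    return "Heavy"


# A few single-stream Impala timings taken from the tables in this article.
impala_times = {"query12": 1, "query86": 10, "query37": 11, "query74": 93, "query93": 102}

groups = {name: complexity_group(seconds) for name, seconds in impala_times.items()}
print(groups)
```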

Table. Execution time — "Simple" queries

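| Query | Impala, sec | Trino, sec |
|---|---|---|
| query12 | 1 | 7 |
| query20 | 1 | 9 |
| query21 | 1 | 2 |
| query41 | 1 | 1 |
| query52 | 1 | 7 |
| query42 | 2 | 6 |
| query55 | 2 | 9 |
| query56 | 2 | 6 |
| query73 | 2 | 6 |
| query83 | 2 | 5 |
| query92 | 2 | 2 |
| query32 | 3 | 4 |
| query40 | 3 | 30 |
| query77 | 3 | 8 |
| query10 | 4 | 7 |
| query58 | 4 | 9 |
| query61 | 4 | 11 |
| query69 | 4 | 7 |
| query8 | 5 | 12 |
| query19 | 6 | 10 |
| query53 | 6 | 14 |
| query68 | 6 | 14 |
| query3 | 7 | 63 |
| query33 | 7 | 7 |
| query89 | 7 | 18 |
| query98 | 7 | 17 |
| query26 | 8 | 21 |
| query30 | 8 | 17 |
| query43 | 8 | 14 |
| query5 | 9 | 92 |
| query60 | 9 | 12 |
| query63 | 9 | 15 |
| query84 | 9 | 12 |
| query90 | 9 | 13 |
| query1 | 10 | 15 |
| query39 | 10 | 12 |
| query49 | 10 | 37 |
| query86 | 10 | 23 |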

Chart. Comparative execution time for "simple" queries. Lower is better.

Table. "Medium complexity" queries

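| Query | Impala, sec | Trino, sec |
|---|---|---|
| query37 | 11 | 19 |
| query62 | 11 | 31 |
| query66 | 11 | 18 |
| query48 | 12 | 87 |
| query6 | 13 | 10 |
| query15 | 13 | 9 |
| query46 | 13 | 18 |
| query79 | 13 | 25 |
| query25 | 14 | 36 |
| query45 | 14 | 10 |
| query2 | 15 | 57 |
| query80 | 16 | 131 |
| query34 | 17 | 18 |
| query59 | 17 | 108 |
| query13 | 18 | 121 |
| query36 | 18 | 31 |
| query22 | 19 | 27 |
| query7 | 20 | 31 |
| query29 | 20 | 69 |
| query82 | 20 | 37 |
| query17 | 21 | 78 |
| query18 | 21 | 26 |
| query35 | 21 | 18 |
| query71 | 21 | 23 |
| query85 | 21 | 70 |
| query94 | 21 | 30 |
| query81 | 22 | 14 |
| query31 | 26 | 39 |
| query51 | 26 | 30 |
| query70 | 27 | 83 |
| query95 | 30 | 170 |
| query91 | 31 | 4 |
| query27 | 32 | 24 |
| query54 | 32 | 48 |
| query96 | 38 | 15 |
| query99 | 45 | 64 |
| query50 | 47 | 367 |
| query16 | 49 | 69 |
| query38 | 49 | 109 |
| query87 | 50 | 104 |
| query76 | 54 | 124 |
| query44 | 74 | 213 |
| query57 | 75 | 190 |
| query97 | 88 | 147 |
| query74 | 93 | 194 |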

Chart. Comparative execution time for "medium complexity" queries. Lower is better.

Table. "Heavy" queries.

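| Query | Impala, sec | Trino, sec |
|---|---|---|
| query93 | 102 | 487 |
| query65 | 121 | 137 |
| query28 | 126 | 350 |
| query88 | 126 | 112 |
| query11 | 144 | 357 |
| query47 | 156 | 368 |
| query9 | 170 | 325 |
| query24 | 194 | 497 |
| query75 | 219 | 310 |
| query4 | 262 | 798 |
| query64 | 310 | 203 |
| query72 | 333 | 65 |
| query14 | 416 | 2778 |
| query78 | 710 | 573 |
| query23 | 1007 | 3436 |
| query67 | 1324 | 878 |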

Chart. Comparative execution time for "heavy" queries. Lower is better.
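The heavy group is not a clean sweep for either engine. A short sketch over the "Heavy" table values shows which queries Trino finished faster and where the gap was largest in each direction. The data is copied from the table; the variable names are illustrative.

```python
# (query, Impala seconds, Trino seconds) for the "Heavy" group, single stream.
HEAVY = [
    ("query93", 102, 487), ("query65", 121, 137), ("query28", 126, 350),
    ("query88", 126, 112), ("query11", 144, 357), ("query47", 156, 368),
    ("query9", 170, 325), ("query24", 194, 497), ("query75", 219, 310),
    ("query4", 262, 798), ("query64", 310, 203), ("query72", 333, 65),
    ("query14", 416, 2778), ("query78", 710, 573), ("query23", 1007, 3436),
    ("query67", 1324, 878),
]

# Queries where Trino finished sooner than Impala.
trino_faster = [name for name, impala, trino in HEAVY if trino < impala]

# Total time each engine spent on the heavy group.
impala_total = sum(impala for _, impala, _ in HEAVY)
trino_total = sum(trino for _, _, trino in HEAVY)

# Trino-to-Impala time ratio per query: above 1 favours Impala, below 1 favours Trino.
ratios = {name: trino / impala for name, impala, trino in HEAVY}
largest_impala_lead = max(ratios, key=ratios.get)
largest_trino_lead = min(ratios, key=ratios.get)

print("Trino faster on:", trino_faster)
print(f"Heavy-group totals: Impala {impala_total} s, Trino {trino_total} s")
print("Largest Impala lead:", largest_impala_lead)
print("Largest Trino lead:", largest_trino_lead)
```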

Table. Overall efficiency — single stream

| Metric | Impala | Trino |
|---|---|---|
| Total test duration | ~2 hours 1 min | ~4 hours 17 min |
| Compute cost | $10 | $22 |

Compute cost was calculated using the cloud provider's pricing calculator: monthly cost of four compute nodes / 30 days / 24 hours / 3,600 sec × test duration in seconds. The dedicated Trino coordinator was excluded from the cost calculation.
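The proration can be written as a one-line function. In the sketch below the monthly price is a hypothetical placeholder, since the article does not state the provider's actual rate; only the durations come from the table above.

```python
def compute_cost(monthly_cluster_cost: float, duration_seconds: float) -> float:
    """Prorate a monthly cluster price to the duration of one test run."""
    seconds_per_month = 30 * 24 * 3600  # the article assumes a 30-day month
    return monthly_cluster_cost / seconds_per_month * duration_seconds


# Hypothetical monthly price for the four worker nodes; NOT a figure from the article.
MONTHLY_COST_USD = 3_600.0

impala_seconds = 2 * 3600 + 1 * 60   # ~2 hours 1 min
trino_seconds = 4 * 3600 + 17 * 60   # ~4 hours 17 min

print(f"Impala: ${compute_cost(MONTHLY_COST_USD, impala_seconds):.2f}")
print(f"Trino:  ${compute_cost(MONTHLY_COST_USD, trino_seconds):.2f}")
```

Because the same hourly rate applies to both engines, the cost ratio equals the ratio of elapsed times regardless of the price used.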

Chart. Total test duration — all queries. Lower is better.

Iteration 2

Results in hand — can we draw conclusions? Not yet. Staying true to our own principles, we only make decisions based on scenarios that reflect production conditions — concurrent load. For the second iteration we ran TPC-DS with two streams executing simultaneously. Stream composition and query ordering followed the benchmark specification. The key idea is that queries across streams are arranged so they simulate users running different queries against different data, rather than executing the same query in parallel across multiple sessions.

Total test duration was measured as the elapsed time from the start of the run to the completion of the last query across both streams. Since both engines operated under concurrent load, we ran multiple tuning iterations to find resource queue and cluster settings that minimized total elapsed time with zero failures across all 198 queries. Any run in which even a single query failed was discarded, and configuration tuning continued. For Trino this involved iterative tuning of retry_policy, query_max_total_memory, query_max_memory_per_node, and query_max_memory; for Impala the relevant settings were mem_limit and mt_dop. Compute cluster resources remained unchanged from the single-stream tests.
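The measurement and pass/fail rule can be sketched as a small harness. This is not the tooling used for the tests in this article: `run_query` is a hypothetical callable standing in for a real engine client, and the toy streams below do not follow the query permutations defined by the benchmark specification.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def run_stream(queries: Sequence[str], run_query: Callable[[str], None]) -> int:
    """Run one stream's queries sequentially and return the number of failures."""
    failures = 0
    for name in queries:
        try:
            run_query(name)
        except Exception:
            failures += 1
    return failures


def run_benchmark(streams: Sequence[Sequence[str]],
                  run_query: Callable[[str], None]) -> tuple[float, bool]:
    """Run all streams concurrently and return (elapsed_seconds, passed).

    Elapsed time spans from the start of the run to the completion of the
    last query in any stream. The run passes only if no query failed.
    """
    started = time.monotonic()
    with ThreadPoolExecutor(max_workers=len(streams)) as pool:
        failures = list(pool.map(lambda qs: run_stream(qs, run_query), streams))
    elapsed = time.monotonic() - started
    return elapsed, sum(failures) == 0


if __name__ == "__main__":
    # Stand-in for a real engine client: it only sleeps briefly.
    def fake_run_query(name: str) -> None:
        time.sleep(0.01)

    # Two toy streams with different query orders.
    toy_streams = [["query1", "query2", "query3"], ["query3", "query1", "query2"]]
    elapsed, passed = run_benchmark(toy_streams, fake_run_query)
    print(f"elapsed={elapsed:.2f}s passed={passed}")
```

A run that returns `passed == False` would be discarded and the configuration tuned further, as described above.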

Table. Overall efficiency — 2 streams

| Metric | Impala | Trino |
|---|---|---|
| Total test duration | ~4 hours 1 min | ~11 hours 43 min |
| Compute cost | $20 | $59 |

As load increased, the two engines diverged. With twice as many queries, Impala's total elapsed time grew by approximately 2x relative to the single-stream run, which is linear scaling; Trino's grew by 2.7x.

Iteration 3

We continued increasing load and ran the benchmark with 4 concurrent streams on the same hardware. The system received 396 queries as input. The same pass/fail criteria applied — the test was considered successful only if all 396 queries completed without errors or crashes.

Table. Overall efficiency — 4 streams

| Metric | Impala | Trino |
|---|---|---|
| Total test duration | ~8 hours 11 min | ~34 hours 2 min |
| Compute cost | $42 | $173 |

Impala's elapsed time again grew in proportion to the load, about 4x relative to the single-stream run; Trino's grew by roughly 8x.
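The scaling factors follow directly from the total durations in the three efficiency tables. A minimal sketch of the arithmetic, with illustrative names:

```python
def minutes(hours: int, mins: int) -> int:
    """Convert an hours-and-minutes duration to whole minutes."""
    return hours * 60 + mins


# Total elapsed time per engine, keyed by number of concurrent streams.
ELAPSED_MIN = {
    "Impala": {1: minutes(2, 1), 2: minutes(4, 1), 4: minutes(8, 11)},
    "Trino": {1: minutes(4, 17), 2: minutes(11, 43), 4: minutes(34, 2)},
}

for engine, runs in ELAPSED_MIN.items():
    baseline = runs[1]
    for streams, elapsed in runs.items():
        slowdown = elapsed / baseline
        # 1.0 here means elapsed time grew exactly in proportion to the query count.
        vs_linear = slowdown / streams
        print(f"{engine:6s} {streams} stream(s): {slowdown:.2f}x elapsed, "
              f"{vs_linear:.2f}x relative to linear scaling")
```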

Chart. Total duration under concurrent load. Lower is better.

Now imagine this cluster runs this workload on a scheduled daily basis for a full year — 365 runs, one per day. What does the accumulated compute cost look like?

| Metric | Impala | Trino |
|---|---|---|
| Annual compute cost, USD | $15,155 | $63,010 |

The difference: ~$48,000 — on 10 TB of raw data across four compute nodes. Now extrapolate that to 100 TB and 40 nodes.
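The annual figure is simply the per-run cost multiplied by 365. The sketch below uses the whole-dollar per-run costs from the 4-stream table, so its totals differ slightly from the annual figures quoted above; the assumption here is that those were derived from unrounded per-run costs.

```python
RUNS_PER_YEAR = 365

# Rounded per-run compute cost of the 4-stream test, in USD (from the table).
COST_PER_RUN_USD = {"Impala": 42, "Trino": 173}

annual = {engine: cost * RUNS_PER_YEAR for engine, cost in COST_PER_RUN_USD.items()}
difference = annual["Trino"] - annual["Impala"]

for engine, cost in annual.items():
    print(f"{engine}: ${cost:,} per year")
print(f"Difference: ${difference:,} per year")
```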

What about GreenPlum?

Having covered the lakehouse engine comparison, it was time to bring GreenPlum into the picture. To do that we had to rebuild the test environment: GreenPlum is a traditional shared-nothing MPP system that runs on dedicated local disks rather than on top of object storage. For a fair comparison, all lakehouse services also had to run without relying on cloud-managed services — making the setup more representative of an on-premises installation.

Table. GreenPlum hardware

| Node type | vCPU per node | RAM per node, GB | Disk per node |
|---|---|---|---|
| Master × 1 | 16 | 64 | 1 SSD × 1.5 TB |
| Segment × 4 | 32 | 256 | 4 SSD × 2 TB |
| Total | 144 | 1,088 | 16 SSD storage layer |

A GreenPlum standby master was not needed for load testing — it contributes nothing to computation and only adds cost.

Table. Alphyn Lakehouse hardware

| Node type | vCPU per node | RAM per node, GB | Disk per node |
|---|---|---|---|
| MinIO host × 4 | 8 | 24 | 4 × 2 TB |
| Worker host × 4 | 32 | 256 | 1 × 100 GB |
| Total | 152 | 1,120 | 16 SSD storage layer |

Sizing followed a principle of equal compute resources and equal disk subsystems, even though the two system topologies differ fundamentally.

Alphyn Lakehouse was deployed on cloud IaaS infrastructure using a decoupled architecture: an isolated S3 cluster based on MinIO and an isolated compute cluster running Impala 4.4.1. The critical constraint was that the number and type of disks in the storage layer had to match exactly, using the highest-performance disks available from the cloud provider. GreenPlum was deployed on cloud VMs with equally high-performance storage.

Test parameters:

  • TPC-DS scale-factor 10,000 (~10 TB uncompressed)

  • GreenPlum tuned for maximum performance:

    • Optimal physical data model including partitioning, compression, storage format selection, and so on

    • Resource management configured for concurrent workloads

  • For Impala, data remained in Parquet + Iceberg with ZSTD compression, same as in previous tests

Pass/fail criteria were unchanged: in single-stream mode all 99 queries must complete successfully; in 4-stream mode all 396 queries must complete without errors or crashes.

Table. Benchmark results.

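| System | Streams | Elapsed time | Compute cost, USD |
|---|---|---|---|
| Alphyn Lakehouse (Impala 4.4.1 + S3 MinIO) | 1 | ~2 hours 8 min | $26 |
| Alphyn Lakehouse (Impala 4.4.1 + S3 MinIO) | 4 | ~8 hours 48 min | $109 |
| OSS GreenPlum 6.27.1 | 1 | ~13 hours 20 min | $155 |
| OSS GreenPlum 6.27.1 | 4 | ~53 hours 20 min | $621 |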

Compute cost was calculated using the same methodology: monthly IaaS node rental cost (public price list) / 30 days / 24 hours / 3,600 sec × elapsed time in seconds.

Chart. Comparison of Alphyn Lakehouse and OSS GreenPlum. Lower is better.
Fig. GreenPlum resource utilization graphs. TPC-DS 4 streams.

Once again the data confirms what we see in practice: Alphyn Lakehouse, even with physically separated storage and compute, delivers roughly 6x better cost efficiency than GreenPlum in both the single-stream and the 4-stream runs. Translating performance into cost reveals the difference in total cost of ownership from a capital expenditure standpoint. In reality the gap is even wider, since maintenance and licensing costs are derived from hardware sizing, and a GreenPlum cluster also occupies significantly more datacenter floor space and draws substantially more power. MPP systems that rely on full table scans have been obsolete for a decade.
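The size of the gap can be read straight from the results table. The sketch below computes the GreenPlum-to-Lakehouse ratio for both elapsed time and compute cost; the values are copied from the table and the names are illustrative.

```python
def to_minutes(hours: int, mins: int) -> int:
    """Convert an hours-and-minutes duration to whole minutes."""
    return hours * 60 + mins


# (elapsed minutes, compute cost in USD) per system and stream count.
RESULTS = {
    "Alphyn Lakehouse": {1: (to_minutes(2, 8), 26), 4: (to_minutes(8, 48), 109)},
    "GreenPlum": {1: (to_minutes(13, 20), 155), 4: (to_minutes(53, 20), 621)},
}

for streams in (1, 4):
    lake_min, lake_usd = RESULTS["Alphyn Lakehouse"][streams]
    gp_min, gp_usd = RESULTS["GreenPlum"][streams]
    print(f"{streams} stream(s): elapsed ratio {gp_min / lake_min:.2f}x, "
          f"cost ratio {gp_usd / lake_usd:.2f}x")
```

The elapsed-time ratios come out slightly above 6x, and the ratios of the rounded dollar costs slightly below it.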

Now let's apply the same methodology to annual compute cost. Assuming the cluster runs this workload on a scheduled basis for 12 months — 365 runs:

| Metric | Alphyn Lakehouse / Impala | GreenPlum 6 |
|---|---|---|
| Annual compute cost, USD | $39,737 | $226,488 |

Difference: ~$187,000.

The gap is significant. Put differently: on just 10 TB of data across a four-node cluster, an organization can redirect ~$187K from hardware budgets toward the data team — and end up with more functionality delivered on time.

What other benchmark standards exist — and are they worth using?

What other open benchmarks can be used for concurrent load testing beyond TPC-DS? One option is TPC-H. However, despite also targeting analytical systems, TPC-H has several weaknesses compared to TPC-DS:

  • Simpler data model:

    • Uses a simpler, normalized schema with only 8 tables, compared with the 24-table, multi-fact snowflake schema used in TPC-DS

  • Limited query set:

    • Contains only 22 queries which, while analytical, are less complex and less varied than TPC-DS

  • Lower optimizer demand:

    • TPC-H is considered less demanding on the query optimizer due to the relative simplicity of its queries, so it may not fully expose the capabilities of sophisticated query optimizers in modern MPP systems

Recently I have seen ClickBench used increasingly often for comparative engine testing. This is a performance benchmark developed by the ClickHouse team using a real dataset and relatively simple queries characteristic of ClickHouse's target use case.

ClickBench is excellent for evaluating ClickHouse performance in the scenarios it was designed for: fast filtered aggregation over a single table. But ClickBench should not be used to select a primary platform engine. None of its 43 SQL queries contains a JOIN.

Upcoming testing plans

We are currently benchmarking StarRocks, another processing engine that is part of the Alphyn Lakehouse platform. We also plan to benchmark Spark 4 versus Spark 3.5 and compare their performance against other MPP SQL engines in ELT pipeline workloads.

We are also developing a methodology for objectively load-testing engines on "fast read access to a materialized data mart" scenarios. If you have ideas, we'd love to hear them.

A note to readers and the community

Our team is committed to objective, comparative technology testing — often an internal competitive process rather than a comparison against external market offerings. Our goal is to give customers a system that meets their performance and functionality expectations, and to calculate hardware sizing accurately, since it directly determines total cost of ownership. When choosing between technologies, users and platform owners should be guided not only by what feels convenient, but by a clear understanding of what that convenience will cost.

If you have doubts about any of the results presented here, reproduce them yourself. Share your observations, invite discussion, or reach out about joint testing. Thank you for reading.


Appendix

Table: TPC-DS query execution times. Single stream. In benchmark query order.

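| Query | Impala, sec | Trino, sec | Query | Impala, sec | Trino, sec |
|---|---|---|---|---|---|
| query1 | 10 | 15 | query51 | 26 | 30 |
| query2 | 15 | 57 | query52 | 1 | 7 |
| query3 | 7 | 63 | query53 | 6 | 14 |
| query4 | 262 | 798 | query54 | 32 | 48 |
| query5 | 9 | 92 | query55 | 2 | 9 |
| query6 | 13 | 10 | query56 | 2 | 6 |
| query7 | 20 | 31 | query57 | 75 | 190 |
| query8 | 5 | 12 | query58 | 4 | 9 |
| query9 | 170 | 325 | query59 | 17 | 108 |
| query10 | 4 | 7 | query60 | 9 | 12 |
| query11 | 144 | 357 | query61 | 4 | 11 |
| query12 | 1 | 7 | query62 | 11 | 31 |
| query13 | 18 | 121 | query63 | 9 | 15 |
| query14 | 416 | 2778 | query64 | 310 | 203 |
| query15 | 13 | 9 | query65 | 121 | 137 |
| query16 | 49 | 69 | query66 | 11 | 18 |
| query17 | 21 | 78 | query67 | 1324 | 878 |
| query18 | 21 | 26 | query68 | 6 | 14 |
| query19 | 6 | 10 | query69 | 4 | 7 |
| query20 | 1 | 9 | query70 | 27 | 83 |
| query21 | 1 | 2 | query71 | 21 | 23 |
| query22 | 19 | 27 | query72 | 333 | 65 |
| query23 | 1007 | 3436 | query73 | 2 | 6 |
| query24 | 194 | 497 | query74 | 93 | 194 |
| query25 | 14 | 36 | query75 | 219 | 310 |
| query26 | 8 | 21 | query76 | 54 | 124 |
| query27 | 32 | 24 | query77 | 3 | 8 |
| query28 | 126 | 350 | query78 | 710 | 573 |
| query29 | 20 | 69 | query79 | 13 | 25 |
| query30 | 8 | 17 | query80 | 16 | 131 |
| query31 | 26 | 39 | query81 | 22 | 14 |
| query32 | 3 | 4 | query82 | 20 | 37 |
| query33 | 7 | 7 | query83 | 2 | 5 |
| query34 | 17 | 18 | query84 | 9 | 12 |
| query35 | 21 | 18 | query85 | 21 | 70 |
| query36 | 18 | 31 | query86 | 10 | 23 |
| query37 | 11 | 19 | query87 | 50 | 104 |
| query38 | 49 | 109 | query88 | 126 | 112 |
| query39 | 10 | 12 | query89 | 7 | 18 |
| query40 | 3 | 30 | query90 | 9 | 13 |
| query41 | 1 | 1 | query91 | 31 | 4 |
| query42 | 2 | 6 | query92 | 2 | 2 |
| query43 | 8 | 14 | query93 | 102 | 487 |
| query44 | 74 | 213 | query94 | 21 | 30 |
| query45 | 14 | 10 | query95 | 30 | 170 |
| query46 | 13 | 18 | query96 | 38 | 15 |
| query47 | 156 | 368 | query97 | 88 | 147 |
| query48 | 12 | 87 | query98 | 7 | 17 |
| query49 | 10 | 37 | query99 | 45 | 64 |
| query50 | 47 | 367 | - | - | - |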



About Alphyn.AI

We build the Alphyn Lakehouse, a Kubernetes-native, high-performance, multi-engine lakehouse for any enterprise data and analytical workload — from agentic AI and BI to structured and unstructured data. Built entirely on open standards and an open architecture, Alphyn Lakehouse is a sovereign, on-premises solution for regulated enterprises across the GCC and the wider MENA region.

Learn more at alphyn.ai and follow us on LinkedIn.
