How We Improved TPC-C Performance by 50% and TPC-H Performance by 100%

TPC-C performance improved by 50%

TiDB 3.0.13 vs. 4.0 for TPC-C benchmarks (Higher is better)

Optimizing pessimistic transactions

Locking process before optimization
Locking process after optimization

Optimizing Raftstore’s write model

Optimizing RocksDB’s write throughput

TPC-H performance improved by 100%

  • TiDB 3.0.13, which read data only from TiKV
  • TiDB 4.0, which read data only from TiKV
  • TiDB 4.0, which automatically read data from either TiKV or TiFlash, choosing between them through cost-based query optimization
TiDB 3.0.13 vs. 4.0 for TPC-H benchmarks (Lower is better)

TiDB’s Chunk structure

  • Fixed-length columns, in which the data has a specified length that cannot be changed.
  • Variable-length columns, in which the data length can change.
TiDB’s Chunk structure
New vector access interface
  • For fixed-length data, such as int64 numbers, Int64s() []int64 uses the Golang unsafe package to directly reinterpret Column.data as []int64 and returns the result. Users who want to read or modify Column.data can manipulate the slice directly. This is the most efficient way to access fixed-length data.
  • For variable-length data, such as strings, we can only use GetString(rowIdx int) string to obtain the data in the corresponding row, and can only update the column by appending. Randomly modifying an element in a variable-length column would require moving all the subsequent data, which creates heavy overhead, so Column does not implement this operation. (A minimal sketch of both access paths follows this list.)
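
The layout described above maps naturally onto Go. Below is a minimal, hypothetical sketch of a Column exposing both access paths; it is heavily simplified from TiDB's real util/chunk package, which also tracks null bitmaps and other metadata:

```go
package chunk

import "unsafe"

// Column stores one column of a Chunk in a contiguous byte buffer.
type Column struct {
	data    []byte  // values packed back to back
	offsets []int64 // variable-length columns: element i spans offsets[i]..offsets[i+1]
}

// NewVarLenColumn returns an empty variable-length column; offsets always
// starts with a single 0 so the boundary invariant holds.
func NewVarLenColumn() *Column {
	return &Column{offsets: []int64{0}}
}

// Int64s reinterprets a fixed-length buffer as []int64 without copying.
// Callers can read or modify elements of the returned slice in place.
func (c *Column) Int64s() []int64 {
	if len(c.data) == 0 {
		return nil
	}
	return unsafe.Slice((*int64)(unsafe.Pointer(&c.data[0])), len(c.data)/8)
}

// GetString returns the rowIdx-th element of a variable-length column.
func (c *Column) GetString(rowIdx int) string {
	return string(c.data[c.offsets[rowIdx]:c.offsets[rowIdx+1]])
}

// AppendString appends a new element. Appending only grows the buffer, so
// no existing element ever moves; this is why Column supports appends but
// not random in-place updates for variable-length data.
func (c *Column) AppendString(s string) {
	c.data = append(c.data, s...)
	c.offsets = append(c.offsets, int64(len(c.data)))
}
```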

Why Chunk RPC?

The old encoding format
  • Under the existing execution framework, decoding each record requires multiple function calls. When the data size is large, the overhead of function calls is high.
  • Decoding some types of data requires a lot of calculation. For example, decoding variable-length integers is more complicated than decoding plain integers, and when decoding decimals we need to compute information like precision to restore the entire structure. This consumes more CPU while only slightly reducing memory usage and network transmission.
  • In the decoding process, we need to construct a large number of objects and allocate a large amount of memory.
High CPU overhead
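
To make these costs concrete, here is a hypothetical sketch (not TiDB's actual decoder; the function name and error value are illustrative) of record-at-a-time varint decoding in Go. Every value pays for a function call, and the results are copied into freshly allocated memory:

```go
package codec

import (
	"encoding/binary"
	"errors"
)

var errDecode = errors.New("malformed varint")

// decodeRowOldStyle illustrates record-at-a-time decoding: one function
// call and bounds check per value, plus allocations for the result slice.
// TiDB 3.0's real decoder is more involved, but pays the same kinds of
// costs for every row it processes.
func decodeRowOldStyle(row []byte) ([]uint64, error) {
	var vals []uint64
	for len(row) > 0 {
		v, n := binary.Uvarint(row) // one call per value
		if n <= 0 {
			return nil, errDecode
		}
		vals = append(vals, v) // may reallocate and copy as it grows
		row = row[n:]
	}
	return vals, nil
}
```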

Chunk RPC in TiDB 4.0

The new encoding format
  • A single function call can decode an entire column of data. This greatly reduces the function call overhead.
  • When TiKV encodes data of DECIMAL, TIME, JSON, and other types, it retains the data's structure. When TiDB decodes the data, it obtains complete data objects without having to do extra calculations.
  • Because this format is similar to Chunk, we can point a Go slice's internal data pointer directly at the []byte buffer being decoded and reuse that memory. This saves memory space and significantly reduces decoding overhead.
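
By contrast, decoding a fixed-length column under the new format can be a single, near-free operation. The sketch below is a simplified illustration (the function name is hypothetical, and it assumes the buffer holds int64 values laid out exactly as the receiving host expects, which is how Chunk stores them):

```go
package codec

import "unsafe"

// decodeInt64Column sketches Chunk-style decoding for a fixed-length
// column: one call handles the whole column, and instead of copying it
// points a Go slice directly at the []byte buffer received over the wire.
// Assumes the sender and receiver agree on the in-memory int64 layout.
func decodeInt64Column(buf []byte) []int64 {
	if len(buf) == 0 {
		return nil
	}
	// Reinterpret the raw bytes in place: no per-value function calls,
	// no object construction, and no extra memory allocation.
	return unsafe.Slice((*int64)(unsafe.Pointer(&buf[0])), len(buf)/8)
}
```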

Chunk RPC does not impact OLTP capabilities

Default encoding vs. Chunk RPC

Conclusion

With these optimizations, TiDB 4.0 improves TPC-C performance by about 50% over 3.0.13 through faster pessimistic transactions and a better Raftstore and RocksDB write path, and improves TPC-H performance by about 100% through cost-based routing to TiFlash and the new Chunk RPC encoding.

PingCAP is the team behind TiDB, an open-source, MySQL-compatible NewSQL database. Official website: https://pingcap.com/ GitHub: https://github.com/pingcap
