Reducing Real-Time Query Latency from 0.5 s to 0.01 s with a Scale-Out HTAP Database

Industry: Automobile

Authors:

  • Xianqi Jin (DBA at Autohome)
  • Fan Zhang (R&D at Autohome)
  • Technical Architecture Team of Autohome Technical College

Autohome is the leading online destination for automobile consumers in China. It’s one of the most visited auto websites in the world. We provide professionally produced and user-generated content, a comprehensive automobile library, and extensive automobile listing information, covering the entire car purchase and ownership cycle.

The Autohome community forum is one of our oldest applications, with 100 million+ daily visits and 1 billion+ daily interface calls. As our data size rapidly grew, SQL Server became our database bottleneck. Sharding didn’t meet our app requirements, and scaling database capacity affected apps. We looked for a new database solution.

After we compared TiDB, Apache Ignite, and CockroachDB, we chose TiDB, an open-source, distributed, Hybrid Transactional/Analytical Processing (HTAP) database. Thanks to its horizontal scalability, now we can easily scale out our database, and our transactional 99th percentile latency is within 12 ms.

In 2020, TiDB also served as the underlying storage for our big sales promotion with hundreds of thousands of Queries Per Second (QPS). Our real-time query latency was reduced from 0.5 s to 0.01 s. TiDB’s 99th percentile latency was within 20 ms.

This post introduces how TiDB helps us achieve database scaling and real-time analytics, and covers:

  • Our forum reply app’s pain points
  • Why we chose TiDB over Ignite and CockroachDB
  • How we use TiDB in the forum reply app
  • Why and how we use TiDB in our big sales promotion app

Our pain points for the community forum reply app

Database sharding didn’t meet our app requirements

However, our applications developed and application requirements changed, and when we wanted to implement some application requirements, sharding couldn’t meet our needs. We hoped that data was logically in a single table.

Scaling database capacity affected applications

Our exploration of distributed SQL databases

Why we chose TiDB over Apache Ignite and CockroachDB

  • TiDB is compatible with the MySQL protocol, so we can use TiDB as we used MySQL.
  • TiDB’s architectural design is better than Ignite and CockroachDB.
  • TiDB has a vibrant community and rich documentation.
  • The TiDB team is easy to work with and responsive to our needs.

TiDB features we value most

Among TiDB’s features, we think these are the most attractive:

  • Horizontal scalability
    We can add nodes to the cluster at any time, and it’s easy to change nodes.
  • Massive data support
    It can easily handle 10 billion+ rows of data. For example, you can check out the Zhihu case study.
  • High availability
    SQL Server uses the Publish/Subscribe replication model, while TiDB uses the Raft algorithm to achieve 100% strong data consistency. When the majority of data replicas are available, TiDB can implement automatic fault recovery.
  • Real-time HTAP
    TiDB is an HTAP database that targets both OLTP and OLAP scenarios. TiFlash, TiDB’s analytical engine, lets users perform real-time analytics.

Testing TiDB in OLTP, OLAP, and exception scenarios

OLTP testing for TiDB

We tested TiDB against 20 million rows of data and 500 concurrent threads in an OLTP scenario.

As the following figure shows, TiDB’s 99th percentile response time was within 16 ms. This met our application requirement. When the amount of data increased, TiDB showed even greater advantages. In addition, we can add TiDB or TiKV nodes to improve read and write performance.

TiDB’s query response time

OLAP testing for TiDB

To compare TiDB vs. MySQL, we ran a 50 GB TPC-H test. The test results showed that TiDB had a great advantage over MySQL. In the table below, the superior performance for each query is shown in bold:

Exception testing for TiDB

We tested the performance of PD and TiKV during abnormal downtime. We found that PD’s and TiKV’s downtime had little impact on applications, and the cluster could automatically recover from failures.

How we use TiDB in the Autohome forum reply application

TiDB’s 99th and 99.9th percentile response times

In September 2019, we ran the distributed database TiDB in the production environment. Since then, it has been running stably.

Next, we’ll introduce how we use TiDB in our yearly big sales promotion party.

Introducing TiDB to our big sales promotion

Why we chose TiDB

Cross-city high availability

The three DCs across two cities architecture is a high-availability disaster recovery solution for production DCs, intra-city disaster recovery centers, and remote disaster recovery centers. In this deployment, the three DCs are interconnected. If one DC fails or a disaster occurs, the other DCs can normally operate and take over critical applications or all applications. Compared with the multi-DCs in the same city plan, the three DCs in two cities deployment has cross-city high availability and can survive urban natural disasters.

Real-time HTAP

TiDB provides two storage engines: the TiKV row storage engine and the TiFlash column storage engine. TiDB replicates updated data from TiKV to TiFlash in real time to ensure data strong consistency between TiKV and TiFlash. For HTAP resource isolation, we can deploy TiKV and TiFlash on different machines as needed.

Scaling out and scaling in easily

The TiDB architecture separates computing from storage, so we can separately scale out or scale in the computing or storage capacity online as needed. The scaling process is transparent to application operations and maintenance staff.

Compatible with the MySQL 5.7 protocol and MySQL ecosystem

TiDB is compatible with the MySQL 5.7 protocol, common features of MySQL, and the MySQL ecosystem. We were able to migrate our applications to TiDB with minimal code changes, and, in some cases, no changes at all. In addition, TiDB provides a series of data migration tools to help migrate application data easily into TiDB.

Good reputation and use experience

At Autohome, the User Product Center was the first business unit to use TiDB, and it migrated the core application forum replies from SQL Server to TiDB. The car owner price application migrated from MySQL to TiDB. We have experience in migrating to, using, and optimizing TiDB. TiDB has a good reputation in Autohome.

The TiDB cluster architecture for our big sales promotion

For the sales promotion, we deployed TiDB in three DCs across two cities, with five replicas for data. Our cluster components were as follows:

TiDB + TiFlash + TiCDC

As this diagram shows:

  • TiFlash is TiDB’s columnar storage engine. This makes TiDB a true HTAP database. TiFlash provides the same snapshot isolation level of consistency as TiKV and ensures that the latest data is read.
    During the big sales promotion party, we used TiFlash to display real-time data on a large screen.
  • TiCDC is an open-source feature that replicates TiDB’s incremental changes to downstream platforms by subscribing to change logs in TiKV. It also provides TiCDC Open Protocol to support other systems that subscribe to TiKV’s data changes.
    In our cluster, it replicated TiDB cluster data to the downstream MySQL database in real time. TiCDC’s replication latency was within seconds, which satisfied our real-time requirements for online sales promotion applications.
  • The MySQL database was used as a backup in case of failures to improve the applications’ capacity to tolerate disasters.

TiDB stress tests

Test tools

Because Sysbench didn’t have reconnection functionality, we couldn’t use it to test some scenarios. We used Sysbench and an application simulation program.

Test items

We developed a test plan based on actual application scenarios. The main test items were as follows:

  • Single node failure tests simulated:

A tidb-server failure

A PD follower failure

A PD leader failure

A tikv-server failure

Two tikv-server failures

  • Internet data center (IDC) failure tests simulated:

All servers’ downtime in IDC 2, with the PD leader in IDC 1

All servers’ downtime in IDC 2, with the PD leader in IDC 2

IDC 2 network isolation, with the PD leader in IDC 1

IDC 2 network isolation, with the PD leader in IDC 2

  • Performance tests:

oltp_read_write test, five replicas, optimistic transactions

oltp_read_write test, five replicas, pessimistic transactions

oltp_read_write test, three replicas, optimistic transactions

oltp_read_write test, three replicas, pessimistic transactions

Test conclusion

We won’t discuss test results in detail here, but our conclusion was that whether it was a single node failure or an IDC failure, the cluster could recover within about 30 seconds and continue to provide services to meet application needs.

TiDB’s performance metrics in production

  • TiFlash reduced the query response time for panel real-time analytics from 0.5 s to 0.01 s.
  • TiCDC replicated TiDB’s data to MySQL downstream with about 2 s latency.
  • TiDB’s 99th percentile response time was within 20 ms, with 9,000+ connections.

Conclusion

We’ve run TiDB for more than two years, and it’s used in important applications such as forum replies, resource pools, and friend management. We also deployed TiDB in three DCs across two cities for our 2020 big sales promotion applications. This solution guaranteed our data high availability and helped us achieve real-time HTAP. We could scale out or scale in our database as needed. In addition, TiCDC’s high availability and low latency, as well as support for large-scale clusters has made a big impression on us.

If you have any questions or want detailed information about our experience, you could join the TiDB community on Slack.

Originally published at www.pingcap.com on Feb 18, 2021

PingCAP is the team behind TiDB, an open source MySQL compatible NewSQL HTAP database. Official website: https://pingcap.com/ GitHub: https://github.com/pingcap

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store