How We Use a MySQL Alternative to Avoid Sharding and Provide Strong Consistency

This article is based on a talk given by Xiao Huang at a TiDB User Group event.

Meituan is a world-leading online-to-offline (O2O) local life service platform, connecting more than 240 million consumers and five million local merchants through a comprehensive array of e-commerce services and products. To support our highly concurrent online services, we need a distributed database that can handle large volumes of data.

As more and more distributed databases enter the market, choosing the right one for our applications is not easy. After thoughtful consideration, we chose TiDB, a distributed, scale-out MySQL alternative database.

In this post, I’ll share what we considered when selecting a database and how we use TiDB at Meituan to solve our pain points.

What to consider when selecting a database

Stability

  • Be able to support a multi-active architecture with high availability
  • Provide easy-to-use monitoring and alert services
  • Perform a rolling upgrade gracefully with no impact on the application
  • Have few problems in data migration and data validation
  • Be efficient in elastic scaling

Performance

  • Low latency.
  • Pessimistic transactions. If we are going to migrate from MySQL to a new database, we hope to retain the transaction usage.
  • High Queries per Second (QPS). If we have a special sale tonight that triples our traffic, can the database handle that surge? Should the database fail, does it go down completely or activate a performance protection mechanism?
  • Massive data processing. If the database does not support this, we have to work with the application team in advance to alter the system design: deciding whether to manually shard the data and, if so, how many shards to create.

Cost

First, the cost of the application. The database should be easily connected to the application, so we can minimize the burden on the application team. Ideally, this process should require little communication or training.

Second, the cost of CPU, memory, and disks. We once researched a scale-up database that required a machine with 384 GB of memory. A high-configuration machine like that would sharply increase our hardware cost.

Third, the network bandwidth. Some networks connecting remote data centers are charged according to the bandwidth.

Security

  • Data auditing. We would like to know who accessed a specific piece of data and how they operated on it.
  • Data recovery. No matter what anomaly happens, we should be able to recover the data from backup.
  • Database privileges. The database privilege management should be fine-grained. For example, the user ID, phone number, and password should not be accessible to DBAs or application developers. In such cases, we need to grant table-level or even field-level privileges.

Therefore, we want a database that is stable, efficient, and secure, with low cost. We also hope it’s an open source database so that we can get community support when a problem occurs and potentially contribute to the community ourselves.

After comparing the distributed databases on the market, including Amazon Aurora, Spanner, and CockroachDB, we chose TiDB, an open source, MySQL-compatible database that provides horizontal scalability and ACID-compliant transactions.

How we use TiDB at Meituan

We use TiDB mostly for three scenarios: horizontal scaling, financial-grade data consistency, and data middle office.

Horizontal scaling

The major cause of sharding pain is that computation and storage are not separated. To solve this problem, we need a database that separates the computing layer from the storage layer, so we can scale storage capacity and computing resources independently as needed. That’s where TiDB comes in.

TiDB has a multi-layered architecture with three major components: TiDB for computing, Placement Driver (PD) for storing metadata and providing timestamps, and TiKV for distributed storage. In this architecture, storage and computing are separated, so each layer can scale independently to handle fluctuating traffic without affecting the other components.

TiDB architecture

Financial services

Let’s first look at a nagging problem we met in MySQL. MySQL 5.7 implements lossless semi-sync replication. In this implementation, MySQL first writes the transaction to the binary logs, which are then sent to a replica. After the replica returns an acknowledgment (ACK), the source database completes the transaction in the storage engine (InnoDB). In other words, the binary logs reach a replica before the application knows the commit is complete. If the source database crashes at this moment, the replica holds a committed transaction that the application was never told succeeded.
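The commit ordering described above can be sketched as a toy timeline. This is purely illustrative Python, not MySQL internals; the `Node` class and function names are invented for the sketch:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    committed: set = field(default_factory=set)

source, replica = Node(), Node()

def semi_sync_commit(txn_id: str, crash_after_ack: bool = False) -> bool:
    """Toy model of MySQL 5.7 lossless semi-sync commit ordering."""
    # 1. The source writes the transaction to its binary log.
    binlog_entry = txn_id
    # 2. The binlog event is shipped to the replica, which applies it and ACKs.
    replica.committed.add(binlog_entry)
    # 3. Only after the ACK does the source commit in the storage engine
    #    (InnoDB) and report success to the application.
    if crash_after_ack:
        return False  # source crashed here: the app never saw a commit...
    source.committed.add(txn_id)
    return True

ok = semi_sync_commit("txn-1", crash_after_ack=True)
# ...yet the replica already holds the transaction the app thinks failed:
print(ok, "txn-1" in replica.committed, "txn-1" in source.committed)
# False True False
```

The risk window is exactly step 3: the replica is ahead of what the application believes, which is the inconsistency described above.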

Moreover, lossless semi-sync replication doesn’t guarantee data consistency. Even if we set the semi-sync timeout to an infinite value, data is not strongly consistent. For example, if the network between the source database and its replicas goes down, the source never receives an ACK from any replica.

To resolve this issue, MySQL later introduced MySQL Group Replication (MGR), but its scalability is poor: a group can have at most nine members. MGR is also sensitive to network jitter; a few seconds of jitter can cause the cluster to change its primary. Moreover, MGR’s multi-primary mode has so many bugs that the MySQL community mostly uses the single-primary mode to avoid transaction conflicts.

MySQL’s semi-sync doesn’t solve the consistency problem, but TiDB’s Multi-Raft protocol does. In the storage layer, data is divided into Regions (each a 96 MB range of data), and each Region has multiple replicas that make up a Raft group. Each Raft group has a Leader that processes read and write requests. When a Leader crashes, the other replicas in the group elect a new Leader. Even if a single node crashes, the Raft group doesn’t lose any data.

TiDB’s Multi-Raft protocol
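A quick back-of-the-envelope sketch of what this Region layout implies. This is not TiDB code; the helper names are ours, and the replication factor of three is a common default we assume for illustration:

```python
REGION_SIZE_MB = 96          # the Region size mentioned above
REPLICAS_PER_REGION = 3      # assumed replication factor for this sketch

def region_count(dataset_gb: int) -> int:
    """Roughly how many 96 MB Regions a dataset splits into."""
    return -(-dataset_gb * 1024 // REGION_SIZE_MB)  # ceiling division

def tolerated_failures(replicas: int) -> int:
    """A Raft group stays available while a majority of replicas survive."""
    return (replicas - 1) // 2

print(region_count(1))                          # 1 GB -> 11 Regions
print(tolerated_failures(REPLICAS_PER_REGION))  # 3 replicas tolerate 1 loss
```

In other words, even a modest dataset is spread across many independent Raft groups, and each group can lose a minority of its replicas without losing data or availability.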

Next, I’ll introduce how we use distributed transactions at Meituan for different financial scenarios.

Cross-database transactions

Cross-shard transactions

SOA

SOA architecture

If we don’t have distributed transactions, keeping data consistent across different services is complicated: a single operation may need to be written into different modules, resulting in multiple writes. In food delivery, for example, if the customer-side cluster goes down but the restaurant-side cluster survives, a validating service detects that an order exists only in the restaurant database and not in the customer database, and then backfills the order into the customer database. Such validate-and-backfill logic adds complexity to the application.

Also, for account services, distributed transactions are essential because:

  • The account balance can’t go negative.
  • In the case of a power or network outage, it’s hard to maintain data consistency.
  • Data corruption might be disastrous. When multiple clusters are corrupted, it’s difficult to recover data.

Thankfully, TiDB solves these difficulties with its distributed transaction model.
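As a concrete illustration of why account services want atomic multi-write transactions, here is a minimal transfer sketch. It uses Python’s built-in sqlite3 purely as a stand-in client so the example is self-contained; against TiDB you would run the same statements through a MySQL driver (optionally with pessimistic locking via `SELECT ... FOR UPDATE`). The table, IDs, and amounts are invented for illustration:

```python
import sqlite3

# sqlite3 stands in for a MySQL driver so this sketch is runnable as-is.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 0)])

def transfer(conn, src, dst, amount):
    """Move `amount` atomically; roll back if the source would go negative."""
    try:
        with conn:  # one transaction: both writes commit, or neither does
            cur = conn.execute(
                "UPDATE account SET balance = balance - ? "
                "WHERE id = ? AND balance >= ?", (amount, src, amount))
            if cur.rowcount == 0:
                raise ValueError("insufficient balance")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst))
        return True
    except ValueError:
        return False

print(transfer(conn, 1, 2, 60))   # True:  100 -> 40, 0 -> 60
print(transfer(conn, 1, 2, 60))   # False: only 40 left, nothing changes
```

The point is that the debit and credit either both happen or neither does, and the negative-balance check is enforced inside the same transaction, with no validate-and-backfill service needed.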

Solution: the Percolator distributed transaction model

If you plan to use TiDB’s transaction model, we have two general tips:

  • Consolidate small transactions. Distributed transactions involve a lot of network communication, so many small transactions pay repeated round-trip costs and can raise overall latency.
  • Split up large transactions. A large transaction often takes a long time and updates many keys. Because other read requests have to wait for the large transaction to commit, read latency may rise.
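Both tips can be sketched in a few lines. Again sqlite3 stands in as the client so the example is self-contained; the table name and chunk size are invented, and with a real MySQL driver the transaction boundaries would be explicit `BEGIN`/`COMMIT` statements:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
rows = [(i,) for i in range(10_000)]

# Tip 1: one transaction instead of 10,000 single-row commits, so the
# client pays the commit round trip once rather than per row.
with conn:
    conn.executemany("INSERT INTO orders VALUES (?)", rows)

# Tip 2: delete a large range in fixed-size chunks, so no single
# transaction holds locks on many keys for a long time.
CHUNK = 1_000
while True:
    with conn:
        cur = conn.execute(
            "DELETE FROM orders WHERE id IN "
            "(SELECT id FROM orders LIMIT ?)", (CHUNK,))
    if cur.rowcount < CHUNK:
        break

print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 0
```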

Data middle office

In a hotel booking application, we need to fetch lots of data to calculate whether the hotel room pricing is competitive. The results must be delivered in real time, but the calculation can’t affect the online service. However, if all the data is stored in one database, the analytical engine continuously reads massive amounts of data from it, which might cause OLTP transactions to respond slowly.

To address this problem, we use TiDB Binlog to replicate data to another TiDB cluster. Because TiDB’s underlying storage engine, RocksDB, uses a log-structured merge-tree (LSM-tree), which is highly efficient for write operations, we can quickly sync data to a TiDB replica. On this replica, we can perform massive calculations and frequent queries, generate reports, and even build a search engine.

We have a lot of related data scattered across different systems. If we extract this data and pool it together into a single data middle office, we can apply it to different services, such as operations reporting and data subscription.

TiDB for the data middle office

Other scenarios

  • Separate hot and cold data. As the company grows, the accumulated online historical data can be dumped into TiDB clusters to reduce costs.
  • Store logs and monitoring data. As TiDB’s storage layer uses the LSM-tree data structure, which is efficient for write operations, it can scale infinitely to hold all the data for analytical purposes.
  • Online Data Definition Language (DDL) schema change. MySQL has many restrictions on changing table schema. We used to use tools like pt-osc or gh-ost to perform the schema change, but the changes might lower online performance or take too long. TiDB can perform online DDL operations within seconds, which solves a pain point for us.
  • Migrate from HBase/Elasticsearch (ES). Some of our services used HBase or ES, but these databases were frustrating to use, so we migrated the services to TiDB.
  • HTAP is a growing need. With the rise of 5G and IoT, data volumes are growing exponentially. Our fast-paced business may have both OLTP and OLAP requirements, which brings up the need for an HTAP database. With HTAP, we can quickly generate big data analytics, reducing operating and marketing costs.

Summary

PingCAP is the team behind TiDB, an open source, MySQL-compatible NewSQL HTAP database. Official website: https://pingcap.com/ | GitHub: https://github.com/pingcap
