No Sharding, No ETL: Use a Scale-Out MySQL Alternative to Store 160+ TB of Data

Industry: Advertising


  • Chunlei Liu (Senior DBA at
  • Kai Xuan (Former Senior DBA at is China’s leading online marketplace for classifieds covering various categories, such as jobs, real estate, automotive, financing, used goods, and local services. Merchants and consumers can publish their advertisements on our online platform so that they can connect, share information, and conduct business. By the end of 2018, we had more than 500 million users, and our total revenue in 2019 was nearly US $2.23 billion.

As our businesses grew, large amounts of data flooded into our databases. But standalone MySQL databases couldn’t store so much data, and sharding was an undesirable solution. To achieve MySQL high availability, we needed an external tool. To perform Online Analytical Processing (OLAP), we had to use complicated and tedious extract, transform, load (ETL) jobs. We looked for a scalable, easy-to-maintain, highly available database to solve these issues.

After an investigation, we adopted TiDB, an open-source, distributed, Hybrid Transactional/Analytical Processing (HTAP) database. Now, our production environment has 52 TiDB clusters that store 160+ TB of data, with 320+ servers running in 15 applications. To maintain such large-scale clusters, we only need two DBAs.

In this post, we’ll deep dive into why we migrated from MySQL to TiDB and how we use TiDB to scale out our databases, perform real-time analytics, and achieve high availability.

Our pain points

When we used MySQL, we encountered these problems:

  • A standalone MySQL database has limited capacity. It can’t store enough data to meet our needs.
  • Sharding was troublesome. At, we didn’t have a middleware team, and our developers needed to operate and maintain sharding scenarios by themselves. After we sharded our database, it took a lot of work to aggregate the data.
  • To perform OLAP analytics, we must do complex and boring ETL tasks. An ETL process was time-consuming, and this hindered our ability to do real-time data analytics.
  • To achieve high availability, MySQL relies on an external tool. We used Master High Availability (MHA) to implement MySQL high availability, but it increased our maintenance cost.
  • In the primary-secondary database framework, MySQL has high latency on the secondary database. When we performed data description language (DDL) operations, high latency occurred in the secondary database. This greatly affected real-time reads.
  • A standalone MySQL database couldn’t support large amounts of writes. When our writes in a standalone MySQL database reached about 15,000 rows of data, we encountered a database performance bottleneck.

Therefore, we looked for a new database solution with the following capabilities:

  • It has an active community. If its community is not active, when we find issues or bugs, we can’t get solutions.
  • It’s easy to operate and maintain.
  • It can solve our current problems, such as issues brought by sharding and lots of writes and deletes.
  • It is suitable for many application scenarios and provides multiple solutions.

Why we chose TiDB, a NewSQL database

We adopted TiDB, because it met all our requirements for the database:

  • TiDB uses distributed storage, and it can easily scale out. We no longer need to worry that a single machine’s capacity is limited.
  • TiDB is highly available. We don’t need to use an additional high-availability service. So TiDB helps us eliminate extra operation and maintenance costs.
  • TiDB supports writes from multiple nodes. This prevents a database performance bottleneck when we write about 15,000 rows of data to a single node.
  • TiDB provides data aggregation solutions for sharding. With TiDB, we no longer need to shard databases.
  • TiDB has a complete monitoring system. We don’t need to build our own monitoring software.

How we’re using TiDB as an alternative to MySQL

TiDB’s architecture at

TiDB’s architecture at

From TiDB 2.0 to TiDB 4.0.2

Our applications running on TiDB

TEG’s applications are our self-developed databases’ management platforms. They have high write volumes. In August 2019, their data size reached about 6 TB. TEG applications’ data increases by about 500 GB per month. In the past 6 months, 8 flash memory cards have been damaged in these applications. But thanks to TiDB’s high availability, they didn’t affect our businesses.


Now, at, we use databases including MySQL, Redis, MongoDB, Elasticsearch, and TiDB. We receive about 800 billion visits per day. We have 4,000+ clusters, with 1,500+ physical servers.

If you have any questions or you’d like to learn more about our experience with TiDB, you can join the TiDB community on Slack.

Originally published at on Jan 14, 2021

PingCAP is the team behind TiDB, an open source MySQL compatible NewSQL HTAP database. Official website: GitHub:

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store