Always Fun, Always On: How TiDB Helps iQiyi Deliver Streaming Videos

Industry: Media and Entertainment

Author: Boshuai Zhu (Senior Infrastructure Engineer at iQiyi)

iQiyi, formerly Qiyi, is the Netflix of China: the country’s largest online video-on-demand platform. With “Always Fun, Always Fine” as our brand’s motto, we are committed to providing users with high-resolution, authentic media content including movies, TV dramas, variety shows, documentaries, animations and travel programs. On March 29, 2018, our company IPO’ed on the NASDAQ and raised $2.25 billion.

Since our business has grown rapidly, our data size has also soared. This has placed enormous pressure on our backend system, especially the MySQL cluster. We experienced the suffocating pain of tackling immense data until we found TiDB, built and supported by PingCAP. Finally, we can properly manage our data.

Currently, the TiDB cluster is deployed in the internal system of our database cloud team. With the April 2018 release of its 2.0 version, TiDB has proven to be much more mature, with increased system stability and query efficiency. In this post, we will share why we chose TiDB, how we are using it, and the lessons we learned working closely with the PingCAP team.

Why We Chose TiDB

Before TiDB, MySQL was our main database solution for the production environment. The business developed so quickly that our data size skyrocketed, and many bottlenecks occurred in the MySQL cluster. For example:

To support our fast-growing business, we were in urgent need of a database which would be:

TiDB checked all of those boxes, and in fact, its performance has exceeded our expectations.

What Is TiDB?

TiDB is an open source, NewSQL, scalable hybrid transactional and analytical processing (HTAP) database built by the PingCAP team and the open source community. It aims to break down the traditional separation between an OLTP database and an OLAP database, and offer a one-stop solution that enables real-time business analysis on live transactional data.

Figure 1: TiDB Platform Architecture

Inside the TiDB Platform, there are several components: the TiDB server, a stateless SQL layer compatible with the MySQL protocol; TiKV, the distributed transactional key-value storage layer where the data lives; the Placement Driver (PD), which manages cluster metadata and schedules data placement; and TiSpark, which runs Apache Spark directly on top of TiKV for analytical queries.

The TiDB ecosystem also has a wealth of other enterprise-level tools, such as Ansible scripts for quick deployment, Syncer for seamless migration from MySQL, Wormhole for migrating heterogeneous data, and TiDB Binlog, which is a tool to collect binlog files.

How Are We Using TiDB?

Scenario 1: TiDB in Risk Monitoring Center

The Risk Monitoring Center stores machine security statistics, including traffic information from different dimensions such as per DC (data center), per IP address, and per port. To gain timely insights into the system status, the application runs complex queries from time to time.

During the database evaluation process, we compared Apache Druid with TiDB and found:

In addition, TiDB is highly compatible with the MySQL protocol. Users can access TiDB via the existing MySQL connection pool component. This translates to low cost for service migration and high efficiency for development.

Therefore, we decided to deploy TiDB in our Risk Monitoring Center.
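Because TiDB speaks the MySQL protocol, the application keeps using an ordinary MySQL driver and connection pool. Below is a minimal Python sketch of that access pattern; the host, credentials, and the traffic_stats table are illustrative placeholders, not our production setup.

```python
# Minimal sketch: accessing TiDB through a standard MySQL connection pool.
# Host, credentials, and the traffic_stats table are placeholders.
import mysql.connector.pooling

pool = mysql.connector.pooling.MySQLConnectionPool(
    pool_name="tidb_pool",
    pool_size=8,
    host="tidb.example.internal",  # any TiDB server in the cluster (assumed address)
    port=4000,                     # TiDB's default MySQL-protocol port
    user="app_user",
    password="app_password",
    database="risk_monitor",
)

def traffic_by_dc(start_ts, end_ts):
    """Aggregate traffic per data center over a time window."""
    conn = pool.get_connection()
    try:
        cur = conn.cursor()
        cur.execute(
            "SELECT dc, SUM(bytes) FROM traffic_stats "
            "WHERE ts BETWEEN %s AND %s GROUP BY dc",
            (start_ts, end_ts),
        )
        return cur.fetchall()
    finally:
        conn.close()  # returns the connection to the pool
```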

Deployment Process

The Risk Monitoring Center was the first iQiyi project to use TiDB online in the production environment, so we came up with the following plan:

Issues and Solutions

The following issues occurred during our adoption of TiDB, but were quickly resolved.

Issue 1: Connection timeout.

Cause:

Solution: Fixed in the latest version of TiDB, which improves the statistics-collection strategy and adds auto-analyze.

Issue 2: Adding an index to a table took a long time.

Causes:

Solution: Fixed in the latest version of TiDB, which adds the Batch Split feature for large Regions. (After we reported the issue, the PingCAP development team responded actively and quickly.)

TiDB in Production

During the upgrade of TiKV nodes, some errors occurred, such as “Region is unavailable [try again later]” and “TiKV server timeout.” These were caused by stale cache information, which is unavoidable in a distributed system, but they do not affect services as long as the application has a retry mechanism.
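For reference, a retry wrapper along the following lines keeps such transient failures away from the business logic. This is only a sketch: the error markers are the messages quoted above, and the retry limits and backoff values are assumptions, not recommended settings.

```python
# Sketch of a retry mechanism for transient errors during TiKV upgrades or
# Region leader changes. Retry limits and backoff values are assumptions.
import time
import mysql.connector

TRANSIENT_MARKERS = ("Region is unavailable", "TiKV server timeout")

def execute_with_retry(pool, sql, params=None, max_retries=3, backoff_s=0.5):
    last_err = None
    for attempt in range(max_retries + 1):
        conn = pool.get_connection()
        try:
            cur = conn.cursor()
            cur.execute(sql, params or ())
            rows = cur.fetchall() if cur.with_rows else None
            conn.commit()
            return rows
        except mysql.connector.Error as err:
            last_err = err
            # Only retry errors known to be transient; re-raise everything else.
            if not any(marker in str(err) for marker in TRANSIENT_MARKERS):
                raise
            time.sleep(backoff_s * (attempt + 1))
        finally:
            conn.close()
    raise last_err
```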

Figure 2: Data growth in the Risk Monitoring Center

Figure 3: Auto partition of tables in TiKV

Scenario 2: Video Transcoding

The video transcoding database stores the historical data produced during transcoding, which needs to be further analyzed and processed after it is generated.

Pain point: Previously in the MySQL cluster, because of the limited storage capacity, we could only retain the data of the last several months. Thus we lost the chance to analyze and process the earlier data.

Solution: To solve this problem, we deployed a TiDB cluster at the end of 2017 and migrated the data to the TiDB cluster through full and incremental import. This strategy ensured data consistency between the previous MySQL cluster and the newly-built TiDB cluster.
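As a rough illustration of how consistency between the two clusters can be verified after such a migration (a sketch of the idea, not the actual tooling we used), one can compare per-table row counts between the old MySQL cluster and the new TiDB cluster. The hosts, credentials, and table names below are placeholders.

```python
# Sketch of a post-migration consistency check: compare row counts between the
# old MySQL cluster and the new TiDB cluster. Hosts, credentials, and table
# names are placeholders, not the real schema.
import mysql.connector

TABLES = ["transcode_job", "transcode_history"]  # hypothetical table names

def row_count(host, port, table):
    conn = mysql.connector.connect(
        host=host, port=port, user="checker", password="secret",
        database="transcoding",
    )
    try:
        cur = conn.cursor()
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        return cur.fetchone()[0]
    finally:
        conn.close()

for table in TABLES:
    mysql_rows = row_count("mysql.example.internal", 3306, table)
    tidb_rows = row_count("tidb.example.internal", 4000, table)
    status = "OK" if mysql_rows == tidb_rows else "MISMATCH"
    print(f"{table}: mysql={mysql_rows} tidb={tidb_rows} [{status}]")
```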

During the full import, we originally used Mydumper + Loader, a data migration tool developed by PingCAP. But we found that Loader was too slow for our needs.

To fix this problem, PingCAP developed TiDB Lightning, which converts the data exported by Mydumper into SST files and imports them directly into TiKV nodes. This greatly improved data migration efficiency: 1 TB of data could be migrated in five or six hours. After video transcoding ran stably for a while, we switched all the traffic to the TiDB cluster and expanded our services. So far, it has run smoothly.

The following picture shows the TiDB Lightning architecture:

Figure 4: TiDB Lightning implementation architecture

Scenario 3: User Login Information

In the user login information database project, we were confronted with some thorny problems — and all of them have been resolved with TiDB.

After the data was migrated to TiDB, we no longer needed sharding, and the application code has been simplified, as the sketch below illustrates.
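Here is a schematic (hypothetical) comparison: with MySQL sharding, the application first computes which shard and sub-table a user's record lives in; with TiDB, it simply queries one logical table.

```python
# Schematic comparison (hypothetical shard layout): application code before
# and after the migration. With TiDB, the shard-routing step disappears.

NUM_SHARDS = 64  # assumption, for illustration only

def query_login_sharded(pool_for_shard, user_id):
    """Before: pick the right MySQL shard, then the right sub-table."""
    shard = user_id % NUM_SHARDS
    conn = pool_for_shard(shard).get_connection()
    try:
        cur = conn.cursor()
        cur.execute(
            f"SELECT last_login FROM user_login_{shard} WHERE user_id = %s",
            (user_id,),
        )
        return cur.fetchone()
    finally:
        conn.close()

def query_login_tidb(tidb_pool, user_id):
    """After: a single logical table, no routing logic in the application."""
    conn = tidb_pool.get_connection()
    try:
        cur = conn.cursor()
        cur.execute(
            "SELECT last_login FROM user_login WHERE user_id = %s",
            (user_id,),
        )
        return cur.fetchone()
    finally:
        conn.close()
```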

The Migration Process

For incremental synchronization, we used Syncer, which uses wildcards to aggregate data from multiple sources and various tables into a single table. This vastly simplified the incremental synchronization work.
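Conceptually, the wildcard routing works like the sketch below. This is only an illustration of the idea, not Syncer's actual configuration format, and the schema and table patterns are hypothetical.

```python
# Conceptual illustration of wildcard routing (not Syncer's real configuration
# syntax): any source table matching a pattern is merged into one target table.
from fnmatch import fnmatch

ROUTE_RULES = [
    # (schema pattern, table pattern) -> (target schema, target table)
    (("user_db_*", "login_*"), ("user_db", "login")),
]

def route(schema, table):
    for (schema_pat, table_pat), target in ROUTE_RULES:
        if fnmatch(schema, schema_pat) and fnmatch(table, table_pat):
            return target
    return (schema, table)  # no rule matched: keep the original location

print(route("user_db_03", "login_17"))  # -> ('user_db', 'login')
print(route("other_db", "events"))      # -> ('other_db', 'events')
```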

The Syncer architecture is as follows:

Figure 5: Syncer architecture

However, Syncer currently cannot display real-time delay information in Grafana. This is a drawback for applications that are sensitive to synchronization delay. The good news is that PingCAP is working on this issue, and they have refactored Syncer to automatically handle primary key conflicts across sharded tables. With Syncer and TiDB, users can quickly synchronize data from multiple MySQL clusters in real time.

We have two requirements for high availability of the database:

For these requirements, TiDB has the corresponding solutions:

To ensure high availability during the data migration process, we used Drainer to synchronize the data in the TiDB cluster with the MySQL cluster. Drainer supports reverse synchronization by specifying the starting timestamp.

Figure 6: Deploying TiDB in multiple data centers

Throughout the process, the PingCAP team offered us timely and expert-level help. They helped us locate the issue and gave us constructive suggestions. We really appreciate their patience and dedicated support!

Lessons Learned

The most attractive features of TiDB are horizontal scalability and high availability. The amount of data a standalone database can hold is limited. With a MySQL sharding + proxy approach, maintenance costs rise whether the proxy sits on the client side or the server side.

What’s worse, the query efficiency fails to meet the performance demands in many scenarios. In addition, the proxy does not support transactions well and cannot guarantee data consistency.

TiDB is a perfect alternative to MySQL sharding + proxy solutions. With highly available services and data, plus horizontal scalability, TiDB effectively solves many of the problems triggered by a huge surge in data volume. The more data there is, the more TiDB outperforms MySQL.

Conclusion

TiDB now has been deployed in our production environment. Thanks to the horizontal scalability and high availability of TiDB, we no longer worry about data volume and can bring high-quality entertainment services to our users with more confidence than before.

In addition to its use in the applications mentioned above, TiDB is also being evaluated or tested in other applications at iQiyi. In some use cases, TiDB needs to handle a mixed scenario of OLTP and OLAP, and that is a good opportunity to put TiSpark to work.

One area of development we’re interested in is getting TiDB Binlog, TiDB’s data synchronizing tool, to synchronize with Kudu and HBase in addition to MySQL. To that end, we plan to invest more in TiDB and send some pull requests to the TiDB community. We believe that with its powerful technology and the professional and highly-motivated team behind it, TiDB will be embraced by companies in more and more industries in the future.

Originally published at www.pingcap.com on Oct 18, 2018

PingCAP is the team behind TiDB, an open source MySQL compatible NewSQL HTAP database. Official website: https://pingcap.com/ GitHub: https://github.com/pingcap