Long Live MySQL: Kudos to the Ecosystem Innovators
Author: Ed Huang (Co-Founder & CTO at PingCAP)
This post is the second part in our series. You can find the first part here: Long Live MySQL: Good Old MySQL Should Be Rejuvenated.
I am a firm believer in MySQL and its ecosystem. As I said in my previous article, the MySQL ecosystem is still thriving with a lot of innovations in the space, and MySQL applications and developers are still one of the mainstays of the database market. The question is: do we need a more modernized MySQL? Absolutely yes.
In this article, I’ll illustrate the robustness and progressiveness of MySQL by sharing some of the key players in the more modern MySQL ecosystem.
Forks of MySQL: MariaDB and Percona as examples
There is no talking about MySQL alternatives without mentioning MariaDB, an enterprise MySQL fork driven by the open source community. Open source factors aside, MariaDB has a lot to offer both as a technology and a product.
To cope with increasing data volume, MariaDB provides MaxScale, a simple proxy-like component that supports query routing, read-write separation, and automatic failover based on the MySQL Binlog. Scaling with MaxScale is more about sharding data vertically than horizontally. If a user’s application works with large tables, the horizontal sharding workload must be completed in the application layer, which is intrusive to the application.
As the boundaries blur between real-time analytics and online transactions in modern workloads, MariaDB introduced ColumnStore to speed up queries for Online Analytical Processing (OLAP). ColumnStore supports direct reads and writes and was introduced as a MySQL new storage engine. However, currently there isn’t a way for ColumnStore tables to be interconverted with those of InnoDB, MySQL’s default storage engine. The way I see it, the performance of column storage for writing is different from that of row storage in terms of both throughput and latency. Users need to know clearly which workloads are suitable for running on ColumnStore. To achieve a Hybrid Transactional and Analytical Processing (HTAP) experience, ColumnStore must be used with MaxScale to route Online Transactional Processing (OLTP) and OLAP requests to the corresponding table or database at the MaxScale layer. When users design their schema, they must take these requirements into account. Many next generation database companies, not just MariaDB, are noticing this trend to process both OLTP and OLAP workloads.
Percona is a key long-term supporter of the MySQL ecosystem and a recognized expert in MySQL operation and maintenance. They have contributed many well-received tools to the MySQL community, such as pt-online-schema-change and pt-query-digest. Percona’s business model is to package these tools with MySQL, introduce other practical functions based on the MySQL kernel, and release it as a community distribution with commercial support.
The differences between Percona Server for MySQL and MySQL is easy to find on Percona’s official website. Here I will highlight two points that I think are interesting:
- MyRocks. MyRocks is a MySQL storage engine based on RocksDB, an embeddable, persistent key-value store. Percona MyRocks implements Percona Server for MySQL. RocksDB is a log structured merge (LSM) tree implementation. Compared to B-tree, LSM tree has a data structure that caters more to modern hardware, has a better compression ratio, and is easier to implement. In addition, LSM-Tree is built with the idea of tiered storage. If you want to implement a cloud-native storage engine and make full use of different storage media on the cloud, tiered storage is an ideal design. MyRocks would be a great addition to MySQL for modern hardware and cloud infrastructure.
- XtraDB Cluster. Percona integrates ProxySQL as a routing layer and then uses Galera to achieve high availability of storage nodes in a distributed architecture. However, like MariaDB, the most suitable scenarios for XtraDB Cluster are the vertical sharding of tables and databases, read-write separation, and node-level auto-failover. XtraDB Cluster is not a distributed database that supports horizontal scaling.
Percona has supported the ClickHouse community for quite a while. In some ways, this support makes up for the real-time analytical capabilities of MySQL. As businesses become more real-time in the digital transformation era, the mass MySQL user groups hope that the MySQL ecosystem can provide real-time analytics capabilities.
Middleware on top of MySQL: Vitess as an example
When faced with the challenge of massive data, MariaDB and Percona Server choose vertical sharding and read-write separation through proxy. For large single tables, however, horizontal sharding is a better option with less intrusion on the application. The application only needs to specify the sharding key, and then a routing layer decides which shard it should store. The routing layer is transparent to the application. One example of this proxy design is Vitess, a SQL middleware which turns MySQL into a fast, scalable, and highly-available distributed database. Vitess was open-sourced after being widely used at YouTube and has become the mainstream solution for MySQL sharding. However, its limitations are also obvious. For example, a sharding proxy is not a database, but a routing layer for requests. This means that Vitess does not support complex queries, which is a tricky issue for distributed transactions across shards. The failover and service levels heavily rely on the underlying MySQL mechanisms. Of course, the existing MySQL operation and maintenance experience can still be leveraged.
Initial HTAP practitioners: SingleStore as an example
The predecessor of SingleStore is MemSQL. Its early positioning was as a distributed OLAP database that supports massively parallel processing (MPP), with an all-in-memory design. Later, this solution added disk storage support and was renamed SingleStore. SingleStore supports a row-based storage format for OLTP workloads. However, the default storage engine is ColumnStore, which indicates that it focuses more on OLAP workloads. Because SingleStore can handle both workloads in the same database, it can be classified as an HTAP database. Since then, many databases in the MySQL ecosystem have enhanced their OLAP capabilities.
There are two types of nodes in the SingleStore architecture: aggregator nodes and leaf nodes. The leaf node stores table data shards. Some SQL operators are pushed down to the leaf node to accelerate computing. SingleStore is also one of the few OLAP databases that follow codegen technology routes, in which execution plans are compiled into machine code and cached locally for reuse. Other OLAP databases, such as ClickHouse and MonetDB, adopt vectorization to develop faster analytical query engines by making efficient utilization of CPU cache.
Because SingleStore is compatible with the MySQL protocol, it is easier to adopt and more cost effective. Overall, I think it has a good market and growth strategy.
RDS on public cloud: Amazon Aurora as an example
Amazon Aurora is a significant database innovation. Aurora does not use a shared-nothing architecture like Spanner. Instead, it is optimized based on a stand-alone RDBMS. Aurora keenly seized the following characteristics of long-tail users on the cloud:
- High expectations for elasticity and relatively low requirements for scalable storage capacity
- Higher scalability requirements for data reads than writes
- Mostly RDBMS users because it’s 100% compatible with MySQL
Aurora also leverages the capabilities of cloud infrastructure in a smart way: the throughput of shared storage is basically the same as that of local disks. Shared storage for a database means that the computing node can become stateless, thereby realizing the separation of computing and storage. This is possible because computing and storage have different hardware requirements. For example, to scale the read requests, Aurora simply adds a new computing node of the read replica. The process does not involve much data migration effort except for the cache warm-up time. This is also a key prerequisite for serverless implementation.
In addition, Aurora’s shared storage does not mean just running InnoDB on a distributed file system. It represents the design concept of “The log is database.” In fact, as long as the Redo log is secured, the data is secured. The storage uses Redo log records to build the page image on demand. Compared with traditional Binlog replication, this method has reduced log serialization, network transmission load, and the write amplification caused by applying the binlog. This significantly improves the write performance when compared with a standalone database. As a cloud provider, AWS offers a complete cloud hosting service for Aurora.
The problem with Aurora is that it is essentially a standalone database. Even with its modern design, if Aurora is faced with large data volume or write scalability issues, it may still need to be supplemented by sharding middleware like Vitess or distributed SQL databases like TiDB. Additionally, Aurora supports parallel scanning for some types of computation on the read replica, but its main focus still seems to be on traditional OLTP workloads.
Next-Gen distributed SQL: TiDB as an example
TiDB is one of the next-gen distributed SQL databases that has been active in recent years. It was initially inspired by Google Spanner and F1. Currently, there are several distributed SQL projects that are active in the community: CockroachDB (CRDB), YugabytesDB, and TiDB. CRDB and YugabytesDB are PostgreSQL compatible; however, of the three, TiDB is the only one that is MySQL compatible.
TiDB is not an altered fork of the MySQL code. Instead, it was developed from scratch with a more modern design. TiDB adopts a typical shared-nothing design, which ensures its elastic scalability. The TiDB architecture has two layers: the top is the stateless SQL layer responsible for connection management, SQL parsing and optimization, and distributed execution plan generation. The bottom layer is TIKV, a distributed key-value storage layer. TiKV supports ACID transactions, and it stores the data. The SQL layer converts data in relational models such as tables, rows, and indexes into key-value pairs and stores them in TiKV. The advantage of this design is that it is easier to achieve elastic distribution in a key-value model than in a relational model. This is because in the relational model, there are too many concepts: there are tables, there are databases, there are many columns in a row, and that means there are countless kinds of sharding strategies. However, for the key-value model, it is obvious to follow the range of key (or hash of keys). I described the details in the blog post, Building a Large-scale Distributed Storage System Based on Raft. TiDB adopts Raft, a modern consensus algorithm, to achieve high availability and horizontal scalability on a distributed architecture. In addition, since the SQL layer is stateless with no storage dependency, it is convenient to scale the connection layer and the computing layer independent of each other.
Unlike SingleStore, TiDB’s positioning in the early days was as more of an OLTP database, which was designed to handle high-frequency transactions with low-latency and high-concurrency. When MySQL users are faced with scalability challenges, if they are up for sharding and are looking for a simpler, less intrusive distributed database that supports elastic scaling, TiDB could be a good option.
In the past two years, while the boundaries between OLTP and OLAP are blurring, more users sought a simpler database. Building on the existing distributed framework, we implemented TiFlash, a distributed columnar storage engine to make TiDB a true HTAP database. Thanks to TiDB’s decoupled design, the SQL computing layer forwards the OLAP request to the columnar replica on TiFlash. Users do not need to care about the synchronization of data replicas between the column storage and the row storage, and OLTP and OLAP workloads are resolved within the same architecture. I will not explain the technical details here, but if you’d like more information on TiFlash’s design, see the TiDB team’s paper, TiDB: A Raft-based HTAP Database.
It is not easy for users to maintain a complex distributed system in a production environment, and it is even more complicated with the various tools built into modern databases. As mentioned earlier, more users are moving to the cloud, where managed services can shield these complexities. TiDB’s fully managed service is also available on AWS and CCP, which makes an important difference.
However, there is a lot of legacy code in MySQL, after all it’s been around for more than 20 years and was mostly designed around MyISAM, a non-transactional storage engine. Some of that idiosyncrasy doesn’t transfer well into a fully distributed database architecture. To fully take advantage of the capabilities of the distributed architecture, TiDB’s SQL optimizer and executor are designed as a distributed environment computing engine. Therefore, some behaviors may be different from that in stand-alone MySQL.
MySQL users now have a greater choice with more scalable, innovative distributed architectures. There is no shortage of innovation in the MySQL ecosystem, and some of these projects are even leading the database industry in terms of technical progress. These are exciting times and the future is bright.
Long live MySQL.
This post is the second part in our series. You can find the first part here: Long Live MySQL: Good Old MySQL Should Be Rejuvenated.