A history of databases
It is rare for any technology to last more than 50 years in the rapidly changing IT industry. Databases are an anomaly in their longevity, while other technologies come and go with the times. If we look back at the history of databases, we can distinguish five stages:
- Stage 1: The mainframe era, which began in the 1950s. In this era, there were likely fewer than 100 mainframes, used mainly in scientific research, national defense, and some financial applications. At the time, databases were mainly hierarchical and network databases, and the typical product was IBM’s IMS. IMS is now rarely seen.
- Stage 2: In the 1970s and 1980s, minicomputers became increasingly popular in defense and scientific research, and in commercial fields such as banks. At the same time, relational databases were born, and databases such as DB2, Oracle, and Ingres appeared.
- Stage 3: In the 1990s, as the x86 architecture and local area networks became more stable, IT applications came into widespread use in enterprises, and relational databases began to flourish. Data warehouses and stand-alone databases, such as SQL Server and dBase, became popular.
- Stage 4: In the 2000s, the Internet era took off, with search, social networking, e-commerce, and other Internet applications rising in prominence. At this time, open source databases such as MySQL, PostgreSQL, Redis, and MongoDB became widely adopted.
- Stage 5: Today, we see the cloud era of databases. Whether it is in new media, mobile applications, cloud computing, the Internet of Things, the rapid development of online education, or the rise in people working online from home due to the pandemic, cloud databases play an increasingly important role. Typical cloud database products are MongoDB Atlas and Snowflake.
As we look back, we find that each database generation evolved with changes in the underlying infrastructure: from mainframes, minicomputers, and PCs to the Internet and the cloud.
Today is the era of the Cloud, and of a new generation of Cloud databases. This represents a good opportunity for database practitioners.
The Cloud moves B2B business from offline to online
The Cloud has changed the underlying infrastructure of the database. Application developers do not need to know how to install and maintain the database; they only need to know how to use it, which makes application development fast and convenient. The Cloud lowers the bar for databases, as DBaaS platforms make databases easy to use.
Because of the Cloud, the new generation of databases has undergone a lot of change. Modern databases such as MongoDB Atlas and Snowflake fully leverage cloud infrastructure and services to reduce costs and increase efficiency for customers. All this is possible due to the Cloud’s elasticity, low cost, high availability, and intelligent capabilities.
One big difference between Internet businesses and traditional businesses is that operating online is far more efficient than operating offline. For example, in 2021, Amazon led for the second year in a row, capturing 17.7% of overall Black Friday sales. Online sales revenue grew again this year, accounting for 38.1% of overall sales on Black Friday, up 11.5 points from last year. It is a remarkable achievement for an online business to beat the offline giants.
The Cloud is the “Internet” for B2B business. The Cloud not only innovates in infrastructure but also brings offline B2B businesses online. Through the Cloud, a small database company can serve tens of thousands of customers. That is also the key reason why the market places a high value on SaaS companies.
The Cloud brings big opportunities to B2B businesses, and allows more startups to redefine the modern data stack and help customers build it.
Next-Gen Cloud database
When we talk about the next-gen database, it has to keep up with explosive data growth, so its architecture has to be fully distributed.
TiDB is a next-gen distributed SQL database that has been active in recent years. It was initially inspired by Google Spanner and F1. Currently, there are several distributed SQL projects that are active in the community: CockroachDB (CRDB), YugabyteDB, SingleStore, and TiDB. CRDB and YugabyteDB are PostgreSQL compatible, while SingleStore and TiDB are MySQL compatible. Of the two MySQL-compatible databases, only TiDB is open source.
TiDB is not a modified fork of the MySQL code. Instead, it was developed from scratch with a modern architecture. TiDB adopts a typical shared-nothing design, which ensures its elastic scalability. The TiDB architecture has two layers: the top is a stateless SQL layer responsible for connection management, SQL parsing and optimization, and distributed execution plan generation. The bottom layer is TiKV, a distributed key-value storage layer. TiKV supports ACID transactions, and stores the data. The SQL layer converts data in relational models such as tables, rows, and indexes into key-value pairs and stores them in TiKV.
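To make the table-to-key-value mapping more concrete, here is a minimal sketch in Python. It is loosely modeled on the key layout described in TiDB’s documentation (a row key built from a table ID and a row ID, and an index key built from a table ID, an index ID, and the indexed value), but everything else is an assumption made for illustration: the function names are hypothetical, keys are plain strings, and the row value is JSON, whereas a real system would use a compact, order-preserving binary encoding.

```python
import json
from typing import Dict, List, Tuple

def row_key(table_id: int, row_id: int) -> str:
    # Hypothetical text form of a row key: t{table_id}_r{row_id}
    return f"t{table_id}_r{row_id}"

def index_key(table_id: int, index_id: int, value: object) -> str:
    # Hypothetical text form of a unique-index key: t{table_id}_i{index_id}_{value}
    return f"t{table_id}_i{index_id}_{value}"

def encode_row(
    table_id: int,
    row_id: int,
    row: Dict[str, object],
    unique_indexes: Dict[int, str],
) -> List[Tuple[str, str]]:
    """Turn one relational row into key-value pairs.

    unique_indexes maps an index ID to the column it covers.
    The first pair stores the row itself; each further pair is an
    index entry that points back to the row ID.
    """
    pairs = [(row_key(table_id, row_id), json.dumps(row, sort_keys=True))]
    for index_id, column in unique_indexes.items():
        pairs.append((index_key(table_id, index_id, row[column]), str(row_id)))
    return pairs

if __name__ == "__main__":
    # One row of a hypothetical "users" table (table ID 10) with a
    # unique index (index ID 1) on the "name" column.
    for key, value in encode_row(10, 1, {"id": 1, "name": "TiDB"}, {1: "name"}):
        print(key, "->", value)
```

In this sketch, a lookup by name becomes two key-value reads: one on the index key to find the row ID, and one on the row key to fetch the row.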
The advantage of this architecture is that it is easier to achieve elastic distribution in a key-value model than in a relational model. This is because the relational model has many concepts (tables, databases, many columns in a row), which leads to many complex sharding strategies. In the key-value model, it is only necessary to track the range of keys (or the hash of keys). The details are described in a blog post, Building a Large-scale Distributed Storage System Based on Raft. TiDB uses Raft, a modern consensus algorithm, to achieve high availability and horizontal scalability in a distributed architecture. In addition, since the SQL layer is stateless with no storage dependencies, it is convenient to scale the connection layer and the computing layer independently of each other.
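As a rough illustration of why tracking key ranges is enough, the sketch below routes keys to shards with a sorted list of range start keys. It is a toy under stated assumptions, not TiKV’s implementation: the RangeRouter class, the node names, and the string keys are all hypothetical, and replication through Raft is left out entirely.

```python
import bisect
from typing import List

class RangeRouter:
    """Maps a key to the shard whose [start, next_start) range contains it."""

    def __init__(self) -> None:
        # Sorted range start keys. "" sorts before every other string,
        # so at first a single shard covers the whole key space.
        self.starts: List[str] = [""]
        self.nodes: List[str] = ["node-1"]

    def locate(self, key: str) -> str:
        # The owning range is the last one whose start key is <= key.
        i = bisect.bisect_right(self.starts, key) - 1
        return self.nodes[i]

    def split(self, split_key: str, new_node: str) -> None:
        # Scale out: keys >= split_key (up to the next range start)
        # move to new_node; no other range is affected.
        i = bisect.bisect_left(self.starts, split_key)
        if i < len(self.starts) and self.starts[i] == split_key:
            raise ValueError("a range already starts at this key")
        self.starts.insert(i, split_key)
        self.nodes.insert(i, new_node)

if __name__ == "__main__":
    router = RangeRouter()
    router.split("t10_r5", "node-2")
    for key in ("t10_r3", "t10_r5", "t10_r7"):
        print(key, "->", router.locate(key))
```

Note that the routing table only stores one start key per range, regardless of how many tables, columns, or indexes the SQL layer defines. The example compares keys as plain strings, so it assumes keys are encoded in a way that sorts in the intended order.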
In the early days, TiDB was positioned mainly as an OLTP database, designed to handle high-frequency transactions with low latency and high concurrency. When MySQL users face scalability challenges and do not want to take on the complexity of sharding, TiDB is a good option: a simpler, less intrusive distributed database that supports elastic scaling.
In the past two years, as the boundaries between OLTP and OLAP blurred, more users sought a simpler database. Building on the existing distributed framework, we implemented TiFlash, a distributed columnar storage engine, to make TiDB a true HTAP database. Thanks to TiDB’s decoupled design, the SQL front-end layer forwards OLAP requests to TiFlash, the analytical engine. Users do not need to care about synchronizing data replicas between the column storage and the row storage, and OLTP and OLAP workloads are handled within the same architecture. If you’d like more information on TiFlash’s design, see the TiDB team’s paper, TiDB: A Raft-based HTAP Database.
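To see why a columnar replica helps analytical queries, consider the toy comparison below. It only illustrates the general difference between row and column layouts and says nothing about how TiFlash is implemented; the sample data and the function names are made up for the example, and every row is assumed to have the same columns.

```python
from typing import Dict, List

Row = Dict[str, object]

def to_columns(rows: List[Row]) -> Dict[str, list]:
    """Pivot row-oriented records into one list per column."""
    columns: Dict[str, list] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

def point_lookup(rows: List[Row], order_id: int) -> Row:
    # OLTP-style access: fetch every field of a single record.
    # The row layout keeps those fields together.
    return next(row for row in rows if row["order_id"] == order_id)

def total_amount(columns: Dict[str, list]) -> int:
    # OLAP-style access: aggregate one field across all records.
    # The column layout lets the scan skip every other field.
    return sum(columns["amount"])

if __name__ == "__main__":
    rows = [
        {"order_id": 1, "customer": "a", "amount": 30},
        {"order_id": 2, "customer": "b", "amount": 12},
        {"order_id": 3, "customer": "a", "amount": 8},
    ]
    print(point_lookup(rows, 2))
    print(total_amount(to_columns(rows)))
```

An HTAP system keeps both layouts in sync behind one SQL interface, so each query can be served from the layout that suits it.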
It is not easy for users to maintain a complex distributed system in a production environment, and the various tools needed to manage a modern database make it even more complicated. As mentioned earlier, more users are moving to the cloud, where managed services can shield them from many of these complexities. TiDB’s fully managed service is now available on AWS and GCP, making TiDB easy to deploy, manage, and use.
TiDB’s core idea and value proposition
We are in a great era of the Cloud. The rise of modern data architectures helps businesses store, manage and discover value from their data. We truly believe that as the growth of data explodes, the importance of databases in this data-driven world will grow even larger.
TiDB’s core value is to simplify system architectures and unlock more value in data. There are three aspects to how we aim to achieve this:
- Architecture evolution: a loosely coupled architecture that makes full use of cloud infrastructure to provide elasticity, low cost, high availability, and intelligent capabilities. We will provide a one-stop data processing platform that simplifies system architecture, reduces costs, and increases efficiency for customers.
- Usability evolution to democratize data processing and analysis: making full use of cloud automation and intelligence to build a self-service system that is easy to maintain and monitor, with a high SLA.
- Rich and open source ecosystem: open source has been in TiDB’s DNA from day one. TiDB will continue to embrace the OSS ecosystem, the Cloud, and the third-party SaaS ecosystem.
What are we looking for?
We are looking for a Senior Product Manager to join our global product management team to develop the next-generation TiDB database. Help us plan, define and design our product to build long-term competitiveness in the Cloud. If you are interested, please send your resume or CV to bruce.zhu@pingcap.com.