Using Streaming, Pipelining, and Parallelization to Build High Throughput Apps

Streaming, pipelining, and parallelization can be used to design applications to maximize the utilization of resources on a single node. Then, you can add additional hardware resources to scale your system out/up.

Increasing throughput without additional hardware resources

How to measure system performance

To describe the performance of a system, we need to take both the input and output into consideration.

Fully consuming system resources with better scheduling

Have you ever played the cooking-themed video game “Overcooked”? The goal is to serve your customers. You take their order, cook the food, and serve it. During cooking, you may need to chop ingredients, combine them, and cook them in a pot. Let’s look at this process more closely and see how things can quickly back up.

Using streaming, pipelining, and parallelization to achieve maximum resource utilization

Breaking down a large task with streaming

The word “streaming” can mean different things in different contexts. For example, “event streaming” means a set of events coming one after another like a stream. In the context of this article, streaming means breaking a big task into smaller chunks and processing them in sequence. For example, to download a file from the web browser, we can break the file into smaller data blocks, download one data block, write it to our local disk system, and repeat the process until the whole file is downloaded.

Folding the processing time with pipelining; idle = waste

Assume that an application is running without a streaming design, and it takes two steps. Step 1 is encoding. It consumes CPU resources and takes 100s. Step 2 is writing to disk. It consumes I/O bandwidth and takes 70s. Overall, it takes 100s + 70s = 170s to run this application.

Maximizing resource consumption with parallelization

With streaming and pipelining, the tasks have been broken down into small enough actions, and the consumption curve for one resource is close to flat, although it does not reach 100%. We can apply parallelization to fully utilize the bottleneck resource and achieve high performance, for example, 100% CPU occupation.

--

--

PingCAP is the team behind TiDB, an open-source MySQL compatible NewSQL database. Official website: https://pingcap.com/ GitHub: https://github.com/pingcap

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
PingCAP

PingCAP is the team behind TiDB, an open-source MySQL compatible NewSQL database. Official website: https://pingcap.com/ GitHub: https://github.com/pingcap