Getting Started with JuiceFS Using TiKV

Author: Herald Yu (Director of Marketing at Juicedata)

Editor: Ran Huang

As a cloud-native distributed storage system, JuiceFS was designed into a plug-in structure at the beginning of its birth to ensure that new technologies can be continuously integrated into the JuiceFS ecosystem. According to your needs, you can flexibly choose two core components, the data storage engine and the metadata engine, according to their needs.

The data storage engine is mainly object storage. It supports almost all public and private cloud object storage services, as well as KV storage, WebDAV, and local disks. The metadata engine supports databases such as Redis, MySQL, PostgreSQL, and SQLite.

The newly released JuiceFS v0.16 officially supports TiKV, a distributed key-value database, which further meets the requirements for elastic scaling in high-performance, large-scale data storage.

In this article, I’ll share with you how to use JuiceFS and select TiKV as a metadata engine.

TiKV, a highly scalable KV database

TiKV is a distributed transactional key-value database featuring high scalability, low latency, and ease of use. It easily processes over a trillion rows of data in petabytes.

TiKV supports unlimited horizontal scaling and provides a distributed transaction interface that meets ACID compliance. It also ensures data consistency and high availability of multiple replicas using the Raft protocol.

TiKV architecture

TiKV was developed by PingCAP and is a graduated project of the Cloud Native Computing Foundation (CNCF).

Install TiKV

PingCAP provides TiUP, a package manager, which can easily install TiKV on Linux or macOS.

Install TiUP

This command downloads and installs the appropriate TiUP version to your system.

curl --proto '=https' --tlsv1.2 -sSf https://tiup-mirrors.pingcap.com/install.sh | sh

For the settings to take effect, you can open a new terminal, or manually declare the global environment variable as follows:

source .bash_profile

Now try to execute the command tiup -v and see the version information similar to the following, which means the installation is successful.

tiup -v1.5.4 tiup
Go Version: go1.16.6
Git Ref: v1.5.4
GitHash: b629670276269cd1518eb28f362a5180135cc985

Deploy a TiKV cluster

For testing purposes, we’ll use the TiUP playground component to install a minimal TiKV test cluster in the local environment.

tiup playground --mode tikv-slim

After the deployment is successful, the terminal displays a message similar to the following:

PD client endpoints: [127.0.0.1:2379]
To view the Prometheus:[ http://127.0.0.1:9090](http://127.0.0.1:9090/)
To view the Grafana:[ ](http://127.0.0.1:3000/)http://127.0.0.1:3000

Among them, <127.0.0.1:2379> is the Placement Driver (PD) address, which manages and schedules the TiKV cluster. JuiceFS will interact with TiKV through this address. The other two addresses are Prometheus and Grafana services, which are used for monitoring and data visualization of TiKV clusters.

Note: The playground component of TiUP is mainly used to quickly build a minimal test cluster of TiDB and TiKV in the local environment. For production environment deployment, please refer to TiKV Official Document.

Install JuiceFS

JuiceFS supports Linux, Windows, and macOS systems. You only need to download the corresponding version of the client program and place it in the executable path of the system. For example, I use a Linux distribution, so I can install the latest version of the client by executing the following commands in sequence.

Check the current system information and set temporary environment variables:

JFS_LATEST_TAG=$(curl -s https://api.github.com/repos/juicedata/juicefs/releases/latest | grep 'tag_name' | cut -d '"' -f 4 | tr -d 'v')

Download the latest version of the JuiceFS client corresponding to your system:

wget "https://github.com/juicedata/juicefs/releases/download/v${JFS_LATEST_TAG}/juicefs-${JFS_LATEST_TAG}-linux-amd64.tar.gz"

Unzip the package:

mkdir juice && tar -zxvf "juicefs-${JFS_LATEST_TAG}-linux-amd64.tar.gz" -C juice

Install JuiceFS client to /usr/local/bin:

sudo install juice/juicefs /usr/local/bin

Execute the command and see the help information returned, which means that the client is installed successfully.

juicefsNAME:
juicefs - A POSIX file system built on Redis and object storage.
USAGE:
juicefs [global options] command [command options] [arguments...]
VERSION:
0.16.1 (2021-08-16 2edcfc0)
COMMANDS:
format format a volume
mount mount a volume
umount unmount a volume
gateway S3-compatible gateway
sync sync between two storage
rmr remove directories recursively
info show internal information for paths or inodes
bench run benchmark to read/write/stat big/small files
gc collect any leaked objects
fsck Check consistency of file system
profile analyze access log
stats show runtime stats
status show status of JuiceFS
warmup build cache for target directories/files
dump dump metadata into a JSON file
load load metadata from a previously dumped JSON file
help, h Shows a list of commands or help for one command
GLOBAL OPTIONS:
--verbose, --debug, -v enable debug log (default: false)
--quiet, -q only warning and errors (default: false)
--trace enable trace log (default: false)
--no-agent Disable pprof (:6060) and gops (:6070) agent (default: false)
--help, -h show help (default: false)
--version, -V print only the version (default: false)
COPYRIGHT:
AGPLv3

You can also visit the JuiceFS GitHub Releases page to select other versions for manual installation.

Download TiDBSubscribe to Blog

Use TiKV in JuiceFS

Here I refer to JuiceFS Quick Start Guide, and build a MinIO object storage locally. The access address is http://127.0.0.1:9000, and Access Key ID and Access Key Secret are both minioadmin.

Create a file system

The following command uses the format subcommand provided by the JuiceFS client to create a file system named mystor. In this command, TiKV database address format uses the PD address of the TiKV cluster:

juicefs format \
--storage minio \
--bucket[ http://127.0.0.1:9000/mystor](http://127.0.0.1:9000/mystor) \
--access-key minioadmin \
--secret-key minioadmin \
tikv://127.0.0.1:2379/mystor \
mystor

Parameter Description:

  • --storage: Specifies the data storage engine, here is minio.
  • --bucket: Specifies the bucket access URL. In this case, it is the bucket mystor I created in advance on MinIO.
  • --access-key and --secret-key: Specifies the secret key for accessing the object storage service API.
  • When you use TiKV to store metadata, set the PD address of the cluster. When multiple file systems or applications share the same TiKV, it is recommended to add a prefix. In this case, the prefix mystor is specified in the PD address.

If you see output similar to the following, the file system was created successfully:

2021/08/12 23:28:36.932241 juicefs[101222] <INFO>: Meta address: tikv://127.0.0.1:2379/mystor
[2021/08/12 23:28:36.932 +08:00] [INFO] [client.go:214] ["[pd] create pd client with endpoints"] [pd-address="[127.0.0.1:2379]"]
[2021/08/12 23:28:36.935 +08:00] [INFO] [base_client.go:346] ["[pd] switch leader"] [new-leader=http://127.0.0.1:2379] [old-leader=]
[2021/08/12 23:28:36.935 +08:00] [INFO] [base_client.go:126] ["[pd] init cluster id"] [cluster-id=6995548759432331426]
[2021/08/12 23:28:36.935 +08:00] [INFO] [client.go:238] ["[pd] create tso dispatcher"] [dc-location=global]
2021/08/12 23:28:36.936892 juicefs[101222] <INFO>: Data uses minio://127.0.0.1:9000/mystor/mystor/
2021/08/12 23:28:36.976722 juicefs[101222] <INFO>: Volume is formatted as {Name:mystor UUID:0c9594a8-fe2c-463c-a4b6-eb815f38c843 Storage:minio Bucket:http://127.0.0.1:9000/mystor AccessKey:minioadmin SecretKey:removed BlockSize:4096 Compression:none Shards:0 Partitions:0 Capacity:0 Inodes:0 EncryptKey:}

Mount the file system

Use the mount subcommand to mount the file system to the jfs directory under the current user's home directory:

sudo juicefs mount -d tikv://127.0.0.1:2379/mystor ~/jfs

The sudo command is used here to mount the file system as a super user. The purpose is to allow JuiceFS to create and use the /var/jfsCache directory to cache data.

If you see output similar to the following, the file system is mounted successfully:

2021/08/12 23:34:44.288136 juicefs[101873] <INFO>: Meta address: tikv://127.0.0.1:2379/mystor
[2021/08/12 23:34:44.288 +08:00] [INFO] [client.go:214] ["[pd] create pd client with endpoints"] [pd-address="[127.0.0.1:2379]"]
[2021/08/12 23:34:44.291 +08:00] [INFO] [base_client.go:346] ["[pd] switch leader"] [new-leader=http://127.0.0.1:2379] [old-leader=]
[2021/08/12 23:34:44.291 +08:00] [INFO] [base_client.go:126] ["[pd] init cluster id"] [cluster-id=6995548759432331426]
[2021/08/12 23:34:44.291 +08:00] [INFO] [client.go:238] ["[pd] create tso dispatcher"] [dc-location=global]
2021/08/12 23:34:44.296270 juicefs[101873] <INFO>: Data use minio://127.0.0.1:9000/mystor/mystor/
2021/08/12 23:34:44.296768 juicefs[101873] <INFO>: Disk cache (/var/jfsCache/0c9594a8-fe2c-463c-a4b6-eb815f38c843/): capacity (1024 MB), free ratio (10%), max pending pages (15)
2021/08/12 23:34:44.800551 juicefs[101873] <INFO>: OK, mystor is ready at /home/herald/jfs

Use the df command to see the mounting status of the file system:

df -Th
File system type capacity used usable used% mount point
JuiceFS:mystor fuse.juicefs 1.0P 64K 1.0P 1% /home/herald/jfs

After mounting, you can now store data in the ~/jfs directory just like using a local hard disk.

View file system information

The status subcommand can view the basic information and connection status of a file system.

juicefs status tikv://127.0.0.1:2379/mystor{
"Setting": {
"Name": "mystor",
"UUID": "9f50f373-a7ec-4d5b-b790-3defbf6d0509",
"Storage": "minio",
"Bucket": "http://127.0.0.1:9000/mystor",
"AccessKey": "minioadmin",
"SecretKey": "removed",
"BlockSize": 4096,
"Compression": "none",
"Shards": 0,
"Partitions": 0,
"Capacity": 0,
"Inodes": 0
},
"Sessions": [
{
"Sid": 2,
"Heartbeat": "2021-08-13T10:43:35+08:00",
"Version": "0.16-dev (2021-08-12 a871c3d)",
"Hostname": "herald-manjaro",
"MountPoint": "/home/herald/jfs",
"ProcessID": 6309
}
]
}

In the output, you can learn more about the data storage engine used by the file system and the status of the host that the file system mounts on.

In addition, v0.16 and above can also learn the detailed configuration of the file system by viewing the .config virtual file in the root directory of the mount point:

sudo cat ~/jfs/.config{
"Meta": {
"Strict": true,
"Retries": 10,
"CaseInsensi": false,
"ReadOnly": false,
"OpenCache": 0,
"MountPoint": "jfs",
"Subdir": ""
},
"Format": {
"Name": "myabc",
"UUID": "e9d8373c-7ced-49d9-a033-75f6abb44854",
"Storage": "minio",
"Bucket": "http://127.0.0.1:9000/mystor",
"AccessKey": "minioadmin",
"SecretKey": "removed",
"BlockSize": 4096,
"Compression": "none",
"Shards": 0,
"Partitions": 0,
"Capacity": 0,
"Inodes": 0
},
"Chunk": {
"CacheDir": "/var/jfsCache/e9d8373c-7ced-49d9-a033-75f6abb44854",
"CacheMode": 384,
"CacheSize": 1024,
"FreeSpace": 0.1,
"AutoCreate": true,
"Compress": "none",
"MaxUpload": 20,
"Writeback": false,
"Partitions": 0,
"BlockSize": 4194304,
"GetTimeout": 60000000000,
"PutTimeout": 60000000000,
"CacheFullBlock": true,
"BufferSize": 314572800,
"Readahead": 0,
"Prefetch": 1
},
"Version": "0.16.1 (2021-08-16 2edcfc0)",
"Mountpoint": "jfs"
}

Notice: It is important to note that this article uses a local demo environment. If you need to share and mount the same JuiceFS file system on multiple hosts, make sure that the deployed object storage and TiKV cluster can be accessed by all hosts.

Unmount the file system

You can use the umount subcommand to unmount the file system, for example:

sudo juicefs umount ~/jfs

Warning: Force the unmount of the file system in use may cause data damage or loss, please be careful.

Mount at boot

If you don’t want to manually remount JFS on every reboot, you can set up auto mounting.

First, rename the juicefs client to mount.juicefs and copy it to the /sbin/ directory:

sudo cp /usr/local/bin/juicefs /sbin/mount.juicefs

Edit the /etc/fstab configuration file and add a new record:

tikv://127.0.0.1:2379/mystor    /home/herald/jfs       juicefs     _netdev,cache-size=20480     0  0

In the mount option, cache-size=20480 means allocating 20GB of local disk space to JuiceFS cache. Please determine the allocated size according to the actual hardware conditions. Generally speaking, the more cache space for JuiceFS, the better performance.

You can adjust the FUSE mount options in the above configuration according to your needs.

Benchmarking TiKV, MySQL, and Redis

To understand the differences between TiKV and other backends, we benchmarked the performance of TiKV, MySQL, and Redis using Golang’s benchmarking tools. In the same environment, the three databases’ performance varies a lot:

The figures in the table refer to the duration (in microseconds) of each operation, so the smaller duration, the better performance. The digits in parentheses are the ratio to that of Redis-Always.

Both Redis and MySQL have only one replica in the benchmarking, while TiKV uses a three-replica distributed architecture. Even though TiKV has three replicas, it still performs better than MySQL. TiKV is 2~3 times slower than Redis in terms of metadata operations alone. However, when it comes to data storage, object storage latency takes large proportions, so in real-life business, the latency gap between TiKV and Redis is much smaller, and, more importantly, TiKV outperforms Redis in terms of reliability and scalability.

Summary

For JuiceFS, the support for TiKV as a metadata engine is a milestone. As a new choice for JuiceFS users, TiKV provides easier horizontal scaling than Redis and better performance than MySQL and PostgreSQL.

JuiceFS is an open-source project under the AGPLv3, and its development is inseparable from everyone’s contribution. An article, a page of documentation, an idea, a suggestion, a report, or a bug fix, no matter how big or small the contribution is, is the driving force for the progress of an open-source project.

If you’re interested in JuiceFS, here is what you can do for the community:

  • Starring the project on GitHub
  • Post your opinion on the Forum
  • Pick a task from Issues
  • Improve our document
  • Share anything about JuiceFS on your blog, Twitter, Vlog, or other social media.
  • Join our Slack channel
  • Tell more people about JuiceFS

We sincerely welcome everyone who loves open source to join our community and make JuiceFS better together!

An original version of this article was published on Medium.

--

--

--

PingCAP is the team behind TiDB, an open-source MySQL compatible NewSQL database. Official website: https://pingcap.com/ GitHub: https://github.com/pingcap

Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

How to use SQL with Python Pandas

Limit Output Of Python Traceback

Applications of Combinations with Repetition

Generate Invoice Reports with KNIME Blueprint

Setting up CoreData in SwiftUI

Journal 55 — Using Previs Elements in Unity for Game Cutscene

HYPERDEX PLATFORM LAUNCH

Setup a Visual Studio Code Development Environment with Conda, Git, Docker & Powershell for…

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
PingCAP

PingCAP

PingCAP is the team behind TiDB, an open-source MySQL compatible NewSQL database. Official website: https://pingcap.com/ GitHub: https://github.com/pingcap

More from Medium

The Road to Chaos … as a Service

Apache APISIX makes it more convenient for You to Proxy Dubbo Services

ELK vs Prometheus/Grafana/Jaeger

How to Set up a Kubernetes CronJob to Create Medusa Backups