Vhost vs local NVMe-over-fabrics targets
We show how to set up vhost targets as a local SPDK storage service and measure a basic set of performance numbers in comparison with local NVMe-over-fabrics (NVMe-oF) connections. Local NVMe-oF connections offer an alternative way to provide SPDK-based local storage service.
SPDK provides an accelerated vhost target by running in user space and polling. Because SPDK polls for vhost submissions, it can signal the VM to skip notifications on I/O submission, which avoids costly VM exits. As a result, CPU usage can be significantly reduced under heavy I/O workloads. Currently, SPDK provides Vhost-SCSI, Vhost-BLK, and Vhost-NVMe targets. In this note, we use Vhost-BLK for the target storage.
Qemu VM connecting to vhost targets
We set up a Qemu VM that utilizes Vhost-BLK targets as shown in the diagram.
First, create a Qemu VM and install Ubuntu 18.10 Desktop on it. We continue to use the OCSSD qemu-nvme build for our Qemu VM; it is based on Qemu emulator version 3.1.50. The SPDK vhost target requires Qemu 2.12 or later for Vhost-BLK targets.
cbuser@pm111:~/github/qemu-nvme/bin$ ./qemu-img create -f qcow2 \
    ~/work/qemu/u1.qcow2 20G
cbuser@pm111:~/github/qemu-nvme/bin$ ./qemu-system-x86_64 -m 4G \
    -enable-kvm -drive if=virtio,file=$HOME/work/qemu/u1.qcow2 \
    -cdrom /tmp/ubuntu-18.10-desktop-amd64.iso -vnc :2
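Because Vhost-BLK needs Qemu 2.12 or later, a quick pre-flight check can save a confusing failure later. The sketch below is a hypothetical helper (`version_ge`, `qemu_version_from_banner`, and `qemu_supports_vhost_blk` are our own names), assuming GNU `sort -V` is available and that `qemu-system-x86_64 --version` starts with a line like `QEMU emulator version 3.1.50`.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: does this Qemu build meet the 2.12
# minimum that SPDK Vhost-BLK targets require?

# version_ge A B: succeed if version A >= version B (GNU sort -V).
version_ge() {
  # If B is the smaller (or equal) of the two, then A >= B.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# qemu_version_from_banner LINE: extract "X.Y.Z" from a line such as
# "QEMU emulator version 3.1.50 (...)".
qemu_version_from_banner() {
  sed -n 's/^QEMU emulator version \([0-9][0-9.]*\).*/\1/p' <<< "$1"
}

# qemu_supports_vhost_blk BINARY: check the version BINARY reports.
qemu_supports_vhost_blk() {
  local banner ver
  banner=$("$1" --version | head -n 1)
  ver=$(qemu_version_from_banner "$banner")
  [ -n "$ver" ] && version_ge "$ver" 2.12
}
```

For example, `qemu_supports_vhost_blk ~/github/qemu-nvme/bin/qemu-system-x86_64 && echo ok` should print `ok` for the 3.1.50 build used here.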
Second, we start the SPDK vhost target service on the physical machine that hosts the Qemu VM. As target block devices, we use an SPDK malloc RAM drive and an NVMe SSD.
cbuser@pm101:~$ sudo NRHUGE=20 ~/github/spdk/scripts/setup.sh
cbuser@pm101:~$ sudo ~/github/spdk/app/vhost/vhost --wait-for-rpc \
    -r localhost:7778 -S /var/tmp &
cbuser@pm101:~$ sudo ~/github/spdk/scripts/rpc.py -s localhost -p 7778 \
    start_subsystem_init
cbuser@pm101:~$ sudo ~/github/spdk/scripts/rpc.py -s localhost -p 7778 \
    construct_malloc_bdev -b Malloc1 512 512
cbuser@pm101:~$ sudo ~/github/spdk/scripts/rpc.py -s localhost -p 7778 \
    construct_vhost_blk_controller vhost.0 Malloc1
cbuser@pm101:~$ sudo ~/github/spdk/scripts/rpc.py -s localhost -p 7778 \
    construct_nvme_bdev -b NVMe0 -t PCIe -a 0000:d8:00.0
cbuser@pm101:~$ sudo ~/github/spdk/scripts/rpc.py -s localhost -p 7778 \
    construct_vhost_blk_controller vhost.1 NVMe0n1
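The RPC sequence above can be collected into a small script so that the SPDK path and RPC port are written only once. This is a hypothetical wrapper (`rpc`, `setup_vhost_blk`, `SPDK_DIR`, `RPC_PORT`, and `DRY_RUN` are our own names); it issues the same rpc.py commands as above and, with `DRY_RUN=1`, only prints them. It assumes the vhost app is already running with `-r localhost:7778`.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the vhost RPC sequence.
SPDK_DIR=${SPDK_DIR:-$HOME/github/spdk}
RPC_PORT=${RPC_PORT:-7778}

# rpc ARGS...: call rpc.py against the local vhost app (print if DRY_RUN=1).
rpc() {
  local cmd=(sudo "$SPDK_DIR/scripts/rpc.py" -s localhost -p "$RPC_PORT" "$@")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# setup_vhost_blk PCI_ADDR: expose a 512 MiB malloc bdev (512-byte blocks)
# and the NVMe SSD at PCI_ADDR as vhost-blk controllers vhost.0 and vhost.1.
setup_vhost_blk() {
  local pci_addr=$1
  rpc start_subsystem_init
  rpc construct_malloc_bdev -b Malloc1 512 512
  rpc construct_vhost_blk_controller vhost.0 Malloc1
  rpc construct_nvme_bdev -b NVMe0 -t PCIe -a "$pci_addr"
  # The first namespace of NVMe controller "NVMe0" becomes bdev "NVMe0n1".
  rpc construct_vhost_blk_controller vhost.1 NVMe0n1
}
```

Running `DRY_RUN=1 setup_vhost_blk 0000:d8:00.0` first lets you review the five commands before executing them for real.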
Third, we launch Qemu with the following parameters. Note that we pass /var/tmp/vhost.0 and /var/tmp/vhost.1 to identify the vhost targets created in the second step; the sockets live under /var/tmp because the vhost app was started with -S /var/tmp.
cbuser@pm111:~$ sudo ~/github/qemu-nvme/bin/qemu-system-x86_64 \
    --enable-kvm -cpu host -smp 1 -m 4G \
    -drive file=/home/cbuser/work/qemu/u1.qcow2,if=none,id=disk \
    -device ide-hd,drive=disk,bootindex=0 \
    -object memory-backend-file,id=mem0,size=4G,mem-path=/dev/hugepages,share=on \
    -numa node,memdev=mem0 \
    -chardev socket,id=spdk_vhost_blk0,path=/var/tmp/vhost.0 \
    -device vhost-user-blk-pci,chardev=spdk_vhost_blk0,num-queues=4 \
    -chardev socket,id=spdk_vhost_blk1,path=/var/tmp/vhost.1 \
    -device vhost-user-blk-pci,chardev=spdk_vhost_blk1,num-queues=4 \
    -net nic,macaddr=DE:AD:BE:EF:01:41 \
    -net tap,ifname=tap0,script=q_br_up.sh,downscript=q_br_down.sh -vnc :2
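The `-object memory-backend-file,...,share=on` and `-numa node,memdev=mem0` options back the guest RAM with shared hugepage memory, which a vhost-user back end such as the SPDK vhost target needs in order to map guest buffers. A rough way to size the host hugepage pool for the guest alone is sketched below; `hugepages_needed` and `hugepage_size_kib` are hypothetical helpers, and the SPDK vhost process needs hugepages of its own on top of this figure.

```shell
#!/usr/bin/env bash
# Hypothetical sizing helpers for hugepage-backed guest memory.

# hugepages_needed MEM_MIB PAGE_KIB: hugepages (rounded up) needed to back
# MEM_MIB mebibytes of guest RAM with pages of PAGE_KIB kibibytes.
hugepages_needed() {
  local mem_kib=$(( $1 * 1024 ))
  local page_kib=$2
  echo $(( (mem_kib + page_kib - 1) / page_kib ))
}

# hugepage_size_kib [FILE]: default hugepage size in KiB, taken from the
# "Hugepagesize:" line of /proc/meminfo (or of FILE, e.g. for testing).
hugepage_size_kib() {
  awk '/^Hugepagesize:/ { print $2 }' "${1:-/proc/meminfo}"
}
```

On a host with 2 MB default hugepages, `hugepages_needed 4096 "$(hugepage_size_kib)"` prints 2048 for the 4G guest used here.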
Inside the VM, you can verify that the vhost block devices were created as expected. In the output below, they appear as vda (512M, the malloc drive) and vdb (477G, the NVMe SSD).
cbuser@qemu-nvme141:~$ lsblk --output \
    "NAME,KNAME,MODEL,HCTL,SIZE,VENDOR,SUBSYSTEMS"
NAME KNAME MODEL HCTL SIZE VENDOR SUBSYSTEMS
fd0 fd0 4K block:platform
loop0 loop0 3.7M block
loop1 loop1 140.9M block
...(truncated)...
sda sda QEMU HARDDISK 1:0:0:0 20G ATA block:scsi:pci
└─sda1 sda1 20G block:scsi:pci
vda 512M 0x1af4 block:virtio:pci
vdb 477G 0x1af4 block:virtio:pci
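The same check can be scripted inside the guest. Assuming `lsblk` supports the `SUBSYSTEMS` column (as in the output above), the hypothetical filter `virtio_disks` below prints the disks whose subsystem chain includes virtio. The 0x1af4 in the VENDOR column is the PCI vendor ID used by virtio devices.

```shell
#!/usr/bin/env bash
# virtio_disks: read "NAME SUBSYSTEMS" rows on stdin (for example from
# `lsblk -dn -o NAME,SUBSYSTEMS`) and print the names of disks whose
# subsystem chain contains a "virtio" component.
virtio_disks() {
  awk '$NF ~ /(^|:)virtio(:|$)/ { print $1 }'
}
```

In this setup, `lsblk -dn -o NAME,SUBSYSTEMS | virtio_disks` should print vda and vdb.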
Comparison with local NVMe-over-fabrics
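One can set up local NVMe-over-fabrics targets backed by an SPDK malloc drive as described in our previous tech notes. Below are the additional SPDK commands needed for an NVMe-oF RDMA target backed by the NVMe drive. They assume that subsystem initialization and the creation of the subsystem nqn.2016-06.io.spdk:cnode02 were done as in those notes, since nvmf_subsystem_add_ns only adds a namespace to an existing subsystem.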
cbuser@pm111:~$ sudo ~/github/spdk/app/nvmf_tgt/nvmf_tgt --wait-for-rpc \
    -r localhost:7779
cbuser@pm111:~$ sudo ~/github/spdk/scripts/rpc.py -s localhost -p 7779 \
    construct_nvme_bdev -b NVMe0 -t PCIe -a 0000:d8:00.0
cbuser@pm111:~$ sudo ~/github/spdk/scripts/rpc.py -s localhost -p 7779 \
    nvmf_subsystem_add_ns nqn.2016-06.io.spdk:cnode02 NVMe0n1
cbuser@pm111:~$ sudo ~/github/spdk/scripts/rpc.py -s localhost -p 7779 \
    nvmf_subsystem_add_listener nqn.2016-06.io.spdk:cnode02 -t RDMA \
    -a 10.0.0.11 -s 4421
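The note does not show the initiator side that turns this listener into the local /dev/nvme1n1 device used in the fio test. A minimal sketch, assuming the kernel NVMe-oF RDMA initiator (the nvme-rdma module) and nvme-cli are installed; `run` and `nvmf_rdma_connect` are hypothetical helpers, and `DRY_RUN=1` only prints the commands. The device name the kernel assigns may differ on your system.

```shell
#!/usr/bin/env bash
# run ARGS...: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# nvmf_rdma_connect NQN ADDR PORT: attach an NVMe-oF RDMA subsystem with the
# kernel initiator, then list NVMe devices so the new one can be found.
nvmf_rdma_connect() {
  local nqn=$1 addr=$2 port=$3
  run sudo modprobe nvme-rdma
  run sudo nvme connect -t rdma -n "$nqn" -a "$addr" -s "$port"
  run sudo nvme list
}
```

For the target above this would be `nvmf_rdma_connect nqn.2016-06.io.spdk:cnode02 10.0.0.11 4421`; `sudo nvme disconnect -n nqn.2016-06.io.spdk:cnode02` detaches it again.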
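We ran fio throughput and latency tests. For simplicity, we passed parameters on the fio command line instead of using job files. Example commands are shown below: the first is a throughput test run inside the VM against a vhost device, and the second is a latency test run on the host against the NVMe-oF-attached device.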
cbuser@qemu-nvme141:~$ sudo fio --name=global --filename=/dev/vda \
    --direct=1 --rw=read --norandommap --ioengine=libaio --bs=256k \
    --iodepth=64 --numjobs=1 --time_based --runtime=60 --group_reporting \
    --name=job1
cbuser@pm111:~$ sudo fio --name=global --filename=/dev/nvme1n1 \
    --direct=1 --rw=randwrite --norandommap --ioengine=libaio --bs=4k \
    --iodepth=1 --numjobs=1 --time_based --runtime=60 --group_reporting \
    --name=job1
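The two commands differ only in device, access pattern, block size, and queue depth, so a test matrix can be generated from them. The sketch below is hypothetical (`fio_job`, `fio_matrix`, and `DRY_RUN` are our own names), and pairing each profile with both reads and writes is our assumption; the note shows only one example of each.

```shell
#!/usr/bin/env bash
# fio_job DEV RW BS IODEPTH: run (or, with DRY_RUN=1, print) one fio test
# using the same fixed options as the commands above.
fio_job() {
  local cmd
  cmd=(sudo fio --name=global --filename="$1" --direct=1 --rw="$2"
       --norandommap --ioengine=libaio --bs="$3" --iodepth="$4"
       --numjobs=1 --time_based --runtime=60 --group_reporting --name=job1)
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# fio_matrix DEV...: throughput (256k, QD64) and latency (4k, QD1)
# profiles, for reads and writes, on every device given.
fio_matrix() {
  local dev rw
  for dev in "$@"; do
    for rw in read write; do
      fio_job "$dev" "$rw" 256k 64
    done
    for rw in randread randwrite; do
      fio_job "$dev" "$rw" 4k 1
    done
  done
}
```

Inside the VM, `DRY_RUN=1 fio_matrix /dev/vda /dev/vdb` lists the eight runs before you commit to roughly eight minutes of testing.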
The test results are presented in the graphs below.
For throughput, the SPDK malloc drive over vhost achieves the highest numbers, while the NVMe-oF throughput numbers are much lower. The NVMe drive's write throughput of about 2.1GB/s appears to be bounded by the SSD itself, which we verified with additional device tests. Local NVMe-oF throughput maxes out at around 2.4GB/s.
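To relate these bandwidth figures to I/O rates, the hypothetical helper `iops_from_bw` converts GB/s into IOPS for a given block size. It assumes GB means 10^9 bytes, which the note does not state, and that the write throughput test also used the 256 KiB block size of the throughput example.

```shell
#!/usr/bin/env bash
# iops_from_bw GBPS BS_KIB: I/O rate implied by GBPS gigabytes per second
# (assumed to be 10^9 B/s) at BS_KIB kibibytes per I/O, rounded.
iops_from_bw() {
  awk -v gbps="$1" -v bs_kib="$2" \
    'BEGIN { printf "%.0f\n", gbps * 1e9 / (bs_kib * 1024) }'
}
```

Under these assumptions, `iops_from_bw 2.1 256` prints 8011 and `iops_from_bw 2.4 256` prints 9155, so the SSD bound corresponds to roughly 8,000 256 KiB I/Os per second.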
For latency, vhost is about 3.2usec to 7.9usec faster than NVMe-oF RDMA connections. For all tests, the NVMe SSD was preconditioned by writing twice its capacity, which is reflected in the much higher read latencies for both vhost and NVMe-oF.
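To compare latencies like these across runs, the mean completion latency can be pulled out of fio's default human-readable output. A sketch assuming the report contains a line such as `clat (usec): min=..., max=..., avg=..., stdev=...`, where fio picks nsec, usec, or msec depending on magnitude; `clat_avg_usec` is a hypothetical helper that reads only the first clat line (the first job's first data direction) and normalizes it to usec.

```shell
#!/usr/bin/env bash
# clat_avg_usec: read fio's human-readable report on stdin and print the
# first average completion latency ("clat ... avg=") converted to usec.
clat_avg_usec() {
  awk '
    /^[ \t]*clat \((nsec|usec|msec)\):/ {
      unit = $2                          # e.g. "(nsec):"
      match($0, /avg=[ ]*[0-9.]+/)
      avg = substr($0, RSTART + 4, RLENGTH - 4) + 0
      if (unit ~ /nsec/) avg /= 1000
      else if (unit ~ /msec/) avg *= 1000
      printf "%.2f\n", avg
      exit
    }'
}
```

For example, piping each latency test's output through `clat_avg_usec` and subtracting the NVMe-oF value from the vhost value gives the per-I/O difference directly.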
Lastly, we measured reference numbers, i.e., the best read throughput and latency of the SPDK malloc drive and the NVMe SSD. Read throughput was virtually a tie, while the vhost protocol appears to add about 5.2usec to the read latency.
The vhost and NVMe-oF RDMA protocols are useful tools for exposing local SPDK devices as kernel block devices. This is one of SPDK's most powerful features, since it lets a multitude of existing applications use SPDK devices. The vhost protocol showed superior performance, while local NVMe-oF RDMA connections still performed well enough for scenarios where applications run on the same physical machine. Our measurements here are not meant to be comprehensive. Interested readers should run their own tests before determining whether this configuration suits their requirements.
Using the following software versions should ensure that all the commands in this article work in your environment:
Linux OS: Ubuntu 18.10 Desktop for both the physical machine and the Qemu virtual machine
Linux kernel: 4.18.0
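A quick way to confirm that a host or guest matches these versions is the hypothetical check below (`env_matches` is our own name); in practice its inputs would come from `lsb_release -rs` and `uname -r`.

```shell
#!/usr/bin/env bash
# env_matches OS_RELEASE KERNEL_RELEASE: succeed if the Ubuntu release and
# kernel match the tested versions (Ubuntu 18.10, kernel 4.18.0).
env_matches() {
  [ "$1" = "18.10" ] || return 1
  case "$2" in
    4.18.0*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: `env_matches "$(lsb_release -rs)" "$(uname -r)" && echo "matches the tested setup"`.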
Contact firstname.lastname@example.org for additional information.