GPU-Based Computational Storage with Video Tagging Example

Sungjoon Ahn
May 9, 2019

Updated: May 13, 2019

Overview

GPUs have become a popular tool for a wide range of machine learning tasks. Circuit Blvd's Cinabro storage platform can function as computational storage when GPU cards are installed in its PCIe slots. In this technote, we show an example computational storage application in which incoming video streams are processed in real time while being stored in I/O-isolated storage partitions. During processing, machine learning inference models run on each video frame to identify and tag multiple objects.
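To make the "store while processing" idea concrete, here is a minimal Python sketch under stated assumptions: each incoming frame is fanned out to a writer thread that appends it to a storage target and to a tagger thread that runs a detector. The function `ingest`, the `detect` callable (a stand-in for YOLO inference), and the use of a file-like object in place of an OCSSD partition are all hypothetical illustrations, not Cinabro's actual implementation.

```python
import io
import queue
import threading
from typing import BinaryIO, Callable, Dict, Iterable, List

_STOP = object()  # sentinel that tells a worker thread to exit


def ingest(frames: Iterable[bytes], storage: BinaryIO,
           detect: Callable[[bytes], List[str]]) -> Dict[int, List[str]]:
    """Fan each incoming frame out to a storage writer and a tagger running in parallel."""
    to_store: queue.Queue = queue.Queue(maxsize=64)  # bounded to cap memory use
    to_tag: queue.Queue = queue.Queue(maxsize=64)
    tags: Dict[int, List[str]] = {}

    def writer() -> None:
        # Persist every frame, in arrival order, to the storage target.
        while (item := to_store.get()) is not _STOP:
            storage.write(item)

    def tagger() -> None:
        # Run the detector on each frame and record its labels by frame number.
        while (item := to_tag.get()) is not _STOP:
            idx, frame = item
            tags[idx] = detect(frame)

    workers = [threading.Thread(target=writer), threading.Thread(target=tagger)]
    for w in workers:
        w.start()
    for idx, frame in enumerate(frames):
        to_store.put(frame)
        to_tag.put((idx, frame))
    for q in (to_store, to_tag):
        q.put(_STOP)
    for w in workers:
        w.join()
    return tags


if __name__ == "__main__":
    # Toy run: an in-memory buffer stands in for an OCSSD partition and a
    # trivial lambda stands in for YOLO inference.
    buf = io.BytesIO()
    labels = ingest([b"frame0", b"frame1"], buf,
                    lambda f: ["person"] if f == b"frame0" else [])
    print(buf.getvalue(), labels)
```

Decoupling the two consumers with separate queues means a slow inference step does not stall the storage path until its own queue fills up.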

Machine learning setup

We use an Nvidia GeForce GTX 1080 Ti GPU card for video object identification, together with YOLO (You Only Look Once), a machine learning based real-time object detection system. We use two YOLO-based software implementations: Darknet and the code published by pyimagesearch. During hardware installation, make sure that all GPU power cables are connected correctly, as shown in the figure below.

Fig. 1. Power cable connections to the GPU card
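YOLO produces many overlapping candidate boxes per object, so detections are typically filtered by a confidence threshold and then by non-maximum suppression (NMS); the pyimagesearch code uses OpenCV's `cv2.dnn.NMSBoxes` for this step. The sketch below is a plain-Python, class-agnostic greedy NMS for illustration only. The function names, the `(x, y, width, height)` box layout, and the default thresholds (0.5 confidence, 0.3 IoU) are our assumptions, not necessarily the values used in our setup.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height), top-left origin


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        score_thresh: float = 0.5, iou_thresh: float = 0.3) -> List[int]:
    """Greedy non-maximum suppression; returns indices of the boxes to keep."""
    # Drop low-confidence detections, then visit the rest from highest score down.
    order = sorted((i for i, s in enumerate(scores) if s >= score_thresh),
                   key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep
```

For example, with boxes `(10, 10, 100, 100)` and `(15, 15, 100, 100)` scored 0.9 and 0.8, their IoU is about 0.82, so only the first survives; a separate box elsewhere in the frame is kept independently.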

The following picture shows the setup used in this technote, with one GPU, four Open-Channel SSDs (OCSSDs), and one NVMe-oF NIC.

Fig. 2. Our computational storage setup

For software installation, both the Darknet and pyimagesearch code bases are straightforward to build; see their websites for additional details. To take advantage of the hardware acceleration provided by Nvidia GPUs, make sure to install CUDA and OpenCV before building.
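In Darknet, GPU and OpenCV support are switched on through flag lines at the top of its Makefile (lines such as `GPU=0` and `OPENCV=0` in the upstream repository; check your own checkout). The hypothetical helper below flips such flags before running `make`; editing the lines by hand works just as well.

```python
import re
from typing import Dict


def set_make_flags(makefile_text: str, flags: Dict[str, int]) -> str:
    """Rewrite whole-line `NAME=value` assignments (e.g. GPU=0) in Makefile text."""
    for name, value in flags.items():
        # Match only a line consisting of the flag assignment itself, so that
        # conditionals such as `ifeq ($(GPU), 1)` are left untouched.
        pattern = rf"^{re.escape(name)}=\S*$"
        makefile_text, count = re.subn(pattern, f"{name}={value}",
                                       makefile_text, flags=re.MULTILINE)
        if count == 0:
            raise KeyError(f"{name}= not found in Makefile")
    return makefile_text
```

Typical use: read the Darknet `Makefile`, apply `set_make_flags(text, {"GPU": 1, "OPENCV": 1})`, write the result back, and rebuild with `make`.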

Validation

We validated the setup with one real-time video stream written to OCSSD hardware over a local NVMe-oF connection. We tweaked pyimagesearch's Python code to make the object text labels more visible and to generate object indices. Database systems can later use these indices to search for particular objects, depending on the application scenario. The figure below shows an example of a processed video frame.

Fig. 3. Objects identified by our YOLO-based code
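Object indices of the kind mentioned above can be as simple as an inverted index from each label to the frames in which it appears. The sketch below is hypothetical (the names and data layout are ours, not the format our modified pyimagesearch code emits): it builds such an index from `(frame_number, label)` pairs and answers frame-range queries with binary search.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_object_index(detections: Iterable[Tuple[int, str]]) -> Dict[str, List[int]]:
    """Invert per-frame detections into label -> sorted, de-duplicated frame numbers."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for frame_no, label in detections:
        index[label].add(frame_no)
    return {label: sorted(frames) for label, frames in index.items()}


def frames_with(index: Dict[str, List[int]], label: str,
                first: int, last: int) -> List[int]:
    """Return the frames in [first, last] where `label` was detected."""
    frames = index.get(label, [])
    lo = bisect.bisect_left(frames, first)
    hi = bisect.bisect_right(frames, last)
    return frames[lo:hi]
```

Dividing a frame number by the stream's frame rate converts it to a timestamp; for example, frame 90 of a 30 FPS stream is at 3 seconds.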

Object detection ran at around 30 FPS for HD-resolution video feeds, which satisfied the real-time processing requirement. This is in line with what YOLO-based detectors are expected to deliver. The local NVMe-oF connection provided ample I/O performance for this single-stream scenario. Because Nvidia's enterprise-grade GPUs can be used inside virtual machines, we can also set up vhost targets on top of multiple OCSSD partitions. This should make the pairing of the GPU acceleration engine with QoS-guaranteed OCSSD storage hardware even more efficient.
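At 30 FPS, each frame must be fully processed in about 1000 / 30 ≈ 33.3 ms for the detector to keep pace with the stream. The helpers below (hypothetical names; a sketch, not our benchmarking code) compute that budget and measure the average throughput of any per-frame function.

```python
import time
from typing import Callable, Iterable


def frame_budget_ms(stream_fps: float) -> float:
    """Milliseconds available per frame if processing must keep up with the stream."""
    return 1000.0 / stream_fps


def measure_fps(process: Callable[[object], object], frames: Iterable[object]) -> float:
    """Average frames per second achieved by `process` over `frames`."""
    count = 0
    start = time.perf_counter()
    for frame in frames:
        process(frame)
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


def is_real_time(measured_fps: float, stream_fps: float) -> bool:
    """True if the measured throughput is at least the stream's frame rate."""
    return measured_fps >= stream_fps
```

Whether a given throughput counts as real time depends on the camera's frame rate: a 30 FPS feed needs at least 30 FPS of sustained detection throughput.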

Conclusion and what's next

We've validated that single-stream video object detection can be done in real time on the Cinabro computational storage setup. Contemporary GPU array servers support as many as 20 GPU cards. By combining such systems with Cinabro's low-latency, highly predictable SPDK/OCSSD-based storage platform, it is possible to build a cost-effective parallel video object identification system. One potential business area is home or office security. Machine learning inference models can process parallel video streams from customer sites in real time and greatly reduce human involvement, so that people only need to identify and respond to real security threats.

Fig. 4. Computational storage system for security applications
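To size a multi-GPU deployment, one can divide the per-GPU detection throughput by each camera's frame rate and then distribute streams across GPUs. The sketch below uses hypothetical names and assumes throughput scales linearly with GPU count and that each GPU processes its streams independently; a real deployment should be measured. With the roughly 30 FPS per GPU observed above and 30 FPS feeds, a 20-GPU server would handle about 20 streams; with hypothetical 15 FPS feeds, about 40.

```python
from typing import Dict, List


def streams_per_gpu(gpu_fps: float, stream_fps: float) -> int:
    """How many full-rate streams one GPU can keep up with."""
    if stream_fps <= 0:
        raise ValueError("stream_fps must be positive")
    return int(gpu_fps // stream_fps)


def assign_streams(stream_ids: List[str], n_gpus: int, per_gpu: int) -> Dict[int, List[str]]:
    """Round-robin streams onto GPUs; raise if they exceed total capacity."""
    if len(stream_ids) > n_gpus * per_gpu:
        raise ValueError(f"{len(stream_ids)} streams exceed capacity {n_gpus * per_gpu}")
    plan: Dict[int, List[str]] = {gpu: [] for gpu in range(n_gpus)}
    for i, stream_id in enumerate(stream_ids):
        plan[i % n_gpus].append(stream_id)
    return plan
```

Round-robin keeps per-GPU load within one stream of even, which is sufficient when all feeds share the same resolution and frame rate.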

Software versions

Sticking to the following software versions will ensure that everything in this article works in your environment:

Linux OS: Ubuntu 18.10 Desktop
Linux kernel: 4.18.0-17
SPDK: v19.01-819-gf74643ef0
Darknet: yolov3
OpenCV: 3.4.0 and 4.0.0
CUDA: 10.1.105

Questions?

Contact info@circuitblvd.com for additional information.