Introduction
In this article, we discuss version 1 of the Cinabro platform. With Cinabro, we aim at a software-driven, open flash array architecture for scalable cloud storage services. We first describe the Cinabro architecture, which represents our vision of how flash memory storage should be built and operated inside data centers. We then detail the first version of the Cinabro platform.
Cinabro Architecture
We started out with two motivations. The first is to serve diverse cloud storage requirements. Once a hyper-scaler approves a data center SSD model, the SSDs are put into operation for typically 3 to 5 years. Because SSD firmware updates are costly, they are made infrequently and only for critical bug fixes. More flexible mechanisms are therefore desirable to accommodate ever-evolving data center storage workloads. The second motivation is to streamline flash memory deployments. Current data center SSD hardware designs are optimized as self-contained single units and do not utilize the resources of other SSDs. By eliminating redundancies across SSD units and leveraging more powerful and cost-effective server CPUs and memory, the Cinabro architecture facilitates timely deployment of the latest NAND technologies.
Fig. 1 depicts the Cinabro appliance hardware. It is based on today's off-the-shelf server hardware, whose major components are server CPUs (e.g., x86), DRAM, motherboards, and backup power supplies.
CBBridge is a lean yet robust, enterprise-grade NAND flash controller. Its optimization features make it possible to build fast and robust all-flash-array systems: (1) maximum exposure of NAND die/channel geometry for fine-grained media management and I/O isolation; (2) parallel request processing with automatic PU (parallel unit) conflict detection to minimize synchronization penalties; (3) an optimistic protection policy for volatile chunk information; and (4) end-to-end data path protection. CBBridge also provides advanced media management features, including (1) a special command and automatic data generator to support borderline programming; (2) hardware automation for vector command processing; and (3) CBLink for direct data copy between multiple controllers. Fig. 2 shows the block diagram of the CBBridge NAND controller.
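To illustrate what exposing die/channel geometry enables, here is a minimal sketch in plain C; all names are hypothetical and this is not the CBBridge interface, just one way a host might map a request onto a parallel unit and check for PU conflicts:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical geometry descriptor; the real CBBridge interface differs. */
struct nand_geometry {
    uint32_t num_channels;   /* independent data channels */
    uint32_t dies_per_chan;  /* NAND dies behind each channel */
};

/* A parallel unit (PU) is one (channel, die) pair. */
static inline uint32_t pu_index(const struct nand_geometry *g,
                                uint32_t channel, uint32_t die)
{
    return channel * g->dies_per_chan + die;
}

/* Track in-flight requests per PU; a conflict means a new request
 * would serialize behind one already occupying the same die.
 * The fixed array assumes at most 1024 PUs, for the sketch only. */
struct pu_tracker {
    uint8_t busy[1024];      /* nonzero if the PU has an outstanding request */
};

static inline bool pu_conflict(const struct pu_tracker *t, uint32_t pu)
{
    return t->busy[pu] != 0;
}
```

A controller-side implementation would do this check in hardware; the point of the geometry exposure is that the host-side FTL can schedule around such conflicts instead of discovering them as latency spikes.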
The Cinabro appliance also accommodates optional components: hardware accelerators and SCMs (storage class memories). GPU- or FPGA-based PCIe cards offload the appliance CPU for application-specific functions such as parallel machine learning computations. SCMs provide a much larger main memory space than DRAM can, along with non-volatile, byte-addressable memory regions. A Cinabro appliance can thus deploy huge metadata or index structures over the SCM's large in-memory space.
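As one illustration of using byte-addressable SCM, the following minimal sketch maps a DAX-backed file so that a metadata index lives directly in persistent memory. The path is hypothetical, and the sketch assumes a Linux kernel and C library recent enough to expose MAP_SYNC:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map a (hypothetical) DAX-backed file so a metadata index lives
 * directly in byte-addressable SCM. MAP_SYNC keeps page table entries
 * synchronous with respect to the persistent media. */
int main(void)
{
    int fd = open("/mnt/pmem/metadata.idx", O_RDWR); /* hypothetical path */
    if (fd < 0) { perror("open"); return 1; }

    size_t len = 1UL << 30; /* 1 GiB index region, for illustration */
    void *idx = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (idx == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    /* ... build or traverse the in-SCM index here ... */

    munmap(idx, len);
    close(fd);
    return 0;
}
```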
Fig. 3 shows our open-source-based software architecture, CBOS. At its core, software-based, high-speed parallel FTLs run over varying numbers of flash modules. Our FTLs provide multi-tenancy features such as I/O isolation, defragmentation, multiple reliability modes, and multiple mapping modes. In addition, the FTLs achieve high performance by running parallel user-level instances over multiple CPU cores, by requiring minimal metadata writes thanks to reliable power backup and BMC protocols, and by using CBLink for global media management across NAND modules. CBOS harnesses polling-based, user-level storage software such as SPDK, which facilitates higher hardware utilization and balanced performance.
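As a sketch of the polling model, a per-core FTL poller might be registered as follows. This is not GFTL's actual code: only spdk_poller_register and its callback convention come from SPDK (and header locations and the callback signature have varied across SPDK versions); the FTL names are hypothetical:

```c
#include "spdk/thread.h"

/* Hypothetical FTL instance state; the real GFTL structures differ. */
struct ftl_instance {
    struct spdk_poller *poller;
    /* ... L2P tables, write buffers, per-core queues ... */
};

/* Poller callback: drain completed NAND operations and admit new I/O.
 * In recent SPDK versions a poller returns nonzero when it did work. */
static int
ftl_poll(void *ctx)
{
    struct ftl_instance *ftl = ctx;
    int did_work = 0;
    /* ... poll OCSSD completions, advance the FTL state machine ... */
    (void)ftl;
    return did_work;
}

/* Called on each dedicated core: run the FTL without interrupts,
 * polling on every reactor iteration (period 0). */
static void
ftl_start(struct ftl_instance *ftl)
{
    ftl->poller = spdk_poller_register(ftl_poll, ftl, 0);
}
```

Running one such instance per core, with no locks shared across cores, is what lets the FTLs scale with CPU count instead of serializing on a single interrupt-driven path.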
Cinabro appliances are designed to be deployed in a scale-out fashion inside data centers. Fig. 4 shows how multiple Cinabro appliances are connected to the data center fabric via high-speed Ethernet. NVMe-oF (NVMe over Fabrics) on top of RDMA exposes block devices and logical volumes of various sizes with near-local-storage latency. Most software modules run inside containers as microservices and are orchestrated by Kubernetes. Deploying software in this scale-out framework has several advantages. First, diverse FTLs can be launched on demand to address dynamic workload characteristics from multiple tenants and applications. Second, elastic block managers can accommodate dynamic data provisioning and data protection over multiple Cinabro appliances. Finally, the framework inherits all the benefits of Docker and Kubernetes deployments: natural blending into modern cloud operations, efficient isolation mechanisms for hardware components, scalable orchestration of multiple containers, and scalable resource allocation and monitoring.
The CBFlashAnalytics system collects NAND characteristics data during both the development and operation phases. This analytics data can be used on multiple fronts to enhance overall flash array storage efficiency. Using the latest machine learning technologies, the system can discover non-trivial patterns with both spatial and temporal variations. These variations can be systematically exploited through machine-learning-based ECC algorithms (LDPC LLR/H-matrix), erasure codes, caching and buffering, and more. The Cinabro architecture is a natural platform for running ML optimizations because its direct interface to NAND geometry, its software-based host FTLs, and a multitude of OCSSDs allow massive NAND characterization. The following figure shows the block diagram of the CBFlashAnalytics system.
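To make the LDPC LLR idea concrete, here is a minimal sketch in C of deriving an empirical log-likelihood ratio for one soft-read level from characterization counts. The structure and field names are hypothetical, not part of CBFlashAnalytics:

```c
#include <math.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical characterization record: how often cells read back at a
 * given soft-read level were truly 0 vs truly 1, collected per NAND
 * region (e.g., per die/block/page type) and per retention/PE condition. */
struct level_stats {
    uint64_t count_was_0;
    uint64_t count_was_1;
};

/* Empirical LLR for one read level: log-ratio of the two likelihoods,
 * with +1 smoothing so sparse characterization data stays finite. */
static double
empirical_llr(const struct level_stats *s)
{
    return log(((double)s->count_was_0 + 1.0) /
               ((double)s->count_was_1 + 1.0));
}

int main(void)
{
    /* Illustrative numbers only: a read level seen mostly on true-0 cells. */
    struct level_stats s = { .count_was_0 = 9500, .count_was_1 = 500 };
    printf("LLR = %.3f\n", empirical_llr(&s)); /* positive => bit likely 0 */
    return 0;
}
```

Because the counts can be binned by die, block, page type, and wear condition, the same mechanism lets the LDPC decoder use soft inputs tuned to the spatial and temporal variations the analytics system discovers.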
Cinabro Platform V1
We started building the Cinabro platform in the second half of 2017. Currently, we have the V1 platform, which consists of the following hardware components:
Supermicro 7049GP-TRT server with dual socket Intel Xeon Scalable Processors
CRZ Cosmos Mini FPGA OpenSSD (designed by Hanyang University)
Mellanox RoCEv2 RDMA Ethernet NIC card
For V1, we did not implement the following hardware components, which are part of the original Cinabro architecture:
CBBridge features (instead, we integrated with a Xilinx FPGA running OCSSD 1.2 firmware)
Backup batteries
Customized BMC
Customized PCIe fabric
SCM
Hardware accelerators
For the CBOS software stack, we have implemented or integrated the following modules for V1:
Customized SPDK nvmf_tgt app with host-based FTL and OCSSD access features
GFTL: page-based host FTL (a minimal page-mapping sketch appears after this list)
ubm: SPDK bdev layer to access Cosmos Mini OpenSSD
OCSSD qemu-nvme virtual machine: configured to run with SPDK and Mellanox RDMA NVMe-oF
Docker container images hosting customized SPDK apps, including nvmf_tgt, vhost_tgt, and benchmark apps with the SPDK fio and RocksDB plug-ins
Docker images for a pilot video server application using the nginx web server
Kubernetes YAML configuration files that launch the Docker container images with specified resource allocations, including hugepages for SPDK
Performance and Resource Monitor: Kubernetes dashboard along with Heapster/InfluxDB/Grafana and Netflix Vector
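As referenced in the GFTL item above, here is a minimal, self-contained page-mapping (L2P) sketch in C. The names are hypothetical and it is far simpler than GFTL, notably omitting garbage collection and wear leveling (which V1 also leaves out):

```c
#include <stdint.h>
#include <stdlib.h>

#define INVALID_PPA UINT32_MAX

/* Minimal page-level FTL: one L2P entry per logical page. */
struct page_ftl {
    uint32_t *l2p;        /* logical page -> physical page address */
    uint64_t num_lpages;  /* logical capacity in pages */
    uint32_t next_ppa;    /* naive append-only write pointer */
};

static struct page_ftl *
ftl_create(uint64_t num_lpages)
{
    struct page_ftl *ftl = malloc(sizeof(*ftl));
    if (!ftl) return NULL;
    ftl->l2p = malloc(num_lpages * sizeof(uint32_t));
    if (!ftl->l2p) { free(ftl); return NULL; }
    for (uint64_t i = 0; i < num_lpages; i++)
        ftl->l2p[i] = INVALID_PPA;
    ftl->num_lpages = num_lpages;
    ftl->next_ppa = 0;
    return ftl;
}

/* Write: allocate the next physical page and remap the logical page.
 * A real FTL would pick the PPA for parallelism and trigger GC. */
static uint32_t
ftl_write(struct page_ftl *ftl, uint64_t lpage)
{
    uint32_t ppa = ftl->next_ppa++;
    ftl->l2p[lpage] = ppa;
    return ppa;
}

/* Read: translate a logical page to its current physical location. */
static uint32_t
ftl_read(const struct page_ftl *ftl, uint64_t lpage)
{
    return ftl->l2p[lpage];
}

int main(void)
{
    struct page_ftl *ftl = ftl_create(1024);
    if (!ftl) return 1;
    uint32_t ppa = ftl_write(ftl, 42);      /* remap logical page 42 */
    int ok = (ftl_read(ftl, 42) == ppa);
    free(ftl->l2p);
    free(ftl);
    return ok ? 0 : 1;
}
```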
We did not implement the following software features for V1:
Flash Management / Data & Storage Management / Out-of-Band Management
Redfish/Swordfish integration for Performance and Resource Monitor
Host FTLs: Garbage Collection / Wear Leveling
I/O isolation at flash die or channel boundaries (instead, it is implemented at SSD boundaries)
Recommended software versions are:
Linux OS: CentOS 7 or later, Ubuntu 16.04 LTS or later
SPDK: 17.10 or later
Kubernetes: 1.10 or later (Docker and Kubernetes features have been tested only with Ubuntu)
Availability
Currently, CircuitBlvd is offering Cinabro Platform V1 to selected partners. Interested parties can contact info@circuitblvd.com for additional information.
Acknowledgments
We thank the following entities for their support in building Cinabro Platform V1:
CRZ and Hanyang University for providing OpenSSD technologies and hardware
Supermicro for providing the Intel Xeon server and related information
SPDK community for the open source user space storage software framework