Interview: With Release 2.4, Lustre Moves Up with Clustered Metadata and ZFS

Today OpenSFS announced the general availability of the Lustre 2.4 file system, a feature release that significantly extends Lustre's capabilities. To learn more, I caught up with Galen Shipman, Chairman of OpenSFS.

insideHPC: How does this move Lustre closer to Big Data?

Galen Shipman: Big Data, or data-intensive, workloads increasingly require a file system capable of supporting the concurrent processing of tens or even hundreds of thousands of files within a single file system namespace. We are seeing these requirements across a variety of areas, from social media (Yahoo! and Facebook) to bioinformatics, data capture from large-scale sensor networks, and other data-intensive workloads. Take MapReduce workloads, for example. To scale a MapReduce workload you need a file system namespace whose performance scales, because your MapReduce tasks may create and access many thousands of files concurrently. With DNE, more commonly known as “Clustered Metadata,” Lustre is now capable of scaling file system namespace performance in much the same way that it scales I/O bandwidth and capacity today.

insideHPC: What are the advantages of DNE and Clustered Metadata?

Galen Shipman: Lustre has long provided horizontal scalability of both capacity and I/O bandwidth. When a user requires more bandwidth and capacity, you simply increase the number of Lustre Object Storage Servers (OSSs) in the cluster. This has served HPC environments well, but when we look across the broader Big Data space we increasingly see the need to scale metadata performance horizontally as well. I’m pleased to report that Lustre now provides this capability. When workloads demand higher metadata throughput, you simply increase the number of Lustre Metadata Servers in the cluster. Other Big Data oriented file systems such as HDFS have recognized this need, but unlike the HDFS solution, which federates the namespace, Lustre provides horizontal scalability in a single namespace. We have been eagerly awaiting this feature. It provides Lustre with an entirely new dimension of scalability.

insideHPC: What are the advantages of integrating Lustre with ZFS?

Galen Shipman: ZFS provides an extremely rich set of features as a backend storage target for Lustre OSSs. Most notably, ZFS is able to scale storage performance and capacity vertically on a single server while providing high-end resiliency and data integrity features. These features include checksums in metadata, advanced RAID levels (RAID-Z), native support for SSD accelerators via the ZFS intent log and L2ARC, and nearly instantaneous snapshotting. The list goes on and on. These advanced features further improve the resiliency and scalability of Lustre.

insideHPC: Isn’t ZFS too slow for an HPC environment?

Galen Shipman: The first port of ZFS to Lustre used FUSE, Filesystem in Userspace. More recently, LLNL created a native kernel-based port of ZFS for Linux. The native port does not exhibit the performance limitations inherent to a FUSE port. ZFS is a modern, fast, scalable file system. ZFS and other copy-on-write file systems such as Btrfs hold advantages over file systems such as ext4, particularly in HPC environments. Copy-on-write is particularly well suited to wringing the highest performance out of the highly random write workloads that are typically seen on Lustre servers. Built-in compression also serves to improve apparent throughput for some HPC applications. These features can yield significantly better performance for real-world workloads.

insideHPC: Are there licensing issues with ZFS?

Galen Shipman: Now recognize that I’m not a lawyer, but my understanding of the issue is as follows. The Linux kernel is licensed under the GNU General Public License Version 2 (GPLv2), while ZFS is licensed under the Common Development and Distribution License (CDDL). Both licenses are true Free Software, copyleft licenses. There are, however, differences between these licenses that prevent mixing of software licensed under the two. This prevents compiling ZFS directly within the kernel and then distributing it. Fortunately, this isn’t necessary.

insideHPC: Does not including ZFS in the packages help work around these issues?

Galen Shipman: As designed, ZFS is built independently as a kernel module and avoids using kernel APIs that are flagged GPL-only. While there is some debate on this issue in the free software community, many companies distribute kernel modules in a similar fashion today, holding that a kernel module does not constitute a derivative work and therefore does not violate the kernel’s license.

insideHPC: Who would provide support for ZFS in this kind of integrated environment?

Galen Shipman: Intel’s High-Performance Data Division is providing support for Lustre 2.4 today. As ZFS in Lustre is entirely new with the 2.4 release, I anticipate that the market for support will evolve over the coming months, and we are likely to see more support options to complement Intel’s offering.

insideHPC: This is a feature release. How does it measure up in terms of stability?

Galen Shipman: The Lustre engineers across the community have been very focused on stability, more so than at any other point in Lustre’s history. This is a natural maturation, and the development community should be applauded for its efforts. The level of rigor that is now fully embedded in their development processes, from detailed code inspections to component-level testing and integrated testing at scale, is remarkable. As with any major new feature release, there will certainly be some new issues encountered, but keep in mind that Lustre 2.4 is the new stable release series, and we will continue to see new maintenance releases of 2.4 well into the future.
