HybridFS - a high performance and balanced file system framework with multiple distributed file systems

Zhang, L, Wu, Y, Xue, R, Hsu, T, Yang, H and Chung, Y (2017) 'HybridFS - a high performance and balanced file system framework with multiple distributed file systems.' In: Proceedings: 2017 IEEE 41st Annual Computer Software and Applications Conference (COMPSAC). IEEE Computer Society, Los Alamitos, pp. 796-805. ISBN 9781538603673

Official URL: http://dx.doi.org/10.1109/COMPSAC.2017.140

Abstract

In the big data era, the distributed file system is becoming increasingly important because of its scale-out capability, high availability, and high performance. Different distributed file systems may have different design goals. For example, some are designed to perform well on small file operations, such as GlusterFS, while others are designed for large file operations, such as the Hadoop Distributed File System (HDFS). As big data applications diverge, a distributed file system may perform well for some applications but poorly for others; that is, there is no universal distributed file system that performs well for all applications. In this paper, we propose a hybrid file system framework, HybridFS, which can deliver satisfactory performance for all applications. HybridFS is composed of multiple distributed file systems and integrates their advantages. In HybridFS, on top of the multiple distributed file systems, we have designed a metadata management server that performs three functions: file placement, partial metadata store, and dynamic file migration. File placement is based on a decision tree. The partial metadata store holds files smaller than a few hundred bytes in order to increase throughput. Dynamic file migration balances the storage usage of the distributed file systems without throttling performance. We have implemented HybridFS in Java on eight nodes and chose Ceph, HDFS, and GlusterFS as the designated distributed file systems. The experimental results show that, in the best case, HybridFS improves the performance of read/write operations by up to 30% over a single distributed file system. In addition, if the difference in storage usage among the distributed file systems is less than 40%, the performance of HybridFS is guaranteed; that is, there is no performance degradation.
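The three functions of the metadata management server named in the abstract can be illustrated with a small sketch. This is not the authors' implementation. The abstract gives neither the decision tree nor any thresholds, so the size cut-offs, the mapping of mid-sized files to Ceph, the use of the 40% usage gap as a migration trigger, and the function names are all assumptions made for illustration.

```python
# Hypothetical thresholds; the abstract does not state exact values.
TINY_FILE_BYTES = 512              # stands in for "a few hundred bytes"
SMALL_FILE_BYTES = 1 * 1024**2     # assumed upper bound for small files
LARGE_FILE_BYTES = 64 * 1024**2    # assumed lower bound for large files
MAX_USAGE_GAP = 0.40               # usage difference bound quoted in the abstract


def place_file(size_bytes: int) -> str:
    """Toy decision tree keyed on file size only (an assumption)."""
    if size_bytes < TINY_FILE_BYTES:
        # Partial metadata store: tiny files stay on the metadata server.
        return "metadata-store"
    if size_bytes < SMALL_FILE_BYTES:
        # The abstract names GlusterFS as suited to small file operations.
        return "GlusterFS"
    if size_bytes >= LARGE_FILE_BYTES:
        # The abstract names HDFS as suited to large file operations.
        return "HDFS"
    # Assumption: everything in between goes to Ceph.
    return "Ceph"


def migration_plan(usage: dict[str, float]):
    """Pick a (source, target) pair when storage usage is too unbalanced.

    `usage` maps a backend name to the fraction of its capacity in use.
    Returns None while the gap between the fullest and emptiest backend
    stays below MAX_USAGE_GAP.
    """
    fullest = max(usage, key=usage.get)
    emptiest = min(usage, key=usage.get)
    if usage[fullest] - usage[emptiest] < MAX_USAGE_GAP:
        return None
    return fullest, emptiest


if __name__ == "__main__":
    for size in (100, 10_000, 10 * 1024**2, 128 * 1024**2):
        print(size, "->", place_file(size))
    print(migration_plan({"Ceph": 0.5, "HDFS": 0.9, "GlusterFS": 0.3}))
```

A real decision tree would likely branch on more than file size (for example access pattern or file type), and a real migration step would also choose which files to move; both are left out here because the abstract does not describe them.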

Item Type: Book Chapter or Section
Note: ISSN 0730-3157

Divisions: Bath School of Design
Identification Number: https://doi.org/10.1109/COMPSAC.2017.140
Date Deposited: 02 May 2017 14:46
Last Modified: 05 Jan 2022 16:07
URI / Page ID: https://researchspace.bathspa.ac.uk/id/eprint/9518