Stabilizing and boosting I/O performance for file systems with journaling on NVMe SSD
Many journaling file systems now use Non-Volatile Memory Express (NVMe) solid-state drives (SSDs) as external journal devices to improve input/output (I/O) performance. However, under microwrite workloads, which are typical of many applications, these file systems suffer severe I/O fluctuations, and NVMe SSD utilization is extremely low. Our experimental results indicate that this phenomenon arises mainly because writing data back to the backend file system on hard disk drives (HDDs) is much slower than journal writing, which causes journal writing to freeze frequently under the two-phase (journal write, then writeback) mechanism. We therefore propose a merging-in-memory (MIM) acceleration architecture to stabilize and boost I/O performance for such journaling file systems. MIM employs an efficient in-memory data structure of hash-table-based multiple linked lists, which not only merges random microwrites into large sequential blocks to speed up writebacks but also reduces the frequency of write addressing and of object opening and closing. Using a prototype implementation in Ceph FileStore, we experimentally show that MIM not only eliminates severe fluctuations but also improves I/O operations per second (IOPS) by roughly 1×–12× and reduces write latency by 75%–98%.
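The abstract describes MIM only at a high level. The following is a minimal sketch, assuming a hash table keyed by object ID whose entries hold offset-sorted, non-overlapping extent lists (a sorted Python list stands in for the paper's linked lists). It shows how random microwrites to the same object can coalesce in memory so that a later writeback issues fewer, larger sequential writes. The class `MergeBuffer` and its `write`/`flush` methods are hypothetical names for illustration only; they are not Ceph FileStore APIs or the authors' code.

```python
from collections import defaultdict


class MergeBuffer:
    """Hypothetical merging-in-memory buffer (illustrative, not the paper's code).

    Hash table: object id -> list of [offset, bytearray] extents, kept sorted
    by offset, with no two extents overlapping or touching each other.
    """

    def __init__(self):
        self.table = defaultdict(list)

    def write(self, obj, offset, data):
        """Buffer one microwrite, merging it with overlapping/adjacent extents."""
        new_start, new_end = offset, offset + len(data)
        merged_start, merged_end = new_start, new_end
        keep, touching = [], []
        for ext in self.table[obj]:
            start, buf = ext
            end = start + len(buf)
            if end < new_start or start > new_end:
                keep.append(ext)          # strictly separate: leave as is
            else:
                touching.append(ext)      # overlaps or is adjacent: merge
                merged_start = min(merged_start, start)
                merged_end = max(merged_end, end)
        # Every touching extent meets the new range, so the union is contiguous.
        merged = bytearray(merged_end - merged_start)
        for start, old in touching:
            merged[start - merged_start:start - merged_start + len(old)] = old
        # Newer data overwrites older buffered bytes in the overlapping region.
        merged[new_start - merged_start:new_end - merged_start] = data
        keep.append([merged_start, merged])
        keep.sort(key=lambda ext: ext[0])
        self.table[obj] = keep

    def flush(self):
        """Return {object id: [(offset, bytes), ...]} in offset order, then clear.

        Grouping by object means each object would be opened and closed once
        per flush, and each extent becomes one sequential backend write.
        """
        batches = {obj: [(start, bytes(buf)) for start, buf in exts]
                   for obj, exts in self.table.items()}
        self.table.clear()
        return batches


if __name__ == "__main__":
    buf = MergeBuffer()
    # Eight 4 KiB microwrites to one object, arriving in random order.
    for i in [3, 0, 7, 1, 5, 2, 6, 4]:
        buf.write("obj_a", i * 4096, bytes([i]) * 4096)
    batches = buf.flush()
    print(len(batches["obj_a"]))        # 1: a single sequential write
    print(len(batches["obj_a"][0][1]))  # 32768 bytes
```

In this sketch, eight random 4 KiB writes collapse into one 32 KiB sequential write. It deliberately ignores durability, concurrency, memory limits, and ordering relative to the journal, all of which a real implementation inside FileStore would have to handle.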
Acknowledgements
This work was supported in part by National Key R&D Program of China (Grant No. 2018YFB1004704), National Natural Science Foundation of China (Grant Nos. 61832005, 61872171), Natural Science Foundation of Jiangsu Province (Grant No. BK20190058), Key R&D Program of Jiangsu Province (Grant No. BE2017152), Science and Technology Program of State Grid Corporation of China (Grant No. 52110418001M), and Collaborative Innovation Center of Novel Software Technology and Industrialization.
Author information
Authors and Affiliations
- National Key Laboratory for Novel Software Technology, Nanjing, 210023, China Lin Qian, Bin Tang, Baoliu Ye, Jianyu Wu, Xiaoliang Wang & Sanglu Lu
- State Grid Electric Power Research Institute, Nanjing, 211000, China Lin Qian