Stabilizing and boosting I/O performance for file systems with journaling on NVMe SSD

Many journaling file systems currently use Non-Volatile Memory Express (NVMe) solid-state drives (SSDs) as external journal devices to improve input/output (I/O) performance. However, under microwrite workloads, which are typical of many applications, they suffer from severe I/O fluctuations, and NVMe SSD utilization is extremely low. Our experimental results indicate that this phenomenon arises mainly because writing data back to backend file systems on hard disk drives is much slower than journal writing, so the two-phase journaling mechanism frequently forces journal writing to stall. We therefore propose a merging-in-memory (MIM) acceleration architecture to stabilize and boost the I/O performance of such journaling file systems. MIM employs an efficient in-memory data structure, a hash table of multiple linked lists, which not only merges random microwrites into large sequential blocks to speed up writeback but also reduces the frequency of write addressing and of object opening and closing. Using a prototype implementation in Ceph FileStore, we show experimentally that MIM not only eliminates the severe fluctuations but also improves I/O operations per second (IOPS) by roughly 1×–12× and reduces write latency by 75%–98%.
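
To make the abstract's data structure concrete, the following is a minimal C++ sketch of a MIM-style merging buffer: a hash table maps each object to a linked list of buffered microwrites kept sorted by offset, so that a flush emits one large sequential write per contiguous extent and touches each object only once. All names here (MimBuffer, Microwrite, flush_threshold) and the flush policy are illustrative assumptions, not the paper's actual implementation.

```cpp
// Sketch of a merging-in-memory (MIM) style buffer: a hash table maps each
// object ID to a linked list of buffered microwrites, kept sorted by offset
// so that a flush can coalesce adjacent writes into large sequential blocks.
// All names and the size-triggered flush policy are illustrative only.
#include <cstdint>
#include <iostream>
#include <list>
#include <unordered_map>
#include <vector>

struct Microwrite {
    uint64_t offset;           // offset within the target object
    std::vector<char> data;    // payload of the small random write
};

class MimBuffer {
public:
    explicit MimBuffer(size_t flush_threshold) : threshold_(flush_threshold) {}

    // Insert a microwrite into its object's list, preserving offset order so
    // adjacent writes can coalesce into one sequential extent at flush time.
    void insert(uint64_t object_id, Microwrite w) {
        auto& lst = table_[object_id];
        auto it = lst.begin();
        while (it != lst.end() && it->offset < w.offset) ++it;
        lst.insert(it, std::move(w));
        if (++buffered_ == threshold_) flush_all();
    }

    // Flush: walk each object's sorted list and emit merged extents. A real
    // backend would open the object once, write the merged blocks, and close
    // it once, amortizing addressing and open/close costs over many writes.
    void flush_all() {
        for (auto& [oid, lst] : table_) {
            uint64_t start = 0, end = 0;
            std::vector<char> merged;
            for (auto& w : lst) {
                if (!merged.empty() && w.offset != end) {
                    write_back(oid, start, merged);  // gap found: emit extent
                    merged.clear();
                }
                if (merged.empty()) start = w.offset;
                merged.insert(merged.end(), w.data.begin(), w.data.end());
                end = w.offset + w.data.size();
            }
            if (!merged.empty()) write_back(oid, start, merged);
        }
        table_.clear();
        buffered_ = 0;
    }

private:
    // Stand-in for the writeback to the backend file system.
    static void write_back(uint64_t oid, uint64_t off,
                           const std::vector<char>& buf) {
        std::cout << "object " << oid << ": sequential write of "
                  << buf.size() << " bytes at offset " << off << '\n';
    }

    std::unordered_map<uint64_t, std::list<Microwrite>> table_;
    size_t buffered_ = 0;
    size_t threshold_;
};

int main() {
    MimBuffer buf(4);
    buf.insert(7, {4096, std::vector<char>(512, 'a')});
    buf.insert(7, {0,    std::vector<char>(4096, 'b')});
    buf.insert(7, {4608, std::vector<char>(512, 'c')});
    buf.insert(9, {0,    std::vector<char>(1024, 'd')});  // triggers flush
}
```

In this sketch, the three random microwrites to object 7 coalesce into a single 5120-byte sequential write at offset 0, illustrating how merging in memory turns many small random writebacks into a few large sequential ones while opening each object only once per flush.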

Acknowledgements

This work was supported in part by the National Key R&D Program of China (Grant No. 2018YFB1004704), the National Natural Science Foundation of China (Grant Nos. 61832005, 61872171), the Natural Science Foundation of Jiangsu Province (Grant No. BK20190058), the Key R&D Program of Jiangsu Province (Grant No. BE2017152), the Science and Technology Program of State Grid Corporation of China (Grant No. 52110418001M), and the Collaborative Innovation Center of Novel Software Technology and Industrialization.

Author information

Authors and Affiliations

  1. National Key Laboratory for Novel Software Technology, Nanjing, 210023, China Lin Qian, Bin Tang, Baoliu Ye, Jianyu Wu, Xiaoliang Wang & Sanglu Lu
  2. State Grid Electric Power Research Institute, Nanjing, 211000, China Lin Qian