Spotlight on Dr. Hong Jiang

Dr. Hong Jiang, a CSE professor and the Director of the Abacus Distributed Storage Lab (ADSL), recently published his article "Semantic-Aware Metadata Organization Paradigm in Next Generation File Systems" in Vol. 23 of IEEE Transactions on Parallel and Distributed Systems. His research includes computer systems architecture, cluster and grid computing, and IT support for distance education. He takes time to talk about his academic interests and teaching.

Bits & Bytes (BB): Your article "Semantic-Aware Metadata Organization Paradigm in Next Generation File Systems" was recently published. Can you provide a short abstract of this article?

Hong Jiang (HJ): It was published in IEEE Transactions on Parallel and Distributed Systems, February 2012, Vol. 23, No. 2, pp. 337-344, with Yu Hua (my former postdoc, 2010-2011, now an associate professor at Huazhong University of Science & Technology, or HUST), Yifeng Zhu (associate professor at the University of Maine, my research collaborator, and former Ph.D. student), Dan Feng (professor at HUST, my research collaborator, and former postdoc), and Lei Tian (my current postdoc at UNL). Here is the abstract of the article:

Existing data storage systems based on the hierarchical directory-tree organization do not meet the scalability and functionality requirements for exponentially growing data sets and increasingly complex metadata queries in large-scale, Exabyte-level file systems with billions of files. This paper proposes a novel decentralized semantic-aware metadata organization, called SmartStore, which exploits semantics of files' metadata to judiciously aggregate correlated files into semantic-aware groups by using information retrieval tools. The key idea of SmartStore is to limit the search scope of a complex metadata query to a single or a minimal number of semantically correlated groups and avoid or alleviate brute-force search in the entire system. The decentralized design of SmartStore can improve system scalability and reduce query latency for complex queries (including range and top-k queries). Moreover, it is also conducive to constructing semantic-aware caching, and conventional filename-based point query. We have implemented a prototype of SmartStore and extensive experiments based on real-world traces show that SmartStore significantly improves system scalability and reduces query latency over database approaches. To the best of our knowledge, this is the first study on the implementation of complex queries in large-scale file systems.
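The key idea in the abstract, limiting a complex query to the few groups that can contain matches, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not SmartStore's implementation: files are bucketed by a coarse grid over two hypothetical attributes (size in KB, age in days) instead of the information retrieval tools the paper uses, and the names `Group`, `build_groups`, and `range_query` are invented. Each group keeps a bounding box of its members' attributes, so a range query can skip any group whose box does not overlap the query.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Group:
    """A set of correlated files plus the bounding box of their attributes."""
    files: list = field(default_factory=list)
    lo: list = None
    hi: list = None

    def add(self, name, attrs):
        self.files.append((name, attrs))
        if self.lo is None:
            self.lo, self.hi = list(attrs), list(attrs)
        else:
            self.lo = [min(a, b) for a, b in zip(self.lo, attrs)]
            self.hi = [max(a, b) for a, b in zip(self.hi, attrs)]

    def overlaps(self, q_lo, q_hi):
        # True when the group's bounding box intersects the query box.
        return all(l <= qh and h >= ql
                   for l, h, ql, qh in zip(self.lo, self.hi, q_lo, q_hi))


def build_groups(files, cell):
    """Bucket files by grid cell (a stand-in for semantic grouping)."""
    groups = defaultdict(Group)
    for name, attrs in files.items():
        key = tuple(int(a // c) for a, c in zip(attrs, cell))
        groups[key].add(name, attrs)
    return list(groups.values())


def range_query(groups, q_lo, q_hi):
    """Return matching file names and the number of groups actually searched."""
    hits, visited = [], 0
    for g in groups:
        if not g.overlaps(q_lo, q_hi):
            continue  # prune: no file in this group can match
        visited += 1
        for name, attrs in g.files:
            if all(ql <= a <= qh for a, ql, qh in zip(attrs, q_lo, q_hi)):
                hits.append(name)
    return sorted(hits), visited


# Hypothetical metadata: name -> (size_kb, age_days)
files = {
    "a.log": (10, 1), "b.log": (12, 1),
    "c.dat": (500, 9), "d.dat": (520, 9),
    "e.tmp": (15, 5),
}
groups = build_groups(files, cell=(100, 4))
hits, visited = range_query(groups, q_lo=(0, 0), q_hi=(20, 2))
print(hits, f"searched {visited} of {len(groups)} groups")
```

In this toy data set the query for small, recent files touches only one of the three groups; the other two are pruned by their bounding boxes without inspecting any of their files. Top-k queries and the decentralized placement of groups across servers are outside the scope of this sketch.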

BB: What can students who take your Computer Architecture class expect? Do you know which courses you will be teaching next year?

HJ: Students who wish to take my Computer Architecture class can expect to learn the fundamentals and design principles behind today’s computer systems, from laptops to desktops, from smartphones to tablets, from server machines to data centers and cloud computing. The course takes a quantitative approach to the design of the processor, memory hierarchy, and I/O, and to their interactions. It threads together the key computer architecture concepts in a hands-on, semester-long design project in which students work in teams of 3-4 to design an actual pipelined processor, complete with caches and dynamic branch prediction.

In addition to the Computer Architecture class that recurs annually, I am scheduled to teach the following classes in the near future: Storage and File Systems, Advanced Computer Architecture, High-Performance Processor Architectures, and Computer Engineering Senior Design Projects.

BB: Can you explain the research and current projects of the Abacus Distributed Storage Laboratory (ADSL)?

HJ: The name of the lab was coined by one of my former Ph.D. students in the early 2000s (2003-2004) when my research group embarked on a new direction of distributed storage systems. Abacus, an ancient Chinese invention used to carry out arithmetic operations and represent numbers, is arguably the oldest information storage and processing instrument in the world, and my student thought it was appropriate and fitting to prefix it to the name of the research lab. The lab focuses its research on storage and file systems to address the challenges of “Big Data”, a new tech phrase coined recently to refer to the huge volume, high velocity and great variety (i.e., the 3 Vs) with which digital data are being produced in the world. The lab currently has the following four NSF-funded research projects:

• SANE: Semantic-Aware Namespace in Exascale File Systems (NSF CNS-1116606)

This project investigates a novel semantic-aware namespace scheme to provide dynamic and adaptive management and support typical file-based operations in Exascale file systems. The project leverages semantic correlations among files and exploits the evolution of metadata attributes to support customized namespace management, with the end goal of efficiently facilitating file identification and end users' data lookup. This project provides significant performance improvements for existing file systems. Since Exascale file systems constitute one of the backbones of the high-performance computing infrastructure, the semantic-aware techniques also benefit a great number of scientific and engineering data-intensive applications.

• Turbo Button: A Semantically-Smart SSD-Based RAID System for Internet-Scale Applications (NSF CNS-1016609)

This project seeks to develop a Semantically-Smart SSD (S4D) framework to explore and exploit the file system and application semantic information to boost the performance and improve the reliability of flash-memory SSDs. First, this project qualitatively and quantitatively identifies the critical issues for existing flash-memory SSDs, and conveys the file-system block liveness and correlation information to the underlying S4D through the standard or a modified block interface. Second, S4D exploits the block liveness information to efficiently supplement the log-block pool with free blocks and to reduce the FTL block mapping table size. Finally, based on S4D, Turbo Button, an SSD-HDD hybrid RAID storage system, will be designed and constructed to judiciously leverage the advantages of HDDs and address the problems of applying RAID algorithms directly to SSDs.

• ProActive: A RAID Protection Activator for High Availability (NSF IIS-0916859)

This project seeks to develop a holistic framework, called a RAID protection activator ("ProActive"), to address the fundamental and ever-increasing availability challenge facing RAID-structured storage systems. ProActive exploits application workload intensity and data/parity management, and intelligently leverages the rich spare storage resources available in large-scale data centers, to address the efficiency problem of the existing state-of-the-art availability mechanisms for RAID. ProActive will develop solutions to handle the increasingly frequent partial and complete disk failures in RAID-structured storage systems, with the design goal of significantly supplementing and improving existing fault-detection, fault-tolerance, and fault-recovery mechanisms.

• A New Semantic-Aware Metadata Organization for Improved File-System Performance and Functionality in High-End Computing (NSF CCF-0937993)

The research has four major components: 1) exploit metadata semantic-correlation to organize metadata in a scalable way, 2) exploit the semantic and scalable nature of the new metadata organization to significantly speed up complex queries and improve file system functionality, 3) fully leverage the semantic-awareness of the new metadata organization to optimize storage system designs, such as caching, prefetching, and data de-duplication, and 4) implement the new metadata organization, complex query functions, and system design optimizations in large-scale storage systems.

BB: Are there any other research projects that you are currently working on?

HJ: Yes. I am currently collaborating with my CSE colleagues Professors Sharad Seth and Witty Srisa-An on a couple of projects. In the first, with Sharad, we have been developing optimized resource management for the last-level caches (LLCs) in the multicore and manycore processors that dominate today’s computer systems. Our co-supervised Ph.D. student Dongyuan Zhan is the main driving force behind this project, which has noticeably advanced the state of the art and has produced three top-tier conference publications (IPDPS’10, MICRO’10, and ICS’12), with more expected. The second, with both Witty and Sharad, along with our co-supervised Ph.D. students, aims to optimize the performance of multithreaded applications in manycore-based systems by taking a cross-layer and semantic-aware approach across the architecture, OS, runtime system, and programming language layers of abstraction in modern-day computer systems. While this project started only recently, we are very excited by its promising prospects and expect it to produce results that will have very high impact.

BB: You have an impressive list of professional activities posted on your faculty page, which demonstrates a true passion for what you do. What initially sparked your interest in computer science, and what motivates you to continue your research?

HJ: I am glad you asked this question, particularly the first part (i.e., the initial sparks part)! So first things first. The really honest answer to that question is a combination of coincidence and pure luck! To elaborate, I must go back some 34 years, to when I applied for college in China. It is necessary to explain that it was the first year after the Cultural Revolution (CR) in China when colleges were reopened for regular folks (yes, all colleges were closed for regular folks during the ten years of the CR!). Each applicant had to take a national college entrance exam (similar to the ACT), the results of which were used as the ONLY criterion for admission. Also, we filled out our application forms, along with a choice of majors, after the exam results were announced. Since I scored the highest in math among all my exam subjects and I liked to assemble radios as a hobby in high school, I decided to choose a major that would combine my math strength and my interest in electronics. As it turned out, the word “Computer” in Chinese was literally “Electronic Calculating Machine” back then! Bingo! Computer Science was my choice of major in college!

Another point I must make is that in Chinese colleges, one rarely, if ever, changes his/her major once enrolled. As a result, I actually chose my major half blindly and stuck to it, and, as pure luck would have it, it turned out to be a perfect choice for me! The more I studied computer science and engineering, the more I was attracted to it by the sheer elegance of its algorithms and architecture designs and the gratification of seeing homework and projects actually taking effect in front of my own eyes. As I became a professor, I continued to get a sense of gratification and accomplishment from being able to impart knowledge to students and seeing them graduate and go on to become successful engineers, researchers, and professors and make an impact on society, which in turn motivates me to continue to teach, supervise students, and involve myself in cutting-edge research.

Further information on Jiang's research can be found at http://cse.unl.edu/~jiang/.

Bits & Bytes Wed. March 28, 2012