Dongyuan Zhan will defend his PhD dissertation, "Spatiotemporal Capacity Management For The Last Level Caches Of Chip Multiprocessors," on Tuesday, August 28, at 4 p.m. in 347 Avery Hall.
Abstract:
Judicious management of on-chip last-level caches (LLCs) is becoming increasingly important for alleviating the memory wall problem of chip multiprocessors (CMPs). Although many capacity-management proposals already exist, each falls into either the spatial or the temporal dimension, and none captures or utilizes the inherent interplay between the two. This dissertation therefore explores and exploits the spatiotemporal interactions in LLC capacity management to improve CMP performance. Based on this general idea, we address four specific research problems in the dissertation.
First, for the private LLC organization, the prior-art dynamic spill-receive (DSR) paradigm improves the efficacy of inter-core cooperative caching at the coarse-grained application level. However, DSR remains suboptimal because it cannot exploit the diverse capacity demands that exist at the fine-grained set level. We introduce the SNUG LLC design, which exploits the set-level non-uniformity of capacity demands in inter-core cooperative caching and thus further improves on DSR.
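To make the spill-receive idea concrete, here is a deliberately minimal sketch, not the actual DSR or SNUG mechanism: a "spiller" private cache pushes its evicted lines into a "receiver" peer instead of dropping them, so a later local miss can be satisfied by a peer. All class and method names are hypothetical, and the real designs decide spiller/receiver roles dynamically (DSR per application, SNUG per set).

```python
# Illustrative spill-receive cooperative caching between private caches.
# Hypothetical sketch only; DSR/SNUG use dynamic, monitored role selection.

class PrivateCache:
    def __init__(self, capacity, is_spiller):
        self.capacity = capacity
        self.lines = []              # ordered oldest -> newest (LRU order)
        self.is_spiller = is_spiller

    def access(self, tag, peers):
        if tag in self.lines:        # local hit: move line to MRU position
            self.lines.remove(tag)
            self.lines.append(tag)
            return "hit"
        for peer in peers:           # check peers for a previously spilled copy
            if tag in peer.lines:
                peer.lines.remove(tag)
                self._insert(tag, peers)
                return "peer-hit"
        self._insert(tag, peers)     # true miss: fetch from memory and insert
        return "miss"

    def _insert(self, tag, peers):
        if len(self.lines) >= self.capacity:
            victim = self.lines.pop(0)           # evict the LRU line
            if self.is_spiller:                  # spiller: park it in a receiver
                for peer in peers:
                    if not peer.is_spiller and len(peer.lines) < peer.capacity:
                        peer.lines.append(victim)
                        break
        self.lines.append(tag)
```

For example, with a two-line spiller and a roomier receiver, a line evicted from the spiller is later found in the receiver as a "peer-hit" rather than going all the way to memory.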
Second, still for private LLC management, we find that neither spatial nor temporal management schemes, working separately as in prior work, can deliver robust performance under all circumstances, again because of set-level non-uniform capacity demands. We propose a novel adaptive scheme, called STEM, that solves the problem by concurrently and dynamically managing both the spatial and the temporal dimensions of capacity demands at the set level.
Third, for the shared LLC organization, existing proposals aim to overcome the weaknesses of the least recently used (LRU) replacement policy by optimizing either locality or utility for heterogeneous workloads. We find that none of them consistently delivers the best performance across a variety of workloads, because applications differ in both their locality and their utility characteristics. To address this problem, we present the CLU LLC design, which co-optimizes the locality and utility of co-scheduled threads, enabling CLU to adapt to more diverse workloads than the state of the art.
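The "utility" dimension can be pictured with a small sketch, assuming hypothetical per-core hit curves rather than anything measured in the dissertation: given each core's hits as a function of allocated cache ways, choose the partition of a shared cache that maximizes total hits. This is the classic utility-based partitioning idea that CLU builds on, not CLU's own algorithm.

```python
# Illustrative utility-based way partitioning for a shared cache.
# Exhaustive search for clarity; real hardware uses cheap heuristics.

def best_partition(hit_curves, total_ways):
    """hit_curves[i][w] = hits core i achieves with w ways (w = 0..total_ways).
    Returns (allocation_tuple, total_hits) maximizing the sum of hits."""
    n = len(hit_curves)
    best_alloc, best_hits = None, -1

    def search(core, remaining, alloc, hits):
        nonlocal best_alloc, best_hits
        if core == n:
            if remaining == 0 and hits > best_hits:
                best_alloc, best_hits = tuple(alloc), hits
            return
        for w in range(remaining + 1):
            search(core + 1, remaining - w, alloc + [w],
                   hits + hit_curves[core][w])

    search(0, total_ways, [], 0)
    return best_alloc, best_hits
```

With one core whose hits saturate after a single way (e.g. a streaming thread) and one that keeps benefiting from more capacity, the search gives the saturating core one way and the rest to its partner, which LRU's demand-driven sharing would not do.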
Finally, to make a cache management strategy practical for industry, the per-cacheline overhead of the re-reference prediction value (RRPV) must be reduced. We observe that carefully tuned replacement policies built on single-bit RRPVs can closely approximate the performance of their counterparts with log2(associativity)-bit RRPVs. We therefore propose a novel practical shared LLC design, called COOP, which needs just one bit per cacheline for the RRPV plus a lightweight per-core monitor to conduct locality and utility co-optimization. At a considerably lower storage cost, COOP achieves higher performance than two recent practical solutions that rely on 2-bit RRPVs but optimize for locality only.
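As a rough illustration of what a single-bit RRPV buys, the sketch below implements an NRU-style policy for one cache set: a hit sets the line's bit to 0 (near re-reference predicted), the victim is any line whose bit is 1 (distant), and if no such line exists all bits are aged to 1. This is a generic one-bit scheme with hypothetical names, not the COOP design itself.

```python
# One-bit-RRPV (NRU-style) replacement for a single cache set.
# Illustrative sketch; COOP adds per-core monitoring on top of such a policy.

class OneBitRRPVSet:
    def __init__(self, ways):
        self.ways = ways
        self.tags = [None] * ways
        self.rrpv = [1] * ways          # 1 = distant re-reference predicted

    def access(self, tag):
        if tag in self.tags:            # hit: predict near re-reference
            self.rrpv[self.tags.index(tag)] = 0
            return True
        self._insert(tag)               # miss: pick a victim and fill
        return False

    def _insert(self, tag):
        while True:
            for i in range(self.ways):
                if self.rrpv[i] == 1:   # victim: first line predicted distant
                    self.tags[i] = tag
                    self.rrpv[i] = 0    # new line starts as near re-reference
                    return
            for i in range(self.ways):  # no victim found: age every line
                self.rrpv[i] = 1
```

With log2(associativity) bits the policy could rank lines more finely; the observation above is that one bit, used carefully, recovers most of that benefit at a fraction of the storage.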