Chip Multiprocessors

Overview

Our investigation into chip multiprocessors (CMPs) covers (i) improving the storage efficiency of cache coherence directories for better CMP scalability and (ii) the design and optimization of bufferless on-chip networks (NoCs). We have developed two mechanisms for improving directory storage efficiency and evaluated both thoroughly in gem5, a cycle-accurate simulation environment.

The first mechanism lowers the area overhead of cache-coherent CMPs by combining a novel relinquishment coherence technique with improved directory efficiency. Relinquishment coherence boosts the utilization of a hash-based, set-associative table that holds distinct present-bit vectors (PVs). The improved directory efficiency results from shrinking the table width by dropping the “runs of zeros” commonly found in PVs.

The second mechanism, pursued in this project year, is a framework for drastically lowering the on-chip directory storage of CMPs. Called non-uniform directory architecture (NUDA), the framework adopts a small on-chip directory vector (DV) buffer while provisioning a full-fledged backing store in off-chip memory, with each store entry designated statically for the DV (and the DV only, without any additional bits) of a corresponding last-level cache (LLC) block. NUDA intelligently promotes vector transfer between on-chip and off-chip storage to sustain execution performance.
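The NUDA organization described above can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the evaluated hardware design: the class name, the LRU replacement policy, the write-back-on-eviction behavior, and the access counter are all hypothetical choices made for illustration, since the overview does not specify how vectors are selected for transfer.

```python
from collections import OrderedDict


class DirectoryVectorBuffer:
    """Toy model: a small on-chip DV buffer in front of an off-chip backing store."""

    def __init__(self, num_llc_blocks, buffer_entries):
        # Backing store: exactly one DV slot per LLC block, statically indexed
        # by block number, so no tag or extra bit is stored alongside the DV.
        self.backing_store = [0] * num_llc_blocks
        # On-chip buffer: block index -> DV, kept in LRU order (assumed policy).
        self.buffer = OrderedDict()
        self.capacity = buffer_entries
        self.off_chip_accesses = 0  # hypothetical metric for this sketch

    def _load(self, block):
        """Make sure the DV of `block` is resident in the on-chip buffer."""
        if block in self.buffer:
            self.buffer.move_to_end(block)  # mark as most recently used
            return
        if len(self.buffer) >= self.capacity:
            # Evict the least recently used DV and write it back off chip.
            victim, dv = self.buffer.popitem(last=False)
            self.backing_store[victim] = dv
            self.off_chip_accesses += 1
        # Fetch the requested DV from its statically assigned slot.
        self.buffer[block] = self.backing_store[block]
        self.off_chip_accesses += 1

    def add_sharer(self, block, core):
        """Set the bit for `core` in the DV of `block`."""
        self._load(block)
        self.buffer[block] |= 1 << core

    def sharers(self, block):
        """Return the list of cores whose bit is set in the DV of `block`."""
        self._load(block)
        dv = self.buffer[block]
        return [c for c in range(dv.bit_length()) if (dv >> c) & 1]
```

With a two-entry buffer, touching a third block forces a write-back of the oldest vector, and a later lookup of that block fetches it again from its fixed off-chip slot. A real design would aim to keep frequently used vectors on chip so that such transfers stay off the critical path.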

Our work on bufferless NoC design and optimization addresses (1) containing packet deflection in bufferless NoCs to improve performance and save energy, (2) fairness and performance of NoCs under multiprogrammed workloads through Fairness-Aware Source Throttling (FAST), and (3) optimizing bufferless NoCs for multicast (i.e., one-to-many) and hotspot (i.e., many-to-one) traffic, aiming at high performance, low design complexity, and high energy efficiency.
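For readers unfamiliar with packet deflection, the sketch below shows generic deflection routing at a single router of a 2D mesh. It is an illustrative baseline, not the containment, throttling, or multicast schemes of this project. The function names, the oldest-first arbitration, and the port labels are assumptions made for the example. The sketch also assumes that flits addressed to the local node have already been ejected and that no more flits arrive than there are output ports.

```python
def productive_ports(cur, dst):
    """Output ports that move a flit closer to its destination on a 2D mesh."""
    (x, y), (dx, dy) = cur, dst
    ports = []
    if dx > x:
        ports.append("E")
    if dx < x:
        ports.append("W")
    if dy > y:
        ports.append("N")
    if dy < y:
        ports.append("S")
    return ports


def assign_ports(cur, flits, free_ports=("N", "E", "S", "W")):
    """Assign every incoming flit an output port in the same cycle.

    flits: list of (age, dst) tuples. Returns a list of
    (age, dst, port, deflected) tuples, oldest flit first.
    With no buffers, a flit that loses arbitration cannot wait:
    it is deflected to whatever port is still free.
    """
    free = list(free_ports)
    assigned = []
    # Oldest-first arbitration (assumed policy): older flits choose first.
    for age, dst in sorted(flits, key=lambda f: -f[0]):
        wanted = [p for p in productive_ports(cur, dst) if p in free]
        port = wanted[0] if wanted else free[0]  # deflect if nothing productive
        free.remove(port)
        assigned.append((age, dst, port, not wanted))
    return assigned
```

In this model, two flits that both want the east port cannot both get it: the older one wins and the younger one is sent out another port. Each deflection adds hops and link energy, which is why limiting deflections matters for both performance and energy.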