Predictability in Memory Hierarchies for Real-Time Multi-Core Embedded Systems


Marco Solieri

Presentation title

Predictability in Memory Hierarchies for Real-Time Multi-Core Embedded Systems

Authors

Marco Solieri

Institution(s)

Università degli studi di Modena e Reggio Emilia, Italy

Presentation type

Technical presentation

Abstract

Recent high-performance embedded platforms are paving the way towards the real-time execution of demanding applications in the cyber-physical domain. The architectural design of autonomous vehicles, smart manufacturing systems and advanced embedded controllers requires: 1) consolidating a variety of tasks, ranging from vision and machine learning to control routines, onto a single computing platform; 2) high computational power with a high degree of parallelism. Next-generation embedded multi-core architectures (NVIDIA, Xilinx, NXP) satisfy both needs, but unfortunately they sacrifice predictability, which is the zeroth requirement for real-time applications. Indeed, recent architectural trends have favoured design decisions that improve average performance while exposing corner cases that may significantly increase worst-case response times. Notable examples are out-of-order execution, speculative branching and automatic prefetching, but the most prominent weaknesses are located in the memory hierarchy. Firstly, many components are shared: the contention on the memory controller and on the large Last Level Cache (LLC) subsystems grows linearly with the number of cores and accelerators that generate it. In turn, the damage to the worst-case performance of real-time activities grows linearly with this contention. Secondly, last-level caches are getting larger and deeper (three-level systems have recently been announced for the upcoming NVIDIA Xavier), but they lack support for explicit locking or partitioning. They also use a pseudo-random replacement policy, which exposes long prefetch sequences to self-eviction of useful lines. We thus identified three specific issues and solved them at the software level by implementing real-time extensions to Jailhouse, a partitioned micro-hypervisor designed for safety-critical systems.
The solution offers simplicity, low overhead, programmability, and compatibility with legacy systems.
I. Contention on the cluster-shared last-level cache is prevented by cache colouring support, which partitions the LLC among hosted OSs or bare-metal applications by leveraging the ARMv8 virtualization extensions to set up their address translation tables appropriately.
II. Self-eviction of useful lines in caches with pseudo-random replacement is avoided by preventive invalidation: before fetching a given amount of new data, a corresponding number of cache locations are simply marked as invalid, so that they are deterministically selected for eviction.
III. Contention on shared RAM is tamed by the Predictable Execution Model (PREM), which grants the different hosted OSs or bare-metal applications exclusive, and therefore fast and highly predictable, access to main memory.
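The idea behind cache colouring can be illustrated with a small, hedged sketch. It is not the Jailhouse implementation: the cache geometry (2 MiB, 16 ways), the 4 KiB page size and all function names below are assumptions chosen for illustration. In a physically indexed cache, the page-frame bits that overlap the set index determine a "colour"; giving each guest only page frames of its own colours keeps the guests in disjoint cache sets.

```python
# Illustrative sketch of cache colouring arithmetic.
# All constants and names are hypothetical, not taken from Jailhouse.

PAGE_SIZE = 4096            # assumed 4 KiB pages
LLC_SIZE = 2 * 1024 ** 2    # assumed 2 MiB shared last-level cache
LLC_WAYS = 16               # assumed 16-way set associativity


def num_colours(llc_size=LLC_SIZE, ways=LLC_WAYS, page_size=PAGE_SIZE):
    """Number of page colours: the size of one cache way divided by the page size."""
    way_size = llc_size // ways
    return way_size // page_size


def colour_of(phys_addr):
    """Colour of the page frame that contains the physical address."""
    return (phys_addr // PAGE_SIZE) % num_colours()


def frames_for_colours(colours, n_frames):
    """Return the first n_frames page-frame addresses whose colour is in colours."""
    total = num_colours()
    allowed = set(colours)
    if not allowed or any(c < 0 or c >= total for c in allowed):
        raise ValueError("colours must be a non-empty subset of range(num_colours())")
    frames = []
    pfn = 0
    while len(frames) < n_frames:
        if pfn % total in allowed:
            frames.append(pfn * PAGE_SIZE)
        pfn += 1
    return frames


# Two hypothetical guests, each given half of the colours.
guest_a = frames_for_colours(range(0, 16), 20)
guest_b = frames_for_colours(range(16, 32), 20)
```

With this assumed geometry there are 32 colours, and the two guests receive page frames that can never map to the same cache sets. A hypervisor would install such frames in each guest's second-stage translation tables, so the guest itself needs no modification.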


Additional material

  • Presentation slides: [pdf]