The same topics that were treated for single processors in our elementary
system architecture lectures are treated here for multicore processors,
namely processors, compilers, and operating system kernels. The following
complications arise and are addressed in the lectures.
1) The manufacturers' documentation of the instruction sets of high-end
processors runs to thousands of pages, and any document with so many pages is
almost necessarily contradictory and incomplete. We have to boil this down
to a manageable size. Where the manuals do not provide all necessary details,
we have to make educated guesses by reverse engineering parts of the
processor's hardware. This concerns in particular the memory system.
2) In the full instruction set, many parts of the machine's hardware are
visible to the programmer that one would prefer to be hidden. This concerns in
particular: caches, the states of the cache coherence protocols used, store
buffers, and translation look-aside buffers (TLBs) of memory management units
(MMUs). Moreover, in general a multicore processor does NOT interleave
instructions of the sequential instruction set architecture (ISA); if it is
not properly configured and used, it interleaves a sort of microinstructions.
We will show how to configure a multicore processor such that it simultaneously
- runs in translated mode
- interleaves instructions of the ISA working on a sequentially consistent
shared memory.
3) We have to specify the semantics of parallel C. In general, compiled C
code running on a multicore machine does NOT interleave small-step semantics
steps of sequential C threads. We formalize the subset of parallel C known as
'structured parallel C' and show how to compile it such that it runs on the
machine model exhibited in 2). In particular, we have to consider here
volatile variables and their treatment by optimizing compilers.
4) In order to give semantics to assembler portions of C programs, we have to
specify the behavior of optimizing compilers for multicore machines, at least
as far as allocation functions are concerned. We give specifications which
guarantee that, when a C variable X is accessed by assembler code, its
current value (a nontrivial concept) is stored at &X (with an optimizing
compiler this is in general not the case).
5) We specify hypervisors for multicore machines. Hypervisors are operating
system (OS) kernels whose user processes are so-called partitions. These are
simply virtual multicore processors which are allowed to run in system mode,
i.e. in translated mode; thus each partition of a hypervisor can run its own
OS. The specification of hypervisors is very CVM-like (CVM is the generic OS
kernel from the elementary architecture lectures).
6) Hypervisors have to compose the translation of the host machine with the
translation of the guest partitions. On many processors there is
'virtualization support' for this in the form of 'nested page tables'; this
is hardware supporting the composition of two translations. If we have such
hardware and if the guest partitions do not run hypervisors but only
operating systems (i.e. they have only a single level of translation), then
hypervisor construction is very similar to the elementary case.
7) One can also construct hypervisors on processors without nested page
tables; after all, the composition of two translations is a translation. This
requires implementing page tables for the composed translation in the form
of a C data structure called 'shadow page tables' and redirecting the MMUs
to walk these data structures (which are simultaneously updated by other
cores). This leads to two very exciting subjects:
7.1. the design and correctness proof of a parallel page table algorithm and
7.2. the semantics of the parallel programming model in which this algorithm
is programmed. In this model we have
- C portions
- assembly portions
- user visible translations (hypervisors have to inspect guest page tables;
this happens to be implemented by memory relocation)
- MMUs traversing (and even writing) the shadow page tables (a C data
structure) in parallel with the C threads.
We plan to produce lecture notes. They will be ready one to two weeks
after the lectures.
Prerequisites: ideally computer architecture and system architecture as read by the author. If you have heard neither of these lectures, prepare to do some serious additional reading; we will point you to the appropriate texts.