From 785c02f7f8fc61e2137679438bfe7abe090ae519 Mon Sep 17 00:00:00 2001 From: Sebastian Huber Date: Thu, 2 Feb 2017 14:07:53 +0100 Subject: c-user: Add SMP low-level synchronization --- c-user/glossary.rst | 12 +++ c-user/symmetric_multiprocessing_services.rst | 122 +++++++++++++++++--------- images/c_user/smplock01fair-t4240.pdf | Bin 0 -> 33624 bytes images/c_user/smplock01fair-t4240.png | Bin 0 -> 39680 bytes images/c_user/smplock01perf-t4240.pdf | Bin 0 -> 34148 bytes images/c_user/smplock01perf-t4240.png | Bin 0 -> 35093 bytes 6 files changed, 93 insertions(+), 41 deletions(-) create mode 100644 images/c_user/smplock01fair-t4240.pdf create mode 100644 images/c_user/smplock01fair-t4240.png create mode 100644 images/c_user/smplock01perf-t4240.pdf create mode 100644 images/c_user/smplock01perf-t4240.png diff --git a/c-user/glossary.rst b/c-user/glossary.rst index f3e784e..aab9601 100644 --- a/c-user/glossary.rst +++ b/c-user/glossary.rst @@ -14,6 +14,9 @@ Glossary A task which must execute only at irregular intervals and has only a soft deadline. + API + An acronym for Application Programming Interface. + application In this document, software which makes use of RTEMS. @@ -314,6 +317,9 @@ Glossary A group of related RTEMS' directives which provide access and control over resources. + MCS + An acronym for Mellor-Crummey Scott. + memory pool Used interchangeably with heap. @@ -379,6 +385,9 @@ Glossary non-existent The state occupied by an uncreated or deleted task. + NUMA + An acronym for Non-Uniform Memory Access. + numeric coprocessor A component used in computer systems to enhance performance in mathematically intensive situations. It is typically viewed as a logical @@ -614,6 +623,9 @@ Glossary SMCB An acronym for Semaphore Control Block. + SMP + An acronym for Symmetric Multiprocessing. + SMP locks The SMP locks ensure mutual exclusion on the lowest level and are a replacement for the sections of disabled interrupts. 
Interrupts are diff --git a/c-user/symmetric_multiprocessing_services.rst b/c-user/symmetric_multiprocessing_services.rst index 6d39944..4baf244 100644 --- a/c-user/symmetric_multiprocessing_services.rst +++ b/c-user/symmetric_multiprocessing_services.rst @@ -524,47 +524,6 @@ on a suitable platform, e.g. QorIQ T4240. High-performance SMP applications need full control of the object storage :cite:`Drepper:2007:Memory`. Therefore, self-contained synchronization objects are now available for RTEMS. -Implementation Details -====================== - -Thread Dispatch Details ------------------------ - -This section gives background information to developers interested in the -interrupt latencies introduced by thread dispatching. A thread dispatch -consists of all work which must be done to stop the currently executing thread -on a processor and hand over this processor to an heir thread. - -In SMP systems, scheduling decisions on one processor must be propagated -to other processors through inter-processor interrupts. A thread dispatch -which must be carried out on another processor does not happen instantaneously. -Thus, several thread dispatch requests might be in the air and it is possible -that some of them may be out of date before the corresponding processor has -time to deal with them. The thread dispatch mechanism uses three per-processor -variables, - -- the executing thread, - -- the heir thread, and - -- a boolean flag indicating if a thread dispatch is necessary or not. - -Updates of the heir thread are done via a normal store operation. The thread -dispatch necessary indicator of another processor is set as a side-effect of an -inter-processor interrupt. So, this change notification works without the use -of locks. The thread context is protected by a TTAS lock embedded in the -context to ensure that it is used on at most one processor at a time. -Normally, only thread-specific or per-processor locks are used during a thread -dispatch. 
This implementation turned out to be quite efficient and no lock -contention was observed in the testsuite. The heavy-weight thread dispatch -sequence is only entered in case the thread dispatch indicator is set. - -The context-switch is performed with interrupts enabled. During the transition -from the executing to the heir thread neither the stack of the executing nor -the heir thread must be used during interrupt processing. For this purpose a -temporary per-processor stack is set up which may be used by the interrupt -prologue before the stack is switched to the interrupt stack. - Directives ========== @@ -633,3 +592,84 @@ DESCRIPTION: NOTES: None. + +Implementation Details +====================== + +This section covers some implementation details of the RTEMS SMP support. + +Low-Level Synchronization +------------------------- + +All low-level synchronization primitives are implemented using :term:`C11` +atomic operations, so no target-specific hand-written assembler code is +necessary. Four synchronization primitives are currently available: + +* ticket locks (mutual exclusion), + +* :term:`MCS` locks (mutual exclusion), + +* barriers, implemented as a sense barrier, and + +* sequence locks :cite:`Boehm:2012:Seqlock`. + +A vital requirement for low-level mutual exclusion is :term:`FIFO` fairness +since we are interested in a predictable system and not maximum throughput. +With this requirement, there are only a few options to resolve this problem. For +reasons of simplicity, the ticket lock algorithm was chosen to implement the +SMP locks. However, the API is capable of supporting MCS locks, which may be +interesting in the future for systems with a processor count in the range of 32 +or more, e.g. :term:`NUMA`, many-core systems. + +The test program `SMPLOCK 1 +`_ can be used +to gather performance and fairness data for several scenarios. The SMP lock +performance and fairness measured on the QorIQ T4240 follow as an example. +This chip contains three L2 caches.
Each L2 cache is shared by eight +processors. + +.. image:: ../images/c_user/smplock01perf-t4240.* + :width: 400 + :align: center + +.. image:: ../images/c_user/smplock01fair-t4240.* + :width: 400 + :align: center + +Thread Dispatch Details +----------------------- + +This section gives background information to developers interested in the +interrupt latencies introduced by thread dispatching. A thread dispatch +consists of all work which must be done to stop the currently executing thread +on a processor and hand over this processor to an heir thread. + +In SMP systems, scheduling decisions on one processor must be propagated +to other processors through inter-processor interrupts. A thread dispatch +which must be carried out on another processor does not happen instantaneously. +Thus, several thread dispatch requests might be in the air and it is possible +that some of them may be out of date before the corresponding processor has +time to deal with them. The thread dispatch mechanism uses three per-processor +variables, + +- the executing thread, + +- the heir thread, and + +- a boolean flag indicating if a thread dispatch is necessary or not. + +Updates of the heir thread are done via a normal store operation. The thread +dispatch necessary indicator of another processor is set as a side-effect of an +inter-processor interrupt. So, this change notification works without the use +of locks. The thread context is protected by a TTAS lock embedded in the +context to ensure that it is used on at most one processor at a time. +Normally, only thread-specific or per-processor locks are used during a thread +dispatch. This implementation turned out to be quite efficient and no lock +contention was observed in the testsuite. The heavy-weight thread dispatch +sequence is only entered in case the thread dispatch indicator is set. + +The context-switch is performed with interrupts enabled. 
During the transition +from the executing to the heir thread neither the stack of the executing nor +the heir thread must be used during interrupt processing. For this purpose a +temporary per-processor stack is set up which may be used by the interrupt +prologue before the stack is switched to the interrupt stack. diff --git a/images/c_user/smplock01fair-t4240.pdf b/images/c_user/smplock01fair-t4240.pdf new file mode 100644 index 0000000..f7d1b2e Binary files /dev/null and b/images/c_user/smplock01fair-t4240.pdf differ diff --git a/images/c_user/smplock01fair-t4240.png b/images/c_user/smplock01fair-t4240.png new file mode 100644 index 0000000..ce36e84 Binary files /dev/null and b/images/c_user/smplock01fair-t4240.png differ diff --git a/images/c_user/smplock01perf-t4240.pdf b/images/c_user/smplock01perf-t4240.pdf new file mode 100644 index 0000000..b3eee7a Binary files /dev/null and b/images/c_user/smplock01perf-t4240.pdf differ diff --git a/images/c_user/smplock01perf-t4240.png b/images/c_user/smplock01perf-t4240.png new file mode 100644 index 0000000..219eba7 Binary files /dev/null and b/images/c_user/smplock01perf-t4240.png differ -- cgit v1.2.3