From b033e3960bf23856053aae96141fd96d627eec8e Mon Sep 17 00:00:00 2001
From: Sebastian Huber
Date: Thu, 2 Feb 2017 10:46:05 +0100
Subject: c-user: Add SMP application issues section

---
 c-user/glossary.rst                           |  19 +-
 c-user/symmetric_multiprocessing_services.rst | 341 +++++++++++++++-----------
 2 files changed, 209 insertions(+), 151 deletions(-)

diff --git a/c-user/glossary.rst b/c-user/glossary.rst
index 3cb0ed9..44f8f88 100644
--- a/c-user/glossary.rst
+++ b/c-user/glossary.rst
@@ -25,7 +25,7 @@ Glossary
     manager are used to service signals.
 
 :dfn:`atomic operations`
-    Atomic operations are defined in terms of *ISO/IEC 9899:2011*.
+    Atomic operations are defined in terms of :ref:`C11 <C11>`.
 
 :dfn:`awakened`
     A term used to describe a task that has been unblocked and may be scheduled
@@ -61,6 +61,16 @@ Glossary
 :dfn:`buffer`
     A fixed length block of memory allocated from a partition.
 
+.. _C11:
+
+:dfn:`C11`
+    The standard ISO/IEC 9899:2011.
+
+.. _C++11:
+
+:dfn:`C++11`
+    The standard ISO/IEC 14882:2011.
+
 :dfn:`calling convention`
     The processor and compiler dependent rules which define the mechanism used
     to invoke subroutines in a high-level language. These rules define the
@@ -701,6 +711,13 @@ Glossary
 :dfn:`timeslice`
     The application defined unit of time in which the processor is allocated.
 
+.. _TLS:
+
+:dfn:`TLS`
+    An acronym for Thread-Local Storage :cite:`Drepper:2013:TLS`. TLS is
+    available in :ref:`C11 <C11>` and :ref:`C++11 <C++11>`. The support for
+    TLS depends on the CPU port :cite:`RTEMS:CPU`.
+
 :dfn:`TMCB`
     An acronym for Timer Control Block.
 
diff --git a/c-user/symmetric_multiprocessing_services.rst b/c-user/symmetric_multiprocessing_services.rst
index ac6d921..66516a8 100644
--- a/c-user/symmetric_multiprocessing_services.rst
+++ b/c-user/symmetric_multiprocessing_services.rst
@@ -271,156 +271,6 @@ tree of the task needing help and other resource trees in case tasks in need
 for help are produced during this operation. Thus the worst-case latency in
 the system depends on the maximum resource tree size of the application.
 
-Critical Section Techniques and SMP
------------------------------------
-
-As discussed earlier, SMP systems have opportunities for true parallelism which
-was not possible on uniprocessor systems. Consequently, multiple techniques
-that provided adequate critical sections on uniprocessor systems are unsafe on
-SMP systems. In this section, some of these unsafe techniques will be
-discussed.
-
-In general, applications must use proper operating system provided mutual
-exclusion mechanisms to ensure correct behavior. This primarily means the use
-of binary semaphores or mutexes to implement critical sections.
-
-Disable Interrupts and Interrupt Locks
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-A low overhead means to ensure mutual exclusion in uni-processor configurations
-is to disable interrupts around a critical section. This is commonly used in
-device driver code and throughout the operating system core. In SMP
-configurations, however, disabling the interrupts on one processor has no
-effect on other processors. So, this is insufficient to ensure system wide
-mutual exclusion. The macros
-
-- ``rtems_interrupt_disable()``,
-
-- ``rtems_interrupt_enable()``, and
-
-- ``rtems_interrupt_flush()``
-
-are disabled in SMP configurations and its use will lead to compiler warnings
-and linker errors. In the unlikely case that interrupts must be disabled on
-the current processor, then the
-
-- ``rtems_interrupt_local_disable()``, and
-
-- ``rtems_interrupt_local_enable()``
-
-macros are now available in all configurations.
-
-Since disabling of interrupts is not enough to ensure system wide mutual
-exclusion on SMP, a new low-level synchronization primitive was added - the
-interrupt locks. They are a simple API layer on top of the SMP locks used for
-low-level synchronization in the operating system core. Currently they are
-implemented as a ticket lock. On uni-processor configurations they degenerate
-to simple interrupt disable/enable sequences. It is disallowed to acquire a
-single interrupt lock in a nested way. This will result in an infinite loop
-with interrupts disabled. While converting legacy code to interrupt locks care
-must be taken to avoid this situation.
-
-.. code-block:: c
-    :linenos:
-
-    void legacy_code_with_interrupt_disable_enable( void )
-    {
-      rtems_interrupt_level level;
-      rtems_interrupt_disable( level );
-      /* Some critical stuff */
-      rtems_interrupt_enable( level );
-    }
-
-    RTEMS_INTERRUPT_LOCK_DEFINE( static, lock, "Name" );
-
-    void smp_ready_code_with_interrupt_lock( void )
-    {
-      rtems_interrupt_lock_context lock_context;
-      rtems_interrupt_lock_acquire( &lock, &lock_context );
-      /* Some critical stuff */
-      rtems_interrupt_lock_release( &lock, &lock_context );
-    }
-
-The ``rtems_interrupt_lock`` structure is empty on uni-processor
-configurations. Empty structures have a different size in C
-(implementation-defined, zero in case of GCC) and C++ (implementation-defined
-non-zero value, one in case of GCC). Thus the
-``RTEMS_INTERRUPT_LOCK_DECLARE()``, ``RTEMS_INTERRUPT_LOCK_DEFINE()``,
-``RTEMS_INTERRUPT_LOCK_MEMBER()``, and ``RTEMS_INTERRUPT_LOCK_REFERENCE()``
-macros are provided to ensure ABI compatibility.
-
-Highest Priority Task Assumption
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-On a uniprocessor system, it is safe to assume that when the highest priority
-task in an application executes, it will execute without being preempted until
-it voluntarily blocks. Interrupts may occur while it is executing, but there
-will be no context switch to another task unless the highest priority task
-voluntarily initiates it.
-
-Given the assumption that no other tasks will have their execution interleaved
-with the highest priority task, it is possible for this task to be constructed
-such that it does not need to acquire a binary semaphore or mutex for protected
-access to shared data.
-
-In an SMP system, it cannot be assumed there will never be a single task
-executing. It should be assumed that every processor is executing another
-application task. Further, those tasks will be ones which would not have been
-executed in a uniprocessor configuration and should be assumed to have data
-synchronization conflicts with what was formerly the highest priority task
-which executed without conflict.
-
-Disable Preemption
-~~~~~~~~~~~~~~~~~~
-
-On a uniprocessor system, disabling preemption in a task is very similar to
-making the highest priority task assumption. While preemption is disabled, no
-task context switches will occur unless the task initiates them
-voluntarily. And, just as with the highest priority task assumption, there are
-N-1 processors also running tasks. Thus the assumption that no other tasks will
-run while the task has preemption disabled is violated.
-
-Task Unique Data and SMP
-------------------------
-
-Per task variables are a service commonly provided by real-time operating
-systems for application use. They work by allowing the application to specify a
-location in memory (typically a ``void *``) which is logically added to the
-context of a task. On each task switch, the location in memory is stored and
-each task can have a unique value in the same memory location. This memory
-location is directly accessed as a variable in a program.
-
-This works well in a uniprocessor environment because there is one task
-executing and one memory location containing a task-specific value. But it is
-fundamentally broken on an SMP system because there are always N tasks
-executing. With only one location in memory, N-1 tasks will not have the
-correct value.
-
-This paradigm for providing task unique data values is fundamentally broken on
-SMP systems.
-
-Classic API Per Task Variables
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The Classic API provides three directives to support per task variables. These are:
-
-- ``rtems_task_variable_add`` - Associate per task variable
-
-- ``rtems_task_variable_get`` - Obtain value of a a per task variable
-
-- ``rtems_task_variable_delete`` - Remove per task variable
-
-As task variables are unsafe for use on SMP systems, the use of these services
-must be eliminated in all software that is to be used in an SMP environment.
-The task variables API is disabled on SMP. Its use will lead to compile-time
-and link-time errors. It is recommended that the application developer consider
-the use of POSIX Keys or Thread Local Storage (TLS). POSIX Keys are available
-in all RTEMS configurations. For the availablity of TLS on a particular
-architecture please consult the *RTEMS CPU Architecture Supplement*.
-
-The only remaining user of task variables in the RTEMS code base is the Ada
-support. So basically Ada is not available on RTEMS SMP.
-
 OpenMP
 ------
 
@@ -521,6 +371,197 @@ the heir thread must be used during interrupt processing. For this purpose a
 temporary per-processor stack is set up which may be used by the interrupt
 prologue before the stack is switched to the interrupt stack.
 
+Application Issues
+==================
+
+Most operating system services provided by the uni-processor RTEMS are
+available in SMP configurations as well. However, applications designed for an
+uni-processor environment may need some changes to correctly run in an SMP
+configuration.
+
+As discussed earlier, SMP systems have opportunities for true parallelism which
+was not possible on uni-processor systems. Consequently, multiple techniques
+that provided adequate critical sections on uni-processor systems are unsafe on
+SMP systems. In this section, some of these unsafe techniques will be
+discussed.
+
+In general, applications must use proper operating system provided mutual
+exclusion mechanisms to ensure correct behavior.
+
+Task variables
+--------------
+
+Task variables are ordinary global variables with a dedicated value for each
+thread. During a context switch from the executing thread to the heir thread,
+the value of each task variable is saved to the thread control block of the
+executing thread and restored from the thread control block of the heir thread.
+This is inherently broken if more than one executing thread exists.
+Alternatives to task variables are POSIX keys and :ref:`TLS <TLS>`. All use
+cases of task variables in the RTEMS code base were replaced with alternatives.
+The task variable API has been removed in RTEMS 4.12.
+
+Highest Priority Thread Never Walks Alone
+-----------------------------------------
+
+On a uni-processor system, it is safe to assume that when the highest priority
+task in an application executes, it will execute without being preempted until
+it voluntarily blocks. Interrupts may occur while it is executing, but there
+will be no context switch to another task unless the highest priority task
+voluntarily initiates it.
+
+Given the assumption that no other tasks will have their execution interleaved
+with the highest priority task, it is possible for this task to be constructed
+such that it does not need to acquire a mutex for protected access to shared
+data.
+
+In an SMP system, it cannot be assumed there will never be a single task
+executing. It should be assumed that every processor is executing another
+application task. Further, those tasks will be ones which would not have been
+executed in a uni-processor configuration and should be assumed to have data
+synchronization conflicts with what was formerly the highest priority task
+which executed without conflict.
+
+Disabling of Thread Pre-Emption
+-------------------------------
+
+A thread which disables pre-emption prevents that a higher priority thread gets
+hold of its processor involuntarily. In uni-processor configurations, this can
+be used to ensure mutual exclusion at thread level. In SMP configurations,
+however, more than one executing thread may exist. Thus, it is impossible to
+ensure mutual exclusion using this mechanism. In order to prevent that
+applications using pre-emption for this purpose, would show inappropriate
+behaviour, this feature is disabled in SMP configurations and its use would
+case run-time errors.
+
+Disabling of Interrupts
+-----------------------
+
+A low overhead means that ensures mutual exclusion in uni-processor
+configurations is the disabling of interrupts around a critical section. This
+is commonly used in device driver code. In SMP configurations, however,
+disabling the interrupts on one processor has no effect on other processors.
+So, this is insufficient to ensure system-wide mutual exclusion. The macros
+
+* :ref:`rtems_interrupt_disable() `,
+
+* :ref:`rtems_interrupt_enable() `, and
+
+* :ref:`rtems_interrupt_flash() `.
+
+are disabled in SMP configurations and its use will cause compile-time warnings
+and link-time errors. In the unlikely case that interrupts must be disabled on
+the current processor, the
+
+* :ref:`rtems_interrupt_local_disable() `, and
+
+* :ref:`rtems_interrupt_local_enable() `.
+
+macros are now available in all configurations.
+
+Since disabling of interrupts is insufficient to ensure system-wide mutual
+exclusion on SMP a new low-level synchronization primitive was added --
+interrupt locks. The interrupt locks are a simple API layer on top of the SMP
+locks used for low-level synchronization in the operating system core.
+Currently, they are implemented as a ticket lock. In uni-processor
+configurations, they degenerate to simple interrupt disable/enable sequences by
+means of the C pre-processor. It is disallowed to acquire a single interrupt
+lock in a nested way. This will result in an infinite loop with interrupts
+disabled. While converting legacy code to interrupt locks, care must be taken
+to avoid this situation to happen.
+
+.. code-block:: c
+    :linenos:
+
+    #include <rtems.h>
+
+    void legacy_code_with_interrupt_disable_enable( void )
+    {
+      rtems_interrupt_level level;
+
+      rtems_interrupt_disable( level );
+      /* Critical section */
+      rtems_interrupt_enable( level );
+    }
+
+    RTEMS_INTERRUPT_LOCK_DEFINE( static, lock, "Name" )
+
+    void smp_ready_code_with_interrupt_lock( void )
+    {
+      rtems_interrupt_lock_context lock_context;
+
+      rtems_interrupt_lock_acquire( &lock, &lock_context );
+      /* Critical section */
+      rtems_interrupt_lock_release( &lock, &lock_context );
+    }
+
+An alternative to the RTEMS-specific interrupt locks are POSIX spinlocks. The
+:c:type:`pthread_spinlock_t` is defined as a self-contained object, e.g. the
+user must provide the storage for this synchronization object.
+
+.. code-block:: c
+    :linenos:
+
+    #include <assert.h>
+    #include <pthread.h>
+
+    pthread_spinlock_t lock;
+
+    void smp_ready_code_with_posix_spinlock( void )
+    {
+      int error;
+
+      error = pthread_spin_lock( &lock );
+      assert( error == 0 );
+      /* Critical section */
+      error = pthread_spin_unlock( &lock );
+      assert( error == 0 );
+    }
+
+In contrast to POSIX spinlock implementation on Linux or FreeBSD, it is not
+allowed to call blocking operating system services inside the critical section.
+A recursive lock attempt is a severe usage error resulting in an infinite loop
+with interrupts disabled. Nesting of different locks is allowed. The user
+must ensure that no deadlock can occur. As a non-portable feature the locks
+are zero-initialized, e.g. statically initialized global locks reside in the
+``.bss`` section and there is no need to call :c:func:`pthread_spin_init`.
+
+Interrupt Service Routines Execute in Parallel With Threads
+-----------------------------------------------------------
+
+On a machine with more than one processor, interrupt service routines (this
+includes timer service routines installed via :ref:`rtems_timer_fire_after()
+`) and threads can execute in parallel. Interrupt
+service routines must take this into account and use proper locking mechanisms
+to protect critical sections from interference by threads (interrupt locks or
+POSIX spinlocks). This likely requires code modifications in legacy device
+drivers.
+
+Timers Do Not Stop Immediately
+------------------------------
+
+Timer service routines run in the context of the clock interrupt. On
+uni-processor configurations, it is sufficient to disable interrupts and remove
+a timer from the set of active timers to stop it. In SMP configurations,
+however, the timer service routine may already run and wait on an SMP lock
+owned by the thread which is about to stop the timer. This opens the door to
+subtle synchronization issues. During destruction of objects, special care
+must be taken to ensure that timer service routines cannot access (partly or
+fully) destroyed objects.
+
+False Sharing of Cache Lines Due to Objects Table
+-------------------------------------------------
+
+The Classic API and most POSIX API objects are indirectly accessed via an
+object identifier. The user-level functions validate the object identifier and
+map it to the actual object structure which resides in a global objects table
+for each object class. So, unrelated objects are packed together in a table.
+This may result in false sharing of cache lines. The effect of false sharing
+of cache lines can be observed with the `TMFINE 1
+`_ test program
+on a suitable platform, e.g. QorIQ T4240. High-performance SMP applications
+need full control of the object storage :cite:`Drepper:2007:Memory`.
+Therefore, self-contained synchronization objects are now available for RTEMS.
+
 Directives
 ==========
 
-- 
cgit v1.2.3