DSP OS and resource management
QuRT
Following is a brief overview of Qualcomm Hexagon RTOS (QuRT) and some usage examples involving QuRT API calls. For more details on QuRT, see the QuRT user guide.
QuRT is a simple real-time operating system (RTOS) that runs on the DSP and supports multithreading, thread communication and synchronization, interrupt handling, and memory management.
The DSP can execute a fixed number of threads simultaneously. These threads are referred to as hardware threads. For example, the cDSP of the SM8150 and SM8250 devices includes four hardware threads, while Lahaina's cDSP includes six hardware threads. These hardware threads might need to share resources. For example, the six hardware threads in Lahaina share four HVX contexts and the memory subsystem. QuRT provides support for sharing these hardware resources to run a large number of software threads with different requirements.
Specifically, QuRT supports real-time priority-based preemptive multithreading:
- Multithreading means that multiple software threads can execute at the same time in a user program. QuRT initially assigns a single thread of execution to a user executable, and the program can then create additional threads.
- Priority-based means that each thread is assigned a priority level, which determines which threads take precedence for execution.
- Preemptive means that a thread can be preempted (the processor is taken away) when a higher-priority thread is ready to execute.
- Real-time means the operating system can perform operations within fixed time constraints.
QuRT consists of a kernel and a user library:
- The kernel provides a minimal set of operating system facilities, including thread creation, scheduling, and blocking. It also performs basic memory management.
- The QuRT user library provides an API to the kernel operations and some additional library functions to aid in programming.
Thread management
QuRT supports real-time priority-based preemptive multithreading, with no form of time-slicing for same-priority threads. For an example that demonstrates QuRT thread APIs, see multithreading.
Thread attributes and contexts
QuRT provides the qurt_thread_create() function to create a thread with specified attributes. Threads have two kinds of attributes:
- Static attributes, which cannot be changed after a thread is created. Static attributes are set before a thread is created (using the qurt_thread_attr_init() and qurt_thread_attr_set_*() functions) and when a thread is created (by directly passing the attributes as arguments to qurt_thread_create()).
- Dynamic attributes, which can be changed after the thread is created. Dynamic attributes are set after a thread is created using one of the qurt_thread_set_*() functions, such as qurt_thread_set_priority().
Thread priority is an example of an attribute that allows both static and dynamic configuration. It is important to set the initial thread priority statically; otherwise, QuRT assigns the thread a very low default priority. See the discussion of thread priorities below for recommendations on selecting appropriate priority values. If in doubt, using the same priority as the current thread is typically a good default choice:
qurt_thread_attr_t attr;
qurt_thread_attr_init(&attr);
qurt_thread_attr_set_priority(&attr, qurt_thread_get_priority(qurt_thread_get_id()));
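Putting the attribute calls together, a minimal thread-creation sketch might look like the following. The stack size, thread name, and entry function are illustrative assumptions, not values prescribed by this guide:

```c
// Sketch of creating a QuRT thread with static attributes.
// Assumes the Hexagon SDK's qurt.h; stack size and name are illustrative.
#include "qurt.h"

#define WORKER_STACK_SIZE 4096  // assumption: size this for the actual workload

static char worker_stack[WORKER_STACK_SIZE];

static void worker_main(void *arg)
{
    /* ... thread body ... */
    qurt_thread_exit(QURT_EOK);
}

int start_worker(void)
{
    qurt_thread_attr_t attr;
    qurt_thread_t      tid;

    qurt_thread_attr_init(&attr);
    qurt_thread_attr_set_name(&attr, "worker");
    qurt_thread_attr_set_stack_addr(&attr, worker_stack);
    qurt_thread_attr_set_stack_size(&attr, WORKER_STACK_SIZE);
    // Inherit the creating thread's priority rather than the low default.
    qurt_thread_attr_set_priority(&attr,
        qurt_thread_get_priority(qurt_thread_get_id()));

    return qurt_thread_create(&tid, &attr, worker_main, NULL);
}
```

Because the stack is supplied by the caller, it must remain valid for the lifetime of the thread; a static buffer, as here, is the simplest way to guarantee that.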
Thread scheduling
The Hexagon DSP supports executing several threads simultaneously in hardware, typically four to six on current DSPs. QuRT supports a much larger number of software threads and, like most operating systems, uses a scheduler to select which threads to run at any given time.
All threads are initialized to the Ready state. During system startup, the scheduler selects the highest-priority threads for execution and changes their thread state to Running. The action of suspending one thread and resuming another is called a context switch.
The following operations can cause a context switch:
- Creating or exiting a thread
- Changing a thread priority
- Waiting on or releasing a mutex or semaphore
- Waiting on or resuming from a signal, barrier, or condition variable
- Reading or writing from a pipe
- Servicing an interrupt
Note that the QuRT scheduler does not time-slice execution of threads at the same priority level, and makes no attempt at "fairness" to ensure that lower-priority threads run in the presence of higher-priority threads. Instead, the X highest-priority threads in the Ready state (where X is the number of hardware threads in the system) run until an event in the system causes a context switch, as discussed above.
Thread priorities
QuRT uses thread priority values 1 through 255, with smaller values representing higher priorities. The same values are used for both operating system thread scheduling and resource management prioritization using the Compute Resource Manager. See the system integration page for a discussion of priority level limitations and appropriate priorities to use for different applications.
Thread synchronization mechanisms
Mutex
Threads use mutexes to synchronize their execution and ensure mutually exclusive access to shared resources. If a thread performs a lock operation (using qurt_mutex_lock()) on a mutex that is not being used, the thread gains access to the shared resource that is protected by the mutex, and it continues executing.
If a thread performs a lock operation on a mutex that is already being used by another thread, the thread is suspended on the mutex. When the mutex becomes available (because the other thread has unlocked it), the suspended thread is awakened and gains access to the shared resource.
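A minimal sketch of this pattern, assuming the Hexagon SDK's qurt.h and a counter shared between threads:

```c
// Protecting a shared counter with a QuRT mutex.
#include "qurt.h"

static qurt_mutex_t counter_lock;
static int shared_counter = 0;

void counter_init(void)
{
    qurt_mutex_init(&counter_lock);
}

void counter_increment(void)
{
    qurt_mutex_lock(&counter_lock);    // suspends if another thread holds the lock
    shared_counter++;                  // critical section: exclusive access
    qurt_mutex_unlock(&counter_lock);  // wakes a suspended waiter, if any
}
```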
Signals
Threads use signals to synchronize their execution based on the occurrence of one or more internal events. If a thread is waiting on a signal object for a specified set of signals to be set, and one or more of those signals is set in the signal object, the thread is awakened. The qurt_signal_wait() and qurt_signal_wait_cancellable() functions wait for any or all signals, depending on their wait-type arguments.
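As a sketch, one thread can wait for either of two application-defined event bits while another thread sets one of them. The event bit names here are illustrative assumptions:

```c
// Waiting on and setting QuRT signal bits (assumes the Hexagon SDK's qurt.h).
#include "qurt.h"

#define EVENT_DATA_READY (1u << 0)  // assumption: application-defined bits
#define EVENT_SHUTDOWN   (1u << 1)

static qurt_signal_t events;

void events_init(void)
{
    qurt_signal_init(&events);
}

void consumer(void)
{
    // Suspend until at least one of the two bits is set (WAIT_ANY).
    unsigned int set = qurt_signal_wait(&events,
                                        EVENT_DATA_READY | EVENT_SHUTDOWN,
                                        QURT_SIGNAL_ATTR_WAIT_ANY);
    if (set & EVENT_DATA_READY) {
        qurt_signal_clear(&events, EVENT_DATA_READY);  // consume the event
        /* ... process data ... */
    }
}

void producer(void)
{
    qurt_signal_set(&events, EVENT_DATA_READY);  // wakes the waiting consumer
}
```

Passing QURT_SIGNAL_ATTR_WAIT_ALL instead would suspend the consumer until both bits are set.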
Semaphores
Threads use semaphores to synchronize their access to shared resources. When a semaphore is initialized, it is assigned an integer count value. This value indicates the number of threads that can simultaneously access a shared resource through the semaphore. The default value is 1.
When a thread performs a Down operation on a semaphore, the result depends on the semaphore count value:
- If the count value is nonzero, it is decremented. The thread gains access to the shared resource and continues executing.
- If the count value is zero, it is not decremented, and the thread is suspended on the semaphore. When the count value becomes nonzero (because another thread released the semaphore) it is decremented, and the suspended thread is awakened and gains access to the shared resource.
When a thread performs an Up operation on a semaphore, the semaphore count value is incremented. The result depends on the number of threads waiting on the semaphore:
- If no threads are waiting, the current thread releases access to the shared resource and continues executing.
- If one or more threads are waiting and the semaphore count value is nonzero, the kernel awakens the highest priority waiting thread and decrements the semaphore count value. If the awakened thread has a higher priority than the current thread, a context switch might occur.
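The Down/Up operations above can be sketched as follows; the pool of two DMA channels is a hypothetical resource chosen to show a count greater than one:

```c
// Limiting concurrent access with a QuRT counting semaphore
// (assumes the Hexagon SDK's qurt.h).
#include "qurt.h"

static qurt_sem_t dma_sem;

void dma_pool_init(void)
{
    qurt_sem_init_val(&dma_sem, 2);  // two threads may hold a channel at once
}

void dma_transfer(void)
{
    qurt_sem_down(&dma_sem);  // suspends if both channels are in use
    /* ... use a DMA channel ... */
    qurt_sem_up(&dma_sem);    // releases the channel; may wake a waiter
}
```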
Barriers
Threads use barriers to synchronize their execution at a specific point in a program. When a barrier is initialized, it is assigned a user-specified integer value. This value indicates the number of threads to synchronize on the barrier. When a thread waits on a barrier, it is suspended on the barrier:
- If the total number of threads waiting on the barrier is less than the barrier’s assigned value, no other action occurs.
- If the total number of threads waiting on the barrier equals the barrier’s assigned value, all threads currently waiting on the barrier are awakened, allowing them to execute past the barrier.
After a barrier's waiting threads are awakened, it is automatically reset and can be used again in the program without the need for reinitialization.
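A sketch of phase synchronization with a barrier, assuming the Hexagon SDK's qurt.h and a worker-thread count chosen for illustration:

```c
// Synchronizing worker threads at a phase boundary with a QuRT barrier.
#include "qurt.h"

#define NUM_WORKERS 4  // assumption: number of threads synchronizing

static qurt_barrier_t phase_barrier;

void workers_init(void)
{
    qurt_barrier_init(&phase_barrier, NUM_WORKERS);
}

void worker(void *arg)
{
    /* ... phase 1 work ... */
    qurt_barrier_wait(&phase_barrier);  // last arriving thread wakes the rest
    /* ... phase 2 work: every thread has finished phase 1 ... */
}
```

Because the barrier resets automatically, the same qurt_barrier_wait() call can sit inside a loop that alternates between phases.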
Condition variables
Threads use condition variables to synchronize their execution based on the value in a shared data item. Condition variables are useful in cases where a thread would continuously poll a data item until it contained a specific value. Using a condition variable, the thread can efficiently accomplish the same task without the need for polling.
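A sketch of the polling-free pattern, assuming the Hexagon SDK's qurt.h and a simple shared flag; the condition variable is always used together with a mutex that protects the flag:

```c
// Waiting on a shared flag with a QuRT condition variable instead of polling.
#include "qurt.h"

static qurt_mutex_t data_lock;
static qurt_cond_t  data_cond;
static int          data_available = 0;

void data_init(void)
{
    qurt_mutex_init(&data_lock);
    qurt_cond_init(&data_cond);
}

void wait_for_data(void)
{
    qurt_mutex_lock(&data_lock);
    while (!data_available) {
        // Atomically releases the mutex and suspends; reacquires it on wakeup.
        qurt_cond_wait(&data_cond, &data_lock);
    }
    data_available = 0;  // consume the data item
    qurt_mutex_unlock(&data_lock);
}

void publish_data(void)
{
    qurt_mutex_lock(&data_lock);
    data_available = 1;
    qurt_cond_signal(&data_cond);  // wake one waiting thread
    qurt_mutex_unlock(&data_lock);
}
```

The while loop (rather than an if) rechecks the flag after wakeup, which guards against spurious wakeups and against another thread consuming the data first.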
Memory management
QuRT offers memory mapping and allocation APIs that are exposed indirectly to user protection domains (PDs) through the HAP_mem APIs.
Cache management
The DSP has a two-level cache memory subsystem. On the cDSP with HVX, L1 is only accessible to the scalar unit, making L2 the second level memory for the scalar unit and the first level memory for the HVX coprocessor. For more details on L1 and L2 caches, see the Memory subsystem documentation.
To maintain coherence, cache IO coherency allows the DSP to snoop into the CPU L2 cache. IO coherency is enabled by default for CPU-cached buffers shared with the DSP. However, clients can disable IO coherency for ION buffers by registering them as non-coherent using remote_register_buf_attr() with the FASTRPC_ATTR_NON_COHERENT attribute. The remote_register_buf_attr() function is explained in Remote APIs.
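A sketch of registering an existing ION buffer as non-coherent from the CPU side, assuming the Hexagon SDK's remote.h and a buffer that was already allocated with file descriptor fd:

```c
// Marking a shared ION buffer as non-coherent via the FastRPC remote API.
#include "remote.h"

void register_noncoherent(void *buf, int size, int fd)
{
    // The DSP will not snoop CPU caches for this buffer, so explicit
    // cache maintenance is required around DSP access.
    remote_register_buf_attr(buf, size, fd, FASTRPC_ATTR_NON_COHERENT);
}
```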
Another option is to map a buffer as non-coherent at allocation time by passing the RPCMEM_HEAP_NONCOHERENT flag when calling rpcmem_alloc():
uint32 flags = ION_FLAG_CACHED | RPCMEM_HEAP_NONCOHERENT;
void *buf = rpcmem_alloc(heapid, flags, size);
The rpcmem_alloc() function is documented as part of the RPCMEM APIs.
IO-coherent buffers require no CPU-side cache maintenance, which reduces FastRPC overhead and makes that overhead largely independent of the total size of the buffers shared with the DSP.
PMU management
QuRT offers PMU configuration APIs for counting various hardware events that are useful in profiling. These APIs are exposed indirectly to the user space through the SysMon application. To profile the DSP workload, see the Sysmon_Profiler application.