This section covers the Micro-scheduler's low-level Task interface. It is the interface used to build all the Parallel Patterns.
- Warning
- The Task interface is for low-level, high performance algorithm design. It is recommended to only use this interface if there is not an available Parallel Pattern. For a thorough understanding of the pitfalls, please read this entire section especial (Pitfalls).
Tasks
In GTS a Task is modeled by the Task class. Tasks can be syntactically represented in three ways:
{
IncTask(int& v) : val(v) {}
{
++val;
return nullptr;
}
int val;
};
int val = 3;
gts::Task* pTask = microScheduler.allocateTask<IncTask>(val);
microScheduler.spawnTaskAndWait(pTask);
A Task payload that embeds TFunc and TArgs into the Task's data. It makes it easy to construct a Task...
Definition: Task.h:120
#define GTS_ASSERT(expr)
Causes execution to break when expr is false.
Definition: Assert.h:144
The context associated with the task being executed.
Definition: MicroSchedulerTypes.h:54
int val = 3;
{
++val;
return nullptr;
});
microScheduler.spawnTaskAndWait(pTask);
- By use of C-style functions
{
int& val = *(int*)pData;
++val;
return nullptr;
}
int val = 3;
gts::Task* pTask = microScheduler.allocateTask<CStyleTask>(incrementFunction, &val);
microScheduler.spawnTaskAndWait(pTask);
- Note
- Much of the Task interface for GTS is inspired by TBB. If you already know TBB, GTS will be familiar to you.
Activation Frame
The activation frame of a Task is the gts::Task execute method. It is were a Task's work is done, and it is where dependenies can be dynaimcally added to executing Task.
Task Context
The gts::TaskContext is a parameter that is passed into the activation frame. It contains contextual information about the Task like
- The Task itself, which is useful for labmda Task
- A reference to the Micro-scheduler that created the Task, which is needed for added dependecies.
- A OwnedId of the Worker thread executing the Task, which is useful for binning resources.
Return Task
The Task returned from the Activation Frame is executed immediately, completely bypassing the scheduler. We'll see how to used this as an optimization later.
Spawning
- Fire-and-forget: Spawns a Task that does not need to be waited on for completion.
microScheduler.spawnTask(pTask);
- Fire-and-wait: Spawns a Task and waits for it to complete.
microScheduler.spawnTaskAndWait(pTask);
- Affinity: Forces a Task to run on a specific Worker thread. It must be set before spawning.
GTS_INLINE void setAffinity(uint32_t workerIdx)
Definition: Task.inl:78
- Priority: Spawns a Task the specified Task priority.
uin32_t priority = 2
microScheduler.spawnTask(pTask, priority);
Lifetime
Task are dynamic, reference counted objects. They are created by a Micro-scheduler with a reference count of one and they lose a reference when executed. If their reference count becomes zero after execution, they are destroyed.
microScheduler.spawnTaskAndWait(pTask);
pTask->doSomething();
Later we'll explore how to manipulate the reference count.
Dependencies
Task dependencies are modeled by making a Task a child of another Task, or by making a Task a continuation of another Task.
Children
Child Tasks model forks. Children refer to their parent Task directly, and a parent Task refers to its children in reference count only. When a child Task has finished executing, it decrements its parent’s reference count.
The safest way to add a child is with a reference count, as it automatically adds the reference to the parent.
{
addChildWithRef(pChild);
...
void spawnTask(Task *pTask, uint32_t priority=0)
Spawns the specified 'pTask' to be executed by the scheduler. Spawned tasks are executed in LIFO orde...
GTS_INLINE TTask * allocateTask(TArgs &&... args)
Allocates a new Task object of type TTask.
Definition: MicroScheduler.h:145
MicroScheduler * pMicroScheduler
The MicroScheduler executing the Task.
Definition: MicroSchedulerTypes.h:59
Yet, this can be inefficient when adding multiple children. In this case, it is better to manually adjust the reference count and then add the children without reference increment
{
addRef(2, memory_order::relaxed);
addChildWithoutRef(pChild1);
addChildWithoutRef(pChild2);
...
- Warning
- Forgetting to add reference counts will cause undefined behavior.
Continuations
Continuations are Tasks that join two or more child Tasks. The child Tasks refer directly to the continuation and the continuation Task refers to its children in reference count only. When creating a continuation, the current Task is unlinked from the task graph and the continuation in link in its place.
When a child finishes executing, it decrements its continuations’s reference count. If the reference count is one, meaning that it is the last child, it executes the continuation.
Here is an example of inserting a continuation Task into a Task graph.
{
setContinuation(pContinuation);
pContinuation->
addRef(2, memory_order::relaxed);
pContinuation->addChildWithoutRef(pChild1);
pContinuation->addChildWithoutRef(pChild2);
...
GTS_INLINE int32_t addRef(int32_t count=1, gts::memory_order order=gts::memory_order::seq_cst)
Definition: Task.inl:84
- Warning
- Forgetting to add reference counts will cause undefined behavior.
Joining
GTS offer two options for joins: blocking joins and continuations.
Blocking
Blocking joins are joins that wait for a Task to complete. Blocking joins block the current execution stream, however they do not block the executing thread. Instead the executing thread executes work in the Micro-scheduler.
- Note
- Blocking joins are the simplest form of joining in the Task model, yet they can also introduce the most latency. The latency comes from executing Tasks in the Micro-scheduler while waiting, which can cause the blocked execution stream to not immediately resume when ready.
Here is an example of blocking with gts::Task::waitForAll. Note the extra reference count added for the wait. /(This is done to distinguish a waiting parent from a waiting continuation./)
{
addRef(2 + 1, memory_order::relaxed);
addChildWithoutRef(pChild1);
addChildWithoutRef(pChild2);
waitForAll();
return nullptr;
}
We can also used gts::Task::spwandWaitForAll, which combines spawning and blocking. It is slightly more efficient for the last child Task.
{
addRef(2 + 1, memory_order::relaxed);
addChildWithoutRef(pChild1);
addChildWithoutRef(pChild2);
spawnTaskAndWaitForAll(pChild2);
return nullptr;
}
Continuations
Continuations are explicit join Tasks that execute immediately when their dependencies have completed. Generally, continuations do not suffer the latency of blocking joins, but they do require explicit linking into a Task graph.
{
setContinuation(pContinuation);
pContinuation->
addRef(2, memory_order::relaxed);
pContinuation->addChildWithoutRef(pChild1);
pContinuation->addChildWithoutRef(pChild2);
return nullptr;
}
Optimizations
Bypassing
Bypassing is an optimization where a Task is returned from the activation frame instead of being spawned. It is an optimization because the returned Task is executed immediately instead of being store and retrieved from from the Micro-scheduler’s internal Task storage. Bypassing with a child Tasks requires the use of continuations becuase there is no longer an option of waiting. Bypassing fire-and-forget Tasks have no constraints.
{
setContinuation(pContinuation);
pContinuation->
addRef(2, memory_order::relaxed);
pContinuation->addChildWithoutRef(pChild1);
pContinuation->addChildWithoutRef(pChild2);
return pChild2;
}
Task Recycling
Task recycling is an optimization that reuses the currently executing Task as a new Tasks. Once the current Task finished excuting it is not destroyed and instead treaded as a newly spawned Task.
Here are the safe ways to recycle a Task:
- As a child with bypassing:
{
setContinuation(pContinuation);
pContinuation->
addRef(2, memory_order::relaxed);
pContinuation->addChildWithoutRef(pChild1);
recyleAsChildOf(pContinuationTask);
return this;
}
- As a continuation:
{
recycle();
setContinuation(this);
addRef(2, memory_order::relaxed);
pContinuation->addChildWithoutRef(pChild1);
pContinuation->addChildWithoutRef(pChild2);
return pChild2;
}
- As automatic fire-and-forget:
{
recycle();
return nullptr;
}
- As bypass fire-and-forget:
{
recycle();
return return;
}
Full Examples
- Blocking
- Continuations
- Bypassing
- Recycling
General Task Graphs
The following examples demonstrate the creation of weakly dynamic and strongly dynamic graphs in GTS. The examples create graphs for a 2D prefix sum using a naïve wavefront algorithm.
- Note
- Examples of static Task graphs are omitted because execution time is not considered in the Micro-scheduler’s greedy scheduling algorithm.
Weakly Dynamic Task Graph
This example prebuilds a Task graph and then executes its. It also demonstrates manipulating a root Task's reference count.
Strongly Dynamic Task Graph
This example generates the graph as the execution unfolds. It moves the data out of the Tasks and into a memoization table. It does not require special treatment if the last Task like the previous example did. Further, it is more efficient since parallelizes the creation of the graph.
- Note
- GTS includes a production version of a wavefront algorithm gts::ParallelWavefront.
Pitfalls
This section will cover various race condition that can occur when setting up dependencies
Accessing a Spawned Task
microScheduler.spawn(pTask);
pTask->doSomething();
Children and Reference Counting
Blocking and Reference Counting
{
addChildWithRef(pChild1);
addChildWithRef(pChild2);
waitForAll();
...
Continuations and Reference Counting
{
setContinuation(pContinuation);
pContinuation->addChildWithoutRef(pChild1);
pContinuation->addChildWithoutRef(pChild2);
return nullptr;
}
Blocking and Stack Overflow
Prefer continuations to blocking, as blocking can overflow the stack.
gts::Task* stackOverflowTask(TaskContext
const& ctx)
{
addRef(2, memory_order::relaxed);
gts::Task* pTask = ctx.pMicroScheduler->allocateTask(taskFunc);
addChildWithoutRef(pTask);
ctx.pMicroScheduler->spawnTask(pTask);
waitForAll();
}
Cycles
pTaskA->addChildWithRef(pTaskB);
pTaskB->addChildWithRef(pTaskC);
pTaskC->addChildWithRef(pTaskA);
void spawnTaskAndWait(Task *pTask, uint32_t priority=0)
Spawns the specified 'pTask' to be executed by the scheduler and then waits for its reference count t...