Software Thread Integration (STI) methods are software transformation and
design techniques which interleave functions from two or more threads
or portions of a program at the assembly language level to produce a
single implicitly multithreaded function. The resulting function
provides two major advantages over the original functions: minimal
context-switching overhead and increased instruction-level parallelism
(ILP).
Reducing context-switching overhead increases program efficiency in
applications with frequent switches. This is useful when performing
hardware-to-software migration, as it lowers the processor throughput
an application requires and raises the effective performance attainable
from a given processor. This software improvement can enable the use of
a slower and less expensive processor.
Increasing ILP allows more efficient scheduling of
instructions. Dependences between instructions typically limit the
efficiency of processors with multiple instruction issue or moderately
deep pipelines to levels well below the capabilities of the hardware.
As STI creates an implicitly multithreaded function from separate
functions, this function has much more instruction-level parallelism
than the original functions, allowing much more efficient scheduling
and hence processor use.
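The effect can be sketched at the C level, although the methods described
here operate on assembly code. The two functions below (checksum and dot)
and their integrated counterpart are hypothetical examples, not taken from
the tutorial. Each original loop carries a single dependence chain through
its accumulator. The integrated loop body holds both chains, which are
independent of each other, so a scheduler has more instructions it is free
to overlap. Cleanup loops handle the case where the trip counts differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical thread A: running checksum. Each iteration depends on
 * the previous value of sum, forming one serial dependence chain. */
static uint32_t checksum(const uint8_t *buf, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = sum * 31u + buf[i];
    return sum;
}

/* Hypothetical thread B: dot product. Each iteration depends on the
 * previous value of acc, forming a second, unrelated chain. */
static int32_t dot(const int16_t *x, const int16_t *y, size_t m)
{
    int32_t acc = 0;
    for (size_t j = 0; j < m; j++)
        acc += (int32_t)x[j] * y[j];
    return acc;
}

/* Integrated function (illustrative only): one call performs the work
 * of both threads, with no context switch between them. */
static void checksum_dot_integrated(const uint8_t *buf, size_t n,
                                    const int16_t *x, const int16_t *y,
                                    size_t m,
                                    uint32_t *sum_out, int32_t *acc_out)
{
    uint32_t sum = 0;
    int32_t acc = 0;
    size_t common = (n < m) ? n : m;
    size_t i;

    /* Interleaved body: the two statements do not depend on each
     * other, so they can be scheduled side by side. */
    for (i = 0; i < common; i++) {
        sum = sum * 31u + buf[i];
        acc += (int32_t)x[i] * y[i];
    }

    /* Cleanup: finish whichever thread has iterations left. */
    for (size_t k = i; k < n; k++)
        sum = sum * 31u + buf[k];
    for (size_t k = i; k < m; k++)
        acc += (int32_t)x[k] * y[k];

    *sum_out = sum;
    *acc_out = acc;
}
```

Calling checksum_dot_integrated yields the same two results as calling
checksum and dot separately. Whether the interleaved loop actually runs
faster depends on the compiler and the target processor.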
This tutorial presents Software Thread Integration methods and
applications. We first introduce the software transformations used
for STI, and present desirable characteristics of target hardware and
software. We present and discuss the run-time model for STI. We then
present applications in which STI provides a benefit.
Next we present Asynchronous STI (ASTI), which uses the STI transformations
in conjunction with coroutine calls to create code which enables
independent progress among integrated threads. This extends the
range of applications which can benefit from these techniques. We
present and discuss the transformations and desirable characteristics
of target hardware and software, as well as the run-time model. We then
present applications which benefit from ASTI.
We finish by presenting STI as used for increasing ILP on a
very-long-instruction-word (VLIW) digital signal processor. In this
case the C source code rather than assembly code can be integrated,
enabling the developer to leverage the software pipelining, predication
and other powerful features which modern compilers rely upon to improve
performance. We present the techniques and demonstrate results on various
DSP library functions.