OpenMP Accelerator -- CIM and OpenMP
Convolution C code example, with OpenMP |
Matrix multiply C code example, with OpenMP style pragma |
Section of arbitrary C code marked for offloading to the CIM array |
Separate compute intensive sections of C code, marked |
Overview
CIM technology software supports three (3) basic modes of acceleration- OpenMP
- API based (planned for a future release)
- CUDA emulation (planned for a future release)
- Parallel sections -- sections of code that run on different cores at the same time
- Parallel for-loops -- a nested for-loop that shares work among some number of cores. In this case, "work" typically means the innermost processing performed by the loop
- Offloaded sections -- arbitrary sections of code, related or unrelated, marked for offloading to CIM array
CIM Software Build Process
Here is a brief overview of the CIM software build process:- Separates C source into two (2) separate code streams, one for x86 / ARM (the server "host" CPU), and one for TI multicore CPUs
- Modifies the generated code streams to deal with run-time communication (moving data between host CPU memory and CIM array memory)
- Builds both code streams, creating a host executable file, and one or more (typically many) CIM array executable files
CIM at Run-Time
- CIM and OpenMP pragmas may be used at the same time
- CIM pragmas use the "cim" keyword, OpenMP pragmas use "omp"
- Programs can make run-time decisions whether to use CIM and how much. If the CIM array is not available or offline, code is still functional (but runs slower)
Multicore Hardware Supported
Several types of multicore hardware are supported for CIM acceleration, including:- Advantech 32-core and 64-core PCIe cards (DSPC-8681 and DSPC-8682). Clock rates 1, 1.25, and 1.5 GHz. DDR3 mem amounts 1 and 2 GByte
- Comm Agility 32-core uATCA modules
- Advantech 160-core ATCA boards (8901 board). DDR3 mem amounts 1 and 2 GByte