Please note as of Wednesday, August 15th, 2018 this wiki has been set to read only. If you are a TI Employee and require Edit ability please contact x0211426 from the company directory.

Processor SDK Linear Algebra Library

From Texas Instruments Wiki

The TI Linear Algebra library (LINALG) is an optimized library for performing dense linear algebra computations. It includes BLAS, which is based on BLIS 0.1.6 (https://github.com/flame/blis), and LAPACK, which is based on CLAPACK 3.2.1 (http://www.netlib.org/clapack/).

Supported Devices and SDKs

LINALG has been ported to the following devices and SDKs:

  • K2H/Processor-SDK Linux
  • AM572x/Processor-SDK Linux
  • C6678/Processor-SDK RTOS (BLAS only; LAPACK is not available on C6678)

For devices not listed above, users can port LINALG themselves by following the instructions in Porting LINALG to More Devices.

Standard API

TI LINALG adopts the CBLAS API for BLAS and the CLAPACK API for LAPACK. Refer to the CBLAS and CLAPACK documentation for detailed API descriptions and user's guides.

CBLAS API Extension for TI DSP Applications

For TI DSP applications using BLAS, LINALG provides a CBLAS API extension for memory management. For detailed documentation of this extension, refer to the LINALG User's Guide in the docs folder of the installed package.

Integrating LINALG Library

For Processor-SDK Linux, LINALG provides the following object archives to be linked into application programs:

  • BLAS library: <LINALG_installation_folder>/packages/ti/linalg/lib/libcblas_armplusdsp.a.
  • LAPACK library: <LINALG_installation_folder>/packages/ti/linalg/lib/libcblaswr.a, liblapack.a, libf2c.a. Note that the BLAS library must also be linked when using LAPACK.
  • Source code: <LINALG_installation_folder>/packages/ti/linalg/blis, <LINALG_installation_folder>/packages/ti/linalg/ticblas, etc.

where <LINALG_installation_folder> is <Processor-SDK Linux installation root>/linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/share/ti/ti-linalg-tree.

For Processor-SDK RTOS, LINALG has the following object archive to be linked:

  • BLAS library: <LINALG_installation_folder>/packages/ti/linalg/lib/libcblas.ae66

where <LINALG_installation_folder> is <Processor-SDK RTOS installation root>/linalg_<version>.

LINALG header files are located at <LINALG_installation_folder>/packages/ti/linalg:

  • BLAS header files: cblas.h, ticblas.h
  • LAPACK header files: f2c.h, blaswrap.h, clapack.h

Note that the LAPACK header f2c.h defines its own complex types (structs with real part r and imaginary part i), which differ from the complex type in C99 complex.h. If f2c.h is included, C99 complex.h should not be used.


BLAS Configuration for ARM+DSP Applications

BLAS can be configured to run either on ARM or on DSP (offloading). LAPACK can only run on ARM, although the BLAS functions it invokes may run on DSP depending on the configuration and problem size. When BLAS runs on ARM, it can be configured to use one or more cores. When BLAS runs on DSP, it always uses all DSP cores.

The configuration is done through the following environment variables:

  • BLIS_IC_NT: configures the number of ARM cores that run BLAS. The value can be 1, 2, 3, ..., up to the number of ARM cores.
  • TI_CBLAS_OFFLOAD: configures BLAS offloading. Set it to xyz, where x, y, and z correspond to level 1, level 2, and level 3 functions, respectively, and each can take one of the 3 values below:
    • 0: no offloading to DSP, i.e. always running on ARM
    • 1: forced offloading to DSP, i.e. always running on DSP
    • 2: optimum offloading to DSP based on problem size in order to achieve best performance in terms of execution time

For example, TI_CBLAS_OFFLOAD=012 means level 1 functions will always run on ARM, level 2 functions will always run on DSP, and level 3 functions may run on ARM or DSP depending on matrix sizes.

Default configuration:

  • Default number of ARM cores is 1 if environment variable BLIS_IC_NT is not set.
  • Default offloading configuration, when environment variable TI_CBLAS_OFFLOAD is not set, is 002 (no offloading for levels 1 and 2, optimum offloading for level 3).
  • Note: in this release, optimum offloading (value 2) is not available for level 1 and level 2. If this option is configured for level 1 or level 2, those functions will always be offloaded to DSP.

Benchmarking

BLAS functions were benchmarked for all devices supported in this release: K2H, C6678, and AM572x. The benchmarking was performed on the corresponding EVMs, using the memory models specified in Memory Models:

  • K2H EVM: 1.2GHz ARM, 1.2GHz DSP, 1600MHz DDR3. Benchmark data can be found here.
  • C6678 EVM: 1.0GHz DSP, 1600MHz DDR3. Benchmark data can be found here.
  • AM572x EVM: 1.5GHz ARM, 750MHz DSP, 1600MHz DDR3. Benchmark data can be found here.


Recompiling LINALG Source Code

The LINALG source code can be recompiled for various devices and under either Processor-SDK RTOS or Processor-SDK Linux. To rebuild LINALG, one needs to:

  • Locate the tools and software components in Processor-SDK that LINALG depends on
  • Specify the memory model that LINALG is optimized with
  • Define the device to build LINALG for
  • Specify whether to build the DSP-only library (RTOS) or the ARM+DSP library (OpenCL)

Tools and Software Components

The tools and software components necessary for rebuilding LINALG are included in Processor-SDK. For RTOS, they are located in the installation root folder. For Linux, they are located in linux-devkit/sysroots/cortexa15hf-neon-linux-gnueabi/usr/share/ti.

The following environment variables must be set in order to build DSP-only LINALG under Processor-SDK RTOS:

  • export CGTROOT=<TI_CGT_INSTALLATION_ROOT>/cgt-c6x
  • export PDK_DIR=<Processor-SDK-RTOS-installation-root>/pdk_c667x_<version>
  • export FC_DIR=<Processor-SDK-RTOS-installation-root>/framework_components_<version>
  • export XDAIS_DIR=<Processor-SDK-RTOS-installation-root>/xdais_<version>
  • export BIOS_DIR=<Processor-SDK-RTOS-installation-root>/bios_<version>
  • export OMP_DIR=<Processor-SDK-RTOS-installation-root>/openmp_dsp_c667x_<version>
  • export XDC_DIR=<Processor-SDK-RTOS-installation-root>/xdctools_<version>_core
  • export IPC_DIR=<Processor-SDK-RTOS-installation-root>/ipc_<version>
  • export EDMA3_DIR=<Processor-SDK-RTOS-installation-root>/edma3_lld_<version>
  • export PATH=<TI_CGT_INSTALLATION_ROOT>/cgt-c6x/bin:$PATH

The following environment variables must be set in order to build ARM+DSP LINALG under Processor-SDK Linux:

  • export TI_OCL_INSTALL_DIR=<Processor-SDK-Linux-installation-root>/linux-devkit/sysroots/cortexa15hf-vfp-neon-linux-gnueabi/usr/share/ti/opencl
  • export CGTROOT=<Processor-SDK-Linux-installation-root>/linux-devkit/sysroots/x86_64-arago-linux/usr/share/ti/cgt-c6x
  • export TI_OCL_CGT_INSTALL=<Processor-SDK-Linux-installation-root>/linux-devkit/sysroots/x86_64-arago-linux/usr/share/ti/cgt-c6x
  • export XDC_DIR=<Processor-SDK-RTOS-installation-root>/xdctools_<version>_core
  • export BIOS_DIR=<Processor-SDK-Linux-installation-root>/linux-devkit/sysroots/cortexa15hf-vfp-neon-linux-gnueabi/usr/share/ti/ti-sysbios-tree
  • export XDAIS_DIR=<Processor-SDK-Linux-installation-root>/linux-devkit/sysroots/cortexa15hf-vfp-neon-linux-gnueabi/usr/share/ti/ti-xdais-tree
  • export FC_DIR=<Processor-SDK-Linux-installation-root>/linux-devkit/sysroots/cortexa15hf-vfp-neon-linux-gnueabi/usr/share/ti/ti-framework-components-tree
  • export PDK_DIR=<Processor-SDK-Linux-installation-root>/linux-devkit/sysroots/cortexa15hf-vfp-neon-linux-gnueabi/usr/share/ti/ti-pdk-tree
  • export OMP_DIR=<Processor-SDK-Linux-installation-root>/linux-devkit/sysroots/cortexa15hf-vfp-neon-linux-gnueabi/usr/share/ti/ti-omp-tree
  • export TARGET_ROOTDIR=<Processor-SDK-Linux-installation-root>/linux-devkit/sysroots/cortexa15hf-vfp-neon-linux-gnueabi
  • export PATH=<Processor-SDK-Linux-installation-root>/linux-devkit/sysroots/x86_64-arago-linux/usr/share/ti/cgt-c6x/bin:<Processor-SDK-Linux-installation-root>/linux-devkit/sysroots/x86_64-arago-linux/usr/bin:$PATH

Memory Models

To facilitate porting of LINALG to various devices, LINALG is implemented for three memory models, each corresponding to a category of available SRAM in L1D/L2/L3 (or MSMC). These memory models and the corresponding supported devices in the current release are listed below:

Memory Model | L1D SRAM | L2 SRAM | L3/MSMC SRAM | Devices
Large        | 28KB     | 768KB   | 4.5MB        | K2H
Medium       | 28KB     | 256KB   | 2.5MB        | C6678
Small        | 28KB     | 128KB   | 1MB          | AM572x

To build LINALG for a specific device, a memory model must be selected according to the SRAM available in L1D/L2/L3 (MSMC). L1D and L2 can be reconfigured between cache and SRAM to provide the SRAM a memory model needs.

Devices

Refer to Supported Devices and SDKs for currently supported devices. Refer to Porting LINALG to More Devices for how to port LINALG to other devices.

The correct device name must be specified when building LINALG: SOC_K2H for K2H, SOC_C6678 for C6678, and SOC_AM572x for AM572x.

Operating Systems

From LINALG's point of view, the operating system is either OpenCL (for Processor-SDK Linux) or RTOS (for Processor-SDK RTOS).

Rebuilding LibArch

In order to rebuild LINALG, the Library Architecture and Framework (LibArch) must be rebuilt first. Follow the steps below to rebuild LibArch:

  • Go to <LibArch installation root folder>/packages/ti/libarch
  • Type "make lib TARGET=<device_name> LIBOS=<os_name>", where
    • <device_name> must be one of SOC_K2H, SOC_C6678, SOC_AM572x.
    • <os_name> must be one of LIB_OPENCL or LIB_RTOS.
  • The rebuilt object library is located at: <LibArch installation root folder>/packages/ti/libarch/lib
  • Set the LIBARCH_DIR environment variable, which is needed to rebuild LINALG: export LIBARCH_DIR="<LibArch installation root folder>"

Rebuilding LINALG

  • Go to <LINALG installation root folder>/packages/ti/linalg
  • Type "make <make_target> MEM_MODEL=<memory_model_name> TARGET=<device_name> LIBOS=<os_name>", where
    • <make_target> must be one of ARMplusDSP or DSPlibs.
    • <memory_model_name> must be one of Large, Medium, or Small.
    • <device_name> must be one of SOC_K2H, SOC_C6678, SOC_AM572x.
    • <os_name> must be either LIB_OPENCL for ARMplusDSP or LIB_RTOS for DSPlibs.
  • The rebuilt object library is located at: <LINALG installation root folder>/packages/ti/linalg/lib
  • Note: a memory model can be used for a device which has more memory than required by that memory model. For example, Small memory model can be used for SOC_C6678 or SOC_K2H in addition to SOC_AM572x.


Porting LINALG to More Devices

LINALG can be ported to more devices in addition to those listed in Supported Devices.

Adding Device Support to LibArch

For the OpenCL version, verify that the device is supported by the Texas Instruments OpenCL implementation (see TI OpenCL).

For the RTOS version, device support must be added to LibArch by defining the following macros for the specific device in src/ti/libarch/src/lib_cachecfg.h (this requires a source code change):

  • #define LIB_L1D_SIZE_TOTAL
  • #define LIB_L2_SIZE_TOTAL
  • #define LIB_L1D_BASE_ADDRESS
  • #define LIB_L2_BASE_ADDRESS

Adding Device Support to LINALG

Change LINALG's Makefile to define NUM_DSP_CORES and NUM_ARM_CORES for the new device.

  ifeq ($(TARGET),SOC_K2H)
  NUM_ARM_CORES=4
  NUM_DSP_CORES=8
  else ifeq ($(TARGET),SOC_C6678)
  NUM_DSP_CORES=8
  else ifeq ($(TARGET),SOC_AM572x)
  NUM_ARM_CORES=2
  NUM_DSP_CORES=2
  # Add the new TARGET here:
  else ifeq ($(TARGET),<new device name>)
  NUM_ARM_CORES=<number of ARM cores>
  NUM_DSP_CORES=<number of DSP cores>
  else
  $(error ERROR - TARGET NOT DEFINED OR SUPPORTED.)
  endif


Choosing the Right Memory Model

Depending on the available SRAM in L1D/L2/L3(MSMC) on the specific device, one of the three memory models listed in Memory Models must be selected.

Rebuilding LibArch and LINALG

Follow the instructions in Rebuilding LibArch and Rebuilding LINALG to rebuild LibArch and LINALG.


Tuning ARM+DSP LINALG for Optimum Offloading

When level 3 BLAS is configured for optimum offloading according to BLAS Configuration, the offloading decision will be based on matrix sizes. Automatic tuning can be performed to find the matrix sizes for which offloading to DSP is faster than running on ARM.

The released BLAS library is tuned for the following devices and configurations:

  • K2H: 3 ARM cores at 1GHz and 8 DSP cores at 1.2GHz, DDR3 1600MHz
  • AM572x: 1 ARM core at 1GHz and 2 DSP cores at 750MHz, DDR3 1600MHz

To redo the tuning for a different device, or for the same device in a different configuration, follow these steps:

  1. Follow the instructions in Rebuilding LINALG to rebuild the LINALG library.
  2. Go to <LINALG_installation_folder>/packages/ti/linalg/tuning
  3. Type "make" to build the tuning code
  4. Copy the whole folder <LINALG_installation_folder>/packages/ti/linalg/tuning to the device EVM
  5. On device EVM, set environment variable BLIS_IC_NT to number of ARM cores to run BLAS on (1, 2, etc)
  6. Go to tuning folder
  7. Type "make tune" to run the tuning
  8. After the above step finishes, copy tuning/ofld_tbls/ofld_tbl_*.c from the EVM to <LINALG_installation_folder>/packages/ti/linalg/blasblisacc/src/ofld_tbls_<device_name> on the Linux PC, where <device_name> must be SOC_K2H, SOC_AM572x, etc.
  9. On the Linux PC, follow the build instructions in Rebuilding LINALG to rebuild LINALG. The same <device_name> as in the previous step must be used in the make command: make ARMplusDSP MEM_MODEL=<memory_model_name> TARGET=<device_name> LIBOS=LIB_OPENCL.


Build and Run Examples

A few examples are provided to show how to use LINALG with the CBLAS and CLAPACK APIs. They are located in <LINALG_installation_folder>/examples:

  • ARM+DSP examples in the arm+dsp folder: all examples run on the host (ARM) and may offload BLAS functions to DSP according to BLAS Configuration. Examples include:
    • Matrix multiplication (dgemm)
    • Symmetric rank k operation (dsyrk)
    • Triangular matrix multiplication (dtrmm)
    • Triangular matrix equation solver (dtrsm)
    • Eigen decomposition and matrix inversion (eig)
    • LU decomposition and matrix inversion (ludinv)
    • xGEMM benchmarking (gemm_bench)
    • To run these examples, follow the steps listed below:
      • Set environment variable LINALG_DIR to <LINALG_installation_folder>
      • Go to folder <LINALG_installation_folder>/examples/arm+dsp
      • Type "make" to build the examples
      • Copy the executable of the desired example to the device EVM and run it
  • DSP-only examples in the dsp folder: these run on DSP, loaded through CCS and JTAG. Examples include:
    • Matrix multiplication (dgemm)
    • To run these examples, follow the steps listed below:
      • Set the environment variable LINALG_DIR to <LINALG_installation_folder>, in addition to the environment variables listed in Tools and Software Components.
      • Go to folder <LINALG_installation_folder>/examples/dsponly
      • Type "make TARGET=<device_name>" to build the examples
      • Load the executable (.out file) to CCS and run

Rebuild and Run Test Suites

LINALG was tested extensively through various test suites:

  • ARM+DSP library: tested through BLIS test suite provided by BLIS, and BLAS and LAPACK test suites provided by CLAPACK.
  • DSP-only library: tested through BLIS test suite

These test suites can be rebuilt by following the steps below:

  • Follow instructions in Rebuilding LINALG to rebuild LINALG library.
  • Set environment variable LINALG_DIR to <LINALG_installation_folder>
  • Go to folder <LINALG_installation_folder>/packages/ti/linalg
  • Rebuild ARM+DSP test suites:
    • BLIS test suite: make BLIStest MEM_MODEL=<memory_model_name> TARGET=<device_name> LIBOS=LIB_OPENCL
    • BLAS test suite: make BLAStest MEM_MODEL=<memory_model_name> TARGET=<device_name> LIBOS=LIB_OPENCL
    • LAPACK test suite: make CLAPACKtest MEM_MODEL=<memory_model_name> TARGET=<device_name> LIBOS=LIB_OPENCL
  • Rebuild DSP-only test suite:
    • BLIS test suite: make BLIStestDSP MEM_MODEL=<memory_model_name> TARGET=<device_name> LIBOS=LIB_RTOS

Follow the steps below to run these tests:

  • BLIS test suite for ARM+DSP:
    • copy folder <LINALG_installation_folder>/packages/ti/linalg/blis/testsuite to the device EVM
    • go to folder blis/testsuite on the device EVM and type "./test_libblis_cortex-a15.x"
  • BLIS test suite for DSP-only:
    • load <LINALG_installation_folder>/packages/ti/linalg/blis/testsuite/blistestDSP.out to CCS and run
  • BLAS test suite for ARM+DSP:
    • copy folder <LINALG_installation_folder>/packages/ti/linalg/clapack/BLAS to the device EVM
    • go to folder clapack/BLAS on the device EVM and type "./run_blas_tests.sh"
  • LAPACK test suite:
    • copy folder <LINALG_installation_folder>/packages/ti/linalg/clapack/TESTING to the device EVM
    • go to folder clapack/TESTING on the device EVM and type "./run_clapack_tests.sh"