TopTenCodecPackageCheckList

From Texas Instruments Wiki

This document serves as a kind of "10 commandments" list for codec developers to follow when creating IP that is expected to be integrated into larger customer systems.

Consumers can use this as a checklist when receiving a codec, before integrating it into a large system: if several of these requirements are not satisfied, integration could take much longer than necessary.

When followed, these codec recommendations enable system integrators to rapidly evaluate different codecs - that is, the time taken to change from say an MPEG4 decoder to an H264 decoder should be minimal.

1. XDAIS-compliance

The codec/algorithm should be XDAIS compliant.

For C64x+ DSP core

see also: I just want my video codec to work with the DVSDK
[Screenshot: sample QualiTI XDAIS-compliance check]

The QualiTI compliance test tool is undergoing a revival and now catches most of the typical problems that cause integration issues. For example: -

  • flags global symbols that do not meet the XDAIS namespace requirements and could clash with other algorithms
  • checks for poorly named sections which could cause code or data to be mis-categorized and linked to inconsistent memory sections causing crashes.
  • checks for the appropriate IALG, IMODULE, IDMAx (if implemented), IRES (if implemented) interface symbols.

This tool will be available from XDAIS release >= 6.10 at this link.

A complete description of the rules and methodology is available in this article.

Naturally it makes sense to plan to be XDAIS-compliant upfront. These docs show how to achieve this: -

There are also many application notes provided in the XDAIS Developer's Kit giving instructions on how to use e.g. ACPY3, or providing sample code.

For ARM core

see also: ArmCodecPackageCheckList
For targets which do not presently have QualiTI support e.g. dm355, algorithm writers should run basic utilities to ensure that key rules affecting integration are satisfied. For example, rules 8, 9, and 10 deal with namespace compliance. A global function in a libmp4enc.a library called myCopy could easily collide with a user's application code since that name is unlikely to be unique. This would result in a linker error. To check for this, simply run the age-old Unix/Linux nm tool as follows: -

/opt/mv_pro_4.0.1/montavista/pro/devkit/arm/v5t_le/bin/arm_v5t_le-nm -g libmp4enc.a

Alternatively, if you have Cygwin or MKS installed, the host (i386) nm.exe utility will also work fine:

nm.exe -g libmp4enc.a

The -g qualifier limits the output to only global-scope symbols.

This yields an output something like: -

mp4venc_bitstream.o:
00000000 T MP4VENC_TI_DM350_flush_bits
00000044 T MP4VENC_TI_DM350_flush_bits1
00000218 T MP4VENC_TI_DM350_no_bits
...

Anything with a MODULE_VENDOR_ prefix (MP4VENC_TI_ in the output above) won't clash with other libraries or application code. RTS functions such as memcpy are fine since they come from the allowed RTS functions list in the Appendix of spru352. The algorithm producer should look for noncompliant symbols such as myCopy, and then address them, for example by making them local or renaming them.
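
The manual inspection described above can be scripted. The following is a minimal Python sketch, not part of any TI tool, that scans the text output of nm -g and reports global symbols that neither carry the expected prefix nor appear in an allow-list. The function name find_noncompliant, the three-entry allow-list and the myCopy line in the sample are illustrative assumptions; the authoritative list of permitted RTS functions is the one in the spru352 appendix.

```python
import re

# Illustrative subset only; the authoritative list is in the spru352 appendix.
ALLOWED_RTS = {"memcpy", "memset", "memmove"}

# An nm symbol line: optional hex address, one-letter type, symbol name.
SYMBOL_LINE = re.compile(r"^\s*(?:[0-9A-Fa-f]+\s+)?([A-Za-z])\s+(\S+)\s*$")


def find_noncompliant(nm_output, prefix, allowed=ALLOWED_RTS):
    """Return (type, name) for globals that lack `prefix` and are not allowed."""
    offenders = []
    for line in nm_output.splitlines():
        match = SYMBOL_LINE.match(line)
        if not match:
            continue  # blank lines and "object.o:" headers
        sym_type, name = match.groups()
        if name.startswith(prefix) or name in allowed:
            continue
        offenders.append((sym_type, name))
    return offenders


# Sample text modeled on the output above, plus a hypothetical offender.
sample = """\
mp4venc_bitstream.o:
00000000 T MP4VENC_TI_DM350_flush_bits
00000044 T MP4VENC_TI_DM350_flush_bits1
         U memcpy
00000218 T myCopy
"""

for sym_type, name in find_noncompliant(sample, "MP4VENC_TI_"):
    print(sym_type, name)
```

In practice the sample string would be replaced by the captured output of the nm command for the library under test. Undefined references (type U) to names outside the allow-list are reported as well, since they also need review.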

2. Footprint, CPU usage & resource usage documentation

QualiTI will automatically produce footprint spreadsheets - these should be included in the codec package. In particular the following XDAIS rules can be satisfied statically (i.e. without running the code) via QualiTI: -

  • Rule 20 - All algorithms must characterize their worst-case stack space memory requirements (including alignment).
  • Rule 21 - Algorithms must characterize their static data memory requirements.
  • Rule 22 - All algorithms must characterize their program memory requirements.

The vendor should also supply data for the following rules: -

  • Rule 19 - All algorithms must characterize their worst-case heap data memory requirements (including alignment).
  • Rule 23 - All algorithms must characterize their worst-case interrupt latency for every operation.
  • Rule 24 - All algorithms must characterize the typical period and worst-case execution time for each operation.

Resource usage is most often an issue in the context of video algorithms. For example, some c6x video algorithms use chained Transfer Completion Codes (TCCs) and can thus consume e.g. 4 x 8 = 32 TCCs (4 QDMA channels with 8-deep chaining). This needs to be clearly documented.

Algorithm writers are encouraged to use ACPY2 or ACPY3 to access DMA resources uniformly; however, XDAIS does not mandate this. Hence, if a custom API is used to access DMA resources, the vendor must clearly document which TCCs, QDMA channels, DMA queue priorities or other resources are being claimed/expected by the algorithm. Codecs should not 'hardcode' any of these i.e. the codec vendor should request the resources needed from the framework. Hardcoding causes problems that are very difficult to debug - for example hardcoding the 'wrong' DMA channel or Parameter RAM set in a DSP codec could crash an audio codec on the Arm side!

This documentation should show the number of 'logical' DMA channels requested (if any), along with each channel's attributes (e.g. numTransfers, numWaits, priority, protocol, persistent, etc.). Failure to provide such documentation causes problems that are extremely difficult to debug.
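
To illustrate why documented resource claims matter, here is a small Python sketch that reproduces the TCC arithmetic from the text (4 QDMA channels x 8-deep chaining = 32 TCCs) and checks documented claims for overlap. The component names and TCC ranges are hypothetical, and the helper functions are not part of any TI framework; a real system would obtain its resources from the framework rather than from a table like this.

```python
from itertools import combinations


def chained_tcc_count(qdma_channels, chain_depth):
    """TCCs consumed when every QDMA channel chains `chain_depth` transfers."""
    return qdma_channels * chain_depth


def find_tcc_clashes(claims):
    """Map each pair of components to the TCC numbers they both claim."""
    clashes = {}
    for (name_a, tccs_a), (name_b, tccs_b) in combinations(claims.items(), 2):
        shared = set(tccs_a) & set(tccs_b)
        if shared:
            clashes[(name_a, name_b)] = sorted(shared)
    return clashes


needed = chained_tcc_count(qdma_channels=4, chain_depth=8)

# Hypothetical claims: a DSP video codec that hardcodes TCCs 0..31 and an
# Arm-side audio driver that was given TCCs 30..33 by its own framework.
claims = {
    "dsp_video_codec": range(0, needed),
    "arm_audio_driver": range(30, 34),
}

print(needed)
print(find_tcc_clashes(claims))
```

With the claims written down, the overlap on TCCs 30 and 31 is visible before anything runs on the board, which is the point of the documentation requirement.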

In line with #6 below, performance benchmarks need to be shown in the context of a "real system" e.g. a Codec Engine based application such as the Digital Video Test Bench (DVTB). For example, a simple test application might assume entirely on-chip memory; in a larger system such resources may be scarce, hence benchmarks showing what the customer can expect in a real system with cache effects etc. are very valuable. Obviously there is no perfect metric here - algorithm performance may differ from one framework environment to another - however picking a solid "real-world" framework-based application, integrating the codec, and benchmarking it provides the customer with key data.

Customers should be able to quickly reproduce the benchmarks provided.

3. VISA XDM API usage

If the codec/algorithm is a Video, Image, Speech or Audio codec (VISA), it should be XDM compliant and follow one of the released public XDM interface specifications. This also applies to more recent classes such as IVIDANALYTICS or IVIDTRANSCODE.

Codecs should document which interface they use e.g. IAUDDEC, IVIDDEC2 etc. It is important to specify which version of an interface an algorithm implements. For example, it is insufficient to say "this is an XDM MP3DEC algorithm". If a client application only supports IAUDDEC (XDM 0.9) then algorithms implementing IAUDDEC1 will naturally fail in such an application!

The codec should aim to use only base-class parameters whenever possible to facilitate creation by the end-customer of common higher level application code, which is not tied to a particular vendor. This allows customers to swap out 1 version of a codec for another with possibly different features, price-point etc.

Extended parameters should only be used when the functionality cannot be achieved in any other way. When such extended parameters are used it requires custom application code and algorithm "A" cannot be swapped out for another implementation since they implement different interfaces. It therefore defeats the original intent of XDM.

If extended params are required, the codec must distribute a header file that defines the extended structs. It is critical that this header be capable of being #included into non-DSP code - that is, an ARM-side application running WinCE, Linux, or even BIOS(!) may need to include this header - so it must not depend on HW or SW-specific environment (e.g., don't #include <dman3.h> in this codec-specific header).

Algorithms that extend creation-time Params e.g. IAUDENC_Params and/or run-time Params e.g. IAUDENC1_DynamicParams can often be handled without affecting the application flow i.e. it's basically handled through a type cast. As noted in #8, good defaults must be used for such extended parameters; if the customer only passes in base parameters the codec should still function correctly. The extra parameters would be available for advanced use-cases.
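
The type-cast mechanism can be sketched as follows. XDM parameter structures begin with a size field, which lets the codec tell whether it was handed a base or an extended structure. The sketch below models this in Python with ctypes so that the C-style layout and cast are visible; BaseParams, VendorParams, the quality field and DEFAULT_QUALITY are hypothetical stand-ins, not the real IAUDENC definitions.

```python
import ctypes


class BaseParams(ctypes.Structure):
    """Hypothetical stand-in for an XDM base-class Params struct."""
    _fields_ = [("size", ctypes.c_int32), ("bitRate", ctypes.c_int32)]


class VendorParams(ctypes.Structure):
    """Hypothetical extension: base struct first, vendor fields after it."""
    _fields_ = [("base", BaseParams), ("quality", ctypes.c_int32)]


DEFAULT_QUALITY = 3  # assumed "good default" used when only base params arrive


def codec_create(params_ptr):
    """Codec-side view: read base fields, then extended ones if size allows."""
    base = params_ptr.contents
    quality = DEFAULT_QUALITY
    if base.size >= ctypes.sizeof(VendorParams):
        ext_ptr = ctypes.cast(params_ptr, ctypes.POINTER(VendorParams))
        quality = ext_ptr.contents.quality
    return base.bitRate, quality


base_only = BaseParams(ctypes.sizeof(BaseParams), 128000)
extended = VendorParams(BaseParams(ctypes.sizeof(VendorParams), 128000), 7)

# The application always passes a pointer typed as the base class.
print(codec_create(ctypes.pointer(base_only)))
print(codec_create(ctypes.cast(ctypes.pointer(extended),
                               ctypes.POINTER(BaseParams))))
```

An application that only knows the base class keeps working because the codec falls back to its default, while an application that knows the extension sets size accordingly and passes the same pointer type.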

However extending InArgs and OutArgs e.g. IAUDENC_InArgs has an enormous impact on application code since it basically nullifies "plug and play". Custom application logic for that specific codec becomes necessary. This is strongly discouraged - algorithm writers should re-examine the XDM interface and determine whether a more recent version e.g. IAUDENC1 solves the problem, then subsequently implement to that interface.

A wiki topic describing the complete application impact of codec inArgs/outArgs customization has also been written.

Finally, if extended parameters are necessary, codec vendors should submit an API enhancement request to TI so that this can be reviewed as part of next generation XDM interface specifications.

4. Adhere to XDM semantics, not just syntax.

The Codec Engine Configuration Reference documentation, as well as the XDM collateral, define the behavior specification for the VISA classes.

Common semantics are essential - without these, one vendor may interpret a structure's field differently from another vendor. If that happens we lose plug and play and the standard is meaningless. The XDM_1.x_Semantics document defines the precise semantics for each of the VISA classes. Codec producers should adhere to these semantics.

For example, in IVIDDEC2 null-termination of the outputID array is required.

Also in IVIDDEC2 the freeBufID array must be populated correctly to identify buffers that are "unlocked" by the algorithm, and can finally be recycled by the application.
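
The two rules above are what allow application code like the following to be written once for every decoder. This is a minimal Python sketch of the application-side logic only, assuming that an ID of 0 is the terminator in both arrays; the array length, the buffer IDs and the helper names are hypothetical, and a real application would read these arrays from the OutArgs structure returned by the decoder.

```python
def ids_before_terminator(id_array):
    """Return the IDs that precede the first 0 in a zero-terminated array."""
    ids = []
    for buf_id in id_array:
        if buf_id == 0:
            break
        ids.append(buf_id)
    return ids


def recycle_buffers(free_buf_id, locked, free_pool):
    """Move every buffer the codec reports as unlocked back to the pool."""
    for buf_id in ids_before_terminator(free_buf_id):
        if buf_id in locked:
            locked.remove(buf_id)
            free_pool.append(buf_id)


# Hypothetical state after one process call; the array length 8 is arbitrary.
output_id = [3, 0, 0, 0, 0, 0, 0, 0]      # frame 3 is ready for display
free_buf_id = [1, 2, 0, 0, 0, 0, 0, 0]    # frames 1 and 2 are unlocked
locked = {1, 2, 3}
free_pool = []

print(ids_before_terminator(output_id))
recycle_buffers(free_buf_id, locked, free_pool)
print(sorted(locked), free_pool)
```

If a codec forgets the terminator or leaves stale IDs in freeBufID, this generic loop either reads garbage or never gets its buffers back, which is why the semantics matter as much as the structure layout.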

The above examples enable uniform application-code.

In general, for new designs, XDM 1.x is recommended. For example IVIDDEC2 is recommended over IVIDDEC because the latter cannot support B-frames in the base interface. Details on the relationship between XDM nomenclature and the VISA interface revisions can be found here. The shortcomings of the original XDM 0.9 interface are clearly outlined in this document. As part of this migration codec vendors are strongly encouraged to implement XDM_GETVERSION as outlined here since customers routinely ask for this kind of run time information.

5. Codecs should be delivered as a RTSC package

The unit of delivery for a codec is a RTSC package. Libraries/Include-files/Docs should not be delivered standalone. Packages should contain all necessary libraries, documentation and linker command file contributions required for the integrator (customer) to use the codec/algorithm package.

It is highly recommended to use the Codec Engine GenCodecPkg wizard to generate these packages, especially if the codec will be used in a Codec Engine based environment since there are several additional methods to implement.

Recall that required placement attributes can be specified in the codec vendor's linker template file link.xdt. Here is how to do this: -

SECTIONS
{
% /* important code that needs 32kb alignment for best L1P cache performance */
%if (this.MPEG2DEC.codeSection) {
    .text:_mp2VDEC_TII_decode > `this.MPEG2DEC.codeSection`, align = 0x8000
%}
}

Do not do this!

SECTIONS
{
% /* important code that needs 32kb alignment for best L1P cache performance */
%if (this.MPEG2DEC.codeSection) {
    .text:_mp2VDEC_TII_decode > L2SRAM, align = 0x8000
%}
}

Memory names are different across chips. Nothing in the codec package should be chip-specific.

However if a codec vendor expects the user to place code sections in a particular type of memory (e.g. onchip) to meet performance requirements, it must be clearly documented.

The preferred release mechanism is via the xdc release methodology. The makefiles generated by GenCodecPkg perform this step, as well as creating a complete .zip file appropriate for distribution. Appropriate name and version information (compatibility key and 'marketing version') should be built-in.

The versions of XDC tools and Codec Engine used to build the codec are important. Stick with the following guidelines: -

  • Use XDC >= 2.95.01 (link) and Codec Engine >= 1.20.02 (link)
    • XDC versions prior to 2.95 incorrectly built a dependency into every package on the xdc.runtime package. Codec Engine 1.20.01 was released with XDC 2.94 whilst CE 1.20.02 was built with XDC 2.95.01, hence the requirement for codec producers to have >= CE 1.20.02. This link provides more details.
  • Build codec packages with the lowest common denominator XDC and Codec Engine versions unless a particular new feature is required.
    • Codecs sit at the bottom of the tree. For example, on ARM + DSP platforms server-combos will consume these codecs and GPP-applications will in turn use these combo & codec packages. In truth the compatibility from XDC 2.95.01 and CE 1.20.02 onwards is very good, hence you could build a codec with CE 2.0 and consume it in a server and application that relied on CE 1.20.02. However it makes more sense to have codecs built with the lowest common denominator that satisfies the need. For example a video encoder implementing IVIDENC1 on DM6446 could leverage CE 1.20.02. However if a codec implements an interface that is only in a more recent CE, clearly that version of CE is a pre-requisite e.g. a transcoding algorithm implementing IVIDTRANSCODE needs >= CE 2.0. The Codec Engine Roadmap article provides details on which features were introduced in which CE releases.
  • Server creators and application writers are free to use more recent Codec Engine and XDC tools releases
    • CE 2.0 introduced several new features which benefit system integrators. For example, application writers can call Engine.createFromServer() in their configuration scripts, which automatically picks up all the codecs in a server and eliminates the need to maintain DSP Link memory maps. Config scripts can be cut in half and become less error prone. Cache management for video codecs and Codec Engine debugging were also improved. The XDC 3.x releases continue to deliver bug fixes and support for new platforms, hence you could use the same XDC tools for multiple platforms. If CE requires at least a certain version of XDC this will be flagged at the build stage by virtue of the compatibility key mechanism.

6. Codec vendors should provide a 'unit-server' example, tested in a 'real-world' framework-based standard format test application.

Simple File I/O test harnesses often fail to find issues such as symbol clashes, performance mis-characterization (stack size etc), resource allocation arbitration etc.

Frameworks such as TI's Codec Engine are full-featured, preemptive frameworks with well-defined memory & DMA resource management policies.

In a DaVinci context, codec vendors should provide a 'unit-server' Codec Engine based example. This is basically an executable containing 1 codec invoked by Codec Engine APIs. The RTSC Server Wizard offers a simple automated method for creating a unit-server.

[Screenshot: RTSC Server Wizard]

Again, in a DaVinci context, products such as the Digital Video Test Bench (DVTB) can be used to run such unit-servers. So too can the DVSDK demos.

The test application should run on a standard TI EVM. It should include instructions on how to create the working environment including board level settings like bootloader (U-Boot), kernel (uImage) and any necessary system files.

Note that although 'servers with remote codecs' only applies to SoC Arm+DSP devices the requirement is still valid on DSP-only or Arm-only devices i.e. the codec should be supplied with test cases that mirror how customers will actually use these codecs in a real-system.

If extended parameters are being used, a test application must be provided which exercises them correctly.

The documentation should specify if a customer needs special tools (such as a stream analyzer) to check correct behavior of the codec.

Note - the unit-server wizard has now been superseded by the combo wizard supplied with Codec Engine >= 2.20. The best part about the combo wizard is that it can be extended to support multiple codecs.

7. Documentation - including coprocessor usage

Every codec should include the following documents: -

  • Release notes
  • Datasheet:
    • Should document clearly if codec is C64x+ generic or if it uses a hardware accelerator specific to a platform.
    • Should document DMA resources used
  • User's Guide
    • Should document valid range for codec parameters (this is the range checked when codec is created)
    • Should document default values for codec parameters
    • Should document XDM commands supported (XDM_RESET, XDM_FLUSH ...)
  • Sample Application Documentation
  • Test results

Some devices such as DM6467 have coprocessors. Usage of such coprocessors should be clearly documented, including pre/post conditions for the hardware coprocessors. For example, usage of the Image Coprocessor (IMCOP) on DM6446 requires that it be powered up & initialized - somewhat obvious, however it is not the codec's responsibility to do this (can't have 10 codecs all reinitializing a hardware resource!). Hence it is imperative that such coprocessor dependence be documented.

Codecs leveraging coprocessors should implement the XDAIS IRES interface, available in XDAIS >= 6.0.

8. Publish 'known good' parameters

It is frustrating to empirically tweak codec parameters until the codec functions as expected. The XDM interfaces have quite a large number of parameter settings. Codec vendors should therefore document 'known good' parameter sets. This can take any form e.g. a C file with suggested default parameter structure field assignments, DVTB script files to get/set parameters, doc-files etc.

Go through all parameters and list the valid range of values. Explain restrictions and how to accomplish common configuration tasks that may be non-intuitive. This goes for InArgs, Params, DynamicParams etc.
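
One possible machine-readable form for such a list is a table of ranges and defaults that a test script can check against. The Python sketch below is only an illustration of that idea: the parameter names are borrowed from typical video decoder creation parameters, while every range and default value shown is invented for the example and does not describe any real codec.

```python
# Hypothetical table: parameter -> (minimum, maximum, known-good default).
PARAM_RANGES = {
    "maxWidth": (16, 720, 720),
    "maxHeight": (16, 576, 480),
    "maxBitRate": (64_000, 10_000_000, 2_000_000),
}


def known_good():
    """Return the documented default for every parameter."""
    return {name: default for name, (_, _, default) in PARAM_RANGES.items()}


def range_errors(params):
    """List readable problems for values outside the documented range."""
    errors = []
    for name, value in params.items():
        if name not in PARAM_RANGES:
            errors.append(f"{name}: not a documented parameter")
            continue
        low, high, _ = PARAM_RANGES[name]
        if not low <= value <= high:
            errors.append(f"{name}={value}: expected {low}..{high}")
    return errors


params = known_good()
print(range_errors(params))

params["maxHeight"] = 1080
print(range_errors(params))
```

The same table can feed the User's Guide, a DVTB script generator or a regression test, so the documented ranges and the tested ranges cannot drift apart.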

This should typically be in the context of the test application showing real-world usage of the codec, with configuration settings and usage parameters (base and/or extended), buffer management and error handling.

If the speed/quality can be fine-tuned with extended parameters, the documentation should provide some information about MHz/PSNR measurements for recommended configurations.

The codec provider should clearly show how the codec operates in error conditions e.g. when should the client application perform XDM_RESET, or re-configure parameters. In time, such information may be folded into future XDM Semantics specifications. At present though, codec providers should provide error handling documentation and/or examples.

Discontinuities from one version of a codec to the next should be clearly documented. For example, TI's h264 encoder previously did not require the user to pass a bitrate parameter. This was not an issue because omitting it simply disabled rate control; however, from v1.2 this is considered an error because of stringent parameter testing. Applications/demos using the new encoder would need to comprehend this, hence the need for discontinuity documentation. Ideally the codec would return a well-defined specific XDM error indicating the cause of the problem. Nobody likes surprises! (except at Christmas!)

The example below is from Codec-Engine - it clearly indicates discontinuities i.e. changes that the client is required to make.

[Screenshot: Codec Engine release notes, an example of good discontinuity / compatibility-break documentation]

Note - the XDM / VISA specification requires codec vendors to work with base parameters however it does not state which defaults must be supported by all implementations. This can lead to the unfortunate situation where a standard, common application using base parameters works for 1 vendor's codec, but not another because the latter chose not to support a given mode of a parameter. This is 1 reason for the Testing checklist guideline i.e. codec vendors should know whether their implementation fails with a standard test application before they go to production to avoid customer pains.

9. Preserving all important codec performance in any framework via link.xdt

The first question is "what is link.xdt?" This is basically a linker contribution from a component (e.g. codec) perspective. It doesn't need to be called link.xdt but that's the most typical filename you'll see in codec packages.

This is quite clever - the link is typically a flat system-integrator step - i.e. if you have 10 codecs in a system the typical flow was always for codec vendors to document "please add these N lines into your final linker command file so that you get the best performance". With link.xdt the contribution can truly come from the codec side. It's basically a template file that gets expanded by RTSC tooling at the final link to flesh out the actual MEMORY assignments on the platform that your system is using.

In its simplest form this file might contain e.g.

SECTIONS {
    .text:MODULE_VENDOR_cSect1 {
        *(.text:_MODULE_VENDOR_func_abc)
        *(.text:_MODULE_VENDOR_func_def)
    } > `this.MODULE.codeSection`

    .far:MODULE_VENDOR_uSect1 {
        *(.params)
    } > `this.MODULE.dataSection`
}

This helps because it avoids potential crashes - the .params section is not prefixed with a standard compiler section prefix (.far, .text, .const etc) hence its linker allocation is arbitrary unless we 'wrap' it as above. Without the wrapping the .params data section might get placed in Program-only memory (e.g. L1PSRAM) and crash the system. Complete details on this problem can be found in this article.

However a more frequent and subtle cause of problems is forgetting to include the appropriate link.xdt placement directives for cache performance. This is often seen in video algorithms which are very sensitive to L1P cache conflicts. 'Forgetting' the link.xdt can cause as much as a 10% performance hit.

To ensure performance is retained when integrated into the application / combo / server, something similar to the following needs to be supplied: -

SECTIONS
{
    /* need 128 byte alignment to preserve performance */
    .const:MODULE_VENDOR_tables > `this.MODULE.dataSection`, align = 128

    GROUP :
    {
        /* Functions in call order - worst case path - L1P cache analysis */
        .text:_MODULE_VENDOR_func_abc
        .text:_MODULE_VENDOR_func_def
        .text:_MODULE_VENDOR_func_ghi
    } > `this.MODULE.codeSection`
}

Observe that: -

  • we can specify the alignment in this codec component contribution and get it to 'stick' in the final link.
  • we abstract out the MEMORY names i.e. instead of hardwiring MEMORY names like 'DDR2' or 'IRAM' we use `this.MODULE.dataSection` - the system integrator then does the memory assignment at the configuration step.
  • and most important of all, cache conflicts are minimized because we preserve the carefully placed function ordering that the codec author specified. The GROUP directive ensures this. Another way to preserve such ordering is simply to put all of the input sections into a larger output section as we did with .text:MODULE_VENDOR_cSect1 above.
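
To see why function ordering matters, the following Python sketch models a direct-mapped program cache and checks whether two code ranges compete for the same cache lines. The 32 KB cache size and 32-byte line size are assumptions chosen to match the 32 KB alignment mentioned earlier and should be confirmed against the device's cache documentation; the addresses and function sizes are hypothetical.

```python
L1P_SIZE = 32 * 1024   # assumed cache size in bytes
LINE_SIZE = 32         # assumed cache line size in bytes


def cache_lines(address, size, cache_size=L1P_SIZE, line_size=LINE_SIZE):
    """Line indices a code range occupies in a direct-mapped cache."""
    num_lines = cache_size // line_size
    first = address // line_size
    last = (address + size - 1) // line_size
    return {line % num_lines for line in range(first, last + 1)}


def conflicts(func_a, func_b):
    """True if two (address, size) code ranges compete for any cache line."""
    return bool(cache_lines(*func_a) & cache_lines(*func_b))


# Hypothetical placements of two 1 KB functions called in the same loop.
scattered = [(0x00000, 0x400), (0x08000, 0x400)]   # exactly 32 KB apart
grouped = [(0x00000, 0x400), (0x00400, 0x400)]     # back to back

print(conflicts(*scattered))
print(conflicts(*grouped))
```

In this model the scattered placement maps both functions onto the same lines, so each call evicts the other, while the grouped placement does not. Keeping the codec author's ordering in the final link is what preserves the second situation.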

As an aside, you can find information on how to optimize for cache here. You can see how to create partially linked algorithms to enable clever code placement here.

The bottom line is that it is essential that (a) link.xdt gets included in the final system if the codec vendor supplies such a file, (b) supplying a link.xdt can have a major performance benefit on some codecs.

10. Testing

In a Codec Engine / DVSDK context, the codec should be tested with 1 or more of the following: -

  • DMAI Sample Application.
    • For example video_decode_io2 could be used to test e.g. an H.264 or an MPEG4 (etc) decoder that implements the IVIDDEC2 interface. The io refers to the fact that file I/O is used in the test app as opposed to a video driver thus making it simpler to test. The example application suite is quite complete for the set of VISA interfaces that exist. For example audio_encode_io performs a file I/O test on an IAUDENC codec (e.g. AAC Encode) whilst sample app audio_encode1 uses the audio input driver and encodes based on IAUDENC1.
    • Any required changes to the Sample Application should be documented in the release notes.
  • Digital_Video_Test_Bench_(DVTB)
    • The DVTB test bench is most valuable in its scripting capability. Hence if you are a 3rd party and have multiple codecs you should set up .dvs scripts to validate a set of codecs.
    • DVTB allows setting of individual parameters so the same codec could be tested in a variety of different configurations with zero changes to the code.
    • One point to note is that DVTB is not a recommended codebase for application design. It is designed strictly as a testbench - it does not consider the various threading aspects necessary to construct a real-time system.
  • DVSDK demos
    • It may require a little more work to add a new codec to the DVSDK demos as outlined here. However the benefit is that the demos are designed to be starterware for customer applications. So if your codec works well in this environment it will likely work well in most 'real' applications.
  • Gstreamer
    • TI's Gstreamer port is directly built on top of Codec Engine and DMAI. Gstreamer does full A/V sync and also has open-source plugins for a variety of operations such as network streaming. Like the demos, the benefit here is the 'testing in a real world environment'. The TI gstreamer port is not a test application - it is an A/V framework implementation used in 100s of TI customer applications. Hence if you are able to plug in your codec to the underlying codec-combo, add a mime-type (if required), and create a pipeline proving your codec works in this framework then it provides customers with a high degree of confidence.