GStreamer Plug-in 2.x Design

From Texas Instruments Wiki


Introduction

Code Availability

Most of these features are available now in the software on SVN branch BRANCH_DDOMPE. That software has not been reviewed and tested as thoroughly as the software on the SVN trunk.

A patch for OpenEmbedded was submitted here. This patch will not be accepted into OE, since the preferred path is to merge the branch into the trunk instead of supporting the branch in OE. In the meantime, you can use the recipe in your local build.

Support

RidgeRun provides professional support for BRANCH_DDOMPE. The features found in BRANCH_DDOMPE were funded by requests from various companies. If your company requires assistance with the TI DMAI GStreamer plug-in, please contact RidgeRun.

Background

The GStreamer TI Plugin is an open source effort started by Texas Instruments (TI) with the objective of providing a solution for synchronized audio and video hardware-accelerated recording and playback for Linux™ running on TI's Systems on Chips (SoCs).

TI has a long history of offering a set of frameworks that enables the use of hardware-accelerated video and audio codecs in their SoCs, for example Codec Engine, DMAI, and DSPLink. The GStreamer TI Plugin builds upon this foundation, particularly over DMAI, to provide the required underlying functionality for GStreamer elements.

RidgeRun has joined TI's effort, stepping up to provide development and testing resources as well as professional services for TI's customers requiring expertise in GStreamer and companion technologies.

This document outlines the functional design of version 2.x of the plugin, which provides a new set of functionality and design changes relative to the 1.x version, based on customer feedback and experience gathered by RidgeRun. Much of the design detailed in the following sections has been implemented on DDOMPE's branch in the project's version control repository at the time of this writing, and is being used in shipping products. The final implementation of the 2.x plugin will be done on a separate branch in the version control system, based on the design detailed in this document.

Rationale

The GStreamer TI plugin was originally intended as a way to provide a software platform for synchronized audio/video playback using TI's Codec Engine and DMAI (since neither of these frameworks provides lip-sync capabilities). Another driving goal was that the same plugin source could be built to run on several TI platforms by utilizing the DMAI framework. Once this objective was accomplished, the community using the plugins started looking into supporting other GStreamer functionality that was not considered by the original design. Desired functionality enhancements include trick-play modes, zero-memcpy encoding and playback, dual encoding, video zooming and panning, etc. Some companies contracted with RidgeRun to enhance the plugin, adding these capabilities. RidgeRun has implemented the new features on a separate branch, which is used as the 2.x code base.

The new plugin changes its revision number (from 1.x to 2.x) for several reasons:

  • Element naming and usage is different from 1.x, causing incompatibilities.
  • Element internal design and memory requirements changed.
  • The code is heavily re-factored to improve orthogonality.
  • The build system is different.

Design Details

Migration path from 1.x to 2.x

Since release 1.01 of the plugin, the main reference point for its use has been the examples page. For the 2.x release there will be a wiki page explaining the usage of all the previously documented cases and pointing out any special details required by the new elements.

General Design

The 2.x plugin design follows the GStreamer design guidelines (as opposed to the 1.x plugin, which followed the DMAI guidelines). Other unique requirements that have arisen as GStreamer is used in production embedded devices, such as explicit thread priority control, are also addressed.

Simplifying Customization

The 1.x plugin is extensively tested against the default codecs provided for each supported platform. It is, however, the exception and not the rule that a TI customer will use the default codec server, for several reasons:

  • The codec servers have build time assigned memory locations, making re-compilation of the codec server necessary for supporting the customer's hardware design.
  • The customer may require algorithms not provided in the default codec server.
  • The customer doesn't require algorithms provided in the default codec server, and wants them removed.

The new plugin is still tested against the default codec servers, but will simplify and document the process of modifying the existing code to use a different codec server, trying to minimize the number of changes required in the process. This will also greatly simplify the task of supporting new platforms.

The procedure to add support for a custom codec server for the 2.x plugin will be documented in the project wiki.


Another aspect included in the new plugin is run-time detection of the codecs available in the server. When the plugin is executed, it won't show elements that aren't supported by the existing codec server.

Code Orthogonality

The 2.x plugin code layout attempts to maximize code orthogonality: creating independent pieces of code that can be re-used in many cases. This is accomplished by:

  • Re-factoring the code of the 1.x branch to consolidate nearly identical code.
  • Coding new functionality with re-usability in mind. For example the new code for extended codec argument handling uses structures with function pointers to simplify adding new arguments.

The rule of thumb is: if you are writing the same code more than once, then your code is a good candidate for re-factoring.

We also avoid conditional compilation as much as possible and instead rely on run-time detection of specific features. For example, instead of using an #ifdef to set the colorSpace type for DM6467, we add code that picks the colorSpace from the plugin caps (and the caps reflect the right DM6467 colorSpace using conditional compilation only once).

Coding Standard

The 2.x plugin uses the standardized GStreamer indentation, as validated by the gst-indent script provided with the source code of the GStreamer package.

Build system

The 2.x plugin removes the build of the so-called open source components of the project (currently responsible for cross-compiling the GStreamer target platform binaries). The 2.x branch will hold only the plugin source code in the version control system, not its build system (following the pattern used by most open source projects). The build system used to validate the plugin will be the Arago project.

The project's website will provide a tarball containing pre-built GStreamer binaries that can be overlayed on top of the target file system. Instructions on how that file system was built using Arago will be available here.

The build instructions will need to clearly document the work flow for downloading the plugin, building it, making a change, rebuilding it, and submitting the change.

Run time behavior

  • The 2.x plugin will avoid using environment variables for selecting run-time options as much as possible. Using environment variables is problematic for development purposes since it makes the process of reproducing issues much more difficult. We have found cases where customers couldn't get something to work, or where we couldn't reproduce an issue, because an environment variable was or wasn't set.
  • The plugin uses run time codec detection.
  • The element properties use optimized values by default, falling back to more conservative values as needed (printing visible warnings on the screen). For example, by default the element doing memory copies should attempt to do hardware accelerated memory copies but shouldn't panic in case it can't use the hardware accelerator, instead falling back to software memcpy and printing a warning.
  • Errors or warnings in the elements should be reported using GST_ELEMENT_ERROR() or GST_ELEMENT_WARNING() instead of GST_ERROR() and GST_WARNING(). The former generate messages that are passed down the pipeline, allowing the application to identify the error and retrieve the error message string (for example, by default gst-launch will display such messages). GST_ERROR() and GST_WARNING() only print messages to stdout when the proper gst debug level is enabled, and those errors can't be caught by the application creating the pipeline.
  • Some existing elements in the 1.x branch may crash or exit without reporting any useful message. To improve supportability, the user should be able to control the message output for all abnormal behavior.

Element naming

The 2.x elements are going to be named following this convention: ti_codecenc and ti_codecdec

For example, if the codec is h264, and the operation is encoding: ti_h264enc.

Other elements that are not encoders or decoders will use custom names prefixed by the atom ti:

Elements that aren't specific to TI platforms, but belong to the project only until they are accepted mainstream, won't use the ti prefix. For example:

Codec Engine Integration

Codec Servers Management

  • The codec server will be expected by default at the path /usr/share/ti/codec-combo/, instead of the current working directory. This default path can be changed at build time using a configure option, instead of an environment variable as is currently done in the 1.x branch.
  • The codec server names for encoding and decoding can be overridden at build time by setting a variable in the configure system. This procedure will be documented in the project wiki as part of the custom codec server setup documentation.

Buffer management

  • The 2.x branch will try to minimize the CMEM memory footprint as much as possible, since it is a vital resource on very constrained embedded systems.
  • For audio streams the size of the input/output buffers will be allocated by querying the codec about the minimum input/output sizes.
  • For video/image streams the elements will identify the capabilities of the stream being processed and allocate the minimum memory required. For example, if the stream is at CIF resolution (352x288), the element will allocate only the CMEM buffers required for video encoding/decoding at that size. This requires that all streams going into the elements provide capabilities with width and height information; if the capabilities lack the information required to calculate the buffer size, the element will fall back to querying the codec for the minimum supported input/output buffer sizes, wasting memory.

In the 1.x branch the video/image decoders didn't require height and width capabilities from the input stream: they allocated the size indicated by the codec and then re-sized the input buffers once the real size had been identified. However, this approach wastes memory during bootstrap without providing a way to work around it. Since parsers are already implemented for all of the video types currently supported by the plugin, we rely on parser elements when decoding elementary streams from a file; in the worst case, capabilities may be inserted manually with a caps filter. This usage model will be documented in the new 2.x plugin example pipelines page. More details about the behavior of the parsers can be found in the section on input buffer management for the decoders.

Handling of CMEM buffers inside gstreamer

  • The 2.x plugin maintains and extends the dmai transport buffer object. Dmai transport buffers are used as the exclusive way to identify when a single memory buffer is contiguous in physical memory. Therefore all of the contiguousInputBuffer properties have been removed.
  • Dmai transport buffers are extended to support an optional callback release function. This functionality is required by the new design of some of the elements: the encoders and the display sink.
  • The 2.x elements that require contiguous input buffers will identify whether the incoming buffers are dmai transport buffers and use them as such, or otherwise internally move the data into CMEM memory (details on specific examples for encoders, decoders and transform elements are provided in the sections below).
  • There is a new element tiaccel which is a transform element that receives buffers and performs the following actions:
    • If an input buffer is a dmai transport buffer already, leave it untouched.
    • If an input buffer is a standard gst buffer, take the buffer pointer and call Memory_getBufferPhysicalAddress() to identify whether the buffer is potentially a contiguous memory buffer (like the ones provided by v4l2src when the property always-copy is false). If the buffer is contiguous in memory, wrap it into a dmai transport buffer object and pass it downstream; otherwise memcpy the data into a CMEM buffer (from an internal buffer tab) and create a dmai transport buffer object for the new buffer.
      • It is well documented in the CMEM API that CMEM_getPhys(), which is used by Memory_getBufferPhysicalAddress(), can't translate all virtual addresses to physical addresses. In particular, CMEM doesn't handle non-direct-mapped kernel addresses except the ones that correspond to CMEM's managed memory block(s). This is not a problem for the logic of the tiaccel element, since the API doesn't generate false positives when identifying contiguous memory buffers, but it may generate false negatives. The effect of a false negative is that the tiaccel element copies the data into a CMEM buffer unnecessarily (since the input buffer was already contiguous in memory). This behavior will be well documented in the wiki and example pipelines, and the element will display a warning when using input buffers that aren't contiguous in memory.
      • It's important to note that the only scenario documented so far where the tiaccel element is used to wrap contiguous memory buffers into dmai transport buffer objects is while encoding video from v4l2src, and for this case the CMEM API properly identifies the pointers from v4l2src as contiguous memory buffers.
    • If the property hwAccel is set to true, try the DMAI hardware accelerated frame copy API. The hwAccel property defaults to true. If hardware acceleration isn't available, fall back to software memcpy and display a warning.

The design rationale of tiaccel is to provide a separate gstreamer element that performs memory copies outside the encoder/decoder/resizer elements (desirable in multiple-stream scenarios, because any memory copies are made only once instead of once per element using CMEM buffers). Having an element perform the memory copies independently is also necessary to have full control of thread prioritization in the pipeline.

Thread prioritization and handling

There are several different possible usage scenarios for elements, each one with specific requirements regarding real time behavior. The elements should be flexible enough to provide full control of the threading behavior, ultimately allowing users to achieve their real time goals. Initially, thread control appears to add unnecessary complexity, however in many practical cases this approach is far simpler than the alternatives (like using a home grown implementation tuned to a specific use case).

The 1.x branch elements were designed with a real time thread embedded inside the elements, responsible for receiving input buffers and sending them to the hardware accelerators or the DSP for processing. The rationale behind this design was to parallelize the data processing done by specialized hardware with the memory copies into the CMEM input buffers.

The 2.x branch removes all the intra-element threading and instead provides the same functionality by allowing new elements to be included in the pipeline. The main rationale for removing the internal threading was to reduce design complexity, while simplifying the implementation of certain asynchronous functions such as flushing. The new design also provides more flexible control, allowing developers to perform optimizations that were not possible before (like tuning the priorities of different thread segments in the pipe), or to remove the thread prioritization solution if the extra control is not required for their use case.

There is a new priority element introduced in the 2.x branch which is responsible for providing control properties for the priority and scheduling of the thread segment at the point in the pipeline where the priority element occurs. All the 2.x elements are single-threaded.

The priority element provides the following properties:

  • nice: adds to the nice level of the thread, following the same behavior as the nice command in the Linux shell.
  • scheduler: allows to select the scheduler to use: OTHER, RT FIFO, or RT RoundRobin (see man page for sched_setscheduler()).
  • rtpriority: when using an RT scheduler, sets the value of the real time priority.

The idea behind this design is based on the following lessons learned from the 1.x branch:

  • Having real time threads embedded in the elements makes it harder to tune the system for specific scenarios. For example, video encoding where no frame drops are allowed, while the system is under heavy load and other real time processes are executed alongside GStreamer.
  • Having real time threads embedded in the elements doesn't work if the code can't execute as root for security reasons. In the 1.x design the only work-around was disabling the real time threads, and the code was tested and designed with the assumption that those threads were used.
  • The same behavior and performance of the 1.x branch regarding parallelized data copy and processing can be achieved using standard GStreamer threads and the priority element, while providing a more flexible approach. The data copy of incoming non-CMEM buffers can be performed in the tiaccel element in a separate thread (thanks to the queue element), while the data processing element is scheduled in real time using priority.

The priority element works along with the standard queue element to control which threads will receive the right priority. For example, let's consider the scenario of NTSC video capturing and encoding without frame drops. In this case we have the following requirements:

  1. The video source element needs to be scheduled, on average, at least every 33 ms, so we require this element to have the highest priority.
  2. The encoder element thread must have the second highest priority, as any encoder CPU starvation will cause the video source to run out of buffers (since the encoder will not consume, and thus return, the buffers output by the NTSC source fast enough).
  3. The multiplexer and file sink have lower priorities since there is enough buffering capacity from the encoder element to avoid being hit by scheduler delays (see output buffer management section).

The figure below details where we locate the queues in the pipeline to create three separate threads. Notice that we use the tiaccel element to transform the buffers output by v4l2src into dmai transport buffers, so no memcpy is performed on the input data at all; instead, the video capture buffers are processed directly by the encoder element.

Threads image

In this example, we can add a priority element in each portion of the pipeline where a separate thread is executing, allowing us to raise or lower the priority (or change the scheduler) of the thread where the priority element is executed. The figure below shows the location of the priority elements in the example pipeline to achieve the control we outlined before. Notice that the priority elements could be placed at different points in the pipeline to achieve the same result, as long as there is a priority element for each thread.


Threads Prioritization image

Integration with playbin/decodebin

A downside of using a separate thread priority element and removing the internal threading of the decoder is that when the decoder is used with playbin or decodebin, there may be no parallelization of the decoding process. There are two approaches to this problem:

  • Do not solve it. Tests with the current DDOMPE branch show that, for most usage cases and platforms, there is no measurable overhead (compared with the 1.x releases) from not parallelizing the decoding and the memory copy of the input buffers.
  • Create a new element that encapsulates a tiaccel, a queue, a priority and any given ti_codecdec. If this element uses a rank equal to or higher than the standard decoders, playbin/decodebin will pick it up for decoding. Such a plugin won't be included in the 2.0 release, but the solution will be documented in the wiki. If the community contributes an implementation, it could be included in future releases.

Color Space Conversion

The 2.x branch introduces an element for color space conversion named ticolorspace (following the well-known element ffmpegcolorspace). The default behavior is for ticolorspace to use DMAI Ccv module.

RidgeRun is working on creating an open source IUniversal Ccv module, and the run-time detection of available codecs may be extended in the future so that this element uses that module instead of the DMAI API. However, this functionality is not scheduled for the 2.0 release.

Video/Image/Audio Encoding

The 2.x branch merges into a single code base the functionality of the video, image and audio encoders, given that most of the infrastructure for handling of the input/output buffers is the same for the encoding scenarios. Function calls for specific processing APIs from the Codec Engine are handled with function pointer tables. This approach presents several advantages:

  • Bug fixes and improvements are shared by all the encoders, while logic specific for each codec type (video, image or audio) is minimized.
  • Buffer handling for the encoding scenarios is shared by all the elements.

Encoder elements design overview

The general design of the encoder elements is presented in figure below. The main considerations used for this design are:

  • Data flow: The encoder elements receive uncompressed buffers on the sink pad and output compressed data, so the design should focus on minimizing data copies and overhead in the processing of incoming data.
  • Memory requirements: There are two scenarios that require efficient output memory buffer allocation:
  1. Some muxers require queuing several buffers before they start freeing them.
  2. Some devices encode continuously but don't release the output buffers immediately, instead queuing them for some seconds and later discarding them unless some external event occurs (for example, surveillance cameras that are trigger-activated). In this scenario, it's ideal to avoid output buffer memcpy operations to minimize CPU overhead and improve battery life.

For these reasons, the output buffers from the encoder should be as small as possible, and use the CMEM memory as efficiently as possible.

The main encoder data structures are:

  • Input GstAdapter: a structure that receives input buffers and potentially merges them to provide enough encoding data to the codecs. For detailed design see the input buffer management section.
  • Output CMEM buffer: a specialized CMEM buffer for outputting encoded buffers without wasting memory and avoiding memcpy of the output data. For detailed design see the output buffer management section.
Encoder elements design

Resource management

The encoder elements will follow this protocol for handling resources during the different element states, in accordance with gstreamer plugin guidelines:

  • During the state change from NULL to READY, the engine will be opened, but no stream-specific resources are allocated.
  • During the state change from READY to PAUSED, no operation is done, since we need to have the stream capabilities information to allocate the right memory for the stream.
  • During the first call to the setcaps function of the sink pad, the element will define the memory requirements for input and output buffers and proceed to open the codec. Buffers won't be allocated yet, since it is possible that upstream elements already provide dmai transport buffers.
  • During the chain function, input and output buffers will be allocated the first time they are required.
  • During the state change from READY to NULL, all allocated CMEM resources are released, and the codec and engine are closed.

Since the state_change function is asynchronous to the GStreamer processing thread, the CMEM and edma libraries are required to have the capability of releasing resources from a different thread than the one where they were allocated. Some versions of the CMEM and edma libraries don't have this ability (the problem was introduced in linuxutils 2.23.01 and fixed in revision 2.25), so a version of the DVSDK for the respective platform with an updated package may be required.

Input buffer management

The handling of the incoming buffers to the encoder is done with the following rules:

  • Buffers are checked upon arrival to identify whether they are dmai transport buffers (for example, buffers coming from a video source and passing through tiaccel) or not (for example, when reading an elementary stream with filesrc).
  • If the incoming buffer is a dmai transport buffer and its size is large enough for the encode operation, the buffer is immediately passed to the processing function (it is assumed to already be a full frame).
  • If the incoming buffer isn't a dmai transport buffer, the data is pushed into the GstAdapter. If the GstAdapter is empty, the timestamp of the buffer being pushed is stored in a variable.
  • If the GstAdapter has enough data for a full input buffer for the encoder, this data is copied into an internal CMEM buffer and passed to the processing function. Again, a hardware accelerator is used to perform the copy, if such an accelerator is available.
  • If the buffer passed to the encoder is not fully consumed, only the consumed data is flushed from the adapter, and the unprocessed data is resent for encoding in the next call to the encoder. The chain() function of the element loops, sending data to the encoder while there is enough data in the adapter.

In the case of audio streams, the minimal amount of data to be processed is defined by the codec, so buffers are accumulated by the adapter until there is enough data to be passed to the encoder. However, the timestamps and durations of the buffers need to be calculated based on the capabilities of the stream: depth, sample rate, and channels. If the audio input stream doesn't provide these capabilities, the element panics due to the lack of timestamps.

Output buffer management

A single CMEM output buffer is allocated for the resulting encoded buffers. By default this buffer is 3 times the size of the input buffers passed by the input adapter to the encoder, but the multiplier can be controlled with an element property.

The function of this output buffer is to contain several slices of memory holding the resulting encoded data from the encoder. These slices are created dynamically by splitting and merging free memory slices. At initialization of the element, the output buffer is a single free memory slice. The element maintains lists of free and used memory slices for the output buffer.

The output buffer has the following behavior:

  • Every time a memory slice is needed for output data from the encoder, the element searches the output buffer for a free memory slice with a size equal to or larger than the input buffer passed by the adapter. If there isn't a free memory slice of the required size available, the element will generate a warning message and sleep until there is one. If a free memory slice is available, a memory slice equal in size to the input buffer passed by the input adapter is marked as used and inserted into the list of used memory slices, and the corresponding amount of memory is removed from the free memory list.
  • Once the encoded frame is produced, the element determines the amount of memory actually used and the memory left unused. The unused memory is merged back into the free memory list.
  • Output buffers are pushed downstream through the src pad. These are dmai transport buffers set up with a special release callback function. Once a buffer is released, its callback merges the freed memory back into the list of free memory slices (potentially merging contiguous free memory areas into a new, bigger slice) and removes it from the list of used memory slices.

This design is optimized to avoid memcpys and maximize the utilization of CMEM buffers. If there is a need to prevent the encoder from going to sleep when running out of output memory, there is a boolean property named copyOutput that will instruct the element to allocate standard GStreamer buffers and memcpy the encoded data into them for output on the src pad (releasing the CMEM buffers immediately after they are created).

The following diagram shows an example of the lists of free and used buffers inside the CMEM output buffer. In this scenario there are three buffers in use by downstream elements, splitting the CMEM output buffer in two free areas.

EncoderBuffer.png

Let's assume the third used buffer is released by a downstream element. In this scenario the last node of the used-buffer list will be removed, and its memory area merged with the adjacent free areas. During the merge operation the two existing free memory areas are unified into a single free memory area, triggering the removal of the last node in the free-buffer list. The following diagram shows the state after the third used buffer is released and the free memory areas merged.

EncoderBuffer2.png

Output buffer transformations

In some data streams it may be necessary to perform transformations on the encoder output before sending it downstream. For example, the typical h264 encoder generates byte-stream h264 format, but this needs to be transformed into a "packetized" stream (removing the SPS/PPS NALUs from the stream and embedding them into the codec_data buffer, and exchanging the NALU start codes of the other NAL types for size headers) in order to store this format in a container like the quicktime or mp4 file format.

To support this feature and the codec_data generation (see next section), there is an optional structure with callback operations that can be registered with the encoder. This structure is different for each kind of data stream. For example, in the h264 encoder the stream structure registers the callback that performs the transformation from h264 byte stream to packetized stream.

Codec Data generation

Some data streams (like h264, mpeg4, aac) require the generation of a "codec_data" buffer to be passed out of band (in the caps of the sink pad) in order to be properly muxed into container formats like mp4 or avi.

To generate the codec_data buffer, the encoder checks whether there is a codec_data generation function registered in the callback structure for the data stream (see previous section), and calls it, passing the output of the first encoded buffer. The codec_data generation function is only called once, with the first buffer, since this is typically the one containing the headers required to generate the codec_data buffer.

Extended arguments

The standard XDM APIs for encoders lack most of the basic quality control features required for fine-tuning the encoder algorithms. The 2.x plugin encoder elements support optional codec-specific extensions and control their extended arguments using GStreamer element properties.

Codec-specific properties are supported using a table of function pointers and other parameters required to provide the custom codec control. The members of this structure are:

  • srcCaps: custom static caps definition reflecting the real capabilities of the codec, for example limiting the maximum supported width and height.
  • sinkCaps: custom static caps definition reflecting the real capabilities of the codec, for example defining the color spaces supported for the codec input buffers.
  • setup_params(): function responsible for allocating the static and dynamic parameter arguments for codec creation. This function is expected to set up the codec's default parameters.
  • set_codec_caps(): function called once the caps of the stream have been defined. It is used to set values in the extended arguments that were unknown at codec creation time (since the caps had not been set at that point).
  • install_properties(): function called to install the custom GStreamer element properties that control the extended arguments.
  • get_property(): function to get values for the extended properties installed by install_properties().
  • set_property(): function to set values for the extended properties installed by install_properties().
  • max_samples: integer value used for certain audio encoders that have limits on the number of audio samples they can process but do not report them through the XDM API.
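To make the shape of this table concrete, here is a hypothetical sketch in Python; every name, caps string, and property in it is illustrative only, not the actual plugin API:

```python
# Hypothetical per-codec extension table mirroring the fields listed above.

def h264_setup_params(codec):
    # Allocate static/dynamic params and fill in codec defaults.
    codec["params"] = {"profile": "high", "entropy": "cabac"}

def h264_install_properties(element):
    # Register the extended arguments as element properties.
    element["properties"] = ["profile", "entropy", "i-period"]

h264_extension_ops = {
    "srcCaps": "video/x-h264, width=(int)[16,1920], height=(int)[16,1088]",
    "sinkCaps": "video/x-raw-yuv, format=(fourcc)NV12",
    "setup_params": h264_setup_params,
    "set_codec_caps": None,           # optional hooks may be left unset
    "install_properties": h264_install_properties,
    "get_property": None,
    "set_property": None,
    "max_samples": None,              # only meaningful for audio encoders
}
```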

These function definitions will be documented at the wiki site, along with a how-to describing the process of adding support for extended arguments for other algorithms. Future enhancements will be documented in the wiki as well.

Timestamping

The encoder elements preserve the timestamps provided by the incoming buffers. This is achieved with two separate procedures depending on the stream type:

  • For video and image streams, the input buffers are processed immediately (without going through the GstAdapter), so their timestamps are copied directly to the output buffers.
  • For audio streams, the input buffers may be merged together in the GstAdapter, so the encoder needs to track the timestamp of the buffer at the start of the adapter and calculate the time lapse for the amount of data pushed into the encoder. These calculations require the number of channels, the sample rate, and the sample depth to be present in the input caps; the element fails with an error if any of this information is missing.
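The time lapse for the data pushed into the encoder follows directly from those caps fields. A small sketch of the calculation (the function name is an assumption for illustration):

```python
def audio_duration_ns(nbytes, channels, rate, width_bits):
    """Duration in nanoseconds represented by `nbytes` of raw audio,
    given the channels/rate/sample-width fields from the input caps."""
    bytes_per_second = channels * rate * (width_bits // 8)
    return nbytes * 1_000_000_000 // bytes_per_second
```

One second of 16-bit stereo audio at 44100 Hz occupies 176400 bytes, so `audio_duration_ns(176400, 2, 44100, 16)` yields exactly one second.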

Pixel aspect ratio

The current XDM APIs don't define support for non-square pixel encoding, so any support for this is left to extended codec parameters if a particular codec provides them. However, the video/image elements may support pixel aspect ratio in future custom functionality for specific codec types (like H.264 and MPEG-4). This involves modifying the headers to insert proper pixel aspect ratio information, but the feature is not scheduled for the 2.0 release.

Video/Image/Audio Decoding

As with the encoder elements, the 2.x branch merges the functionality of the video, image, and audio decoders into a single code base, given that most of the input/output buffer handling infrastructure is the same across the decoding scenarios. Calls to the specific processing APIs of the Codec Engine are handled with function pointer tables.

Decoder elements design overview

The general design of the decoder elements is presented in the figure below. The main considerations for this design are:

  • Data flow: The decoder elements receive compressed buffers on the sink pad and output uncompressed data, so the design should focus on minimizing data copies and overhead in the processing of output data.
  • Data parsing:
  • Trick modes support:

The main decoder data structures are:

  • Parser:
  • Input buffer:
  • Output buffer array:

Resource management

Input buffer management

Parsers

Output buffer management

Extended arguments

Support for extended arguments in decoders is not implemented, since most decoders don't require them to work properly. Support for decoder extended arguments could be added in the future using the same design as the encoders' extended arguments.

Flush handling

QoS handling

Clipping

Timestamping

Reverse Playback

Video Transcoding

Transcoding is the operation of transforming one encoded format into another, and it usually involves running a decoder and an encoder.

Transcoding speed is the most important factor, and to achieve the maximum possible throughput two considerations apply:

  • Minimize memory copies, since these are slow operations for ARM cores.
  • Enable parallelism of the encoder/decoder operations on the DSP and the ARM operations.

Since the decoders generate CMEM transport buffers, these can be used directly by the encoders without performing any memory copy operations, so no extra consideration is required to avoid memory copies.

To achieve the proper parallelism between the encoders and decoders, a queue is required between them when creating the transcoding pipeline.
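The effect of that queue can be modeled with two threads and a bounded queue: the decoder stage keeps producing while the encoder stage consumes, instead of the two running in lockstep (the names and toy string "buffers" below are purely illustrative; in the real pipeline a GStreamer queue element decouples the two):

```python
import queue
import threading

buf_queue = queue.Queue(maxsize=4)   # bounded, like a GStreamer queue element
results = []

def decoder(frames):
    for f in frames:
        buf_queue.put(f"decoded-{f}")  # CMEM buffer handed over, no memcpy
    buf_queue.put(None)                # end of stream

def encoder():
    while True:
        buf = buf_queue.get()
        if buf is None:
            break
        results.append(buf.replace("decoded", "encoded"))

t1 = threading.Thread(target=decoder, args=(range(3),))
t2 = threading.Thread(target=encoder)
t1.start(); t2.start()
t1.join(); t2.join()
```

While the encoder stage works on frame N, the decoder stage can already be producing frame N+1, which is where the DSP/ARM parallelism comes from.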

Example pipelines for transcoding will be documented on the wiki.

Video Resizing

The 2.x plugin provides two different elements for video resizing:

  • tivideoscale (formerly TIVidResize): this element behaves similarly to the standard GStreamer videoscale element, where the output resolution is determined by the output caps and the resizing operation is performed by hardware acceleration.
  • tiresizer: this new element introduced in the 2.x plugin behaves differently from tivideoscale and provides access to the full range of functionality offered by the hardware accelerators:
    • The output resolution is determined by properties of the element. Maximum width and height can be specified to configure the intended maximum output buffer size.
    • Zooming and panning are possible through the use of properties.
    • Pixel aspect ratio correction and letter boxing are supported.
    • Color space conversion is supported.

Pixel Aspect Ratio handling

The tiresizer element provides a boolean property that enables pixel aspect ratio normalization to 1/1 when the input pixel aspect ratio in the caps is not 1/1.

For example, if the source is an NTSC signal with a PAR of 4/3 at 720x480 pixels, enabling PAR normalization produces an output buffer of 960x480.

PAR normalization is overridden if a specific output width and height are requested through the element properties.
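The normalization itself is simple arithmetic: the width is scaled by the input PAR so the output pixels become square. A sketch matching the NTSC example (the function name is an assumption):

```python
from fractions import Fraction

def normalize_par(width, height, par_n, par_d):
    """Scale the width by the input pixel aspect ratio so that the
    output buffer has square (1/1) pixels."""
    return int(width * Fraction(par_n, par_d)), height
```

For the NTSC case, `normalize_par(720, 480, 4, 3)` gives the 960x480 output described above.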

Letter Boxing

The tiresizer element provides a boolean property that enables preserving the picture aspect ratio (also known as letter boxing). Letter boxing is applied after pixel aspect ratio normalization when that feature is enabled.
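Letter boxing reduces to fitting the (possibly PAR-normalized) picture inside the output buffer while preserving its aspect ratio; the leftover area becomes borders. A sketch in integer arithmetic (the function name and return layout are assumptions for illustration):

```python
def letterbox(in_w, in_h, out_w, out_h):
    """Fit the input picture inside the output buffer while keeping its
    aspect ratio; the remainder becomes black borders."""
    if in_w * out_h >= in_h * out_w:
        # width-limited: bars go on top and bottom
        pic_w, pic_h = out_w, in_h * out_w // in_w
    else:
        # height-limited: bars go on the left and right
        pic_w, pic_h = in_w * out_h // in_h, out_h
    # picture size plus centered offsets inside the output buffer
    return pic_w, pic_h, (out_w - pic_w) // 2, (out_h - pic_h) // 2
```

For instance, fitting a 960x480 PAR-normalized picture into a 640x480 buffer yields a 640x320 picture with 80-pixel bars above and below.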

Color Space Conversion

Some hardware resizers provide color space conversion functionality. In the tiresizer and tivideoscale elements this will be exposed through the output caps, which reflect the available color formats.

The supported color space conversions will be documented on the wiki.

Video Rendering

Video Capturing

Testing Design

The testing focus is on proper GStreamer TI Plugin operation, with the goal of automating as much of the testing as possible given the available resources. The surrounding components are assumed to be defect free, meaning any testing of surrounding components happens as a side effect of testing the GStreamer TI Plugin. The surrounding components include other GStreamer plugins, DMAI, Codec Engine, DSPLink, and the codec server running on the DSP (if part of the system).
Each test is run on a host computer connected to the unit under test via a serial connection; some cases require a network connection as well. As with most test cases, (1) the unit under test is put in a known state, (2) the test input is provided, and (3) the results are compared against known expected results.

Feature Analysis

The GStreamer TI Plugin external interfaces that can be tested in a deterministic manner include:

  • Consumption of correct data passed to sink pad
  • Production of correct data received from src pad
  • Proper handling of GStreamer pipeline events
  • Proper handling of dynamic changes to element properties

There are many other aspects of the GStreamer TI Plugin that could be tested as well:

  • Endurance testing – letting a pipeline, such as network streaming video from a camera, run for many weeks
  • Abusive testing – bombarding element with pipeline events or property changes
  • Interaction testing – test proper operation with all the existing GStreamer elements.

In addition, key aspects of the GStreamer TI Plugin could be parameterized:

  • ARM processor load
  • DSP load (if applicable)
  • Memory utilization

GStreamer TI Plugin testing exercises the documented external interfaces of each element.

Testing Tools

Previously the primary testing tool was the gst-launch command. Test pipelines were created using gst-launch, and manual interaction verified that each pipeline ran to completion without error. To move toward more automated testing, attributes of a test's execution are compared to expected results. The downside is that any data generated by the GStreamer TI Plugin could be bogus and the test would still pass, so human interaction to monitor the generated audio and video is still required.

GStreamer Daemon – gstd

To extend the concept of gst-launch, GStreamer daemon was created. In addition to executing pipelines described using a simple text string notation, GStreamer daemon allows controlling the pipeline state and dynamically changing an element's properties while the pipeline is playing. The two key pieces of GStreamer daemon are gstd, the actual daemon, which responds to D-Bus messages, and gst-client, similar to gst-launch, which sends D-Bus messages to gstd and reports any D-Bus messages gstd produces.

Perl Expect

Each test case is written in Perl using the Perl Expect module. The Expect testing paradigm is used because it allows either local or remote control of the testing sequence: by local we mean both the test and its execution control run on the unit under test; by remote we mean the test runs on the unit under test while its execution is controlled from a desktop computer.

The other big advantage of Perl is that the creation of individual test cases can be greatly simplified by using a Perl module specific to testing the GStreamer TI Plugin. Additional Perl subroutines can be defined to refactor code commonly found in individual tests. Here is a simple (not real) test case comparison. In the 1.x version of the GStreamer TI Plugin test suite, one of the test cases for DM6467 audio decode testing is:

PIPELINE="filesrc location=/opt/media_files/davincieffect_HEv2.aac ! ti_aacdec ! tiperf engine-name=decode print-arm-load=TRUE ! alsasink"
gst-launch --gst-debug-no-color --gst-debug=TI*:3 $PIPELINE 

This test verifies that AAC-encoded audio data is decoded and plays correctly without any obvious errors. As an example, the test case can be expressed using Perl Expect and GStreamer TI Plugin specific subroutines as:

$p = gtp_init($pipeline);
gtp_expect_avg_arm_load($p, 12);
gtp_expect_avg_dsp_load($p, 25);
gtp_run($p);
gtp_fini($p);

where the output generated by --gst-debug=TI*:3 and dmaiperf is checked to make sure no errors are reported and the average ARM and DSP loads don't exceed the specified values. To test that the AAC decode function handles pipeline events correctly, an example test case could be:

$p = gtp_init($pipeline);
gtp_run($p);
$i = 10;
while ($i > 0) {
    msleep(100);
    gst_pause($p);
    msleep(10);
    gst_resume($p);
    $i--;
}
gtp_fini($p);

Of course, it is likely that a gtp_pause_resume() subroutine would be added instead of including the above while loop in many tests.

Test Categorization

Pipeline Data Consumption and Production Tests

The existing gst-launch based test suite used with the 1.x version of the GStreamer TI Plugin generally focused on correct pipeline operation as perceived by a human operator. Indications used to identify improper operation include unexpected error messages, pipelines that never start, and pipelines that never finish.
With a human operator it was also possible to detect garbled or incorrectly timed audio or video; these issues are more difficult to detect with automated tests.
The focus of these tests is proper operation in static conditions. Tests should be generated to cover the following important cases:

  • Common input settings. For example, if a video encoder on a particular platform can receive video in different frame sizes depending on which CMOS sensor is installed, then each of those resolutions should be tested.

Tests for Known and Expected Pipelines

Exhaustive testing of all possible inputs and settings for an element is not an objective of this test suite. Instead, the focus is on verifying pipelines work for the known and anticipated use models. The expected use models will vary based on the capabilities of the processor.

Dynamic Pipeline Tests

The GStreamer pipeline state will be changed and the resultant element behavior monitored for expected operation.

Dynamic Element Property Tests

Element properties that can be modified while the pipeline exists are changed, and the resultant element behavior is monitored for expected operation.