GStreamer Plug-in 2.x Design

From Texas Instruments Embedded Processors Wiki


Introduction

Code Availability

Most of these features are available now using software on SVN BRANCH_DDOMPE. The software has not been reviewed and tested as well as software on the SVN trunk.

A patch for OpenEmbedded was submitted here. This patch will not be accepted into OE, since the preferred path is to merge the branch into trunk rather than to support the branch in OE. In the meantime, you can use the recipe in your local build.

Support

RidgeRun provides professional support for BRANCH_DDOMPE. The features found in BRANCH_DDOMPE were funded by request from various companies. If your company requires assistance with the TI DMAI GStreamer plug-in, please contact RidgeRun.

Background

The GStreamer TI Plugin is an open source effort started by Texas Instruments (TI) with the objective of providing a solution for synchronized, hardware-accelerated audio and video recording and playback for Linux™ running on TI's Systems-on-Chip (SoCs).

TI has a long history of offering a set of frameworks that enables the use of hardware-accelerated video and audio codecs in their SoCs, for example Codec Engine, DMAI, and DSPLink. The GStreamer TI Plugin builds upon this foundation, particularly over DMAI, to provide the required underlying functionality for GStreamer elements.

RidgeRun has joined TI's effort stepping up to provide development and testing resources as well as professional services for TI's customers requiring expertise on GStreamer and companion technologies.

This document outlines the functional design of version 2.x of the plugin, which provides new functionality and design changes relative to the 1.x version, based on customer feedback and experience gathered by RidgeRun. Much of the design detailed in the following sections has been implemented on DDOMPE's branch in the project version control system repository at the time of this writing, and is being used in shipping products. The final implementation of the 2.x plugin will be done on a separate branch in the version control system, based on the design detailed in this document.

Rationale

The GStreamer TI plugin was originally intended as a way to provide a software platform for synchronized audio/video playback using TI's Codec Engine and DMAI (since neither of these frameworks provides lip-sync capabilities). Another driving goal was that the same plugin source could be built to run on several TI platforms by utilizing the DMAI framework. Once this objective was accomplished, the community using the plugins started looking into supporting other GStreamer functionality that was not considered by the original design. Desired enhancements include trick-play modes, zero-memcpy encoding and playback, dual encoding, video zooming and panning, etc. Some companies contracted with RidgeRun to enhance the plugin by adding these capabilities. RidgeRun has implemented the new features on a separate branch, which is used as the 2.x code base.

The new plugin changes its revision number (from 1.x to 2.x) for several reasons:

Design Details

Migration path from 1.x to 2.x

Since the release 1.01 of the plugin, the main reference point for its use has been the examples page. For the 2.x release there will be a page in the wiki, explaining the usage for all the previously documented cases and pointing out any special details required for the new elements.

General Design

The 2.x plugin design follows the GStreamer design guidelines (as opposed to the 1.x plugin, which followed the DMAI guidelines). Other unique requirements that have arisen as GStreamer was used in production embedded devices, such as explicit thread priority control, are also addressed.

Simplifying Customization

The 1.x plugin is extensively tested against the default codecs provided for each supported platform. It is however the exception and not the rule that a TI customer will use the default codec server for several reasons:

The new plugin is still tested against the default codec servers, but will simplify and document the process of modifying the existing code to use a different codec server, trying to minimize the amount of changes required in the process. This will also greatly simplify the task of supporting new platforms.

The procedure to add support for a custom codec server for the 2.x plugin will be documented in the project wiki.


Another aspect included in the new plugins is run-time detection of codecs available in the server. When the plugin is executed, it won't show elements that aren't supported by the existing codec server.
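The run-time detection described above can be pictured as a filter over a table of known elements. The following is only a minimal sketch: the table contents, the algorithm names, and the function name are hypothetical and are not taken from the plugin source, and how the plugin actually queries the codec server is not shown.

```python
# Hypothetical mapping from codec-server algorithm names to element names.
# Neither the keys nor the table itself come from the plugin source.
ELEMENT_TABLE = {
    "h264enc": "ti_h264enc",
    "h264dec": "ti_h264dec",
    "aacdec": "ti_aacdec",
    "mpeg4enc": "ti_mpeg4enc",
}


def elements_to_register(server_algorithms):
    """Return the element names whose algorithm is present in the server.

    `server_algorithms` stands in for whatever list the codec server
    reports at run time (an assumption for this sketch).
    """
    available = set(server_algorithms)
    return sorted(
        element for algorithm, element in ELEMENT_TABLE.items()
        if algorithm in available
    )


# A server that only offers H.264 encode and AAC decode exposes two elements;
# the unknown "jpegenc" entry is simply ignored.
print(elements_to_register(["h264enc", "aacdec", "jpegenc"]))
```

Elements for algorithms missing from the server are never registered, so they do not appear when the plugin is inspected.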

Code Orthogonality

The 2.x plugin code layout attempts to maximize code orthogonality: creating independent pieces of code that can be re-used in many cases. This is accomplished by:

The rule of thumb is: if you are writing the same code more than once, then your code is a good candidate for re-factoring.

We also avoid conditional compilation as much as possible and instead rely on run-time detection of specific features. For example, instead of using an #ifdef condition to set the colorSpace type for DM6467, we add code that picks the colorSpace from the plugin caps (and the caps reflect the right DM6467 colorSpace using conditional compilation only once).

Coding Standard

The 2.x plugin uses the standardized GStreamer indentation, as validated by the gst-indent script provided with the source code of the GStreamer package.

Build system

The 2.x plugin no longer builds the so-called open source components of the project (the part of the build that is currently responsible for cross-compiling the GStreamer binaries for the target platform). The 2.x branch will hold in the version control system only the source code of the plugin and not the build system for it (following the pattern used by most open source projects). The build system used to validate the plugin will be the Arago project.

The project's website will provide a tarball containing pre-built GStreamer binaries that can be overlaid on top of the target file system. Instructions on how that file system was built using Arago will be available here.

The build instructions will need to clearly document the workflow for downloading the plugin, building it, making a change, rebuilding it, and submitting the change.

Run time behavior

Element naming

The 2.x elements are going to be named following this convention: ti_<codec>enc and ti_<codec>dec

For example, if the codec is h264, and the operation is encoding: ti_h264enc.
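As a small illustration of the convention, the sketch below builds an element name by placing the codec name between the ti_ prefix and the enc or dec suffix. The helper name and the "encode"/"decode" operation strings are assumptions made for this example only.

```python
def ti_element_name(codec: str, operation: str) -> str:
    """Build an encoder/decoder element name from a codec and an operation.

    The helper and its argument values are illustrative; only the resulting
    name pattern comes from the convention described in the text.
    """
    suffixes = {"encode": "enc", "decode": "dec"}
    if operation not in suffixes:
        raise ValueError(f"unknown operation: {operation!r}")
    return f"ti_{codec.lower()}{suffixes[operation]}"


print(ti_element_name("h264", "encode"))  # ti_h264enc
print(ti_element_name("aac", "decode"))   # ti_aacdec
```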

Other elements that are not encoders or decoders will use custom names prefixed by the atom ti:

Elements that aren't specific to TI platforms, but that live in the project until they are accepted upstream, won't use the ti prefix. For example:

Codec Engine Integration

Codec Servers Management

Buffer management

In the 1.x branch the video/image decoders didn't require height and width capabilities from the input stream, since they allocated the size indicated by the codec and then re-sized the input buffers once the real size had been identified. However, this approach wastes memory during the bootstrap without providing a way to work around it. Since there are already parsers implemented for all of the video types currently supported by the plugin, we rely on using parser elements when decoding elementary streams from a file, or in the worst case capabilities may be manually inserted with a caps filter. This usage model will be documented in the new 2.x plugin example pipelines page. More details about the behavior of the parsers can be found in the section on input buffer management for the decoders.

Handling of CMEM buffers inside gstreamer

The design rationale of tiaccel is to provide a separate GStreamer element for performing memory copies outside the encoder/decoder/resizer elements (which is desirable in multi-stream scenarios, because each memory copy is made only once instead of once for every element using CMEM buffers). Having an element perform the memory copies independently is also necessary in order to have full control of thread prioritization in the pipeline.

Thread prioritization and handling

There are several possible usage scenarios for elements, each one with specific requirements regarding real-time behavior. The elements should be flexible enough to provide full control of the threading behavior, ultimately allowing users to achieve their real-time goals. At first glance thread control appears to add unnecessary complexity; however, in many practical cases this approach is far simpler than the alternatives (like using a home-grown implementation tuned to a specific use case).

The 1.x branch elements were designed by embedding a real time thread inside those elements responsible for receiving input buffers and sending the buffers to hardware accelerators or the DSP for processing. The rationale behind this logic is to parallelize the data processing by using specialized hardware along with the memory copies into the CMEM input buffers.

The 2.x branch removes all the intra-element threading and instead provides the same functionality by allowing new elements to be included in the pipeline. The main rationale for removing the internal threading was to reduce design complexity, while simplifying the implementation of certain asynchronous functions such as flushing. The new design also provides more flexible control, allowing developers to perform optimizations that were not possible before (like tuning the priorities of different thread segments in the pipeline) or to remove the thread prioritization solution if the extra control is not required for their use case.

There is a new priority element introduced in the 2.x branch which is responsible for providing control properties for the priority and scheduling of the thread segment at the point in the pipeline where the priority element occurs. All the 2.x elements are single-threaded.

The priority element provides the following properties:

The idea behind this design is based on the following lessons learned from the 1.x branch:

The priority element works along with the standard queue element to control which threads will receive the right priority. For example, let's consider the scenario of NTSC video capturing and encoding without frame drops. In this case we have the following requirements:

  1. The video source element needs to be scheduled at least once every 33 ms on average, so we require this element to have the highest priority.
  2. The encoder element thread must have the second highest priority as any encoder CPU starvation will cause the video source to run out of buffers (since the encoder will not consume, and thus return, the buffers output by the NTSC source at a rate fast enough).
  3. The multiplexer and file sink have lower priorities since there is enough buffering capacity from the encoder element to avoid being hit by scheduler delays (see output buffer management section).

The figure below shows where we place the queues in the pipeline to create three separate threads. Notice that we use the tiaccel element to transform the buffers output by v4l2src into DMAI transport buffers, so no memcpy is performed on the input data at all; instead, the video capture buffers are processed directly by the encoder element.

Threads image

In this example, we can add a priority element somewhere in each portion of the pipeline where a separate thread is executing, allowing us to raise or lower the priority (or change the scheduler) of the thread in which the priority element is executed. The figure below shows the location of the priority elements in the example pipeline in order to achieve the control outlined above. Notice that the priority elements could be placed at other points in the pipeline to achieve the same result, as long as there is one priority element for each thread.


Threads Prioritization image
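The relationship between queues, thread segments, and priority elements can be checked mechanically. The sketch below is a rough illustration only: it splits a gst-launch style description at each queue and reports the segments that lack a priority element. The element name priority, the use of ti_h264enc, qtmux and filesink in the example pipeline, and the output path are assumptions; no priority element properties are shown because they are not listed here.

```python
def thread_segments(pipeline: str):
    """Split a gst-launch style description into per-thread element lists.

    Each queue starts a new streaming thread downstream of it, so the
    elements between two queues share one thread.
    """
    segments, current = [], []
    for part in pipeline.split("!"):
        element = part.strip()
        if not element:
            continue
        name = element.split()[0]  # drop any property assignments
        if name == "queue":
            segments.append(current)
            current = []
        else:
            current.append(name)
    segments.append(current)
    return segments


def segments_missing_priority(pipeline: str):
    """Return the indexes of thread segments without a priority element."""
    return [
        index for index, segment in enumerate(thread_segments(pipeline))
        if "priority" not in segment
    ]


# Hypothetical capture/encode pipeline with one priority element per thread.
PIPELINE = (
    "v4l2src ! tiaccel ! priority ! queue ! "
    "ti_h264enc ! priority ! queue ! "
    "priority ! qtmux ! filesink location=/tmp/out.mp4"
)

print(thread_segments(PIPELINE))
print(segments_missing_priority(PIPELINE))
```

With two queues the description yields three segments (capture, encode, and mux/sink), matching the three threads discussed above.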

Integration with playbin/decodebin

A downside of using a separate thread priority element and removing the internal threading of the decoder is that when the decoder is used with playbin or decodebin, it is possible that there won't be parallelization of the decoding process. However there are two approaches to solve this problem:

Color Space Conversion

The 2.x branch introduces an element for color space conversion named ticolorspace (named after the well-known ffmpegcolorspace element). By default, ticolorspace uses the DMAI Ccv module.

RidgeRun is working on creating an open source IUniversal Ccv module, and this element may be added as a future capability for run-time detection of available codecs to use this module instead of the DMAI API. However this functionality is not scheduled for the 2.0 release.

Video/Image/Audio Encoding

The 2.x branch merges into a single code base the functionality of the video, image and audio encoders, given that most of the infrastructure for handling of the input/output buffers is the same for the encoding scenarios. Function calls for specific processing APIs from the Codec Engine are handled with function pointer tables. This approach presents several advantages:

Encoder elements design overview

The general design of the encoder elements is presented in figure below. The main considerations used for this design are:

  1. Some muxers require queuing several buffers before they start freeing them.
  2. Some devices encode continuously but don't release the output buffers immediately, instead queuing them for some seconds and later discarding them unless there is some external event (for example, surveillance cameras that are trigger-activated). In this scenario, it's ideal to avoid output buffer memcpy operations to minimize CPU overhead and improve battery life.

For these reasons, the output buffers from the encoder should be as small as possible, and use the CMEM memory as efficiently as possible.

The main encoder data structures are:

Encoder elements design

Resource management

The encoder elements will follow this protocol for handling resources during the different element states, in accordance with gstreamer plugin guidelines:

Since the state_change function is asynchronous to the processing thread of GStreamer, the CMEM and edma libraries must be able to release resources from a thread other than the one that allocated them. Some versions of the CMEM and edma libraries lack this ability (the problem was introduced in linuxutils 2.23.01 and fixed in revision 2.25), so a version of the DVSDK for the respective platform with an updated package may be required.

Input buffer management

The handling of the incoming buffers to the encoder is done with the following rules:

In the case of audio streams, the minimal amount of data to be processed is defined by the codec, so buffers are accumulated by the adapter until there is enough data to be passed to the encoder. However, the timestamps and duration of the buffers need to be calculated based on the capabilities of the stream: depth, sample rate, and channels. If the audio input stream doesn't provide these capabilities, the element panics due to the lack of timestamps.
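The arithmetic behind deriving durations and timestamps from depth, sample rate, and channels can be sketched as follows. This is only an illustration of the calculation for raw PCM input; the function names are made up for the example, and timestamps are derived from the running frame count (an assumption) so that rounding errors do not accumulate.

```python
NSECS_PER_SEC = 1_000_000_000


def frames_in(num_bytes: int, depth_bits: int, channels: int) -> int:
    """Number of audio frames (one sample per channel) in a raw PCM buffer."""
    bytes_per_frame = channels * depth_bits // 8
    if bytes_per_frame <= 0:
        raise ValueError("depth and channels must be known and positive")
    return num_bytes // bytes_per_frame


def timestamp_buffers(sizes, depth_bits, sample_rate, channels):
    """Return (timestamp_ns, duration_ns) for consecutive buffers.

    Timestamps are computed from the cumulative frame count rather than by
    summing rounded durations.
    """
    if sample_rate <= 0:
        raise ValueError("sample rate must be known and positive")
    result, frames_so_far = [], 0
    for size in sizes:
        frames = frames_in(size, depth_bits, channels)
        start = frames_so_far * NSECS_PER_SEC // sample_rate
        end = (frames_so_far + frames) * NSECS_PER_SEC // sample_rate
        result.append((start, end - start))
        frames_so_far += frames
    return result


# Two 19200-byte buffers of 16-bit stereo audio at 48 kHz hold 4800 frames
# each, which is exactly 100 ms per buffer.
print(timestamp_buffers([19200, 19200], 16, 48000, 2))
```

Without depth, sample rate, and channels none of these values can be computed, which is why the element cannot proceed when the caps lack them.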

Output buffer management

A single CMEM output buffer is allocated for the resulting encoded buffers. By default this buffer is 3 times the size of the input buffers passed by the input adapter to the encoder, but the multiplier can be changed with an element property.

The function of this output buffer is to contain several slices of memory that hold the encoded data produced by the encoder. These slices are created dynamically by splitting and merging free memory slices. At the initialization of the element, the output buffer is a single free memory slice. The element maintains a list of free and used memory slices for the output buffer.

The output buffer has the following behavior:

This design is optimized to avoid memcpys and maximize the utilization of CMEM buffers. If there is a need to prevent the encoder from going to sleep when it runs out of output memory, there is a boolean property named copyOutput that instructs the element to allocate standard GStreamer buffers and memcpy the encoded data into them for output on the src pad (releasing the CMEM slices immediately after they are created).

The following diagram shows an example of the lists of free and used buffers inside the CMEM output buffer. In this scenario there are three buffers in use by downstream elements, splitting the CMEM output buffer in two free areas.

EncoderBuffer.png

Let's assume the third buffer used is released by a downstream element. In this scenario the last node of the used buffer list will be removed, and the memory area merged with the adjacent free memories. During the merge operation the two existing free memory areas will be now unified into a single free memory area, triggering the removal of the last node in the free buffer list. The following diagram shows the state after the third used buffer is released and the free memory areas merged.

EncoderBuffer2.png
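The split-and-merge bookkeeping shown in the two diagrams can be sketched with a pair of lists holding (offset, length) slices. This is a simplified model, not the plugin's implementation: the class and method names are invented, a first-fit search is assumed, and real CMEM memory is replaced by plain offsets.

```python
class OutputBuffer:
    """Model of one output buffer carved into free and used slices."""

    def __init__(self, input_size: int, multiplier: int = 3):
        # The default multiplier of 3 follows the text; the rest is assumed.
        self.size = input_size * multiplier
        self.free = [(0, self.size)]  # sorted (offset, length) slices
        self.used = []                # slices held by downstream elements

    def allocate(self, length: int):
        """Split the first free slice that is large enough (first fit)."""
        for index, (offset, free_len) in enumerate(self.free):
            if free_len >= length:
                if free_len == length:
                    del self.free[index]
                else:
                    self.free[index] = (offset + length, free_len - length)
                self.used.append((offset, length))
                return offset
        return None  # no room: the encoder would have to wait

    def release(self, offset: int):
        """Return a used slice and merge it with adjacent free slices."""
        for index, (used_off, used_len) in enumerate(self.used):
            if used_off == offset:
                del self.used[index]
                break
        else:
            raise KeyError(f"no used slice at offset {offset}")
        self.free.append((used_off, used_len))
        self.free.sort()
        merged = [self.free[0]]
        for off, length in self.free[1:]:
            prev_off, prev_len = merged[-1]
            if prev_off + prev_len == off:
                merged[-1] = (prev_off, prev_len + length)
            else:
                merged.append((off, length))
        self.free = merged


buf = OutputBuffer(input_size=200)           # 600 bytes in total
a, b, c, d = (buf.allocate(100) for _ in range(4))
buf.release(c)                               # three used slices, two free areas
print(buf.used, buf.free)
buf.release(d)                               # free areas merge into one
print(buf.used, buf.free)
```

After the first release the model matches the first diagram (three used slices and two free areas); releasing the third used slice leaves a single merged free area, as in the second diagram.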

Output buffer transformations

In some data streams it may be necessary to perform data transformations on the output from the encoder before sending it downstream. For example, the typical h264 encoder generates h264 in byte stream format, but this needs to be transformed into a "packetized" stream (removing the SPS/PPS NALUs from the stream and embedding them into the codec_data buffer, and replacing the start codes of the other NAL units with size headers) in order to store it in a container such as the quicktime or mp4 file format.

To support this feature and the codec_data generation (see next section), there is an optional structure with callback operations that can be registered with the encoder. This structure is different for each kind of data stream. For example, in the h264 encoder the stream structure registers the callback that performs the data transformation from h264 byte stream into packetized stream.
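To make the byte stream to packetized transformation concrete, here is a minimal sketch of what such a callback might do for h264. It assumes 4-byte big-endian size headers and strips trailing zero bytes before each start code; the function names are invented, and building the actual codec_data buffer from the collected SPS/PPS units is left out.

```python
import struct

SPS, PPS = 7, 8  # H.264 nal_unit_type values for the parameter sets


def split_annexb(stream: bytes):
    """Split a byte stream on 00 00 01 start codes into raw NAL units."""
    nals, start, i = [], None, 0
    while i + 3 <= len(stream):
        if stream[i:i + 3] == b"\x00\x00\x01":
            if start is not None:
                # Zero bytes before a start code belong to the start code.
                nals.append(stream[start:i].rstrip(b"\x00"))
            start = i + 3
            i += 3
        else:
            i += 1
    if start is not None:
        nals.append(stream[start:].rstrip(b"\x00"))
    return [nal for nal in nals if nal]


def packetize(stream: bytes):
    """Return (sps_list, pps_list, packetized_payload).

    SPS/PPS units are removed from the payload; every other NAL unit is
    prefixed with its size as a 4-byte big-endian integer (an assumption).
    """
    sps, pps, payload = [], [], bytearray()
    for nal in split_annexb(stream):
        nal_type = nal[0] & 0x1F
        if nal_type == SPS:
            sps.append(nal)
        elif nal_type == PPS:
            pps.append(nal)
        else:
            payload += struct.pack(">I", len(nal)) + nal
    return sps, pps, bytes(payload)


# Tiny made-up stream: SPS, PPS, then one slice NAL unit.
sample = (b"\x00\x00\x00\x01\x67\x42"
          b"\x00\x00\x00\x01\x68\xce"
          b"\x00\x00\x01\x65\x88\x84")
print(packetize(sample))
```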

Codec Data generation

Some data streams (like h264, mpeg4, aac) require the generation of a "codec_data" buffer that is passed out of band (in the caps of the sink pad) in order to be properly muxed into container formats like mp4 or avi.

To generate the codec_data buffer, the encoder checks whether a codec_data generation function is registered in the structure with callback operations for this data stream (see previous section), and calls it, passing the first encoded output buffer. The codec_data generation function is only called once, with the first buffer, since this is typically the one containing the headers required to generate the codec_data buffer.
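The call-once behavior, together with the optional nature of the callbacks, can be sketched as below. All class, field, and method names are hypothetical; the real structure is a C table of function pointers whose members are not listed here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StreamCallbacks:
    """Optional per-stream operations (hypothetical field names)."""
    transform: Optional[Callable[[bytes], bytes]] = None
    generate_codec_data: Optional[Callable[[bytes], bytes]] = None


class EncoderOutput:
    """Applies the registered callbacks to each encoded buffer."""

    def __init__(self, callbacks: Optional[StreamCallbacks] = None):
        self.callbacks = callbacks
        self.codec_data = None
        self._first_buffer = True

    def push(self, encoded: bytes) -> bytes:
        callbacks = self.callbacks
        if self._first_buffer:
            # Only the first encoded buffer is offered to the generator.
            self._first_buffer = False
            if callbacks and callbacks.generate_codec_data:
                self.codec_data = callbacks.generate_codec_data(encoded)
        if callbacks and callbacks.transform:
            return callbacks.transform(encoded)
        return encoded


calls = []
output = EncoderOutput(StreamCallbacks(
    generate_codec_data=lambda buf: calls.append(buf) or buf[:2],
))
output.push(b"\x67\x42\x65")
output.push(b"\x41\x9a")
print(len(calls), output.codec_data)
```

A stream type that registers no callbacks passes its buffers through unchanged and never produces codec_data.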

Extended arguments

The standard XDM APIs for encoders lack most of the basic quality control features required for fine tuning the encoder algorithms. 2.x plugin encoder elements support optional codec specific extensions and control their extended arguments using GStreamer element properties.

Codec specific properties are supported using a table of function pointers and other parameters required to provide the custom codec control. The members of this structure are:

These function definitions will be documented at the wiki site, along with a how-to describing the process of adding support for extended arguments for other algorithms. Future enhancements will be documented in the wiki as well.

Timestamping

The encoder elements preserve the timestamps provided by the incoming buffers. This is achieved with two separate procedures depending on the stream type:

Pixel aspect ratio

Current XDM APIs don't define support for non-square pixel encoding, so any support for this is left to extended codec parameters if a particular codec supports it. However, the video/image elements may support pixel aspect ratio in future custom functionality for specific codec types (like h264 and mpeg4). This involves modifying the headers to insert the proper pixel aspect ratio information, but this feature is not scheduled for the 2.0 release.

Video/Image/Audio Decoding

As with the encoder elements, the 2.x branch merges into a single code base the functionality of the video, image and audio decoders, given that most of the infrastructure for handling of the input/output buffers is the same for the decoding scenarios. Function calls for specific processing APIs from the Codec Engine are handled with function pointer tables.

Decoder elements design overview

The general design of the decoder elements is presented in figure below. The main considerations used for this design are:

The main decoder data structures are:

Resource management

Input buffer management

Parsers

Output buffer management

Extended arguments

Support for extended arguments in decoders is not implemented, since most decoders don't require them to work properly. Support for decoder extended arguments could be added in the future using the same design as the extended arguments for encoders.

Flush handling

QoS handling

Clipping

Timestamping

Reverse Playback

Video Transcoding

Transcoding is the operation for transforming from one encoded format into another, and it usually involves the operation of a decoder and an encoder.

Transcoding speed is the most important factor, and to achieve the maximum possible throughput, two considerations must be addressed:

Since decoders generate CMEM transport buffers, these can be used directly by the encoders without having to perform any memory copy operations. No extra consideration is required to avoid memory copies.

To achieve the proper parallelism between the encoders and decoders, a queue is required between them when creating the transcoding pipeline.

Example pipelines for transcoding will be documented on the wiki.

Video Resizing

The 2.x plugin provides two different elements for video resizing:

Pixel Aspect Ratio handling

The tiresizer element provides a boolean property that enables pixel aspect ratio normalization to 1/1 in case the input pixel aspect ratio in the caps is not 1/1.

For example if the source is an NTSC signal with PAR 4/3 at 720x480 pixels, when the par normalization feature is enabled the output buffer will be of 960x480.

The PAR normalization is overridden if a specific output width and height are requested through the element properties.
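The normalization arithmetic from the example above can be written down in a few lines. This sketch assumes that only the width is scaled (as in the 720x480 to 960x480 example) and that an explicitly requested output size simply wins; the function and parameter names are illustrative and are not the element's property names.

```python
from fractions import Fraction


def normalized_size(width, height, par, out_width=None, out_height=None):
    """Return the output size after pixel aspect ratio normalization to 1/1.

    If both an output width and height are requested explicitly, they
    override the normalization.
    """
    if out_width is not None and out_height is not None:
        return out_width, out_height
    # Stretch the width by the PAR so that output pixels become square.
    return round(width * Fraction(par)), height


print(normalized_size(720, 480, Fraction(4, 3)))            # (960, 480)
print(normalized_size(720, 480, Fraction(4, 3), 640, 360))  # (640, 360)
```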

Letter Boxing

The tiresizer element provides a boolean property that enables keeping the picture aspect ratio (also known as letter boxing). Any letter boxing is done after applying pixel aspect ratio normalization if it is enabled.
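Keeping the picture aspect ratio amounts to fitting the source rectangle inside the target and centering it. The sketch below shows that calculation under the assumption of integer truncation and centered placement; the function name and return layout are invented for the example, and the source size is taken to be the one obtained after any pixel aspect ratio normalization.

```python
def letterbox(src_w: int, src_h: int, dst_w: int, dst_h: int):
    """Fit a source picture into a target size without distortion.

    Returns (scaled_w, scaled_h, x_offset, y_offset), where the offsets
    give the size of the bars on the left and on the top.
    """
    if src_w * dst_h > dst_w * src_h:
        # Source is wider than the target: bars go above and below.
        scaled_w = dst_w
        scaled_h = dst_w * src_h // src_w
    else:
        # Source is taller (or equal): bars go left and right.
        scaled_h = dst_h
        scaled_w = dst_h * src_w // src_h
    return scaled_w, scaled_h, (dst_w - scaled_w) // 2, (dst_h - scaled_h) // 2


# A 960x480 picture placed into a 720x576 output keeps its 2:1 shape.
print(letterbox(960, 480, 720, 576))  # (720, 360, 0, 108)
```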

Color Space Conversion

Some hardware resizers provide functionality for color space conversion. This functionality will be available in the tiresizer and tivideoscale elements, whose output caps will reflect the available color formats.

The supported color space conversions will be documented on the wiki.

Video Rendering

Video Capturing

Testing Design

The testing focus is on proper GStreamer TI Plugin operation, with the goal of automating as much of the testing as possible given the available resources. The surrounding components are assumed to be defect free, meaning any testing of surrounding components happens as a side effect of testing the GStreamer TI Plugin. The surrounding components include other GStreamer plugins, DMAI, Codec Engine, DSPLink and the codec server running on the DSP (if part of the system).
Each test is run on a host computer, connected to the unit under test via serial connection and in some cases can require a network connection as well. As with most test cases, (1) the unit under test is put in a known state, (2) the test input is provided, and (3) the results are compared against known expected results.

Feature Analysis

The GStreamer TI Plugin external interfaces that can be tested in a deterministic manner include:

There are many other aspects of the GStreamer TI Plugin that could be tested as well:

In addition, key aspects of the GStreamer TI Plugin could be parameterized:

GStreamer TI Plugin testing exercises the documented external interfaces of each element.

Testing Tools

Previously the primary testing tool was the gst-launch command. Test pipelines were created using gst-launch and manual interaction verified the pipeline ran to completion without error. To move toward more automated testing, attributes of executing a test are compared to expected results. The downside is any data generated by the GStreamer TI Plugin could be bogus and the test would pass. Human interaction to monitor the generated audio and video is still required.

GStreamer Daemon – gstd

To extend the concept of gst-launch, GStreamer daemon was created. In addition to executing pipelines described using a simple text string notation, GStreamer daemon allows control of the pipeline state and dynamic changes to an element's properties while the pipeline is playing. The two key pieces of the GStreamer daemon are gstd, the actual daemon that responds to d-bus messages, and gst-client, which is similar to gst-launch and can send d-bus messages to gstd and report any d-bus messages produced by gstd.

Perl Expect

Each test case is written in Perl utilizing the Perl Expect module. The expect testing paradigm is used since it allows for either local or remote control of the testing sequence. By local, we mean that both the test and the test execution run on the unit under test; by remote, we mean that the test runs on the unit under test while the test execution runs on a desktop computer.

The other big advantage of Perl is that the creation of individual test cases can be greatly simplified by using a Perl module specific to the testing of the GStreamer TI Plugin. Additional Perl subroutines can be defined to refactor code commonly found in individual tests. Here is a simple (not real) test case comparison. In the 1.x version of the GStreamer TI Plugin test suite, one of the test cases for the DM6467 audio decode testing is:

PIPELINE="filesrc location=/opt/media_files/davincieffect_HEv2.aac ! ti_aacdec ! tiperf engine-name=decode print-arm-load=TRUE ! alsasink"
gst-launch --gst-debug-no-color --gst-debug='TI*:3' $PIPELINE

This test verifies AAC encoded audio data is decoded and plays correctly without any obvious errors. As an example, the test case can be expressed using Perl Expect and GStreamer TI Plugin specific subroutines as:

$p = gtp_init($pipeline);
gtp_expect_avg_arm_load($p, 12);
gtp_expect_avg_dsp_load($p, 25);
gtp_run($p);
gtp_fini($p);

where the output generated by --gst-debug=TI*:3 and dmaiperf is checked to make sure that no errors are reported and that the average ARM and DSP loads don't exceed the specified values. To test that the AAC decode function handles pipeline events correctly, an example test case could be:

$p = gtp_init($pipeline);
gtp_run($p);
$i = 10;
while ($i > 0) {
  msleep(100);
  gst_pause($p);
  msleep(10);
  gst_resume($p);
  $i--;  # count down so the loop ends after ten pause/resume cycles
}
gtp_fini($p);

Of course it would be likely that a gtp_pause_resume() subroutine would be added instead of including the above while loop in many tests.

Test Categorization

Pipeline Data Consumption and Production Tests

The existing gst-launch based test suite that has been used on the 1.x version of the GStreamer TI Plugin generally focused on correct pipeline operation as perceived by a human operator. Indications used to identify improper operation include unexpected error messages, pipelines that never start, and pipelines that never finish.
With a human operator it was also possible to detect garbled or incorrectly timed audio or video. These issues are more difficult to detect with automated tests.
The focus of these tests is proper operation in static conditions. Tests should be generated to cover the following important cases:

Tests for Known and Expected Pipelines

Exhaustive testing of all possible inputs and settings for an element is not an objective of this test suite. Instead, the focus is on verifying pipelines work for the known and anticipated use models. The expected use models will vary based on the capabilities of the processor.

Dynamic Pipeline Tests

The GStreamer pipeline state will be changed and the resultant element behavior monitored for expected operation.

Dynamic Element Property Tests

Element properties that can be modified while the pipeline exists are changed and the resultant element behavior monitored for expected operation.

Comments on GStreamer Plug-in 2.x Design


Bksingh said ...

Comments for Build Section


Overall idea looks nice but i've some concerns

1) How you are planning to handle the cases when customer wants to run gst-ti with older DVSDK 2.x (based on MVL releases)? IIRC, Arago supports building from CS toolchain not from MVL and there may be some issues running CS built binary on MVL platforms. (because of glibc version differences). Even if we get lucky then also we need to ensure that the prebuilt binaries are generated from right toolchain.

2) Today we patch open-source packages as we progress and we need to ensure that there is very easy way to do the same without using Arago/OE build system. E.g If i want to modify something in core gstreamer elements then today all i do is generate patch and put in our patches directory and this gets applied as part of build system. How you are planning to solve this problem with proposed build changes? I assume once you provide the prebuilt binary then there are very good chances that customer may be using old prebuilt binary unless they read some webpage explaining that they need to use different version.

3) Since Arago/OE supports GIT kernel only hence you also need to make sure that they have proper recipies for LSP 2.x kernel header as well.

4) Finally you need to ensure that there is proper software manifest document which contains license details etc for all those prebuilt binaries. I think we have gst-ti software manfiest and can be reused.

Thanks Brijesh


--Bksingh 15:46, 10 March 2010 (CST)
