DaVinciHD Codecs FAQ

From Texas Instruments Embedded Processors Wiki


General FAQ

In DM6467, the video codecs are partitioned between the DSP and the HDVICP. The process() API is called on the DSP. After some initializations (loading the HDVICP, header parsing, etc.), the process call yields to the application (SEM_Pend). From then on, the Hdvicp interrupts the DSP as the picture is processed. The total cycles required to encode/decode the picture are the Blocking Cycles. The cycles for which the DSP is actually used for the codec tasks are the Non-Blocking Cycles. After the last macroblock in the picture is processed by the Hdvicp, the codec ISR issues a SEM_Post. The codec task then wakes up to do the end-of-frame processing.

Yes, both the Hdvicps can execute simultaneously. Both the Hdvicps are totally independent of each other - but they share common resources (DSP, EDMA).

Hdvicp0 has the capability to do both Encode and Decode. Hdvicp1 can do only Decode. This is because Hdvicp1 does not have the IPE and ME engines that are required in encoding.

No, the Hdvicps cannot access the DDR directly. They can trigger a DMA to fetch/write data to DDR.

Yes, the ARM968 can program, trigger and wait on EDMA channels. The ARM 968 is also a master on the System Config Bus which is used to write into the EDMA PaRAM and registers.

Yes, the DSP can access the Hdvicp IP buffers and registers.

The frequency ratio between the DSP and the Hdvicp is 2:1. So if the DSP is clocked at 594 MHz, the Hdvicp is clocked at 297 MHz.

No, it is not possible to program the filter co-efficients in the LPF. The LPF supports filtering as per the standards (H264/VC-1).

No, it is not possible to program the filter weights. Only the standards specific interpolation can be implemented.

No, the Hdvicps do not support 422 chroma format.

The VDCE engine on the DM6467 can be used for the chroma conversions. Alternately, you could even use the DSP to do the chroma conversion.

No, we do not use the VDCE to do the edge padding in TI codecs. We use the EDMA and DSP for this. The reason is that if we use the VDCE, this operation will be sequential with the actual encoding/decoding of the frame (the reference obtained by padding is required to encode/decode the immediately following frame). Since it is not in parallel with the DSP or Hdvicp, using the VDCE does not give us any advantage. We also wanted the VDCE to be free to be used for any scaling/chroma conversions in the application.

No, the VDCE can be used only for down-scaling the YUV. It does not support up-scaling.

No, the HW does not support RGB to YUV conversion.

The DM6467 codecs use the semi-planar format. The Luma component is 1 plane. The other plane is CbCr interleaved. This is the format that the Hdvicp uses internally for processing.
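The semi-planar layout described above can be sketched as a small size/offset calculation. This is only an illustration, assuming 8-bit 4:2:0 samples and a chroma plane stored immediately after the luma plane; the type and function names (`SemiPlanarLayout`, `semi_planar_layout`, `cb_offset`) are hypothetical and are not part of any TI API.

```c
#include <stddef.h>

/* Byte layout of one 4:2:0 semi-planar frame: a full-resolution luma (Y)
   plane plus a half-height plane of interleaved Cb/Cr samples.
   Assumption: 8-bit samples, chroma plane placed right after luma. */
typedef struct {
    size_t luma_size;     /* bytes in the Y plane */
    size_t chroma_size;   /* bytes in the interleaved CbCr plane */
    size_t chroma_offset; /* start of the CbCr plane within the buffer */
} SemiPlanarLayout;

/* pitch is the line stride in bytes (>= width); height must be even. */
static SemiPlanarLayout semi_planar_layout(size_t pitch, size_t height)
{
    SemiPlanarLayout l;
    l.luma_size = pitch * height;
    /* Each chroma line holds width/2 Cb and width/2 Cr samples. */
    l.chroma_size = pitch * (height / 2);
    l.chroma_offset = l.luma_size;
    return l;
}

/* Byte offset of the Cb sample that covers luma pixel (x, y).
   The matching Cr sample is the next byte. */
static size_t cb_offset(const SemiPlanarLayout *l, size_t pitch,
                        size_t x, size_t y)
{
    return l->chroma_offset + (y / 2) * pitch + (x / 2) * 2;
}
```

For a 1920x1088 frame with a pitch of 1920 bytes, this gives a 2,088,960-byte luma plane followed by a 1,044,480-byte CbCr plane.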

The work-around to avoid the deadlock is as follows. The same master must not write to both L2 and DDR, and the same master must not write to both L2 and the Hdvicp buffers. So allocate one TC for all writes to L2 in the codecs, and do not issue any writes to DDR or the Hdvicp on this TC. In TI codecs we use TC0 for all writes to L2. The remaining 3 TCs do not write to L2.

For the codec to be real-time, 3 threads need to be within the budgeted cycles. First, the DSP cycles and Hdvicp cycles should fit within the budget. Note that the DSP and Hdvicp should be running in parallel to utilize the capabilities of the HW effectively. The EDMA is the third thread that runs in parallel with the DSP and the Hdvicp. There are 4 TCs in the DM6467 EDMA. Try to distribute the transfers on the TCs such that the load is balanced between them. It often happens that one TC is much more heavily loaded than the others, and this TC then becomes the bottleneck.
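One way to picture the TC balancing advice is a simple assignment rule. The sketch below is not how the TI codecs are implemented; it assumes that load can be approximated by bytes queued per TC, reserves TC0 for writes to L2 (as TI codecs are described as doing), and sends every other transfer to the least-loaded of the remaining TCs. All names are hypothetical.

```c
#include <stddef.h>

enum { NUM_TCS = 4, L2_WRITE_TC = 0 };

/* Destination of a DMA transfer (hypothetical classification). */
typedef enum { DST_L2, DST_DDR, DST_HDVICP } DmaDst;

/* Picks a TC for a transfer of `bytes` to `dst`, adds the transfer to
   that TC's running load, and returns the TC index.
   Assumption: bytes queued is a good enough proxy for TC load. */
static int assign_tc(DmaDst dst, unsigned bytes, unsigned load[NUM_TCS])
{
    int tc = L2_WRITE_TC;           /* all L2 writes share one TC */
    if (dst != DST_L2) {
        tc = 1;                     /* never use the L2-write TC here */
        for (int i = 2; i < NUM_TCS; ++i)
            if (load[i] < load[tc])
                tc = i;             /* least-loaded of TC1..TC3 */
    }
    load[tc] += bytes;
    return tc;
}
```

Comparing the final `load[]` values after assigning one frame's worth of transfers shows quickly whether one TC is carrying most of the traffic.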

Note that there are 4 ports on each Hdvicp for the transfer of data in/out of the Hdvicp buffers. The 4 ports are UMAP1, R/W port, R port and the W port. Check if the transfers are also distributed between these ports. It could happen that all your transfers are happening on a particular port, and now this port is becoming a bottleneck!

You can choose the port through which data gets transferred by choosing the respective address for the source/destination. For the same physical Hdvicp buffer, each port has its own address map. Just by choosing the right addresses for the source and destination you can choose the port. Please refer to the DM6467 Address Map spreadsheet for the exact addresses.

The DSP loads the ARM968 code into the ARM ITCM and starts off the Hdvicp. This is done as a part of the frame level initializations of the process call.

Please request for the datasheets of the individual codecs for the accurate performance details. Note that the performance depends on a number of factors (features that are being turned on/off, Cache Sizes, type of content). The datasheet provides these details.

The performance of the codecs depends on the DSP/Hdvicp frequency as well as the DDR2 speed. Note that since the DDR is not being scaled up linearly for the various DM6467 parts, the performance will not scale up linearly.

H264 Decoder FAQs (Reference Codec Version -- REL_200_V_H264AVC_D_HP_DM6467_1_10_012)

The DM6467 H264 Decoder supports all 3 profiles: BP, MP and HP. Note that it does not support ASO/FMO in the Baseline Profile. The maximum resolution supported is 1920 x 1088 (1080HD). The maximum level supported is 4.0.

Yes, the decoder supports all the above tools. Note that these tools are a part of MP/HP profiles and the decoder supports all these tools.

No, the decoder only decodes 420 encoded streams. The HDVICP does not have support for 422 format.

Yes, the decoder can use any of the 2 Hdvicps; both the Hdvicps support decoding.

The decoder implements the XDM IVIDDEC2 interface.

The decoder supports resolutions above 64x64 up to 1920x1088. Note that the width and height need to be a multiple of 16.
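An application could check these constraints before creating the decoder instance. The following is a minimal sketch; the function name is hypothetical, and treating 64 as an inclusive lower bound is an assumption, since the answer above only says "above 64x64".

```c
#include <stdbool.h>

/* Limits quoted in the answer above. */
enum { MIN_DIM = 64, MAX_WIDTH = 1920, MAX_HEIGHT = 1088, MB_SIZE = 16 };

/* Returns true if width x height is within the quoted range and both
   dimensions are multiples of 16.
   Assumption: the lower bound of 64 is inclusive. */
static bool h264dec_resolution_ok(int width, int height)
{
    if (width < MIN_DIM || height < MIN_DIM)
        return false;
    if (width > MAX_WIDTH || height > MAX_HEIGHT)
        return false;
    return (width % MB_SIZE == 0) && (height % MB_SIZE == 0);
}
```

Note that 1920x1080 fails this check because 1080 is not a multiple of 16, which is why the maximum resolution is quoted as 1920x1088.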

Yes, the decoder can decode streams with multiple slices. We have already tested the decoder with a number of streams with multiple slices (JVT Conformance streams, Professional test suites available in the market, and our internally generated streams). Support of multiple slices is not dependent on the Hdvicp; this is totally a SW feature.

Yes, the TI decoder can be used for multi-channel cases. There is nothing in the decoder that restricts this. The application can create multiple instances of the codec without requiring any change in the codec library.

The decoder expects an entire frame/picture before it starts decoding. We are currently not supporting Slice Level APIs, all APIs are at frame/picture level.

Yes, the decoder supports Error Resilience and Concealment. The Error Resiliency feature is very robust; we have tested the decoder with ~9000 Error streams. For Concealment, if the current picture is in error we copy the pixels from the previously decoded picture.

The decoder does not need an IDR to start decoding the sequence. It can start decoding from any frame assuming the reference to be 128. Of course, there will be error propagation because of this. The decoder will then re-sync once an IDR or a picture with Recovery Point SEI is received.

The decoder will decode the remaining error-free slices correctly. If a slice is in error, the decoder will latch on the next slice in the frame. It will apply concealment only for the missing slices, and not for the entire frame.

The SEI messages supported by the decoder are: Buffering period SEI message; Recovery point SEI message; Picture timing SEI message; Pan-scan rectangle SEI; User data registered by ITU-T Recommendation T.35 SEI. All the SEI messages are either passed back to the application or used by the decoder, in decode order.

The decoder needs the input buffer to consist of at least one complete picture before the decode_process () is called. The codec SW does not support slice level process calls. Note that this is how the codec SW is architected; the Hdvicp does not have any limitation to support slice level APIs.

Yes, the decoder has the flexibility to skip the decoding of non-reference pictures. This has to be communicated to the decoder via the skipNonRefPictures field in the inArgs. The supported values are: 0 = do not skip the pictures; 1 = skip a non-reference frame or field; 2 = skip a non-reference top field; 3 = skip a non-reference bottom field. When the decoder is called with this field set to 1, 2 or 3, it does not decode the picture if it is a non-reference picture, but it returns the number of bytes that were consumed to skip the current picture. This is helpful if the application needs to move the input pointer to the next NAL unit.
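The four skipNonRefPictures values can be read as a small decision table. The sketch below models that table as understood from the answer above; it is not the decoder's code, and the enum and function names are hypothetical.

```c
#include <stdbool.h>

/* Values of skipNonRefPictures as listed above (names are hypothetical). */
typedef enum {
    SKIP_NONE = 0,
    SKIP_NONREF_ANY = 1,          /* non-reference frame or field */
    SKIP_NONREF_TOP_FIELD = 2,
    SKIP_NONREF_BOTTOM_FIELD = 3
} SkipMode;

typedef enum { PIC_FRAME, PIC_TOP_FIELD, PIC_BOTTOM_FIELD } PicStructure;

/* Returns true if a picture would be skipped for the given mode.
   Reference pictures are never skipped. */
static bool should_skip(SkipMode mode, PicStructure pic, bool is_reference)
{
    if (is_reference)
        return false;
    switch (mode) {
    case SKIP_NONREF_ANY:
        return true;
    case SKIP_NONREF_TOP_FIELD:
        return pic == PIC_TOP_FIELD;
    case SKIP_NONREF_BOTTOM_FIELD:
        return pic == PIC_BOTTOM_FIELD;
    case SKIP_NONE:
    default:
        return false;
    }
}
```

When a picture is skipped, the application would still advance its input pointer by the number of bytes the decoder reports as consumed.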

We use the DSP and EDMA to do the padding. We pad only the left and right edges at the end of the frame. For the top and bottom edges, padding is done on the fly when the padded data is required for interpolation by the Motion Compensation HW.

For CABAC encoded streams the H264 decoder will be real-time (on 594 MHz part) for bitrates up to 14 Mbps. For CAVLC encoded streams the decoder will be real-time for bit-rates as allowed by the standard for level 4.0 (H264 standard allows a max bit rate of 24 Mbps for level 4.0 streams).

Currently, TI has 2 separate encoders on DM6467. The 720p encoder supports resolutions from QCIF up to 720p. It can also support 1920 x 544 resolution as a special case. The 1080p encoder supports resolutions from 640 x 448 up to 1920 x 1088. The restriction on the 1080p30 encoder is that both the width and the height must be multiples of 32. Also, if the input resolution is 1920x1080, the last 8 lines of the frame need to be padded and provided by the application to the encoder.
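The padding of a 1920x1080 input up to 1088 lines could be done by the application along the following lines. This is a sketch under assumptions: the answer above does not say what the padded lines should contain, so repeating the last valid line is just one common choice, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fills lines [src_lines, dst_lines) of a plane by repeating the last
   valid line. The buffer must already have room for dst_lines lines.
   Assumption: line replication is an acceptable padding method. */
static void pad_bottom_lines(uint8_t *plane, size_t pitch,
                             size_t src_lines, size_t dst_lines)
{
    const uint8_t *last = plane + (src_lines - 1) * pitch;
    for (size_t y = src_lines; y < dst_lines; ++y)
        memcpy(plane + y * pitch, last, pitch);
}
```

For a 4:2:0 semi-planar frame, the luma plane would be padded from 1080 to 1088 lines and the interleaved CbCr plane from 540 to 544 lines.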

The basic difference between the 2 encoders is that they use different Motion Estimation Algorithms. Because of this, the performance and quality is different for the 2 encoders. Also there is a difference in the number of resources being used by the encoders. 720p encoder uses 21 EDMA channels and 32 Kbytes of L2 as SRAM. 1080p30 encoder uses 49 EDMA channels and 64 Kbytes of L2 as SRAM.

The encoder implements the XDM IVIDENC1 interface.

No, only the Hdvicp 0 has the support to do encoding. Hdvicp 1 does not have the Motion Estimation and Intra Prediction Estimation engines, hence it cannot support encoding.

Yes, the TI encoder supports multiple slices. In the 720p encoder, a slice can be a multiple of rows. The application can choose the number of rows that will be encoded in a slice. The 720p encoder currently does not have support for H241 (slices based on number of encoded bytes/slice). The 1080p encoder supports the H241 feature. The user can input the maximum number of bytes in a slice, and the encoder will encode the slices such that the number of bytes in the slices does not exceed the specified value. The 1080p encoder also supports slices based on the number of rows.

The TI encoders are Baseline Profile encoders with support of some MP/HP tools (CABAC, 8x8IPE Modes and 8x8 transform). We do not support B-frames in the encoders. Note that the Hdvicp has support to encode B-frames; the codec SW is not supporting this today.

For the 720p encoder, we support 3 types of Motion Estimation. ME Type 0 (Original ME) is recommended for resolutions greater than or equal to D1. ME Type 1 (Low Power ME, or LPME) is a modification of the Original ME that reduces the DDR bandwidth. Like the Original ME, the LPME is recommended for resolutions of D1 and above. Both the LPME and the Original ME give 1 MV/MB. For resolutions below D1, it is recommended to use ME Type 2 (Hybrid ME). This algorithm gives up to 4 MV/MB. The application can select any of the 3 ME schemes during the codec instance creation. For the 1080p encoder, we currently support the Decimated ME scheme (a coarse search followed by a refinement search). This ME scheme is different from the ME schemes in the 720p encoder. Decimated ME is suitable for high resolution and high motion sequences, and it is recommended for 1080p resolution. Note that Decimated ME generates 1 MV/MB. There is a plan to add the LPME scheme to the 1080p encoder, after which the application will be able to select either of the 2 ME schemes during the codec instance creation.

Yes, both the TI encoders can be used for multi-channel use-cases. The application can create multiple instances of the encoder without any change required in the codec library. The application can also run multiple instances of the TI encoder/decoder.

No, the ME HW on Hdvicp supports search based on SAD; we do not have support for SATD on HW.

The top neighboring pixels are the unfiltered reconstructed pixels. Due to the pipeline constraints on the DM6467, the left pixels cannot be the reconstructed pixels. Hence we use the original pixels for the left neighbors.

No, the ME HW does not allow you to modify the interpolation co-efficients.

Yes, you could insert SPS/PPS as required by your application. In the dynamicParams, set the generateHeader field as 1 and call the control API with XDM_SETPARAMS command. Now the next process call will generate the SPS/PPS. Note that this process call will not encode a picture; it will only generate the headers. Once the headers are generated, set generateHeader field as 0, and again call the control API with XDM_SETPARAMS. Now the process calls will actually encode the frames. This sequence can be repeated whenever the application requires the encoder to generate the SPS/PPS headers.

Call the control() API with the XDM_SETPARAMS command, and set dynamicParams.forceFrame = IVIDEO_IDR_FRAME. Then, the next process() call will generate the IDR frame.

Once you get the IDR frame call control() again with dynamicParams.forceFrame = -1. The next process() call will generate the P frame. This sequence can be repeated whenever the application requires the encoder to generate IDR frames.
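The control()/process() sequence for forcing an IDR frame can be pictured with a small mock. Everything below is hypothetical: the `Mock...` types and functions stand in for the real XDM structures and calls only to show the order of operations, and the enum values are placeholders rather than the values from ivideo.h.

```c
/* Placeholder frame types; values are NOT taken from ivideo.h. */
typedef enum {
    MOCK_NA_FRAME = -1,   /* stands in for forceFrame = -1 */
    MOCK_P_FRAME = 1,
    MOCK_IDR_FRAME = 3
} MockFrameType;

typedef struct { MockFrameType forceFrame; } MockDynamicParams;
typedef struct { MockDynamicParams dyn; } MockEncoder;

/* Stands in for control() with the XDM_SETPARAMS command. */
static void mock_control_setparams(MockEncoder *enc,
                                   const MockDynamicParams *params)
{
    enc->dyn = *params;
}

/* Stands in for process(): reports which frame type was produced.
   Assumption: the forced type stays in effect until it is cleared. */
static MockFrameType mock_process(MockEncoder *enc)
{
    if (enc->dyn.forceFrame == MOCK_IDR_FRAME)
        return MOCK_IDR_FRAME;
    return MOCK_P_FRAME;
}
```

In this mock, setting forceFrame to the IDR value, calling process once, and then resetting forceFrame to -1 yields exactly one IDR frame followed by P frames, which is the sequence described above.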

No, the H264 encoders (both 720p30 and 1080p30) cannot do 422 to 420 chroma conversion. The application can use the VDCE to do this.

No, the encoder does not support PICAFF or interlace encoding. The 1080i/p Encoder would support interlaced coding by Oct. 2009.

Yes, the 1080p30 encoder can encode 720p@60fps. However, for 720p@60fps the 720p30 encoder will give better quality than the 1080p30 encoder.

Yes, the Hdvicp can support de-blocking across slices. The encoder provides control whether to enable or disable this feature at the time of codec instance creation.

The encoder expects a complete frame as input to the encode_process call. Each encode_process call generates compressed stream for an entire frame. Presently, we do not support slice level APIs.

MPEG2 Decoder FAQs (Reference Codec Version --- REL_200_V_MPEG2_D_MP_DM6467_1_01_007)

The MPEG2 decoder supports Main Profile @ High level. It supports interlace and B-pictures decoding.

The MPEG2 Decoder uses XDM 1.0 (ividdec2.h) interface.

Yes, the MPEG2 Decoder can run on either of the 2 Hdvicps.

No, unlike the other codecs on DM6467, the MPEG2 decoder does not interrupt the DSP every MB pair. In the case of the MPEG2 decoder, the DSP only does the frame level initializations and the end of frame processing. All the remaining tasks are done by the ARM 968 of the Hdvicp.

Yes, the MPEG2 decoder has the option of returning back the De-blocked/De-ringed data. This feature can be turned off during the instance creation if it is not required by the application.

Yes, the MPEG2 Decoder codec supports creation of multiple-instances. You can use the codec without any changes for multi-channel applications.

Yes, Error Resiliency and Concealment are supported. The Error Concealment is implemented by copying the co-located pixels from the previously decoded picture in case of errors.

VC1 Encoder FAQs (Reference Codec Version --- REL.200.V.VC1.E.MP.DM6467.01.00.00.07)


The VC1 Encoder supports Main Profile @ Medium Level. It can encode resolutions from QCIF up to 1280x720. Note that it supports only progressive encoding. It does not support interlace or B-frame encoding. The Hdvicp has support for B-frames and interlace encoding, but the codec SW does not support these tools.

The VC1 encoder supports only the 8x8 transform. Note that the Hdvicp supports all the variable size transforms in the VC1 standard, but the codec uses only the 8x8 Transform. Also, the codec SW only supports uniform quantization type.

Yes, the VC1 encoder supports Overlap Transform and Intensity Compensation tools.

The VC1 encoder supports 2 ME schemes; Original ME and Hybrid ME. Original ME is recommended to be used for resolutions of D1 and above. This ME generates 1 MV/MB. Hybrid ME is used for resolutions lower than D1, and generates up to 4 MVs/MB. The application can choose which ME scheme the encoder uses during the codec instance creation.

No, the VC1 encoder can execute only on the Hdvicp 0, since Hdvicp 1 does not support encoding (it does not have the ME engine).

Yes, the application can choose to force a particular frame to be encoded as an Intra frame. In the dynamicParams, set the forceFrame field to IVIDEO_I_FRAME (defined as 0 in ivideo.h). Call the control API with the XDM_SETPARAMS command. The next process call will then generate the I frame. Once you get the I frame, set the forceFrame field to -1 and call the control API again. The next process call will then generate a P frame. This sequence can be repeated whenever the application requires the encoder to generate I frames.

You can use the targetBitRate field in the dynamicParams to change the bit rate during runtime. Set the targetBitRate to the bitrate that you want the encoder to generate. Call the control API with XDM_SETPARAMS command. Now the encoder will use this bitrate for encoding from the next process call onwards.

H264 Transrater FAQs (Reference Codec Version --- REL.200.V.H264AVC.TR.HP.DM6467.01.00.00.10)


The input to the Transrater is an H264 encoded bit stream. The output is also an H264 compressed bit stream, but at a lower bit rate.

For example, can the input be a 1920x1088 encoded stream and the output a 1280x720 encoded stream? No, the Transrater can only change the bit rate; it cannot change the resolution of the output bitstream.

The maximum resolution supported by the transrater is 1920x1088. The transrater supports resolutions from QCIF up to 1080HD.

Yes, the Transrater supports interlace and B-frames. Note that the transrater does not change the frame type. If a frame in the incoming bit-stream is a P frame, the transrater will re-encode it as a P frame. There is no option to change this frame as a I/B. If the input is a P-field, the output will also be a P-field. Similarly for B-pictures.

No, the transrater does not support MBAFF.

No, currently the transrater supports only 1 slice/frame. The HW supports multiple slices, but the SW does not support this feature today.

Yes.

The Transrater performance is very much dependent on the GOP structure of the input stream. We are able to meet real time performance (on the 675 MHz part) for GOP structures in which 50% of the pictures are non-ref B pictures. For other GOP structures, we will not be able to meet the real time requirements on the 675 MHz part.

The transrater API interface supports various performanceLevel controls through the extended InArgs structure to manage peak MHz loading conditions. Refer to the User Guide for details.

If there are no non-ref B pictures, whether or not the transrater can be real time depends on the GOP structure. For example, if at least 50% of the pictures are ref-B pictures (e.g. P P RefB RefB P P RefB RefB P P ...), then fallback options are provided in the transrater to meet real time constraints. These fallback options are controllable by the application through the performanceLevel field in the extended InArgs structure. By default, performanceLevel 0 is used, which maps to the best quality transrated output. To transrate streams with unfriendly GOP structures (for example, streams with no non-ref B pictures), higher performanceLevel values (e.g., 1 or more) may be used. Refer to the User Guide for details.

See the component datasheet for details regarding usage of DMA channels, TCs and TC priorities.

No, the transrater can start from any valid SPS, PPS and Slice combination.

MHz load is computed using an 8-field or 8-frame moving average (fields for interlaced content, frames for progressive). First, the average of every 8 consecutive fields or frames is computed using a sliding window. The Average Cycles for a field or frame is the average of all these moving averages over the entire stream. The Peak Cycles for a field or frame is the maximum of all these moving averages over the entire stream. MHz is computed by multiplying by the number of fields or frames per second and dividing by 10^6.
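The moving-average calculation above can be sketched as follows. This is an illustration of the described arithmetic only, not the tool TI used to produce datasheet figures; the names are hypothetical, and returning zeros for streams shorter than one window is an assumption.

```c
#include <stddef.h>

#define WINDOW 8   /* fields or frames per moving-average window */

typedef struct {
    double avg_mhz;   /* from the mean of all window averages */
    double peak_mhz;  /* from the largest window average */
} MhzLoad;

/* cycles[i] is the cycle count of field/frame i; rate is fields or
   frames per second. Returns zeros if n < WINDOW (assumption). */
static MhzLoad mhz_load(const double *cycles, size_t n, double rate)
{
    MhzLoad r = { 0.0, 0.0 };
    if (n < WINDOW)
        return r;

    double sum = 0.0, total = 0.0, peak = 0.0;
    size_t windows = n - WINDOW + 1;
    for (size_t i = 0; i < n; ++i) {
        sum += cycles[i];
        if (i >= WINDOW)
            sum -= cycles[i - WINDOW];   /* slide the window forward */
        if (i + 1 >= WINDOW) {
            double avg = sum / WINDOW;   /* average of the last 8 */
            total += avg;
            if (avg > peak)
                peak = avg;
        }
    }
    r.avg_mhz = (total / (double)windows) * rate / 1e6;
    r.peak_mhz = peak * rate / 1e6;
    return r;
}
```

As a worked case, eight frames of 8,000,000 cycles followed by one frame of 16,000,000 cycles give window averages of 8,000,000 and 9,000,000 cycles, so at 30 frames per second the average load is 255 MHz and the peak load is 270 MHz.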

No, the Non-Blocking MHz quoted in the datasheet indicates the DSP load due to the transrater alone. The DSP load of audio, ISR context switching overheads, encryption or any other component needs to be measured separately to get the accurate DSP load of the entire transrating system.

The application needs to set the bit rate using the targetBitRate[0] field in IVIDTRANSCODE_DynamicParams and then call the control API with the XDM_SETPARAMS command. The bit rate is expressed in bits per second, so for 4 Mbps, set this field to 4000000. The bit rate can also be changed at run time. Note that only one Rate Control algorithm is supported in the transrater. Therefore the application need not set the rateControl[0] field in the IVIDTRANSCODE_DynamicParams structure; this field is ignored by the transrater.

Since the primary purpose of using the transrater is reducing the bit rate, the rate control algorithm does not allow the achieved output bit rate to be greater than the input bit rate, even if the application sets the target bit rate to a value greater than the input bit rate.

No, the application need not estimate/compute the input bit rate. It only needs to set the target bit rate.

The instantaneous bit rate deviation pattern of the output as compared to the targeted bit rate will follow the instantaneous bit rate deviation pattern of the input as compared to its average.

The transrater produces a bit-stream that has an average bit rate within +/- 5% of the target bit rate.

For CABAC encoded streams, the Transrater will be real time (on the 675 MHz part) for bitrates up to 14 Mbps. For CAVLC encoded streams, it supports the maximum bit rate allowed by the standard for level 4.0 (24 Mbps).

The transrater actually uses both the Hdvicps simultaneously. The decoding of the input bit-stream happens on Hdvicp1, and the re-encoding happens on Hdvicp 0.


Yes, the Decoder Hdvicp ARM 968 (Hdvicp 1) interrupts the DSP every 2 macroblocks. So there are still some cycles left on the DSP which can be used to schedule an audio task. But scheduling this requires very careful optimization so as to avoid cache thrashing for the video on every MB-pair.
