Extracting MPEG-4 Elementary Stream from MP4 Container

From Texas Instruments Wiki

Overview

The TI DM355/365 MPEG-4 decoder accepts only an elementary stream as input. So if you want to decode an MPEG-4 stream stored in an MP4 container with the DVSDK demo or example application, you first need to extract the MPEG-4 elementary stream from the container. FFmpeg can do this, but the elementary stream it extracts lacks the Video Object Layer (VOL) header and the headers of the layers above it; it contains only a sequence of Video Object Planes (VOPs). The following explains how to regenerate the VOL header. You can then reconstruct a standard elementary stream by joining the regenerated VOL header and the extracted stream that lacks it.
Because FFmpeg supports various container formats, this technique should also be applicable to containers other than MP4 (e.g. AVI).
Please note that the TI DM355/365 MPEG-4 decoder does not decode all MPEG-4 streams. See Transcode from MPEG4 to Restricted DM355/DM365 MPEG-4.
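To tell an extracted stream from a complete one, you can look at its start codes: a complete stream carries a VOL start code (00 00 01 20 to 00 00 01 2F) before the first VOP start code (00 00 01 B6), while the stream extracted by FFmpeg contains only VOPs. The following is a minimal C sketch, assuming the file has already been read into memory; has_vol_header() is a hypothetical helper, not part of FFmpeg or the DVSDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if a video_object_layer_start_code (00 00 01 20..2F) occurs
 * before the first vop_start_code (00 00 01 B6), 0 otherwise. */
int has_vol_header(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i + 3 < len; i++) {
        /* look for the 00 00 01 start code prefix */
        if (buf[i] != 0x00 || buf[i + 1] != 0x00 || buf[i + 2] != 0x01)
            continue;
        uint8_t code = buf[i + 3];
        if (code >= 0x20 && code <= 0x2F)
            return 1; /* VOL header seen before any VOP */
        if (code == 0xB6)
            return 0; /* reached a VOP without seeing a VOL header */
    }
    return 0;
}
```

A file such as body.m4v from the next step is expected to return 0, and the final joined stream to return 1.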

Extract Elementary Stream with FFmpeg

Use ffmpeg with the -vcodec copy -f m4v options to extract the raw video codec data without re-encoding it. The -an option drops the audio.

$ ffmpeg -i test.mp4 -an -vcodec copy -f m4v body.m4v

FFmpeg version SVN-rUNKNOWN, Copyright (c) 2000-2007 Fabrice Bellard, et al.
  configuration: --enable-gpl --enable-pp --enable-swscaler --enable-pthreads --enable-libvorbis --enable-libtheora --enable-libogg --enable-libgsm --enable-dc1394 --disable-debug --enable-shared --prefix=/usr
  libavutil version: 1d.49.3.0
  libavcodec version: 1d.51.38.0
  libavformat version: 1d.51.10.0
  built on Mar 16 2009 21:16:26, gcc: 4.2.4 (Ubuntu 4.2.4-1ubuntu3)

Seems stream 0 codec frame rate differs from container frame rate: 1000.00 (1000/1) -> 29.97 (30000/1001)
 Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'test.mp4':
  Duration: 00:00:18.4, start: 0.000000, bitrate: 2089 kb/s
  Stream #0.0(eng): Video: mpeg4, yuv420p, 640x480, 29.97 fps(r)
 Output #0, m4v, to 'body.m4v':
  Stream #0.0: Video: mpeg4, yuv420p, 640x480, q=2-31, 29.97 fps(c)
 Stream mapping:
  Stream #0.0 -> #0.0
 Press [q] to stop encoding
 frame=  553 q=0.0 Lsize=    4703kB time=18.5 bitrate=2088.1kbits/s
 video:4703kB audio:0kB global headers:0kB muxing overhead 0.000000%

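You can get the vop_time_increment_resolution, video_object_layer_width and video_object_layer_height values from the messages FFmpeg prints to the console. You need these values to regenerate the VOL header in the next step. In the example above, vop_time_increment_resolution is the codec frame rate (1000/1) on the "Seems stream 0 codec frame rate differs..." line, and the width and height come from the 640x480 frame size on the Stream #0.0 line, so you can read: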

  • vop_time_increment_resolution = 1000
  • video_object_layer_width = 640
  • video_object_layer_height = 480
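If you want to pick these values out automatically, one rough approach is to match the two lines used above. This sketch assumes the exact message format of the FFmpeg build shown here; other FFmpeg versions print different messages, so treat it as illustrative only. VolParams and parse_ffmpeg_line() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Values needed for the VOL header; a field stays 0 if it was not found. */
typedef struct {
    unsigned time_res; /* vop_time_increment_resolution */
    unsigned width;    /* video_object_layer_width */
    unsigned height;   /* video_object_layer_height */
} VolParams;

/* Examine one line of FFmpeg console output and fill in any values found.
 * Assumes the message format of the FFmpeg build shown above. */
void parse_ffmpeg_line(const char *line, VolParams *p)
{
    const char *s;
    unsigned num, den, w, h;

    assert(line != NULL && p != NULL);

    /* "Seems stream 0 codec frame rate differs ...: 1000.00 (1000/1) -> ..."
     * The first "(N/1)" on this line is taken as vop_time_increment_resolution. */
    if (strstr(line, "codec frame rate differs") != NULL &&
        (s = strchr(line, '(')) != NULL &&
        sscanf(s, "(%u/%u)", &num, &den) == 2 && den == 1)
        p->time_res = num;

    /* "Stream #0.0(eng): Video: mpeg4, yuv420p, 640x480, ..."
     * Try each comma-separated field until one parses as WIDTHxHEIGHT. */
    if (p->width == 0 && strstr(line, "Video: mpeg4") != NULL) {
        for (s = strchr(line, ','); s != NULL; s = strchr(s + 1, ',')) {
            if (sscanf(s, ", %ux%u", &w, &h) == 2) {
                p->width = w;
                p->height = h;
                break;
            }
        }
    }
}
```

Feed it each line of FFmpeg's console output in turn, starting from a zero-initialized VolParams.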

Generating VOL Header

The following pseudo-C code generates a VOL header. Assume that BITS_put(), BITS_putUShort() and BITS_putULong() append the bits of an unsigned 8-bit, unsigned 16-bit and unsigned 32-bit argument, respectively, and that the third argument of each function is the number of bits to add to the bitstream. BITS_init(), BITS_stuffBits() and BITS_getSize() are likewise assumed helpers, and bits, size, n and fp (the output file for the header) are assumed to be declared elsewhere. The vopTimeIncrementResolution, videoObjectLayerWidth and videoObjectLayerHeight variables hold the values read from the FFmpeg output in the previous step.

unsigned char Buffer[256];
 
/* initialize the bitstream object with the buffer */
BITS_init(bits, Buffer);
 
/* video_object_start_code */
BITS_putULong(bits, 0x00000101, 32);
 
/* video_object_layer_start_code */
BITS_putULong(bits, 0x00000120, 32);
 
/* random_accessible_vol */
BITS_put(bits, 0x0, 1);
 
/* video_object_type_indication */
BITS_put(bits, 0x01, 8);
 
/* is_object_layer_identifier */
BITS_put(bits, 0x0, 1);
 
/* aspect_ratio_info */
BITS_put(bits, 0x1, 4);
 
/* vol_control_parameters */
BITS_put(bits, 0x0, 1);
 
/* video_object_layer_shape */
BITS_put(bits, 0x0, 2);
 
/* marker_bit */
BITS_put(bits, 0x1, 1);
 
/* vop_time_increment_resolution */
BITS_putUShort(bits, vopTimeIncrementResolution, 16);
 
/* marker_bit */
BITS_put(bits, 0x1, 1);
 
/* fixed_vop_rate */
BITS_put(bits, 0x0, 1);
 
/* marker_bit */
BITS_put(bits, 0x1, 1);
 
/* video_object_layer_width */
BITS_putUShort(bits, videoObjectLayerWidth, 13);
 
/* marker_bit */
BITS_put(bits, 0x1, 1);
 
/* video_object_layer_height */
BITS_putUShort(bits, videoObjectLayerHeight, 13);
 
/* marker_bit */
BITS_put(bits, 0x1, 1);
 
/* interlaced */
BITS_put(bits, 0x0, 1);
 
/* obmc_disable */
BITS_put(bits, 0x1, 1);
 
/* sprite_enable */
BITS_put(bits, 0x0, 1);
 
/* not_8_bit */
BITS_put(bits, 0x0, 1);
 
/* quant_type */
BITS_put(bits, 0x0, 1);
 
/* complexity_estimation_disable */
BITS_put(bits, 0x1, 1);
 
/* resync_marker_disable */
BITS_put(bits, 0x0, 1);
 
/* data_partitioned */
BITS_put(bits, 0x0, 1);
 
/* scalability */
BITS_put(bits, 0x0, 1);
 
/*
 * stuffed bits
 * insert 01..1 to align the next code on the byte boundary
 */
BITS_stuffBits(bits);
 
/* get the stream size in bytes */
size = BITS_getSize(bits);
 
/* write bitstream data to file */
n = fwrite(Buffer, 1, size, fp);
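The BITS_* helpers are not defined above. One way to make the sequence runnable is sketched below, assuming a simple MSB-first bit writer. The Bits type and the bits_init(), bits_put(), bits_stuff() and bits_size() functions are hypothetical stand-ins for the BITS_* calls (a single bits_put() taking a uint32_t replaces the three put variants), and make_vol_header() writes the same fields in the same order as the pseudo code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical MSB-first bit writer standing in for the BITS_* helpers. */
typedef struct {
    uint8_t *buf;
    size_t cap;    /* buffer size in bytes */
    size_t bitpos; /* number of bits written so far */
} Bits;

void bits_init(Bits *b, uint8_t *buf, size_t cap)
{
    memset(buf, 0, cap); /* bits are ORed in, so start from all zeros */
    b->buf = buf;
    b->cap = cap;
    b->bitpos = 0;
}

/* Append the low nbits of value, most significant bit first. */
void bits_put(Bits *b, uint32_t value, unsigned nbits)
{
    assert(nbits <= 32);
    while (nbits > 0) {
        nbits--;
        assert(b->bitpos / 8 < b->cap);
        if ((value >> nbits) & 1u)
            b->buf[b->bitpos / 8] |= (uint8_t)(0x80u >> (b->bitpos % 8));
        b->bitpos++;
    }
}

/* Stuffing: a single 0 bit, then 1 bits until the next byte boundary. */
void bits_stuff(Bits *b)
{
    bits_put(b, 0, 1);
    while (b->bitpos % 8 != 0)
        bits_put(b, 1, 1);
}

/* Number of bytes used, counting a partial final byte. */
size_t bits_size(const Bits *b)
{
    return (b->bitpos + 7) / 8;
}

/* Write the VOL header fields from the pseudo code; returns the size in bytes. */
size_t make_vol_header(uint8_t *out, size_t cap, unsigned time_res,
                       unsigned width, unsigned height)
{
    Bits b;
    bits_init(&b, out, cap);
    bits_put(&b, 0x00000101, 32); /* video_object_start_code */
    bits_put(&b, 0x00000120, 32); /* video_object_layer_start_code */
    bits_put(&b, 0, 1);           /* random_accessible_vol */
    bits_put(&b, 0x01, 8);        /* video_object_type_indication */
    bits_put(&b, 0, 1);           /* is_object_layer_identifier */
    bits_put(&b, 0x1, 4);         /* aspect_ratio_info */
    bits_put(&b, 0, 1);           /* vol_control_parameters */
    bits_put(&b, 0, 2);           /* video_object_layer_shape */
    bits_put(&b, 1, 1);           /* marker_bit */
    bits_put(&b, time_res, 16);   /* vop_time_increment_resolution */
    bits_put(&b, 1, 1);           /* marker_bit */
    bits_put(&b, 0, 1);           /* fixed_vop_rate */
    bits_put(&b, 1, 1);           /* marker_bit */
    bits_put(&b, width, 13);      /* video_object_layer_width */
    bits_put(&b, 1, 1);           /* marker_bit */
    bits_put(&b, height, 13);     /* video_object_layer_height */
    bits_put(&b, 1, 1);           /* marker_bit */
    bits_put(&b, 0, 1);           /* interlaced */
    bits_put(&b, 1, 1);           /* obmc_disable */
    bits_put(&b, 0, 1);           /* sprite_enable */
    bits_put(&b, 0, 1);           /* not_8_bit */
    bits_put(&b, 0, 1);           /* quant_type */
    bits_put(&b, 1, 1);           /* complexity_estimation_disable */
    bits_put(&b, 0, 1);           /* resync_marker_disable */
    bits_put(&b, 0, 1);           /* data_partitioned */
    bits_put(&b, 0, 1);           /* scalability */
    bits_stuff(&b);               /* byte-align for the next start code */
    return bits_size(&b);
}
```

For example, make_vol_header(hdr, sizeof hdr, 1000, 640, 480) with a 32-byte hdr should return 18; the bytes can then be written with fwrite() as in the pseudo code.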

For vop_time_increment_resolution = 1000, video_object_layer_width = 640 and video_object_layer_height = 480, the header should be the following 18 bytes:

00 00 01 01 00 00 01 20 00 84 40 FA 28 A0 21 E0
A2 1F
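To sanity-check a generated (or existing) header, you can read the fields back. This is a minimal sketch that assumes the same simple layout as the pseudo code: no object layer identifier, no extended pixel aspect ratio, no VOL control parameters, rectangular shape and fixed_vop_rate = 0. Headers using other options are rejected rather than parsed. BitReader, get_bits() and parse_vol_header() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *buf;
    size_t len;
    size_t bitpos;
} BitReader;

/* Read nbits (<= 32) most significant bit first; bits past the end read as 0. */
uint32_t get_bits(BitReader *r, unsigned nbits)
{
    uint32_t v = 0;
    assert(nbits <= 32);
    while (nbits--) {
        size_t byte = r->bitpos / 8;
        unsigned bit = byte < r->len ? (r->buf[byte] >> (7 - r->bitpos % 8)) & 1u : 0u;
        v = (v << 1) | bit;
        r->bitpos++;
    }
    return v;
}

/* Parse the simple VOL header layout. Returns 0 on success, -1 if the input
 * is truncated or uses a feature this sketch does not handle. */
int parse_vol_header(const uint8_t *buf, size_t len, unsigned *time_res,
                     unsigned *width, unsigned *height)
{
    BitReader r = {buf, len, 0};
    uint32_t code = get_bits(&r, 32);
    if (code < 0x100 || code > 0x11F) return -1;  /* video_object_start_code */
    code = get_bits(&r, 32);
    if (code < 0x120 || code > 0x12F) return -1;  /* video_object_layer_start_code */
    get_bits(&r, 1);                              /* random_accessible_vol */
    get_bits(&r, 8);                              /* video_object_type_indication */
    if (get_bits(&r, 1)) return -1;               /* is_object_layer_identifier */
    if (get_bits(&r, 4) == 0xF) return -1;        /* aspect_ratio_info (extended PAR unsupported) */
    if (get_bits(&r, 1)) return -1;               /* vol_control_parameters */
    if (get_bits(&r, 2) != 0) return -1;          /* video_object_layer_shape: rectangular only */
    if (!get_bits(&r, 1)) return -1;              /* marker_bit */
    *time_res = get_bits(&r, 16);                 /* vop_time_increment_resolution */
    if (!get_bits(&r, 1)) return -1;              /* marker_bit */
    if (get_bits(&r, 1)) return -1;               /* fixed_vop_rate (unsupported if set) */
    if (!get_bits(&r, 1)) return -1;              /* marker_bit */
    *width = get_bits(&r, 13);                    /* video_object_layer_width */
    if (!get_bits(&r, 1)) return -1;              /* marker_bit */
    *height = get_bits(&r, 13);                   /* video_object_layer_height */
    if (!get_bits(&r, 1)) return -1;              /* marker_bit */
    return 0;
}
```

Given the 18 bytes above, it should report 1000, 640 and 480.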

Joining VOL Header and VOP Data

Finally, concatenate the VOL header and the VOP data generated in the previous steps (here head.m4v is the file the header was written to). The TI DM355/365 MPEG-4 decoder can decode the concatenated stream.

$ cat head.m4v body.m4v >test.m4v
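On a target without a shell, the same concatenation can be done from C. This is a minimal sketch using only standard stdio calls; join_streams() is a hypothetical helper that behaves like the cat command above.

```c
#include <assert.h>
#include <stdio.h>

/* Append the contents of the file at src_path to the open stream dst.
 * Returns 0 on success, -1 on error. */
static int append_file(FILE *dst, const char *src_path)
{
    char chunk[4096];
    size_t n;

    assert(dst != NULL && src_path != NULL);
    FILE *src = fopen(src_path, "rb");
    if (src == NULL)
        return -1;
    while ((n = fread(chunk, 1, sizeof chunk, src)) > 0) {
        if (fwrite(chunk, 1, n, dst) != n) {
            fclose(src);
            return -1;
        }
    }
    int err = ferror(src);
    fclose(src);
    return err ? -1 : 0;
}

/* Equivalent of: cat head body > out. Returns 0 on success, -1 on error. */
int join_streams(const char *head_path, const char *body_path, const char *out_path)
{
    FILE *out = fopen(out_path, "wb");
    if (out == NULL)
        return -1;
    int rc = append_file(out, head_path);   /* VOL header first */
    if (rc == 0)
        rc = append_file(out, body_path);   /* then the VOP data */
    if (fclose(out) != 0)
        rc = -1;
    return rc;
}
```

For example, join_streams("head.m4v", "body.m4v", "test.m4v") should produce the same file as the cat command.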