[FFmpeg-devel] mpegts_write_header(), the buffering and thus reordering of audio samples.

Thu Nov 3 00:13:57 CET 2011

On Tue, Nov 1, 2011 at 11:39 AM, Will Korbe <wkorbe at gmail.com> wrote:
> On Mon, Oct 31, 2011 at 7:41 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> On Mon, Oct 31, 2011 at 06:24:07PM -0700, Will Korbe wrote:
>>> Hello,
>>>
>>> I hopefully have a minor question about the
>>> libavformat/mpegtssec.c:mpegts_write_header() implementation. My hours
>>> of online searching and specification reading for the answer has been
>>> unsuccessful.
>>>
>>> I see there is an efficiency built in to bundle multiple audio samples
>>> into one PES packet and was wondering if someone could point out a
>>> paragraph from the MPEG2 specification which indicates the reordering
>>> of TS Packets this causes is okay? I have a video with many 41 byte
>>> audio samples at the start, and this seems to push the first audio
>>> samples 5 seconds into the output. At this point, the PTS and DTS for
>>> the audio samples are beyond the last PCR value by 5 seconds. The
>>> video plays fine in QuickTime, the audio and video are precisely in
>>> sync, so I assume QuickTime has logic to buffer the video frames until
>>> seeming the first audio sample to account for this, but I wonder if
>>> most MPEG2TS players handle this the same way. If this is spelled out
>>> in the specification, I would sure feel more comfortable with it.
>>>
>>> Any information would be useful.
>>
>> I dont remember a tight limit on the PCR-DTS distance in the spec but
>> then i dont remember the spec very well ...
>> but having this large is bad because this means a long startup delay.
>> So 5 seconds is IMHO not appropriate by default.
>> Of course that delay would only affect actual broadcast and not playing
>> from a local file with a modern player.
>> the user parameter max_delay should be able to limit this, if not
>> a bugreport & patch are welcome
>>
>>
>> [...]
>> --
>> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>>
>> Asymptotically faster algorithms should always be preferred if you have
>> asymptotical amounts of data
>>
>> _______________________________________________
>> ffmpeg-devel mailing list
>> ffmpeg-devel at ffmpeg.org
>> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>>
>>
>
> Hi Michael,
>
> Thank you for your response. From what I can tell, max_delay does not
> limit the amount of audio sample buffering, rather, it is a macro,
> DEFAULT_PES_PAYLOAD_SIZE (2930 bytes). In my test case,
> AVFormatContext::max_delay is set to 50000 (50 ms). My sample video is
> actually a movie trailer with close to 5 seconds of silence before the
> audio sample byte sizes become substantial, after that point, the
> audio samples are large enough that the PTS/DTS values become just 500
> ms behind the last PCR. Given this, it seems when there are
> multi-second periods of silence, the PTS/DTS of an audio sample has
> the potential to become multi-seconds behind the last PCR in the TS
> Packet Stream. As you mentioned, this may be acceptable with regards
> to the specification and/or current playback implementations, however,
> to attempt to handle more cases, perhaps the buffering of audio
> samples should be limited by max_delay as you suggest, that would seem
> to make it more inline with the mpeg2 timing model, but I'm not and
> expert in this area. I plan to investigate further to see if a BUG
> report is warranted.
>
> Thank you,
> Will
>

When I study the MPEG2 timing and buffering model, it would seem to
indicate the PCR value is the system-clock used to synchronize access
units (audio samples and video frames). For the decoding side, the
only delay specified is the buffers after the demux step, apparently
one for the video frames, and one for audio samples. If the MPEG2TS
stream was processed serially where the PCR value indicates the
current system time as it is received in the MPEGTS stream, then in
the audio or video buffers, if any access units have a DTS value which
matches the current PCR, those access units are instantly removed from
the buffer and decoded. If we don't have a DTS, the PTS is considered.
When any of the access units have a PTS equal to the current PCR, then
those access units are presented. Given this, it seems apparent to me
that this is not how some players are implemented, otherwise the audio
would arrive late, after the video was decoded and presented, and
would have been dropped. I suspect players have an additional layer of
logic and an initial delay built-in to handle MPEG2TS streams that are
not quite spec compliant, otherwise my test video stream wouldn't have
played correctly. If anyone knows more about this and can point out a
flaw in my reasoning, please let me know.

If I compare ffmpeg's MPEG2TS output with Apple's mediastreamsegmenter
MPEG2TS output, the mediastreamsegmenter tool also groups audio
samples together to form larger PES packets, however, it places this
group of audio samples in the correct location in the MPEG2TS stream
(DTS and PTS < PCR). I suspect they buffer both audio samples and
video frames to be able to do this, using considerably more memory in
the process.

I work on an application that doesn't have the luxury to use extra
memory, so I patched the mpegts_write_pes function making a trade-off
which increases the output overhead slightly in some cases, but it
does not use extra memory. It also was a minor modification. In
mpegts_write_pes(), when it is determined a PCR will be written, which
is just for video frames, I check the AVStreams associated to the
AVFormatContext, and if they are buffering access units where the
first DTS is not less than the PCR value, I write out the buffered
access units before continuing to write out the PCR value. This keeps
everything in order.

The extra overhead bytes introduced is due to the additional stuffing
bytes which results from not filling the TS Packets as completely as
before with audio samples, and due to the additional PES headers. The
patch still allows audio samples to get grouped, but only while their
DTS value is less than the next PCR value mpegts_write_pes will add to
the output stream. With a max_delay of 50ms, I noticed an increase in
bytes of 0.8%. With a max_dealy of 1000ms, I didn't see an increase in
overhead since this allows more audio samples to get grouped.

I suspect this patch isn't general purpose enough, some would rather
use more memory instead of increasing the potential container
overhead, for example. Also, this was just tested with ffmpeg 0.5
since that is the version we are currently working with here, but I
forward ported it below to the latest version and it is shown below to
see if anyone has comments regarding it.

Thank you,
Will

--- mpegtsenc.c	2011-11-01 11:00:36.000000000 -0700
+++ mpegtsenc.c.new	2011-11-02 15:56:00.000000000 -0700
@@ -772,6 +772,9 @@ static void mpegts_write_pes(AVFormatCon
     int afc_len, stuffing_len;
     int64_t pcr = -1; /* avoid warning */
     int64_t delay = av_rescale(s->max_delay, 90000, AV_TIME_BASE);
+    AVStream *st2 = 0;
+    MpegTSWriteStream *ts_st2 = 0;
+    int payload_index = 0;

     is_start = 1;
     while (payload_size > 0) {
@@ -788,6 +791,34 @@ static void mpegts_write_pes(AVFormatCon
             }
         }

+        if (write_pcr) {
+          if (ts->mux_rate > 1)
+              pcr = get_pcr(ts, s->pb);
+          else
+              pcr = (dts - delay)*300;
+
+          if (dts != AV_NOPTS_VALUE && dts < pcr / 300)
+              av_log(s, AV_LOG_WARNING, "dts < pcr, TS is invalid\n");
+
+          /* if other streams are buffering data before this pcr, write that
+             data to the stream first */
+          for (i = 0; i < s->nb_streams; ++i) {
+              st2 = s->streams[i];
+
+              if (st2 == st)
+                  continue;
+
+              ts_st2 = st2->priv_data;
+
+              if (ts_st2->payload_index && pcr / 300 >= ts_st2->payload_dts) {
+                  payload_index = ts_st2->payload_index;
+                  ts_st2->payload_index = 0;
+                  mpegts_write_pes(s, st2, ts_st2->payload, payload_index,
+                                   ts_st2->payload_pts, ts_st2->payload_dts);
+            }
+          }
+        }
+
         if (ts->mux_rate > 1 && dts != AV_NOPTS_VALUE &&
             (dts - get_pcr(ts, s->pb)/300) > delay) {
             /* pcr insert gets priority over null packet insert */
@@ -819,12 +850,6 @@ static void mpegts_write_pes(AVFormatCon
             set_af_flag(buf, 0x10);
             q = get_ts_payload_start(buf);
             // add 11, pcr references the last byte of program clock
reference base
-            if (ts->mux_rate > 1)
-                pcr = get_pcr(ts, s->pb);
-            else
-                pcr = (dts - delay)*300;
-            if (dts != AV_NOPTS_VALUE && dts < pcr / 300)
-                av_log(s, AV_LOG_WARNING, "dts < pcr, TS is invalid\n");
             extend_af(buf, write_pcr_bits(q, pcr));
             q = get_ts_payload_start(buf);
         }