[FFmpeg-devel] mpegts_write_header(), the buffering and thus reordering of audio samples.

Michael Niedermayer michaelni at gmx.at
Thu Nov 3 20:09:06 CET 2011

On Wed, Nov 02, 2011 at 04:13:57PM -0700, Will Korbe wrote:
> On Tue, Nov 1, 2011 at 11:39 AM, Will Korbe <wkorbe at gmail.com> wrote:
> > On Mon, Oct 31, 2011 at 7:41 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> >> On Mon, Oct 31, 2011 at 06:24:07PM -0700, Will Korbe wrote:
> >>> Hello,
> >>>
> >>> I hopefully have a minor question about the
> >>> libavformat/mpegtssec.c:mpegts_write_header() implementation. My hours
> >>> of online searching and specification reading for the answer has been
> >>> unsuccessful.
> >>>
> >>> I see there is an efficiency built in to bundle multiple audio samples
> >>> into one PES packet and was wondering if someone could point out a
> >>> paragraph from the MPEG2 specification which indicates the reordering
> >>> of TS Packets this causes is okay? I have a video with many 41 byte
> >>> audio samples at the start, and this seems to push the first audio
> >>> samples 5 seconds into the output. At this point, the PTS and DTS for
> >>> the audio samples are beyond the last PCR value by 5 seconds. The
> >>> video plays fine in QuickTime, the audio and video are precisely in
> >>> sync, so I assume QuickTime has logic to buffer the video frames until
> >>> seeming the first audio sample to account for this, but I wonder if
> >>> most MPEG2TS players handle this the same way. If this is spelled out
> >>> in the specification, I would sure feel more comfortable with it.
> >>>
> >>> Any information would be useful.
> >>
> >> I dont remember a tight limit on the PCR-DTS distance in the spec but
> >> then i dont remember the spec very well ...
> >> but having this large is bad because this means a long startup delay.
> >> So 5 seconds is IMHO not appropriate by default.
> >> Of course that delay would only affect actual broadcast and not playing
> >> from a local file with a modern player.
> >> the user parameter max_delay should be able to limit this, if not
> >> a bugreport & patch are welcome
> >>
> >>
> >> [...]
> >> --
> >> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> >>
> >> Asymptotically faster algorithms should always be preferred if you have
> >> asymptotical amounts of data
> >>
> >> _______________________________________________
> >> ffmpeg-devel mailing list
> >> ffmpeg-devel at ffmpeg.org
> >> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> >>
> >>
> >
> > Hi Michael,
> >
> > Thank you for your response. From what I can tell, max_delay does not
> > limit the amount of audio sample buffering, rather, it is a macro,
> > DEFAULT_PES_PAYLOAD_SIZE (2930 bytes). In my test case,
> > AVFormatContext::max_delay is set to 50000 (50 ms). My sample video is
> > actually a movie trailer with close to 5 seconds of silence before the
> > audio sample byte sizes become substantial, after that point, the
> > audio samples are large enough that the PTS/DTS values become just 500
> > ms behind the last PCR. Given this, it seems when there are
> > multi-second periods of silence, the PTS/DTS of an audio sample has
> > the potential to become multi-seconds behind the last PCR in the TS
> > Packet Stream. As you mentioned, this may be acceptable with regards
> > to the specification and/or current playback implementations, however,
> > to attempt to handle more cases, perhaps the buffering of audio
> > samples should be limited by max_delay as you suggest, that would seem
> > to make it more inline with the mpeg2 timing model, but I'm not and
> > expert in this area. I plan to investigate further to see if a BUG
> > report is warranted.
> >
> > Thank you,
> > Will
> >
> When I study the MPEG2 timing and buffering model, it would seem to
> indicate the PCR value is the system-clock used to synchronize access
> units (audio samples and video frames). For the decoding side, the
> only delay specified is the buffers after the demux step, apparently
> one for the video frames, and one for audio samples. If the MPEG2TS
> stream was processed serially where the PCR value indicates the
> current system time as it is received in the MPEGTS stream, then in
> the audio or video buffers, if any access units have a DTS value which
> matches the current PCR, those access units are instantly removed from
> the buffer and decoded. If we don't have a DTS, the PTS is considered.
> When any of the access units have a PTS equal to the current PCR, then
> those access units are presented. Given this, it seems apparent to me
> that this is not how some players are implemented, otherwise the audio
> would arrive late, after the video was decoded and presented, and
> would have been dropped. I suspect players have an additional layer of
> logic and an initial delay built-in to handle MPEG2TS streams that are
> not quite spec compliant, otherwise my test video stream wouldn't have
> played correctly. If anyone knows more about this and can point out a
> flaw in my reasoning, please let me know.
> If I compare ffmpeg's MPEG2TS output with Apple's mediastreamsegmenter
> MPEG2TS output, the mediastreamsegmenter tool also groups audio
> samples together to form larger PES packets, however, it places this
> group of audio samples in the correct location in the MPEG2TS stream
> (DTS and PTS < PCR). I suspect they buffer both audio samples and
> video frames to be able to do this, using considerably more memory in
> the process.
> I work on an application that doesn't have the luxury to use extra
> memory, so I patched the mpegts_write_pes function making a trade-off
> which increases the output overhead slightly in some cases, but it
> does not use extra memory. It also was a minor modification. In
> mpegts_write_pes(), when it is determined a PCR will be written, which
> is just for video frames, I check the AVStreams associated to the
> AVFormatContext, and if they are buffering access units where the
> first DTS is not less than the PCR value, I write out the buffered
> access units before continuing to write out the PCR value. This keeps
> everything in order.
> The extra overhead bytes introduced is due to the additional stuffing
> bytes which results from not filling the TS Packets as completely as
> before with audio samples, and due to the additional PES headers. The
> patch still allows audio samples to get grouped, but only while their
> DTS value is less than the next PCR value mpegts_write_pes will add to
> the output stream. With a max_delay of 50ms, I noticed an increase in
> bytes of 0.8%. With a max_dealy of 1000ms, I didn't see an increase in
> overhead since this allows more audio samples to get grouped.
> I suspect this patch isn't general purpose enough, some would rather
> use more memory instead of increasing the potential container
> overhead, for example. Also, this was just tested with ffmpeg 0.5
> since that is the version we are currently working with here, but I
> forward ported it below to the latest version and it is shown below to
> see if anyone has comments regarding it.

i have some comments
i think write_pcr can be set later and pcr is not set then and thus
used uninitialized

also somehow it feels like the code would fit better in
I mean calling mpegts_write_pes() for all streams out of
mpegts_write_pes() itself feels like a recipe for overflowing the
stack if theres some way to cause this to call too deep recursively 

and i think the code needs to be always called not just when
write_pcr is 1

either way the patch is a needed bugfix and if you improve it iam
happy to include it

Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I know you won't believe me, but the highest form of Human Excellence is
to question oneself and others. -- Socrates
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: Digital signature
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20111103/3d34c512/attachment.asc>

More information about the ffmpeg-devel mailing list