[FFmpeg-devel] [RFC] Support multiple frames in a single AVPacket in avcodec_decode_subtitle2

Wed Oct 16 21:42:48 CEST 2013

On Wed, Oct 16, 2013 at 08:48:56PM +0200, wm4 wrote:
> On Wed, 16 Oct 2013 14:15:51 -0400
> "Ronald S. Bultje" <rsbultje at gmail.com> wrote:
> 
> > Hi,
> > 
> > On Wed, Oct 16, 2013 at 1:38 PM, wm4 <nfxjfg at googlemail.com> wrote:
> > 
> > > On Tue, 1 Oct 2013 23:16:01 +0200 (CEST)
> > > Marton Balint <cus at passwd.hu> wrote:
> > >
> > > > Hi,
> > > >
> > > > When I implemented the DVB teletext decoder, I faced a problem: if
> > > > multiple teletext pages are in a single teletext packet, the decoder has
> > > > no way to return multiple AVSubtitles. So the current decoder returns only
> > > > one AVSubtitle in that case: an AVSubtitle containing the first decoded
> > > > page from the packet.
> > > >
> > > > This is not a problem if the user wants to decode only a single teletext
> > > > page (subtitle page), because the same page is not sent twice in a single
> > > > packet. However, if somebody wants to decode all pages, he probably won't
> > > > be able to do so without losing a page here or there.
> > > >
> > > > I could have split the teletext PES packets (usually around 1472 bytes) at
> > > > the demuxer level into 46-byte packets to overcome this, but I thought it
> > > > would be much better to extend the API the same way it is used now for
> > > > audio decoding, where a single packet can contain multiple frames.
> > > >
> > > > If I combine this with CODEC_CAP_DELAY, the teletext decoder can store the
> > > > remaining pages of a teletext packet (unfortunately libzvbi parses all
> > > > pages in the packet in a single pass) and return them to the user on the
> > > > next call to avcodec_decode_subtitle2. In that case the decoder obviously
> > > > would not consume anything from the next packet until its buffer
> > > > containing teletext pages from the previous packet is empty.
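A minimal sketch of the proposed semantics, in C. Everything in it is hypothetical: `MockTeletextDec`, `mock_decode` and `decode_all` are invented for illustration and are not libavcodec API, and each byte of a mock packet simply stands in for one decoded page (mimicking libzvbi parsing all pages of a packet in one pass). The decoder consumes a packet whole, queues the extra pages, and while the queue is non-empty returns 0 consumed bytes so the caller re-submits the same packet; a final flush with empty packets drains what is left, in the spirit of CODEC_CAP_DELAY.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL mock, not libavcodec API: each byte of a mock packet is
 * one "decoded page", standing in for libzvbi parsing every page of a
 * teletext packet in a single pass. */
#define MAX_PENDING 64

typedef struct MockTeletextDec {
    uint8_t pending[MAX_PENDING]; /* pages parsed but not yet returned */
    int head;                     /* index of the next page to return */
    int count;                    /* number of pages still queued */
} MockTeletextDec;

/* Proposed semantics: returns the number of bytes consumed and outputs
 * at most one page per call. A packet is consumed whole (full buffer
 * size returned); while pages from an earlier packet are still queued,
 * nothing from the new packet is consumed and 0 is returned. */
int mock_decode(MockTeletextDec *dec, const uint8_t *data, int size,
                int *got_page, int *page)
{
    int consumed = 0;

    *got_page = 0;
    if (dec->count == 0) {
        if (size <= 0)
            return 0;                      /* flush with nothing queued */
        assert(size <= MAX_PENDING);
        memcpy(dec->pending, data, (size_t)size); /* "parse" all pages */
        dec->head  = 0;
        dec->count = size;
        consumed   = size;
    }
    *page = dec->pending[dec->head++];     /* hand out one queued page */
    dec->count--;
    *got_page = 1;
    return consumed;
}

/* Caller side, audio style: keep each packet around, advance it by the
 * consumed byte count, then drain the decoder with empty packets.
 * Returns the number of pages written to out. */
int decode_all(const uint8_t *const *pkts, const int *sizes, int npkts,
               int *out, int max_out)
{
    MockTeletextDec dec = {0};
    int n_out = 0, got, page;

    for (int p = 0; p < npkts; p++) {
        const uint8_t *data = pkts[p];
        int size = sizes[p];

        while (size > 0) {
            int used = mock_decode(&dec, data, size, &got, &page);
            if (got && n_out < max_out)
                out[n_out++] = page;
            if (used == 0 && !got)
                break;                     /* no progress: avoid spinning */
            data += used;
            size -= used;
        }
    }
    do {                                   /* flush pages still queued */
        mock_decode(&dec, NULL, 0, &got, &page);
        if (got && n_out < max_out)
            out[n_out++] = page;
    } while (got);
    return n_out;
}
```

With mock packets {100, 101, 102} and {200, 201}, decode_all yields the five pages in order 100, 101, 102, 200, 201. The inner while loop is the kind of caller-side packet slicing that the replies in this thread argue about.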
> > > >
> > > > If we do this, we will have to make sure that the current subtitle
> > > > decoders always return the full buffer size as the number of consumed
> > > > bytes. I've checked, and it seems that only 3 decoders are problematic,
> > > > and each needs only a one-line patch. Movtext (patch is already
> > > > on the mailing list), srtdec and dvbsub are the three.
> > > >
> > > > So, what do you think?
> > >
> > > Sounds like a bad idea.
> > >
> > > First, this kind of partial packet decoding seems to be in decline in
> > > ffmpeg. Video doesn't use it anymore, audio uses it only for some
> > > obscure formats (hopefully one day it won't require this anymore). It's
> > > also additional pain for the user to keep around a packet and to slice
> > > it. This is pretty unintuitive API and increases the amount of
> > > boilerplate needed to decode something. It's also not entirely robust
> > > and foolproof. And now you want to introduce a new API which uses this
> > > API anti-pattern?
> > >
> > > Second, the API is in need of a better design. AVSubtitle still sucks,
> > > and I'm very doubtful about how subtitle->ASS conversion is done. I
> > > think the next iteration of the subtitle API should fix this, and not
> > > just be another shot in the dark just to make teletext work for now.
> > >
> > > Are you sure there's no better way to shoehorn proper teletext decoding
> > > into ffmpeg?
> > 
> > 
> > Video and audio are different in that the subpackets for e.g. voice audio
> > are in the realm of several tens of bytes (e.g. 50 bytes), which means the
> > (memory/cpu cycle) overhead of giving each packet its own AVPacket
> > container would be highly disproportionate. For video, packet size is
> > several orders of magnitude more than that, so the tradeoff is entirely
> > different between the two - hence the expected optimal (and therefore
> > proposed) solution is different.
> 
> True. Though it seems that often audio is split into subpackets by
> libavformat (or even the container) anyway.
> 
> Is there any reason why avcodec_decode_audio4 can't decode all
> subpackets at once, instead of having the user do repeated calls? Since
> each decode call produces an AVFrame, I figure this would be more
> efficient in general (for the same reasons as you cited).
> 

> > This is why back when we introduced this, we chose to have voice codecs
> > implement the design that you call an "anti-pattern", but keep things
> > as-they-were for video codecs.
> > 
> > So where does text fit in here? I'd say it's closer to audio, so it makes
> > more sense to use the audio approach.
> 
> Text is even lower bandwidth (unless I'm underestimating teletext and

If I didn't misguess or miscalculate, old teletext in PAL would be around
200 kbit/s of 7-bit text,
and it has to serve up to 800 or so pages that TV sets with tiny
buffers could comfortably display; one wouldn't want to wait 5 minutes
each time one types in another page number.
I assume the stuff in MPEG-TS is not lower bitrate, but I've not checked.
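To put rough numbers on the estimate above: the worst-case wait for a page is the time needed to broadcast every page once, i.e. pages x bits per page / bitrate. A minimal sketch in C, assuming (this is not stated in the mail) a page of 40 x 24 characters at 7 bits each and ignoring all framing overhead, so real figures would be somewhat higher; `cycle_seconds` is a hypothetical helper.

```c
#include <assert.h>

/* ASSUMPTION (not from the mail): a page is modelled as 40 x 24
 * characters of 7-bit text; headers, parity and other framing overhead
 * are ignored, so real cycle times would be somewhat longer. */
#define PAGE_COLS     40
#define PAGE_ROWS     24
#define BITS_PER_CHAR 7

/* Seconds needed to broadcast npages pages once at bitrate bit/s, i.e.
 * the worst-case wait for a receiver that buffers no pages. */
double cycle_seconds(int npages, double bitrate)
{
    assert(npages > 0 && bitrate > 0.0);
    double bits = (double)npages * PAGE_COLS * PAGE_ROWS * BITS_PER_CHAR;
    return bits / bitrate;
}
```

Under these assumptions, 800 pages at 200 kbit/s cycle in about 27 s, while a rate of roughly 18 kbit/s would already push the wait to about 5 minutes.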

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Republics decline into democracies and democracies degenerate into
despotisms. -- Aristotle