[FFmpeg-devel] [RFC] Support multiple frames in a single AVPacket in avcodec_decode_subtitle2

Marton Balint cus at passwd.hu
Sun Oct 20 17:00:24 CEST 2013

On Thu, 17 Oct 2013, Marton Balint wrote:
> On Thu, 17 Oct 2013, wm4 wrote:
>> On Wed, 16 Oct 2013 23:41:37 +0200 (CEST)
>> Marton Balint <cus at passwd.hu> wrote:
>>> On Wed, 16 Oct 2013, wm4 wrote:
>>>> On Wed, 16 Oct 2013 14:15:51 -0400
>>>> "Ronald S. Bultje" <rsbultje at gmail.com> wrote:
>>>>> Hi,
>>>>> On Wed, Oct 16, 2013 at 1:38 PM, wm4 <nfxjfg at googlemail.com> wrote:
>>>>>> On Tue, 1 Oct 2013 23:16:01 +0200 (CEST)
>>>>>> Marton Balint <cus at passwd.hu> wrote:
>>>>>>> Hi,
>>>>>>> When I implemented the DVB teletext decoder, I faced a problem: if
>>>>>>> multiple teletext pages are in a single teletext packet, the decoder
>>>>>>> has no way to return multiple AVSubtitles. So the current decoder
>>>>>>> returns only one AVSubtitle in that case, an AVSubtitle containing
>>>>>>> the first decoded page from the packet.
>>>>>>> This is not a problem if the user wants to decode only a single
>>>>>>> teletext page (subtitle page), because the same page is not sent
>>>>>>> twice in a single packet. However, if somebody wants to decode all
>>>>>>> pages, he probably won't be able to do so without losing a page here
>>>>>>> or there.
>>>>>>> I could have split the teletext PES packets (usually around 1472
>>>>>>> bytes) at the demuxer level into 46-byte packets to overcome this,
>>>>>>> but I thought it would be much better to extend the API the same way
>>>>>>> it is used now for audio decoding, where a single packet can contain
>>>>>>> multiple frames.
>>>>>>> If I combine this with CODEC_CAP_DELAY, the teletext decoder can
>>>>>>> store the remaining pages of a teletext packet (unfortunately libzvbi
>>>>>>> parses all pages in the packet in a single pass), and return them to
>>>>>>> the user on the next call to avcodec_decode_subtitle2. In that case
>>>>>>> the decoder obviously would not consume anything from the next packet
>>>>>>> until its buffer containing teletext pages from the previous packet
>>>>>>> is empty.
>>>>>>> If we do this, we will have to make sure that the current subtitle
>>>>>>> decoders always return the full buffer size as the number of
>>>>>>> consumed bytes. I've checked, and it seems that only 3 decoders are
>>>>>>> problematic, but they only need a one-line patch to fix them. Movtext
>>>>>>> (patch is already on the mailing list), srtdec and dvbsub are the
>>>>>>> three.
>>>>>>> So, what do you think?
>>>>>> Sounds like a bad idea.
>>>>>> First, this kind of partial packet decoding seems to be in decline in
>>>>>> ffmpeg. Video doesn't use it anymore, audio uses it only for some
>>>>>> obscure formats (hopefully one day it won't require this anymore). It's
>>>>>> also additional pain for the user to keep around a packet and to slice
>>>>>> it. This is pretty unintuitive API and increases the amount of
>>>>>> boilerplate needed to decode something. It's also not entirely robust
>>>>>> and foolproof. And now you want to introduce a new API which uses this
>>>>>> API anti-pattern?
>>>>>> Second, the API is in need of a better design. AVSubtitle still sucks,
>>>>>> and I'm very doubtful about how subtitle->ASS conversion is done. I
>>>>>> think the next iteration of the subtitle API should fix this, and not
>>>>>> just be another shot in the dark just to make teletext work for now.
>>>>>> Are you sure there's no better way to shoehorn proper teletext decoding
>>>>>> into ffmpeg?
>>>>> Video and audio are different in that the subpackets for e.g. voice
>>>>> audio are in the realm of several tens of bytes (e.g. 50 bytes), which
>>>>> means the (memory/cpu cycle) overhead of giving each packet its own
>>>>> AVPacket container would be highly disproportionate. For video, packet
>>>>> size is several orders of magnitude more than that, so the tradeoff is
>>>>> entirely different between the two - hence the expected optimal (and
>>>>> therefore proposed) solution is different.
>>>> True. Though it seems that often audio is split into subpackets by
>>>> libavformat (or even the container) anyway.
>>>> Is there any reason why avcodec_decode_audio4 can't decode all
>>>> subpackets at once, instead of having the user do repeated calls? Since
>>>> each decode call produces an AVFrame, I figure this would be more
>>>> efficient in general (for the same reasons as you cited).
>>> Well, I don't know, if there is a real-world example for this, but
>>> subframes may use different sample rate, or may have different metadata,
>>> or may have different decode_error_flags.
>> Most more sophisticated codecs already use split packets anyway. And
>> there's only a small amount of obscure audio codecs which still need
>> this kind of incremental audio decoding.
>>>>> This is why back when we introduced this, we chose to have voice codecs
>>>>> implement the design that you call an "anti-pattern", but keep things
>>>>> as-they-were for video codecs.
>>>>> So where does text fit in here? I'd say it's closer to audio, so it
>>>>> makes more sense to use the audio approach.
>>>> Text is even lower bandwidth (unless I'm underestimating teletext and
>>>> dvb), so just directly wrapping each subpacket as separate AVPacket
>>>> might not be too bad.
>>> Actually the DVB teletext packet has 1 byte data identifier, and at most
>>> 31*46 bytes of subpackets. Let's just forget we may need the 1 byte data
>>> identifier, if we look at the subpackets, for 25 fps content it means
>>> 31*25 = 775 46-byte AVPackets per second, which is roughly 285 kbps.
>>> But even if we split the packets, libzvbi has a nice feature, it only
>>> decodes teletext pages after it received all the subpackets, all the VBI
>>> lines from a frame, and it knows that is the case, when it receives a
>>> subpacket belonging to a VBI line which is already buffered.
>>> So in practice libzvbi outputs all teletext pages of a single frame at
>>> once. Yes, you can say that the libzvbi is braindead, but tell me now,
>>> that you will never need to support a codec, where a single packet will
>>> mean the end of more than one frame.
>> OK, so teletext really just gravely mismatches with how the API works.
>> What is actually needed here is some sort of stateful decoder, and the
>> application should be able to retrieve arbitrary teletext pages at any
>> time.
> I did teletext support basically for teletext subtitling, I totally 
> understand that for TV-like usage, due to the current limitations of the 
> ffmpeg API, it is not optimal. However the point I tried to make here is that 
> if you ever want to support a codec where a single AVPacket can mean the end 
> of more than one AVSubtitle, you won't be able to do that.
>>> That is why I still believe that this approach makes the API future-proof.
>>> Yes, it is less simple this way. I think what you miss is a decoder helper
>>> API (on top of the existing one), which works somewhat like how
>>> filtering/buffersrc/buffersink works now. You push one AVPacket into the
>>> decoder, and you call the decode function until EAGAIN is returned, so you
>>> will know that you have to push the next AVPacket.
>> Huh? No, not at all. I just don't want an awfully tricky API of the
>> kind that makes people copy&paste ffplay code. (I don't, but it's a
>> common pattern.)
> What I meant is you can add helper functions to the API to hide non-trivial 
> parts of the decoding.
>>>> Also, unlike audio, subtitles aren't really
>>>> "continuous", and you can't just append the decoded data. I'm not sure
>>>> how teletext is supposed to work here: is each AVSubtitle a separate
>>>> page?
>>> Yes, exactly.
>>>> Or do you have to display all AVSubtitles at the same time?
>>>> Depending on what it is, the API might actually be more fundamentally
>>>> inappropriate, and avcodec_decode_subtitle3 would merely solve the
>>>> technical problem of having to deal with subpackets. I keep wondering:
>>>> how are applications supposed to use this at all?
>>> It's for the users/applications to decide. Teletext based subtitling works
>>> the same way as any subtitling in ffmpeg, definitely a useful feature for
>>> everybody. Adding support for other kinds of pages to the decoder was no
>>> big deal, and may be useful for somebody, even if it is not the best way
>>> to add teletext support to your application at the moment.
>>> Multi-page decoding is a bit broken at the moment. Even if it is rarely
>>> used, or rarely will be used, it's worth fixing IMHO, and we can only do
>>> that by extending the API.
>> What users expect from the subtitle decoder API is that it returns
>> timed subtitle events with a start and end time, not somehow multiple
>> subtitle events all at once (that can't even be shown at once). In my
>> opinion, this gravely violates the promises the subtitle decoder API
>> makes, even if the API is actually not documented as having these
>> requirements.
>> Or in other words: the API would behave completely unexpectedly, even
>> if you might say that it's technically correct.
> I think it is better to have it this way, than to not have it at all.
>> This will probably end up with every ffmpeg user having to implement
>> their own teletext pager logic (including page cache) to get proper
>> behavior. Except most just won't bother with this unwieldy stuff.
>> Using libzvbi manually (instead of the libavcodec wrapper) will be
>> easier.
> And they are more than welcome to do that, if ffmpeg API is not right for 
> their purposes. Or they will extend the API themselves if they find it 
> lacking features they want to use, like I did.
> Regards,
> Marton

Okay, I think I came up with an idea that should be acceptable to
everyone and also fixes the problem for the most common cases (at least
for all the teletext streams I could get my hands on).

Let's just add CODEC_CAP_DELAY support and not add CODEC_CAP_SUBFRAMES.
This enables the teletext decoder to buffer pages. The only problem is
that if a stream contains only short pages (shorter than a packet), the
buffer may keep accumulating more and more pages. Fortunately that rarely
happens with real-life streams: broadcasters usually do not use all the
VBI lines they could (so there is less teletext data in a packet), and
according to my experience with real streams, the buffer will be emptied
sooner or later.

That is why I decided to limit the number of buffered pages, so even if
we encounter such a stream, we will not consume an unbounded amount of
memory, but report an error instead.
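
For illustration only, the page limit could be sketched roughly like this
in C. This is not the patch itself. The names Page, PageQueue,
page_queue_push, page_queue_pop and the MAX_BUFFERED_PAGES value are all
hypothetical, and a real decoder would store decoded subtitle data rather
than a bare page number. The idea is just a fixed-size FIFO: decoding a
packet pushes every page found in it, each decode call pops at most one
page, and a push onto a full queue is reported as an error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cap; the actual limit is not stated in this mail. */
#define MAX_BUFFERED_PAGES 4

/* Stand-in for a decoded teletext page (assumption: real code would
 * keep the decoded subtitle data here). */
typedef struct Page {
    int page_number;
} Page;

/* Fixed-size ring buffer of decoded pages waiting to be returned. */
typedef struct PageQueue {
    Page   pages[MAX_BUFFERED_PAGES];
    size_t head;   /* index of the oldest buffered page */
    size_t count;  /* number of pages currently buffered */
} PageQueue;

/* Buffer one decoded page.
 * Returns 0 on success, -1 if the cap has been reached. */
static int page_queue_push(PageQueue *q, Page p)
{
    if (q->count == MAX_BUFFERED_PAGES)
        return -1;
    q->pages[(q->head + q->count) % MAX_BUFFERED_PAGES] = p;
    q->count++;
    assert(q->count <= MAX_BUFFERED_PAGES);
    return 0;
}

/* Hand out the oldest buffered page, one per decode call.
 * Returns 1 if a page was written to *out, 0 if the queue is empty. */
static int page_queue_pop(PageQueue *q, Page *out)
{
    if (q->count == 0)
        return 0;
    *out = q->pages[q->head];
    q->head = (q->head + 1) % MAX_BUFFERED_PAGES;
    q->count--;
    return 1;
}
```

With a zero-initialized PageQueue, a packet that carries several pages
pushes them all, and the caller gets them back in order over the following
decode calls. If packets keep carrying more than one page each, the count
only grows until page_queue_push returns -1, which is where the decoder
would report the error.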

I will post the new patches soon.

