[FFmpeg-devel] [PATCH] libvpx: alt reference frame / lag

Thu Jun 17 04:30:50 CEST 2010

On Wed, Jun 16, 2010 at 3:28 PM, Reimar Döffinger
<Reimar.Doeffinger at gmx.de> wrote:
> On Wed, Jun 16, 2010 at 02:52:17PM -0400, John Koleszar wrote:
>> Allowing the application to run
>> more frequently is always a good thing, especially on embedded
>> platforms, for single threaded applications.
>
> Then it makes more sense to provide an API to pass immediately
> whatever it is there to the decoder instead of having to wait
> for a full frame.
>

Maybe. There are applications in both cases, where the data takes a
while to come in and you want to get started early, and where the data
is all available up front but takes a while to decode. Lots of
applications are only set up this way. Using multiple packets does exactly
what you suggest, but on well-defined boundaries.

>> These frames are frames in every sense except that they're not
>> displayed on their own. They're not just a fancy header. Here's an
>> example: You can have an ARF that's taken from some future frame and
>> not displayed. Then later, when that source frame's PTS is reached,
>> code a non-ARF frame that has no residual data at all, effectively a
>> header saying "present the ARF buffer now." Which packet do you call
>> the "frame" and which is the "header" in that case?
>
> The one that you put into the decoder and then you get a frame out
> is the frame, and it is the only real frame.
> It doesn't look like it for the decoder, but why do you want to force
> your users by all means to have to bother with _internals_ of your codec?
> Yes, there may be advanced users that might need more control, but
> why should the ordinary users have to pay the price in complexity for them?
> The first rule IMO is still "keep simple things simple".
>

I agree with that rule. A lot of internals are exposed, but not in a
way that you need to be bothered with them. The as-simple-as-possible
simple_encoder.c and simple_decoder.c examples are pretty simple. This
is my turn to not know what you're talking about -- if there are
things in the interface that could be better, I want to know about
them, and I want to fix them. I've heard hints about this, but nothing
concrete.

I still think this multiple packet approach is very much KISS, and
it's not just libvpx that it's simple for. The other part of that rule
is "make it as simple as possible, but no simpler."
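
To make the "not all packets are displayed" point concrete, here is a
minimal sketch of how a muxer or player could tell a visible packet from
an invisible one (such as an ARF update) without decoding it. It assumes
the usual 3-byte VP8 frame tag layout (frame type bit, 3-bit version,
show_frame bit, 19-bit first partition size). The names vp8_frame_tag
and vp8_parse_frame_tag are made up for this illustration and are not
part of libvpx or FFmpeg.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: fields of the 3-byte VP8 frame tag. */
typedef struct {
    int      key_frame;        /* 1 if the packet holds a key frame        */
    int      version;          /* 3-bit bitstream version                  */
    int      show_frame;       /* 0 for an invisible frame, e.g. an ARF    */
    uint32_t first_part_size;  /* size in bytes of the first partition     */
} vp8_frame_tag;

/*
 * Parse the frame tag at the start of one packet.
 * Returns 0 on success, -1 if the arguments are invalid or the
 * packet is shorter than the 3-byte tag.
 */
static int vp8_parse_frame_tag(const uint8_t *buf, size_t size,
                               vp8_frame_tag *tag)
{
    uint32_t raw;

    if (!buf || !tag || size < 3)
        return -1;

    /* The tag is stored little-endian across the first three bytes. */
    raw = (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
          ((uint32_t)buf[2] << 16);

    tag->key_frame       = !(raw & 1);      /* bit 0 is 0 for key frames */
    tag->version         = (raw >> 1) & 0x7;
    tag->show_frame      = (raw >> 4) & 0x1;
    tag->first_part_size = (raw >> 5) & 0x7FFFF;
    return 0;
}
```

An application that only cares about playback can ignore show_frame and
feed every packet to the decoder; one that cares about what goes into
each reference buffer can inspect it per packet.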

>> >> A packet stream is a clean abstraction that everybody
>> >> understands, the only twist here is that not all packets are
>> >> displayed.
>> >
>> > That argument works just as well for claiming that e.g. for JPEG
>> > the SOI, EOI etc. should each be in a separate packet.
>> > Or that for H.264 each slice should go into its own packet, after
>> > all someone might want to decode only the middle slice for some
>> > reason.
>>
>> That data is all related to the same frame. An ARF is not necessarily
>> related to the frame preceding or following it.
>
> Neither are most of the time things like SPS and PPS for H.264.
> At least we still don't put them into a separate packet (well, in
> extradata for formats where it is possible, but only because
> it does not change).
>
>> There are existing
>> applications that very much care about the contents of each reference
>> buffer and what's in each packet, this isn't a hypothetical like
>> decoding a single slice.
>
> Which applications exactly? What exactly are they doing? And why exactly
> do they absolutely need to have things in a separate packet?

I'm not going to name names, but I'm talking specifically about video
conferencing applications. I should have been more precise here --
these applications aren't using invisible frames today (though they do
use the alt-ref buffer) but I called them out because they're the type
of applications that are *very* concerned with what's going on in the
encoder, they will want to use invisible frames in the future, and
they'll need access to the frames in the most fine-grained way
possible.

I've been through a lot of the advantages of keeping the data
separate, but it mostly boils down to staying out of the way of
applications that know what they're doing and providing a flexible
interface. It's less of a win for the file playback case than some of
the others, but it's not useless: it works in all the players and apps
I know about, and it has a clean implementation in a couple of containers
(if you count updating nut). You could of course have two modes, but if
the only practical value would be being able to mux into other
containers, I'm not convinced that outweighs the additional
complexity, and I think there are more interesting things to be
working on right now.