[Ffmpeg-devel] ratecontrol advice

Tue Jul 26 04:12:19 CEST 2005

Hi folks,

After a bit of a hiatus, I finally got 'round to trying to cap frame sizes
generated by ffmpeg in 1-pass h.263 encoding.  Unfortunately, things did
not go smoothly - I tried Loren's advice of:

Loren Merritt wrote:
> But if you don't mind modifying a few lines of code, look in
> modify_qscale() line 435 or so. Currently that limits per-frame bits
> based on the number of bits left in the VBV buffer; you could put in your
> own cap similarly. (Warning: anything done by a 1pass encode will be an
> approximate/soft limit, since you don't know in advance exactly how many
> bits a given QP will take. So you'll need to fudge the limit a little if
> you care about strict compliance.)

...but unfortunately the predicted size of a frame given by qp2bits(rce, q) 
in modify_qscale seems even more inaccurate than I feared.  The problem 
seems to be that the estimate of the frame size has to converge over time - 
which is fine for a longish talking-head style video; after ~60 frames it 
gets pretty accurate and you can detect potential large framesizes and 
clobber their QP.

However, when a video is made up of a series of different still images (e.g. 
non-scrolling credits) it takes much longer (~120 frames or so).  And whilst 
it's converging, it guesses the framesize pretty badly - 32 bits rather than 
the actual 14296 for the first frame, 500 bits rather than 25480 for the 
30th frame, 153315 bits rather than 74776 for the 77th frame, etc. 
(However, frame 91 is almost right (30767 v. 26312), and by frame 121 it's 
pretty close (30546 v. 30144)).
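
One way to picture this kind of slow convergence is with a toy model. The
sketch below is not lavc's qp2bits(); it only assumes a predictor of the form
bits = coeff * complexity / qp, where coeff is a decayed running average
seeded with a poor initial guess. The class, its parameters and the numbers
are all hypothetical.

```python
# Toy model only: NOT lavc's actual predictor. All names are hypothetical.
class SizePredictor:
    """Predicts frame bits as coeff * complexity / qp, learning coeff online."""

    def __init__(self, initial_coeff=0.01, decay=0.9):
        self.coeff_sum = initial_coeff  # decayed sum of observed bits*qp/complexity
        self.count = 1.0                # decayed number of observations
        self.decay = decay

    def predict(self, qp, complexity):
        return (self.coeff_sum / self.count) * complexity / qp

    def update(self, qp, complexity, actual_bits):
        # Older observations fade out geometrically; the bad seed fades with them.
        self.coeff_sum = self.coeff_sum * self.decay + actual_bits * qp / complexity
        self.count = self.count * self.decay + 1.0


def run_demo(frames=60, true_coeff=2.0, qp=5.0, complexity=10000.0):
    """Return the relative prediction error for each frame of a steady sequence."""
    predictor = SizePredictor()
    errors = []
    for _ in range(frames):
        actual = true_coeff * complexity / qp
        predicted = predictor.predict(qp, complexity)
        errors.append(abs(predicted - actual) / actual)
        predictor.update(qp, complexity, actual)
    return errors
```

In this model the first prediction is almost entirely the seed value, and the
error only shrinks as real observations outweigh it. A decay closer to 1, or
content whose statistics keep changing, would stretch that out further.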

Am I using the wrong metric for estimating the end frame size in 1-pass 
encoding?   And does anyone know which of the various counters & predictors 
starts off initialised incorrectly, causing this convergence effect?

Alternatively, I need a way to tell the ratecontrol to encode a sudden 
change in static image as a small I + several incremental P frames - rather
than a single huge monolithic I frame and a subsequent string of 'empty' P
frames.  Is there any way to force ffmpeg to encode in this way?

Rich Felker wrote:
> On Tue, May 10, 2005 at 12:55:44AM -0700, Loren Merritt wrote:
>> (Warning: anything done by a 1pass encode will be an approximate/soft 
>> limit, since you don't know in advance exactly how many bits a given QP
>>  will take. So you'll need to fudge the limit a little if you care
>> about strict compliance.)
> 
> While true, this isn't an inherent limitation, just a flaw in lavc's rate
> control engine. It's easy to implement strict limits in 1pass encoding:
> just repeatedly reencode the frame at different values of qp until you
> find one that gives the size you want. With binary search it shouldn't
> even be that slow...

I've also tried going down this line of attack, but it seems that ffmpeg 
doesn't make multiple executions of the encode_thread for a given frame very 
easy - all the rtp callbacks and bitstream output happen directly from 
within the thread.  I take this to mean that I have to completely isolate 
the encode_thread and buffer all its side effects in order to run it in a
sandbox to see how big its output is going to be, and then re-run it with a 
higher QP as needed.  This seems relatively tricky - is there a better way 
of doing it, or is this the only way to go?
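
For reference, the search Rich describes can be sketched as below. It assumes
a hypothetical encode_frame(qp) callback that does a trial encode with no side
effects and returns the frame size in bits, which is exactly the part that is
hard to get out of ffmpeg today. It also assumes the size never grows as QP
rises. The 1..31 range is the H.263 quantiser range.

```python
def find_qp_for_cap(encode_frame, max_bits, qp_min=1, qp_max=31):
    """Return (qp, bits) for the lowest QP whose trial encode fits in max_bits.

    encode_frame(qp) -> int is a hypothetical side-effect-free trial encode.
    If even qp_max overshoots, (qp_max, bits at qp_max) is returned.
    """
    sizes = {}  # remember trial results so no QP is encoded twice

    def trial(qp):
        if qp not in sizes:
            sizes[qp] = encode_frame(qp)
        return sizes[qp]

    lo, hi = qp_min, qp_max
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        bits = trial(mid)
        if bits <= max_bits:
            best = (mid, bits)
            hi = mid - 1  # fits: see whether a lower QP (better quality) also fits
        else:
            lo = mid + 1  # too big: quantise harder
    if best is None:
        best = (qp_max, trial(qp_max))
    return best


def toy_encode(qp):
    """Stand-in for a real encoder: size falls off as 1/qp."""
    return 120000 // qp
```

With 31 candidate QPs the loop needs at most 5 trial encodes per frame. The
cap is only strict when some QP in the range fits; otherwise the caller still
has to drop or split the frame.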

I've also tried doing a two-pass encode and fiddling the stats file between 
passes by boosting the perceived i_tex/p_tex/var on frames which were too 
big, hoping that the next pass would overcompensate and shrink them down. 
This doesn't seem to work at all - I'm assuming that some kind of blurring
or smoothing is annihilating my hacky tweaks, or I'm completely
misunderstanding how the multipass encoding is meant to work.

Any suggestions would be enormously appreciated.

cheers;

M.