[FFmpeg-devel] [Patch] QT RLE encoder, bis

Alexis Ballier alexis.ballier
Mon Jun 25 23:18:15 CEST 2007


> > There are some things that might need comments :
> > - I'm using the AVContext rc_max_rate variable to control whether to
> > use the optimal coding algorithm or the heuristic, that's the best
> > option I found but someone might have a better idea / not like this
> > one.
>
> such missuse of rc_max_rate is not ok, there are many better options
> also maybe we can simply make the optimal coder fast enough to avoid
> the heuristic one ...

+1 for faster code, see below


> > +    for(i=width-1; i>=0; i--) {
> > +
> > +        if((s->frame).key_frame)
> > +                skipcount=0;
> > +        else skipcount=scan_equal(this_line, prev_line, FFMIN(width - i,MAX_RLE_SKIP), s->pixel_size, 0);
> > +
> > +        if(skipcount>0)
> > +                total_skip_cost = (s->length_table[i+skipcount] + 2);
> > +        else total_skip_cost = width;
> > +
> > +
> > +        if(i<width-1) repeatcount=scan_equal(this_line , this_line + s->pixel_size, FFMIN(width-i-1,MAX_RLE_REPEAT-1), s->pixel_size, 0) + 1;
> > +        else repeatcount=1;
>
> you dont have to run scan_equal() for each pixel, you already know how many
> are equal following the next pixel

Nice catch, shamely I didn't think about optimizing that, should be
better now, basic benchs showed no less than 40% perf gain ^^
I also removed the "total_skip_cost = width" total nonsense

Also added a new table to keep skipcount computed values, so that
there is no need to compute it again while writing to buf
That allows the removal of scan_equal

> [...]
> > +        for(j=2;j<=limit;j++){
> > +                temp_cost += s->pixel_size;
> > +                if(s->length_table[i+j] + temp_cost < total_bulk_cost){
> > +                        /* We have found a better bulk copy ... */
> > +                        total_bulk_cost = s->length_table[i+j] + temp_cost;
> > +                        bulkcount = j;
> > +                }
> > +        }
>
> its not needed to do this from 2..limit
> this can instead be stoped much earlier if its sufficiently worse compared
> to any other method
> also you dont even need to start the bulk search if there are enough repeats
> or skips as the case of repeat/skip + later raw will then always be better

Good point, I thought there were special cases where this did not
work, but as even "skip one pixel" or "repeat twice" + bulk is cheaper
(or the same) than having the whole bulk, I modified the code to match
this. [note that skip one pixel + bulk is cheaper only because
pixel_size >= 2 (the cost of a skip sequence), thus if it is used to
encode something with strictly less that 16 bits per pixel it might
lose its optimality]
Again, there were perf gains here.

> > +    while(i<width){
> > +            rlecode = (signed char)s->rlecode_table[i];
>
> the cast is unneeded or your code is broken

was most likely the later, as in the spec a rlecode is signed, I
changed the rlecode_table to match this

>
> [...]
> > +/** Dumps frame including header */
> > +static int dump_frame(QtrleEncContext *s, AVFrame *p, uint8_t *buf)
>
> encode_frame()

not sure what you meant there, I just renamed it to encode_frame.
Do you mean that this should be merged into qtrle_encode_frame() ?

> > +{
> > +    int i;
> > +    int start_line=0;
> > +    int end_line=s->avctx->height;
> > +    uint8_t *orig_buf=buf;
> > +
> > +    if(!((s->frame).key_frame)){
>
> superflous () and there are many more in the patch

Yep, most likely that I'm too paranoid with the parser that might
understand it wrongly, I removed some other () aswell

> > +
> > +    /* save the current frame */
> > +    memcpy(s->previous_frame, p->data[0], avctx->height*avctx->width*s->pixel_size);
>
> this is wrong if linesize != width*pixel_size

Indeed
I changed the "previous_frame" to an AVPicture, copying it with
av_picture_copy, I hope I got it right this time [one of the previous
patches used img_copy which I didn't want to use due to the attribute
deprecated]

The other minor things should also be fixed in enclosed patch


In the end, the encoding time is almost divided by 2, so I just
removed the heuristic and the hack to switch between the optimal &
heuristic. Basic tests showed that the speed gain is even more
important with screencasts (difference between pictures is very small
thus not searching for every possible bulk copy helps a lot), I can
now do realtime encoding with this algorithm at 1024x768 at 25fps :)


Regards,


Alexis
-------------- next part --------------
A non-text attachment was scrubbed...
Name: qtrle_encoder_try2.patch
Type: text/x-patch
Size: 14037 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070625/18ae4569/attachment.bin>



More information about the ffmpeg-devel mailing list