[FFmpeg-devel] [PATCH] VC1: merge put/add_pixels with IDCT8x8.

Kostya kostya.shishkov
Sat Feb 19 18:45:02 CET 2011


On Sat, Feb 19, 2011 at 12:27:16PM -0500, Ronald S. Bultje wrote:
> Just quick note here:
> 
> On Sat, Feb 19, 2011 at 12:20 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> > -static void vc1_put_block(VC1Context *v, DCTELEM block[6][64])
> [..]
> > - ? ?if(v->rangeredfrm) {
> > - ? ? ? ?int i, j, k;
> > - ? ? ? ?for(k = 0; k < 6; k++)
> > - ? ? ? ? ? ?for(j = 0; j < 8; j++)
> > - ? ? ? ? ? ? ? ?for(i = 0; i < 8; i++)
> > - ? ? ? ? ? ? ? ? ? ?block[k][i + j*8] = (block[k][i + j*8] - 64) << 1;
> > -
> > - ? ?}
> [..]
> > ?static void vc1_decode_i_blocks(VC1Context *v)
> [..]
> > + ? ?idct8x8_fn = v->vc1dsp.vc1_inv_trans_8x8_put[(!!v->rangeredfrm) | ((v->pq < 9 || !v->overlap) << 1)];
> [..]
> [..]
> > - ? ? ? ? ? ? ? ?v->vc1dsp.vc1_inv_trans_8x8(s->block[k]);
> > - ? ? ? ? ? ? ? ?if(v->pq >= 9 && v->overlap) {
> > - ? ? ? ? ? ? ? ? ? ?for(j = 0; j < 64; j++) s->block[k][j] += 128;
> > - ? ? ? ? ? ? ? ?}
> > + ? ? ? ? ? ? ? ?if(k > 3 && (v->s.flags & CODEC_FLAG_GRAY)) continue;
> > + ? ? ? ? ? ? ? ?idct8x8_fn(dst[k],
> > + ? ? ? ? ? ? ? ? ? ? ? ? ? k & 4 ? s->uvlinesize : s->linesize,
> > + ? ? ? ? ? ? ? ? ? ? ? ? ? s->block[k]);
> > ? ? ? ? ? ? }
> >
> > - ? ? ? ? ? ?vc1_put_block(v, s->block);
> 
> I know this isn't the same thing. The old code does this:
> 
> read coeffs
> do idct
> if (v->pq >= 9 && v->overlap) add 128 to each of them
> if (rangered) sub 64 from each of them, <<= 1
> put_pixels_clamped();
> 
> the new code does this, essentially:
> read coeffs
> do idct
> if (v->pq >= 9 && v->overlap) add 128 to each of them
> if (rangered) { if (v->pq >= 9 && v->overlap) { sub 64 from each} <<= 1 }
> put_pixels_clamped();

That's wrong:
clause 8.1.1.4 specified range restoration is performed like
Y[n] = CLIP ((Y[n] ? 128) * 2 + 128)
when range reduction flag is set at the sequence level

Overlapping (and adding 128) is employed when it's signalled, i.e.
v->overlap && v->pq >= 9,
and it's totally independent from range reduction.
 
> and then the add 128 is done through using
> put_signed_pixels_clamped(), which does that internally, instead of
> adding it manually and using put_pixels_clamped(). I'm almost certain
> this is a bug in the old code, or maybe a bug in the spec, or
> something along those lines, but Kostya, could you please confirm this
> change makes sense when you review this?

I can't, see the reason above.

> Also note that this change likely slows the code down in some way,
> because we're temporarily not using
> put_{,signed_}pixels_clamped_mmx(). However, this work allows
> integrating several loops together which will make the final work,
> after MMX/SSE optimizations, a lot faster than without this. Thanks to
> Jason for the suggestions.

Yes, somehow that idct_{put,add} idea passed by me in 2006 when I worked
on decoder. Probably because they require a lot of crazy stuff between
transform and output.
 
> Ronald



More information about the ffmpeg-devel mailing list