[FFmpeg-devel] [PATCH] 10-bit DNxHD support v4 review request

Joseph Artsimovich joseph at mirriad.com
Thu Apr 7 10:56:31 CEST 2011


Baptiste Coudurier wrote:

> >> bit_depth is a better name IMHO.
> > Well, this field was meant to indicate how the DSPContext was initialized (if
> at all).
> > For other purposes we have cid_table->bit_depth
> 
> I think you can check the current cid bit depth if set already.

The problem is that CID comes from the frame header, and therefore theoretically can change in the middle of a stream.  I know that's not currently supported, but at least my approach doesn't make it harder to add such support in the future.  I've evaluated a couple of alternative approaches and they either lead to  ugly and confusing code or require refactoring of existing code.

> >>> +static av_always_inline void dnxhd_10bit_get_pixels_8x8(DCTELEM
> >>> +*restrict block, const uint16_t *pixels, int line_size) {
> >>> +    for (int i = 0; i < 8; ++i) {
> >>> +        for (int j = 0; j < 8; ++j) {
> >>> +            block[j] = (DCTELEM)floor((float)pixels[j] * (1023.0f /
> >>> +65535.0f) + 0.5f);
> >>
> >> Btw, why are you using floats here ?
> > Hoping a compiler might vectorize this code.
> 
> We prefer using ints here, to avoid rounding difference on different archs.

I heard integer divison is quite expensive - more expensive than floating point multiplication + rounding.
Note that we are dividing by (2^16 - 1) not by 2^16, so it can't be replaced by a shift.
As for rounding, in the latest version of my patch I use lrintf().
This article http://www.mega-nerd.com/FPcast/ led me to believe it's safe to assume it always rounds towards the nearest whole number.  Are there platforms where that's not the case?

> >>> +            assert(nbits < ctx->cid_table->bit_depth + 4);
> >>
> >> assert would kill the program.
> > That would indicate a serious bug.  The assert merely checks we are not
> reading past the end of a table.
> > BTW, I've come by the following comment in mpegvideo_enc.c:
> > //non c quantize code returns incorrect block_last_index FIXME If still true,
> that could trigger this assert.
> >
> 
> It's a problem, but can this actually happen ?

It happened to me once while I was in the middle of implementing 10bit support, which made me introduce that assert.  It hasn't happened since.  I hope it was because I did something wrong back then.

> 
> >>> +// To get DCT coefficients multiplied by 4.
> >>
> >> Shouldn't we avoid returning scaled output, could be usefull for
> >> 11,12 bits, etc..
> > Well, I guess it's done to get more precise quantization.  I would be fine
> with having an unscaled version though.  After all, the range of DCT
> coefficients is already 3 bits larger than the range of input samples.
> >
> >
> >>> [...]
> >>>
> >>> diff --git a/libavcodec/mpegvideo.h b/libavcodec/mpegvideo.h index
> >>> 3e447ab..eb2352f 100644
> >>> --- a/libavcodec/mpegvideo.h
> >>> +++ b/libavcodec/mpegvideo.h
> >>> @@ -49,7 +49,19 @@ enum OutputFormat {  #define MPEG_BUF_SIZE
> (16
> >> *
> >>> 1024)
> >>>
> >>>  #define QMAT_SHIFT_MMX 16
> >>> -#define QMAT_SHIFT 22
> >>> +
> >>> +/*
> >>> + * QMAT_SHIFT should be as high as possible, but not so high to
> >>> +cause
> >>> + * overflow in the following expression:
> >>> + * scaled_dct[i] * ((p / s) * ((1 << QMAT_SHIFT) / (qscale *
> >>> +weight_table[i]))))
> >>> + * where s is a scale factor applied to DCT coefficients while
> >>> + * p and weight_table are codec-specific.
> >>> + * For example, the worst case for 10-bit DNxHD would be:
> >>> + * p == 32, s == 4, qscale == 1, weight_table[i] == 32,
> >>> +scaled_dct[i] == (1 << 13) * 4
> >>> + * which gives (1 << 15) * (((1 << QMAT_SHIFT) >> 5) << 3)
> >>> + * which gives 18 as the largest safe QMAT_SHIFT.
> >>> + */
> >>> +#define QMAT_SHIFT 18
> >>>
> >>>  #define MAX_FCODE 7
> >>>  #define MAX_MV 2048
> >>> diff --git a/libavcodec/mpegvideo_enc.c b/libavcodec/mpegvideo_enc.c
> >>> index 9255fa8..3f46033 100644
> >>> --- a/libavcodec/mpegvideo_enc.c
> >>> +++ b/libavcodec/mpegvideo_enc.c
> >>> @@ -3725,9 +3725,14 @@ int dct_quantize_c(MpegEncContext *s,
> >>>              else
> >>>                  q = s->c_dc_scale;
> >>>              q = q << 3;
> >>> -        } else
> >>> +        } else {
> >>>              /* For AIC we skip quant/dequant of INTRADC */
> >>> -            q = 1 << 3;
> >>> +            if (s->dsp.fdct == ff_faandct_10bit_safe) {
> >>> +                q = 1 << 2;
> >>> +            } else {
> >>> +                q = 1 << 3; // The post-scale applied to a DCT output.
> >>> +            }
> >>> +        }
> >>>
> >>>          /* note: block[0] is assumed to be positive */
> >>>          block[0] = (block[0] + (q >> 1)) / q;
> >>
> >> dct_quantize_mmx/sse2/etc .. is used if present, see
> QMAX_SHIFT_MMX.
> > I specifically request FF_DCT_10BIT, so I won't get any of those.
> 
> Ultimately you can only replace the dct in dct_quantize.

There are also assembly versions of the whole dct_quantize function,
but they check for dct_algo as well, so requesting FF_DCT_10BIT would disable them:
----------------------------------------
        if(dct_algo==FF_DCT_AUTO || dct_algo==FF_DCT_MMX){
#if HAVE_SSSE3
            if(mm_flags & AV_CPU_FLAG_SSSE3){
                s->dct_quantize= dct_quantize_SSSE3;
            } else
#endif
            if(mm_flags & AV_CPU_FLAG_SSE2){
                s->dct_quantize= dct_quantize_SSE2;
            } else if(mm_flags & AV_CPU_FLAG_MMX2){
                s->dct_quantize= dct_quantize_MMX2;
            } else {
                s->dct_quantize= dct_quantize_MMX;
            }
        }
----------------------------------------

> Besides, this modification makes a change that is specific to dnxhd in generic
> mpeg code, which I'm not a big fan of.

That's not quite the case.  It just makes it aware of the new 10-bit safe DCT algo.  I mean future 10-bit codecs could be using that as well.  Besides, if you look around the code in mpegvideo_enc.c, you'll see it already checks for concrete DCT implementations, for example in ff_convert_matrix():
----------------------------------------
for(qscale=qmin; qscale<=qmax; qscale++){
        int i;
        if (dsp->fdct == ff_jpeg_fdct_islow
#ifdef FAAN_POSTSCALE
            || dsp->fdct == ff_faandct
#endif
            ) {
            for(i=0;i<64;i++) {
                const int j= dsp->idct_permutation[i];
                /* 16 <= qscale * quant_matrix[i] <= 7905 */
                /* 19952             <= ff_aanscales[i] * qscale * quant_matrix[i]               <= 249205026 */
                /* (1 << 36) / 19952 >= (1 << 36) / (ff_aanscales[i] * qscale * quant_matrix[i]) >= (1 << 36) / 249205026 */
                /* 3444240           >= (1 << 36) / (ff_aanscales[i] * qscale * quant_matrix[i]) >= 275 */

                qmat[qscale][i] = (int)((UINT64_C(1) << QMAT_SHIFT) /
                                (qscale * quant_matrix[j]));
            }
        } else if (dsp->fdct == fdct_ifast
#ifndef FAAN_POSTSCALE
                   || dsp->fdct == ff_faandct
#endif
----------------------------------------

> 
> >> I think it's time to separate the dnxhd quantization fonction from
> >> the mpeg one and change QMAT_SHIFT only there.
> >> Remove MpegEncContext from DNxHDEncContext, add
> dnxhd_template_mmx.c
> >> in
> >> x86 and clean it up (intra only, out format, aic only etc..)
> > I believe we only use dct_quantize() from MpegEncContext, and there
> exists quite a few assembly versions of that.  I don't think it's a good idea to
> duplicate that code.
> 
> The code has overflow problems already due to the shift.

For 8 bit DNxHD it doesn't.  Moving away from MpegEncContext would hurt 8 bit DNxHD performance.
I personally don't care about performance much, so if concensus is to go that way, I would be happy to do it.


Joseph Artsimovich
Software Developer
MirriAd Ltd



More information about the ffmpeg-devel mailing list