[FFmpeg-devel] [RFC v5] libavcodec: add a native Daala decoder

Ronald S. Bultje rsbultje at gmail.com
Thu Jan 7 02:22:41 CET 2016


Hello,

On Sat, Jan 2, 2016 at 12:56 PM, Rostislav Pehlivanov <atomnuker at gmail.com>
wrote:
> @@ -864,6 +865,7 @@ OBJS-$(CONFIG_BMP_PARSER)              += bmp_parser.o
>  OBJS-$(CONFIG_CAVSVIDEO_PARSER)        += cavs_parser.o
>  OBJS-$(CONFIG_COOK_PARSER)             += cook_parser.o
>  OBJS-$(CONFIG_DCA_PARSER)              += dca_parser.o dca.o
> +OBJS-$(CONFIG_DAALA_PARSER)            += daala_parser.o
>  OBJS-$(CONFIG_DIRAC_PARSER)            += dirac_parser.o
>  OBJS-$(CONFIG_DNXHD_PARSER)            += dnxhd_parser.o
>  OBJS-$(CONFIG_DPX_PARSER)              += dpx_parser.o

Alphabetical ordering - the DAALA line should go before DCA.

> +#ifndef AVCODEC_DAALA_H
> +#define AVCODEC_DAALA_H
> +
> +#include "avcodec.h"

Why? I see nothing in this header that requires avcodec.h. All you should
need is stdint.h and intmath.h.
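
Something like this should be all the top of the header needs (untested sketch):

#ifndef AVCODEC_DAALA_H
#define AVCODEC_DAALA_H

#include <stdint.h>

#include "libavutil/intmath.h"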

> +/* Essential typedefs */
> +typedef uint32_t ent_win; /* Has to be able to express 32bit uint nums */
> +typedef uint16_t ent_rng;
> +typedef  int32_t dctcoef;

Is this for compat with daala functions? I find this very confusing TBH.
Are coefficients for 8bit content really 32bit? (In most codecs, they’re
only 16 bit.)
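
If 16 bits really is enough range for 8bit content here, I’d expect simply:

typedef int16_t dctcoef;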

> +#define DAALA_MAX_REF_FRAMES 2  /* Maximum number of reference frames */

Dev note - OMG please increase this.

> +#define DAALA_QM_SCALE (1 << 15)
> +#define DAALA_QM_SCALE_MAX (DAALA_QM_SCALE - 1)
> +#define DAALA_QM_SCALE_UNIT (1.0f/DAALA_QM_SCALE_MAX)
> +#define DAALA_QM_INV_SCALE (1 << 12)
> +#define DAALA_QM_INV_SCALE_UNIT (1.0f/DAALA_QM_INV_SCALE)
> +#define DAALA_QM_BSIZE (DAALA_BSIZE_MAX*DAALA_BSIZE_MAX)
> +#define DAALA_QM_BUFFER_SIZE (DAALA_NBSIZES*2*DAALA_QM_BSIZE)

Floats in a video decoder? That looks strange. I assume these are placeholders
that will eventually become integer arithmetic?
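
If the plan is to go fixed point, I’d expect the float units to disappear and
the scaling to look roughly like this (sketch, the macro name is made up):

/* multiply by a Q15 quantization-matrix entry, with rounding, instead of
 * going through a 1.0f/DAALA_QM_SCALE_MAX float unit */
#define DAALA_QM_MUL(x, qm) (((x) * (qm) + (DAALA_QM_SCALE >> 1)) >> 15)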

> +/* Expectation value log, outputs Q1 */
> +static av_always_inline int daalaent_log_ex(int ex_q16)
> +{
> +    int o, log = daalaent_log2(ex_q16);
> +    if (log < 15) {
> +        o = ex_q16*ex_q16 > 2 << 2*log;

Derf-style?

> +/* Expectation value is in Q16 */
> +static inline int daalaent_decode_generic(DaalaEntropy *e, DaalaCDF *c, int *ex,
> +                                          int max, int integrate)
> +{
> +    int rval, lsb = 0, log_ex = daalaent_log_ex(*ex);
> +    const int shift = FFMAX(0, (log_ex - 5) >> 1);
> +    const int id = FFMIN(DAALAENT_MODEL_TAB - 1, log_ex);
> +    const int ms = (max + (1 << shift >> 1)) >> shift;
> +    int xs = (max == -1) ? 16 : FFMIN(ms + 1, 16);
> +    ent_rng *cdf = &c->cdf[id*c->y];
> +    if (!max)
> +        return 0;
> +    if ((xs = daalaent_decode_cdf(e, cdf, xs, 0, CDF_UNSCALED)) == 15) {
> +        int g = ((2*(*ex) >> 8) + (1 << shift >> 1)) >> shift;
> +        ent_win decay = FFMAX(2, FFMIN(254, 256*g/(g + 256)));

av_clip? And space between operators.
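
i.e. something like this, with the spaces added:

    ent_win decay = av_clip(256 * g / (g + 256), 2, 254);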

> +#ifndef AVCODEC_DAALAPVQ_H
> +#define AVCODEC_DAALAPVQ_H
[..]
> +#define DAALAPVQ_MAX_PART_SIZE   (DAALA_QM_BSIZE/2)
> +#define DAALAPVQ_COMPAND_SCALE   (256 << DAALA_CSHIFT)
> +#define DAALAPVQ_COMPAND_SCALE_1 (1.0f/DAALAPVQ_COMPAND_SCALE)

More floating point, very strange. Will this become integer math at some
point?

> +/* Index for packed quantization matrices */
> +static av_always_inline int daalapvq_get_qm_idx(enum DaalaBsize bsize, int band)
> +{
> +    return bsize*bsize + bsize + band - band/3;
> +}

I know we expect all compilers to know how to turn a constant division like
this into a multiply, but anyway - can we use fastdiv for this?
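
For reference, the fastdiv route would look roughly like this (sketch; probably
not worth it here since the divisor is a constant 3, which the compiler already
turns into a multiply):

    /* FASTDIV() is in libavutil/internal.h */
    return bsize*bsize + bsize + band - FASTDIV(band, 3);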

> +static inline double daalapvq_gain_raise(double cg, int q0, double beta)
> +{
> +    if (beta == 1.0f)
> +        return cg*q0;
> +    else if (beta == 1.5f) {
> +        cg *= q0*DAALAPVQ_COMPAND_SCALE_1;
> +        return DAALAPVQ_COMPAND_SCALE*cg*sqrt(cg);
> +    }
> +    return DAALAPVQ_COMPAND_SCALE*pow(cg*q0*DAALAPVQ_COMPAND_SCALE_1, beta);
> +}

Pow, sqrt, really?

> +/* Decodes quantized coefficients from  the bitsteam */
> +static void daalapvq_decode_laplace_vector(DaalaEntropy *e, dctcoef *y,
> +                                           int n, int k, dctcoef *curr,
> +                                           const dctcoef *means)
> +{
[..]
> +    memset(&y[i], 0, (n - i)*sizeof(dctcoef)); /* Zero the rest */

This is a little strange. So, we typically take advantage of the fact that
most coefficients are zero, and let the inverse transform (or whatever daala’s
equivalent is) zero only the part of the coefficient buffer that was actually
used. Then this function can assume the buffer is already zero by default, and
the memset becomes unnecessary. That typically gives a slight speedup as well,
because the zeroing then lives inside the SIMD transform and requires no extra
function call.
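
Roughly the pattern I mean, with made-up function names (sketch only):

    decode_coeffs(e, y, nb);               /* writes only y[0..nb), buffer is already zero */
    s->dsp.itx[bsize](dst, stride, y, nb); /* consumes y[0..nb) and zeroes it again */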

> +static void daalapvq_decode_codeword(DaalaEntropy *e, DaalaPVQ *pvq,
> +                                     dctcoef *y, int n, int k, uint8_t has_ref,
> +                                     enum DaalaBsize bsize)

Making function arguments uint8_t usually has odd side-effects and typically
causes slowdowns. I’d recommend just using int or unsigned for scalar values.
Same for the functions below this one.
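
i.e. just:

static void daalapvq_decode_codeword(DaalaEntropy *e, DaalaPVQ *pvq,
                                     dctcoef *y, int n, int k, int has_ref,
                                     enum DaalaBsize bsize)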

> +#define DAALA_BSIZE8x8(arr, bstride, bx, by) ((arr)[(by)*(bstride) + (bx)])
> +#define DAALA_BSIZE4x4(arr, bstride, bx, by) DAALA_BSIZE8x8(arr, bstride, (bx) >> 1, (by) >> 1)

? So, this may just be me, but I feel this is getting into #define inc(a)
((a)++) territory.
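
If it has to stay, a pair of inline functions would at least give you type
checking (sketch, assuming the block-size array is uint8_t):

static av_always_inline int daala_bsize8x8(const uint8_t *arr, int bstride,
                                           int bx, int by)
{
    return arr[by*bstride + bx];
}

static av_always_inline int daala_bsize4x4(const uint8_t *arr, int bstride,
                                           int bx, int by)
{
    return daala_bsize8x8(arr, bstride, bx >> 1, by >> 1);
}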

> +static av_always_inline int daala_get_qm_idx(enum DaalaBsize bsize, int b)
> +{
> +    return bsize*(bsize + 1) + b - b/3;
> +}

This duplicates daalapvq_get_qm_idx? (Or I guess the other way around, but
that one came first in this patch.)

> diff --git a/libavcodec/daaladec.c b/libavcodec/daaladec.c
[..]
> +/* Sets the AVFrame type */
> +static av_always_inline void set_frame_type(DaalaBitstreamHeader *b, AVFrame *frame)
> +{
> +    frame->key_frame = b->key_frame;
> +    frame->pict_type = frame->key_frame ? AV_PICTURE_TYPE_I : !b->bipred ?
> +                                          AV_PICTURE_TYPE_P : AV_PICTURE_TYPE_B;
> +}

This is (strictly speaking) also used in the parser, right? Maybe just move it
to a common header then? I mean, you went through all the effort of making it
a function, so it’s kind of strange to then see the code duplicated anyway.
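
One option is a tiny shared helper that both the decoder and the parser can
call (sketch, assuming the parser has the same header struct available):

static av_always_inline enum AVPictureType daala_pict_type(const DaalaBitstreamHeader *b)
{
    return b->key_frame ? AV_PICTURE_TYPE_I :
           b->bipred    ? AV_PICTURE_TYPE_B : AV_PICTURE_TYPE_P;
}

The decoder then does frame->pict_type = daala_pict_type(b); and the parser
can use the same call for its pict_type field.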

> +static void daala_calc_prediction(DaalaContext *s, dctcoef *pred, const dctcoef *d,
> +                                  int x, int y, uint8_t p, enum DaalaBsize bsize)
> +{
> +    int n = 1 << (bsize + DAALA_LOG_BSIZE0);
> +    int aw = s->width >> s->fmt->dec[p][0];
> +    int off = ((y << DAALA_LOG_BSIZE0))*aw + (x << DAALA_LOG_BSIZE0);
> +    if (s->h.key_frame) {
> +        if (!p || s->h.haar) {
> +            memset(pred, 0, n*n*sizeof(dctcoef));
> +            if (!p && !s->h.haar && s->dsp.intrapred) {
> +                s->dsp.intrapred((uint8_t *)pred, (uint8_t *)&d[off], aw, x, y,
> +                                 s->bsizes, s->bsizes_stride, bsize);
> +            }
> +        } else {
> +            for (y = 0; y < n; y++) {
> +                for (x = 0; x < n; x++) {
> +                    pred[n*y + x] = s->lcoef[n*y + x];
> +                }
> +            }
> +        }
> +    } else {
> +        /* Copy from mv coeffs */
> +    }
> +}

So, I get confused here. A few things:

A) by having one function for predicting intra as well as inter, it seems
to me you’re not taking advantage of the fact that inter prediction (I
know, not yet implemented) covers multiple transform blocks. For example,
you could have a predicted area of size 16x32 and have 2x4=8 8x8 transform
blocks inside it. By predicting the full inter area (which uses the same MV
all over) at once, your SIMD becomes more efficient.

B) why is pred a dctcoef (32bit)? And if there’s a remote case where that
makes sense, why is it cast to 8bit in the call to intrapred?

C) p should be unsigned or int, even if it’s a boolean.

D) lcoef->pred looks like a copy, which is strange, especially since it’s a
prediction block that is probably later idct’ed into the target frame pointer.
Why copy at all? And if it’s necessary, why not use a SIMD function?
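
For D, if the copy really is needed: the block is contiguous in both buffers
(the loop copies the full n*n range), so assuming lcoef and pred are the same
type it is a single memcpy:

            memcpy(pred, s->lcoef, n*n*sizeof(*pred));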

> +static void decode_tree_sum(DaalaContext *s, dctcoef *pred, int x, int y,
> +                            dctcoef sum_t, const int shift, const int dir)
> +{
> +    dctcoef c_sum, t_sum, sub[2][2], n = 1 << shift;

Why is something like sub[][] a dctcoef? It has nothing to do with dct
coefficients, it’s a decomposition-tree-related thing, no?

It seems like a lot of variables are of type dctcoef when they probably
shouldn’t be.
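
e.g. here plain int would do just fine:

    int c_sum, t_sum, sub[2][2], n = 1 << shift;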

> diff --git a/libavcodec/daaladsp.c b/libavcodec/daaladsp.c
[..]
> +#if defined(TEMPLATE_8bit)
> +
> +#    define RENAME(N)   N ## _8bit
> +#    define READ(P)     AV_RN32(P)
> +#    define WRITE(P, V) AV_WN32(P, V)
> +#    define pixel       int32_t
> +#    undef  TEMPLATE_8bit
> +
> +#endif

Why is a pixel 32bit for 8bit content? That seems like… a lot. And maybe half
the uses of dctcoef in the decoder itself should really be of this type?
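
For 8bit content I’d expect the template to look more like this (sketch,
assuming READ/WRITE operate on a single pixel as in the posted version):

#if defined(TEMPLATE_8bit)

#    define RENAME(N)   N ## _8bit
#    define READ(P)     (*(const uint8_t *)(P))
#    define WRITE(P, V) (*(uint8_t *)(P) = (V))
#    define pixel       uint8_t
#    undef  TEMPLATE_8bit

#endif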

> +av_cold int ff_daaladsp_init(DaalaDSP *d, int bit_depth)
> +{
> +    switch(bit_depth) {
> +    case 8:
> +        daaladsp_init_8bit(d);
> +        break;
> +    case 10:
> +    case 12:
> +    default:
> +        return 1;
> +        break;
> +    }
> +    return 0;
> +}

It’s kind of odd to make a function return 1 on error. Just say the codec
doesn’t support anything other than 8bit at all and error out during header
parsing. Assert here that bit_depth is 8.
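
i.e. roughly (and then drop the error check at the call site):

#include "libavutil/avassert.h"

av_cold void ff_daaladsp_init(DaalaDSP *d, int bit_depth)
{
    av_assert0(bit_depth == 8);
    daaladsp_init_8bit(d);
}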

Ronald

