[FFmpeg-devel] [PATCH] H264 DXVA2 implementation

Fri Jan 8 07:54:53 CET 2010

On Thu, Jan 07, 2010 at 11:19:26PM +0100, Laurent Aimar wrote:
> > This is the opposite of condition above, so use an else above (yes, I know why
> > you did it like this, but I am sure this can be done nicer.
> Then it becomes:

Tried
int result = 0;
if (too small)
   result = -1;
free buffer
return result;

> > > +    pp->residual_colour_transform_flag=
> > > +        (h->sps.profile_idc >= 100 &&
> > > +         h->sps.chroma_format_idc == 3) ? h->sps.residual_color_transform_flag : 0;
> > 
> > Any reason why the sps reading code shouldn't just apply this condition?
>  Well, the DXVA2 specs says that all non bitstream defined values must be set
> to 0 (with a few exceptions). It is unrelated to what does h264.c (it could put
> non 0 value if it simplify its job). (Referenced as [1].)

Well, it "could", however the question is if it is a good design to allow it to
do so.

> > > +    if (h->sps.profile_idc >= 100) {
> > > +        pp->bit_depth_luma_minus8     = h->sps.bit_depth_luma - 8;
> > > +        pp->bit_depth_chroma_minus8   = h->sps.bit_depth_chroma - 8;
> > > +    } else {
> > > +        pp->bit_depth_luma_minus8     = 0;
> > > +        pp->bit_depth_chroma_minus8   = 0;
> > > +    }
> > Again, why not in sps reading code (the condition, not the - 8)?
>  For that one (where a sensible value stored by h264.c would also match the
> requirement of DXVA2), I would say that it doesn't seem the spirit of h264.c
> to initialized non bitstream defined value with a valid one.

I can't tell, but we definitely should ask the maintainer (Michael I think?),
because this is quite ugly and if we can we should avoid it.

> > > +    for (i = 0; i < 6; i++)
> > > +        for (j = 0; j < 16; j++)
> > > +            qm->bScalingLists4x4[i][j] = h->pps.scaling_matrix4[i][zigzag_scan[j]];
> > > +
> > > +    for (i = 0; i < 2; i++)
> > > +        for (j = 0; j < 64; j++)
> > > +            qm->bScalingLists8x8[i][j] = h->pps.scaling_matrix8[i][ff_zigzag_direct[j]];
> > 
> > Swap the loop, and in the second case unroll it.
> > I doubt speed matters, but then you can do the zigzag lookup in the outer loop
> > which is more readable IMO.
> 
> for (j = 0; j < 16; j++) {
>     const int zj = zigzag_scan[j];
>     for (i = 0; i < 6; i++)
>         qm->bScalingLists4x4[i][j] = h->pps.scaling_matrix4[i][zj];
> }
> for (j = 0; j < 64; j++) {
>     const int zj = ff_zigzag_direct[j];
>     qm->bScalingLists8x8[i][j] = h->pps.scaling_matrix8[0][zj];
>     qm->bScalingLists8x8[i][j] = h->pps.scaling_matrix8[1][zj];
> }
> 
> Like that ?
> Not sure it helps/improves (and no, speed does not matter at all here).

Well, you forgot to replace two i in the second case, and yes I definitely
think it is an improvement there.
Not so sure about the first one, but IMO it isn't worse either.

> > > +    for (list = 0; list < 2; list++) {
> > > +        unsigned i;
> > > +        for (i = 0; i < FF_ARRAY_ELEMS(slice->RefPicList[list]); i++) {
> > > +            if (list < h->list_count && i < h->ref_count[list]) {
> > > +                const Picture *r = &h->ref_list[list][i];
> > > +                unsigned plane;
> > > +                slice->RefPicList[list][i].Index7Bits     = get_surface_index(ctx, r);
> > > +                slice->RefPicList[list][i].AssociatedFlag = r->reference == PICT_BOTTOM_FIELD;
> > > +                if (h->luma_weight_flag[list]) {
> > > +                    slice->Weights[list][i][0][0] = h->luma_weight[list][i];
> > > +                    slice->Weights[list][i][0][1] = h->luma_offset[list][i];
> > > +                } else {
> > > +                    slice->Weights[list][i][0][0] = 1 << h->luma_log2_weight_denom;
> > > +                    slice->Weights[list][i][0][1] = 0;
> > > +                }
> > > +                for (plane = 1; plane < 3; plane++) {
> > > +                    int w, o;
> > > +                    if (h->chroma_weight_flag[list]) {
> > > +                        w = h->chroma_weight[list][i][plane-1];
> > > +                        o = h->chroma_offset[list][i][plane-1];
> > > +                    } else {
> > > +                        w = 1 << h->chroma_log2_weight_denom;
> > > +                        o = 0;
> > > +                    }
> > > +                    slice->Weights[list][i][plane][0] = w;
> > > +                    slice->Weights[list][i][plane][1] = o;
> > > +                }
> > > +            } else {
> > > +                unsigned plane;
> > > +                slice->RefPicList[list][i].bPicEntry = 0xff;
> > > +                for (plane = 0; plane < 3; plane++) {
> > > +                    slice->Weights[list][i][plane][0] = 0;
> > > +                    slice->Weights[list][i][plane][1] = 0;
> > > +                }
> > > +            }
> > > +        }
> > > +    }
> > 
> > Maybe it would be better to split this and the other one like it into
> > two parts, one initialization looping from 0 to FF_ARRAY_ELEMS..
> > and one setup lopping from 0 till h->ref_count
>  I don't like computing and storing values that will be overwritten
> just after. It offuscates what is really needed. But if you want it,
> I could do it.

I won't maintain it so I don't care much, but I consider this at least
1-2 indentation levels too deep, I was only suggesting one way to reduce it.

> > > +    if (h->pps.redundant_pic_cnt_present)
> > > +        slice->redundant_pic_cnt = h->redundant_pic_count;
> > 
> > Why/how can h->redundant_pic_count have the wrong value if h->pps.redundant_pic_cnt_present is 0?
>  Because then, h264.c does not set it.

I am quite confident that the contexts should not have uninitialized values,
if it has I think h264.c should be changed.

> > > +#define OFFSET_OF(t,f) ((intptr_t)&((t*)NULL)->f)
> > > +            assert(OFFSET_OF(DXVA_Slice_H264_Short, BSNALunitDataLocation) == 
> > > +                   OFFSET_OF(DXVA_Slice_H264_Long,  BSNALunitDataLocation));
> > > +            assert(OFFSET_OF(DXVA_Slice_H264_Short, SliceBytesInBuffer) == 
> > > +                   OFFSET_OF(DXVA_Slice_H264_Long,  SliceBytesInBuffer));
> > > +#undef OFFSET_OF
> > 
> > There is offsetof which probably works at least as well as this hack which
> > invokes undefined behaviour.
> Why does it invoke undefined behaviour ?

Because pointer arithmetic is only defined as long as the result is within
an allocated memory range or at most one beyond it.
Since NULL in general has no allocated memory associated with it, pointer
arithmetic on it should always be undefined, and the same should apply
to this kind of address calculation.

> (Btw, the same trick is used by the linux kernel header).

Well, the kernel will always only work on specific hardware so it is
far more excusable..

> > > +            if (&current[start_code_size + size] > &dxva_data[dxva_size]) {
> > 
> > Pointlessly using & and [] instead of just adding pointers?
> > Also, sure this does not invoke undefined behaviour (i.e. is
> > current + start_code_size + size at most one after the last element of the allocated
> > memory?)
>  I can do
>     if (current - dxva_data + start_code_size + size > dxva_size)
> to be sure that no undefined behaviour happen.

Depends on the surrounding code, but might be more readable and still correct as
end = dxva_data + dxva_size;
...
if (start_code_size + size > end - current)

Assuming it is certain that start_code_size + size can not overflow.

> > > +    memset(&exe, 0, sizeof(exe));
> > 
> > exe IMO is bad name.
> Why ? The structure type is DXVA2_DecodeExecuteParams. I could call it
> cfg, but it don't think it would be better.

exec_params?
Or even exec is better, exe is too much like the file extension IMO.