[FFmpeg-devel] [PATCH/RFC] H.264 FMO+ASO decoding

Tue Sep 28 22:52:13 CEST 2010

On Tue, Sep 28, 2010 at 10:04:14PM +0200, Stefan Gehrer wrote:
> On 09/26/2010 04:43 PM, Michael Niedermayer wrote:
>> On Wed, Jul 21, 2010 at 10:20:36PM +0200, Stefan Gehrer wrote:
>>> Hi,
>>>
>>> attached is a patch that implements the header stuff for FMO decoding
>>> in H.264 baseline streams. It decodes the slice group map and also
>>> provides means that at the start of a slice the actual x/y position
>>> of the first macroblock can be determined from first_mb_in_slice.
>>> If this is done, slices can be decoded out of order, i.e.
>>> first_mb_in_slice does not have to increase for slices of the same
>>> picture (aka ASO).
>>> Let's take as example a tiny video of 4x3 macroblocks which
>>> has a slice group map with two slice groups like this:
>>>
>>> 1  1  1  1
>>> 1  0  0  1
>>> 1  1  1  1
>>>
>>> Each slice group itself (0 or 1) is decoded in raster-scan order,
>>> so that the macroblock addresses are as follows:
>>>
>>> 2  3  4  5
>>> 6  0  1  7
>>> 8  9 10 11
>>>
>>> So if a slice comes along with first_mb_in_slice equal to 7 we need
>>> to start decoding at MB position x=3 and y=1.
>>>
>>> Unfortunately, the real challenge starts here. A lot of neighbor
>>> context handling (i4x4 modes, non-zero counts, MVs) has to be
>>> handled differently when the assumption of raster-scan order of
>>> macroblocks is not true anymore. Also, deblocking can only be done
>>> after the picture has been fully decoded. This is because deblocking
>>> goes across slice boundaries and slice group boundaries and the
>>> neighbor MB might just happen to be the last to be decoded in the
>>> picture.
>>> Considering the heavy optimizations that have been done in the
>>> normal decoding paths I guess there would be some outcry if in
>>> many places in the MB decoding a conditional like
>>> if(pps->slice_group_count > 1)
>>> would appear.
>>> So my feeling is that if FMO is to be implemented it may be best
>>> to have a new code path for the slice data decoding, a
>>> baseline-FMO-specific version of decode_slice() and some of its
>>> subfunctions maybe?
>>> Opinions welcome.
>>
>> my guess is you won't finish FMO
>
> I guess you are right.
>
>> also we need templating for it ...
>
> are there any more thoughts on this, e.g. how many codepaths to compile?
> MBAFF and non-MBAFF, CAVLC and CABAC?

I think we should keep it flexible: that is, a slow fallback path plus
optimized paths for all important cases, as far as the user wants.
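
One way to get such a split without duplicating code is to write a single
generic worker and wrap it with constant arguments, so that the compiler can
drop the dead branches in the specialized versions. The following is only a
rough sketch of that idea: the function names, the fmo/cabac flags and the
placeholder "work" are hypothetical and not taken from h264.c.

```c
/* Generic worker. When it is inlined with constant 'fmo' and 'cabac'
 * arguments, the compiler can remove the branches that are never taken.
 * The arithmetic is a placeholder standing in for real decoding work. */
static inline int decode_mb_generic(int mb, int fmo, int cabac)
{
    int result = mb;
    if (fmo)
        result += 100;   /* placeholder for FMO neighbor handling */
    if (cabac)
        result += 10;    /* placeholder for CABAC-specific work */
    return result;
}

/* Specialized fast paths for the common (non-FMO) cases. */
static int decode_mb_fast_cavlc(int mb) { return decode_mb_generic(mb, 0, 0); }
static int decode_mb_fast_cabac(int mb) { return decode_mb_generic(mb, 0, 1); }

/* Slow fallback that handles every flag combination at run time. */
static int decode_mb_slow(int mb, int fmo, int cabac)
{
    return decode_mb_generic(mb, fmo, cabac);
}

/* Dispatcher: pick a fast path when one exists, else fall back. */
static int decode_mb(int mb, int fmo, int cabac)
{
    if (!fmo)
        return cabac ? decode_mb_fast_cabac(mb) : decode_mb_fast_cavlc(mb);
    return decode_mb_slow(mb, fmo, cabac);
}
```

Which specializations get compiled in could then be made a build-time choice,
with everything else going through the fallback.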

>
>> still parts of your patch could be useful and move us a tiny step closer to
>> FMO support
>>
>>
>>>
>>> Stefan
>>
>>>   h264.c    |   32 +++++++---
>>>   h264.h    |    6 +
>>>   h264_ps.c |  187 ++++++++++++++++++++++++++++++++++++++++++++++++++------------
>>>   3 files changed, 182 insertions(+), 43 deletions(-)
>>> 6228649db31722b1b6bbf9ff33a04c52d2baaf14  h264_fmo.diff
>>> diff --git a/libavcodec/h264.c b/libavcodec/h264.c
>>> index d1662fc..bfb65d6 100644
>>> --- a/libavcodec/h264.c
>>> +++ b/libavcodec/h264.c
>>> @@ -1968,8 +1968,6 @@ static int decode_slice_header(H264Context *h, H264Context *h0){
>>>           av_log(h->s.avctx, AV_LOG_ERROR, "first_mb_in_slice overflow\n");
>>>           return -1;
>>>       }
>>> -    s->resync_mb_x = s->mb_x = first_mb_in_slice % s->mb_width;
>>> -    s->resync_mb_y = s->mb_y = (first_mb_in_slice / s->mb_width) << FIELD_OR_MBAFF_PICTURE;
>>>       if (s->picture_structure == PICT_BOTTOM_FIELD)
>>>           s->resync_mb_y = s->mb_y = s->mb_y + 1;
>>>       assert(s->mb_y < s->mb_height);
>>
>>> @@ -2153,11 +2151,18 @@ static int decode_slice_header(H264Context *h, H264Context *h0){
>>>       }
>>>       h->qp_thresh= 15 + 52 - FFMIN(h->slice_alpha_c0_offset, h->slice_beta_offset) - FFMAX3(0, h->pps.chroma_qp_index_offset[0], h->pps.chroma_qp_index_offset[1]);
>>>
>>> -#if 0 //FMO
>>> -    if( h->pps.num_slice_groups > 1 && h->pps.mb_slice_group_map_type >= 3 && h->pps.mb_slice_group_map_type <= 5)
>>> -        slice_group_change_cycle= get_bits(&s->gb, ?);
>>> -#endif
>>> +    if(h->pps.slice_group_count > 1){
>>> +        int addr = -1;
>>>
>>> +        if(h->pps.mb_slice_group_map_type >= 3 && h->pps.mb_slice_group_map_type <= 5)
>>> +            ff_h264_draw_slice_group(h, &h->pps, s->mb_width, s->mb_height);
>>> +        h->slice_group_current = 0;
>>> +        for(j=0;j<=first_mb_in_slice;j++)
>>> +            addr = ff_h264_fmo_next_mb(h, addr);
>>
>> this is too slow with many slices and groups
>
> A way to make things faster is to convert the "slice group map" into a
> lookup table to have a MbAddr -> x,y lookup.
> But for some types of slice group maps the parameters are transmitted on
> every slice (and changeable for every picture) and recalculating this
> lookup table every time may be more costly than stepping through the
> slice group map for every macroblock.
> For some slice group types, it is possible to calculate this lookup
> table directly without the intermediate "slice group map" as described
> in the spec. But when one needs to cover all possible cases I think the
> increase in code size would not justify the speed gained.
> Anyway, thanks for taking the time to look at it.

I don't know what the best data structure is, but I know that the one
implemented is not it ;)
I'd have to look at the spec and the code more to see exactly what is done
before I could say what is best, but as no one is seriously willing to finish
this, I have not much interest in looking at what would be best ...
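
For reference, the MbAddr -> x,y lookup table mentioned above could look
roughly like the sketch below. It assumes the numbering from the 4x3 example
earlier in the thread (all macroblocks of slice group 0 in raster order, then
group 1, and so on) and has not been checked against the spec or the patch.
MbPos and build_mb_lookup() are made-up names for illustration.

```c
/* Position of one macroblock, in macroblock units. */
typedef struct MbPos {
    int x, y;
} MbPos;

/* Fill lut[addr] with the x/y position of each macroblock address.
 * 'map' holds the slice group id of every macroblock in raster order
 * (w * h entries). Addresses are assigned group by group, and in
 * raster order within each group, as in the 4x3 example. */
static void build_mb_lookup(const unsigned char *map, int w, int h,
                            int groups, MbPos *lut)
{
    int addr = 0;

    for (int g = 0; g < groups; g++) {
        for (int i = 0; i < w * h; i++) {
            if (map[i] == g) {
                lut[addr].x = i % w;
                lut[addr].y = i / w;
                addr++;
            }
        }
    }
}
```

With the map from the example, first_mb_in_slice equal to 7 then resolves to
x=3, y=1 with a single table read instead of a loop over ff_h264_fmo_next_mb().
Building the table costs one pass over the picture per slice group, so the
rebuild cost for maps that change on every picture still applies.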

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Opposition brings concord. Out of discord comes the fairest harmony.
-- Heraclitus