[FFmpeg-devel] [PATCH] h264 parallelized

Sun Sep 2 19:29:59 CEST 2007

Hi

On Sun, Sep 02, 2007 at 05:47:24PM +0200, Andreas ?man wrote:
> Hello
>
> Michael Niedermayer wrote:
>> Hi
>> hmmmmmm
>> ive just retested it and i can confirm that its mostly faster
>> thats very strange
>
> Perhaps some of the changes has been added after the patch
> was initially submitted had some impact on this?
> I tried benchmarking it with a few strategic commits during
> the last months. Couldn't spot any difference though (but then again
> it has always been faster on my devel workstation)
>
>> anyway under these circumstances, the patch is ok
>
> Okay, I'll carve out a new patch that applies without fuzz
>
>> though please add a comment to each new structure field explaining what
>> it is
>
> Right on .. I added some other comments in the code as well..

[...]
> @@ -4478,6 +4575,18 @@
>              h->slice_beta_offset = get_se_golomb(&s->gb) << 1;
>          }
>      }
> +
> +    if(h->deblocking_filter == 1 && h0->max_contexts > 1) {
> +        h0->max_contexts = 1;
> +        if(h != h0)
> +            execute_decode_slices(h, 0); // deblocking switched inside frame, process pending slices
> +
> +        if(!h0->single_decode_warning) {
> +            av_log(s->avctx, AV_LOG_INFO, "Cannot parallelize deblocking type 1, decoding such frames in sequential order\n");
> +            h0->single_decode_warning = 1;
> +        }
> +    }

have you tested the code above?

[...]
> +    if(h->current_context == 1) {
> +        decode_slice(avctx, h);
> +    } else {
> +        for(i = 1; i < h->current_context; i++) {
> +            hx = h->thread_context[i];

> +            hx->s.error_resilience = 1;

why?

> +            hx->s.error_count = 0;
> +        }
> +
> +        avctx->execute(avctx, (void *)decode_slice, (void **)h->thread_context + h->first_context,
> +                       NULL, h->current_context - h->first_context);
> +
> +        /* pull back stuff from slices to master context */
> +        hx = h->thread_context[h->current_context - 1];
> +        s->mb_x = hx->s.mb_x;
> +        s->mb_y = hx->s.mb_y;
> +        for(i = 1; i < h->current_context; i++)
> +            h->s.error_count += h->thread_context[i]->s.error_count;
> +    }
> +    if(reset) {
> +        h->current_context = 0;
> +        h->first_context = 0;
> +    } else {
> +        /* if currently decoding a slice header / NAL unit, we cannot just set h->current_context to 0.

why not? this increases complexity alot ...
wouldnt copying the current context to 0 work too?

[...]
>          case NAL_IDR_SLICE:
> +            if(h != hx && (h->long_ref_count || h->short_ref_count))
> +                execute_decode_slices(h, 0); // idr() below will reap frames, execute all we've done so far
>              idr(h); //FIXME ensure we don't loose some frames if there is reordering

why is this needed?

[...]
> @@ -7850,8 +8009,10 @@
>          default:
>              av_log(avctx, AV_LOG_ERROR, "Unknown NAL code: %d (%d bits)\n", h->nal_unit_type, bit_length);
>          }
> +        if(h->current_context == h->max_contexts)
> +            execute_decode_slices(h, 1);

i think it would be alot clearer if current_context where reset to 0 here
instead of in execute_decode_slices(), also note its not needed below so its
not code duplication

[...]
> @@ -382,6 +382,22 @@
>      const uint8_t *field_scan8x8_cavlc_q0;
>  
>      int x264_build;
> +
> +    /**
> +     * Multi threading members.
> +     * These are only used in the "master" context
> +     */
> +

doxy has some tags for comments of groups of things ...

> +    struct H264Context *thread_context[MAX_THREADS];
> +
> +    int first_context;          ///< First context in thread_context[] that should be decoded, see execute_decode_slices() for more detailed explanation

not good, its obvious that you can look at where a variable is used
and "context that should be decoded" is well uhm
we decode slices not contexts

already_decoded_slices might also be a better name

> +    int current_context;        /** In decode_slice_header(), this is the current context.
> +                                    current_context - first_context == number of contexts to execute in parallel for a single round */

A
/** doc for B*/
B

A
/**< doc for A*/
B

and "contexts to execute in parallel" is similarely nonsense
threads can excute in parallel contexts dont execute

> +
> +    int slice_counter;          ///< Current slice, only used during decode_slice_header(), and is copied to h->slice_num

if its the current slice why is it not called current_slice ?

> +    int max_contexts;           ///< Max # of contexts (slices) to process in parallel (avctx->thread_count), is reduced to 1 if deblocking_filter == 1

"Max # of contexts (slices) to process in parallel"
good that makes sense

"(avctx->thread_count)"
bad, ambigous reference to a similar variable with no hint what the relation
between the 2 is

"is reduced to 1 if deblocking_filter == 1"
maybe not really important but also good

> +    int single_decode_warning;  ///< Used to limit logging warning if falling back to single threaded decoding mode

1 if the single thread fallback warning has already been displayed, 0 otherwise

> +    int last_slice_type;        ///< Used in decode_slice_header() to figure out last slice type.

grep tells me where its used, the rest is just defining foobar by foobar

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Everything should be made as simple as possible, but not simpler.
-- Albert Einstein
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070902/075e9aea/attachment.pgp>