[FFmpeg-devel] [PATCH] h264 parallelized
Michael Niedermayer
michaelni
Sun Sep 2 19:29:59 CEST 2007
Hi
On Sun, Sep 02, 2007 at 05:47:24PM +0200, Andreas ?man wrote:
> Hello
>
> Michael Niedermayer wrote:
>> Hi
>> hmmmmmm
>> ive just retested it and i can confirm that its mostly faster
>> thats very strange
>
> Perhaps some of the changes has been added after the patch
> was initially submitted had some impact on this?
> I tried benchmarking it with a few strategic commits during
> the last months. Couldn't spot any difference though (but then again
> it has always been faster on my devel workstation)
>
>> anyway under these circumstances, the patch is ok
>
> Okay, I'll carve out a new patch that applies without fuzz
>
>> though please add a comment to each new structure field explaining what
>> it is
>
> Right on .. I added some other comments in the code as well..
[...]
> @@ -4478,6 +4575,18 @@
> h->slice_beta_offset = get_se_golomb(&s->gb) << 1;
> }
> }
> +
> + if(h->deblocking_filter == 1 && h0->max_contexts > 1) {
> + h0->max_contexts = 1;
> + if(h != h0)
> + execute_decode_slices(h, 0); // deblocking switched inside frame, process pending slices
> +
> + if(!h0->single_decode_warning) {
> + av_log(s->avctx, AV_LOG_INFO, "Cannot parallelize deblocking type 1, decoding such frames in sequential order\n");
> + h0->single_decode_warning = 1;
> + }
> + }
have you tested the code above?
[...]
> + if(h->current_context == 1) {
> + decode_slice(avctx, h);
> + } else {
> + for(i = 1; i < h->current_context; i++) {
> + hx = h->thread_context[i];
> + hx->s.error_resilience = 1;
why?
> + hx->s.error_count = 0;
> + }
> +
> + avctx->execute(avctx, (void *)decode_slice, (void **)h->thread_context + h->first_context,
> + NULL, h->current_context - h->first_context);
> +
> + /* pull back stuff from slices to master context */
> + hx = h->thread_context[h->current_context - 1];
> + s->mb_x = hx->s.mb_x;
> + s->mb_y = hx->s.mb_y;
> + for(i = 1; i < h->current_context; i++)
> + h->s.error_count += h->thread_context[i]->s.error_count;
> + }
> + if(reset) {
> + h->current_context = 0;
> + h->first_context = 0;
> + } else {
> + /* if currently decoding a slice header / NAL unit, we cannot just set h->current_context to 0.
why not? this increases complexity alot ...
wouldnt copying the current context to 0 work too?
[...]
> case NAL_IDR_SLICE:
> + if(h != hx && (h->long_ref_count || h->short_ref_count))
> + execute_decode_slices(h, 0); // idr() below will reap frames, execute all we've done so far
> idr(h); //FIXME ensure we don't loose some frames if there is reordering
why is this needed?
[...]
> @@ -7850,8 +8009,10 @@
> default:
> av_log(avctx, AV_LOG_ERROR, "Unknown NAL code: %d (%d bits)\n", h->nal_unit_type, bit_length);
> }
> + if(h->current_context == h->max_contexts)
> + execute_decode_slices(h, 1);
i think it would be alot clearer if current_context where reset to 0 here
instead of in execute_decode_slices(), also note its not needed below so its
not code duplication
[...]
> @@ -382,6 +382,22 @@
> const uint8_t *field_scan8x8_cavlc_q0;
>
> int x264_build;
> +
> + /**
> + * Multi threading members.
> + * These are only used in the "master" context
> + */
> +
doxy has some tags for comments of groups of things ...
> + struct H264Context *thread_context[MAX_THREADS];
> +
> + int first_context; ///< First context in thread_context[] that should be decoded, see execute_decode_slices() for more detailed explanation
not good, its obvious that you can look at where a variable is used
and "context that should be decoded" is well uhm
we decode slices not contexts
already_decoded_slices might also be a better name
> + int current_context; /** In decode_slice_header(), this is the current context.
> + current_context - first_context == number of contexts to execute in parallel for a single round */
A
/** doc for B*/
B
A
/**< doc for A*/
B
and "contexts to execute in parallel" is similarely nonsense
threads can excute in parallel contexts dont execute
> +
> + int slice_counter; ///< Current slice, only used during decode_slice_header(), and is copied to h->slice_num
if its the current slice why is it not called current_slice ?
> + int max_contexts; ///< Max # of contexts (slices) to process in parallel (avctx->thread_count), is reduced to 1 if deblocking_filter == 1
"Max # of contexts (slices) to process in parallel"
good that makes sense
"(avctx->thread_count)"
bad, ambigous reference to a similar variable with no hint what the relation
between the 2 is
"is reduced to 1 if deblocking_filter == 1"
maybe not really important but also good
> + int single_decode_warning; ///< Used to limit logging warning if falling back to single threaded decoding mode
1 if the single thread fallback warning has already been displayed, 0 otherwise
> + int last_slice_type; ///< Used in decode_slice_header() to figure out last slice type.
grep tells me where its used, the rest is just defining foobar by foobar
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Everything should be made as simple as possible, but not simpler.
-- Albert Einstein
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070902/075e9aea/attachment.pgp>
More information about the ffmpeg-devel
mailing list