[FFmpeg-devel] h264 threading fate tests
Ronald S. Bultje
rsbultje at gmail.com
Wed Oct 3 17:27:49 CEST 2012
On Wed, Oct 3, 2012 at 6:08 AM, Clément Bœsch <ubitux at gmail.com> wrote:
> On Tue, Oct 02, 2012 at 10:01:52AM -0700, Ronald S. Bultje wrote:
>> So yes I've seen the other emails, I don't like responding to this
>> subject in general because it very quickly degenerates into someplace
>> where I just state that helgrind sucks and is useless, and that
>> doesn't help you. So let's skip that.
> Unfortunately, I don't know any other tool to debug such thing (maybe the
> Intel thing?). Helgrind seems to spot the race problem... sometimes.
> AFAICT it's good enough to give some insight about what's wrong. The
> limitation here is my knowledge of the threading in FFmpeg.
> BTW, with the MSVC support, doesn't it make available a special tool or
> something? IIRC it was said to be not available for C stuff, right?
It isn't well-maintained nowadays, e.g. the docs link doesn't work (on
codeplex), but my impression was always that it worked in C as well.
Some people tell me tsan can do things chess can also, so maybe I'll
try that before some next email.
>> My understanding of a potential cause for the h264 bug, which seems
>> somewhat compatible with what you've found so far, is (this isn't
>> based on any thread debugging, just on my understanding of the
>> threading code in general) basically as follows: so I'm going to
>> assume you're familiar with the general design of frame-mt, the idea
>> is that you have a master context (the AVCodecContext interfacing with
>> the application), where avctx->priv_data is the structs in pthread.c,
>> and then worker slave threads, each of which are a single decoder
>> AVCodecContext with ctx->priv_data being the codec's private data
>> (e.g. H264Context). As a thread completes frame header parsing, it
>> calls ff_thread_setup_complete() or whatever it's called, then the
>> contents of the privdata and ctx are copied to the next thread, which
>> is now in the same state as the previous one, and (given the general
>> assumption that these variables can't change after
>> ff_thread_setup_complete()), thus we can start decoding the next frame
>> (in the next thread) - i.e. frame-level parallelism. If that doesn't
>> make sense, please ask, this is pretty fundamental before you can see
>> the theoretical problem in this design (and can design fixes for it).
> So far it makes sense to me, assuming you were talking about
> ff_thread_finish_setup() for the function setting the "RO mode".
> But just to be sure: each (non-master with id≠0 ?) PerThreadContext,
> associated with a frame worker gets a copy of the avctx (or at least a
> limited set of values specified in pthread.c:update_context_from_thread),
> and its private data avctx->priv_data (pointing on the decoder context) is
> duplicated as well.
> What I'm wondering about is: what happens if after
> ff_thread_finish_setup() is called, the current thread starts to modify
> its avctx & priv_data? If it's a copy, the changes won't affect the others
> threads, but it shouldn't create any conflicts, the writing will just have
> no effect outside the scope of that frame decoding. Is that true, and
> eventually abused?
Right, so as you say later in this email, this is bad behaviour and a
possible cause for races.
Just to be clear though, there isn't really a master thread. For slice
threading, thread 0 is indeed the master, but for frame threading,
there is no master. If there is N worker threads, there are N+1
AVCodecContexts, N codec-specific private datas (1 for each
worker-thread AVCodecContext) and 1 FrameThreadContext (the private
data for the application-facing AVCodecContext). The last started call
to AVCodec->decode_frame() is the master, i.e. the one that will be
used as a template to copy values from before the next thread starts
calling update_thread_context() and AVCodec->decode_frame().
>> So, again, the fundamental assumption is that we can copy variables
>> from privdata and avctx to the next thread and they don't change until
>> the next frame header parsing, e.g.:
>> thread 1: read width=X, store in avctx->width, call
>> ff_thread_setup_complete(), continue decoding frame 1
> I'm assuming here that a frame worker in thread #1 gets to decode a frame
> dec_ctx = avctx->priv_data;
> avctx->width = dec_ctx->width = X;
Right, it can do anything with any value that is copied from
AVCodecContext (in pthread.c) or its private data (which it copies in
the update_thread_context() callback), but only before
ff_thread_finish_setup(). Anything after that could be a race because
you never know whether the next thread copied the old or the new
>> However, it's not hard to
>> find a _ton_ of variables in H264Context and MpegEncContext in
>> general, which are modified after ff_thread_setup_complete() is
>> called, copied between thread contexts, yet not protected by any means
>> to ensure that their values are synchronized between the threads.
> This is likely answering my question... So "just" a synchronization
> problem. Fixing this for the avctx variables usages would mean to add the
> field in update_context_from_thread(), and for the codec context, this is
> handled in... the update_thread_context callback, right?
Right. Now, fixing the synchronization issue isn't necessarily as easy
as we'd like it to be, but we can look into that once we have an
actual variable that we've proven to be involved in one of those races
(i.e. we've proven that it is changed after ff_thread_finish_setup(),
that other threads therefore get different values in their
update_thread_context() callback depending on concurrency state, and
that that affects the function of the program (e.g. in the case of
ref_count: a reference buffer is therefore never free()'ed, thus we
run out of free reference buffer, or it is free()'ed while it was
still being used, thus no reference available when we needed it, or
something like that).
More information about the ffmpeg-devel