[FFmpeg-devel] h264 threading fate tests

Ronald S. Bultje rsbultje at gmail.com
Tue Oct 2 19:01:52 CEST 2012


Hi,

On Tue, Oct 2, 2012 at 2:34 AM, Clément Bœsch <ubitux at gmail.com> wrote:
> On Mon, Oct 01, 2012 at 11:50:19AM +0200, Clément Bœsch wrote:
>> On Fri, Sep 28, 2012 at 11:07:54AM +0200, Clément Bœsch wrote:
>> > On Mon, Apr 16, 2012 at 04:43:42PM +0200, Clément Bœsch wrote:
>> > > Hi,
>> > >
>> > > I recently setup a few fate instances to test the threading (2, 8, 16 and
>> > > auto), and regularly one of the h264 conformance test fails; look at the
>> > > yellow entries here for instance:
>> > > http://fate.ffmpeg.org/history.cgi?slot=x86_64-archlinux-gcc-threads-8
>> > >
>> > > The other day, I tried to run an automated git bisect run, but
>> > > unfortunately testing the potential regression would requires to run
>> > > something like "while true; do make fate-h264 -j20 THREADS=8; done" for
>> > > around 15 minutes at least each time, and it might not even be reliable.
>> > >
>> > > I'm not familiar at all with AVC decoding or threading in FFmpeg, but
>> > > maybe someone has an idea of what could cause this?
>> > >
>> > > Maybe I should just open an issue in the trac?
>> >
>> > OK so here is some more information about that race. Since the failure
>> > seems to be always the same (failing frame CRCs are identical), I dumped
>> > both outputs and made a diff. Here it is:
>> >
>> >     http://lucy.pkh.me/race.html
>> >
>> > Basically, we can notice some ±1 byte of difference at times. The
>> > generated outputs can be downloaded from that page.
>> >
>> > Also, I was able the other day (purely by luck) to have a valgrind
>> > backtrace triggering the problem. The output was flooded, but here is a
>> > sample: http://pastie.org/4602183. BTW, it seems the test is using a frame
>> > based threading.
>> >
>> > BTW, we observe only two failing tests: h264-conformance-cama2_vtc_b
>> > mostly, but at times h264-conformance-mr1_bt_a fails as well. The
>> > information I gave above are related to h264-conformance-cama2_vtc_b only.
>> >
>> > Still anyone to look into this? :(
>> >
>>
>> More digging info: it seems the problem (or at least part of it) is around
>> H264Context->ref_count[2] (frame-based threading), especially when copying it
>> from the priv_data in lavc/h264.c:decode_update_thread_context():
>>
>>     libavcodec/h264.c:1248:    copy_fields(h, h1, ref_count, list_count);
>>
>
> I have no idea about that specific race, but it looks like there is
> another one: simultaneous frame worker threads seems to call
> ff_init_cabac_states() (conflict around ff_h264_lps_range accesses).
>
>   worker thread -> decode_frame -> decode_slice -> init cabac states
>
> So maybe moving ff_init_cabac_states() only once in the init or something
> (even if not necessary) could help.

So, the other patch probably looks fine. But just to be clear, when
you say it fixes a race, you mean it fixes a warning from helgrind or
some other threading tool, right? I don't think it would fix any of
the fate failures with frame-mt enabled.

> I may sounds repetitive, but no one familiar with the H264 decoder or the
> threading available? :)
>
> The script I proposed in the previous mail helps to trigger that case,
> generally 10-15 attempts are enough, sometimes a bit more, but it's not
> that complicated to reproduce, you just have to wait and it breaks
> directly.

So yes I've seen the other emails, I don't like responding to this
subject in general because it very quickly degenerates into someplace
where I just state that helgrind sucks and is useless, and that
doesn't help you. So let's skip that.

My understanding of a potential cause for the h264 bug, which seems
somewhat compatible with what you've found so far, is (this isn't
based on any thread debugging, just on my understanding of the
threading code in general) basically as follows: so I'm going to
assume you're familiar with the general design of frame-mt, the idea
is that you have a master context (the AVCodecContext interfacing with
the application), where avctx->priv_data is the structs in pthread.c,
and then worker slave threads, each of which are a single decoder
AVCodecContext with ctx->priv_data being the codec's private data
(e.g. H264Context). As a thread completes frame header parsing, it
calls ff_thread_setup_complete() or whatever it's called, then the
contents of the privdata and ctx are copied to the next thread, which
is now in the same state as the previous one, and (given the general
assumption that these variables can't change after
ff_thread_setup_complete()), thus we can start decoding the next frame
(in the next thread) - i.e. frame-level parallelism. If that doesn't
make sense, please ask, this is pretty fundamental before you can see
the theoretical problem in this design (and can design fixes for it).

So, again, the fundamental assumption is that we can copy variables
from privdata and avctx to the next thread and they don't change until
the next frame header parsing, e.g.:
thread 1: read width=X, store in avctx->width, call
ff_thread_setup_complete(), continue decoding frame 1
thread 2: copy width, continue decoding frame 2
thread 1: finished decoding, width=X
thread 2: finished decoding, width=X

A race would be:
thread 1: read width=X, call ff_thread_setup_complete()
thread 2: copy width, continue decoding frame 2
thread 1: store width in avctx->width, continue decoding frame 1
thread 1: finished decoding, width=X
thread 2: finished decoding, width=uninitialized

Trivial, right? The fundamental thing here, again, is that all values
that are copied between threads cannot be modified after
ff_thread_setup_complete() is called. The exception here is ones that
are protected by the frame progress conditions, such as frame data,
motion vectors, etc. which are used for motion compensation or motion
vector prediction (etc) in the next frame. However, it's not hard to
find a _ton_ of variables in H264Context and MpegEncContext in
general, which are modified after ff_thread_setup_complete() is
called, copied between thread contexts, yet not protected by any means
to ensure that their values are synchronized between the threads. In
many cases, it's OK, we don't actually use their value, but any time
that we do, this is a potential race and possible cause for the h264
failures that we're seeing. ref_count[] may be one of these, but
there's likely more, maybe you want to do a more global review of all
members of these structs and ensure that EITHER they are not copied
between frames, OR they are but we ensure and proove that they are not
modified after ff_thread_setup_complete() is called. As you can see,
this is really hard and time-consuming, plus (sorry) h264 really is a
beast in terms of code size and complexity, so I don't really feel
like doing it.

For vp8, just as a comparison, this is easily proven, because the
decoder is simpler.

HTH,
Ronald


More information about the ffmpeg-devel mailing list