[FFmpeg-devel] h264 threading fate tests

Clément Bœsch ubitux at gmail.com
Wed Oct 3 15:08:47 CEST 2012

On Tue, Oct 02, 2012 at 10:01:52AM -0700, Ronald S. Bultje wrote:
> Hi,
> On Tue, Oct 2, 2012 at 2:34 AM, Clément Bœsch <ubitux at gmail.com> wrote:
> > On Mon, Oct 01, 2012 at 11:50:19AM +0200, Clément Bœsch wrote:
> >> On Fri, Sep 28, 2012 at 11:07:54AM +0200, Clément Bœsch wrote:
> >> > On Mon, Apr 16, 2012 at 04:43:42PM +0200, Clément Bœsch wrote:
> >> > > Hi,
> >> > >
> >> > > I recently setup a few fate instances to test the threading (2, 8, 16 and
> >> > > auto), and regularly one of the h264 conformance test fails; look at the
> >> > > yellow entries here for instance:
> >> > > http://fate.ffmpeg.org/history.cgi?slot=x86_64-archlinux-gcc-threads-8
> >> > >
> >> > > The other day, I tried to run an automated git bisect run, but
> >> > > unfortunately testing the potential regression would requires to run
> >> > > something like "while true; do make fate-h264 -j20 THREADS=8; done" for
> >> > > around 15 minutes at least each time, and it might not even be reliable.
> >> > >
> >> > > I'm not familiar at all with AVC decoding or threading in FFmpeg, but
> >> > > maybe someone has an idea of what could cause this?
> >> > >
> >> > > Maybe I should just open an issue in the trac?
> >> >
> >> > OK so here is some more information about that race. Since the failure
> >> > seems to be always the same (failing frame CRCs are identical), I dumped
> >> > both outputs and made a diff. Here it is:
> >> >
> >> >     http://lucy.pkh.me/race.html
> >> >
> >> > Basically, we can notice some ±1 byte of difference at times. The
> >> > generated outputs can be downloaded from that page.
> >> >
> >> > Also, I was able the other day (purely by luck) to have a valgrind
> >> > backtrace triggering the problem. The output was flooded, but here is a
> >> > sample: http://pastie.org/4602183. BTW, it seems the test is using a frame
> >> > based threading.
> >> >
> >> > BTW, we observe only two failing tests: h264-conformance-cama2_vtc_b
> >> > mostly, but at times h264-conformance-mr1_bt_a fails as well. The
> >> > information I gave above are related to h264-conformance-cama2_vtc_b only.
> >> >
> >> > Still anyone to look into this? :(
> >> >
> >>
> >> More digging info: it seems the problem (or at least part of it) is around
> >> H264Context->ref_count[2] (frame-based threading), especially when copying it
> >> from the priv_data in lavc/h264.c:decode_update_thread_context():
> >>
> >>     libavcodec/h264.c:1248:    copy_fields(h, h1, ref_count, list_count);
> >>
> >
> > I have no idea about that specific race, but it looks like there is
> > another one: simultaneous frame worker threads seems to call
> > ff_init_cabac_states() (conflict around ff_h264_lps_range accesses).
> >
> >   worker thread -> decode_frame -> decode_slice -> init cabac states
> >
> > So maybe moving ff_init_cabac_states() only once in the init or something
> > (even if not necessary) could help.
> So, the other patch probably looks fine. But just to be clear, when
> you say it fixes a race, you mean it fixes a warning from helgrind or
> some other threading tool, right? I don't think it would fix any of
> the fate failures with frame-mt enabled.

Yes it fixes a Helgrind message that happens *sometimes* when running the
cama2_vtc_b test (and seems to generally trigger a bunch of other races at
the same time). It is indeed likely not enough to fix the FATE problem
since there are a few more unrelated issues spotted. I'll push it soon

> > I may sounds repetitive, but no one familiar with the H264 decoder or the
> > threading available? :)
> >
> > The script I proposed in the previous mail helps to trigger that case,
> > generally 10-15 attempts are enough, sometimes a bit more, but it's not
> > that complicated to reproduce, you just have to wait and it breaks
> > directly.
> So yes I've seen the other emails, I don't like responding to this
> subject in general because it very quickly degenerates into someplace
> where I just state that helgrind sucks and is useless, and that
> doesn't help you. So let's skip that.


Unfortunately, I don't know any other tool to debug such thing (maybe the
Intel thing?). Helgrind seems to spot the race problem... sometimes.
AFAICT it's good enough to give some insight about what's wrong. The
limitation here is my knowledge of the threading in FFmpeg.

BTW, with the MSVC support, doesn't it make available a special tool or
something? IIRC it was said to be not available for C stuff, right?

> My understanding of a potential cause for the h264 bug, which seems
> somewhat compatible with what you've found so far, is (this isn't
> based on any thread debugging, just on my understanding of the
> threading code in general) basically as follows: so I'm going to
> assume you're familiar with the general design of frame-mt, the idea
> is that you have a master context (the AVCodecContext interfacing with
> the application), where avctx->priv_data is the structs in pthread.c,
> and then worker slave threads, each of which are a single decoder
> AVCodecContext with ctx->priv_data being the codec's private data
> (e.g. H264Context). As a thread completes frame header parsing, it
> calls ff_thread_setup_complete() or whatever it's called, then the
> contents of the privdata and ctx are copied to the next thread, which
> is now in the same state as the previous one, and (given the general
> assumption that these variables can't change after
> ff_thread_setup_complete()), thus we can start decoding the next frame
> (in the next thread) - i.e. frame-level parallelism. If that doesn't
> make sense, please ask, this is pretty fundamental before you can see
> the theoretical problem in this design (and can design fixes for it).

So far it makes sense to me, assuming you were talking about
ff_thread_finish_setup() for the function setting the "RO mode".

But just to be sure: each (non-master with id≠0 ?) PerThreadContext,
associated with a frame worker gets a copy of the avctx (or at least a
limited set of values specified in pthread.c:update_context_from_thread),
and its private data avctx->priv_data (pointing on the decoder context) is
duplicated as well.

What I'm wondering about is: what happens if after
ff_thread_finish_setup() is called, the current thread starts to modify
its avctx & priv_data? If it's a copy, the changes won't affect the others
threads, but it shouldn't create any conflicts, the writing will just have
no effect outside the scope of that frame decoding. Is that true, and
eventually abused?

(Note: I'm asking here about the non inter-frame specific fields you
mention later in your mail)

> So, again, the fundamental assumption is that we can copy variables
> from privdata and avctx to the next thread and they don't change until
> the next frame header parsing, e.g.:
> thread 1: read width=X, store in avctx->width, call
> ff_thread_setup_complete(), continue decoding frame 1

I'm assuming here that a frame worker in thread #1 gets to decode a frame
    dec_ctx = avctx->priv_data;
    avctx->width = dec_ctx->width = X;


> thread 2: copy width, continue decoding frame 2
> thread 1: finished decoding, width=X
> thread 2: finished decoding, width=X
> A race would be:
> thread 1: read width=X, call ff_thread_setup_complete()
> thread 2: copy width, continue decoding frame 2
> thread 1: store width in avctx->width, continue decoding frame 1
> thread 1: finished decoding, width=X
> thread 2: finished decoding, width=uninitialized
> Trivial, right?

Sure, the next thread got an uninit copy of the thread#1->avctx->width,

>                  The fundamental thing here, again, is that all values
> that are copied between threads cannot be modified after
> ff_thread_setup_complete() is called. The exception here is ones that
> are protected by the frame progress conditions, such as frame data,
> motion vectors, etc. which are used for motion compensation or motion
> vector prediction (etc) in the next frame.

OK so this is related to my previous question where some fields could be
modified; I need to check out how these fields are transmitted and how the
other thread is waiting for data.

>                                            However, it's not hard to
> find a _ton_ of variables in H264Context and MpegEncContext in
> general, which are modified after ff_thread_setup_complete() is
> called, copied between thread contexts, yet not protected by any means
> to ensure that their values are synchronized between the threads.

This is likely answering my question... So "just" a synchronization
problem. Fixing this for the avctx variables usages would mean to add the
field in update_context_from_thread(), and for the codec context, this is
handled in... the update_thread_context callback, right?

>                                                                   In
> many cases, it's OK, we don't actually use their value, but any time
> that we do, this is a potential race and possible cause for the h264
> failures that we're seeing. ref_count[] may be one of these, but
> there's likely more, maybe you want to do a more global review of all
> members of these structs and ensure that EITHER they are not copied
> between frames, OR they are but we ensure and proove that they are not
> modified after ff_thread_setup_complete() is called.

Right, I think I understand better the problem now, I'll do a more deeper
review of these fields. Thanks a lot for the long explanation!

>                                                       As you can see,
> this is really hard and time-consuming, plus (sorry) h264 really is a
> beast in terms of code size and complexity, so I don't really feel
> like doing it.
> For vp8, just as a comparison, this is easily proven, because the
> decoder is simpler.

Indeed, VP8 looks like a good threading example.

> HTH,

It does, thanks again

Clément B.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 490 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20121003/46fb1fe1/attachment.asc>

More information about the ffmpeg-devel mailing list