[FFmpeg-devel] Tracking down frame corrupting race conditions on Windows.

Dale Curtis dalecurtis at chromium.org
Fri Apr 20 22:49:13 CEST 2012


We've been running into an issue where the decoder (avcodec_decode_video2) is
returning corrupted frames on Win32. Some images showing the corruption:


The problem is most easily detected by running a hash test under ThreadSanitizer
(TSAN)*. I've created a sample program which simply decodes and hashes each
frame (using the av_sha_* interfaces) to illustrate this:

Without TSAN and 2 threads:
> hash-test.exe bear0.mp4 2
000: a1d99a1a3f6ec3ffbad916ad34e14a69b011ece10505930fea07f5a5cd988a7c
081: d857354e53a1161e1be083e95e01ad7a12920c5307c37d012c8fa1d3fb784fc2

With TSAN and 2 threads:
> tsan\tsan.bat --log-file=tsan.txt -- hash-test.exe bear0.mp4 2
000: e5a0e89c28c8ade300ae572508aeab2f06b59e0da30a20abde926da4003dc680
081: 85582db265d1f2ec90d32f5642075af81397c0de2d606defb3d704eeca261253

With TSAN and 1 thread:
> tsan\tsan.bat --log-file=tsan.txt -- hash-test.exe bear0.mp4 1
000: a1d99a1a3f6ec3ffbad916ad34e14a69b011ece10505930fea07f5a5cd988a7c
081: d857354e53a1161e1be083e95e01ad7a12920c5307c37d012c8fa1d3fb784fc2

Our assumption is that the output from threads == 1 should be indistinguishable
from output with threads > 1.  Some general notes:

   - The issue occurs with at least h264, vp8, and theora files.
   - We have seen no issues under Linux or Mac, which is not to say there are no
     problems.  The Windows version of TSAN runs much slower than the Linux
     or Mac variants, which may widen the timing windows and make the issue
     more likely to surface.
   - Switching to pthreads instead of using w32threads on Windows does not fix
     the problem.
   - Given that the hash changes with the number of threads, we suspect the
     problem is a race condition.
   - It's possible TSAN is breaking something fundamental; however, past
     experience with the tool has shown the worst case to be false positives,
     never spurious effects on the running program.
   - FATE fails the hash comparison on almost every test if TSAN is set as
     the target_exec on Windows.

To reproduce the results, the fastest way is to grab the test bundle from here:


The bundle includes a precompiled Win32 hash-test executable, hash-test source
code, and TSAN binaries as well as h264, theora, and vp8 test cases.  From there
you just need to add the FFmpeg or LibAV DLL files from the prebuilt servers:

   FFmpeg: http://ffmpeg.zeranoe.com/builds/win32/shared/
   LibAV: http://win32.libav.org/

Then, to run the test:

   hash-test.exe <file> <threads>
   tsan\tsan.bat --log-file=tsan.txt -- hash-test.exe <file> <threads>

I highly recommend using --log-file with TSAN; otherwise it will print a
tremendous number of warnings, due to the code's assumption that integer reads
and writes are atomic as well as other, more complicated threading patterns.
However, it's possible the real issue is lurking in one of those warnings.

If you want to build the test yourself, you'll need to set up MinGW and build
the shared version of FFmpeg/LibAV. The test can then be compiled from the MinGW
shell with:

   gcc -o hash-test.exe hash-test.c -I. -std=c99 avformat-54.dll avutil-51.dll

acolwell and I have spent a lot of time trying to track down the source of this
corruption on Windows without much success.  We're hoping the community might
have some better ideas on where to look.  Thanks in advance for any assistance!

- dale

*Thread Sanitizer: http://code.google.com/p/data-race-test/wiki/ThreadSanitizer

