[FFmpeg-devel] VP8 sliced threading

Fri Jul 6 18:37:50 CEST 2012

On Fri, Jul 06, 2012 at 03:14:17AM -0400, Daniel Kang wrote:
> >
> > Also how did you get performance numbers? For low horizontal resolution
> >> I'd expect it to potentially get vastly slower on Windows when the sleep
> >> comes in, since the default minimum granularity of the sleep is 10ms, which
> >> should be longer than decoding a whole frame takes.
> >>
> >
> > I only tested HD clips, on Linux and Windows. I will test a low-res clip
> > once I can find a suitable one.
> >
> 
> Sorry for the second email. Where did you find the information on
> granularity on sleep?
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms686298(v=vs.85).aspx states
> that a value of 0 will "[cause] the thread to relinquish the
> remainder of its time slice to any other thread that is ready to run." I
> cannot find information on implementation details.

I'm afraid it is not documented.
Note I am not sure if things changed _after_ XP but my information
should at least be correct for Windows XP.
10ms is the default scheduling granularity for Windows.
So giving up your time slice will usually mean you will not be able to
continue until at least 10 ms later (the average I think would actually
be 15 ms).
Now obviously this is completely useless for multimedia playback, so Microsoft
added the timeBeginPeriod multimedia API to request higher resolution
(with the funny side effect of causing worse performance and higher
power usage in several cases).
Now where I suspect the issue comes in is if the decoder is used by an
application that does not use timeBeginPeriod (which an application that
does not do realtime playback probably will not, and should not, do).
Now another problem for testing this is that the timer resolution is
system global, so you would have to make sure there is no MPlayer,
iTunes, WinAmp, ... running when you test.
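
One way to check the granularity on a given machine is to time a short
sleep directly. The following is only a minimal sketch, not code from the
patch: the helper names now_ms, sleep_1ms and avg_sleep_ms are made up for
illustration, and it assumes Sleep/QueryPerformanceCounter on Windows and
nanosleep/clock_gettime elsewhere.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <time.h>
#endif

/* Monotonic wall-clock time in milliseconds. */
static double now_ms(void)
{
#ifdef _WIN32
    LARGE_INTEGER freq, cnt;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&cnt);
    return 1000.0 * (double)cnt.QuadPart / (double)freq.QuadPart;
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
#endif
}

/* Request the shortest non-zero sleep, nominally 1 ms. */
static void sleep_1ms(void)
{
#ifdef _WIN32
    Sleep(1);
#else
    struct timespec req = { 0, 1000000 };
    nanosleep(&req, NULL);
#endif
}

/* Hypothetical helper: average real duration of a requested 1 ms sleep. */
static double avg_sleep_ms(int iterations)
{
    double start = now_ms();
    for (int i = 0; i < iterations; i++)
        sleep_1ms();
    return (now_ms() - start) / iterations;
}
```

If the description above holds, comparing avg_sleep_ms(100) with and
without a media player running should show whether some other process has
raised the system-wide timer resolution.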
Also the pause instruction is meant for spinlock cases. You are using
sleep in the same loop, which will usually go into the kernel, so in
general this is not really a spinlock. I don't think it is helping,
and I think it is not supposed to be used like this.
I also think I read that using sched_yield this way is not portable and
thus very much discouraged (it is often implemented as a NOP).
The idea being that proper locking/signalling should be fast enough (and
can actually be quite a bit faster if sched_yield is in fact a NOP).
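
As a rough illustration of what "proper locking/signalling" could look like
in place of a sleep/yield loop, here is a sketch using a pthread mutex and
condition variable. It is not taken from the patch: the RowProgress struct
and the progress_* functions are hypothetical names, and the assumption is
that each slice thread waits for the row above it to reach a given
macroblock column.

```c
#include <pthread.h>

/* Hypothetical per-row progress tracker (not from the actual patch). */
typedef struct RowProgress {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             mb_pos;  /* last completed macroblock column, -1 if none */
} RowProgress;

static void progress_init(RowProgress *p)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->cond, NULL);
    p->mb_pos = -1;
}

static void progress_destroy(RowProgress *p)
{
    pthread_cond_destroy(&p->cond);
    pthread_mutex_destroy(&p->lock);
}

/* Producer side: record that columns up to mb_pos are decoded, wake waiters. */
static void progress_report(RowProgress *p, int mb_pos)
{
    pthread_mutex_lock(&p->lock);
    p->mb_pos = mb_pos;
    pthread_cond_broadcast(&p->cond);
    pthread_mutex_unlock(&p->lock);
}

/* Consumer side: block until the tracked row has reached at least mb_pos. */
static void progress_wait(RowProgress *p, int mb_pos)
{
    pthread_mutex_lock(&p->lock);
    while (p->mb_pos < mb_pos)          /* loop guards against spurious wakeups */
        pthread_cond_wait(&p->cond, &p->lock);
    pthread_mutex_unlock(&p->lock);
}
```

With this shape the waiting thread is put to sleep by the kernel and woken
when progress is reported, so it does not depend on the timer granularity
or on what sched_yield happens to do on a given platform.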