[Ffmpeg-devel] [PATCH] fix mpeg4 lowres chroma bug and increase h264/mpeg4 MC speed

Trent Piepho xyzzy
Mon Feb 12 11:24:58 CET 2007


On Mon, 12 Feb 2007, Michael Niedermayer wrote:
> On Sun, Feb 11, 2007 at 03:40:39PM -0800, Trent Piepho wrote:
> > On Fri, 9 Feb 2007, Michael Niedermayer wrote:
> > > > Do you disagree with me that avg_h264_chroma_mc4_mmx2 is completely broken?
> >
> > How come you never answer this?
>
> well, its brokenness depends upon what it is supposed to do, i didnt write
> that code ....
>
> we could add a requirement that some extra bytes must be allocated after the
> buffer but that might cause problems for some users of ffmpeg and wont help
> with the multithreading also it doesnt seem like the correct solution for this
> rather minor internal issue
>
> maybe using the plain C version of the code for the rightmost column would be
> an option ...

I'm not talking about *PUT*_h264_chroma_mc4_mmx2, but *AVG*_h264_chroma_mc4_mmx2.
That function does not overwrite the destination bytes, but averages them
with the result.  The first two bytes are averaged correctly; the second
two bytes are averaged with zero.  Adding extra padding or using the C
version on the rightmost column won't fix it at all.  The entire image is
incorrect.  It is supposed to compute result = (new + old)/2, but what it
is actually doing is result = (new + old/2)/2.
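
To make the difference concrete, here is a minimal scalar sketch of the two
formulas above.  It is not the MMX2 code; the function names are made up
for illustration, and rounding is ignored to match the formulas as written.

```c
#include <stdint.h>

/* What the avg function is supposed to compute: (new + old)/2.
 * Hypothetical helper, not taken from the ffmpeg source. */
static uint8_t avg_intended(uint8_t new_px, uint8_t old_px)
{
    return (uint8_t)((new_px + old_px) / 2);
}

/* What the email says the code effectively computes: (new + old/2)/2.
 * The old pixel is halved before it is averaged in. */
static uint8_t avg_described(uint8_t new_px, uint8_t old_px)
{
    return (uint8_t)((new_px + old_px / 2) / 2);
}
```

With new = 100 and old = 200, the intended result is 150 but the second
formula gives 100, so the error shows up in every pixel and not only at the
right edge.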

> > Why do you discard some times in your TIMER code?  Is the goal just to
> > discard those times in which an interrupt occured?
>
> yes

That's not what it's doing; there are far too many skips for that to
be the case.

I modified the timer code slightly to record the average time of a skip
too.  This is a typical result:
5977 / 106270 centicycles in put_h264_chroma_mc2_mmx2, 4176949 runs, 17355 skips

Total cycles spent in put_h264_chroma_mc2_mmx2 =
(5977*4176949 + 106270*17355)/100 = 268099400 cycles.

Total time spent in put_h264_chroma_mc2_mmx2 =
268099400 cycles / 1533.426 MHz = 174836 microseconds

Skips per second =
17355 skips / .174836 seconds = 99264 skips/second

I can assure you that my system has nowhere near 99,264 interrupts per
second!  It's more like 1000 int/sec.
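
The arithmetic above can be reproduced with a few lines of C.  This is only
a sketch: the helper names are invented, and the inputs are the numbers
from the timer output and clock speed quoted above.

```c
#include <stdint.h>

/* Total cycles from per-call averages given in centicycles
 * (hundredths of a cycle): timed runs plus skipped runs. */
static uint64_t total_cycles(uint64_t avg_centi, uint64_t runs,
                             uint64_t skip_avg_centi, uint64_t skips)
{
    return (avg_centi * runs + skip_avg_centi * skips) / 100;
}

/* Convert a cycle count to seconds for a clock given in MHz. */
static double cycles_to_seconds(uint64_t cycles, double mhz)
{
    return (double)cycles / (mhz * 1e6);
}

/* Skips per second of time spent inside the timed function. */
static double skips_per_second(uint64_t skips, double seconds)
{
    return (double)skips / seconds;
}
```

Calling total_cycles(5977, 4176949, 106270, 17355) gives 268099400, and
feeding that through the other two helpers with 1533.426 MHz gives roughly
99,264 skips per second, the same figure as above up to rounding.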

The timer code isn't just skipping interrupts, but also cache misses,
unlucky branch prediction, TLB faults, etc.

How often code is affected by these things isn't just random chance, but
depends on the code.  Putting all your constant globals in one cache line
will reduce cache misses, but your benchmark won't see that because you
exclude cache misses from being timed.
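
Here is a simplified model of the effect.  It assumes the timer rejects any
sample that is at least 8 times the running mean; that threshold, the
struct, and the function names are assumptions made for this sketch and are
not copied from the TIMER macros.  The simulated workload is also invented:
1000 calls at 60 cycles each, with every tenth call taking 1000 cycles to
stand in for a cache miss.

```c
#include <stdint.h>

typedef struct {
    uint64_t sum;    /* cycles of accepted samples */
    uint64_t count;  /* number of accepted samples */
    uint64_t skips;  /* number of rejected samples */
} SkipTimer;

/* Accept a sample unless it is >= 8x the running mean (assumed rule). */
static void skip_timer_add(SkipTimer *t, uint64_t cycles)
{
    if (t->count < 2 || cycles < 8 * t->sum / t->count) {
        t->sum += cycles;
        t->count++;
    } else {
        t->skips++;
    }
}

/* Mean over accepted samples only, which is what gets reported. */
static uint64_t skip_timer_mean(const SkipTimer *t)
{
    return t->count ? t->sum / t->count : 0;
}

/* Feed the invented workload and also accumulate the true total cost. */
static void simulate(SkipTimer *t, uint64_t *true_sum)
{
    for (int i = 0; i < 1000; i++) {
        uint64_t cycles = (i % 10 == 9) ? 1000 : 60;
        *true_sum += cycles;
        skip_timer_add(t, cycles);
    }
}
```

In this model every slow call is counted as a skip, so the reported mean
stays at 60 cycles while the true mean is 154.  A change that removed the
cache misses would not move the reported number at all.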



