[Ffmpeg-devel] [PATCH] fix mpeg4 lowres chroma bug and increase h264/mpeg4 MC speed

Trent Piepho xyzzy
Mon Feb 19 02:24:25 CET 2007

On Fri, 16 Feb 2007, Michael Niedermayer wrote:
> On Wed, Feb 14, 2007 at 04:52:35PM -0800, Trent Piepho wrote:
> > and create a device for controlling the PMCs.  I thought this was too
> > complex and so I wrote a simple kernel module that lets me turn on the pmcs
> > without having to patch the kernel source or mess with any userspace
> > libraries.
> >
> > I've tested it for counting interrupts, and it works quite well.
> is the code available somewhere?

I haven't released anything yet.

> > > the problem is that errors are systematic and statistics cannot separate them
> > > out from a routine which just really sometimes needs much more time
> > > think of the following example:
> > >
> > > a routine which needs 10000 cycles per run but once in 100 runs it needs
> > > 1000000 cycles (due to some task switch and another app doing something
> > > or maybe it really has to deal with more complex data)
> > > suddenly your code looks as if it needs 20000 cycles ...
> >
> > If it's because in a representative dataset, once every 100 calls more
> > complex data comes along, wouldn't it be correct to say the average speed
> > is 20000 cycles?
> of course that is the problem ...
> > If you had another version that ran in only 1,000 cycles except that the 1
> > in 100 hard data made it use 10,000,000 cycles, that version would be
> > slower, would it not?
> of course

But this is the exact opposite of what the benchmarking code will find!
All calls to the function that take "too long" are thrown away, no matter
if they are from interrupts or from the data.  The first version would have
an average of 10,000 and the second of 1,000, since all the slow calls will
get ignored.  But really the second version is 5x slower, not 10x faster!
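
To make the arithmetic concrete, here is a minimal Python sketch. The
function names and the 100,000-cycle cutoff are made up for illustration;
this is not the actual benchmarking code, only a model of "throw away
every call that takes too long".

```python
def true_mean(fast, slow, slow_every):
    """Average cycles per call when one call in `slow_every` is slow."""
    return (fast * (slow_every - 1) + slow) / slow_every

def filtered_mean(fast, slow, slow_every, cutoff):
    """Average after discarding every call above `cutoff` (hypothetical filter)."""
    samples = [fast] * (slow_every - 1) + [slow]
    kept = [s for s in samples if s <= cutoff]
    return sum(kept) / len(kept)

# Version 1: 10,000 cycles, but 1,000,000 once per 100 calls.
v1_true = true_mean(10_000, 1_000_000, 100)                # 19900.0
v1_seen = filtered_mean(10_000, 1_000_000, 100, 100_000)   # 10000.0

# Version 2: 1,000 cycles, but 10,000,000 once per 100 calls.
v2_true = true_mean(1_000, 10_000_000, 100)                # 100990.0
v2_seen = filtered_mean(1_000, 10_000_000, 100, 100_000)   # 1000.0

print(v2_true / v1_true)   # about 5: version 2 is really ~5x slower
print(v1_seen / v2_seen)   # 10.0: the filtered numbers claim it is 10x faster
```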

This is exactly the situation where using statistics will get the correct
result.  You run each version many times on _the same data_.  The calls
that are fast because of the data will be fast each time, the calls that
are slow will be slow each time.

If you somehow ran with interrupts disabled and everything was in the same
memory locations, each run would have exactly the same time, as the data is
the same each time.

But each run didn't produce exactly the same time, there is variation.
Where did this variation come from?  It's not because of the data, as the data
was the same each time.  It's not from the code, the code was the same each
time.  It must be from something that wasn't the same, like interrupts and
the effect of other tasks.

When you compare the times of many runs of one version of the code, all
the variation is due to error.  This gives an estimate of the magnitude of
the error.
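
As a rough sketch of that estimate, assuming the timings are collected
into a list: the cycle counts below are invented, and the sample standard
deviation is just one common way to summarize the spread.

```python
import statistics

# Hypothetical cycle counts: one version of the code, same data, five runs.
runs = [10_210, 10_190, 10_250, 10_180, 10_220]

mean = statistics.mean(runs)
noise = statistics.stdev(runs)       # spread between runs; all of it is error
sem = noise / len(runs) ** 0.5       # uncertainty of the mean itself

print(mean, noise, sem)
```

More runs leave the spread about the same but shrink the uncertainty of
the mean, which is why repeating the benchmark helps.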

When you compare the times from two versions of the code, there are two
places the variation could come from.  It can be because the code is
different and it can be because of error.  We compare the difference
between the two versions to our estimate of the error, and this tells us,
"is the difference between the two versions too much to be attributed to
error?"

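One standard way to make that comparison is Welch's t statistic, which
divides the difference of the two means by the combined error estimate.
The sketch below is only an illustration of the idea: the timings are
invented, and `welch_t` is a helper written for this example, not part of
any existing benchmarking code.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic for two independent samples of timings."""
    err_a = statistics.variance(a) / len(a)   # squared standard error of mean(a)
    err_b = statistics.variance(b) / len(b)   # squared standard error of mean(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(err_a + err_b)

# Hypothetical cycle counts: each version timed five times on the same data.
old = [10_210, 10_190, 10_250, 10_180, 10_220]
new = [9_900, 9_950, 9_880, 9_930, 9_910]

t = welch_t(old, new)
print(t)
```

As a rule of thumb, a |t| far above 2 or 3 means the difference is too
large to blame on run-to-run error, while a |t| near 0 means the two
versions cannot be told apart from these measurements.
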
> also the amount of time (repeating benchmarks) and inconvenience (stopping all
> applications and daemons) must be considered

It would be even easier to just flip a coin...
