[Ffmpeg-devel] [PATCH] fix mpeg4 lowres chroma bug and increase h264/mpeg4 MC speed
Thu Feb 15 01:52:35 CET 2007
On Tue, 13 Feb 2007, Michael Niedermayer wrote:
> On Mon, Feb 12, 2007 at 05:53:38PM -0800, Trent Piepho wrote:
> > On Mon, 12 Feb 2007, Michael Niedermayer wrote:
> > > On Mon, Feb 12, 2007 at 12:49:30PM +0100, Michael Niedermayer wrote:
> > > > > > > Why do you discard some times in your TIMER code? Is the goal just to
> > > > > > > discard those times in which an interrupt occurred?
> > > > > >
> > > > > > yes
> > > > >
> > > > > That's not what it's doing; there are far too many skips for that to
> > > > > be the case.
> > > >
> > > if anyone has any ideas how we could detect if an interrupt/task switch
> > > happened between START and STOP_TIMER please tell me ...
> i think requiring people to modify the kernel to use START/STOP_TIMER is not
> a good idea ...
> rdpmc though might be worth a try ...
You need at least a kernel module to be able to use rdpmc. There are two
different systems out there for PMCs on Linux. Both require a patched kernel
and create a device for controlling the PMCs. I thought this was too
complex, so I wrote a simple kernel module that lets me turn on the PMCs
without having to patch the kernel source or mess with any userspace tools.
I've tested it for counting interrupts, and it works quite well.
> > If it's the latter, then I wouldn't worry about catching interrupts. Some
> > calls of the code will get an interrupt and have too many cycles. Most of
> > the calls won't. The interrupts will add a very small amount of error to
> > your average cycle count. That's ok. All measurements in all fields of
> > science have error!
> yes, and in all fields of science people try to reduce the errors in their
> measurements ...
You assume you are reducing error! I wouldn't be so sure that's the case.
Say you have the sequence of times:
1 100 1 100 1 100 1 100 = gives average 50
1 1 100 100 1 100 1 100 = gives average 4
I just swapped the 2nd and 3rd value. The average your skipping method
finds changed from 50 to 4. A totally different average from the same data
in a slightly different order.
What you've done is add a new source of error, one that makes the result
change based on the precise order of the initial observation values. Since
this error is based on the data itself, I don't think you can call it
independent, so the central limit theorem goes out the window.
> > One run of one version of code will have some error from interrupts. The
> > next run will have a different amount of error. The other version of the
> > code will have error too when it's benchmarked. This is why you run the
> > benchmarks many times. Then you can use statistics to make mathematically
> > precise statements about the confidence of one version being faster than
> > another despite the presence of measurement error. If the difference between
> > versions is so small and the error so large that the error overshadows the
> > difference, then statistics will tell you that you can't say which is
> > faster with much confidence. That's probably a good sign you're wasting
> > your optimization efforts trying to decide which version to use.
> the problem is that errors are systematic and statistics cannot separate them
> out from a routine which just really sometimes needs much more time
> think of the following example:
> a routine which needs 10000 cycles per run but once in 100 runs it needs
> 1000000 cycles (due to some task switch and an other app doing something
> or maybe it really has to deal with more complex data)
> suddenly your code looks as if it needs 20000 cycles ...
If it's because, in a representative dataset, more complex data comes along
once every 100 calls, wouldn't it be correct to say the average speed
is 20000 cycles?
If you had another version that ran in only 1,000 cycles, except that the 1
in 100 hard data made it use 10,000,000 cycles, that version would be
slower, would it not?
If the extra 1,000,000 cycles is because of a task switch, you would expect
that to be random, right? That's why you repeat the benchmark many times!
Some runs will average 20,000 because of the task switch and some will be
10,000. If the other version of the code also has times between about
10,000 and 20,000, then a statistical hypothesis test will tell you that,
given the measurement error caused by the task switches, you can't say
which version is faster.
Suppose both versions happen to get hit with the same number of task
switches that take the same amount of time. Then the task switches didn't
really add any "error", did they? The difference of the means of the two
versions will still be the same as it would have been without any task
switch times! A statistical test will tell you this.
The only problem would be if all the runs of the benchmark of one version
get hit with the same cost of task switches, and all the runs of the other
version get hit with the same cost of task switches, but the costs between
the two versions are somehow different. And this difference has nothing to
do with the code.