[FFmpeg-cvslog] r16207 - trunk/libavcodec/h264.c

Måns Rullgård mans
Thu Dec 18 12:48:58 CET 2008


Michael Niedermayer wrote:
> On Thu, Dec 18, 2008 at 04:27:34AM +0000, M?ns Rullg?rd wrote:
>> Michael Niedermayer <michaelni at gmx.at> writes:
>>
>> > On Thu, Dec 18, 2008 at 03:57:54AM +0000, M?ns Rullg?rd wrote:
>> >> Michael Niedermayer <michaelni at gmx.at> writes:
>> >>
>> >> > On Thu, Dec 18, 2008 at 02:57:17AM +0000, M?ns Rullg?rd wrote:
>> >> >> michael <subversion at mplayerhq.hu> writes:
>> >> >>
>> >> >> > Author: michael
>> >> >> > Date: Thu Dec 18 03:53:18 2008
>> >> >> > New Revision: 16207
>> >> >> >
>> >> >> > Log:
>> >> >> > Use the new idct functions (except chroma as it was slower in
>> >> >> > benchmarks) cathedral +0.5% speed
>> >> >> > aladin +0.6% speed [note aladin has been cat-ed 10 times to reduce
>> >> >> > the influence of init time]
>> >> >> > Speedup also verified via START/STOP_TIMER (difference was very
>> >> >> > significant for the changed parts)
>> >> >>
>> >> >> How much does this hurt on architectures that don't yet have the new
>> >> >> SIMD functions?
>> >> >
>> >> > there are no really new SIMD functions.
>> >> > I just moved the loops like
>> >> > for(i=0; i<16; i++)
>> >> >     dsp->idct4x4_add(blah blah);
>> >> >
>> >> > into dsputil so they are
>> >> >
>> >> > for(i=0; i<16; i++)
>> >> >     idct4x4_add_simdwhatever(blah blah);
>> >> >
>> >> > that way gcc can inline the function and avoids up to 15 calls through
>> >> > dsp->
>> >> >
>> >> > adding support for this to your favorite architecture is a matter of
>> >> > copy & paste and adjusting the function names.
>> >>
>> >> I can see how it can be done.  I'm asking how much of an impact this
>> >> has on performance until it's been done.  What percentage of the old
>> >> calls are affected?
>> >
>> > depends on the video, if i have to guess and thats just a guess id say
>> > more than 50% of the idct work will go to new code
>>
>> That's enough to give a significant slowdown on ARM.  How about a
>> little coordination next time?
>
> Iam not sure what you suggest?
>
> I could of course send out a warning liks
> "will in an hour (if it passes tests and benchmarks) commit code that will
>  require arch specific optims to be updated, until that update they will
>  be slower. I will update x86 myself"
>
> but i dont think that would have helped you.

That is more or less what you did, and it did not help me at all.
What I want is that before you commit the change, you send a patch
to the list and wait one day for arch maintainers to catch up, then
apply.

The following sequence of steps would make the process smooth:

1. Add dsputil hooks, C implementation, and whatever asm you have.
2. Send patch for using new functions.
3. Wait one day for arch-specific code to be updated.
4. Commit patch from 2.

The changes should be proven through benchmarks before step 1.

> And waiting for ppc, sparc, alpha, sh4, ... to be updated is like not
> commiting the change at all ...

I know that.  I'm only asking for a grace period of *one day* for
arch maintainers to their code to avoid speed regressions.  Those
that fail to update within a day will take the performance hit.

I'm sure you would object if someone committed ppc or arm optimisations
slowing down x86 without warning.

-- 
M?ns Rullg?rd
mans at mansr.com




More information about the ffmpeg-cvslog mailing list