[FFmpeg-devel] [PATCH] VC-1 MMX DSP functions
Sun Nov 18 18:31:14 CET 2007
Michael Niedermayer wrote:
>>> the $24 add can be avoided by using an offset for the movq above
>> Applied. Also made me see I didn't use SHIFT2_8B_END_LINE macro.
> if there's just one left, then the code can be simplified by not passing
> SHIFT2_16B_END_LINE as an argument
Missed that. Done.
>> There are 2 reasons why I wanted to avoid pmullw as much as possible:
>> - here, I couldn't load the factor into a register (this seems less
>> speed-critical than I remembered)
>> - I have a Core 2 and an Athlon; on both, pmullw has a latency
>> of 3; I think some P4s have a latency of 6.
In fact, it can even be 8 vs 3...
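
To illustrate the trade-off, here is a minimal scalar sketch in C of replacing a constant multiply with a shift and an add. It models a single 16-bit lane rather than real MMX code. The factor 9 and all function names are assumptions chosen for illustration and are not taken from the patch.

```c
#include <stdint.h>

/* One 16-bit lane of a pmullw-style multiply: only the low 16 bits
 * of the product are kept. The factor 9 is an illustrative assumption. */
static uint16_t mul9_multiply(uint16_t x)
{
    return (uint16_t)(x * 9u);
}

/* Same result without a multiply: 9*x = (x << 3) + x.
 * In MMX terms this would be a word shift followed by a word add. */
static uint16_t mul9_shift_add(uint16_t x)
{
    return (uint16_t)(((unsigned)x << 3) + x);
}

/* Exhaustively compare both variants over every 16-bit input and
 * return the number of inputs on which they disagree. */
static unsigned count_mismatches(void)
{
    unsigned mismatches = 0;
    for (unsigned v = 0; v <= 0xFFFFu; v++) {
        if (mul9_multiply((uint16_t)v) != mul9_shift_add((uint16_t)v))
            mismatches++;
    }
    return mismatches;
}
```

Both variants wrap modulo 2^16, so they are interchangeable per lane. Which one is faster depends on the CPU's multiply latency compared with the cost of the extra shift and add, which is what benchmarks would have to settle.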
> to be honest, IMHO the P4 is a failure design-wise and it might be better
> not to give too much weight to the P4 in optimization decisions
Well yes, but what matters is how on average people benefit from such
code. If more than 50% of ffmpeg users still have a P4, then I don't
like seeing myself, even as the developer spending his own free time,
imposing such decisions on the majority.
> though of course P4 benchmarks would still be interesting; maybe it's
> not slower at all
I was hoping for that too while changing the code. After this last message,
if we still don't get any benchmarks, then there are not that many P4
owners who care.
On a side note, I'm currently investigating what we last discussed 4
months ago (thread "MMX version for put_no_rnd_h264_chroma_mc8_c"). I'll
post in that thread.