[FFmpeg-devel] [PATCH] VC-1 MMX DSP functions

Sun Nov 18 18:31:14 CET 2007

Hi,

Michael Niedermayer wrote:
>>> the $24 add can be avoided by using an offset for the movq above
>> Applied. It also made me notice that I didn't use the SHIFT2_8B_END_LINE macro.
> 
> if there's just one left, then the code can be simplified by not passing
> SHIFT2_16B_END_LINE as an argument

Missed that. Done.

>> There are 2 reasons why I didn't want to use pmullw as much as possible:
>> - here, I couldn't load the factor into a register (this seems less
>> speed-critical than I remembered)
>> - I have Core 2 and Athlon computers; both have a pmullw latency of
>> 3 cycles; I think some P4s have a latency of 6.

In fact, it can even be 8 cycles vs. 3.
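To make the pmullw trade-off concrete, here is a minimal scalar sketch. It is not the code from vc1dsp.diff. It assumes the half-pel case uses the VC-1 bicubic taps (-1, 9, 9, -1), and it compares a multiply form (what pmullw computes per 16-bit lane) with a shift-and-add form (psllw + paddw). The function names are hypothetical, and rounding and the final normalization shift are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one output sample of a 4-tap filter with the
 * VC-1 bicubic half-pel taps (-1, 9, 9, -1). Rounding and the final >> 4
 * normalization are omitted; only the multiply-by-9 matters here. */

/* Multiply form: maps to pmullw, which needs the constant 9 in a register
 * or in memory. */
static int taps_mul(const uint8_t s[4])
{
    return 9 * (s[1] + s[2]) - (s[0] + s[3]);
}

/* Shift-and-add form: 9 * m == (m << 3) + m, i.e. one shift and one add
 * (psllw + paddw in MMX) instead of a multiply whose latency is 3 to 8
 * cycles according to the figures quoted in this thread. */
static int taps_shift_add(const uint8_t s[4])
{
    int m = s[1] + s[2]; /* 0..510 for 8-bit pixels, so the shift is well defined */
    return (m << 3) + m - (s[0] + s[3]);
}

/* Check that both forms agree over a grid of 8-bit pixel values. */
static void check_equivalence(void)
{
    for (int a = 0; a < 256; a += 15)
        for (int b = 0; b < 256; b += 17) {
            const uint8_t s[4] = { (uint8_t)a, (uint8_t)b,
                                   (uint8_t)(255 - a), (uint8_t)(255 - b) };
            assert(taps_mul(s) == taps_shift_add(s));
        }
}
```

Both forms give identical results for 8-bit input. Which one is faster depends on the CPU's pmullw latency compared with the cost of the extra shift and add, which is why P4 benchmarks would be useful here.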

> to be honest, IMHO the P4 is a failure design-wise, and it might be better
> not to give too much weight to the P4 in optimization decisions

Well yes, but what matters is how much users benefit from such code on
average. If more than 50% of FFmpeg users still have a P4, then I don't
like the idea of imposing such decisions on the majority, even as a
developer spending my own free time on this.

> though of course P4 benchmarks would still be interesting; maybe it's
> not slower at all

I was hoping for that too while changing the code. Given that last
message, if we don't get any benchmarks, then there aren't many people
with a P4 who care.

On a side note, I'm currently investigating what we last discussed 4
months ago (thread "MMX version for put_no_rnd_h264_chroma_mc8_c"). I'll
post in that thread.

Best regards,
-- 
Christophe GISQUET
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vc1dsp.diff
Type: text/x-patch
Size: 25874 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20071118/24f387ef/attachment.bin>