[FFmpeg-devel] [PATCH 02/23] vp3/x86: use full transpose for all IDCTs.
Ronald S. Bultje
rsbultje at gmail.com
Tue Mar 12 22:54:10 CET 2013
On Tue, Mar 12, 2013 at 11:56 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Tue, Mar 12, 2013 at 07:28:12AM -0700, Ronald S. Bultje wrote:
>> From: "Ronald S. Bultje" <rsbultje at gmail.com>
>> This way, the special IDCT permutations are no longer needed. Bfin code
>> is disabled until someone updates it. This is similar to how H264 does
>> it, and removes the dsputil dependency imposed by the scantable code.
> does this have any speed/performance effect?
SSE2 idct_add goes from 135 to 125 cycles. Overall decode time doesn't
change much: 2.75 seconds both before and after on the first 1000 frames
of Big Buck Bunny 720p (1 thread).
mmx probably has a similar speedup. I don't expect C performance to
change at all.
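
For readers unfamiliar with the idea, here is a minimal sketch of how a
transposed scan table can stand in for a per-IDCT permutation. It is an
illustration only, not code from the patch: the helper names
(transpose_index, build_transposed_scan, place_coeffs) are hypothetical,
and it assumes an 8x8 block stored in raster order with the usual zigzag
scan. The point is that if the decoder writes coefficients through a
transposed scan table, the block arrives already transposed, so every
IDCT implementation can assume the same input layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Standard 8x8 zigzag scan: entry i is the raster index (row * 8 + col)
 * of the i-th coefficient in bitstream order. */
static const uint8_t zigzag[64] = {
     0,  1,  8, 16,  9,  2,  3, 10,
    17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34,
    27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36,
    29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46,
    53, 60, 61, 54, 47, 55, 62, 63
};

/* Hypothetical helper: swap row and column of a raster index in an
 * 8x8 block, i.e. map (row, col) to (col, row). */
static int transpose_index(int idx)
{
    return ((idx & 7) << 3) | (idx >> 3);
}

/* Hypothetical helper: derive a scan table that writes each coefficient
 * to the transposed position of where the source table would put it. */
static void build_transposed_scan(uint8_t dst[64], const uint8_t src[64])
{
    for (int i = 0; i < 64; i++)
        dst[i] = (uint8_t)transpose_index(src[i]);
}

/* Hypothetical helper: scatter coefficients (in bitstream order) into a
 * block using the given scan table. */
static void place_coeffs(int16_t block[64], const int16_t coeffs[64],
                         const uint8_t scan[64])
{
    memset(block, 0, 64 * sizeof(*block));
    for (int i = 0; i < 64; i++) {
        assert(scan[i] < 64);
        block[scan[i]] = coeffs[i];
    }
}
```

With this, a block filled through the transposed table is exactly the
transpose of the block filled through the plain zigzag table, at no extra
cost per block, since the table is built once at init time.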