[FFmpeg-devel] [PATCH] VP8 MMX optimizations (MC and IDCT dc_add)
Wed Jun 23 05:22:11 CEST 2010
On Tue, Jun 22, 2010 at 6:50 PM, Jason Garrett-Glaser
<darkshikari at gmail.com> wrote:
> On Tue, Jun 22, 2010 at 4:31 PM, Jason Garrett-Glaser
> <darkshikari at gmail.com> wrote:
>> On Tue, Jun 22, 2010 at 4:05 PM, Jason Garrett-Glaser
>> <darkshikari at gmail.com> wrote:
>>> On Tue, Jun 22, 2010 at 12:35 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
>>>> as per $subj.
>>>> Speed gain:
>>>> - dc_add goes from 1800 to 1350 cycles (where 1150 is overhead,
>>>> measured as empty asm func), so about 3-3.5x faster.
>>>> - The MC functions are each about 4-5x faster (I only measured the 4x4
>>>> ones, the rest I assume are similarly faster but not measured).
>>>> - Total time spent on a shell-script that decodes the whole testsuite
>>>> (vp8-test-vectors-r1, file 001-017) including shell overhead and
>>>> everything goes from 2.3 to 2.1 seconds with these applied.
>>>> Results are bit-identical, and this is my first MMX/etc. ever! Thanks
>>>> to Jason for teaching me. ;-).
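The "3-3.5x" figure for dc_add only follows once the 1150-cycle measurement overhead is subtracted from both timings; the raw ratio 1800/1350 is only about 1.33x. A minimal sketch of that arithmetic, using only the cycle counts quoted above (the helper names net_cycles and net_speedup are made up for illustration and are not part of the patch):

```c
#include <assert.h>

/* Cycles attributable to the function itself, after subtracting the
 * fixed measurement overhead (the "empty asm func" figure). */
static int net_cycles(int measured, int overhead)
{
    assert(measured >= overhead);
    return measured - overhead;
}

/* Speedup of the new version over the old one, net of overhead. */
static double net_speedup(int old_measured, int new_measured, int overhead)
{
    int new_net = net_cycles(new_measured, overhead);
    assert(new_net > 0); /* avoid dividing by zero */
    return (double)net_cycles(old_measured, overhead) / new_net;
}
```

With the numbers from the mail, net_speedup(1800, 1350, 1150) is 650 / 200 = 3.25, which sits inside the quoted 3-3.5x range.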
>>> New patch attached.
>> Now with SSE2 v-filter motion compensation.
> Now with full SSE2 MC. I also went and updated the x264asm headers
> (and associated asm) to the latest versions. This will be split in
> the real commit.
Now with SSSE3 h-filter. I'm pretty sure SSSE3 is something like 2-3
times faster in this case, though I haven't benchmarked any of it; I'm
just going by the number of instructions.
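For readers who want a scalar picture of what an h-filter computes, here is a rough C sketch. It is not taken from the attached patch. It assumes the six-tap form from the VP8 format description: taps that sum to 128, a rounding term of 64, and a shift right by 7. The names clip_u8 and hfilter_6tap_row are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp an intermediate value to the 8-bit pixel range. */
static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Horizontal six-tap filter over one row (scalar sketch, hypothetical
 * name). taps[] must sum to 128. The caller must guarantee 2 readable
 * pixels before src[0] and 3 after src[width - 1]. */
static void hfilter_6tap_row(uint8_t *dst, const uint8_t *src,
                             int width, const int taps[6])
{
    int sum_taps = 0;
    for (int k = 0; k < 6; k++)
        sum_taps += taps[k];
    assert(sum_taps == 128);

    for (int x = 0; x < width; x++) {
        int acc = 64;                       /* rounding term for >> 7 */
        for (int k = 0; k < 6; k++)
            acc += taps[k] * src[x + k - 2];
        /* Negative sums clamp to 0 before shifting, so the result does
         * not depend on how the compiler shifts negative values. */
        dst[x] = clip_u8(acc < 0 ? 0 : acc >> 7);
    }
}
```

Each output pixel here costs six multiplies and five adds in scalar code, which is the kind of work a SIMD version spreads across many pixels per instruction. With the half-pixel taps {3, -16, 77, 77, -16, 3}, a flat row of value 100 filters back to 100.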
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 48675 bytes
Desc: not available