[FFmpeg-devel] [PATCH] VP8 MMX optimizations (MC and IDCT dc_add)

Jason Garrett-Glaser darkshikari
Fri Jun 25 03:08:26 CEST 2010


On Thu, Jun 24, 2010 at 4:56 PM, Jason Garrett-Glaser
<darkshikari at gmail.com> wrote:
> On Wed, Jun 23, 2010 at 12:38 AM, Jason Garrett-Glaser
> <darkshikari at gmail.com> wrote:
>> On Tue, Jun 22, 2010 at 8:22 PM, Jason Garrett-Glaser
>> <darkshikari at gmail.com> wrote:
>>> On Tue, Jun 22, 2010 at 6:50 PM, Jason Garrett-Glaser
>>> <darkshikari at gmail.com> wrote:
>>>> On Tue, Jun 22, 2010 at 4:31 PM, Jason Garrett-Glaser
>>>> <darkshikari at gmail.com> wrote:
>>>>> On Tue, Jun 22, 2010 at 4:05 PM, Jason Garrett-Glaser
>>>>> <darkshikari at gmail.com> wrote:
>>>>>> On Tue, Jun 22, 2010 at 12:35 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> as per $subj.
>>>>>>>
>>>>>>> Speed gain:
>>>>>>> - dc_add goes from 1800 to 1350 cycles (where 1150 is overhead,
>>>>>>> measured as empty asm func), so about 3-3.5x faster.
>>>>>>> - The MC functions are each about 4-5x faster (I only measured the 4x4
>>>>>>> ones, the rest I assume are similarly faster but not measured).
>>>>>>> - Total time spent on a shell-script that decodes the whole testsuite
>>>>>>> (vp8-test-vectors-r1, file 001-017) including shell overhead and
>>>>>>> everything goes from 2.3 to 2.1 seconds with these applied.
>>>>>>>
>>>>>>> Results are bit-identical, and this is my first MMX/etc. ever! Thanks
>>>>>>> to Jason for teaching me. ;-).
>>>>>>>
>>>>>>> Ronald
>>>>>>
>>>>>> New patch attached.
>>>>>>
>>>>>> Jason
>>>>>>
>>>>>
>>>>> Now with SSE2 v-filter motion compensation.
>>>>>
>>>>> Jason
>>>>>
>>>>
>>>> Now with full SSE2 MC. I also went and updated the x264asm headers
>>>> (and associated asm) to the latest versions. This will be split in
>>>> the real commit.
>>>>
>>>> Jason
>>>>
>>>
>>> Now with SSSE3 h-filter. I'm pretty sure SSSE3 is something like 2-3
>>> times faster in this case, though I haven't benched any of it, I'm
>>> just going by the number of instructions.
>>>
>>> Jason
>>>
>>
>> Now with full versions of all MC, including SSSE3. Not as optimized
>> as it could be, but pretty good so far I think.
>>
>> There are some... issues... with the current code that prevent commit.
>> I will be bringing these up with Ronald soon ;)
>>
>> Dark Shikari
>>
>
> Now updated with 16x16 intra pred asm, and rebased against trunk.
>
> Dark Shikari
>

Now with 8x8 intra pred modes and non-broken line endings.  Did I
mention this makes h264 faster too?

Dark Shikari
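
For reference, the "about 3-3.5x" dc_add figure in Ronald's original mail
comes from subtracting the measured empty-function overhead (1150 cycles)
from both timings before taking the ratio. A minimal sketch of that
arithmetic follows; the function name net_speedup is hypothetical and is
not part of the patch, and the numbers are the ones quoted above.

```c
#include <assert.h>

/* Hypothetical helper, not from the patch: speedup of the measured code
 * itself once a fixed measurement overhead is removed from both timings. */
static double net_speedup(double cycles_before, double cycles_after,
                          double overhead)
{
    /* Both timings must exceed the overhead for the ratio to be meaningful. */
    assert(cycles_before > overhead && cycles_after > overhead);
    return (cycles_before - overhead) / (cycles_after - overhead);
}
```

With the quoted numbers, net_speedup(1800, 1350, 1150) evaluates
(1800 - 1150) / (1350 - 1150) = 650 / 200, which falls inside the stated
3-3.5x range.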
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vp8_asm.diff
Type: application/octet-stream
Size: 51579 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100624/6bf11da8/attachment.obj>


