[FFmpeg-devel] [PATCH 1/2] x86/vf_blend: add sse and ssse3 extremity functions

James Almer jamrial at gmail.com
Wed Jun 28 02:46:58 EEST 2017


On 6/27/2017 8:19 PM, Ivan Kalvachev wrote:
> On 6/27/17, James Almer <jamrial at gmail.com> wrote:
>> Signed-off-by: James Almer <jamrial at gmail.com>
>> ---
>>  libavfilter/x86/vf_blend.asm    | 25 +++++++++++++++++++++++++
>>  libavfilter/x86/vf_blend_init.c |  4 ++++
>>  tests/checkasm/vf_blend.c       |  1 +
>>  3 files changed, 30 insertions(+)
>>
>> diff --git a/libavfilter/x86/vf_blend.asm b/libavfilter/x86/vf_blend.asm
>> index 33b1ad1496..25f6f5affc 100644
>> --- a/libavfilter/x86/vf_blend.asm
>> +++ b/libavfilter/x86/vf_blend.asm
>> @@ -286,6 +286,31 @@ BLEND_INIT difference, 3
>>      jl .loop
>>  BLEND_END
>>
>> +BLEND_INIT extremity, 8
>> +    pxor       m2, m2
>> +    mova       m4, [pw_255]
>> +.nextrow:
>> +    mov        xq, widthq
>> +
>> +    .loop:
>> +        movu            m0, [topq + xq]
>> +        movu            m1, [bottomq + xq]
>> +        punpckhbw       m5, m0, m2
>> +        punpcklbw       m0, m2
>> +        punpckhbw       m6, m1, m2
>> +        punpcklbw       m1, m2
>> +        psubw           m3, m4, m0
>> +        psubw           m7, m4, m5
>> +        psubw           m3, m1
>> +        psubw           m7, m6
>> +        ABS1            m3, m1
>> +        ABS1            m7, m6
> 
> Minor nitpick.
> 
> There is an ABS2 macro that takes 4 parameters and performs
> two interleaved ABS1 operations, which are (hopefully) faster on sse2.
> It should generate exactly the same code on ssse3.

Ah nice, pushed a change to use them. Thanks.
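
For reference, a scalar C sketch of what the loop above computes per
pixel: widen to 16 bits, take 255 - top - bottom, then the absolute
value. The names blend_extremity_ref and abs16_max and the dst
parameter are hypothetical, used here for illustration only. The
max(x, -x) form assumes the non-ssse3 ABS path is a negate followed by
a signed max, with ssse3 using pabsw directly.

```c
#include <stddef.h>
#include <stdint.h>

/* Absolute value of a signed 16-bit lane as max(x, -x).
 * Assumption: this mirrors a negate + signed-max fallback for targets
 * without pabsw. Safe here because inputs stay within [-255, 255]. */
static int16_t abs16_max(int16_t x)
{
    int16_t neg = (int16_t)-x;
    return x > neg ? x : neg;
}

/* Hypothetical scalar reference for the extremity blend:
 * dst = |255 - top - bottom|, computed in 16-bit precision. */
static void blend_extremity_ref(const uint8_t *top, const uint8_t *bottom,
                                uint8_t *dst, size_t width)
{
    for (size_t x = 0; x < width; x++) {
        /* 255 - top - bottom lies in [-255, 255], so it fits in int16_t. */
        int16_t v = (int16_t)(255 - top[x] - bottom[x]);
        /* The absolute value lies in [0, 255], so it fits in uint8_t. */
        dst[x] = (uint8_t)abs16_max(v);
    }
}
```

Interleaving two such abs operations, as ABS2 does, changes only the
instruction scheduling, not the per-lane result.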

