[FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD
Paul B Mahol
onemda at gmail.com
Sun Dec 3 21:52:38 EET 2017
On 12/3/17, Martin Vignali <martin.vignali at gmail.com> wrote:
> 2017-12-03 20:36 GMT+01:00 Paul B Mahol <onemda at gmail.com>:
>
>> On 12/3/17, Martin Vignali <martin.vignali at gmail.com> wrote:
>> >>
>> >> In any case, if clang or gcc can generate better code, then the hand
>> >> written version needs to be optimized to be as fast or faster.
>> >>
>> >>
>> >>
>> > Quick test: passes checkasm (but probably only because width = 256)
>> > hflip_byte_c: 26.4
>> > hflip_byte_ssse3: 20.4
>> >
>> >
>> > INIT_XMM ssse3
>> > cglobal hflip_byte, 3, 5, 2, src, dst, w, x, v, src2
>> > mova m0, [pb_flip_byte]
>> > xor xq, xq ; <======
>> > mov wd, dword wm
>> > sub wq, mmsize * 2
>> > ;remove the cmp here <======
>> > jl .skip
>> >
>> > .loop0: ; process two xmm in the loop
>> > neg xq
>> > movu m1, [srcq + xq - mmsize + 1]
>> > movu m2, [srcq + xq - mmsize * 2 + 1] ; <======
>> > pshufb m1, m0
>> > pshufb m2, m0 ; <======
>> > neg xq
>> > movu [dstq + xq], m1
>> > movu [dstq + xq + mmsize], m2 ; <======
>> > add xq, mmsize * 2 ; <======
>> > cmp xq, wq
>> > jl .loop0
>> > RET ; add RET here
>> >
>> > ; MISSING: process one more xmm here if needed
>> >
>> > .skip:
>> > add wq, mmsize
>> > .loop1:
>> > neg xq
>> > mov vb, [srcq + xq]
>> > neg xq
>> > mov [dstq + xq], vb
>> > add xq, 1
>> > cmp xq, wq
>> > jl .loop1
>> > RET
>>
>> So what is wrong now?
>>
>
> I hadn't seen your email when I sent mine.
>
> checkasm results with your last patch (after changing "add xq, mmsize"
> to "add xq, mmsize * 2" in the short version):
> hflip_byte_c: 28.0
> hflip_byte_ssse3: 127.5
> hflip_short_c: 276.5
> hflip_short_ssse3: 100.2
>
Oops, fixed.
>
> Do you think it can work in all cases if you add a RET after the end of
> loop0?
No, it would try to read before src, and crash.
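
To illustrate the bounds problem, here is a minimal C sketch of the same
loop structure. It is not the code from the patch. It assumes, as the
quoted asm does, that src points at the last byte of the row and is read
backwards. The names hflip_byte_sketch, flip_block and BLOCK are
hypothetical, and the loop bound is one possible choice, not necessarily
the one the final patch uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK 16 /* assumption: stands in for mmsize of one XMM register */

/* Reverse one BLOCK-byte chunk; scalar stand-in for movu + pshufb + movu.
 * src_lo is the lowest address of the chunk being read. */
static void flip_block(const uint8_t *src_lo, uint8_t *dst)
{
    for (int i = 0; i < BLOCK; i++)
        dst[i] = src_lo[BLOCK - 1 - i];
}

/* src points at the LAST byte of the input row; valid input bytes are
 * src[-(w - 1)] .. src[0]. dst points at the first output byte. */
static void hflip_byte_sketch(const uint8_t *src, uint8_t *dst, ptrdiff_t w)
{
    ptrdiff_t x = 0;

    assert(w >= 0);

    /* Wide loop: two blocks per iteration. The lowest address touched is
     * src - x - 2 * BLOCK + 1, so the condition x + 2 * BLOCK <= w is what
     * keeps every load at or after the first byte of the row. */
    for (; x + 2 * BLOCK <= w; x += 2 * BLOCK) {
        flip_block(src - x - BLOCK + 1, dst + x);
        flip_block(src - x - 2 * BLOCK + 1, dst + x + BLOCK);
    }

    /* Tail: fewer than 2 * BLOCK bytes remain. Returning here instead
     * would leave them unwritten; running the wide loop once more instead
     * would read below the start of the row. */
    for (; x < w; x++)
        dst[x] = src[-x];
}
```

With w = 256 the wide loop covers the whole row and the tail never runs,
which is why a test at that single width cannot catch a missing or wrong
tail.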