[FFmpeg-devel] [PATCH 1/2]v2 Add macros used in opus_pvq_search to x86util.asm

Ivan Kalvachev ikalvachev at gmail.com
Sun Aug 6 15:36:45 EEST 2017


On 8/6/17, Henrik Gramner <henrik at gramner.com> wrote:
> On Sat, Aug 5, 2017 at 9:10 PM, Ivan Kalvachev <ikalvachev at gmail.com> wrote:
>> +%macro VBROADCASTSS 2 ; dst xmm/ymm, src m32/xmm
>> +%if cpuflag(avx2)
>> +    vbroadcastss  %1, %2                    ; ymm, xmm
>> +%elif cpuflag(avx)
>> +    %ifnum sizeof%2         ; avx1 register
>> +        vpermilps  xmm%1, xmm%2, q0000      ; xmm, xmm, imm || ymm, ymm, imm
>
> Nit: Use shufps instead of vpermilps, it's one byte shorter but
> otherwise identical in this case.
>
> c5 e8 c6 ca 00    vshufps xmm1,xmm2,xmm2,0x0
> c4 e3 79 04 ca 00 vpermilps xmm1,xmm2,0x0

It also has one cycle less latency on some older AMD CPUs.

Done.
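A small scalar C model shows why the substitution is safe. It is a sketch that assumes the documented Intel lane-selection rules for the 128-bit forms. The function names (`model_shufps`, `model_vpermilps`, `shufps_matches_vpermilps`) are hypothetical and are not part of FFmpeg or x86util.asm. When both sources of shufps are the same register, every result lane is picked from that one register, exactly as vpermilps does with an immediate. So `q0000` (imm8 = 0) broadcasts lane 0 either way.

```c
#include <stdint.h>

/* Hypothetical scalar model of 128-bit SHUFPS/VSHUFPS: the two low result
 * lanes come from the first source, the two high lanes from the second,
 * each chosen by a 2-bit field of imm8. A temporary is used so dst may
 * alias a or b, like a register operand. */
static void model_shufps(float dst[4], const float a[4], const float b[4], uint8_t imm)
{
    float r[4];
    r[0] = a[imm & 3];
    r[1] = a[(imm >> 2) & 3];
    r[2] = b[(imm >> 4) & 3];
    r[3] = b[(imm >> 6) & 3];
    for (int i = 0; i < 4; i++)
        dst[i] = r[i];
}

/* Hypothetical scalar model of 128-bit VPERMILPS with an immediate:
 * every result lane is chosen from the single source. */
static void model_vpermilps(float dst[4], const float a[4], uint8_t imm)
{
    float r[4];
    for (int i = 0; i < 4; i++)
        r[i] = a[(imm >> (2 * i)) & 3];
    for (int i = 0; i < 4; i++)
        dst[i] = r[i];
}

/* Returns 1 if shufps with both sources equal matches vpermilps for every imm8. */
static int shufps_matches_vpermilps(const float a[4])
{
    for (int imm = 0; imm < 256; imm++) {
        float s[4], p[4];
        model_shufps(s, a, a, (uint8_t)imm);
        model_vpermilps(p, a, (uint8_t)imm);
        for (int i = 0; i < 4; i++)
            if (s[i] != p[i])
                return 0;
    }
    return 1;
}
```

The byte difference in the disassembly quoted above follows from the encoding: vshufps is in the 0F opcode map and fits the 2-byte VEX prefix (c5), while vpermilps is in the 0F3A map and needs the 3-byte VEX prefix (c4).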


>> +%macro BLENDVPS 3 ; dst/src_a, src_b, mask
>> +%if cpuflag(avx)
>> +    blendvps  %1, %1, %2, %3
>> +%elif cpuflag(sse4)
>> +    %if notcpuflag(avx)
>> +        %ifnidn %3,xmm0
>> +            %error sse41 blendvps uses xmm0 as default 3d operand, you used %3
>> +        %endif
>> +    %endif
>
> notcpuflag(avx) is redundant (it's always true since AVX uses the first
> branch).

Done.

It was a remnant from the time when I had a label to turn different
implementations on and off.
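The xmm0 restriction can be made concrete with a scalar C model of BLENDVPS. This is again a sketch based on the documented semantics, and `model_blendvps` is a hypothetical name. Only the sign bit of each mask lane matters. The legacy SSE4.1 encoding hard-wires xmm0 as the mask, which is what the %error check guards against; the VEX form used in the avx branch takes the mask as an explicit fourth operand, so that branch needs no such check.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of 128-bit BLENDVPS: each result lane takes
 * src_b when the sign bit (bit 31) of the matching mask lane is set and
 * keeps src_a otherwise. In the SSE4.1 encoding the mask is implicitly
 * xmm0; the AVX (VEX) form names it as a 4th operand. */
static void model_blendvps(float dst[4], const float a[4], const float b[4],
                           const float mask[4])
{
    float r[4];
    for (int i = 0; i < 4; i++) {
        uint32_t bits;
        memcpy(&bits, &mask[i], sizeof bits); /* read the float's bit pattern */
        r[i] = (bits & 0x80000000u) ? b[i] : a[i];
    }
    for (int i = 0; i < 4; i++)
        dst[i] = r[i];
}
```

Note that a mask lane of -0.0f also selects src_b, because only the sign bit is tested.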


Best Regards

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Add-macros-to-x86util.asm.patch
Type: text/x-patch
Size: 4089 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20170806/6a5d83f4/attachment.bin>

