[FFmpeg-devel] [PATCH]v5 Opus Pyramid Vector Quantization Search in x86 SIMD asm

Rostislav Pehlivanov atomnuker at gmail.com
Sun Jul 23 03:47:45 EEST 2017


On 22 July 2017 at 12:18, Ivan Kalvachev <ikalvachev at gmail.com> wrote:

> This patch is ready for review and inclusion.
>
>
>
>+%macro emu_blendvps 3 ; dst/src_a, src_b, mask
>+%macro lea_pic_base 2; reg, const_label
Capitalize macro names.


>+      %error sse41 blendvps uses xmm0 as default 3d operand, you used %3
Remove this error

>+    %error "blendvps" emulation needs SSE
Definitely remove this error too.

>+        %error AVX1 does not support integer 256bit ymm operations
And this one is particularly pointless...

>+      %error sse41 blendvps uses xmm0 as default 3 operand, you used %3
Same...

>+    %error "broadcastss" emulation needs SSE
Same...

>+    %error "pbroadcastd" emulation needs SSE2
Yep, the same...

>+    %error pick HSUMPS implementation
Again...

All of these are pointless and unnecessary. Always assume at least SSE2.


>+
>+; 256bit version seems slower than the 128bit AVX on all current CPU's
>+; So disable it for now and keep the code in case something changes in
future.
>+%if HAVE_AVX2_EXTERNAL > 0 && 0
>+INIT_YMM avx2
>+PVQ_FAST_SEARCH
>+%endif

Remove this altogether.


>+%if 1
>+    emu_pbroadcastd m1,   xmm1
...
>+%else
>+        ; Ryzen is the only CPU at the moment that doesn't have
>+        ; write forwarding stall and where this code is faster
>+        ; with about 7 cycles on avarage.
>+        %{1}ps      m5,   mm_const_float_1
>+        movss       [tmpY + r4q], xmm5      ; Y[max_idx] +-= 1.0;

Remove the %else and always use the first bit of code.


>+%if cpuflag(avx2) && 01
>+%elif cpuflag(avx) && 01
>+%if cpuflag(avx2) && 01

Remove the && 01 in these 3 places.


>+;      vperm2f128   %1,   %1,   %1,   0 ; ymm, ymm, ymm     ; 2-3c 1/1

Remove this commented out code.


>+cglobal pvq_search, 4, 5+num_pic_regs, 11, 256*4, inX, outY, K, N
>+%if ARCH_X86_64 == 0    ;sbrdsp

You're using more than 11 xmm regs, so the entire code is always going to
need a 64 bit system.
So wrap the entire file, from top to bottom after the %include in
%if ARCH_X86_64
<everything>
%endif


>+SECTION_RODATA 64
>+
>+const_float_abs_mask:   times 8 dd 0x7fffffff
>+const_align_abs_edge:   times 8 dd 0
>+
>+const_float_0_5:        times 8 dd 0.5
>+const_float_1:          times 8 dd 1.0
>+const_float_sign_mask:  times 8 dd 0x80000000
>+
>+const_int32_offsets:
>+                        %rep 8
>+                                dd $-const_int32_offsets
>+                        %endrep
>+SECTION .text

This whole thing needs to go right at the very top after the %include. All
macros must then follow it.
Having read only sections in the middle of the file is very confusing and
not at all what all of our asm code follows.


More information about the ffmpeg-devel mailing list