[FFmpeg-devel] [WIP][PATCH] Opus Piramid Vector Quantization Search in x86 SIMD asm

Mon Jun 12 02:31:51 EEST 2017

On 2017-06-11 11:34, Ivan Kalvachev wrote:
> On 6/10/17, James Darnley <james.darnley at gmail.com> wrote:
>> On 2017-06-09 13:41, Ivan Kalvachev wrote:
>>>
>>> const_*_edge is used on only one place is the code.
>>> Would you check if this patch fixes the issue.
>>>
>>> I expected that the addresses would be pre-calculated
>>> by n/yasm as one value and indexed
>>> relative to the section start.
>>> Instead it seems that each entry is represented with
>>> its own address and offset from it.
>>> Since the offset is negative it uses all 64 bits and
>>> it makes difference if it is truncated to 32 bits.
>>>
>>> Same issue could happen with clang tools.
>>
>> The problem is with the relative addressing.  You need to load the real
>> address first before you can offset with another register at runtime. So
>> something like:
>>
>>> mov  reg1,  [read_only_const]
> lea ?

Yes, sorry about that.

> ======
> --- a/libavcodec/x86/opus_pvq_search.asm
> +++ b/libavcodec/x86/opus_pvq_search.asm
> @@ -406,7 +406,7 @@ align 16
>  ; uint32 N      - Number of vector elements. Must be 0 < N < 8192
>  ;
>  %macro PVQ_FAST_SEARCH 0
> -cglobal pvq_search,4,5,8, mmsize, inX, outY, K, N
> +cglobal pvq_search,4,6,8, mmsize, inX, outY, K, N
>  %define tmpX rsp
> 
>  ;        movsxdifnidn Nq,  Nd
> @@ -419,7 +419,12 @@ cglobal pvq_search,4,5,8, mmsize, inX, outY, K, N
>          add         Nq,   r4q           ; Nq = align(Nq, mmsize)
>          sub         rsp,  Nq            ; allocate tmpX[Nq]
> 
> +%ifdef PIC
> +        lea         r5q,  [const_align_abs_edge]            ; rip+const
> +        movups      m3,   [r5q+r4q-mmsize]                  ; this is the bit mask for the padded read at the end of the input
> +%else
>          movups      m3,   [const_align_abs_edge-mmsize+r4q] ; this is the bit mask for the padded read at the end of the input
> +%endif
> 
>          lea         r4q,  [Nq-mmsize]   ; Nq is rounded up (aligned up) to mmsize, so r4q can't become negative here, unless N=0.
>          movups      m2,   [inXq + r4q]
> ======

FYI that diff works just fine, when I correct the line wrapping.  I can
compile and run on Win64.  This is not a comment about the original patch.