[FFmpeg-devel] [PATCH 4/5] x86/opusdsp: implement FMA3 accelerated postfilter and deemphasis

James Almer jamrial at gmail.com
Mon Apr 1 17:59:10 EEST 2019


On 4/1/2019 9:13 AM, Lynne wrote:
> Mar 31, 2019, 11:49 PM by jamrial at gmail.com:
> 
>> A float ret value needs to be in xmm0, and you swapped m0 with m2 on
>> Win64. This is the source of the fate failure.
>>
> Attached a patch to fix this.
> 
>> %if WIN64
>> -    SWAP 0, 2
>> -%endif
>> +    shufps m0, m2, 0
>> +%else
>>       shufps m0, m0, 0
>> +%endif
>> %endif
> 
> 0001-x86-opusdsp-fix-WIN64-return-value.patch
> 
> From 98e93b6f322a2a9dee7499fe87b74cf50a33b022 Mon Sep 17 00:00:00 2001
> From: Lynne <dev at lynne.ee>
> Date: Mon, 1 Apr 2019 13:06:34 +0100
> Subject: [PATCH] x86/opusdsp: fix WIN64 return value
> 
> ---
>  libavcodec/x86/opusdsp.asm | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/libavcodec/x86/opusdsp.asm b/libavcodec/x86/opusdsp.asm
> index ed65614e06..a7bff4f0b3 100644
> --- a/libavcodec/x86/opusdsp.asm
> +++ b/libavcodec/x86/opusdsp.asm
> @@ -40,9 +40,10 @@ cglobal opus_deemphasis, 4, 4, 8, out, in, coeff, len
>      VBROADCASTSS m0, coeffm
>  %else
>  %if WIN64
> -    SWAP 0, 2
> -%endif
> +    shufps m0, m2, 0

No, like this shufps will interleave the values from m2 and m0. The
correct instruction would be

shufps m0, m2, m2, 0

Where m2 is used for both input arguments, and the VEX encoding lets us
use another reg as output.
Tested and pushed it with that change. Thanks.


More information about the ffmpeg-devel mailing list