[FFmpeg-devel] [PATCH 6/6] lossless audio dsp: unroll

James Almer jamrial at gmail.com
Tue Apr 19 01:37:21 CEST 2016


On 4/18/2016 2:52 PM, Christophe Gisquet wrote:
> 2016-04-18 19:15 GMT+02:00 James Almer <jamrial at gmail.com>:
>> On 4/18/2016 10:07 AM, Christophe Gisquet wrote:
>>> The loop counts are guaranteed to be multiples of 8, so this
>>> unrolling is safe and allows better use of the execution ports.
>>>
>>> For int32 version: 72 -> 57c.
>>
>> What compiler are you using, and what cpu at configure time?
> 
> gcc 5.1, Win64, haswell. I don't use mingw64 compiler.
> 
>> We're currently enabling tree vectorization for gcc 4.9 or newer on x86,
>> and at least with gcc 5.3.0 on mingw-w64 the resulting code now seems worse.
>> I didn't bench it, but after this patch it's not being vectorized anymore.
> 
> The code I benchmarked as being 72c is vectorized and keeps being
> vectorized here. It actually looks better than the previously
> vectorized one.
> 
> The 16_c version is no longer vectorized, but is really a mess here
> when vectorized.

The 16_c one isn't important since we have SSE2 and even MMXEXT versions. But
you're right that the 32_c one remains vectorized, even when targeting <SSE4
CPUs, so the patch should be good.
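
For anyone following along, here is a rough sketch of the kind of unrolling
being discussed. It is not the code from the patch: the function names, the
exact loop body, and the unroll factor of 2 are assumptions made for
illustration only. The one thing taken from the thread is that the length is
a multiple of 8, which is what makes dropping the tail handling safe.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference loop: one element per iteration.
 * Accumulates v1[i] * v2[i], then updates v1[i] += mul * v3[i]. */
static int32_t dot_and_madd_ref(int16_t *v1, const int32_t *v2,
                                const int16_t *v3, int len, int mul)
{
    uint32_t res = 0; /* unsigned accumulator, so wraparound is well defined */

    for (int i = 0; i < len; i++) {
        res  += (uint32_t)v1[i] * (uint32_t)v2[i];
        v1[i] = (int16_t)(v1[i] + mul * v3[i]);
    }
    return (int32_t)res;
}

/* Hypothetical unrolled variant: two elements per iteration, no tail loop.
 * Only valid when len is a positive multiple of the unroll factor; a
 * multiple of 8, as assumed in the thread, satisfies that. */
static int32_t dot_and_madd_unrolled(int16_t *v1, const int32_t *v2,
                                     const int16_t *v3, int len, int mul)
{
    uint32_t res = 0;

    assert(len > 0 && len % 8 == 0);

    do {
        /* The two steps touch different elements, so the CPU (or the
         * vectorizer) is free to overlap them. */
        res  += (uint32_t)v1[0] * (uint32_t)v2[0];
        v1[0] = (int16_t)(v1[0] + mul * v3[0]);
        res  += (uint32_t)v1[1] * (uint32_t)v2[1];
        v1[1] = (int16_t)(v1[1] + mul * v3[1]);
        v1 += 2;
        v2 += 2;
        v3 += 2;
    } while (len -= 2);

    return (int32_t)res;
}
```

Both variants should return the same sum and leave v1 in the same state for
any valid length. Whether the unrolled form ends up faster, or stays
vectorized at all, depends on the compiler and target, as the exchange above
shows.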

