[FFmpeg-devel] [PATCH] SSE3/4 implementation of flac_encode_residual_lpc

Loren Merritt lorenm
Sat May 30 23:30:28 CEST 2009


On Sat, 30 May 2009, Bobby Bingham wrote:
> On Fri, 29 May 2009, Loren Merritt wrote:
>
>> For the remainder, this logic should be doable
>> with just 1 paddd and 1 por per vector. Merge several vectors before
>> branching.
>
> I'm afraid I don't quite see what you mean by using 1 paddd and 1 por.
> The attached patch does have a slight improvement in this piece of
> code, but I doubt it's what you meant.

The C version of the test (true when x doesn't fit in int16) is:
(unsigned)(x+0x8000) >= 0x10000
And to merge several entries before the branch:
(unsigned)((x[0]+0x8000) | (x[1]+0x8000) | ...) >= 0x10000
Or, since SSE doesn't have an unsigned 32-bit compare:
(((x[0]+0x8000) | (x[1]+0x8000) | ...) >> 16) != 0
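
Spelled out as plain C, the three forms above might look like the sketch
below. The function names are made up for illustration and are not from the
patch. The bias is added in unsigned arithmetic so that values near INT32_MAX
wrap instead of overflowing a signed int.

```c
#include <stdint.h>

/* Straightforward reference: 1 if x does not fit in a signed 16-bit value. */
static int out_of_range_ref(int32_t x)
{
    return x < -32768 || x > 32767;
}

/* Biased form: adding 0x8000 maps [-32768, 32767] onto [0, 0xFFFF],
 * so a single unsigned compare replaces the two signed ones. */
static int out_of_range_biased(int32_t x)
{
    return (uint32_t)x + 0x8000u >= 0x10000u;
}

/* Merged form: OR the biased values together and look at the upper
 * 16 bits once. Any out-of-range entry leaves a bit set up there. */
static int any_out_of_range(const int32_t *x, int n)
{
    uint32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc |= (uint32_t)x[i] + 0x8000u;
    return (acc >> 16) != 0;
}
```

The merged form only says that some entry is out of range, not which one,
which is all a branch to a fallback path needs.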

This won't be much faster than yours, if at all, when testing one vector at a
time.
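
To illustrate merging before the branch, here is a rough sketch with SSE2
intrinsics rather than the inline asm in the patch. The function name, the
choice of two vectors per branch, and the requirement that n be a multiple
of 8 are assumptions made for the example. It falls back to a scalar loop
when SSE2 isn't available.

```c
#include <stdint.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  /* SSE2 intrinsics */
#define EXAMPLE_HAVE_SSE2 1
#endif

/* Returns 1 if any of the n values lies outside [-32768, 32767].
 * Assumes n is a multiple of 8 (two 4-lane vectors per branch). */
static int any_out_of_range_vec(const int32_t *x, int n)
{
#ifdef EXAMPLE_HAVE_SSE2
    const __m128i bias = _mm_set1_epi32(0x8000);
    const __m128i zero = _mm_setzero_si128();
    for (int i = 0; i < n; i += 8) {
        /* one paddd per vector */
        __m128i a = _mm_add_epi32(_mm_loadu_si128((const __m128i *)(x + i)), bias);
        __m128i b = _mm_add_epi32(_mm_loadu_si128((const __m128i *)(x + i + 4)), bias);
        /* one por to merge, then a single shift and test for both vectors */
        __m128i hi = _mm_srli_epi32(_mm_or_si128(a, b), 16);
        /* every lane must compare equal to zero: byte mask is 0xFFFF */
        if (_mm_movemask_epi8(_mm_cmpeq_epi32(hi, zero)) != 0xFFFF)
            return 1;
    }
    return 0;
#else
    /* Scalar fallback with the same result. */
    uint32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc |= (uint32_t)x[i] + 0x8000u;
    return (acc >> 16) != 0;
#endif
}
```

Merging more vectors per iteration spreads the cost of the shift, compare and
branch over more data, which is where any gain over a per-vector test would
come from.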

--Loren Merritt


