[FFmpeg-devel] [PATCH] SSE3/4 implementation of flac_encode_residual_lpc
Sun Jun 21 15:47:13 CEST 2009
On Thu, 18 Jun 2009 13:51:00 +0200
Michael Niedermayer <michaelni at gmx.at> wrote:
> On Sat, May 30, 2009 at 09:30:28PM +0000, Loren Merritt wrote:
> > On Sat, 30 May 2009, Bobby Bingham wrote:
> >> On Fri, 29 May 2009, Loren Merritt wrote:
> >>> For the remainder, this logic should be doable
> >>> with just 1 paddd and 1 por per vector. Merge several vectors
> >>> before branching.
> >> I'm afraid I don't quite see what you mean by using 1 paddd and 1
> >> por. The attached patch does have a slight improvement in this
> >> piece of code, but I doubt it's what you meant.
> > The C version is:
> > (unsigned)(x+0x8000) >= 0x10000
> > And to merge several entries before the branch:
> > (unsigned)((x0+0x8000) | (x1+0x8000) | ...) >= 0x10000
> > Or, since SSE doesn't have a uint32 compare:
> > (((x0+0x8000) | (x1+0x8000) | ...) >> 16) != 0
> > This won't be much if any faster than yours when testing one vector
> > at a time.
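
For reference, here is a minimal scalar C sketch of the range check quoted
above. It is not the code from the patch: the function names exceeds_int16
and any_exceeds_int16 are made up for illustration, and the addition is done
in unsigned arithmetic on the assumption that signed overflow should be
avoided in C. In a vector version the add would presumably map to paddd and
the OR to por, with a single test after several vectors have been merged.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if x lies outside the int16 range [-32768, 32767].
 * Adding 0x8000 shifts the valid range to [0, 0xFFFF], so any value
 * that does not fit has at least one of its upper 16 bits set. */
static int exceeds_int16(int32_t x)
{
    return ((uint32_t)x + 0x8000u) >= 0x10000u;
}

/* Returns 1 if any of the n values lies outside the int16 range.
 * The biased values are ORed together so only one test is needed
 * at the end instead of one per element. */
static int any_exceeds_int16(const int32_t *v, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc |= (uint32_t)v[i] + 0x8000u;  /* add, then merge with OR */
    return (acc >> 16) != 0;              /* no unsigned compare needed */
}
```

The OR can be merged safely because the result has a bit set at position 16
or above exactly when at least one of the biased inputs does.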
> What's the status of this patch?
> waiting for changes?
> ok to commit?
> want me to review it?
I haven't had much time to work on it lately. I want to try a couple
variations on Loren's idea and compare them before I submit it for