[Ffmpeg-devel] [PATCH] another vorbis optimization

Michael Niedermayer michaelni
Tue Aug 8 12:14:56 CEST 2006


On Mon, Aug 07, 2006 at 11:32:34PM -0700, Loren Merritt wrote:
> Another 6% faster vorbis decoding. But I am unsure as to the cleanest way 
> to integrate it with run-time cpu detection.

Hmm, currently we add the BIAS during the windowing. Maybe it could
be done prior to the IMDCT and only to a few coeffs. I am not sure though, my
knowledge of the MDCT/windowing stuff isn't too good.
Someone would simply have to feed a constant (=BIAS) vector through
the windowing + MDCT to see if the resulting vector is (approximately) sparse
or not.

Alternatively, you could optimize the windowing stuff in SSE(2) too; then there
would be no extra special case :)

> +        ::"r"(15<<23)
> +    );
> +    for(i=0;i<len;i+=4) {
> +        asm volatile(
> +            "movdqa       %1, %%xmm0 \n\t"
> +            "paddd    %%xmm7, %%xmm0 \n\t"

I think that can be avoided by simply multiplying the windows by 1<<15
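
A small sketch of why the two are interchangeable, assuming the paddd of
15<<23 is there to bump the exponent field of each IEEE-754 single by 15
(the helper names below are hypothetical, not from the patch):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Add 15 to the biased exponent by integer-adding 15<<23 to the bit
 * pattern, which is what a paddd on packed floats would do per lane. */
static float scale_by_exponent_add(float x)
{
    uint32_t bits;
    assert(sizeof bits == sizeof x);
    memcpy(&bits, &x, sizeof bits);
    bits += 15u << 23;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* The same scaling as an ordinary multiply by 1<<15, which could be
 * folded into the window table once instead of being done per sample. */
static float scale_by_multiply(float x)
{
    return x * (float)(1 << 15);
}
```

For normal, non-zero inputs whose exponent does not overflow, both give the
same value. They differ for 0.0 and denormals, where the integer add produces
a tiny non-zero number instead of leaving the value at (or near) zero.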

> +            "cvtps2dq %%xmm0, %%xmm0 \n\t"

If that is replaced by cvtps2pi and the code below changed accordingly, then
the code should run on SSE1 CPUs. If it's slower, a separate SSE1 variant
could be added too. That's of course just an idea; I am happy with SSE2 code
too, just my CPU here isn't :)
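
A rough sketch of what an SSE1-only conversion might look like with
intrinsics, not the patch's actual code. It assumes the goal is four floats
to four saturated int16 values; the function name is hypothetical, and the
scalar fallback exists only so the example builds on non-x86 targets.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE__) && defined(__MMX__)
#include <xmmintrin.h>
#define HAVE_SSE1_PATH 1
#else
#define HAVE_SSE1_PATH 0
#endif

/* Convert 4 floats to 4 int16 with saturation, using only SSE1 + MMX. */
static void float4_to_int16_sse1(const float *src, int16_t *dst)
{
    assert(src && dst);
#if HAVE_SSE1_PATH
    __m128 v  = _mm_loadu_ps(src);
    __m64  lo = _mm_cvtps_pi32(v);                   /* cvtps2pi, floats 0,1 */
    __m64  hi = _mm_cvtps_pi32(_mm_movehl_ps(v, v)); /* cvtps2pi, floats 2,3 */
    __m64  packed = _mm_packs_pi32(lo, hi);          /* packssdw, saturating */
    memcpy(dst, &packed, sizeof packed);
    _mm_empty();                                     /* emms: leave MMX state */
#else
    /* Fallback: rounds halves away from zero, unlike cvtps2pi, which
     * follows the current rounding mode (nearest-even by default). */
    for (int i = 0; i < 4; i++) {
        float x = src[i];
        long  r = (long)(x + (x >= 0 ? 0.5f : -0.5f));
        if (r >  32767) r =  32767;
        if (r < -32768) r = -32768;
        dst[i] = (int16_t)r;
    }
#endif
}
```

Compared with a single cvtps2dq, this needs two conversions per four floats
plus an emms before any later x87 code, which is one reason it might turn out
slower.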

Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

In the past you could go to a library and read, borrow or copy any book
Today you'd get arrested for mere telling someone where the library is
