[FFmpeg-devel] [RFC] SSE3/4 implementation of flac_encode_residual_lpc

Jason Garrett-Glaser darkshikari
Sat May 23 19:49:47 CEST 2009


On Sat, May 23, 2009 at 1:40 PM, Bobby Bingham <uhmmmm at gmail.com> wrote:
> On Sat, 23 May 2009 07:00:59 -0400
> Jason Garrett-Glaser <darkshikari at gmail.com> wrote:
>
>> On Fri, May 22, 2009 at 11:40 PM, Bobby Bingham <uhmmmm at gmail.com>
>> wrote:
>> > On Sun, 3 May 2009 21:21:19 -0700
>> > Jason Garrett-Glaser <darkshikari at gmail.com> wrote:
>> >> > "phaddd     %%xmm1, %%xmm0          \n\t"
>> >> > "phaddd     %%xmm3, %%xmm2          \n\t"
>> >> > "phaddd     %%xmm2, %%xmm0          \n\t"   // xmm0 = [p0, p1, p2, p3]
>> >>
>> >> Did you not find a better way of doing this without PHADD, given
>> >> how slow it is?
>> >
>> > The best I've come up with so far is this, but I can't compare the
>> > speed:
>> >
>> > "movdqa     %%xmm0, %%xmm4          \n\t"
>> > "movdqa     %%xmm2, %%xmm5          \n\t"
>> > "punpckldq  %%xmm1, %%xmm0          \n\t"
>> > "punpckhdq  %%xmm1, %%xmm4          \n\t"
>> > "punpckldq  %%xmm3, %%xmm2          \n\t"
>> > "punpckhdq  %%xmm3, %%xmm5          \n\t"
>> > "paddd      %%xmm4, %%xmm0          \n\t"
>> > "paddd      %%xmm5, %%xmm2          \n\t"
>> > "movdqa     %%xmm0, %%xmm1          \n\t"
>> > "punpcklqdq %%xmm2, %%xmm0          \n\t"
>> > "punpckhqdq %%xmm2, %%xmm1          \n\t"
>> > "paddd      %%xmm1, %%xmm0          \n\t"
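For anyone who wants to check the unpack-based sequence quoted above before timing it, here is a minimal sketch in C with SSE2 intrinsics. It assumes the intent is the same as the three phaddd instructions: xmm0..xmm3 hold four vectors of four 32-bit values, and the result holds one sum per input vector. The function names are made up for illustration and are not from the patch, and the compiler is free to pick different instructions than the hand-written asm.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Returns [sum(a), sum(b), sum(c), sum(d)] using only SSE2 unpacks and adds,
 * mirroring the punpck/paddd sequence step by step. */
static __m128i hsum4_unpack(__m128i a, __m128i b, __m128i c, __m128i d)
{
    __m128i ab_lo = _mm_unpacklo_epi32(a, b);   /* [a0, b0, a1, b1] */
    __m128i ab_hi = _mm_unpackhi_epi32(a, b);   /* [a2, b2, a3, b3] */
    __m128i cd_lo = _mm_unpacklo_epi32(c, d);   /* [c0, d0, c1, d1] */
    __m128i cd_hi = _mm_unpackhi_epi32(c, d);   /* [c2, d2, c3, d3] */
    __m128i ab = _mm_add_epi32(ab_lo, ab_hi);   /* [a0+a2, b0+b2, a1+a3, b1+b3] */
    __m128i cd = _mm_add_epi32(cd_lo, cd_hi);   /* [c0+c2, d0+d2, c1+c3, d1+d3] */
    __m128i even = _mm_unpacklo_epi64(ab, cd);  /* [a0+a2, b0+b2, c0+c2, d0+d2] */
    __m128i odd  = _mm_unpackhi_epi64(ab, cd);  /* [a1+a3, b1+b3, c1+c3, d1+d3] */
    return _mm_add_epi32(even, odd);
}

/* Hypothetical wrapper: rows holds four consecutive 4-element vectors. */
void hsum4_rows(const int32_t rows[16], int32_t out[4])
{
    __m128i a = _mm_loadu_si128((const __m128i *)(rows + 0));
    __m128i b = _mm_loadu_si128((const __m128i *)(rows + 4));
    __m128i c = _mm_loadu_si128((const __m128i *)(rows + 8));
    __m128i d = _mm_loadu_si128((const __m128i *)(rows + 12));
    _mm_storeu_si128((__m128i *)out, hsum4_unpack(a, b, c, d));
}

/* Compares the vector result against plain scalar sums. */
void hsum4_self_check(void)
{
    int32_t rows[16], out[4];
    for (int i = 0; i < 16; i++)
        rows[i] = i * i - 7 * i;
    hsum4_rows(rows, out);
    for (int r = 0; r < 4; r++) {
        int32_t sum = 0;
        for (int k = 0; k < 4; k++)
            sum += rows[4 * r + k];
        assert(out[r] == sum);
    }
}
```

This only checks correctness. Whether it beats phaddd depends on the CPU and still has to be measured.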
>>
>> You really should not be writing assembly without a system to test it
>> on.
>>
>> Various people have shell accounts they can loan you--for example,
>> checkers on #x264 can give out shell accounts on Penryn-based Linux
>> systems.
>
> In the meantime, here's an SSE2 version I have tested.  I'm not really
> happy with calling the C version for the cases where 32 bit multiplies
> are needed, but I haven't found the time yet to implement that in <SSE4.
>
> Also, with this patch, gcc warns that need32 might be used
> uninitialized, but it is always initialized by the assembly.  Does
> someone know how to silence this warning?
>
> --
> Bobby Bingham

> "movlhps    %%xmm3,  %%xmm5         \n\t"
> "movhlps    %%xmm4,  %%xmm5         \n\t"

Have you tried replacing this with movdqa/shufpd?  That should have one
cycle less latency.  This is actually the first place I've ever seen
shufpd be potentially useful.
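
A rough sketch of the substitution, written with C intrinsics so it can be checked without touching the asm. It assumes the two quoted instructions are meant to leave xmm5 = [high qword of xmm4, low qword of xmm3], which is what AT&T operand order gives. The shufpd immediate of 1 is worked out from that assumption and is not taken from the patch. The function names are made up for illustration.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2; also pulls in the SSE movelh/movehl intrinsics */
#include <stdint.h>
#include <string.h>

/* Models: movlhps xmm3, xmm5 ; movhlps xmm4, xmm5 (AT&T operand order). */
static __m128i merge_movlhps_movhlps(__m128i x3, __m128i x4, __m128i x5)
{
    __m128 r = _mm_castsi128_ps(x5);
    r = _mm_movelh_ps(r, _mm_castsi128_ps(x3)); /* high half <- low half of x3  */
    r = _mm_movehl_ps(r, _mm_castsi128_ps(x4)); /* low half  <- high half of x4 */
    return _mm_castps_si128(r);
}

/* Models: movdqa xmm4, xmm5 ; shufpd $1, xmm3, xmm5. */
static __m128i merge_movdqa_shufpd(__m128i x3, __m128i x4)
{
    __m128d r = _mm_castsi128_pd(x4);
    /* imm bit 0 = 1: low half  <- high half of r (xmm4)
     * imm bit 1 = 0: high half <- low half of xmm3 */
    r = _mm_shuffle_pd(r, _mm_castsi128_pd(x3), 1);
    return _mm_castpd_si128(r);
}

/* Hypothetical wrapper so the result lanes can be inspected from plain C. */
void merge_shufpd_lanes(const int32_t x3[4], const int32_t x4[4], int32_t out[4])
{
    __m128i v3 = _mm_loadu_si128((const __m128i *)x3);
    __m128i v4 = _mm_loadu_si128((const __m128i *)x4);
    _mm_storeu_si128((__m128i *)out, merge_movdqa_shufpd(v3, v4));
}

/* Checks that both sequences produce the same register contents. */
void merge_self_check(void)
{
    const int32_t x3[4] = { 0, 1, 2, 3 };
    const int32_t x4[4] = { 10, 11, 12, 13 };
    int32_t a[4], b[4];
    __m128i v3 = _mm_loadu_si128((const __m128i *)x3);
    __m128i v4 = _mm_loadu_si128((const __m128i *)x4);
    /* The old contents of xmm5 are fully overwritten, so any value works here. */
    _mm_storeu_si128((__m128i *)a,
                     merge_movlhps_movhlps(v3, v4, _mm_set1_epi32(-1)));
    merge_shufpd_lanes(x3, x4, b);
    assert(memcmp(a, b, sizeof a) == 0);
}
```

With xmm3 = [0, 1, 2, 3] and xmm4 = [10, 11, 12, 13], both versions give [12, 13, 0, 1]. The latency claim is not something this sketch can show; that needs timing on real hardware.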

Dark Shikari


