[FFmpeg-devel] [RFC] SSE3/4 implementation of flac_encode_residual_lpc

Bobby Bingham uhmmmm
Sat May 23 05:40:27 CEST 2009


On Sun, 3 May 2009 21:21:19 -0700
Jason Garrett-Glaser <darkshikari at gmail.com> wrote:
> > "phaddd     %%xmm1, %%xmm0          \n\t"
> > "phaddd     %%xmm3, %%xmm2          \n\t"
> > "phaddd     %%xmm2, %%xmm0          \n\t"   // xmm0 = [p0, p1, p2, p3]
> 
> Did you not find a better way of doing this without PHADD, given how
> slow it is?

The best I've come up with so far is this SSE2-only unpack-and-add
sequence, but I can't compare the speed:

"movdqa     %%xmm0, %%xmm4          \n\t"
"movdqa     %%xmm2, %%xmm5          \n\t"
"punpckldq  %%xmm1, %%xmm0          \n\t"
"punpckhdq  %%xmm1, %%xmm4          \n\t"
"punpckldq  %%xmm3, %%xmm2          \n\t"
"punpckhdq  %%xmm3, %%xmm5          \n\t"
"paddd      %%xmm4, %%xmm0          \n\t"
"paddd      %%xmm5, %%xmm2          \n\t"
"movdqa     %%xmm0, %%xmm1          \n\t"
"punpcklqdq %%xmm2, %%xmm0          \n\t"
"punpckhqdq %%xmm2, %%xmm1          \n\t"
"paddd      %%xmm1, %%xmm0          \n\t"
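
For anyone who wants to check the lane arithmetic without assembling it,
here is a rough sketch of the same sequence in C with SSE2 intrinsics. It
assumes xmm0..xmm3 each hold four 32-bit partial sums on entry, as in the
quoted PHADDD version. The names hsum4_sse2 and hsum4_rows are made up for
illustration and are not part of the patch. It says nothing about speed.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: returns [sum(a), sum(b), sum(c), sum(d)],
 * the same result as the three PHADDD instructions, using only
 * unpacks and adds. a..d stand in for xmm0..xmm3. */
static __m128i hsum4_sse2(__m128i a, __m128i b, __m128i c, __m128i d)
{
    __m128i ab_lo = _mm_unpacklo_epi32(a, b);   /* [a0, b0, a1, b1] */
    __m128i ab_hi = _mm_unpackhi_epi32(a, b);   /* [a2, b2, a3, b3] */
    __m128i cd_lo = _mm_unpacklo_epi32(c, d);   /* [c0, d0, c1, d1] */
    __m128i cd_hi = _mm_unpackhi_epi32(c, d);   /* [c2, d2, c3, d3] */

    __m128i ab = _mm_add_epi32(ab_lo, ab_hi);   /* [a0+a2, b0+b2, a1+a3, b1+b3] */
    __m128i cd = _mm_add_epi32(cd_lo, cd_hi);   /* [c0+c2, d0+d2, c1+c3, d1+d3] */

    __m128i even = _mm_unpacklo_epi64(ab, cd);  /* [a0+a2, b0+b2, c0+c2, d0+d2] */
    __m128i odd  = _mm_unpackhi_epi64(ab, cd);  /* [a1+a3, b1+b3, c1+c3, d1+d3] */

    return _mm_add_epi32(even, odd);
}

/* Hypothetical wrapper for testing from plain arrays (unaligned loads). */
static void hsum4_rows(const int32_t *a, const int32_t *b,
                       const int32_t *c, const int32_t *d, int32_t out[4])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i vc = _mm_loadu_si128((const __m128i *)c);
    __m128i vd = _mm_loadu_si128((const __m128i *)d);
    _mm_storeu_si128((__m128i *)out, hsum4_sse2(va, vb, vc, vd));
}
```

Comparing the output of hsum4_rows against plain scalar sums of each input
row is enough to confirm that the result matches the PHADDD layout.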

-- 
Bobby Bingham