[FFmpeg-devel] [RFC] SSE3/4 implementation of flac_encode_residual_lpc
Bobby Bingham
uhmmmm
Sat May 23 05:40:27 CEST 2009
On Sun, 3 May 2009 21:21:19 -0700
Jason Garrett-Glaser <darkshikari at gmail.com> wrote:
> > "phaddd %%xmm1, %%xmm0 \n\t"
> > "phaddd %%xmm3, %%xmm2 \n\t"
> > "phaddd %%xmm2, %%xmm0 \n\t" // xmm0 = [p0, p1, p2, p3]
>
> Did you not find a better way of doing this without PHADD, given how
> slow it is?
The best I've come up with so far is this sequence, which avoids PHADDD
by using unpacks and PADDD, but I haven't been able to compare its
speed against the PHADDD version:
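// in: xmm0..xmm3 = a, b, c, d; out: xmm0 = [p0, p1, p2, p3], p0 = a0+a1+a2+a3
"movdqa %%xmm0, %%xmm4 \n\t" // xmm4 = a
"movdqa %%xmm2, %%xmm5 \n\t" // xmm5 = c
"punpckldq %%xmm1, %%xmm0 \n\t" // xmm0 = [a0, b0, a1, b1]
"punpckhdq %%xmm1, %%xmm4 \n\t" // xmm4 = [a2, b2, a3, b3]
"punpckldq %%xmm3, %%xmm2 \n\t" // xmm2 = [c0, d0, c1, d1]
"punpckhdq %%xmm3, %%xmm5 \n\t" // xmm5 = [c2, d2, c3, d3]
"paddd %%xmm4, %%xmm0 \n\t" // xmm0 = [a0+a2, b0+b2, a1+a3, b1+b3]
"paddd %%xmm5, %%xmm2 \n\t" // xmm2 = [c0+c2, d0+d2, c1+c3, d1+d3]
"movdqa %%xmm0, %%xmm1 \n\t"
"punpcklqdq %%xmm2, %%xmm0 \n\t" // xmm0 = [a0+a2, b0+b2, c0+c2, d0+d2]
"punpckhqdq %%xmm2, %%xmm1 \n\t" // xmm1 = [a1+a3, b1+b3, c1+c3, d1+d3]
"paddd %%xmm1, %%xmm0 \n\t" // xmm0 = [p0, p1, p2, p3]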
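For anyone who wants to check the lane shuffling in C, here is a minimal sketch of the same transpose-and-add reduction written with SSE2 intrinsics from <emmintrin.h>, assuming GCC or Clang on x86. The names hsum4_sse2, hsum4_arrays and hsum4_selftest are hypothetical helpers for illustration, not FFmpeg functions. Like the three PHADDD instructions, it produces [p0, p1, p2, p3], where each pN is the sum of the four dwords of one input register.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: returns [sum(a), sum(b), sum(c), sum(d)] using only
 * SSE2 unpacks and adds, mirroring the asm sequence above. */
static __m128i hsum4_sse2(__m128i a, __m128i b, __m128i c, __m128i d)
{
    __m128i ab_lo = _mm_unpacklo_epi32(a, b); /* [a0, b0, a1, b1] */
    __m128i ab_hi = _mm_unpackhi_epi32(a, b); /* [a2, b2, a3, b3] */
    __m128i cd_lo = _mm_unpacklo_epi32(c, d); /* [c0, d0, c1, d1] */
    __m128i cd_hi = _mm_unpackhi_epi32(c, d); /* [c2, d2, c3, d3] */
    __m128i ab = _mm_add_epi32(ab_lo, ab_hi); /* [a0+a2, b0+b2, a1+a3, b1+b3] */
    __m128i cd = _mm_add_epi32(cd_lo, cd_hi); /* [c0+c2, d0+d2, c1+c3, d1+d3] */
    __m128i lo = _mm_unpacklo_epi64(ab, cd);  /* [a0+a2, b0+b2, c0+c2, d0+d2] */
    __m128i hi = _mm_unpackhi_epi64(ab, cd);  /* [a1+a3, b1+b3, c1+c3, d1+d3] */
    return _mm_add_epi32(lo, hi);             /* [p0, p1, p2, p3] */
}

/* Hypothetical wrapper: loads four rows of four int32, stores the sums. */
static void hsum4_arrays(const int32_t v[4][4], int32_t out[4])
{
    __m128i r = hsum4_sse2(_mm_loadu_si128((const __m128i *)v[0]),
                           _mm_loadu_si128((const __m128i *)v[1]),
                           _mm_loadu_si128((const __m128i *)v[2]),
                           _mm_loadu_si128((const __m128i *)v[3]));
    _mm_storeu_si128((__m128i *)out, r);
}

/* Hypothetical self-check: compares the SIMD result with plain scalar sums. */
static void hsum4_selftest(const int32_t v[4][4])
{
    int32_t out[4];
    hsum4_arrays(v, out);
    for (int i = 0; i < 4; i++)
        assert(out[i] == v[i][0] + v[i][1] + v[i][2] + v[i][3]);
}
```

The unpacks perform the 4x4 dword transpose in two stages with a PADDD after each, so every output lane ends up holding the full sum of one input register. Everything here is SSE2, whereas PHADDD requires SSSE3.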
--
Bobby Bingham