[FFmpeg-devel] [PATCH] SSE RDFT
Sat Mar 20 22:43:21 CET 2010
2010/3/20 Måns Rullgård <mans at mansr.com>
> Jason Garrett-Glaser <darkshikari at gmail.com> writes:
> > On Sun, Mar 14, 2010 at 3:23 PM, Alex Converse <alex.converse at gmail.com> wrote:
> >> I'm sure I've made some embarrassingly amateurish mistakes here.
> >> Feedback is more than welcome.
> >> --Alex
> > In the interests of getting away from discussions about yasm and into
> > actually reviewing the asm...
> > +///sign mask of RDFT sine terms
> > Three / ?
> > Looking at the asm overall, it looks like there's a huge amount of
> > moving stuff around and very little actual calculation. Is there no
> > better way to organize it?
> > + "movlps (%4,%0,4), %%xmm4 \n\t"
> > + "unpcklps %%xmm4, %%xmm4 \n\t"
> > + "movlps (%5,%0,4), %%xmm3 \n\t"
> > + "unpcklps %%xmm3, %%xmm3 \n\t"
> > This looks like a candidate for movsldup in an SSE3 version.
Sorry, I've been a little tied up trying to finish up PS.
There is a lot of data shuffling in here. One potential reduction is
reorganizing the trig tables, but keeping extra trig tables around is always
a bit controversial.
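
For reference, here is a rough scalar C sketch of what the two load patterns
produce per lane. It assumes that %4 and %5 in the quoted asm point at separate
cosine and sine tables (the quote does not say), and the interleaved layout and
all function names below are hypothetical, not part of the patch.

```c
#include <stddef.h>

/* Scalar stand-in for one XMM register; lane[0] is the lowest float. */
typedef struct { float lane[4]; } vec4;

/* Models "movlps (tab), x" + "unpcklps x, x": two floats in, each doubled. */
static vec4 load2_dup(const float *tab)
{
    vec4 v = {{ tab[0], tab[0], tab[1], tab[1] }};
    return v;
}

/* Models SSE3 "movsldup (tab), x": four floats in, even lanes doubled. */
static vec4 movsldup_model(const float *tab)
{
    vec4 v = {{ tab[0], tab[0], tab[2], tab[2] }};
    return v;
}

/* Models SSE3 "movshdup (tab), x": four floats in, odd lanes doubled. */
static vec4 movshdup_model(const float *tab)
{
    vec4 v = {{ tab[1], tab[1], tab[3], tab[3] }};
    return v;
}

/* Hypothetical table reorganization (an assumption, not from the patch):
 * out[2*i] = cos_tab[i], out[2*i + 1] = sin_tab[i]; out holds 2*n floats. */
static void interleave_tables(const float *cos_tab, const float *sin_tab,
                              float *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = cos_tab[i];
        out[2 * i + 1] = sin_tab[i];
    }
}
```

On an interleaved table, movsldup_model() returns the same lanes as
load2_dup() on the cosine table, and movshdup_model() the same as load2_dup()
on the sine table, so each movlps/unpcklps pair collapses into one load. On the
split tables movsldup would give {t0, t0, t2, t2} instead, so it is not a
drop-in replacement there. The real instructions also need a 16-byte aligned
memory operand.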