[FFmpeg-devel] [PATCH] SSE RDFT

Jason Garrett-Glaser darkshikari
Mon Mar 15 01:53:14 CET 2010


On Sun, Mar 14, 2010 at 3:23 PM, Alex Converse <alex.converse at gmail.com> wrote:
> I'm sure I've made some embarrassingly amateurish mistakes here.
> Feedback is more than welcome.
>
> --Alex

In the interests of getting away from discussions about yasm and into
actually reviewing the asm...

+///sign mask of RDFT sine terms

Three slashes?

Looking at the asm overall, it looks like there's a huge amount of
moving stuff around and very little actual calculation.  Is there no
better way to organize it?

+        "movlps     (%4,%0,4), %%xmm4     \n\t"
+        "unpcklps      %%xmm4, %%xmm4     \n\t"
+        "movlps     (%5,%0,4), %%xmm3     \n\t"
+        "unpcklps      %%xmm3, %%xmm3     \n\t"

This looks like a candidate for movsldup in an SSE3 version.
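
To make the lane shuffling concrete, here is a rough scalar model in plain C of what the quoted movlps + unpcklps pair produces, next to what movsldup produces. The vec4 type and the model_* / pair_dup / self_check names are made up for illustration and are not part of the patch; the model only follows the documented lane semantics of the instructions and says nothing about how the rest of the loop would need to change.

```c
#include <assert.h>

/* Hypothetical scalar stand-in for one XMM register (four floats). */
typedef struct { float f[4]; } vec4;

/* movlps mem, xmm: load two floats into the low half, keep the high half. */
static vec4 model_movlps(vec4 dst, const float *src)
{
    dst.f[0] = src[0];
    dst.f[1] = src[1];
    return dst;
}

/* unpcklps src, dst: interleave the low halves -> { dst0, src0, dst1, src1 }. */
static vec4 model_unpcklps(vec4 dst, vec4 src)
{
    vec4 r = {{ dst.f[0], src.f[0], dst.f[1], src.f[1] }};
    return r;
}

/* movsldup mem, xmm (SSE3): duplicate even lanes -> { s0, s0, s2, s2 }. */
static vec4 model_movsldup(const float *src)
{
    vec4 r = {{ src[0], src[0], src[2], src[2] }};
    return r;
}

/* The two-instruction sequence from the patch: { s0, s0, s1, s1 }. */
static vec4 pair_dup(const float *src)
{
    vec4 zero = {{ 0.0f, 0.0f, 0.0f, 0.0f }};
    vec4 x = model_movlps(zero, src);
    return model_unpcklps(x, x);
}

/* Small self-check showing both lane patterns on the same input. */
void self_check(void)
{
    const float t[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    vec4 a = pair_dup(t);        /* lanes: 1, 1, 2, 2 */
    vec4 b = model_movsldup(t);  /* lanes: 1, 1, 3, 3 */
    assert(a.f[2] == t[1]);
    assert(b.f[2] == t[2]);
}
```

In this model the two are not lane-for-lane identical: the existing pair duplicates two adjacent floats, while movsldup reads four floats and duplicates the even-indexed ones. So an SSE3 version would presumably need the table layout or the following shuffles adjusted; that is an assumption drawn from the instruction semantics, not something tested against the patch.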

Dark Shikari


