[FFmpeg-devel] [PATCH] make building swscale rgb template conditional
Michael Niedermayer
michaelni
Tue Sep 14 13:49:58 CEST 2010
On Mon, Sep 13, 2010 at 11:43:34PM -0300, Ramiro Polla wrote:
> On Sun, Sep 5, 2010 at 12:07 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> > On Sun, Sep 05, 2010 at 03:22:06AM -0300, Ramiro Polla wrote:
> [...]
> >> none of those made mmx2 or sse2 faster than mmx. I also tried with
> >> many widths/heights. What scenario should I test to see a benefit in
> >> using sse2 here? (btw this was all done using a core2duo)
> >
> > Some things to keep in mind:
> > - The input array should be initialized explicitly, because of the OS's
> >   copy-on-write behavior for the memory that malloc() returns.
> > - The output array should be bigger than the L2 cache size.
> > - Make sure linesize and width are multiples of 16 and pointers are aligned.
> > - Also check whether the prefetch or the movnt causes the problem by
> >   commenting the prefetch out.
>
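For anyone repeating the measurement, the buffer setup described above could look roughly like the sketch below. It is not taken from swscale. The helper names (align_up, padded_linesize, alloc_touched, exceeds_l2) and the 4 MiB L2 figure are assumptions made for illustration, and aligned_alloc() needs a C11 libc.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ALIGNMENT 16

/* Hypothetical figure: assume a 4 MiB L2 cache; adjust for the CPU under test. */
#define ASSUMED_L2_BYTES ((size_t)4 * 1024 * 1024)

/* Round n up to the next multiple of ALIGNMENT. */
static size_t align_up(size_t n)
{
    return (n + ALIGNMENT - 1) & ~(size_t)(ALIGNMENT - 1);
}

/* Bytes per row, padded so that every row starts on a 16-byte boundary. */
static size_t padded_linesize(size_t width, size_t bytes_per_pixel)
{
    return align_up(width * bytes_per_pixel);
}

/*
 * Allocate a 16-byte-aligned buffer and write every byte of it, so the
 * pages are really mapped before timing starts instead of being faulted
 * in lazily during the first pass.
 */
static uint8_t *alloc_touched(size_t size, int fill)
{
    uint8_t *p;

    assert(size % ALIGNMENT == 0); /* aligned_alloc() requires this */
    p = aligned_alloc(ALIGNMENT, size);
    if (p)
        memset(p, fill, size);
    return p;
}

/* Nonzero if a buffer of this size cannot fit in the assumed L2 cache. */
static int exceeds_l2(size_t size)
{
    return size > ASSUMED_L2_BYTES;
}
```

Allocate both source and destination with alloc_touched(padded_linesize(w, bpp) * h, ...), and check exceeds_l2() on the destination size before trusting the timings.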
> prefetch made no difference, so it's movntq. I tried with several
> sizes, and mmx was faster on 512x512, they were almost on par on
> 1024x1024, and then sse2 started being faster. Would it make sense to
> have sws_getContext() get the L2 cache size and determine which
> function to use based on whether the image fits in it?
If someone wants to implement and maintain this, then I don't mind.
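A minimal sketch of the heuristic proposed above, assuming the L2 size is already known to the caller (how to query it is platform-specific and left out). The enum labels and pick_impl() are hypothetical names, not existing swscale API, and the simple "does the image fit in L2" threshold is only the suggestion from this thread, not something that has been tuned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the two code paths being compared. */
typedef enum { CONV_MMX, CONV_SSE2 } conv_impl;

/*
 * Pick a code path from the image size: use the movnt-based SSE2 path
 * only when the image is larger than the L2 cache, otherwise keep the
 * plain MMX path.
 */
static conv_impl pick_impl(int width, int height, int bytes_per_pixel,
                           size_t l2_bytes)
{
    size_t image_bytes;

    assert(width > 0 && height > 0 && bytes_per_pixel > 0);
    image_bytes = (size_t)width * (size_t)height * (size_t)bytes_per_pixel;
    return image_bytes > l2_bytes ? CONV_SSE2 : CONV_MMX;
}
```

With an assumed 4 MiB L2, pick_impl(512, 512, 4, 4 << 20) selects the MMX path and pick_impl(2048, 2048, 4, 4 << 20) selects the SSE2 path. The real crossover point would have to be measured per CPU.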
>
> >> Going back to my original patch (0003), it did not change
> >> functionality (since sse2 didn't work on ffmpeg anyways). Is it ok to
> >> apply it before working on the other issues?
> >
> > I primarily care about things being fixed and no app using swscale being
> > broken ...
>
> New patch attached.
> rgb2rgb.c | 36 ++++++--------
> rgb2rgb_template.c | 134 ++++++++++++++++++++++++++---------------------------
> 2 files changed, 83 insertions(+), 87 deletions(-)
> 11cbdae62347371eaab1235cb7876f46bc835cb1 dont_misuse_have_xxx.diff
Should be OK if tested.
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Rewriting code that is poorly written but fully understood is good.
Rewriting code that one doesn't understand is a sign that one is less smart
than the original author, trying to rewrite it will not make it better.