[FFmpeg-devel] [PATCH] make building swscale rgb template conditional

Michael Niedermayer michaelni
Tue Sep 14 13:49:58 CEST 2010


On Mon, Sep 13, 2010 at 11:43:34PM -0300, Ramiro Polla wrote:
> On Sun, Sep 5, 2010 at 12:07 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> > On Sun, Sep 05, 2010 at 03:22:06AM -0300, Ramiro Polla wrote:
> [...]
> >> none of those made mmx2 or sse2 faster than mmx. I also tried with
> >> many widths/heights. What scenario should I test to see a benefit in
> >> using sse2 here? (btw this was all done using a core2duo)
> >
> > some things to keep in mind: the input array should be initialized explicitly
> > because of the copy-on-write OS behavior for what malloc() returns,
> > the output array should be bigger than the L2 cache size,
> > and linesize and width should be multiples of 16, with aligned pointers.
> > also check whether the prefetch or the movnt causes the problem by commenting
> > the prefetch out
> 
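
The benchmark setup suggested above could be sketched roughly as follows. This is only an illustration in plain C: BenchBuf, bench_buf_alloc(), bench_exceeds_cache() and bench_buf_free() are hypothetical helper names, not swscale functions, and the L2 cache size is assumed to be supplied by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BENCH_ALIGN 16  /* alignment for pointers and line sizes */

/* Hypothetical helper type; keeps the malloc() pointer for free(). */
typedef struct BenchBuf {
    void    *base;      /* pointer returned by malloc() */
    uint8_t *data;      /* 16-byte-aligned start of the image */
    int      linesize;  /* bytes per row, a multiple of 16 */
    size_t   size;      /* linesize * height */
} BenchBuf;

/* Round n up to the next multiple of BENCH_ALIGN. */
static int bench_align_up(int n)
{
    return (n + BENCH_ALIGN - 1) & ~(BENCH_ALIGN - 1);
}

/* Allocate an aligned image buffer and write every byte of it, so the
 * pages are really mapped before any timing starts.
 * Returns 0 on success, -1 if malloc() fails. */
static int bench_buf_alloc(BenchBuf *b, int width_bytes, int height, int fill)
{
    assert(width_bytes > 0 && height > 0);
    b->linesize = bench_align_up(width_bytes);
    b->size     = (size_t)b->linesize * (size_t)height;
    b->base     = malloc(b->size + BENCH_ALIGN - 1);
    if (!b->base)
        return -1;
    b->data = (uint8_t *)(((uintptr_t)b->base + BENCH_ALIGN - 1)
                          & ~(uintptr_t)(BENCH_ALIGN - 1));
    memset(b->data, fill, b->size);  /* explicit initialization */
    return 0;
}

/* True if the buffer is larger than the caller-supplied L2 cache size. */
static int bench_exceeds_cache(const BenchBuf *b, size_t l2_bytes)
{
    return b->size > l2_bytes;
}

static void bench_buf_free(BenchBuf *b)
{
    free(b->base);
    b->base = NULL;
    b->data = NULL;
}
```

A test run would allocate the input and output with bench_buf_alloc(), check bench_exceeds_cache() for the output, and only then time the conversion.
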
> prefetch made no difference, so it's movntq. I tried with several
> sizes, and mmx was faster on 512x512, they were almost on par on
> 1024x1024, and then sse2 started being faster. Would it make sense to
> have sws_getContext() get the L2 cache size and determine which
> function to use based on whether the image fits in it?

if someone wants to implement and maintain this then i don't mind
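
For illustration only, the kind of size-based selection asked about above might look like the sketch below. Nothing here is existing swscale code: convert_fn, convert_small(), convert_large() and pick_converter() are hypothetical names, the two converters are stand-ins that just copy, and how the L2 cache size would be obtained is left open (it is passed in as a parameter).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical converter signature, not the one used by swscale. */
typedef void (*convert_fn)(const uint8_t *src, uint8_t *dst, size_t size);

/* Stand-in for the path that was faster on small images (MMX in the thread). */
static void convert_small(const uint8_t *src, uint8_t *dst, size_t size)
{
    assert(src && dst);
    memcpy(dst, src, size);
}

/* Stand-in for the path that was faster on large images (SSE2 in the thread). */
static void convert_large(const uint8_t *src, uint8_t *dst, size_t size)
{
    assert(src && dst);
    memcpy(dst, src, size);
}

/* Pick a converter depending on whether the output fits in the L2 cache.
 * l2_cache_bytes == 0 means "unknown", in which case the small-image
 * path is kept as the default. */
static convert_fn pick_converter(size_t dst_bytes, size_t l2_cache_bytes)
{
    if (l2_cache_bytes == 0)
        return convert_small;
    return dst_bytes > l2_cache_bytes ? convert_large : convert_small;
}
```

The selection would be made once at context setup, so the per-frame cost is a single indirect call. Whether a plain "fits in L2" threshold matches the measured crossover would still need benchmarking.
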
    

> 
> >> Going back to my original patch (0003), it did not change
> >> functionality (since sse2 didn't work on ffmpeg anyways). Is it ok to
> >> apply it before working on the other issues?
> >
> > i primarily care about things being fixed and no app using swscale being
> > broken ...
> 
> New patch attached.

>  rgb2rgb.c          |   36 ++++++--------
>  rgb2rgb_template.c |  134 ++++++++++++++++++++++++++---------------------------
>  2 files changed, 83 insertions(+), 87 deletions(-)
> 11cbdae62347371eaab1235cb7876f46bc835cb1  dont_misuse_have_xxx.diff

should be ok if tested

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Rewriting code that is poorly written but fully understood is good.
Rewriting code that one doesn't understand is a sign that one is less smart
than the original author; trying to rewrite it will not make it better.