[FFmpeg-devel] [PATCH] Fix function parameters for rgb48 to YV12 functions.

Michael Niedermayer michaelni
Wed Feb 3 03:48:01 CET 2010


On Tue, Feb 02, 2010 at 11:30:24PM -0200, Ramiro Polla wrote:
> Hi,
> 
> On Tue, Feb 2, 2010 at 5:42 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> > On Tue, Feb 02, 2010 at 08:21:15PM +0100, Reimar Döffinger wrote:
> >> On Tue, Feb 02, 2010 at 08:01:26PM +0100, Michael Niedermayer wrote:
> >> > On Tue, Feb 02, 2010 at 04:10:06PM -0200, Ramiro Polla wrote:
> >> > > Hello Michael,
> >> > >
> >> > > On Sun, Jan 24, 2010 at 8:31 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> >> > > > the gain happens when you change the variables used to calculate the index
> >> > > > also to it. You could also try to make the index unsigned but make sure it
> >> > > > can't be negative if you try this
> >> > >
> >> > > Sorry but I still don't understand how that will be of use here in
> >> > > libswscale. I've tried forcing int32_t and int64_t for x86_64 in some
> >> > > of those functions (some xxxTo(Y|UV), hScale and the fast bilinear
> >> > > ones), in all C, MMX and MMX2. All I can see is the expansion from
> >> > > 32-bit to 64-bit being changed from caller and callee. There is no
> >> > > difference in the inner loop, nor in how gcc addresses the src and
> >> > > dst arrays.
> >> >
> >> > Maybe there's no gain for swscale; I can't say without looking at the asm
> >> > gcc generates.
> >> > I know that in h264 gcc filled some functions with 32->64 sign extension
> >> > code in the inner loops.
> >>
> >> Which compilation options have you been using?
> >
> > default of ffmpeg & gcc-4.4
> > also a quick
> > grep movslq libavcodec/h264_loopfilter.S | grep -v '(' |wc
> >      68     136    1220
> >
> > and with -mtune=core2 -march=core2 -mcpu=core2
> > grep movslq libavcodec/h264_loopfilter.S | grep -v '(' |wc
> >      68     136    1226
> >
> > so no, it's not helping; it still produces all the register-register
> > sign extensions
> 
> Hmm, I think I understand now what you mean... This is what the asm of
> some functions looks like when things get changed from long to int.
> I'll put the sizes of some functions as in <name> <size with long>
> <size with int> <int - long>, along with their differences (mostly
> only prologues). All tested with gcc 4.4.1 from ubuntu 9.10:
> 
> nv12ToUV_MMX    77  87  10
> BEToUV_MMX      84  90  6
> and similar _MMX functions.
> int
>     lea    (%r8,%r8,1),%r9d
>     movslq %r8d,%rax
>     neg    %r8d
>     movslq %r8d,%r8
>     add    %rax,%rdi
>     add    %rax,%rsi
>     movslq %r9d,%r9
>     add    %r9,%rdx
>     add    %r9,%rcx
>     movq   0x0,%mm4
> long:
>     lea    (%r8,%r8,1),%rax
>     mov    %r8,%r9
>     add    %r8,%rdi
>     neg    %r9
>     add    %r8,%rsi
>     add    %rax,%rdx
>     add    %rax,%rcx
>     movq   0x0,%mm4
> 
> bgr24ToUV_half_3DNow    142 172 30
> int:
>     test   %r8d,%r8d
>     push   %rbx
>     jle    1cd8a <bgr24ToUV_half_3DNow+0xaa>
>     sub    $0x1,%r8d
>     xor    %eax,%eax
>     lea    0x3(%r8,%r8,2),%rbx
>     add    %rbx,%rbx
>     movzbl (%rdx,%rax,1),%ecx
>     movzbl 0x3(%rdx,%rax,1),%r9d
>     movzbl 0x4(%rdx,%rax,1),%r10d
>     movzbl 0x5(%rdx,%rax,1),%r8d
> long:
>     test   %r8,%r8
>     jle    1cb1c <bgr24ToUV_half_3DNow+0x8c>
>     lea    (%rdi,%r8,1),%r8
>     movzbl (%rdx),%eax
>     movzbl 0x3(%rdx),%r9d
>     movzbl 0x4(%rdx),%r10d
>     movzbl 0x5(%rdx),%ecx
> 
> rgb32ToUV   139 143 4
> int
>     sub    $0x1,%r8d
>     lea    0x4(%rdx,%r8,4),%r11
>     mov    (%rdx),%ecx
> long
>     push   %rbx
>     shl    $0x2,%r8
>     xor    %ecx,%ecx
>     mov    (%rdx,%rcx,1),%r9d
> 
> all hyscale_fast functions have only one more movslq in the int version.
> 
> Then many have this difference where the int version uses sub and lea
> while the long version uses either add %reg,%reg or shl $2, %reg.
> 
> abgrToA         37  45  8
> BEToUV_C        51  51  0
> nv12ToUV_C      48  48  0
> int
>     sub    $0x1,%r8d
>     lea    0x2(%r8,%r8,1),%r8
> long
>     add    %r8,%r8
> 
> rgb15ToUV       141 143 2
> long uses rbx (as in it pushes and pops rbx) while the int version
> doesn't, long accesses arrays with movzwl (%rdx,%rax,1),%r9d instead
> of movzwl (%rdx),%ecx in the inner loop (I don't know what difference
> this makes). long uses add %r8,%r8 instead of sub & lea.
> 
> Then there's:
> rgb15ToUV_half  166 174 8
> int
>     sub    $0x1,%r8d
>     lea    0x4(%rdx,%r8,4),%r10
> long
>     lea    (%rdx,%r8,4),%r10
> 
> Very few functions are larger with long such as:
> rgb15ToY        95  84  -11
> int uses sub & lea instead of add. long uses more 64-bit registers so
> the instructions are larger.
> 
> 
> And on to the caller,
> 
> swScale_C       10319   10082   -237
> long has 9 more movslq, uses more stack
> 
> I haven't checked all functions though.
> 
> 
> The final size (with runtime cpudetect):
> 841336 swscale_ints.o
> 841024 swscale_longs.o
> 
> The number of movslq between registers:
> $ objdump -d swscale_ints.o | grep movslq | grep -v "(" | wc -l
> 1038
> $ objdump -d swscale_longs.o | grep movslq | grep -v "(" | wc -l
> 927
> 
> No speed differences were ever noticed. Dark_Shikari tells me a movslq
> between registers is 1uop...
> 
> As for other architectures, the arm and ppc I have would have made no
> difference since they're not 64-bit.
> 
> I've attached a patch which adds an array_index type, if that's what
> you had in mind.
> 

> Otherwise I really don't know what to do. Long is being misused here,
> and breaks compilation on mingw-w64.
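
For reference, a minimal sketch of why long is a poor stand-in for a pointer-sized integer. On LP64 targets (Linux x86_64) long is 64 bits, but on LLP64 targets (64-bit Windows, so mingw-w64) long stays 32 bits while pointers are 64 bits. Whether this is the exact cause of the breakage mentioned above is an assumption; the helper names below are hypothetical and not from libswscale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero when long has the same width as a pointer.
 * Expected to be true on LP64 and false on LLP64 (64-bit Windows). */
static int long_is_pointer_sized(void)
{
    return sizeof(long) == sizeof(void *);
}

/* Hypothetical helper: distance in bytes between two pointers into the
 * same buffer. ptrdiff_t is the type C defines for pointer differences,
 * so it does not depend on how wide long happens to be. */
static ptrdiff_t byte_offset(const uint8_t *base, const uint8_t *p)
{
    assert(p >= base);   /* caller must pass pointers into one buffer */
    return p - base;
}
```

Calling long_is_pointer_sized() from a small test program on each target shows which data model the compiler uses.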

If there's no speed difference, I don't mind int being used.
I also don't mind the array_index stuff being added to libavutil.
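
A minimal sketch of what an array_index type could look like, for illustration only. The attached patch is not reproduced in this mail, so the typedef and both function names below are hypothetical and may differ from what the patch does. The idea is that the loop counter and the width parameter share one pointer-sized signed type, so the compiler has no 32->64 sign extension to insert when forming addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical typedef: a signed integer as wide as a pointer difference. */
typedef ptrdiff_t array_index;

/* int index: on x86_64 the compiler may have to sign-extend the index
 * (movslq) before using it in an address calculation. */
static unsigned sum_int(const uint8_t *src, int width)
{
    unsigned sum = 0;
    for (int i = 0; i < width; i++)
        sum += src[i];
    return sum;
}

/* array_index index: counter and bound are already pointer-sized, so no
 * widening is needed. The result is the same as sum_int. */
static unsigned sum_index(const uint8_t *src, array_index width)
{
    unsigned sum = 0;
    assert(width >= 0);  /* the index must never go negative */
    for (array_index i = 0; i < width; i++)
        sum += src[i];
    return sum;
}
```

Both functions return the same value for the same input; any difference shows up only in the generated code, which can be compared with objdump -d as in the quoted measurements.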

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Many that live deserve death. And some that die deserve life. Can you give
it to them? Then do not be too eager to deal out death in judgement. For
even the very wise cannot see all ends. -- Gandalf