[FFmpeg-devel] [FFmpeg-cvslog] r12171 - trunk/doc/optimization.txt
Michael Niedermayer
michaelni
Thu Feb 21 20:28:10 CET 2008
On Thu, Feb 21, 2008 at 09:16:39PM +0200, İsmail Dönmez wrote:
> Hi,
>
> On Thu, Feb 21, 2008 at 9:11 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> > On Thu, Feb 21, 2008 at 08:52:17PM +0200, İsmail Dönmez wrote:
> > > Hi,
> > >
> > > >Author: melanson
> > > >Date: Thu Feb 21 19:46:49 2008
> > > >New Revision: 12171
> > > >
> > > >Log:
> > > >minor English corrections
> > > >
> > > >
> > > >Modified:
> > > > trunk/doc/optimization.txt
> > > [...]
> > > > -Use asm() instead of intrinsics. Later requires a good optimizing compiler
> > > > +Use asm() instead of intrinsics. The latter requires a good optimizing compiler
> > > > which gcc is not.
> > >
> > > We all know this is FUD now. I know Michael still uses gcc 2.95, but
> > > the world has moved on; GCC 4.3 is about to be released.
> > > So please either back up these claims or note that this is not true for
> > > recent GCCs.
> >
> > I use gcc r132072 at the moment. I admit it's a few days old; do you claim
> > that gcc was rewritten yesterday?
> >
> > Also, to back up the claim, the following was suggested to me a few days ago:
> > -static inline void diff_pixels_mmx(DCTELEM *block, const uint8_t *s1, const uint8_t *s2, int stride)
> > +static void diff_pixels_mmx(DCTELEM *block, const uint8_t *s1, const uint8_t *s2, long stride)
> > {
> > - asm volatile(
> > - "pxor %%mm7, %%mm7 \n\t"
> > - "mov $-128, %%"REG_a" \n\t"
> > - ASMALIGN(4)
> > - "1: \n\t"
> > - "movq (%0), %%mm0 \n\t"
> > - "movq (%1), %%mm2 \n\t"
> > - "movq %%mm0, %%mm1 \n\t"
> > - "movq %%mm2, %%mm3 \n\t"
> > - "punpcklbw %%mm7, %%mm0 \n\t"
> > - "punpckhbw %%mm7, %%mm1 \n\t"
> > - "punpcklbw %%mm7, %%mm2 \n\t"
> > - "punpckhbw %%mm7, %%mm3 \n\t"
> > - "psubw %%mm2, %%mm0 \n\t"
> > - "psubw %%mm3, %%mm1 \n\t"
> > - "movq %%mm0, (%2, %%"REG_a") \n\t"
> > - "movq %%mm1, 8(%2, %%"REG_a") \n\t"
> > - "add %3, %0 \n\t"
> > - "add %3, %1 \n\t"
> > - "add $16, %%"REG_a" \n\t"
> > - "jnz 1b \n\t"
> > - : "+r" (s1), "+r" (s2)
> > - : "r" (block+64), "r" ((long)stride)
> > - : "%"REG_a
> > - );
> > + long offset = -128;
> > + MOVQ_ZERO(mm7);
> > + do {
> > + asm volatile(
> > + "movq (%0), %%mm0 \n\t"
> > + "movq (%1), %%mm2 \n\t"
> > + "movq %%mm0, %%mm1 \n\t"
> > + "movq %%mm2, %%mm3 \n\t"
> > + "punpcklbw %%mm7, %%mm0 \n\t"
> > + "punpckhbw %%mm7, %%mm1 \n\t"
> > + "punpcklbw %%mm7, %%mm2 \n\t"
> > + "punpckhbw %%mm7, %%mm3 \n\t"
> > + "psubw %%mm2, %%mm0 \n\t"
> > + "psubw %%mm3, %%mm1 \n\t"
> > + "movq %%mm0, (%2, %4) \n\t"
> > + "movq %%mm1, 8(%2, %4) \n\t"
> > + : : "r" (s1), "r" (s2), "r" (block+64), "r" (stride), "r" (offset)
> > + : "memory");
> > + s1 += stride;
> > + s2 += stride;
> > + offset += 16;
> > + } while (offset < 0);
> > }
> >
> > The effect this has on the generated asm is:
> > .L143:
> > .loc 3 241 0
> > leaq (%rsi,%r8), %rdx
> > leaq (%r10,%r8), %rax
> > #APP
> > # 241 "dsputil_mmx.c" 1
> > movq (%rdx), %mm0
> > movq (%rax), %mm2
> > movq %mm0, %mm1
> > movq %mm2, %mm3
> > punpcklbw %mm7, %mm0
> > punpckhbw %mm7, %mm1
> > punpcklbw %mm7, %mm2
> > punpckhbw %mm7, %mm3
> > psubw %mm2, %mm0
> > psubw %mm3, %mm1
> > movq %mm0, (%rdi, %r9)
> > movq %mm1, 8(%rdi, %r9)
> >
> > # 0 "" 2
> > .loc 3 258 0
> > #NO_APP
> > addq %rcx, %r8
> > .loc 3 259 0
> > addq $16, %r9
> > jne .L143
> > -------------
> >
> > As you can see, gcc injects 2 unneeded lea instructions into the innermost loop.
> > And I think this is very simple asm. If you want, you can try this with some
> > complex code, but I recommend that you have a few vomit bags ready ...
>
> If you can give an example based on complex asm, we can report a bug to
> gcc. Just saying that gcc is not a good optimizer
> does not help anyone. Do we have a better open source compiler?
> No. So if you have a better example of bad asm being produced, we can ask
> the gcc developers.
I'll mail you the next case I stumble across, but as I don't convert
asm to intrinsics or to do-asm-while loops, I probably won't stumble across
one soon.
Also, you can just keep your eyes open; various asm snippets tend to be
posted every few weeks ...
Reimar posted a ridiculous one just a few days ago, which had alignments
in the wrong places; it wasn't the latest gcc, though ...
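
For anyone who wants to check the asm()-versus-intrinsics claim against their own compiler, here is a minimal sketch of the same 8x8 pixel difference written with intrinsics. It is not code from the FFmpeg tree. It assumes SSE2 (`emmintrin.h`) rather than the MMX registers used in the asm above, assumes `DCTELEM` is a 16-bit integer, and the names `diff_pixels_c`, `diff_pixels_intrin` and `diff_pixels_selfcheck` are made up for illustration. On non-SSE2 targets it falls back to the plain C loop.

```c
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Assumption: 16-bit coefficients; the real typedef lives in FFmpeg's headers. */
typedef int16_t DCTELEM;

/* Plain C reference: block[8*y + x] = s1[x] - s2[x] over an 8x8 block. */
static void diff_pixels_c(DCTELEM *block, const uint8_t *s1,
                          const uint8_t *s2, long stride)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            block[8 * y + x] = (DCTELEM)(s1[x] - s2[x]);
        s1 += stride;
        s2 += stride;
    }
}

/* Hypothetical intrinsics version: one row of 8 pixels per iteration. */
static void diff_pixels_intrin(DCTELEM *block, const uint8_t *s1,
                               const uint8_t *s2, long stride)
{
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    for (int y = 0; y < 8; y++) {
        /* Load 8 bytes from each source row into the low half of a register. */
        __m128i a = _mm_loadl_epi64((const __m128i *)s1);
        __m128i b = _mm_loadl_epi64((const __m128i *)s2);
        /* Zero-extend bytes to 16-bit words, like punpcklbw against a zero register. */
        a = _mm_unpacklo_epi8(a, zero);
        b = _mm_unpacklo_epi8(b, zero);
        /* Word-wise subtract (psubw) and store 8 coefficients, unaligned. */
        _mm_storeu_si128((__m128i *)(block + 8 * y), _mm_sub_epi16(a, b));
        s1 += stride;
        s2 += stride;
    }
#else
    /* No SSE2 available: fall back to the reference loop. */
    diff_pixels_c(block, s1, s2, stride);
#endif
}

/* Returns 1 if the intrinsics version matches the C reference on test data. */
static int diff_pixels_selfcheck(void)
{
    enum { STRIDE = 16 };
    uint8_t a[8 * STRIDE], b[8 * STRIDE];
    DCTELEM ref[64], out[64];

    for (int i = 0; i < 8 * STRIDE; i++) {
        a[i] = (uint8_t)(i * 7 + 3);
        b[i] = (uint8_t)(255 - i * 5);
    }
    diff_pixels_c(ref, a, b, STRIDE);
    diff_pixels_intrin(out, a, b, STRIDE);
    return memcmp(ref, out, sizeof(ref)) == 0;
}
```

Compiling this with `gcc -O2 -S` and reading the inner loop is the quickest way to see whether a given compiler version adds extra address arithmetic or register moves compared with the hand-written loop.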
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Why not whip the teacher when the pupil misbehaves? -- Diogenes of Sinope