[FFmpeg-devel] Patch: Inline asm fixes for Intel compiler on Windows

Mon Feb 10 18:00:22 CET 2014

On 11 February 2014 03:33, Michael Niedermayer <michaelni at gmx.at> wrote:

> On Mon, Feb 10, 2014 at 06:53:23PM +1100, Matt Oliver wrote:
> > On 10 February 2014 09:21, Reimar Döffinger <Reimar.Doeffinger at gmx.de
> >wrote:
> >
> > > On Thu, Feb 06, 2014 at 12:35:13PM +1100, Matt Oliver wrote:
> > >
> > > > On 6 February 2014 05:45, Reimar Döffinger <Reimar.Doeffinger at gmx.de
> >
> > > wrote:
> > > >
> > > > > On Wed, Feb 05, 2014 at 11:41:38AM +1100, Matt Oliver wrote:
> > > > > > Cheers for the feedback. Ill split up the patches to make things
> > > easier.
> > > > > To
> > > > > > start off I have split the libmpcodec patches into some smaller
> > > easily
> > > > > > digestible chunks.
> > > > > >
> > > > > > Patch 1: Changes a couple of inline asm labels for using named
> > > labels to
> > > > > > standard numbered labels. This has no affect but to allow for
> > > compilation
> > > > > > with compilers that dont support named labels.
> > > > >
> > > > > If you use numbered labels I believe you need to append the jump
> > > > > direction.
> > > > > I.e.
> > > > > jnz 1b
> > > > > instead of
> > > > > jnz 1
> > > > > if you mean the 1: label that comes before.
> > > > > At least that is what I see in e.g. libavcodec/x86/snowdsp.c and
> many
> > > > > more files.
> > > > >
> > > > > > > Well, having the DECLARE_ALIGNED changes separately would be
> nice
> > > > > since I
> > > > > > think that part can be applied without further review.
> > > > > >
> > > > > > Patch 2: Is just the declare aligned fixes. This uses the
> existing
> > > cross
> > > > > > platform specific declare aligned macro instead of the gcc only
> > > > > attribute.
> > > > > > Seperated as Reimar suggested.
> > > > >
> > > > > You seem to have lost the very first "static". Otherwise it looks
> ok to
> > > > > me.
> > > > >
> > > > > > Patch 3: Removes unnecessary commas from some inline asm.
> Leaving the
> > > > > > commas in with nothing following them causes icl compilation
> errors.
> > > > > > removing them has no impact on anything else as ive seen this
> used
> > > else
> > > > > > where in ffmpeg so this should have no impact.
> > > > >
> > > > > Looks good to me as well.
> > > > > If nobody has any other comments I will try to get it applied to
> > > MPlayer
> > > > > soon to try to keep the codebases in sync (or if I'm too slow
> > > > > anyone with commit access there is of course free to apply them).
> > > > > _______________________________________________
> > > > > ffmpeg-devel mailing list
> > > > > ffmpeg-devel at ffmpeg.org
> > > > > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> > > > >
> > > >
> > > > > If you use numbered labels I believe you need to append the jump
> > > > > direction.
> > > >
> > > > You only have to do this if you have multiple labels with the same
> name.
> > > In
> > > > this case the forward/backward identifiers are so the assembler knows
> > > which
> > > > label you are referring to (closest in front or closest behind).
> Since
> > > the
> > > > labels in the patch are all unique then there is only one so we dont
> need
> > > > to specify which direction to find it.
> > >
> > > I don't trust that this is always true, and I see no reason to do
> things
> > > different than all other places I could find.
> > > So I changed that.
> > >
> > > > > You seem to have lost the very first "static". Otherwise it looks
> ok to
> > > > > me.
> > > >
> > > > That I did, fixed in updated patch attached. I changed to using the
> > > > predefined ASM_CONSTANT macro that auto includes static const.
> > >
> > > You changed one uint8_t to int8_t, I fixed that before applying.
> > > Otherwise all patches have been applied to MPlayer's libmpcodecs.
> > > I have _not_ synced the FFmpeg code, if someone can easily do that
> > > please go ahead.
> > > _______________________________________________
> > > ffmpeg-devel mailing list
> > > ffmpeg-devel at ffmpeg.org
> > > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> > >
> >
> > Please provide the relevant (!!) part of config.log
> >
> >
> > Sorry Carl, I rescind my previous statement. Gcc works fine it was my
> MinGW
> > that was failing and after checking the logs a second time I realised it
> > was actually due to a random write permission error in creating the temp
> > file that has started cropping up since I moved to Win8.1. So it could be
> > changed if thats what people want. In not 100% about the change myself as
> > if Intel do update there compiler with direct symbol references then due
> to
> > the way they resolve references currently without the 'test' declaration
> it
> > would still fail. This is a future proof thing so Im happy to concede to
> > whatever the group consensus is.
> >
> >
> > > I'm not going to argue against this being an icl bug, but I'm
> surprised you
> > > can't come up with a concept that works. Doesn't something like "leal
> (%l2,
> > > %l3), %l3" work? Also, which one does it convert to 64bit?
> >
> >
> > Ive tried everything I can think of so Im certainly open to suggestions.
> > Ive tried every combination of constraints I could think off (although
> > Intel doesnt support all of them) and just could not get it to work. As
> to
> > which operand register is being set to either 32bit/64bit im not sure as
> > the error message doesnt specify so the only way would be to try and find
> > the line in the compiler intermediary output (which im understandably in
> no
> > hurry to attempt). All i can ascertain from the error message is one of
> > %2/%3 is 32bit and the other is 64bit. As to why the difference I have no
> > idea as they both have the same constraint and are both uint. I tried
> your
> > suggestion about prefixing operands with 'l' (although im not familiar
> with
> > this technique) a direct substitution resulted in the error:
> "Substitution
> > directive specifies non-existent operand in asm instruction". Although
> as I
> > said im not familiar with that technique (im guessing the prefix forces
> it
> > to use long sized registers?) i may be missing something. But nothing ive
> > tried works, even trying to change the leal to just lea or something
> > generates errors (illegal reg in address for those who are interested).
> So
> > I ran out of ideas. Basically everything else ive tried generates the
> > "Unsupported instruction form" error. Intel just does not like lea
> > instructions where the input are not explicit registers. Thats the only
> way
> > I know of to fix it is to move say %2 into an explicit register and then
> > use that but that would potentially negatively affect other buildchains.
> If
> > there is a fix I would love to know it.
> >
> > You changed one uint8_t to int8_t, I fixed that before applying.
> > > Otherwise all patches have been applied to MPlayer's libmpcodecs.
> > > I have _not_ synced the FFmpeg code, if someone can easily do that
> > > please go ahead.
> > >
> >
> > Must have missed that when i updated, Thanks Reimar for picking it up and
> > fixing it.
> >
> > Also some recent changes in master have added some new asm in the file
> > dca.h. Attached is a standalone patch that fixes that for intel
> > compatibility (I didnt add it to the previous patch so that for those who
> > have already looked over and ok'ed the current version dont have to
> recheck
> > them).
>
> >  dca.h |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > dfdf0508b391e5e0f7f68344b6798802289af7ee
>  7-6-Additional-icl-inline-asm-fix.patch
> > From 7929bc1ae54be0f79b4407b8916d5e42d69f4fcf Mon Sep 17 00:00:00 2001
> > From: Matt Oliver <protogonoi at gmail.com>
> > Date: Mon, 10 Feb 2014 18:51:48 +1100
> > Subject: [PATCH] [7/6] Additional icl inline asm fix.
> >
> > ---
> >  libavcodec/x86/dca.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/libavcodec/x86/dca.h b/libavcodec/x86/dca.h
> > index 6415129..cefac59 100644
> > --- a/libavcodec/x86/dca.h
> > +++ b/libavcodec/x86/dca.h
> > @@ -31,7 +31,7 @@ static inline void int8x8_fmul_int32(av_unused
> DCADSPContext *dsp,
> >  {
> >      DECLARE_ALIGNED(16, static const uint32_t, inverse16) = 0x3D800000;
> >      __asm__ volatile (
> > -        "cvtsi2ss        %2, %%xmm0 \n\t"
> > +        "cvtsi2ssl       %2, %%xmm0 \n\t"
> >          "mulss           %3, %%xmm0 \n\t"
> >          "movq          (%1), %%xmm1 \n\t"
> >          "punpcklbw   %%xmm1, %%xmm1 \n\t"
>
> In file included from ffmpeg/libavcodec/dcadec.c:1:
> In file included from ffmpeg/libavcodec/dcadec.c:53:
> ffmpeg/libavcodec/x86/dca.h:34:9: error: invalid instruction mnemonic
> 'cvtsi2ssl'
>         "cvtsi2ssl       %2, %%xmm0 \n\t"
>         ^
> <inline asm>:1:2: note: instantiated into assembly here
>         cvtsi2ssl       -44(%rbp), %xmm0
>         ^~~~~~~~~
>
> btw iam starting to think that the issue should be fixed on the
> icl side or some wraper script be written to do all these translations
>
> [...]
>
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> I do not agree with what you have to say, but I'll defend to the death your
> right to say it. -- Voltaire
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
>
Your right in that it would be much easier if icl would just fix there
stuff (hell id do it if theyd let me) but from what I understand its a low
priority. However if cvtsi2ssl is causing errors then I can add it to the
existing patch that detects between cdq/cltd. Given that one already tries
to detect intel mnemonics then we can just add cvtsi2ss as an additional
one and treat it the same as cdq. Hopefully there wont be any others but if
there is we can at least add them to the mnemonic list (or find someone at
Intel to help fix it).