[FFmpeg-devel] [PATCH] move h264 loopfilter strength code to yasm

Michael Niedermayer michaelni
Fri Sep 24 14:56:39 CEST 2010


On Fri, Sep 24, 2010 at 07:17:06AM -0400, Ronald S. Bultje wrote:
> On Thu, Sep 23, 2010 at 11:18 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> > On Thu, Sep 23, 2010 at 06:13:30PM -0400, Ronald S. Bultje wrote:
> >> $subj. This could likely be done in inline asm as well but I still
> >> can't write that.
> >
> > can i help you to learn it?
> 
> Yes.
> 
> - we need a way to clobber xmm registers. This turns out to be very
> difficult and I haven't really looked at it very seriously. Help get
> something like Reimar's patch committed without breaking any of the
> fate systems. You still are maintainer and there is a patch, there's
> no reason why this can't be finished.

what is the subject of the mail that submitted the patch?


> - write a smallish tutorial, e.g. "how to write a copy_pixels() in asm
> using mmx/sse" plus maybe a list of useful macros like TRANSPOSE (so I
> can grep for them and see how it changes register order). I didn't
> learn yasm, I looked around for people to teach me, and Jason did. And
> hey, it works, I sort of get this stuff now. Inline asm has some
> tricks that yasm does not have, like three colons at the end of each
> block and these modifiers that are there. I think I can sort of read
> them, write a few examples on how to use this properly, efficiently
> and how to not shoot yourself in the foot. Or a
> good-practices-with-inline-asm guide. Once I have something to start
> with, it shouldn't be too hard. After starting with yasm (this is 1-2
> months ago),

inline asm is very simple
you have asm() blocks, each asm block contains up to 4 parts
asm(
    1
    :2
    :3
    :4
)

1 is the actual code, in the form of a single long string like
         "movd (%1), %%mm0              \n\t"
         "movd (%1, %3), %%mm1          \n\t"
         ...

The compiler will replace the %* placeholders in this string and then just
literally insert it into what it passes to the assembler. The %% will be
turned into %, %0 will be replaced by the first operand and so on.

2 is the list of output and input+output operands for example
: "+g"(h), "+r" (pixels),  "+r" (block)

3 is the list of input operands like:
: "r"((x86_reg)line_size)

4 is the list of things that are not preserved by the asm aka the clobber list
: "%"REG_a, "memory"

if lists 2 and 3 are both empty, the code is followed directly by ::: and
then the clobber list
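
As a minimal sketch of the four parts, assuming GCC or Clang on x86
(add_ints and compiler_barrier are hypothetical helpers for illustration,
not FFmpeg code):

```c
/* Hypothetical helper, not from FFmpeg: a += b with a single add. */
static inline int add_ints(int a, int b)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ (
        "addl %1, %0    \n\t"   /* part 1: code; %0 is a, %1 is b    */
        : "+r"(a)               /* part 2: input+output operands     */
        : "r"(b)                /* part 3: input operands            */
        : "cc"                  /* part 4: clobbers, add sets flags  */
    );
#else
    a += b;                     /* plain C so the sketch builds on non-x86 */
#endif
    return a;
}

/* Lists 2 and 3 are empty here, so the (empty) code string is followed
 * directly by ":::" and the clobber list. */
static inline void compiler_barrier(void)
{
    __asm__ volatile("" ::: "memory");
}
```

Calling add_ints(2, 3) should give 5; compiler_barrier() emits no
instruction and only stops the compiler from keeping memory values cached
in registers across it.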

the + means they are input and output; = instead would mean they are just
output, and thus their value would be undefined when read in the asm
the letter means the kind of operand
r means general purpose register like eax, rbx or otherwise
m means memory like [eax] or [eax+2*ebx+123]
i means compile/assembly time constant

a b c d S D are e/rax, e/rbx, e/rcx, e/rdx, e/rsi, e/rdi
y are mmx registers
x are sse registers; these don't work on all compilers

these can also be combined, like "rm" when either a register or memory is ok
g means "rmi"

there are more but they are rarely if ever used; see info gcc-4.X, sections
Simple Constraints and Constraints for Particular Machines
also take the gcc docs with a grain of salt, they aren't complete and aren't good
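
A small sketch of the register-specific letters, assuming GCC or Clang on
x86: mull always multiplies eax by its operand and leaves the result in
edx:eax, so "a" and "d" pin the operands there, while "rm" lets the
compiler pick a register or memory for the other factor. mul_32x32 is a
hypothetical name, not an FFmpeg function.

```c
#include <stdint.h>

/* Hypothetical helper: full 32x32 -> 64 bit unsigned multiply. */
static inline uint64_t mul_32x32(uint32_t x, uint32_t y)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t lo, hi;
    __asm__ (
        "mull %3        \n\t"   /* edx:eax = eax * operand 3          */
        : "=a"(lo), "=d"(hi)    /* outputs pinned to eax and edx      */
        : "a"(x), "rm"(y)       /* x starts in eax; y is reg or mem   */
        : "cc"                  /* mul changes the flags              */
    );
    return ((uint64_t)hi << 32) | lo;
#else
    return (uint64_t)x * y;     /* plain C so the sketch builds on non-x86 */
#endif
}
```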

there are several modifiers but they are rarely useful
& as in "=&r" is the only important one; it tells the compiler that this
output operand will be written before the code is done with its input operands
and that it thus cannot use the same location/register as any input operand
there are modifiers that allow you to specify preferred alternatives, like
preferring some registers over others, that allow you to specify commutative
operands, and various other rarely useful things
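
To illustrate why & matters, here is a sketch (assuming GCC or Clang on
x86; add_via_copy is a hypothetical name). The output is written by the
first instruction while the second input is still needed by the next one.
Without the &, the compiler would be free to put r and b in the same
register, and the result could silently become a + a.

```c
/* Hypothetical helper: returns a + b via a copy and an add. */
static inline int add_via_copy(int a, int b)
{
    int r;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ (
        "movl %1, %0    \n\t"   /* r is written here ...                 */
        "addl %2, %0    \n\t"   /* ... but b is still read here          */
        : "=&r"(r)              /* & keeps r out of the input registers  */
        : "r"(a), "r"(b)
        : "cc"
    );
#else
    r = a + b;                  /* plain C so the sketch builds on non-x86 */
#endif
    return r;
}
```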

about volatile and "memory", here's what the docs say:
    If your assembler instructions access memory in an unpredictable
    fashion, add `memory' to the list of clobbered registers.  This will
    cause GCC to not keep memory values cached in registers across the
    assembler instruction and not optimize stores or loads to that memory.
    You will also want to add the `volatile' keyword if the memory affected
    is not listed in the inputs or outputs of the `asm', as the `memory'
    clobber does not count as a side-effect of the `asm'.  If you know how
    large the accessed memory is, you can add it as input or output but if
    this is not known, you should add `memory'.  As an example, if you
    access ten bytes of a string, you can use a memory input like:

        {"m"( ({ struct { char x[10]; } *p = (void *)ptr ; *p; }) )}.
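
A sketch of the second option from the docs, assuming GCC or Clang on x86:
when the size of the accessed memory is known, it can be passed as an "m"
input instead of using the "memory" clobber. pair_view and add_pair are
hypothetical names for illustration. The "m" operand is never referenced in
the code string; it only tells the compiler which bytes the asm reads.

```c
#include <stdint.h>

/* Hypothetical view type describing exactly which bytes are read. */
typedef struct { uint32_t v[2]; } pair_view;

/* Hypothetical helper: returns p[0] + p[1]. */
static inline uint32_t add_pair(const uint32_t *p)
{
    uint32_t r;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ (
        "movl  (%1), %0 \n\t"   /* r  = p[0]                             */
        "addl 4(%1), %0 \n\t"   /* r += p[1]                             */
        : "=&r"(r)              /* written before p is used the last time */
        : "r"(p),
          "m"(*(const pair_view *)p)  /* the 8 bytes read, no "memory" needed */
        : "cc"
    );
#else
    r = p[0] + p[1];            /* plain C so the sketch builds on non-x86 */
#endif
    return r;
}
```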



here's a full example:

static void put_pixels4_mmx(uint8_t *block, const uint8_t *pixels, int line_size, int h)
{
    __asm__ volatile(
         "lea (%3, %3), %%"REG_a"       \n\t"
         ASMALIGN(3)
         "1:                            \n\t"
         "movd (%1), %%mm0              \n\t"
         "movd (%1, %3), %%mm1          \n\t"
         "movd %%mm0, (%2)              \n\t"
         "movd %%mm1, (%2, %3)          \n\t"
         "add %%"REG_a", %1             \n\t"
         "add %%"REG_a", %2             \n\t"
         "movd (%1), %%mm0              \n\t"
         "movd (%1, %3), %%mm1          \n\t"
         "movd %%mm0, (%2)              \n\t"
         "movd %%mm1, (%2, %3)          \n\t"
         "add %%"REG_a", %1             \n\t"
         "add %%"REG_a", %2             \n\t"
         "subl $4, %0                   \n\t"
         "jnz 1b                        \n\t"
         : "+g"(h), "+r" (pixels),  "+r" (block)
         : "r"((x86_reg)line_size)
         : "%"REG_a, "memory"
        );
}

the most important macros are
REG_a/b/c/d/S/D which become 64 or 32 bit register strings like "rax"
depending on x86_32/64
and
REGa/b/c/d/BP which become non-string register names like rax

x86_reg is the native register size, like int64_t or int32_t
ASMALIGN() allows code to be aligned with do-nothing instructions
the b in jnz 1b means the closest matching "1" label in the backward direction,
f would mean forward.
gcc needs this as code can be inlined and thus there could be more 1: labels
in the final .s file before the assembler
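
The notes above can be sketched outside of FFmpeg roughly like this,
assuming GCC or Clang. MY_REG_a and my_x86_reg are hypothetical stand-ins
for REG_a and x86_reg so that the block is self-contained, and sum_to_n is
an invented example function. It uses a scratch register named through the
macro, lists it as clobbered, and jumps both forward (2f) and backward (1b).

```c
#include <stdint.h>

/* Stand-ins for FFmpeg's REG_a and x86_reg; these names are hypothetical. */
#if defined(__x86_64__)
#  define MY_REG_a "rax"
typedef int64_t my_x86_reg;
#elif defined(__i386__)
#  define MY_REG_a "eax"
typedef int32_t my_x86_reg;
#else
typedef long my_x86_reg;        /* non-x86: only the C fallback is used */
#endif

/* Returns n + (n-1) + ... + 1, or 0 when n <= 0. */
static inline my_x86_reg sum_to_n(int n)
{
    my_x86_reg i = n, sum;      /* widen n to the native register size */
#ifdef MY_REG_a
    __asm__ (
        "xor  %%"MY_REG_a", %%"MY_REG_a"   \n\t"  /* scratch accumulator = 0 */
        "test %1, %1                       \n\t"
        "jle  2f                           \n\t"  /* forward to the next 2:  */
        "1:                                \n\t"
        "add  %1, %%"MY_REG_a"             \n\t"
        "sub  $1, %1                       \n\t"
        "jnz  1b                           \n\t"  /* back to the nearest 1:  */
        "2:                                \n\t"
        "mov  %%"MY_REG_a", %0             \n\t"
        : "=r"(sum), "+r"(i)
        :
        : "%"MY_REG_a, "cc"                       /* scratch register + flags */
    );
#else
    for (sum = 0; i > 0; i--)   /* plain C so the sketch builds on non-x86 */
        sum += i;
#endif
    return sum;
}
```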

if you want to access the matching 32bit register of a 64bit register operand
there's %k0 for %0 and similarly for %1, ...
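
A tiny sketch of %k0, assuming GCC or Clang on x86 (low32_inc is a
hypothetical name). The operand is pointer-sized, so on x86_64 %0 would be
printed as a 64bit register such as rax, while %k0 prints the matching
32bit name such as eax.

```c
#include <stdint.h>

/* Hypothetical helper: increments the low 32 bits of v and returns them. */
static inline uint32_t low32_inc(intptr_t v)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ (
        "addl $1, %k0   \n\t"   /* %k0 is the 32bit name of operand 0 */
        : "+r"(v)
        :
        : "cc"
    );
    return (uint32_t)v;
#else
    return (uint32_t)v + 1u;    /* plain C so the sketch builds on non-x86 */
#endif
}
```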

please tell me if the above is enough and if you have any questions


> I tried adding one extra argument to a function (I think
> SSE2 MC) to read one extra register using "r"(src+stride*3) or
> something like that, and _it just didn't work_. It wouldn't compile,
> giving weird compile errors about invalid constraints or something
> like that. That's extremely frustrating, especially when nobody on IRC
> understands the error (or how to write such code) either.

i need to see the code and the error to comment


> 
> If you get started with these two, I can send a patch that does the
> same as the original but without moving it away from inline asm.
> 
> Ronald

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

It is not what we do, but why we do it that matters.