[FFmpeg-devel] [PATCH] move h264 loopfilter strength code to yasm

Loren Merritt lorenm
Fri Sep 24 22:50:38 CEST 2010


On Fri, 24 Sep 2010, Ronald S. Bultje wrote:

> Hi,
>
> On Fri, Sep 24, 2010 at 3:30 PM, Alexander Strange
> <astrange at ithinksw.com> wrote:
>> "g" permits registers, so it could generate something like this:
>> ? por %rax(%rcx), %mm1
>>
>> You have to use "m" or MANGLE() in this situation.
>> Also, generating _d_idx(%rax) won't work on BROKEN_RELOCATIONS (x86-64 darwin) because all global references must be _d_idx(%rip).
>>
>> If you have a recent clang try compiling with that, it has a built-in assembler which may have clearer errors.
>> ?and which doesn't build ffmpeg because it doesn't know about 3dnow.
>
> No, no, I don't want registers, I want it to load as a constant, i.e.
> a static number. The register (later on) would be b_idx, so that I
> don't need separate regs for each variable, plus var+b_idx, plus
> var+b_idx+d_idx.
>
> @@ -152,7 +152,7 @@
>                 "movd (%0), %%mm1 \n"
>                 "por  (%0,%1), %%mm1 \n" // nnz[b] || nnz[bn]
>                 ::"r"(nnz+b_idx),
> -                  "r"(d_idx)
> +                  "g"(d_idx)
>             );
>             __asm__ volatile(
>                 "pminub    %%mm7, %%mm1 \n"
>
> Also has fun effects:
>
> /var/folders/Rz/RzQTCSLsFPWQeOEO5EXsJE+++TI/-Tmp-//ccREEh7B.s:341:Bad
> Absolute Expression, absolute 0 assumed.
> /var/folders/Rz/RzQTCSLsFPWQeOEO5EXsJE+++TI/-Tmp-//ccREEh7B.s:341:expecting
> scale factor of 1, 2, 4, or 8: got `$-8)'
> /var/folders/Rz/RzQTCSLsFPWQeOEO5EXsJE+++TI/-Tmp-//ccREEh7B.s:430:Bad
> Absolute Expression, absolute 0 assumed.
> /var/folders/Rz/RzQTCSLsFPWQeOEO5EXsJE+++TI/-Tmp-//ccREEh7B.s:430:expecting
> scale factor of 1, 2, 4, or 8: got `$-1)'
> /var/folders/Rz/RzQTCSLsFPWQeOEO5EXsJE+++TI/-Tmp-//ccREEh7B.s:466:Bad
> Absolute Expression, absolute 0 assumed.
> /var/folders/Rz/RzQTCSLsFPWQeOEO5EXsJE+++TI/-Tmp-//ccREEh7B.s:466:expecting
> scale factor of 1, 2, 4, or 8: got `$-1)'
> /var/folders/Rz/RzQTCSLsFPWQeOEO5EXsJE+++TI/-Tmp-//ccREEh7B.s:502:Bad
> Absolute Expression, absolute 0 assumed.
> /var/folders/Rz/RzQTCSLsFPWQeOEO5EXsJE+++TI/-Tmp-//ccREEh7B.s:502:expecting
> scale factor of 1, 2, 4, or 8: got `$-1)'
> /var/folders/Rz/RzQTCSLsFPWQeOEO5EXsJE+++TI/-Tmp-//ccREEh7B.s:622:Bad
> Absolute Expression, absolute 0 assumed.
> /var/folders/Rz/RzQTCSLsFPWQeOEO5EXsJE+++TI/-Tmp-//ccREEh7B.s:622:expecting
> scale factor of 1, 2, 4, or 8: got `$-8)'
> /var/folders/Rz/RzQTCSLsFPWQeOEO5EXsJE+++TI/-Tmp-//ccREEh7B.s:712:Bad
> Absolute Expression, absolute 0 assumed.
> /var/folders/Rz/RzQTCSLsFPWQeOEO5EXsJE+++TI/-Tmp-//ccREEh7B.s:712:expecting
> scale factor of 1, 2, 4, or 8: got `$-1)'
>
> ??
>
> Apparently it understands I don't want to take a register for d_idx,
> but it doesn't understand how to make it a number. Yet. Help?

Whereas yasm just adds together everything inside a set of [] to get a 
memory arg, gas has special syntax for each component. There is no inline 
asm input constraint that would give you something that can be either a 
register or a constant and that can be dereferenced to get a memory arg. I 
don't know even how to get something that's only a constant and usable in 
that context ("i" adds a $ which is required in immediate args but 
forbidden in addresses).
The only way to do it is to use "m" and put the entire address in the 
input constraint.

         "movd %0, %%mm1 \n"
         "por  %1, %%mm1 \n"
         ::"m"(nnz[b_idx]),
           "m"(nnz[b_idx+d_idx])
     );

This isn't strictly correct, in that it doesn't tell gcc that you're 
also reading from other adjacent elements of nnz[]. This is rarely a 
problem in asm that isn't inlined (because gcc isn't clever enough to 
track non-reference through function pointers), but is really annoying to 
track down when it does happen. If you did need to fix that, the only ways 
I know of are:

         "movd %0, %%mm1 \n"
         "por  %1, %%mm1 \n"
         ::"m"(*(struct {uint8_t x[4];}*)&nnz[b_idx]),
           "m"(*(struct {uint8_t x[4];}*)&nnz[b_idx+d_idx])
     );

which is even uglier than usual, or

         "movd %0, %%mm1 \n"
         "por  %1, %%mm1 \n"
         ::"m"(nnz[b_idx]),
           "m"(nnz[b_idx+d_idx])
         :"memory"
     );

which is overbroad and thus invites pessimization.

--Loren Merritt



More information about the ffmpeg-devel mailing list