[FFmpeg-devel] [PATCH] some SIMD write-combining for h264

Alexander Strange astrange
Mon Jan 18 02:02:58 CET 2010


2010/1/17 M?ns Rullg?rd <mans at mansr.com>:
> Michael Niedermayer <michaelni at gmx.at> writes:
>
>> On Sun, Jan 17, 2010 at 04:59:55AM -0500, Alexander Strange wrote:
>>>
>>> On Jan 16, 2010, at 9:59 PM, M?ns Rullg?rd wrote:
>>>
>>> > Alexander Strange <astrange at ithinksw.com> writes:
>>> >
>>> >> On Jan 16, 2010, at 12:35 AM, Michael Niedermayer wrote:
>>> >>
>>> >>> On Fri, Jan 15, 2010 at 11:11:23PM -0500, Alexander Strange wrote:
>>> >>>> This adds intreadwrite macros for 64/128-bit memory operations and uses them in h264.
>>> >>>>
>>> >>>> Unlike the other macros, these assume correct alignment, and the patch only defines the ones there was an immediate use for.
>>> >>>> This only has x86 versions, but others should be easy. The 64-bit operations can be done with double copies on most systems, I guess.
>>> >>>>
>>> >>>> Decoding a 30s file on Core 2 Merom with --cpu=core2 (minimum of 5 runs):
>>> >>>> x86-32: 12.72s before, 12.51s after (1.7%)
>>> >>>> x86-64: 10.24s before, 10.20s after (.4%)
>>> >>>>
>>> >>>> Tested on x86-32, x86-64, x86-32 with --arch=c.
>>> >>>
>>> >>> as your code uses MMX you need to at least mention EMMS/float issue in the
>>> >>> dox and probably a ?emms_c(); call before draw_horiz_band()
>>> >>> dunno if these are all
>>> >>
>>> >> Added in the comment.
>>> >>
>>> >>> also what sets __MMX__ ? we have our own defines for that
>>> >>
>>> >> It's a gcc builtin define, set based on ./configure --cpu=x adding
>>> >> -march. ?HAVE_MMX is for the build and not the host cpu family, and
>>> >> this is inlined asm, so it can't use it.
>>> >
>>> > Huh? ?Host... build???
>>>
>>> Oh, that was supposed to be "target"...
>>> Anyway, this is MMX being used like the cmov/clz inlines, so it depends on the given --cpu and not on the build system's capabilities.
>>
>> do all compilers set __MMX__ ?
>
> All that support gcc inline asm do AFAIK.
>
>>> +static inline void AV_COPY64(void *d, const void *s)
>>
>> are you sure this should not be always_inline?
>> 2 plain 32bit reads +writes likely beat pushing 2 pointers on the stack and
>> calling pulling 2 pointers off 2 mmx for copy and return
>
> Yes, all of those should be always_inline. ?Seems like I missed that...

I followed the style of ppc/intreadwrite.h exactly, which uses inline.
This reminds me that I've seen a compiler not inline NEG_USR32 in
bitstream.h once, so I'll use always_inline here and hope the rest are
converted sometime.



More information about the ffmpeg-devel mailing list