[FFmpeg-devel] [PATCH] some SIMD write-combining for h264
Måns Rullgård
mans at mansr.com
Sun Jan 17 22:44:01 CET 2010
Michael Niedermayer <michaelni at gmx.at> writes:
> On Sun, Jan 17, 2010 at 04:59:55AM -0500, Alexander Strange wrote:
>>
>> On Jan 16, 2010, at 9:59 PM, Måns Rullgård wrote:
>>
>> > Alexander Strange <astrange at ithinksw.com> writes:
>> >
>> >> On Jan 16, 2010, at 12:35 AM, Michael Niedermayer wrote:
>> >>
>> >>> On Fri, Jan 15, 2010 at 11:11:23PM -0500, Alexander Strange wrote:
>> >>>> This adds intreadwrite macros for 64/128-bit memory operations and uses them in h264.
>> >>>>
>> >>>> Unlike the other macros, these assume correct alignment, and the patch only defines the ones there was an immediate use for.
>> >>>> This only has x86 versions, but others should be easy. The 64-bit operations can be done with double copies on most systems, I guess.
>> >>>>
>> >>>> Decoding a 30s file on Core 2 Merom with --cpu=core2 (minimum of 5 runs):
>> >>>> x86-32: 12.72s before, 12.51s after (1.7%)
>> >>>> x86-64: 10.24s before, 10.20s after (.4%)
>> >>>>
>> >>>> Tested on x86-32, x86-64, x86-32 with --arch=c.
>> >>>
>> >>> as your code uses MMX you need to at least mention EMMS/float issue in the
>> >>> dox and probably a emms_c(); call before draw_horiz_band()
>> >>> dunno if these are all
>> >>
>> >> Added in the comment.
>> >>
>> >>> also what sets __MMX__ ? we have our own defines for that
>> >>
>> >> It's a gcc builtin define, set based on ./configure --cpu=x adding
>> >> -march. HAVE_MMX is for the build and not the host cpu family, and
>> >> this is inlined asm, so it can't use it.
>> >
>> > Huh? Host... build???
>>
>> Oh, that was supposed to be "target"...
>> Anyway, this is MMX being used like the cmov/clz inlines, so it depends on the given --cpu and not on the build system's capabilities.
>
> do all compilers set __MMX__ ?

All that support gcc inline asm do, AFAIK.

>> +static inline void AV_COPY64(void *d, const void *s)
>
> are you sure this should not be always_inline?
> 2 plain 32bit reads +writes likely beat pushing 2 pointers on the stack and
> calling pulling 2 pointers off 2 mmx for copy and return

Yes, all of those should be always_inline. Seems like I missed that...
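
For illustration only, a minimal sketch of what an always-inlined 64-bit
copy guarded by __MMX__ might look like. This is not the code from the
patch: the names SKETCH_ALWAYS_INLINE, sketch_copy64 and sketch_emms are
hypothetical stand-ins, the alignment assert is an addition, and the
memcpy fallback is an assumption about what a non-MMX build could do.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an always_inline attribute macro. */
#if defined(__GNUC__)
#    define SKETCH_ALWAYS_INLINE inline __attribute__((always_inline))
#else
#    define SKETCH_ALWAYS_INLINE inline
#endif

/* Copy 8 bytes from s to d. Both pointers are assumed 8-byte aligned,
 * matching the "assume correct alignment" contract described above. */
static SKETCH_ALWAYS_INLINE void sketch_copy64(void *d, const void *s)
{
    assert(((uintptr_t)d & 7) == 0 && ((uintptr_t)s & 7) == 0);
#if defined(__GNUC__) && defined(__MMX__)
    /* One 64-bit load and one 64-bit store through an MMX register.
     * __MMX__ is predefined by the compiler when the target has MMX. */
    __asm__("movq %1, %%mm0 \n\t"
            "movq %%mm0, %0 \n\t"
            : "=m"(*(uint64_t *)d)
            : "m"(*(const uint64_t *)s)
            : "mm0");
#else
    /* Portable fallback; compilers typically lower this to plain moves. */
    memcpy(d, s, 8);
#endif
}

/* Clear the MMX state so later x87 floating-point code works again.
 * Compiles to nothing when the MMX path above is not in use. */
static SKETCH_ALWAYS_INLINE void sketch_emms(void)
{
#if defined(__GNUC__) && defined(__MMX__)
    __asm__ volatile("emms" ::: "memory");
#endif
}
```

With forced inlining the call overhead Michael describes goes away, and
the caller only has to remember sketch_emms() before any float code runs.
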
--
Måns Rullgård
mans at mansr.com