[FFmpeg-devel] [PATCH] VC-1 MMX DSP functions

Rich Felker dalias
Thu Oct 4 04:51:12 CEST 2007


On Wed, Oct 03, 2007 at 09:47:01PM +0200, Michael Niedermayer wrote:
> Hi
> 
> On Wed, Oct 03, 2007 at 08:16:39PM +0200, Reimar Döffinger wrote:
> > Hello,
> > On Tue, Oct 02, 2007 at 11:19:42PM +0200, Michael Niedermayer wrote:
> > [...]
> > > > +        ASMALIGN(3)
> > > > +        "1:                                \n\t"
> > > 
> > > how much speed is gained by the align?
> > 
> > Inconclusive in my tests on AMD64 on a 64-bit OS:
> > without:
> > 3012 dezicycles in vc1_put_ver_16b_shift2_mmx, 1048397 runs, 179 skips
> > 1249 dezicycles in vc1_put_hor_16b_shift2_mmx, 1048505 runs, 71 skips
> > 
> > 3011 dezicycles in vc1_put_ver_16b_shift2_mmx, 1048397 runs, 179 skips
> > 1232 dezicycles in vc1_put_hor_16b_shift2_mmx, 1048517 runs, 59 skips
> > 
> > 3011 dezicycles in vc1_put_ver_16b_shift2_mmx, 1048514 runs, 62 skips
> > 1232 dezicycles in vc1_put_hor_16b_shift2_mmx, 1048548 runs, 28 skips
> > 
> > with:
> > 3038 dezicycles in vc1_put_ver_16b_shift2_mmx, 1048340 runs, 236 skips
> > 1259 dezicycles in vc1_put_hor_16b_shift2_mmx, 1048487 runs, 89 skips
> > 
> > 3027 dezicycles in vc1_put_ver_16b_shift2_mmx, 1048415 runs, 161 skips
> > 1259 dezicycles in vc1_put_hor_16b_shift2_mmx, 1048515 runs, 61 skips
> > 
> > 3030 dezicycles in vc1_put_ver_16b_shift2_mmx, 1048384 runs, 192 skips
> > 1258 dezicycles in vc1_put_hor_16b_shift2_mmx, 1048516 runs, 60 skips
> 
> i wouldn't call that "Inconclusive" but rather slower, and it's what i
> expected, as that's how all the code alignments i remember on x86 behaved.
> maybe we should try to misalign all branch targets :)

Try compiling the whole library with these CFLAGS and compare. I bet
it'll be faster!

-malign-functions=1 -malign-jumps=1 -malign-loops=1

Yay for gcc stupidity.

Rich
