[FFmpeg-devel] [PATCH] x86/me_cmp: port mmxext and sse2 sad functions to yasm

Michael Niedermayer michaelni at gmx.at
Mon Sep 15 00:51:53 CEST 2014


On Sun, Sep 14, 2014 at 07:35:26PM -0300, James Almer wrote:
> On 14/09/14 7:12 PM, Michael Niedermayer wrote:
> > On Sat, Sep 13, 2014 at 10:12:12PM -0300, James Almer wrote:
> >> Also add a missing c->pix_abs[0][0] initialization, and sse2 versions of
> >> sad16_x2, sad16_y2 and sad16_xy2.
> >> Since the _xy2 versions are not bitexact, they are accordingly marked as
> >> approximate.
> >>
> >> Signed-off-by: James Almer <jamrial at gmail.com>
> >> ---
> > 
> >> Not benched.
> > 
> > If the author of some code doesn't benchmark his code, how can he know
> > which way is faster?
> > What effect does each difference have? ...
> 
> I didn't bench because I didn't have the time and assumed it wasn't necessary,
> considering this is a port from inline to yasm with little to no changes to
> the asm.
> I'll try to do some quick benchmarks later.

[...]

> > 
> > 
> >> +%if mmsize == 16
> >> +    movhlps   m0, m2
> >> +    paddw     m2, m0
> >> +%endif
> >> +    movd     eax, m2
> >> +    RET
> >> +%endmacro
> >> +
> >> +INIT_MMX mmxext
> >> +SAD 8
> >> +SAD 16
> >> +INIT_XMM sse2
> >> +SAD 16
> >> +
> >> +;------------------------------------------------------------------------------------------
> >> +;int ff_sad_x2_<opt>(MpegEncContext *v, uint8_t *pix1, uint8_t *pix2, int stride, int h);
> >> +;------------------------------------------------------------------------------------------
> >> +%macro SAD_X2 1
> >> +cglobal sad%1_x2, 5, 5, 5, v, pix1, pix2, stride, h
> >> +%if %1 == mmsize
> >> +    shr       hd, 1
> >> +%define STRIDE strideq
> >> +%else
> >> +%define STRIDE 8
> >> +%endif
> >> +    pxor      m0, m0
> >> +
> > 
> >> +align 16
> > 
> > do these improve or reduce the speed ?
> 
> No idea. I copied them from the inline version (where they were ".p2align 4") 
> to keep the resulting asm as similar as possible.

Ah, OK, I hadn't realized that.
If it's just the same as before, then it's OK.
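
For readers following the thread, here is a minimal scalar sketch in C of
what the sad16 and sad16_x2 functions in the patch compute. The names
sad16_ref and sad16_x2_ref are hypothetical and not taken from FFmpeg, the
context pointer argument is omitted, and the x2 rounding (a + b + 1) >> 1 is
assumed to match what pavgb does. It is an illustration only, not the
project's reference code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences over a 16-pixel-wide block of h rows.
 * Hypothetical helper for illustration; not an FFmpeg function. */
static int sad16_ref(const uint8_t *pix1, const uint8_t *pix2,
                     int stride, int h)
{
    int sum = 0;

    assert(h > 0 && stride >= 16);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 16; x++)
            sum += abs(pix1[x] - pix2[x]);
        pix1 += stride;
        pix2 += stride;
    }
    return sum;
}

/* Same, but pix2 is first interpolated at the horizontal half-pel
 * position: each reference sample is the rounded-up average of
 * pix2[x] and pix2[x + 1]. Note that this reads one byte past the
 * 16th column of every row. */
static int sad16_x2_ref(const uint8_t *pix1, const uint8_t *pix2,
                        int stride, int h)
{
    int sum = 0;

    assert(h > 0 && stride >= 16);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 16; x++) {
            int avg = (pix2[x] + pix2[x + 1] + 1) >> 1;
            sum += abs(pix1[x] - avg);
        }
        pix1 += stride;
        pix2 += stride;
    }
    return sum;
}
```

A scalar version like this is handy as a baseline when benchmarking or when
checking that a SIMD port is bitexact. The _xy2 case averages four
neighbours, which is where the patch says the asm is only approximate.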

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Complexity theory is the science of finding the exact solution to an
approximation. Benchmarking OTOH is finding an approximation of the exact