[FFmpeg-devel] [PATCH 1/5] avutil: add pixelutils API
Clément Bœsch
u at pkh.me
Tue Aug 5 21:13:24 CEST 2014
On Sun, Aug 03, 2014 at 12:36:19AM +0200, Michael Niedermayer wrote:
> On Sat, Aug 02, 2014 at 11:34:07PM +0200, Clément Bœsch wrote:
[...]
> > +#ifdef TEST
> > +#define W1 320
> > +#define H1 240
> > +#define W2 640
> > +#define H2 480
> > +int main(void)
> > +{
> > + int i, a, ret = 0;
> > + DECLARE_ALIGNED(32, uint32_t, buf1)[W1*H1];
> > + DECLARE_ALIGNED(32, uint32_t, buf2)[W2*H2];
> > + uint32_t state = 0;
> > +
> > + for (i = 0; i < W1*H1; i++) {
> > + buf1[i] = state;
> > + state = state * 1664525 + 1013904223;
> > + }
> > +
> > + for (i = 0; i < W2*H2; i++) {
> > + buf2[i] = state;
> > + state = state * 1664525 + 1013904223;
> > + }
>
> the code should in addition be tested with maximal and minimal
> difference cases
>
Tests added.
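
For readers following along, the two extreme cases can be sketched in plain C as below. This is only an illustration and not the test code from the patchset: `sad_16x16_ref`, `sad_min_case` and `sad_max_case` are hypothetical names, and the scalar loop simply restates what a 16x16 sum of absolute differences computes.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical scalar reference for a 16x16 SAD (not the function from the
 * patch): sum of |src1[x] - src2[x]| over 16 rows of 16 bytes. */
static int sad_16x16_ref(const uint8_t *src1, ptrdiff_t stride1,
                         const uint8_t *src2, ptrdiff_t stride2)
{
    int x, y, sum = 0;

    for (y = 0; y < 16; y++) {
        for (x = 0; x < 16; x++)
            sum += abs(src1[x] - src2[x]);
        src1 += stride1;
        src2 += stride2;
    }
    return sum;
}

/* Minimal difference: two identical blocks, every term of the sum is 0. */
static int sad_min_case(void)
{
    uint8_t a[16 * 16], b[16 * 16];

    memset(a, 0x80, sizeof(a));
    memset(b, 0x80, sizeof(b));
    return sad_16x16_ref(a, 16, b, 16);
}

/* Maximal difference: all 0xFF against all 0x00, every term is 255. */
static int sad_max_case(void)
{
    uint8_t a[16 * 16], b[16 * 16];

    memset(a, 0xFF, sizeof(a));
    memset(b, 0x00, sizeof(b));
    return sad_16x16_ref(a, 16, b, 16);
}
```

The maximal case is the one that bounds the result: 16 * 16 * 255 = 65280, which is the largest value any 16x16 SAD implementation can return.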
>
> [...]
> > +;-------------------------------------------------------------------------------
> > +; int ff_pixelutils_sad_[au]_16x16_sse(const uint8_t *src1, ptrdiff_t stride1,
> > +; const uint8_t *src2, ptrdiff_t stride2);
> > +;-------------------------------------------------------------------------------
> > +%macro SAD_XMM_16x16 1
> > +INIT_XMM sse2
> > +cglobal pixelutils_sad_%1_16x16, 4,4,3, src1, stride1, src2, stride2
> > + pxor m2, m2
> > +%rep 8
> > + mov%1 m0, [src2q]
> > + mov%1 m1, [src2q + stride2q]
> > + psadbw m0, [src1q]
> > + psadbw m1, [src1q + stride1q]
> > + paddw m2, m0
> > + paddw m2, m1
> > + lea src1q, [src1q + 2*stride1q]
> > + lea src2q, [src2q + 2*stride2q]
> > +%endrep
> > + movhlps m0, m2
> > + paddw m2, m0
> > + movd eax, m2
> > + RET
> > +%endmacro
>
> there are various improvements possible, though these should be in
> a separate patch and not in gcc->yasm, but
> the pxor can be avoided by lifting the first iteration out and
> using m2 as destination
>
> it might be faster to use 2 accumulator registers as that way both
> could execute with no dependencies on the other
>
> as you unroll the loop, addressing can be done with fewer instructions
>
I left the ASM as is since it was kind of simple and parallel to the API
itself; we can iterate from here with benchmarks.
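
As a rough illustration of two of the suggestions above (lifting the first iteration out so no zeroing is needed, and using two independent accumulators), here is a sketch written with C SSE2 intrinsics rather than yasm. It is not benchmarked and not part of the patchset; `sad_u_16x16_two_acc` and `two_acc_const_blocks` are hypothetical names, and the scalar fallback only exists so the sketch still builds where SSE2 is unavailable.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h> /* SSE2 intrinsics */
#define SKETCH_HAVE_SSE2 1
#else
#define SKETCH_HAVE_SSE2 0
#endif

/* Hypothetical unaligned 16x16 SAD using two accumulators. */
static int sad_u_16x16_two_acc(const uint8_t *src1, ptrdiff_t stride1,
                               const uint8_t *src2, ptrdiff_t stride2)
{
#if SKETCH_HAVE_SSE2
    /* First iteration lifted out: rows 0 and 1 initialise the two
     * accumulators directly, so no pxor-style zeroing is required. */
    __m128i acc0 = _mm_sad_epu8(_mm_loadu_si128((const __m128i *)src1),
                                _mm_loadu_si128((const __m128i *)src2));
    __m128i acc1 = _mm_sad_epu8(
        _mm_loadu_si128((const __m128i *)(src1 + stride1)),
        _mm_loadu_si128((const __m128i *)(src2 + stride2)));
    int i;

    /* Remaining 7 row pairs: even rows feed acc0, odd rows feed acc1, so
     * the two add chains do not depend on each other. */
    for (i = 1; i < 8; i++) {
        src1 += 2 * stride1;
        src2 += 2 * stride2;
        acc0 = _mm_add_epi16(acc0, _mm_sad_epu8(
                   _mm_loadu_si128((const __m128i *)src1),
                   _mm_loadu_si128((const __m128i *)src2)));
        acc1 = _mm_add_epi16(acc1, _mm_sad_epu8(
                   _mm_loadu_si128((const __m128i *)(src1 + stride1)),
                   _mm_loadu_si128((const __m128i *)(src2 + stride2))));
    }

    /* Merge the accumulators, then fold the high 64-bit half onto the low
     * one (the role movhlps + paddw play in the quoted macro). */
    acc0 = _mm_add_epi16(acc0, acc1);
    acc0 = _mm_add_epi16(acc0, _mm_srli_si128(acc0, 8));
    return _mm_cvtsi128_si32(acc0);
#else
    /* Scalar fallback so the sketch compiles without SSE2. */
    int x, y, sum = 0;

    for (y = 0; y < 16; y++) {
        for (x = 0; x < 16; x++)
            sum += abs(src1[x] - src2[x]);
        src1 += stride1;
        src2 += stride2;
    }
    return sum;
#endif
}

/* Usage example: SAD of two blocks filled with constant bytes v1 and v2. */
static int two_acc_const_blocks(int v1, int v2)
{
    uint8_t a[16 * 16], b[16 * 16];

    memset(a, v1, sizeof(a));
    memset(b, v2, sizeof(b));
    return sad_u_16x16_two_acc(a, 16, b, 16);
}
```

The 16-bit adds are safe here because the worst case, 16 * 16 * 255 = 65280, still fits in an unsigned 16-bit lane. Whether the second accumulator actually helps would have to be measured.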
> LGTM otherwise
>
Patchset applied, thanks
[...]
--
Clément B.