[FFmpeg-devel] libavcodec/lossless_videodsp : add add_bytes AVX2

Wed Oct 25 22:43:50 EEST 2017

2017-10-25 9:43 GMT+02:00 Paul B Mahol <onemda at gmail.com>:

> On 10/21/17, Martin Vignali <martin.vignali at gmail.com> wrote:
> > Hello,
> >
> > Attached is a patch that adds an AVX2 version of add_bytes.
> >
> > 0001-libavcodec-lossless_videodsp-add-add_bytes-avx2-vers :
> > add AVX2 version
> >
> > Passes the FATE tests for me (os 10.12, x86_64).
> >
> > checkasm results (Kaby Lake; run 10 times, I kept the fastest
> > run):
> > checkasm: all 2 tests passed
> > add_bytes_c: 108.7
> > add_bytes_sse2: 26.5
> > add_bytes_avx2: 15.5
> >
> >
> > 0002-libavcodec-lossless_video_dsp-cosmetic-add-better-se:
> > cosmetic only.
> > The reference C function declarations are not consistent between the
> > asm files, and I think a better separator between functions makes each
> > file easier to read.
> >
> > It also adds the C declaration of add_bytes as a comment.
> >
> >
> > Martin
> >
>
> Are you sure 32-byte alignment is actually enforced?
>
>
Hello,

I think the data passed to add_bytes is always aligned,
because dst and src each point to the start of a line of an AVFrame.
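
To make the argument concrete, here is a minimal sketch (not FFmpeg code) of the check behind it. It assumes that the plane pointer itself is 32-byte aligned and that linesize is a multiple of 32; whether the frame allocator guarantees both is exactly the open question. ALIGN, is_aligned, all_lines_aligned and the plane buffer are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ALIGN 32 /* assumption: alignment wanted by aligned 256-bit loads/stores */

/* Hypothetical stand-in for one plane of a frame: 8 lines, 64 bytes each. */
static _Alignas(ALIGN) uint8_t plane[64 * 8];

/* Returns 1 if the address p is a multiple of ALIGN bytes. */
static int is_aligned(const uint8_t *p)
{
    return ((uintptr_t)p % ALIGN) == 0;
}

/* Returns 1 if every line start data + y * linesize is aligned, 0 otherwise. */
static int all_lines_aligned(const uint8_t *data, ptrdiff_t linesize, int height)
{
    assert(data != NULL);
    for (int y = 0; y < height; y++)
        if (!is_aligned(data + y * linesize))
            return 0;
    return 1;
}
```

With the 32-byte aligned plane above, all_lines_aligned(plane, 64, 8) holds, while a linesize of 40 already breaks alignment on the second line. So the claim depends on both the base pointer and the linesize.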


More details:

The add_bytes function is used only by the magicyuv and huffyuvdec decoders.
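
For readers who do not have the code at hand, this is a plain scalar sketch of what the operation does: it adds src to dst byte by byte, with each sum wrapping modulo 256. The name add_bytes_ref is hypothetical, and the actual C version in FFmpeg may be written differently (for example, processing several bytes at a time).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar sketch: dst[i] = (dst[i] + src[i]) mod 256 for the first w bytes. */
static void add_bytes_ref(uint8_t *dst, const uint8_t *src, ptrdiff_t w)
{
    assert(w >= 0);
    for (ptrdiff_t i = 0; i < w; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]);
}
```

The SIMD versions do the same thing on 16 (SSE2) or 32 (AVX2) bytes per instruction, which is why the alignment of dst and src matters.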

*In the MagicYUV decoder*

uint8_t *b = p->data[0] + j * s->slice_height * p->linesize[0];
uint8_t *g = p->data[1] + j * s->slice_height * p->linesize[1];
uint8_t *r = p->data[2] + j * s->slice_height * p->linesize[2];
...
for (i = 0; i < height; i++) {
    s->llviddsp.add_bytes(b, g, width);
    s->llviddsp.add_bytes(r, g, width);
    b += p->linesize[0];
    g += p->linesize[1];
    r += p->linesize[2];


*In the HuffYUV decoder*
add_bytes is called either directly
or through the wrapper static void add_bytes(HYuvContext *s, uint8_t *dst,
uint8_t *src, int w)

*First use:*
add_bytes(s, dst, dst - fake_stride, w);

==>
uint8_t *dst = p->data[plane] + p->linesize[plane]*y;
and fake_stride is a multiple of the frame's linesize


*Second use:*
s->llviddsp.add_bytes(ydst, ydst - fake_ystride, width);

==> Same idea here: ydst is the start of a line and
fake_ystride is a multiple of linesize
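
The step "dst is aligned, so dst - fake_stride is aligned too" only holds if the stride itself is a multiple of the alignment. A small sketch of that arithmetic, assuming a 32-byte requirement; src_keeps_alignment and the frame buffer are hypothetical names, not part of huffyuvdec:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical plane: 8 lines of 64 bytes, base address 32-byte aligned. */
static _Alignas(32) uint8_t frame[64 * 8];

/*
 * Returns 1 if src = dst - fake_stride has the same address modulo 32
 * as dst. The caller must make sure src stays inside the buffer.
 */
static int src_keeps_alignment(const uint8_t *dst, ptrdiff_t fake_stride)
{
    const uint8_t *src = dst - fake_stride;
    assert(dst != NULL);
    return ((uintptr_t)src % 32) == ((uintptr_t)dst % 32);
}
```

For a line start such as frame + 3 * 64, a fake stride of 2 * 64 keeps src aligned, and a stride of 40 would not.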


*Third use:*
s->llviddsp.add_bytes(ydst, ydst - fake_ystride, width);
if (!(s->flags & AV_CODEC_FLAG_GRAY)) {
    s->llviddsp.add_bytes(udst, udst - fake_ustride, width2);
    s->llviddsp.add_bytes(vdst, vdst - fake_vstride, width2);

==> Same idea here


*Last use:*
s->llviddsp.add_bytes(p->data[0] + p->linesize[0] * y, p->data[0] +
p->linesize[0] * y + fake_ystride, 4 * width);


Martin