[FFmpeg-devel] [PATCH 02/15] vp9/x86: make cglobal statement more conservative in register allocation.
Ronald S. Bultje
rsbultje at gmail.com
Sat Dec 27 20:44:18 CET 2014
Hi,
On Sat, Dec 27, 2014 at 11:31 AM, Clément Bœsch <u at pkh.me> wrote:
> On Sat, Dec 27, 2014 at 11:02:37AM -0500, Ronald S. Bultje wrote:
> > ---
> > libavcodec/x86/vp9lpf.asm | 21 ++++++++++++++++-----
> > 1 file changed, 16 insertions(+), 5 deletions(-)
> >
> > diff --git a/libavcodec/x86/vp9lpf.asm b/libavcodec/x86/vp9lpf.asm
> > index e0f7386..c62ac46 100644
> > --- a/libavcodec/x86/vp9lpf.asm
> > +++ b/libavcodec/x86/vp9lpf.asm
> > @@ -307,7 +307,20 @@ SECTION .text
> > %endif
> > %endmacro
> >
> > -%macro LOOPFILTER 2 ; %1=v/h %2=size1
> > +%macro LOOPFILTER 3 ; %1=v/h %2=size1 %3=stack
> > +%if UNIX64
> > +cglobal vp9_loop_filter_%1_%2_16, 5, 9, 16, %3, dst, stride, E, I, H, mstride, dst2, stride3, mstride3
> > +%else
> > +%if WIN64
> > +cglobal vp9_loop_filter_%1_%2_16, 4, 8, 16, %3, dst, stride, E, I, mstride, dst2, stride3, mstride3
> > +%else
>
> > +cglobal vp9_loop_filter_%1_%2_16, 2, 6, 16, %3, dst, stride, mstride, dst2, stride3, mstride3
> > +%define Ed dword r2m
> > +%define Id dword r3m
> > +%endif
> > +%define Hd dword r4m
>
> So every 32-bit arch ends up here, right?
>
Well, rather, both win64 and x86-32. Unix64 preloads 6 arguments into
registers, so Hd is already in a register upon function entry; win64
preloads 4, so Hd is on the stack; x86-32 passes all arguments on the
stack, so everything is on the stack; there we load dst/stride and leave
the rest where it is to save registers.
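
To make the three cases concrete, here is a small Python model of where each
argument sits on function entry. It is only an illustrative sketch, not
x86inc code: the names REG_ARGS, ARGS and arg_locations are made up for this
example, and the register counts are simply the ones stated above (6 for
Unix64, 4 for win64, 0 for x86-32).

```python
# Illustrative model only (not x86inc): where does each argument of the
# loop filter live on function entry, per calling convention?

# Number of leading arguments passed in registers, as described in the email.
REG_ARGS = {"unix64": 6, "win64": 4, "x86_32": 0}

# Argument order of the loop filter, using the names from the patch.
ARGS = ["dst", "stride", "E", "I", "H"]


def arg_locations(abi):
    """Map each argument name to 'register' or 'stack' at function entry."""
    n = REG_ARGS[abi]
    return {name: ("register" if i < n else "stack")
            for i, name in enumerate(ARGS)}


for abi in REG_ARGS:
    print(abi, arg_locations(abi))
```

In this model H is the fifth argument, so it is in a register only on Unix64;
that is why the patch defines Hd as "dword r4m" on both win64 and x86-32, and
additionally Ed/Id as "dword r2m"/"dword r3m" on x86-32.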
> > +%endif
> > +
> > mov mstrideq, strideq
> > neg mstrideq
> >
> > @@ -795,10 +808,8 @@ SECTION .text
> >
> > %macro LPF_16_VH 2
> > INIT_XMM %2
> > -cglobal vp9_loop_filter_v_%1_16, 5,10,16, dst, stride, E, I, H, mstride, dst2, stride3, mstride3
> > - LOOPFILTER v, %1
> > -cglobal vp9_loop_filter_h_%1_16, 5,10,16, 256, dst, stride, E, I, H, mstride, dst2, stride3, mstride3
> > - LOOPFILTER h, %1
> > +LOOPFILTER v, %1, 0
> > +LOOPFILTER h, %1, 256
>
> Should be OK assuming 0 is indeed the default stack size (x86inc seems to
> suggest it to be set to 16 or 32 somehow).
That's the alignment, which only matters if there's stack usage at all. 0
means no stack usage in the function, so we skip allocation, and none of
the internal logic applies.
Ronald