[FFmpeg-devel] [PATCH] x86/vp9lpf: add ff_vp9_loop_filter_[vh]_88_16_sse2()
Clément Bœsch
u at pkh.me
Tue Jan 28 13:24:34 CET 2014
On Tue, Jan 28, 2014 at 12:05:41PM +0100, Christophe Gisquet wrote:
> Hi,
>
> 2014-01-28 James Almer <jamrial at gmail.com>:
> > +%if cpuflag(ssse3)
> > mova m0, [mask_mix]
> > +%endif
> > movd m2, Id
> > movd m3, Ed
> > - pshufb m2, m0
> > - pshufb m3, m0
> > + SPLATB_MASK m2, m0
> > + SPLATB_MASK m3, m0
>
> Is there any gain in loading mask_mix into m0, in particular considering that:
>
The register was available, and IIRC the splat macros need the value in a
register.
> > %endif
> > mova m0, [pb_80]
> > pxor m2, m0
> > @@ -456,7 +469,7 @@ SECTION .text
> > SPLATB_REG m7, H, m0 ; H H H H ...
> > %else
> > movd m7, Hd
> > - pshufb m7, [mask_mix]
> > + SPLATB_MASK m7, [mask_mix]
> > %endif
>
> It is not loaded here?
I couldn't keep the register available until then.
>
> I'm asking because I have noticed it sometimes (not in vp9 scope) does
> not matter, or is even 1 cycle faster.
In that particular case we need to use it twice, so we just avoid a second
read from memory. I admit I didn't benchmark it, but the difference is
probably negligible.
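
For readers unfamiliar with the shuffle being discussed, here is a rough
scalar model of what a pshufb-based splat does, next to one conceivable SSE2
sequence that needs no mask at all. It is only a sketch: the layout of
mask_mix (byte 0 for the low 8 lanes, byte 1 for the high 8 lanes), the
packed value, and the SSE2 sequence are assumptions made for illustration,
not taken from the patch.

```python
# Scalar models of a few SSE instructions, for illustration only.
# Registers are modelled as lists of 16 bytes, lowest byte first.

def movd(value):
    """Model movd xmm, r32: low 4 bytes from the value, the rest zeroed."""
    return [(value >> (8 * i)) & 0xFF for i in range(4)] + [0] * 12

def pshufb(src, mask):
    """SSSE3: out[i] = 0 if mask[i] has bit 7 set, else src[mask[i] & 15]."""
    return [0 if m & 0x80 else src[m & 0x0F] for m in mask]

def punpcklbw(a, b):
    """SSE2: interleave the low 8 bytes of a and b."""
    out = []
    for i in range(8):
        out += [a[i], b[i]]
    return out

def pshuflw(src, imm):
    """SSE2: shuffle the 4 low words by imm, copy the high 8 bytes."""
    words = [src[2 * i:2 * i + 2] for i in range(4)]
    low = []
    for i in range(4):
        low += words[(imm >> (2 * i)) & 3]
    return low + src[8:]

def pshufd(src, imm):
    """SSE2: shuffle the 4 dwords by imm."""
    dwords = [src[4 * i:4 * i + 4] for i in range(4)]
    out = []
    for i in range(4):
        out += dwords[(imm >> (2 * i)) & 3]
    return out

# ASSUMPTION: mask_mix selects byte 0 for lanes 0-7 and byte 1 for lanes 8-15.
MASK_MIX = [0] * 8 + [1] * 8

def splat_mix_ssse3(value, mask=MASK_MIX):
    """One shuffle; the mask is an input, so it can be loaded once and reused."""
    return pshufb(movd(value), mask)

def splat_mix_sse2(value):
    """Hypothetical mask-free fallback: unpack, then two shuffles."""
    x = movd(value)
    x = punpcklbw(x, x)      # b0 b0 b1 b1 ...
    x = pshuflw(x, 0x50)     # words: w0 w0 w1 w1
    return pshufd(x, 0x50)   # dwords: d0 d0 d1 d1

# Hypothetical packed value: 0x14 in byte 0, 0x0A in byte 1.
packed = 0x0A14
m2 = splat_mix_ssse3(packed)
m3 = splat_mix_sse2(packed)
```

Both paths yield the same register contents in this model; only the SSSE3
path reads the mask, which is the read that keeping it in m0 saves when two
values are splatted back to back.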
--
Clément B.