[Ffmpeg-devel] [PATCH] h264 - loopify some get_cabac calls
Sun Mar 25 22:09:56 CEST 2007
On 3/25/07, Alexander Strange <astrange at ithinksw.com> wrote:
> On Mar 24, 2007, at 7:46 PM, Guillaume Poirier wrote:
> >> There's some more AltiVec code here we'll probably send soon:
> >> http://trac.perian.org/ticket/113
> > I had a quick look at
> > http://trac.perian.org/attachment/ticket/113/altivec_lum.3.diff
> > Even though I imagine this patch isn't yet ready to be submitted,
> > I'd like to ask if, in your opinion, the transpose routines can make
> > do without accessing memory (do it all in registers).
> They actually do, that patch is just messy enough to hide it.
> The functions transpose4x4 and readVector aren't ever called.
> transpose4/6x16 only do memory operations because the initial loads
> and stores are integrated into them.
Ok, I hadn't looked carefully.
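
To make "do it all in registers" concrete, here is a minimal sketch in
portable C that models a 4x4 transpose built only from merge operations.
It is not the code from altivec_lum.3.diff: the vec4w type and every
model_* name are hypothetical scalar stand-ins, written on the assumption
that vec_mergeh and vec_mergel interleave the high and low halves of two
vectors. Once the four rows are loaded, nothing touches memory.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar stand-in for one 128-bit register holding four 32-bit words
 * (hypothetical type, for illustration only). */
typedef struct { uint32_t w[4]; } vec4w;

/* Models vec_mergeh on 32-bit elements: interleave the first halves. */
vec4w model_vec_mergeh(vec4w a, vec4w b)
{
    vec4w v = {{ a.w[0], b.w[0], a.w[1], b.w[1] }};
    return v;
}

/* Models vec_mergel on 32-bit elements: interleave the second halves. */
vec4w model_vec_mergel(vec4w a, vec4w b)
{
    vec4w v = {{ a.w[2], b.w[2], a.w[3], b.w[3] }};
    return v;
}

/* Transposes four rows in place using eight merges and no loads/stores. */
void model_transpose4x4(vec4w r[4])
{
    /* First pass: pair row 0 with row 2, and row 1 with row 3. */
    vec4w t0 = model_vec_mergeh(r[0], r[2]);
    vec4w t1 = model_vec_mergeh(r[1], r[3]);
    vec4w t2 = model_vec_mergel(r[0], r[2]);
    vec4w t3 = model_vec_mergel(r[1], r[3]);

    /* Second pass: interleave the intermediates into the columns. */
    r[0] = model_vec_mergeh(t0, t1);
    r[1] = model_vec_mergel(t0, t1);
    r[2] = model_vec_mergeh(t2, t3);
    r[3] = model_vec_mergel(t2, t3);
}

/* Checks the result against the definition out[i][j] == in[j][i]. */
void self_check_transpose(void)
{
    vec4w r[4];
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            r[i].w[j] = (uint32_t)(10 * i + j);

    model_transpose4x4(r);

    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            assert(r[i].w[j] == (uint32_t)(10 * j + i));
}
```

A wider transpose such as the 4x16 or 6x16 case would presumably chain
more merge passes on narrower elements, but that layout is not reproduced
here.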
> I think the stuff in transpose6x16 can be cleaned up; it should be
> able to use vec_ste instead of copying the result array.
> But this is my first time studying it too; I didn't write it.
Ok. Who may that be? I attached a patch that uses these AltiVec
routines in x264. It looks like they don't produce results that are
bit-identical to the C version, but maybe that's just because I haven't
modified what needs to be modified to make them work in the x264 environment.
> > Also more cycles could be saved if you take advantage of some known
> > alignments (8-bytes aligned load/store can be made faster than a
> > generic unaligned memory access)....
> Hm, doesn't Altivec use the same unaligned load method for both?
> (load x and 15+x, merge them)
Well, if you know the alignment in advance, you don't need to compute
the permute vector, that's all. It doesn't save all that much, just a
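
The "load x and 15+x, merge them" idiom and the known-alignment shortcut
can be sketched in portable C. This is a scalar model and not real AltiVec
code: vec16 and the model_* functions are hypothetical stand-ins that
assume the usual behaviour of vec_ld (low four address bits ignored),
vec_lvsl (permute mask derived from the address) and vec_perm (byte select
from two vectors). The only difference between the two loaders is whether
the mask is computed at run time or is a constant.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar stand-in for one 16-byte register (hypothetical type). */
typedef struct { uint8_t b[16]; } vec16;

/* Models vec_ld: the low 4 address bits are ignored, so the load always
 * fetches the enclosing 16-byte aligned block. */
vec16 model_vec_ld(int off, const uint8_t *p)
{
    vec16 v;
    uintptr_t ea = ((uintptr_t)p + (uintptr_t)off) & ~(uintptr_t)15;
    memcpy(v.b, (const uint8_t *)ea, 16);
    return v;
}

/* Models vec_lvsl: mask bytes sh, sh+1, ..., sh+15 where sh = address & 15. */
vec16 model_vec_lvsl(int off, const uint8_t *p)
{
    vec16 v;
    unsigned sh = (unsigned)(((uintptr_t)p + (uintptr_t)off) & 15);
    for (int i = 0; i < 16; i++)
        v.b[i] = (uint8_t)(sh + (unsigned)i);
    return v;
}

/* Models vec_perm: each mask byte indexes the 32-byte concatenation a:b. */
vec16 model_vec_perm(vec16 a, vec16 b, vec16 m)
{
    vec16 v;
    for (int i = 0; i < 16; i++) {
        unsigned idx = m.b[i] & 31u;
        v.b[i] = idx < 16 ? a.b[idx] : b.b[idx - 16];
    }
    return v;
}

/* Generic unaligned load: two aligned loads plus a computed mask. */
vec16 load_unaligned(const uint8_t *p)
{
    vec16 lo   = model_vec_ld(0, p);
    vec16 hi   = model_vec_ld(15, p);
    vec16 mask = model_vec_lvsl(0, p);   /* computed per address */
    return model_vec_perm(lo, hi, mask);
}

/* Alignment known in advance (address is 8 mod 16): the mask is a
 * constant, so the lvsl step disappears. */
vec16 load_offset8(const uint8_t *p)
{
    static const vec16 mask8 = {{ 8, 9, 10, 11, 12, 13, 14, 15,
                                  16, 17, 18, 19, 20, 21, 22, 23 }};
    return model_vec_perm(model_vec_ld(0, p), model_vec_ld(15, p), mask8);
}

/* Compares both loaders against a plain byte copy. */
void self_check_loads(void)
{
    uint8_t raw[64];
    /* Carve a 16-byte aligned 48-byte window out of raw. */
    uint8_t *buf = (uint8_t *)(((uintptr_t)raw + 15) & ~(uintptr_t)15);
    for (int i = 0; i < 48; i++)
        buf[i] = (uint8_t)i;

    for (int off = 0; off <= 16; off++) {
        vec16 v = load_unaligned(buf + off);
        assert(memcmp(v.b, buf + off, 16) == 0);
    }

    vec16 w = load_offset8(buf + 8);
    assert(memcmp(w.b, buf + 8, 16) == 0);
}
```

In this model both loaders still perform two aligned loads and one
permute; the known-alignment version only drops the mask computation,
which matches the point that the saving is small.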
Rich, you're forgetting one thing here: *everybody* except you is
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 14448 bytes
Desc: not available