[FFmpeg-devel] Once again: Multithreaded H.264 decoding with ffmpeg?

Michael Niedermayer michaelni
Sun Jun 1 22:41:46 CEST 2008


On Sun, Jun 01, 2008 at 12:46:02PM -0600, Loren Merritt wrote:
[...]
> 
> > I am really a little curious whether cleanly written yasm code is so much superior
> > to cleanly written gcc inline asm code. I certainly am no fan of gcc or
> > its asm; it's mainly the extra dependency and the loss of support for many
> > platforms that annoys me most about this ...
> 
> Which platforms?

OS/2 and BeOS were mentioned in this thread. OS/2 seems to have NASM, but I do
not know whether NASM supports all the syntax you want to use.


> 
> > TRANSPOSE8 is used at 2 spots ...
> >
> >        TRANSPOSE8(%%xmm4, %%xmm1, %%xmm7, %%xmm3, %%xmm5, %%xmm0, %%xmm2, %%xmm6, (%1))
> >        "paddw          %4, %%xmm4 \n"
> >        "movdqa     %%xmm4, 0x00(%1) \n"
> >        "movdqa     %%xmm2, 0x40(%1) \n"
> >        H264_IDCT8_1D_SSE2(%%xmm4, %%xmm0, %%xmm6, %%xmm3, %%xmm2, %%xmm5, %%xmm7, %%xmm1)
> >        "movdqa     %%xmm6, 0x60(%1) \n"
> >        "movdqa     %%xmm7, 0x70(%1) \n"
> >
> > These movdqa are not needed on x86-64, and I suspect that by not using "common"
> > code their number can be reduced on x86-32. More precisely, the second looks
> > like it could be merged with something from TRANSPOSE8.
> 
> Agreed. In x264 I have separate x86_32 and x86_64 versions of 8x8 dct. 
> But in lavc I just wanted to do as little gcc-asm writing as possible, so 
> I stopped after writing the minimal x86_32 version which can be 
> compiled on x86_64 but doesn't make much use of the extra registers.
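
The register pressure behind those extra movdqa stores can be put in numbers. What follows is only a sketch: the XMM register counts (8 in 32-bit mode, 16 in 64-bit mode) are architectural facts, but the helper spills_needed() and the working-set size used in the usage note are hypothetical and are not taken from lavc or x264.

```c
/* Architectural XMM register counts: 8 in 32-bit mode, 16 in 64-bit mode. */
enum { XMM_REGS_X86_32 = 8, XMM_REGS_X86_64 = 16 };

/* Hypothetical helper: how many live 128-bit values have to be stored to
 * memory when `live` values are needed at once but only `regs` registers
 * exist. */
int spills_needed(int live, int regs)
{
    return live > regs ? live - regs : 0;
}
```

With an assumed working set of ten values, spills_needed(10, XMM_REGS_X86_32) is 2 while spills_needed(10, XMM_REGS_X86_64) is 0, which is the kind of difference a dedicated x86_64 version could exploit.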
> 

> > Also, the question of readability has been ignored entirely. Is all the
> > preprocessor magic, be it yasm or C, really that good?
> > You use a lot of preprocessor tricks in your gcc-asm; I just thought it
> > might be more flexible and readable with a little less.
> > After all, the code would be the same after the preprocessor anyway.
> 
> What is your alternative? Write code using preprocessor tricks but then
> manually expand them before committing? Anything that reduces code
> duplication is a win in terms of ease of writing (no matter how much
> magic is involved), but I can understand optimizing for reading at the
> expense of writing if you're reasonably sure that the function will
> never change again.

I did not mean manually expanding the code, certainly not.
Rather, I guessed that you did not write the code with all the macros
in place but added them later to factor out common code ...
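
To make the "same after the preprocessor" point concrete, here is a small sketch. The macro SUMSUB_BA and the register choices are hypothetical illustrations, not the actual lavc macros. It builds the same inline-asm text once through a macro and once written out by hand, and compares the two strings.

```c
#include <string.h>

/* Hypothetical macro: emits a sum/difference step as inline-asm text.
 * In AT&T operand order this does a += b; b += b; b -= a. */
#define SUMSUB_BA(a, b) \
    "paddw " #b ", " #a " \n\t" \
    "paddw " #b ", " #b " \n\t" \
    "psubw " #a ", " #b " \n\t"

/* Factored form: two uses of the macro. */
static const char factored[] =
    SUMSUB_BA(%%xmm0, %%xmm1)
    SUMSUB_BA(%%xmm2, %%xmm3);

/* The same text written out by hand. */
static const char expanded[] =
    "paddw %%xmm1, %%xmm0 \n\t"
    "paddw %%xmm1, %%xmm1 \n\t"
    "psubw %%xmm0, %%xmm1 \n\t"
    "paddw %%xmm3, %%xmm2 \n\t"
    "paddw %%xmm3, %%xmm3 \n\t"
    "psubw %%xmm2, %%xmm3 \n\t";

/* Returns 1 when both forms yield identical asm text. */
int same_after_preprocessing(void)
{
    return strcmp(factored, expanded) == 0;
}
```

The compiler sees identical strings either way, so the choice between the two forms only affects whoever reads or edits the source.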


> 
> > And lastly, ultra-finetuned common 64-32 code has another problem: what
> > happens when someone wants to change/optimize the code but does not have
> > both a 32- and a 64-bit cpu? It could easily lead to a speed loss or to
> > considerably more work waiting for others to do the benchmarking.
> 
> Essentially all asm I've written in the past 3 years was optimized for 64 
> and for 64-in-32bit-mode, not for any 32bit cpu, so I guess that doesn't 
> count as ultra finetuned. If you optimize for a specific old cpu and have 
> reason to believe your change hurts new cpus, then that's another split, 
> not just 32-64. If you don't have specific reason but just don't have any 
> 64bit cpus to test on, then you not only have code duplication but 
> non-identical duplication without even being sure that the differences are 
> useful.

> If every difference between two near-duplicate functions is documented as 
> to which cpus it's been tested on and the results thereof (what's the 
> chance of that?), then my argument on this point is reduced.

Actually, maybe we should document, for every function which has been optimized,
on which cpus it has been optimized. If we had done that in the past, someone
could now grep for code optimized for really old cpus and check whether it is
still optimal for today's ...
Also, new optimizations could then more easily be tested on the cpus which
served as the basis for the previous optimizations, to ensure that they did not
worsen the code for them.
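
One possible shape for such documentation, purely as a sketch: the tag TUNED-ON, the struct TuningNote, the lookup tuned_on() and every entry in the table are hypothetical placeholders, not existing ffmpeg conventions or real benchmark data.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record of which cpus a routine was benchmarked on. */
typedef struct TuningNote {
    const char *function;  /* routine name */
    const char *tuned_on;  /* cpus it was benchmarked on */
} TuningNote;

/* TUNED-ON: placeholder entries only; a fixed tag like this is easy to grep for. */
static const TuningNote tuning_notes[] = {
    { "example_idct8_add",  "cpu-model-a, cpu-model-b" },
    { "example_transpose8", "cpu-model-c" },
};

/* Returns the recorded cpu list for a routine, or NULL if none is recorded. */
const char *tuned_on(const char *function)
{
    size_t i;
    for (i = 0; i < sizeof(tuning_notes) / sizeof(tuning_notes[0]); i++)
        if (strcmp(tuning_notes[i].function, function) == 0)
            return tuning_notes[i].tuned_on;
    return NULL;
}
```

A plain comment next to each function carrying the same tag would serve the grep use case just as well; the table form merely keeps the notes in one place.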


> 
> > So in the end, IMHO, maybe less preprocessor-based asm code factorization
> > would be a better solution than yasm. Just my 2 cents; I am not opposing yasm
> > if people really want it ...
> 
> Better? It's a solution to a different problem. I'm asking for yasm so I 
> can do more preprocessor stuff.
> Well, syntax is another reason. I'd prefer
>    pshufw mm0, [eax+ecx*4+16], 0
> over
>    "pshufw $0, 16(%%eax,%%ecx,4), %%mm0 \n\t"\
> even if that were the only difference.

Trust me, I hate gcc asm syntax as well. I never understood why they did not
use Intel's syntax.

And as already said, it's Diego/Mans you should convince, not me. I am not
opposing yasm/nasm if that means more or better optimizations for ffmpeg.
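
For anyone comparing the two notations side by side, here is a minimal sketch, assuming GCC or Clang on x86. The function name scaled_address is hypothetical. It uses the same addressing form as the pshufw example quoted above, with lea instead of pshufw so that it needs no MMX state, and falls back to plain C on other targets.

```c
#include <stdint.h>

/* Hypothetical helper: computes base + index*4 + 16, the same addressing
 * form as [eax+ecx*4+16] in the quoted pshufw example. */
uintptr_t scaled_address(uintptr_t base, uintptr_t index)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
    uintptr_t out;
    /* AT&T syntax as used by gcc inline asm: displacement(base,index,scale),
     * source operand first, destination last.
     * Intel/NASM syntax for the same instruction would read:
     *     lea out, [base + index*4 + 16] */
    __asm__ ("lea 16(%1,%2,4), %0" : "=r"(out) : "r"(base), "r"(index));
    return out;
#else
    /* Portable fallback for non-x86 targets or other compilers. */
    return base + index * 4 + 16;
#endif
}
```

Here scaled_address(100, 3) evaluates to 128 on either path; the only thing that changes between the notations is how the operand is spelled.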

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I hate to see young programmers poisoned by the kind of thinking
Ulrich Drepper puts forward since it is simply too narrow -- Roman Shaposhnik