[FFmpeg-devel] Once again: Multithreaded H.264 decoding with ffmpeg?
Loren Merritt
lorenm
Sat May 31 13:58:00 CEST 2008
On Fri, 30 May 2008, Michel Lespinasse wrote:
> On Fri, May 30, 2008 at 02:26:21PM -0600, Jason Garrett-Glaser wrote:
>> Main benefit of yasm:
>>
>> * vastly more powerful macro system that makes it far easier to
>> generalize a small function to dozens of specific cases
>>
>> This allows us to do the following:
>>
>> * abstraction between MMX and SSE code; write a single function that does both
>> * automatic handling of macros that permute their arguments; see
>> x264's DCT functions
>> * automatic handling of 32-bit vs 64-bit abstraction
>
> I think the above can also be achieved using mmx.h and the C preprocessor.
> At least that's what I used in libmpeg2's IDCT code.

Which point are you responding to?

Abstraction between MMX and SSE can be done in gcc, but it's more complex.
There are several things that need to be defined (reg prefix, reg size,
movdqu, movdqa, movq), the C preprocessor doesn't allow a #define inside a
macro body, and gcc warns about redefinitions, so that's a bunch of lines
every time you switch. Plus extra ugly quotes all over, since defines only
apply in C context, not inside asm strings. Or, instead of global defines,
you can add all of those parameters to every macro, which is fewer LOC but
hardly cleaner.
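
For illustration, a minimal sketch of the define-based approach described above. The names R, MOVA, LOAD_ADD and show_templates are made up for this example (they are not taken from libmpeg2 or ffmpeg), and the asm text is only built as a C string here, not assembled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: asm text that loads from src into register a, then
 * adds register b to it. The asm lives in string literals, so every
 * parameter has to be spliced in by closing and reopening the quotes. */
#define LOAD_ADD(src, a, b) \
    MOVA " " src ", " R #a " \n\t" \
    "paddw " R #b ", " R #a " \n\t"

/* MMX parameter set (assumed names: register prefix and aligned move). */
#define R    "%%mm"
#define MOVA "movq"
static const char load_add_mmx[] = LOAD_ADD("(%0)", 0, 1);

/* Switching to SSE2: each define is undone first, because gcc warns when
 * a macro is redefined with a different body. */
#undef R
#undef MOVA
#define R    "%%xmm"
#define MOVA "movdqa"
static const char load_add_sse2[] = LOAD_ADD("(%0)", 0, 1);

/* Print both templates; they should differ only in prefix and mnemonic. */
void show_templates(void)
{
    assert(strcmp(load_add_mmx, load_add_sse2) != 0);
    puts(load_add_mmx);
    puts(load_add_sse2);
}
```

Even in this two-parameter toy, the switch costs four preprocessor lines, and the macro body is mostly quote marks. A real kernel would also need the register size and the unaligned move in the parameter set.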

Macros that permute their arguments are just impossible in gcc. That
requires %xdefine, which the C preprocessor has no equivalent of. So the
only alternative is to not permute the arguments and keep track of all
permutations by hand.

In case anyone is unclear about what I mean by permute, this is the
difference between
QPEL_H264V(%%mm0, %%mm1, %%mm2, %%mm3, %%mm4, %%mm5, OP)\
QPEL_H264V(%%mm1, %%mm2, %%mm3, %%mm4, %%mm5, %%mm0, OP)\
QPEL_H264V(%%mm2, %%mm3, %%mm4, %%mm5, %%mm0, %%mm1, OP)\
QPEL_H264V(%%mm3, %%mm4, %%mm5, %%mm0, %%mm1, %%mm2, OP)\
QPEL_H264V(%%mm4, %%mm5, %%mm0, %%mm1, %%mm2, %%mm3, OP)\
QPEL_H264V(%%mm5, %%mm0, %%mm1, %%mm2, %%mm3, %%mm4, OP)\
QPEL_H264V(%%mm0, %%mm1, %%mm2, %%mm3, %%mm4, %%mm5, OP)\
QPEL_H264V(%%mm1, %%mm2, %%mm3, %%mm4, %%mm5, %%mm0, OP)\
QPEL_H264V(%%mm2, %%mm3, %%mm4, %%mm5, %%mm0, %%mm1, OP)\
QPEL_H264V(%%mm3, %%mm4, %%mm5, %%mm0, %%mm1, %%mm2, OP)\
QPEL_H264V(%%mm4, %%mm5, %%mm0, %%mm1, %%mm2, %%mm3, OP)\
QPEL_H264V(%%mm5, %%mm0, %%mm1, %%mm2, %%mm3, %%mm4, OP)\
QPEL_H264V(%%mm0, %%mm1, %%mm2, %%mm3, %%mm4, %%mm5, OP)\
QPEL_H264V(%%mm1, %%mm2, %%mm3, %%mm4, %%mm5, %%mm0, OP)\
QPEL_H264V(%%mm2, %%mm3, %%mm4, %%mm5, %%mm0, %%mm1, OP)\
QPEL_H264V(%%mm3, %%mm4, %%mm5, %%mm0, %%mm1, %%mm2, OP)\
and
%rep 16
QPEL_H264V m0, m1, m2, m3, m4, m5, OP
SWAP 0, 1, 2, 3, 4, 5
%endrep
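
To make the bookkeeping explicit, here is a small C model of what the %rep block asks the assembler to track. It is only a sketch: it assumes SWAP with several arguments exchanges names pairwise along the chain (0 with 1, then 1 with 2, and so on), which amounts to a rotation, and the function names swap_chain, format_call and expand_rep are invented for this example.

```c
#include <assert.h>
#include <stdio.h>

#define NREGS 6

/* Exchange two logical register names, as a two-argument SWAP would. */
static void swap_pair(int *map, int a, int b)
{
    int tmp = map[a];
    map[a] = map[b];
    map[b] = tmp;
}

/* Model of "SWAP 0, 1, 2, 3, 4, 5" (assumed semantics): swap along the
 * chain 0<->1, 1<->2, ... Afterwards logical register i names what
 * logical register i+1 named, and the last names what register 0 named. */
void swap_chain(int *map, int n)
{
    assert(n >= 1);
    for (int i = 0; i + 1 < n; i++)
        swap_pair(map, i, i + 1);
}

/* Write the gcc-style spelling of one invocation into buf. */
void format_call(const int *map, char *buf, size_t len)
{
    snprintf(buf, len,
             "QPEL_H264V(%%%%mm%d, %%%%mm%d, %%%%mm%d, "
             "%%%%mm%d, %%%%mm%d, %%%%mm%d, OP)",
             map[0], map[1], map[2], map[3], map[4], map[5]);
}

/* Model of the %rep block: emit `count` invocations, rotating after each. */
void expand_rep(int count)
{
    int map[NREGS] = {0, 1, 2, 3, 4, 5};
    char line[96];

    for (int i = 0; i < count; i++) {
        format_call(map, line, sizeof line);
        puts(line);
        swap_chain(map, NREGS);
    }
}
```

Calling expand_rep(16) from a main() should print the registers in the same order as the sixteen hand-written lines, minus the line continuations. The map array is the state that has to be tracked by hand in the gcc version.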

The 32-bit vs 64-bit abstraction is of course handled automatically by gcc.
It was listed just to show that yasm can implement the stuff gcc does
automatically, while the reverse is not true.
--Loren Merritt