[FFmpeg-devel] [PATCH 2/2] swscale/input: Avoid calls to av_pix_fmt_desc_get()

Michael Niedermayer michael at niedermayer.cc
Thu Sep 8 23:25:28 EEST 2022


On Thu, Sep 08, 2022 at 09:38:51PM +0200, Andreas Rheinhardt wrote:
> Michael Niedermayer:
> > Hi
> > 
> > On Thu, Sep 08, 2022 at 04:38:11AM +0200, Andreas Rheinhardt wrote:
> >> Up until now, libswscale/input.c used a macro to read
> >> an input pixel which involved a call to av_pix_fmt_desc_get()
> >> to find out whether the input pixel format is BE or LE
> >> despite this being known at compile-time (there are templates
> >> per pixfmt). Even worse, these calls are made in a loop,
> >> so that e.g. there are six calls to av_pix_fmt_desc_get()
> >> for every pair of UV pixel processed in
> >> rgb64ToUV_half_c_template().
> >>
> >> This commit modifies these macros to ensure that isBE()
> >> is evaluated at compile-time. This saved 9743B of .text
> >> for me (GCC 11.2, -O3).
> > 
> > hmm, all these functions where supposed to be optimized out
> > why where they not ?
> > 
> > iam asking as the code is simpler before your patch if that
> > "optimization out" thing would work
> > 
> 
> Why should these functions be optimized out? What would enable the
> compiler to optimize them out?

Going back into the past, there was
6b0768e2021b90215a2ab55ed427bce91d148148

before this the code certainly did get optimized out, it was just
#define isBE(x) ((x)&1)

thats simple and clean code btw
after this it became

#define isBE(x) \
+    (av_pix_fmt_descriptors[x].flags & PIX_FMT_BE)

thats still really good, and very readable, its a const array so
one would assume that a compiler can figure that out at compile time
well, i try not to think of linking and seperate objects here ;)

next it got then replaced by a function and a call that i suspect
people thought would be inlined


> (And I really don't see why this patch would make the code more
> complicated.)

the code historically was capable to lookup any flag and detail
of a pixel format at compile time
now your code works around that not working. Introducing a 2nd
system to do this in parallel. To me if i look at the evolution
of isBE() / code checking BE-ness it become more messy over time

I think it would be interresting to think about if we can make
av_pix_fmt_desc_get(compile time constant) work at compile time.
or if we maybe can return to a simpler implementation


thx

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Awnsering whenever a program halts or runs forever is
On a turing machine, in general impossible (turings halting problem).
On any real computer, always possible as a real computer has a finite number
of states N, and will either halt in less than N cycles or never halt.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 195 bytes
Desc: not available
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20220908/e10877e5/attachment.sig>


More information about the ffmpeg-devel mailing list