[FFmpeg-devel] [PATCH v6 1/4] doc: Explain what "context" means

Stefano Sabatini stefasab at gmail.com
Sat Jun 15 12:17:54 EEST 2024


On date Thursday 2024-06-13 15:20:38 +0100, Andrew Sayers wrote:
> On Wed, Jun 12, 2024 at 10:52:00PM +0200, Stefano Sabatini wrote:
[...]
> > > + at section Context_general “Context” as a general concept
> [...]
> > I'd skip all this part, as we assume the reader is already familiar
> > with C language and with data encapsulation through struct, if he is
> > not this is not the right place where to teach about C language
> > fundamentals.
> 
> I disagree, for a reason I've been looking for an excuse to mention :)
> 

> Let's assume 90% of people who use FFmpeg already know something in the doc.
> You could say that part of the doc is useless to 90% of the audience.
> Or you could say that 90% of FFmpeg users are not our audience.
> 
> Looking at it the second way means you need to spend more time on "routing" -
> linking to the document in ways that (only) attract your target audience,
> making a table of contents with headings that aid skip-readers, etc.
> But once you've routed people around the bits they don't care about,
> it's fine to have documentation that's only needed by a minority.
> 

> Also, less interesting but equally important - context is not a C language
> fundamental, it's more like an emergent property of large C projects.  A
> developer that came here without knowing e.g. what a struct is could read
> any of the online tutorials that explain the concept better than we could.
> I'd be happy to link to a good tutorial about contexts if we found one,
> but we have to meet people where they are, and this is the best solution
> I've been able to find.

The context is just another way to call a struct used to keep an
entity state operated by several functions (that is in other words an
object and its methods), it's mostly about the specific jargon used by
FFmpeg (and used by other C projects as well). In addition to this we
provide some generic utilities (logging+avoptions) which can be used
through AVClass employment.

Giving a longer explanation is making this appear something more
complicated than actually is. My point is that providing more
information than actually needed provides the long-wall-of-text effect
(I need to read through all this to understand it - nah I'd rather
give-up), thus discouraging readers.

> 
> > 
> > > +
> > > +When reading code that *is* explicitly described in terms of contexts,
> > > +remember that the term's meaning is guaranteed by *the project's community*,
> > > +not *the language it's written in*.  That means guarantees may be more flexible
> > > +and change more over time.  For example, programming languages that use
> > > +[encapsulation](https://en.wikipedia.org/wiki/Encapsulation_(computer_programming))
> > > +will simply refuse to compile code that violates its rules about access,
> > > +while communities can put up with special cases if they improve code quality.
> > > +
> > 
> > This looks a bit vague so I'd rather drop this.

I mean, if you read for the first time:
| [the context] term's meaning is guaranteed by *the project's
| community*, not the languaguage it's written for.
| That means guarantees may be more flexible and change more over time.

it's very hard to figure out what these guarantees are about, and this
might apply to every specific language and to every specific term,
that's why I consider this "vague".
 
[...]
> > > +Some functions fit awkwardly within FFmpeg's context idiom, so they send mixed
> > > +signals.  For example, av_ambient_viewing_environment_create_side_data() creates
> > > +an AVAmbientViewingEnvironment context, then adds it to the side-data of an
> > > +AVFrame context.  So its name hints at one context, its parameter hints at
> > > +another, and its documentation is silent on the issue.  You might prefer to
> > > +think of such functions as not having a context, or as “receiving” one context
> > > +and “producing” another.
> > 
> > I'd skip this paragraph. In fact, I think that API makes perfect
> > sense, OOP languages adopt such constructs all the time, for example
> > this could be a static module/class constructor. In other words, we
> > are not telling anywhere that all functions should take a "context" as
> > its first argument, and the documentation specify exactly how this
> > works, if you feel this is not clear or silent probably this is a sign
> > that that function documentation should be extended.
> 

> That would be fine if it were just this function, but FFmpeg is littered
> with special cases that don't quite fit.

I still fail to see the general rule for which this is creating a
special case. If this is a special case, what is this special case
for?

> Another example might be swr_alloc_set_opts2(), which can take an
> SwrContext in a way that resembles a context, or can take NULL and
> allocate a new SwrContext.  And yes, we could document that edge
> case, and the next one, and the one after that. But even if we
> documented every little special case that existed today, there's no
> rule, so new bits of API will just reintroduce the problem again.

In this specific case I'd say the API is not much ergonomic, and
probably it would have been better to keep operations (allow+set_opts)
separated, but then it is much a choice of the user (you can decide to
keep alloc and set_opts as different operations). On the other hand,
*it is already documented* so there is not much to add.

> There's a deeper issue here - as an expert, when you don't know something,
> your default assumption is that it's undefined, and could evolve in future.
> When a newbie doesn't know something, their default assumption is that
> everybody else knows and they're just stupid.  That assumption drives
> newbies away from projects, so it's important to fill in as many blanks as
> possible, even if it has to be with simple answers that they eventually
> evolve beyond (and feel smart for doing so).
> 
> > > + at subsection Context_lifetime Manage lifetime: creation, use, destruction
> [...]
> > About this I have mixed feelings, but to me it sounds like a-posteriori
> > rationalization.
> > 
> > I don't think there is a general rule with the allocation/closing/free
> > rule for the various FFmpeg APIs, and giving the impression that this
> > is the case might be misleading. In practice the user needs to master
> > only a single API at a time (encodering/decoding, muxing/demuxing,
> > etc.)  each one with possibly slight differences in how the term
> > close/allocation/free are used. This is probably not optimal, but in
> > practice it works as the user do not really need to know all the
> > possible uses of the API (she will work through what she is interested
> > for the job at hand).
> 
> Note: I'm assuming "this" means "this section", not "this paragraph".
> Apologies if it was intended as a specific nitpick about closing functions.
> 
> TBH, a lot of this document is about inventing memorable rules of thumb.

> The alternative is to say "FFmpeg devs can't agree on an answer, so they just
> left you to memorise 3,000+ special cases".

This is not what most people do. You don't have to read all the
manual, especially you don't have to memorize the complete API, but
only the relevant part for the task at hand. If you need to learn
about decoding, you would probably start with the decoding API
(libavcodec/avcodec.h - which is unfortunatly pretty much complex
because the problem is complex), and you can ignore pretty much
everything else. If you only need to work with hash methods, you only
need to learn about that API.

So in general, you only learn the smaller bits needed for the task at
hand. For example I never used the av_ambient_viewing_environment API,
and I will probably never will.

More realistically, people learn from examples, so that's why we
should improve doc/examples. doxy is mostly for providing a complete
reference in case you want to fine tune an already working piece of
code or you have problems understanding specific bits (how are
timestamps handled? When should I ref/unref a frame? How can I get the
name of a channel layout?).

Usually it is a mixed process (you read the API doc, or you read the
examples to get the general idea, and you interpolate between the
two).

> Let's assume learning the whole of FFmpeg means understanding 3,000 tokens
> (I'm not sure the exact count in 7.0, but it's about that number if you don't
> include individual members of structs, arguments to functions etc.).  Let's
> also assume it takes an average of ten minutes to learn each token (obviously
> that varies - AV_LOG_PANIC will take less, AVCodecContext will take more).
> That means you'd have to spend 8 hours a day every day for over two months
> to learn FFmpeg.

> Obviously there are usable subsets, but they mostly cut out the
> simple things, so don't save nearly as much time as you'd think.  If
> you want people to pick up FFmpeg, they need to learn a useful
> subset in about 8 hours, which requires a drastically simplified
> explanation.

I agree, I'm fine with giving a high level description of what we mean
by "context" - but probably one paragraph or two should be enough, but
adding too much information might get the exact opposite effect
(exposing more information than needed - assuming the reader is
familiar with the C language, which we should assume).

> (the above is closely related to an argument from a recent post[1],
> but the numbers might help explain the scale of the challenge)
> 
> There may not be an explicit rule for context lifetimes, but I've looked at the
> code carefully enough to have a nuanced opinion about the number of tokens,
> and the edgiest case I've found so far is swr_alloc_set_opts2() (see above).
> I'm open to counterexamples, but the model discussed in this section feels
> pretty reliable.

swr_alloc_set_opts() is an unfortunate case, probably it would be
better to fix the API to have a dedicated set_opts in place of
aggregating two operations - one of the issues being that the function
cannot distinguish the case of allocation failure (ENOMEM) from
invalid parameters (EINVAL).

That said, I would not mind keeping the general discussion, but
probably I'd cut the part about "finalize" to avoid the reference to
the C++ language, in general I'd avoid comparisons with C++ all around
the place.

> > > +
> > > + at subsection Context_avoptions Configuration options: AVOptions-enabled structs
> > > +
> > 
> > > +The @ref avoptions "AVOptions API" is a framework to configure user-facing
> > > +options, e.g. on the command-line or in GUI configuration forms.
> > 
> > This looks wrong. AVOptions is not at all about CLI or GUI options, is
> > just some way to set/get fields (that is "options") defined in a
> > struct (a context) using a high level API including: setting multiple
> > options at once (through a textual encoding or a dictionary),
> > input/range validation, setting more fields based on a single option
> > (e.g. the size) etc.
> > 
> > Then you can query the options in a given struct and create
> > corresponding options in a UI, but this is not the point of AVOptions.
> 
> There's a problem here I haven't been communicating clearly enough.
> I think that's because I've understated the problem in the past,
> so I'll try overstating instead:
> 
> "Option" is a meaningless noise word.  A new developer might ask "is this like a
> command-line option, or a CMake option?  Is it like those Python functions with
> a million keyword arguments, or a config file with sensible defaults?".
> Answering "it can be any of those if you need it to be" might help an advanced
> user (not our audience), but is bewildering to a newbie who needs a rough guide
> for how they're likely to use it.  The only solution that's useful to a newbie
> is to provide a frame of reference, preferably in the form of something they
> already know how to use.

This is the definition I gave:

[AVOptions system is] just some way to set/get fields (that is
"options") defined in a struct (a context) using a high level API
including: setting multiple options at once (through a textual
encoding or a dictionary), input/range validation, setting more fields
based on a single option (e.g. the size) etc.

"Setting options" on a struct/context with AVOptions means to set
_fields_ on the struct, following the very specific rules defined by
AVOptions.

So it's not like "it can be any of those if you need it to be", but it
has a very specific meaning.

> Having said all that, yes this particular answer is wrong.  Could
> you apply [2] so I can start thinking about what to replace it with?

I'll have a look at it.


More information about the ffmpeg-devel mailing list