[FFmpeg-devel] [PATCH] matroskaenc: Add support for writing video projection.

Aaron Colwell acolwell at google.com
Fri Feb 3 18:57:58 EET 2017


On Fri, Feb 3, 2017 at 3:28 AM Vittorio Giovara <vittorio.giovara at gmail.com>
wrote:

> On Thu, Feb 2, 2017 at 9:34 PM, James Almer <jamrial at gmail.com> wrote:
> > On 1/31/2017 12:40 PM, Aaron Colwell wrote:
> >>
> >>
> >> On Tue, Jan 31, 2017 at 2:12 AM Vittorio Giovara <
> vittorio.giovara at gmail.com <mailto:vittorio.giovara at gmail.com>> wrote:
> >>
> >> On Sat, Jan 28, 2017 at 4:13 AM, James Almer <jamrial at gmail.com
> <mailto:jamrial at gmail.com>> wrote:
> >>> On 1/27/2017 11:21 PM, Aaron Colwell wrote:
> >>>> On Fri, Jan 27, 2017 at 5:46 PM James Almer <jamrial at gmail.com
> <mailto:jamrial at gmail.com>> wrote:
> >>>>
> >>>> yeah. I'm a little confused why the demuxing code didn't implement
> this to
> >>>> begin with.
> >>>
> >>> Vittorio (The one that implemented AVSphericalMapping) tried to add
> this at
> >>> first, but then removed it because he wasn't sure if he was doing it
> right.
> >>
> >> Hi,
> >> yes this was included initially but then we found out what those
> >> fields were really for, and I didn't want to make the users get as
> >> confused as we were. As a matter of fact Aaron I mentioned this to you
> >> when I proposed that we probably should have separated the classic
> >> equi projection from the tiled one in the specification, in order to
> >> simplify the implementation.
> >>
> >>
> >> Like I said before, it is not a different projection. It is still
> equirectangular and those parameters just crops the projection. It is very
> simple to just verify that the fields are zero if you don't want to support
> the cropped version.
>
> Hello,
> I'm sorry but I heavily disagree. The tiled equirectangular projection
> is something that cannot be used standalone, you have to do additional
> mathematics and take into account different files or streams to
> generate a "normal" or full-frame equirectangular projection. Having a
> separate type allows to include extensions such as the bounds fields,
> which can be safely ignored by the every user that do not need a tiled
> projection.
>

I still think you don't understand what these fields do given what you say
here. Yes there is a little more math. At the end of the day all these
fields do is specify a the min & max for the latitude & longitude. This
essentially translates to adding scale factors and offsets in your shader
or something similar in your 3D geometry creation logic. I get it if
implementations don't want to do this small bit of extra work, but saying
this is a different type seems strange because you wouldn't do this when
talking about cropped 2D images.


>
> It is too late to change the spec, but I do believe that the usecase
> is different enough to add a new type, in order to not overcomplicate
> the implementation.
>

It feels like you are just moving the problem to the demuxers and muxers
here. Adding a new type means all demuxers will have to contain logic to
generate these different types and all muxers will have to contain logic to
collapse these types back down to a single value.

I don't really want to keep arguing about this. If folks really want
different types then I'll do it just because I want to get reading and
writing of metadata working end-to-end. I'd like to make a small request to
use the term "cropped equirectagular" instead of "tiled equirectangular"
but I don't feel to strongly about that.


>
> >>>>> I know you're the one behind the spec in question, but wouldn't it
> be a
> >>>>> better idea to wait until AVSphericalMapping gets a way to propagate
> this
> >>>>> kind of information before adding support for muxing video projection
> >>>>> elements? Or maybe try to implement it right now...
> >>>>>
> >>>>
> >>>> I'm happy to implement support for the projection specific info. What
> would
> >>>> be the best way to proceed. It seems like I could just place a union
> with
> >>>> projection specific structs in AVSphericalMapping. I'd also like some
> >>>
> >>> I'm CCing Vittorio so he can chim in. I personally have no real
> preference.
> >>
> >> The best way in my opinion is to add a third type, such as
> >> AV_SPHERICAL_TILED_EQUI, and add the relevant fields in
> >> AVSphericalMapping, with a clear description about the actual use case
> >> for them, mentioning that they are used only in format. By the way,
> >> why do you mention adding a union? I think four plain fields should
> >> do.
> >>
> >>
> >> I don't think it is worth having the extra enum value for this. All the
> cropped fields do is control how you generate the spherical mesh or control
> the shader used to render the projection. If players don't want to support
> it they can just check to see that all the fields are zero and error out if
> not.
>
> Why would you have them check these fields every time, when this can
> be implicitly determined by the type semantics. I'm pretty sure API
> users prefer this scenario
>
> * check projection type
> -> if normal_equi -> project it
> -> if tiled_equi -> read additional data -> project it
>
> rather than
>
> * check projection type
> -> if equi -> read additional data -> check if data needs additional
> processing -> project it, or perform more operations before projecting
>

I think this mainly boils down to which code you want to burden with the
checks. My proposal puts the burden on the rendering code. Your proposal
puts it on every muxer and demuxer that deals with this information.


>
> >> I was suggesting using a union because the projection bounds fields are
> for equirect, and there are different fields for the cubemap & mesh
> projections. I figured that the enum + union of structs might be a
> reasonable way to organize the projection specific fields.
>
> This is a structure whose size does not depend on ABI and can be
> extended as we like, there is no need to separate new fields in such a
> way as long as they are properly documented, in my opinion.
>

I suppose this is just a difference in style so I don't feel too strongly
about it. I was just trying to use unions & structs here to make it a
little clearer about which fields are expected to be valid and when. The
fields seemed to be disjoint sets so I was trying to use language features
to convey that.

I'd also like to float a potential alternative solution. We could just
convey the projection private data as a size and byte array in this struct.
That would allow me to pass the metadata from demuxers to muxers so I can
achieve my goals, and allow actual parsing of the information to be
deferred until someone needs it. How do you feel about this compromise
position?


>
> Please keep me in CC.
>

Done for real this time. :) Sorry about before. Reply all didn't do what I
expected it to do.

Aaron


More information about the ffmpeg-devel mailing list