[FFmpeg-devel] [PATCH v3] fbdetile cpu based detiling of framebuffers v03

C Hanish Menon hanishkvc at gmail.com
Wed Jul 1 19:37:15 EEST 2020


Hi Lynne,

On Wed, Jul 1, 2020 at 3:37 PM Lynne <dev at lynne.ee> wrote:

> Jun 29, 2020, 18:58 by hanishkvc at gmail.com:
>
> > v03-20200629IST2208 fbdetile
> >
> > Added a generic detiling logic, which can be easily configured to
> > detile many different tiling schemes.
> >
> > The same is in turn used to detile the Intel Tile-Yf layout.
> >
> > NOTE: This is a full patch, it contains the previous versions also
> > in it.
> >
> > v02-20200627IST2331
> >
> > Unrolled Intel Legacy Tile-Y detiling logic.
> >
> > Also a consolidated patch file, instead of the previous development
> > flow based multiple patch files.
> >
> > v01-20200627IST1308
> >
> > Implemented Intel Legacy Tile-X and Tile-Y detiling logic
> >
> > NOTES:
> >
> > This video filter allows framebuffers which are tiled to be detiled
> > using logic running on the cpu, into a linear layout.
> >
> > Currently it supports Intel Legacy Tile-X and Tile-Y layout detiling,
> > as well as the newer Intel Tile-Yf layouts.
> >
> > This should help one to work with frames captured (say using kmsgrab)
> > on laptops having an Intel GPU. This can be done live while capturing
> > itself, or it can be applied later as a separate pass.
> >
> > Tile-X conversion logic has been explicitly cross checked with Tile-X
> > based frames. However, the Tile-Y and Tile-Yf conversion logic hasn't
> > been tested with Tile-Y | Tile-Yf based frames, but it should potentially
> > get the job done, based on my current understanding of these layout
> > formats.
> >
> > TODO1: At a later time have to generate Tile-Y|Yf based frames, and then
> > cross check the corresponding logic explicitly.
> >
> > TODO2: Maybe use OpenGL or Vulkan buffer helper routines to do the
> > layout conversion. But some online discussions from some time back seem
> > to indicate that this path is not fully bug free currently.
> >
>
> Still not happening, I'd like to see this done properly with hwdownload.
> While what you have works as a hack, we're not interested in hacks but
> something that works universally.
> As I said before, it can be easily sped up by a factor of 4 or 8 using
> SIMD, so it's unjustifiable to have this in the codebase as a filter.
>
>
Can you tell me how this is not universal? If anything, embedding it within
hwdownload would limit it to use from a hwcontext, while keeping it as a
separate filter allows one to use it either with a hw context or with any
other source. It also gives the flexibility to do the detiling live or
offline. So I am not sure in what sense you call my current flow restricted
and a possible version embedded within hwdownload universal.

Also, I am assuming that gcc + libc is sensible enough to use an appropriate
fast memcpy, with say rep movs or SIMD load-stores as the case may be,
depending on the CPU architecture for which the code is being built. The
overhead with FullHD content is negligible. Beyond that, if required, I have
structured the generic detile logic which I have implemented to detile
multiple tiles in step, which could be easily translated into true parallel
detiling in a hw or multicore setup.
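
To illustrate the memcpy point, here is a minimal sketch of row-wise Tile-X
detiling. It is not the code from the patch: the function name
detile_x_sketch and its parameters are hypothetical, and it assumes the
legacy Tile-X geometry of 512 bytes x 8 rows per 4 KiB tile, with rows
stored linearly inside a tile and tiles stored row-major across the surface.
It also assumes the surface width in bytes is a multiple of 512 and the
height a multiple of 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed legacy Intel Tile-X geometry: 512 bytes wide, 8 rows tall. */
#define TILE_X_WIDTH_BYTES 512
#define TILE_X_HEIGHT_ROWS 8
#define TILE_X_SIZE_BYTES  (TILE_X_WIDTH_BYTES * TILE_X_HEIGHT_ROWS)

/*
 * Hypothetical helper (not from the patch): copy a Tile-X surface in src
 * into a linear buffer dst. width_bytes and height describe the surface;
 * dst_stride is the linear row pitch in bytes.
 */
static void detile_x_sketch(uint8_t *dst, size_t dst_stride,
                            const uint8_t *src,
                            size_t width_bytes, size_t height)
{
    /* The sketch only handles surfaces made of whole tiles. */
    assert(width_bytes % TILE_X_WIDTH_BYTES == 0);
    assert(height % TILE_X_HEIGHT_ROWS == 0);
    assert(dst_stride >= width_bytes);

    size_t tiles_per_row = width_bytes / TILE_X_WIDTH_BYTES;
    size_t tile_rows = height / TILE_X_HEIGHT_ROWS;

    for (size_t ty = 0; ty < tile_rows; ty++) {
        for (size_t tx = 0; tx < tiles_per_row; tx++) {
            /* Start of this 4 KiB tile in the tiled source. */
            const uint8_t *tile =
                src + (ty * tiles_per_row + tx) * TILE_X_SIZE_BYTES;

            /* Each tile row is one contiguous 512-byte run, so a plain
             * memcpy per row is enough; libc picks the fast path. */
            for (size_t r = 0; r < TILE_X_HEIGHT_ROWS; r++) {
                uint8_t *out = dst
                    + (ty * TILE_X_HEIGHT_ROWS + r) * dst_stride
                    + tx * TILE_X_WIDTH_BYTES;
                memcpy(out, tile + r * TILE_X_WIDTH_BYTES,
                       TILE_X_WIDTH_BYTES);
            }
        }
    }
}
```

Each tile is handled independently of the others in this sketch, which is
why the outer loops could in principle be split across cores.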

-- 
Keep ;-)
HanishKVC

