[FFmpeg-devel] [PATCH v2] fbdetile cpu based framebuffer layout detiling v02
hanishkvc
hanishkvc at gmail.com
Sat Jun 27 22:57:39 EEST 2020
v02-20200627IST2331
Unrolled Intel Legacy Tile-Y detiling logic.
Also a single consolidated patch file, instead of the multiple patch files
that followed the previous development flow.
v01-20200627IST1308
Implemented Intel Legacy Tile-X and Tile-Y detiling logic
NOTES:
This video filter allows tiled framebuffers to be detiled into a linear
layout, using logic running on the CPU.
Currently it supports Intel Legacy Tile-X and Tile-Y layout detiling.
This should help when working with frames captured (say, using kmsgrab)
on laptops with an Intel GPU.
The Tile-X conversion logic has been explicitly cross-checked with Tile-X
based frames. The Tile-Y conversion logic, however, hasn't been tested with
Tile-Y based frames, but it should do the job, based on my current
understanding of the Tile-Y layout format.
TODO1: At a later time, generate Tile-Y based frames and then explicitly
cross-check the corresponding logic.
TODO2: Maybe use OpenGL or Vulkan buffer helper routines to do the
layout conversion. However, some online discussions from a while back seem
to indicate that this path is not yet fully bug free.
---
Changelog | 1 +
doc/filters.texi | 62 ++++++++
libavfilter/Makefile | 1 +
libavfilter/allfilters.c | 1 +
libavfilter/vf_fbdetile.c | 309 ++++++++++++++++++++++++++++++++++++++
5 files changed, 374 insertions(+)
create mode 100644 libavfilter/vf_fbdetile.c
diff --git a/Changelog b/Changelog
index a60e7d2eb8..0e03491f6a 100644
--- a/Changelog
+++ b/Changelog
@@ -2,6 +2,7 @@ Entries are sorted chronologically from oldest to youngest within each release,
releases are sorted from youngest to oldest.
version <next>:
+- fbdetile cpu based framebuffer layout detiling video filter
- AudioToolbox output device
- MacCaption demuxer
diff --git a/doc/filters.texi b/doc/filters.texi
index 3c2dd2eb90..73ba21af89 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -12210,6 +12210,68 @@ It accepts the following optional parameters:
The number of the CUDA device to use
@end table
+@anchor{fbdetile}
+@section fbdetile
+
+Detiles a tiled framebuffer layout into a linear layout, using the CPU.
+
+It currently supports conversion from the Intel legacy Tile-X and Tile-Y layouts
+into a linear layout. This is useful when using kmsgrab and hwdownload to
+capture a screen that uses one of these non-linear layouts.
+
+It currently expects the data to be in a 32-bit RGB based pixel format. The
+filter does not do any pixel format conversion. Support for 16-bit RGB data
+is planned, as the detiling logic is largely independent of the pixel format.
+
+The filter can be inserted into the filter chain during capture itself or, if
+that slows the capture down, into the filter chain during playback or
+transcoding instead.
+
+It supports the following optional parameters:
+
+@table @option
+@item type
+Specify which detiling conversion to apply. The supported values are:
+@table @var
+@item 0
+Intel Tile-X to linear conversion (@samp{intelx}, the default).
+@item 1
+Intel Tile-Y to linear conversion (@samp{intely}).
+@end table
+@end table
+
+To convert during capture itself, one could do:
+@example
+ffmpeg -f kmsgrab -i - -vf "hwdownload,format=bgr0,fbdetile" OUTPUT
+@end example
+
+To convert after the tiled data has already been captured:
+@example
+ffmpeg -i INPUT -vf "fbdetile" OUTPUT
+@end example
+@example
+ffplay -i INPUT -vf "fbdetile"
+@end example
+
+NOTE: While transcoding a 1080p h264 test stream of 276 frames, with two runs
+of each case, the timings were as given below. These were measured with an
+older, initial version of the logic and inside the default Linux environment
+of a Chromebook (chromebook->vm->container), so the absolute values may not
+be representative; the relative overhead should, however, be similar.
+@example
+rm out.mp4; time ./ffmpeg -i input.mp4 out.mp4
+rm out.mp4; time ./ffmpeg -i input.mp4 -vf fbdetile=0 out.mp4
+rm out.mp4; time ./ffmpeg -i input.mp4 -vf fbdetile=1 out.mp4
+@end example
+@table @option
+@item with no fbdetile filter
+It took ~7.28 secs.
+@item with fbdetile=0 filter
+It took ~8.69 secs.
+@item with fbdetile=1 filter
+It took ~9.20 secs.
+@end table
+
@section hqx
Apply a high-quality magnification filter designed for pixel art. This filter
diff --git a/libavfilter/Makefile b/libavfilter/Makefile
index 5123540653..bdb0c379ae 100644
--- a/libavfilter/Makefile
+++ b/libavfilter/Makefile
@@ -280,6 +280,7 @@ OBJS-$(CONFIG_HWDOWNLOAD_FILTER) += vf_hwdownload.o
OBJS-$(CONFIG_HWMAP_FILTER) += vf_hwmap.o
OBJS-$(CONFIG_HWUPLOAD_CUDA_FILTER) += vf_hwupload_cuda.o
OBJS-$(CONFIG_HWUPLOAD_FILTER) += vf_hwupload.o
+OBJS-$(CONFIG_FBDETILE_FILTER) += vf_fbdetile.o
OBJS-$(CONFIG_HYSTERESIS_FILTER) += vf_hysteresis.o framesync.o
OBJS-$(CONFIG_IDET_FILTER) += vf_idet.o
OBJS-$(CONFIG_IL_FILTER) += vf_il.o
diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
index 1183e40267..f8dceb2a88 100644
--- a/libavfilter/allfilters.c
+++ b/libavfilter/allfilters.c
@@ -265,6 +265,7 @@ extern AVFilter ff_vf_hwdownload;
extern AVFilter ff_vf_hwmap;
extern AVFilter ff_vf_hwupload;
extern AVFilter ff_vf_hwupload_cuda;
+extern AVFilter ff_vf_fbdetile;
extern AVFilter ff_vf_hysteresis;
extern AVFilter ff_vf_idet;
extern AVFilter ff_vf_il;
diff --git a/libavfilter/vf_fbdetile.c b/libavfilter/vf_fbdetile.c
new file mode 100644
index 0000000000..8b20c96d2c
--- /dev/null
+++ b/libavfilter/vf_fbdetile.c
@@ -0,0 +1,309 @@
+/*
+ * Copyright (c) 2020 HanishKVC
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/**
+ * @file
+ * Detile the framebuffer's tile layout using the CPU.
+ * Currently it supports the legacy Intel Tile-X and Tile-Y layouts.
+ *
+ */
+
+/*
+ * ToThink|Check: Optimisations
+ *
+ * Do the gcc settings used by ffmpeg allow memcpy | stringops inlining,
+ * loop unrolling, better matching native instructions, additional
+ * optimisations, ...?
+ *
+ * Does gcc map to the optimal memcpy logic, based on the context it is
+ * used in?
+ *
+ * If not, maybe look at vector_size or intrinsics or appropriate arch
+ * and cpu specific inline asm or ...
+ *
+ */
+
+#include "libavutil/avassert.h"
+#include "libavutil/imgutils.h"
+#include "libavutil/opt.h"
+#include "avfilter.h"
+#include "formats.h"
+#include "internal.h"
+#include "video.h"
+
+enum FilterMode {
+    TYPE_INTELX,
+    TYPE_INTELY,
+    NB_TYPE
+};
+
+typedef struct FBDetileContext {
+    const AVClass *class;
+    int width, height;
+    int type;
+} FBDetileContext;
+
+#define OFFSET(x) offsetof(FBDetileContext, x)
+#define FLAGS AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_VIDEO_PARAM
+static const AVOption fbdetile_options[] = {
+    { "type", "set framebuffer format_modifier type", OFFSET(type), AV_OPT_TYPE_INT, {.i64=TYPE_INTELX}, 0, NB_TYPE-1, FLAGS, "type" },
+        { "intelx", "Intel Tile-X layout", 0, AV_OPT_TYPE_CONST, {.i64=TYPE_INTELX}, INT_MIN, INT_MAX, FLAGS, "type" },
+        { "intely", "Intel Tile-Y layout", 0, AV_OPT_TYPE_CONST, {.i64=TYPE_INTELY}, INT_MIN, INT_MAX, FLAGS, "type" },
+    { NULL }
+};
+
+AVFILTER_DEFINE_CLASS(fbdetile);
+
+static av_cold int init(AVFilterContext *ctx)
+{
+    FBDetileContext *fbdetile = ctx->priv;
+
+    if (fbdetile->type == TYPE_INTELX) {
+        av_log(ctx, AV_LOG_INFO, "init: Intel tile-x to linear\n");
+    } else if (fbdetile->type == TYPE_INTELY) {
+        av_log(ctx, AV_LOG_INFO, "init: Intel tile-y to linear\n");
+    } else {
+        av_log(ctx, AV_LOG_ERROR, "init: Unknown tile format specified, shouldn't reach here\n");
+    }
+    fbdetile->width = 1920;  // Placeholder, the actual size is set in config_props()
+    fbdetile->height = 1080;
+    return 0;
+}
+
+static int query_formats(AVFilterContext *ctx)
+{
+    // Currently only RGB based 32bit formats are specified
+    // TODO: Technically the logic is transparent to 16bit RGB formats also
+    static const enum AVPixelFormat pix_fmts[] = {AV_PIX_FMT_RGB0, AV_PIX_FMT_0RGB, AV_PIX_FMT_BGR0, AV_PIX_FMT_0BGR,
+                                                  AV_PIX_FMT_RGBA, AV_PIX_FMT_ARGB, AV_PIX_FMT_BGRA, AV_PIX_FMT_ABGR,
+                                                  AV_PIX_FMT_NONE};
+    AVFilterFormats *fmts_list;
+
+    fmts_list = ff_make_format_list(pix_fmts);
+    if (!fmts_list)
+        return AVERROR(ENOMEM);
+    return ff_set_common_formats(ctx, fmts_list);
+}
+
+static int config_props(AVFilterLink *inlink)
+{
+    AVFilterContext *ctx = inlink->dst;
+    FBDetileContext *fbdetile = ctx->priv;
+
+    fbdetile->width = inlink->w;
+    fbdetile->height = inlink->h;
+    av_log(ctx, AV_LOG_DEBUG, "config_props: %d x %d\n", fbdetile->width, fbdetile->height);
+
+    return 0;
+}
+
+static void detile_intelx(AVFilterContext *ctx, int w, int h,
+                          uint8_t *dst, int dstLineSize,
+                          const uint8_t *src, int srcLineSize)
+{
+    // Offsets and LineSize are in bytes
+    const int tileW = 128; // For a 32Bit / Pixel framebuffer, 512/4
+    const int tileH = 8;
+    int sO = 0;
+    int dX = 0;
+    int dY = 0;
+    int nTRows = (w*h)/tileW;
+    int cTR = 0;
+
+    if (w*4 != srcLineSize) {
+        av_log(ctx, AV_LOG_DEBUG, "intelx: w%dxh%d, dL%d, sL%d\n", w, h, dstLineSize, srcLineSize);
+        av_log(ctx, AV_LOG_ERROR, "intelx: LineSize | Pitch going beyond width is not supported\n");
+    }
+    while (cTR < nTRows) {
+        int dO = dY*dstLineSize + dX*4;
+#ifdef DEBUG_FBTILE
+        av_log(ctx, AV_LOG_TRACE, "intelx: dX%d dY%d, sO%d, dO%d\n", dX, dY, sO, dO);
+#endif
+        memcpy(dst+dO+0*dstLineSize, src+sO+0*512, 512);
+        memcpy(dst+dO+1*dstLineSize, src+sO+1*512, 512);
+        memcpy(dst+dO+2*dstLineSize, src+sO+2*512, 512);
+        memcpy(dst+dO+3*dstLineSize, src+sO+3*512, 512);
+        memcpy(dst+dO+4*dstLineSize, src+sO+4*512, 512);
+        memcpy(dst+dO+5*dstLineSize, src+sO+5*512, 512);
+        memcpy(dst+dO+6*dstLineSize, src+sO+6*512, 512);
+        memcpy(dst+dO+7*dstLineSize, src+sO+7*512, 512);
+        dX += tileW;
+        if (dX >= w) {
+            dX = 0;
+            dY += tileH;
+        }
+        sO = sO + tileH*512;
+        cTR += tileH;
+    }
+}
+
+/*
+ * Intel Legacy Tile-Y layout conversion support
+ *
+ * currently done in a simple dumb way. Two low hanging optimisations
+ * that could be readily applied are
+ *
+ * a) unrolling the inner for loop
+ * --- Given small size memcpy, should help, DONE
+ *
+ * b) using simd based 128bit loading and storing along with prefetch
+ * hinting.
+ *
+ * TOTHINK|CHECK: Does memcpy already do this and more, if the situation
+ * is right?!
+ *
+ * As such code (or even intrinsics) would be specific to each architecture,
+ * it is avoided for now. Later, check if the vector_size attribute and the
+ * corresponding implementation by gcc can handle different architectures
+ * properly, such that it won't become worse than the memcpy provided for
+ * that architecture.
+ *
+ * Or maybe merge the two Intel detiling logics into one, as the semantics
+ * and flow are almost the same for both.
+ *
+ */
+static void detile_intely(AVFilterContext *ctx, int w, int h,
+                          uint8_t *dst, int dstLineSize,
+                          const uint8_t *src, int srcLineSize)
+{
+    // Offsets and LineSize are in bytes
+    const int tileW = 4; // For a 32Bit / Pixel framebuffer, 16/4
+    const int tileH = 32;
+    int sO = 0;
+    int dX = 0;
+    int dY = 0;
+    int nTRows = (w*h)/tileW;
+    int cTR = 0;
+
+    if (w*4 != srcLineSize) {
+        av_log(ctx, AV_LOG_DEBUG, "intely: w%dxh%d, dL%d, sL%d\n", w, h, dstLineSize, srcLineSize);
+        av_log(ctx, AV_LOG_ERROR, "intely: LineSize | Pitch going beyond width is not supported\n");
+    }
+    while (cTR < nTRows) {
+        int dO = dY*dstLineSize + dX*4;
+#ifdef DEBUG_FBTILE
+        av_log(ctx, AV_LOG_TRACE, "intely: dX%d dY%d, sO%d, dO%d\n", dX, dY, sO, dO);
+#endif
+
+        memcpy(dst+dO+0*dstLineSize, src+sO+0*16, 16);
+        memcpy(dst+dO+1*dstLineSize, src+sO+1*16, 16);
+        memcpy(dst+dO+2*dstLineSize, src+sO+2*16, 16);
+        memcpy(dst+dO+3*dstLineSize, src+sO+3*16, 16);
+        memcpy(dst+dO+4*dstLineSize, src+sO+4*16, 16);
+        memcpy(dst+dO+5*dstLineSize, src+sO+5*16, 16);
+        memcpy(dst+dO+6*dstLineSize, src+sO+6*16, 16);
+        memcpy(dst+dO+7*dstLineSize, src+sO+7*16, 16);
+        memcpy(dst+dO+8*dstLineSize, src+sO+8*16, 16);
+        memcpy(dst+dO+9*dstLineSize, src+sO+9*16, 16);
+        memcpy(dst+dO+10*dstLineSize, src+sO+10*16, 16);
+        memcpy(dst+dO+11*dstLineSize, src+sO+11*16, 16);
+        memcpy(dst+dO+12*dstLineSize, src+sO+12*16, 16);
+        memcpy(dst+dO+13*dstLineSize, src+sO+13*16, 16);
+        memcpy(dst+dO+14*dstLineSize, src+sO+14*16, 16);
+        memcpy(dst+dO+15*dstLineSize, src+sO+15*16, 16);
+        memcpy(dst+dO+16*dstLineSize, src+sO+16*16, 16);
+        memcpy(dst+dO+17*dstLineSize, src+sO+17*16, 16);
+        memcpy(dst+dO+18*dstLineSize, src+sO+18*16, 16);
+        memcpy(dst+dO+19*dstLineSize, src+sO+19*16, 16);
+        memcpy(dst+dO+20*dstLineSize, src+sO+20*16, 16);
+        memcpy(dst+dO+21*dstLineSize, src+sO+21*16, 16);
+        memcpy(dst+dO+22*dstLineSize, src+sO+22*16, 16);
+        memcpy(dst+dO+23*dstLineSize, src+sO+23*16, 16);
+        memcpy(dst+dO+24*dstLineSize, src+sO+24*16, 16);
+        memcpy(dst+dO+25*dstLineSize, src+sO+25*16, 16);
+        memcpy(dst+dO+26*dstLineSize, src+sO+26*16, 16);
+        memcpy(dst+dO+27*dstLineSize, src+sO+27*16, 16);
+        memcpy(dst+dO+28*dstLineSize, src+sO+28*16, 16);
+        memcpy(dst+dO+29*dstLineSize, src+sO+29*16, 16);
+        memcpy(dst+dO+30*dstLineSize, src+sO+30*16, 16);
+        memcpy(dst+dO+31*dstLineSize, src+sO+31*16, 16);
+
+        dX += tileW;
+        if (dX >= w) {
+            dX = 0;
+            dY += tileH;
+        }
+        sO = sO + tileH*16;
+        cTR += tileH;
+    }
+}
+
+static int filter_frame(AVFilterLink *inlink, AVFrame *in)
+{
+    AVFilterContext *ctx = inlink->dst;
+    FBDetileContext *fbdetile = ctx->priv;
+    AVFilterLink *outlink = ctx->outputs[0];
+    AVFrame *out;
+
+    out = ff_get_video_buffer(outlink, outlink->w, outlink->h);
+    if (!out) {
+        av_frame_free(&in);
+        return AVERROR(ENOMEM);
+    }
+    av_frame_copy_props(out, in);
+
+    if (fbdetile->type == TYPE_INTELX) {
+        detile_intelx(ctx, fbdetile->width, fbdetile->height,
+                      out->data[0], out->linesize[0],
+                      in->data[0], in->linesize[0]);
+    } else if (fbdetile->type == TYPE_INTELY) {
+        detile_intely(ctx, fbdetile->width, fbdetile->height,
+                      out->data[0], out->linesize[0],
+                      in->data[0], in->linesize[0]);
+    }
+
+    av_frame_free(&in);
+    return ff_filter_frame(outlink, out);
+}
+
+static av_cold void uninit(AVFilterContext *ctx)
+{
+
+}
+
+static const AVFilterPad fbdetile_inputs[] = {
+    {
+        .name         = "default",
+        .type         = AVMEDIA_TYPE_VIDEO,
+        .config_props = config_props,
+        .filter_frame = filter_frame,
+    },
+    { NULL }
+};
+
+static const AVFilterPad fbdetile_outputs[] = {
+    {
+        .name = "default",
+        .type = AVMEDIA_TYPE_VIDEO,
+    },
+    { NULL }
+};
+
+AVFilter ff_vf_fbdetile = {
+    .name          = "fbdetile",
+    .description   = NULL_IF_CONFIG_SMALL("Detile Framebuffer using CPU"),
+    .priv_size     = sizeof(FBDetileContext),
+    .init          = init,
+    .uninit        = uninit,
+    .query_formats = query_formats,
+    .inputs        = fbdetile_inputs,
+    .outputs       = fbdetile_outputs,
+    .priv_class    = &fbdetile_class,
+};
--
2.20.1