[FFmpeg-devel] native mode in FFmpeg DNN module

Fri Apr 19 11:41:31 EEST 2019

Hi,

The DNN module currently supports two backends: TensorFlow (dnn_backend_tf.c) and native (dnn_backend_native.c). The native mode has an external dependency, which imho is not good and needs to change. I think now is still a good time for such a change, given the limited functionality and performance of the current native mode.

With the current implementation, the native mode involves 3 formats and 3 phases, see below.

"TensorFlow model file format (format 1)"
    -----> convert model (phase 1) ----->
"native model file format (format 2)"
    -----> load model file into memory (phase 2) ----->
"in-memory representation (format 3)"
    -----> run inference on the model (phase 3) ----->

We created format 2 and format 3, and we wrote the C code for phase 2 and phase 3 within FFmpeg. Phase 1 is written in Python to convert a TensorFlow model file into a native model file; it is an external dependency at https://github.com/HighVoltageRocknRoll/sr. Whenever we add anything new to the model, for example a new layer, or even just a padding option for the current Conv layer, we have to change the external dependency. Such changes will be needed many times.

I see two options to improve this.
Option 1)
Use ONNX (https://github.com/onnx/onnx) to replace phase 1, format 2 and format 3.
Open Neural Network Exchange (ONNX) provides an open source format for AI models, and the model files are protobuf (pb) files.

The advantage is that we don't need to worry about phase 1, format 2 and format 3. The drawback is in loading the protobuf file into memory (phase 2): Google only provides C++ support (no C support), and since only C is allowed in FFmpeg, we would have to write our own C code in FFmpeg to parse/load the protobuf file.
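
To give a rough idea of what hand-written protobuf parsing in C involves, here is a minimal sketch of the lowest layer only: decoding a base-128 varint and a field key (field number plus wire type) as defined by the protobuf wire format. The names pb_read_varint and pb_read_key are made up for this mail; they are not existing FFmpeg or protobuf functions, and a real loader would also need length-delimited fields, nested messages and the ONNX schema on top of this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one base-128 varint from buf.
 * Returns the number of bytes consumed, or 0 if the input is
 * truncated or longer than the 10 bytes a 64-bit varint can take. */
static size_t pb_read_varint(const uint8_t *buf, size_t size, uint64_t *out)
{
    uint64_t value = 0;
    size_t i;

    assert(buf && out);
    for (i = 0; i < size && i < 10; i++) {
        /* the low 7 bits carry payload, least significant group first */
        value |= (uint64_t)(buf[i] & 0x7F) << (7 * i);
        /* a clear high bit marks the last byte of the varint */
        if (!(buf[i] & 0x80)) {
            *out = value;
            return i + 1;
        }
    }
    return 0;
}

/* Hypothetical helper: decode a field key, which is itself a varint
 * holding (field_number << 3) | wire_type. */
static size_t pb_read_key(const uint8_t *buf, size_t size,
                          uint32_t *field, int *wire_type)
{
    uint64_t key;
    size_t n = pb_read_varint(buf, size, &key);

    if (n) {
        *field     = (uint32_t)(key >> 3);
        *wire_type = (int)(key & 7);
    }
    return n;
}
```

For the three bytes 08 96 01, pb_read_key yields field 1 with wire type 0 (varint), and pb_read_varint on the remaining two bytes yields 150.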

Option 2)
Write C code in FFmpeg to convert the TensorFlow file format (format 1) directly into the in-memory representation (format 3), so that we control everything within the FFmpeg community. The conversion can later be extended to import more file formats such as Torch, Darknet, etc. OpenCV, for example, uses this method.

The in-memory representation (format 3) can remain as it is now.
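
As a rough illustration of how option 2 could stay extensible, the sketch below keeps one table of importers, each of which fills the same in-memory model. Everything here is hypothetical: SketchModel stands in for the real format 3 struct, the importer functions are empty stubs, and selecting the importer by file extension (".pb", ".weights") is only an assumption made for the example, not a proposal for the final interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the in-memory representation (format 3). */
typedef struct SketchModel {
    int nb_layers;
} SketchModel;

/* Each importer reads one external file format and fills a SketchModel. */
typedef int (*SketchImportFn)(const char *path, SketchModel *model);

typedef struct SketchImporter {
    const char    *name;       /* human-readable format name */
    const char    *extension;  /* assumed file extension for this format */
    SketchImportFn import;
} SketchImporter;

/* Stubs: a real importer would parse the file and build the layers.
 * They return -1 to signal "not implemented". */
static int import_tensorflow(const char *path, SketchModel *model)
{
    (void)path;
    model->nb_layers = 0;
    return -1;
}

static int import_darknet(const char *path, SketchModel *model)
{
    (void)path;
    model->nb_layers = 0;
    return -1;
}

/* Supporting a new format means adding one entry here. */
static const SketchImporter importers[] = {
    { "tensorflow", ".pb",      import_tensorflow },
    { "darknet",    ".weights", import_darknet    },
};

/* Pick an importer from the file extension, or NULL if none matches. */
static const SketchImporter *find_importer(const char *path)
{
    const char *dot;
    size_t i;

    assert(path);
    dot = strrchr(path, '.');
    if (!dot)
        return NULL;
    for (i = 0; i < sizeof(importers) / sizeof(importers[0]); i++)
        if (!strcmp(dot, importers[i].extension))
            return &importers[i];
    return NULL;
}
```

With this layout, phase 3 only ever sees the in-memory model, and adding a layer type or a new source format touches code inside FFmpeg only.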

I personally prefer option 2. Anyway, I will be glad to see any better options. I can continue to contribute in this area once the community decides on the technical direction.

Thanks
Yejun