[Libav-user] avcodec_decode_video2 doesn't get pal8 palette and how to get "closest" pix_fmt

Matthew Einhorn moiein2000 at gmail.com
Tue Jul 26 19:58:00 CEST 2011


On Sun, Jul 24, 2011 at 9:41 AM, Stefano Sabatini
<stefano.sabatini-lala at poste.it> wrote:
> On date Sunday 2011-07-24 04:51:49 -0400, Matthew Einhorn encoded:
>> Hi,
>>
>> I'm using the ffmpeg API to decode videos and I ran into two issues
>> while using it.
>>
>> 1. I'm decoding by calling, in a loop, av_read_frame followed by
>> avcodec_decode_video2, and if frameFinished I then use sws_scale to
>> convert from my source format to the destination format. This works
>> fine for all videos except pal8 videos. For pal8, sws_scale produces a
>> completely black picture. I suspected that the pal8 palette must be
>> messed up, and indeed the palette returned by avcodec_decode_video2 is
>> all zeros.
>
> If the problem is the source image, you should check that the palette
> in data[1] provided to libswscale is correct (e.g. you can check that
> it is different from 0). Also check if the ff* tools can deal with the
> file correctly. If not, a sample program / sample test showing the
> problem may be useful.
>

OK, first a bit more info which I forgot to include above: I'm using
the Zeranoe-built (git 9c2651a) win32 dlls with MSVC++.
Also, ffplay plays the video fine, although for a different reason
than you might think (I think...; more on that below).

Upon spending some time with the debugger I've isolated the problem
to a very weird corner. But first, a bit more about my code (partly
attached). My code is a dll wrapper around the ffmpeg dlls. With one
dll call you create an object for a video and initialize all the
format/conversion/codec contexts (the open function). Then with
further dll calls you request the next frame. As said, this now works
fine for all video formats I tested except pal8 (rawvideo). With pal8,
calling avcodec_decode_video2(m_pCodecCtx, m_pFrame, &nFrameFinished,
m_pAVPacket) copies an all-zero palette into m_pFrame->data[1]
(palette_has_changed is also zero). So this has nothing to do with
sws_scale; sws_scale is simply handed a bad palette. The question is
why avcodec_decode_video2 doesn't read the palette. The video file
itself isn't bad, as I explain below.
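
For reference, here's roughly the check that shows the zero palette (a
minimal sketch using the class members from the attached code;
AVPALETTE_COUNT is 256, from avcodec.h):

int nFrameFinished= 0;
avcodec_decode_video2(m_pCodecCtx, m_pFrame, &nFrameFinished, m_pAVPacket);
if (nFrameFinished && m_pCodecCtx->pix_fmt==PIX_FMT_PAL8)
{
	// For pal8, data[1] holds AVPALETTE_COUNT packed 32-bit palette entries.
	const uint32_t* pnPal= (const uint32_t*)m_pFrame->data[1];
	bool bAllZero= true;
	for (int i= 0;i<AVPALETTE_COUNT && bAllZero;++i)
		bAllZero= !pnPal[i];
	// bAllZero comes out true here, which is why sws_scale outputs black.
}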

I was able to fix this if, before returning from the open function, I
added one call to avcodec_decode_video2 (preceded, of course, by
av_read_frame). That is, if I asked ffmpeg to decode the first frame
before returning from the function that initialized the frames,
contexts etc., the palette was read correctly in the first and all
subsequent frames (palette_has_changed was one). But if I requested
the first frame after returning from my open function, in a separate
function, the palette wasn't read properly.
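
Concretely, the workaround is just the following at the end of the
open function (it's the commented-out block near the end of OpenFile
in the attached file):

int nFrameFinished;
av_read_frame(m_pFormatCtx, m_pAVPacket);	// Prime the decoder with the first packet...
avcodec_decode_video2(m_pCodecCtx, m_pFrame, &nFrameFinished, m_pAVPacket);	// ...by decoding it before returning
av_free_packet(m_pAVPacket);

With that in place palette_has_changed comes back one and data[1]
holds a real palette; without it, both stay zero.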

Now, this smells of something going out of scope and getting closed
when my open function returns. It can't be my variables, because they
are all class members created beforehand which stay put, and I don't
use any smart pointers or the like. So it must be (I think) that one
of the av alloc functions clears something if I don't decode a frame
before returning from the function that called it. I think it's
something in the decoder, possibly a buffer?

My dlls are always called from the same thread, and the dll doesn't
unload or move between calls. ffplay does all its work from one
central main function with calls to other functions (and it's not a
dll), so that's why I think ffplay doesn't hit this issue.

Now, I understand that this might be difficult to debug, so I'm
mostly asking for clues and what to look at. I.e., in all the
format/codec context structs, is there some function pointer or member
variable responsible for fetching the palette that would help me track
down the issue? avcodec_decode_video2 ends up calling some function
pointer, so I couldn't follow the code through to where the palette is
actually read. It could also be that the problem is with the Zeranoe
dlls, in which case this might not be the best place to solve it, but
I doubt it because everything works fine for all the other videos.
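
In the meantime I'm considering a stopgap (an untested sketch, and it
assumes the file's palette really is a grayscale ramp, which of course
won't hold for arbitrary pal8 videos): if data[1] comes back all zero,
fill in a gray ramp myself before handing the frame to sws_scale:

// Hypothetical stopgap: install a grayscale palette (packed ARGB
// entries) when the decoder left data[1] zeroed. Only valid if the
// source palette really is grayscale.
uint32_t* pnPal= (uint32_t*)m_pFrame->data[1];
for (unsigned i= 0;i<AVPALETTE_COUNT;++i)
	pnPal[i]= 0xFF000000u | (i<<16) | (i<<8) | i;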


>> In particular, from what I seemed to have read and seen of ffmpeg, for
>> pal8 AVFrame data[0] is the data, while data[1] is the palette. When
>> calling avcodec_decode_video2 on a pal8 video, data[0] is indeed data
>> (bunch of different values), while data[1] is an array with all
>> elements zero. Indeed, when I edited data[1] to some random values the
>> sws_scale output image was not black anymore and you could see the
>> remnants of my picture.
>>
>
>> So I'm wondering, is the video file broken and that's why the palette
>> doesn't show up? Or did I miss a flag when initializing codec/format
>> context etc. so that the palette isn't read?
>
> AFAIK you don't need any special hacks for working with palette
> formats.
>
>> 2. I'm looking for a function similar to avcodec_find_best_pix_fmt.
>> What I want is to pass in a list of formats and have the function
>> return the closest format. For example, say the source format
>> is pal8 and I pass in as possible destination formats: RGB24 and
>> GRAY8. Then the function should return GRAY8.
>> avcodec_find_best_pix_fmt would in that case return RGB24, which "is"
>> the best format, but here it would waste 2 extra bytes per pixel,
>> since pal8 is only 8 bits deep and gray to start with.
>>
>> Does a function like this exist? Would it be easy for me to write such
>> a function using the ffmpeg API? And if so can I get some pointers?
>
> Should be easy to hack the logic of avcodec_find_best_pix_fmt() for
> implementing an avcodec_find_closest_pix_fmt() or such.
>

I looked through the code for the above functions, and I think that,
as is, avcodec_find_best_pix_fmt should already return the closest pix
format like I want. The only reason it doesn't (I think) is that the
pal8 entry in particular might be set wrongly.

If you look at the pix_fmt_info array that the
avcodec_find_best_pix_fmt1() function refers to, you'll see this
definition for pal8:
[PIX_FMT_PAL8] = {
    .is_alpha = 1,
    .color_type = FF_COLOR_RGB,
    .depth = 8,
},

Shouldn't it be .color_type = FF_COLOR_GRAY? Because it's set to
FF_COLOR_RGB, avcodec_get_pix_fmt_loss() reports a chroma and
colorspace loss when converting from pal8 to gray8. That's why RGB24
gets picked over gray8. But I thought that pal8 is already gray (B/W),
so there shouldn't be any loss? Admittedly, I don't know too much
about the pix formats.
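
For what it's worth, here's a sketch of how I read the selection logic
(avcodec_get_pix_fmt_loss() and the FF_LOSS_* flags are from
avcodec.h):

// Why gray8 "loses" against RGB24 for a pal8 source:
int nLossGray= avcodec_get_pix_fmt_loss(PIX_FMT_GRAY8, PIX_FMT_PAL8, 0);
// nLossGray has FF_LOSS_CHROMA and FF_LOSS_COLORSPACE set, purely
// because pal8's pix_fmt_info entry says FF_COLOR_RGB.

int64_t nMask= (((int64_t)1)<<PIX_FMT_RGB24) | (((int64_t)1)<<PIX_FMT_GRAY8);
int nLoss;
PixelFormat eBest= avcodec_find_best_pix_fmt(nMask, PIX_FMT_PAL8, 0, &nLoss);
// eBest comes back PIX_FMT_RGB24 since gray8 gets charged with those losses.

If the entry were FF_COLOR_GRAY instead, I'd expect gray8 to be
lossless there and, being smaller, to win.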

>> Thanks in advance for any help,

Note, I don't know if the mailing list will accept attached code (txt
file), so I guess I'll find out.

Thanks again,
Matt
-------------- next part --------------
// h file below at the end

///////////////////////////////////////////////////////
// Actual decoding file implementation
CDecodeFile::CDecodeFile(int nIndex):m_nIndex(nIndex)
{
	InitializeCriticalSection(&m_rFileSafe);
	m_pFormatCtx= NULL;	// Make sure we start with all pointers nulled
	m_pCodecCtx= NULL;
	m_pCodec= NULL;

	m_pFrame= NULL;
	m_pFrameOut= NULL;
	m_pConvertCtx= NULL;
	m_ePixFmtOut= PIX_FMT_NONE;

	m_pAVPacket= new AVPacket();
	av_init_packet(m_pAVPacket);	// Set the packet fields (pts, dts...) to their defaults

	m_auchBuffer= NULL;	// No output buffer allocated yet
	m_nBufferSize= 0;

	m_nVideoStream= -1;	// No stream

	m_bDone= false;	// So that we'll start reading packets.
	m_dPts= 0;
}



int CDecodeFile::OpenFile(const char szFilename[], PixelFormat ePixFmtPreferred[], int nFmts, SDecodeFileParams* psDecodeFileParams)
{
	if ( !szFilename || !psDecodeFileParams)
		return BAD_PARAMS;


	EnterCriticalSection(&m_rFileSafe);
	if (m_pFormatCtx)	// Make sure we don't open twice the same file
	{
		LeaveCriticalSection(&m_rFileSafe);
		return FILE_OPEN;	// File was already open
	}

	// Open the video file
	if(avformat_open_input(&m_pFormatCtx, szFilename, NULL, NULL)!=0)
	{
		FinishUp();	// Clear to fresh on error
		LeaveCriticalSection(&m_rFileSafe);
		return D_CANT_OPEN_FILE;
	}

	// Populate stream info.
	if(avformat_find_stream_info(m_pFormatCtx, NULL)<0)
	{
		FinishUp();
		LeaveCriticalSection(&m_rFileSafe);
		return D_CANT_READ_FILE;
	}

	// Find the best video stream.
	m_nVideoStream= av_find_best_stream(m_pFormatCtx, AVMEDIA_TYPE_VIDEO, -1, -1, &m_pCodec, 0);
	if(m_nVideoStream<0)
	{
		FinishUp();
		LeaveCriticalSection(&m_rFileSafe);
		return D_CANT_READ_FILE; // Didn't find a video stream
	}

	// Get a pointer to the codec context for the video stream
	m_pCodecCtx= m_pFormatCtx->streams[m_nVideoStream]->codec;

	
	// Open codec
	if(avcodec_open2(m_pCodecCtx, m_pCodec, NULL)<0)
	{
		FinishUp();
		LeaveCriticalSection(&m_rFileSafe);
		return D_CANT_OPEN_DECODER; // Could not open codec
	}

	// Find format of output pic
	if (!ePixFmtPreferred || !nFmts)	// If there's no preferred output format, use the stream format
	{
		m_ePixFmtOut= m_pCodecCtx->pix_fmt;
	} else if (nFmts==1)	// Use preselected format
	{
		m_ePixFmtOut= ePixFmtPreferred[0];
	} else	// Select one of selected formats
	{
		int nLoss;
		int64_t	nMask= 0;
		for (int i= 0;i<nFmts;++i)
		{
			nMask |= ((int64_t)1)<<ePixFmtPreferred[i];
		}
		if ((m_ePixFmtOut= avcodec_find_best_pix_fmt(nMask, m_pCodecCtx->pix_fmt, 1, &nLoss))==PIX_FMT_NONE)	// Try the best; if that fails use the first on the list
		{
			m_ePixFmtOut= ePixFmtPreferred[0];
		}

	}

	// Prepare the frame into which the decoded picture will be placed.
	// No need to allocate a buffer for the pic data because the decoder allocates and owns it.
	m_pFrame= avcodec_alloc_frame();
	if (!m_pFrame)
	{
		FinishUp();
		LeaveCriticalSection(&m_rFileSafe);
		return ALLOC_MEMORY_ERROR;	// Not enough memory
	}


	if (m_ePixFmtOut!=m_pCodecCtx->pix_fmt)	// If we need to convert pic to different output, create conversion context, frames...
	{
		// Prepare the frame into which the image, converted to the proper output format, is copied
		m_pFrameOut= avcodec_alloc_frame();
		if(!m_pFrameOut)
		{
			FinishUp();
			LeaveCriticalSection(&m_rFileSafe);
			return ALLOC_MEMORY_ERROR; // Memory trouble
		}

		// Fill in image with buffers
		if (av_image_alloc(m_pFrameOut->data, m_pFrameOut->linesize, m_pCodecCtx->width, 
			m_pCodecCtx->height, m_ePixFmtOut, 1)<0)
		{
			FinishUp();
			LeaveCriticalSection(&m_rFileSafe);
			return ALLOC_MEMORY_ERROR; // Memory trouble
		}

		// This creates the conversion context used to convert formats.
		m_pConvertCtx = sws_getCachedContext(NULL, m_pCodecCtx->width, m_pCodecCtx->height, 
			m_pCodecCtx->pix_fmt, 
			m_pCodecCtx->width, m_pCodecCtx->height, m_ePixFmtOut, SWS_BICUBIC,
			NULL, NULL, NULL);
		if (!m_pConvertCtx)
		{
			FinishUp();
			LeaveCriticalSection(&m_rFileSafe);
			return NO_CONVERT_FMT; // Cannot convert pix formats
		}
	}

	m_bDone= false;	// Initially read packets.
	m_dPts= 0;
	// Populate return struct with values.
	psDecodeFileParams->dDuration= ((double)m_pFormatCtx->duration)/AV_TIME_BASE;
	psDecodeFileParams->nFrames= (int) m_pFormatCtx->streams[m_nVideoStream]->nb_frames;
	psDecodeFileParams->nMeanFrameRateDen= !m_pFormatCtx->streams[m_nVideoStream]->avg_frame_rate.den?
		m_pFormatCtx->streams[m_nVideoStream]->r_frame_rate.den:
		m_pFormatCtx->streams[m_nVideoStream]->avg_frame_rate.den;
	psDecodeFileParams->nMeanFrameRateNum= !m_pFormatCtx->streams[m_nVideoStream]->avg_frame_rate.den?
		m_pFormatCtx->streams[m_nVideoStream]->r_frame_rate.num:
		m_pFormatCtx->streams[m_nVideoStream]->avg_frame_rate.num;
	psDecodeFileParams->ePixFmtOut= m_ePixFmtOut;
	psDecodeFileParams->nWidth= m_pCodecCtx->width;
	psDecodeFileParams->nHeight= m_pCodecCtx->height;
	// Determine the buffer size the caller must provide to receive the output image.
	psDecodeFileParams->nBufferSize= avpicture_get_size(m_ePixFmtOut, m_pCodecCtx->width, m_pCodecCtx->height);
	m_nBufferSize= psDecodeFileParams->nBufferSize;	// Save required size
	if (av_image_fill_linesizes(psDecodeFileParams->anLinesizes, m_ePixFmtOut, m_pCodecCtx->width)<0)
	{
		FinishUp();
		LeaveCriticalSection(&m_rFileSafe);
		return LINESIZES_ERROR;
	}

	LeaveCriticalSection(&m_rFileSafe);

	/*int nFrameFinished;//<-----------------------------------------------------------------------------------------------------------------------------------------------
	av_read_frame(m_pFormatCtx, m_pAVPacket);
	avcodec_decode_video2(m_pCodecCtx, m_pFrame, &nFrameFinished, m_pAVPacket);//<-------------------------------------If this isn't called here, the pal8 palette isn't received
	av_free_packet(m_pAVPacket);*/
	return 0;
}

int CDecodeFile::GetNextFrame(unsigned char auchBuffer[], int nSizeIn, double* pdPts)
{
	if (!auchBuffer || !pdPts)
		return BAD_PARAMS;

	EnterCriticalSection(&m_rFileSafe);
	if (!m_pFormatCtx)	// Make sure it's a valid object
	{
		LeaveCriticalSection(&m_rFileSafe);
		return BAD_OBJECT;
	}

	if (nSizeIn<m_nBufferSize)	// Make sure size is large enough
	{
		LeaveCriticalSection(&m_rFileSafe);
		return BAD_PARAMS;
	}

	int nRes= 0;
	double dPtsI= m_dPts;	// pts of last frame
	bool bDone= m_bDone;	// Are we reading still or are we flushing frames now?
	while(!bDone && av_read_frame(m_pFormatCtx, m_pAVPacket)>=0)	// Stop if found frame, flushing, or end of file
	{
		// Is this a packet from our video stream?
		if(m_pAVPacket->stream_index==m_nVideoStream)
		{
			// Try to decode a frame from packet data
			bDone= DecodeFrame(auchBuffer, nSizeIn);	// True if decoded full frame.
		}
		// Free the packet that was allocated by av_read_frame and holds the pic data
		av_free_packet(m_pAVPacket);
	}


	bool bMoreFlush;	// If we're flushing are there any more frames to be flushed? Otherwise seek to start
	// Are we flushing or about to start flushing?
	// bDone is only false here if we reached end of file.
	if (!bDone || m_bDone)	// First and subsequent flush
	{
		m_pAVPacket->data=NULL;	// For some formats we get the last frames by calling decode with a packet whose data is NULL.
		m_pAVPacket->size=0;
		m_bDone= true;	// In subsequent calls we'll be flushing
		bMoreFlush= DecodeFrame(auchBuffer, nSizeIn);	// False if no more frames to seek.
	} else
	{
		bMoreFlush= true;	// Still more frames to get so don't seek back to zero
	}


	// If we reached the last frame we need to seek back to start
	if ((m_dPts<dPtsI && dPtsI!=0)	// We went past valid frames so got reset to zero, i.e. pts jumped to less than before, but not b/c seeking
		|| !bMoreFlush			// Done with ALL frames
		|| (m_dPts==dPtsI && dPtsI>0))		// repeating frames pts so we're done with video
	{
		if (avformat_seek_file(m_pFormatCtx, m_nVideoStream, INT64_MIN, 0, INT64_MAX, 0)>=0)
		{
			avcodec_flush_buffers(m_pCodecCtx);	// Flush so that no buffer confusion
			m_bDone= false;	// We're not done anymore because we started again
			m_dPts= 0;
			nRes= GetNextFrame(auchBuffer, nSizeIn, pdPts);	// Now get first frame in stream
			if (!nRes)	// If successful, indicate that we looped back and the picture returned is the first frame again.
				nRes= D_LAST_FRAME;
		} else
		{
			avcodec_flush_buffers(m_pCodecCtx);	// Either way flush buffers.
			nRes= D_SEEK_FAILED;
		}
	}

	*pdPts= m_dPts;	// PTS of current image
	LeaveCriticalSection(&m_rFileSafe);
	return nRes;
}

bool CDecodeFile::DecodeFrame(unsigned char auchBuffer[], int nSizeIn)
{
	int nFrameFinished= 0;

	// Decode as much of the frame from the packet as possible and copy it into the frame. This needs
	// to be called until we get nFrameFinished true; otherwise the full frame hasn't been received.
	int nRes= avcodec_decode_video2(m_pCodecCtx, m_pFrame, &nFrameFinished, m_pAVPacket);
	// Estimate the pts of the frame. The timestamp is converted to time in seconds.
	// Don't move this into the conditional below; otherwise we'd seek to the start upon the next call.
	m_dPts = m_pFrame->best_effort_timestamp*av_q2d(m_pFormatCtx->streams[m_nVideoStream]->time_base);
	// Do we need to account for pFrame->repeat_pict?

	// Did we get a video frame?
	if(nFrameFinished)
	{
		if (m_ePixFmtOut!=m_pCodecCtx->pix_fmt)	// If we need to convert pic to different output...
		{
			// Convert to the proper output format
			sws_scale(m_pConvertCtx, m_pFrame->data, m_pFrame->linesize, 0, 
				m_pCodecCtx->height, m_pFrameOut->data, m_pFrameOut->linesize);
			// copy to the buffer.
			avpicture_layout((AVPicture*) m_pFrameOut, m_ePixFmtOut, m_pCodecCtx->width, m_pCodecCtx->height, auchBuffer, nSizeIn);
		} else
		{	// Otherwise just copy it
			avpicture_layout((AVPicture*) m_pFrame, m_ePixFmtOut, m_pCodecCtx->width, m_pCodecCtx->height, auchBuffer, nSizeIn);
		}
		return true;	// Frame gotten
	}
	return false;	// No frame gotten
}



// h file
/** A thread safe class which houses a single file that is being decoded. */
class CDecodeFile
{
public:
	/** Initializes the object with a const integer, nIndex that can then be used to locate this object
		in a list. To open the file you need to call {@link OpenFile} on this object.*/
	CDecodeFile(int nIndex);
	/** Also closes the file (if opened). */
	virtual ~CDecodeFile();

	/** Opens and associates this object with a particular video file. This creates and initializes all
		the memory and objects needed for decoding. If it was unsuccessful, it'll return an error code, and
		you can then try again to open another video file. However, if it succeeded in opening a video
		file you won't be able to open another file with this object. @see DecoderOpenFile. */
	int OpenFile(const char szFilename[], PixelFormat ePixFmtPreferred[], int nFmts,
		SDecodeFileParams* psDecodeFileParams);
	/** Returns the next frame based on our current position in stream. @see DecoderGetNextFrame.*/
	int GetNextFrame(unsigned char auchBuffer[], int nSizeIn, double* pdPts);
	/** Seeks in the video to the desired timestamp. @see DecoderSeekToTimestamp.*/
	int	SeekToTimestamp(double dTimestamp);

	/** A value that gets assigned to this object so that you can identify the object by this value. */
	const int m_nIndex;
private:
	/** Closes all open files, codecs etc. and resets the object to a fresh state.
		Not protected by the critical section, so the calling function must protect it. */
	void	FinishUp();
	/** Decodes the last packet read with av_read_frame (or a NULL-data packet when flushing) and,
		on success, outputs the data to auchBuffer in the proper output format.
		Not protected by the critical section, so the calling function must protect it.
	@return true if a full frame was gotten and copied to the buffer, false otherwise. */
	bool	DecodeFrame(unsigned char auchBuffer[], int nSizeIn);

	/** Holds the format info on the file. */
	AVFormatContext *m_pFormatCtx;
	/** Holds the codec info of the file. */
	AVCodecContext *m_pCodecCtx;
	/** The codec used to decode the file. */
	AVCodec *m_pCodec;

	/** Frame into which decoded packet data gets stored. */
	AVFrame *m_pFrame;
	/** The frame into which the output picture is saved after being converted. */
	AVFrame *m_pFrameOut;
	/** The format into which the frame will be converted in order to send it back to the user. */
	PixelFormat	m_ePixFmtOut;
	/** The struct used to convert between the picture formats. */
	SwsContext *m_pConvertCtx;
	/** Buffer into which the output picture data is copied. */
	unsigned char* m_auchBuffer;
	/** Size of m_auchBuffer. @see m_auchBuffer. */
	int m_nBufferSize;

	/** The packet struct into which data is saved as frames are being read from stream. */
	AVPacket *m_pAVPacket;
	/** The stream number among the streams of the file that we decode. */
	int m_nVideoStream;

	/** If we're done reading the file and we now need to flush the decoder to get the remaining pictures. */
	bool m_bDone;
	/** The PTS of the last picture returned in video time. */
	double	m_dPts;

	/** Protects access to this object. */
	CRITICAL_SECTION m_rFileSafe;
};
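
// Hypothetical usage sketch (not part of the dll; SDecodeFileParams and the
// error codes come from my wrapper headers, and the filename is made up):
/*
	CDecodeFile rFile(0);
	SDecodeFileParams sParams;
	PixelFormat aePreferred[]= {PIX_FMT_RGB24, PIX_FMT_GRAY8};
	if (!rFile.OpenFile("clip.avi", aePreferred, 2, &sParams))
	{
		unsigned char* auchBuf= new unsigned char[sParams.nBufferSize];
		double dPts;
		int nRes= rFile.GetNextFrame(auchBuf, sParams.nBufferSize, &dPts);
		// nRes==0: got a frame; nRes==D_LAST_FRAME: looped back to the first frame.
		delete[] auchBuf;
	}
*/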

