[FFmpeg-devel] [PATCH 1/4] lavc/flacenc: add sse4 version of the 16-bit lpc encoder

James Darnley james.darnley at gmail.com
Fri Aug 8 16:30:57 CEST 2014

On 2014-07-21 01:48, Michael Niedermayer wrote:
> On Mon, Jul 21, 2014 at 12:32:23AM +0200, James Darnley wrote:
>> On 2014-03-15 00:01, Michael Niedermayer wrote:
>>> On Wed, Mar 12, 2014 at 01:03:03PM +0100, James Darnley wrote:
>>>> +; Is it worth looping correctly over the first samples?  The most that ever need
>>>> +; to be copied is 32 so we might as well just unroll the loop and do all 32.
>>> implementations should not make assumptions on their use except
>>> what is documented in the API
>>> or the other way around
>>> if some limitation is always true and you want to write an
>>> implementation that takes advantage of the limitation for optimization
>>> then this limitation should be documented in the API first
>>> (in this case of FLACDSPContext / lpc_encode)
>> So...  I've been bored lately and thought I'd come back to this.  I've
>> got a changed version which copies these samples in a loop.  You can see
>> the changes in these two links:
>>> https://gitorious.org/ffmpeg/jdarnley-ffmpeg/commit/9604911fbbe864cd0f670bdad47e7c5c2e83dc02
>>> https://gitorious.org/ffmpeg/jdarnley-ffmpeg/commit/837879d34100113105f099996ac67085d9c86396
>> These two are just about the same but apply to 16 and 32-bit.
>> Should I try to measure the difference between the two?  Or should I
>> just submit one version, possibly with suitable documentation?
> fastest is best, and docs must match implementation but docs can be
> changed for internal API

Testing showed no reliable difference.

To be specific: with the new code the measured runtime decreased by
about 0.2% (+/- 0.2%), yet the function itself took slightly longer to
run (again with a similarly large error).

Having done this I will submit the patches again using the old code but
with some small changes and documentation of its limits.  I will also
add some further documentation about the C code because its unrolled
function (used when CONFIG_SMALL is not set) also assumes a maximum
order of 32.
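
For anyone following along, here is a rough sketch in plain C of the two
ways of copying the first samples that were compared. It is not the code
from the patches: the names (SKETCH_MAX_ORDER, copy_warmup_loop,
copy_warmup_fixed, sketch_lpc_encode) are made up for illustration, and
the fixed-size copy assumes that both buffers hold at least 32 samples,
which is exactly the kind of limit that would need documenting.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_ORDER 32 /* maximum order mentioned in this thread */

/* Variant A: copy exactly `order` warm-up samples in a loop. */
static void copy_warmup_loop(int32_t *res, const int32_t *smp, int order)
{
    for (int i = 0; i < order; i++)
        res[i] = smp[i];
}

/* Variant B: always copy 32 samples, whatever the order is.
 * Assumption: both buffers hold at least 32 entries. Entries from
 * `order` to 31 are overwritten by the prediction loop afterwards. */
static void copy_warmup_fixed(int32_t *res, const int32_t *smp)
{
    memcpy(res, smp, SKETCH_MAX_ORDER * sizeof(*res));
}

/* Hypothetical scalar LPC residual computation, for illustration only.
 * Note: >> on a negative value is implementation-defined in C. */
static void sketch_lpc_encode(int32_t *res, const int32_t *smp, int len,
                              int order, const int32_t *coefs, int shift,
                              int fixed_copy)
{
    /* The documented limits, checked rather than silently assumed. */
    assert(order >= 1 && order <= SKETCH_MAX_ORDER);
    assert(!fixed_copy || len >= SKETCH_MAX_ORDER);

    if (fixed_copy)
        copy_warmup_fixed(res, smp);
    else
        copy_warmup_loop(res, smp, order);

    for (int i = order; i < len; i++) {
        int64_t p = 0;
        for (int j = 0; j < order; j++)
            p += (int64_t)coefs[j] * smp[i - j - 1];
        res[i] = smp[i] - (int32_t)(p >> shift);
    }
}
```

Both variants should produce the same residual whenever the asserted
limits hold, so the choice between them comes down to speed and to what
the documentation promises.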
