[ardour-dev] sse optimizations

Thu Jan 19 08:47:24 PST 2006

On Thu, 2006-01-19 at 10:56 -0500, Peter Lutek wrote:
> are there risks associated with building with sse optimizations? just 
> wondering why the x86/sse options are off by default.

No risks.

I feel confident saying this because all the assembler functions in use
were first thoroughly tested in a testbed which runs different sets of
data through the functions and compares the results with known-good C
variants of the same code. The testbed itself has been run on different
processors. After this the code was committed into ardour, a lot of
people have used it, and I haven't heard a single report of it causing
problems.
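
To give an idea of what such a testbed looks like, here is a minimal
sketch in C. It is not the actual ardour testbed: the names peak_fn,
peak_reference, peak_candidate and compare_peak are made up for
illustration, and the "candidate" is just an unrolled C loop standing in
for the hand-written assembler. The idea is only to run the same data
through both variants at many buffer lengths and count disagreements.

```c
#include <stddef.h>

/* Hypothetical signature: returns the largest absolute sample seen,
 * starting from the running value `current`. */
typedef float (*peak_fn)(const float *buf, size_t n, float current);

static float abs_sample(float x) { return x < 0.0f ? -x : x; }

/* Plain C loop, treated as the known-good variant. */
static float peak_reference(const float *buf, size_t n, float current)
{
    for (size_t i = 0; i < n; i++) {
        float a = abs_sample(buf[i]);
        if (a > current) current = a;
    }
    return current;
}

/* Stand-in for the optimized routine (assumption: the real one is
 * assembler). Processes four samples per iteration, then the tail. */
static float peak_candidate(const float *buf, size_t n, float current)
{
    float m[4] = { current, current, current, current };
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        for (int k = 0; k < 4; k++) {
            float a = abs_sample(buf[i + k]);
            if (a > m[k]) m[k] = a;
        }
    }
    for (; i < n; i++) {
        float a = abs_sample(buf[i]);
        if (a > m[0]) m[0] = a;
    }
    for (int k = 1; k < 4; k++)
        if (m[k] > m[0]) m[0] = m[k];
    return m[0];
}

/* Runs both variants over buffer lengths 0..max_len (capped at 256)
 * and returns the number of lengths where the results differ. */
static int compare_peak(peak_fn ref, peak_fn cand, size_t max_len)
{
    float data[256];
    unsigned seed = 12345u;
    int mismatches = 0;

    if (max_len > 256) max_len = 256;
    for (size_t i = 0; i < 256; i++) {
        seed = seed * 1103515245u + 12345u;          /* simple LCG */
        data[i] = (float)((seed >> 16) & 0x7fff) / 16384.0f - 1.0f;
    }
    for (size_t n = 0; n <= max_len; n++)
        if (ref(data, n, 0.0f) != cand(data, n, 0.0f))
            mismatches++;
    return mismatches;
}
```

Peak detection only selects values, so exact comparison is fine here.
For routines that do arithmetic (mixing, gain) a small tolerance would
be the safer comparison.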

Just remember that you need to specify -o to ardour at startup for the
functions to kick in.

{If you do run into problems despite all this testing, just remember
what ardour tells you at startup:  Ardour comes with ABSOLUTELY NO
WARRANTY  :) }

btw. You might be aware of this already, but the optimized assembler
code is available only for 32 bit x86 platforms. We have the functions
for x86/64 somewhat ready (or are they completely ready Mr. Rigg?), but
we will have to wait until 2.0 matures a bit before we start adding
them.

> what sort of benefits would those optimizations be likely to yield on an 
> fc3 planet-ccrma pentium-m laptop, with 1Gb RAM, writing to external 
> USB2 drives?

This is a good question. First of all, the assembler only affects
"small" parts of ardour. Most importantly: peak computation and mixing
algorithms. 
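
For those who haven't looked at this kind of code, a rough sketch of
what "mixing" means here and why SSE helps. This is not the ardour
source: the function names are made up, and I use compiler intrinsics
in place of the hand-written assembler to keep it readable. The SSE
path handles four floats per instruction and is only compiled when the
compiler defines __SSE__; otherwise the function falls back to the
scalar loop.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Scalar version: add src, scaled by gain, into dst. */
static void mix_scalar(float *dst, const float *src, size_t n, float gain)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i] * gain;
}

/* Same operation, four samples at a time where SSE is available. */
static void mix_simd(float *dst, const float *src, size_t n, float gain)
{
    size_t i = 0;
#if defined(__SSE__)
    const __m128 g = _mm_set1_ps(gain);          /* gain in all 4 lanes */
    for (; i + 4 <= n; i += 4) {
        __m128 d = _mm_loadu_ps(dst + i);        /* unaligned loads */
        __m128 s = _mm_loadu_ps(src + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(d, _mm_mul_ps(s, g)));
    }
#endif
    for (; i < n; i++)                           /* leftover samples */
        dst[i] += src[i] * gain;
}
```

The unaligned loads are an assumption made for simplicity. Code that can
guarantee 16-byte aligned buffers can use the aligned variants instead.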

One of the main advantages of having the assembler written by hand is
that distribution packagers can compile ardour _without_ SSE. Packages
for distributions normally must run on lower-grade processors as well.
But having hand-written SSE code in the crucial parts still makes these
heavy computations faster on newer processors.
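
The usual way to get this is to pick the routine at startup through a
function pointer. A minimal sketch of the idea, not the actual ardour
code: mix_fn, mix_generic, mix_optimized and setup_mix are hypothetical
names, user_opt_in stands for the startup switch, cpu_has_sse stands for
whatever CPU detection the program does, and the "optimized" routine is
plain unrolled C where the real thing would be assembler.

```c
#include <stddef.h>

typedef void (*mix_fn)(float *dst, const float *src, size_t n, float gain);

/* Portable version, safe on every processor the package targets. */
static void mix_generic(float *dst, const float *src, size_t n, float gain)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i] * gain;
}

/* Stand-in for the hand-written SSE routine (assumption). */
static void mix_optimized(float *dst, const float *src, size_t n, float gain)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i]     += src[i]     * gain;
        dst[i + 1] += src[i + 1] * gain;
        dst[i + 2] += src[i + 2] * gain;
        dst[i + 3] += src[i + 3] * gain;
    }
    for (; i < n; i++)
        dst[i] += src[i] * gain;
}

/* All callers go through this pointer; default is the generic loop. */
static mix_fn mix_buffers = mix_generic;

/* Called once at startup. The optimized routine is used only when the
 * user asked for it and the processor supports it. */
static void setup_mix(int user_opt_in, int cpu_has_sse)
{
    mix_buffers = (user_opt_in && cpu_has_sse) ? mix_optimized
                                               : mix_generic;
}
```

The rest of the program never needs to know which variant is active, so
the same binary runs on old and new processors.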

On the other hand, even when ardour is built with SSE instructions
enabled everywhere, our code should still beat the code generated by gcc.

.. And it does. At realistic buffer sizes (64+):

gcc    | peak computation | mixing algos
------------------------------------------------
3.4.5  | 6 - 20x faster   | 1.3 - 2.2x faster
4.0.3  | 4.5 - 13x faster | 6x faster

(gcc flags: -O2 -msse -mfpmath=sse -march=pentium4)

(As you can see from the table, something nasty has happened to the
optimizer in 4.0.3. The tester also shows that there are very minute
differences between the results of the ASM code and the compiler
generated code. These differences tell us that for some reason, 4.0.3
has decided to use the float stack instead of SSE calculations. The
float stack is slow, but it's more precise as calculations are done in
80 bits)
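
To show where such minute differences come from, here is a small C
sketch (not taken from the tester; the function names are made up). One
loop rounds the running sum to a 32-bit float on every step, as SSE
single-precision math does. The other keeps it in a long double, which
is the 80-bit float stack format with gcc on x86 (on other platforms
long double may be a different size, so treat that as an assumption).

```c
#include <stddef.h>

/* Accumulate in single precision. `volatile` forces the sum to be
 * stored as a 32-bit float after every addition, so no extra
 * precision can be carried in a register. */
static float sum_single(const float *x, size_t n)
{
    volatile float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc = acc + x[i];
    return acc;
}

/* Accumulate in extended precision and keep the wide result. */
static long double sum_extended(const float *x, size_t n)
{
    long double acc = 0.0L;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}
```

Feeding both the same buffer, say a thousand samples of 0.1f, gives two
sums that agree to several digits but not bit for bit, which is the
kind of tiny mismatch the tester picks up.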

But. Here comes the difficult bit. These improvements only affect one
corner of what Ardour does. This does not affect plugin processing, or
GUI processing, etc. It will only make the basic processing done for
each track/bus faster. So in a session with 10 tracks and a lot of
plugins you will see a drop from (for example) 50% to 45%. Not much, but
still a bit more headroom. On the other hand, in a session with 40
tracks and no plugins, it drops from 33% to 9%. (The percentages are the
DSP% figure shown in the ardour editor.)

Hope this helps.
-- 
Sampo Savolainen <v2 at iki.fi>