[ardour-dev] sse optimizations

Thu Jan 19 09:30:55 PST 2006

Sampo Savolainen wrote:

>On Thu, 2006-01-19 at 10:56 -0500, Peter Lutek wrote:
>
>>are there risks associated with building with sse optimizations? just 
>>wondering why the x86/sse options are off by default.
>>
>
>No risks.
>
>I feel confident saying this because all the assembler functions we use
>were first thoroughly tested in a testbed that ran different sets of data
>through them and compared the results with known-to-work C variants of
>the same code. The testbed itself was run on different processors. Only
>then was the code committed to ardour, and since then a lot of people
>have used it and I haven't heard of a single report of it causing
>problems.
>
>Just remember that you need to pass -o to ardour at startup for the
>optimized functions to kick in.
>
>{If you do run into problems despite all this testing, just remember
>what ardour tells you at startup:  Ardour comes with ABSOLUTELY NO
>WARRANTY  :) }
>
>btw. You might be aware of this already, but the optimized assembler
>code is available only for 32-bit x86 platforms. We have the functions
>for x86-64 somewhat ready (or are they completely ready, Mr. Rigg?), but
>we will have to wait until 2.0 matures a bit before we start adding
>them.
>
>
>>what sort of benefits would those optimizations be likely to yield on an 
>>fc3 planet-ccrma pentium-m laptop, with 1Gb RAM, writing to external 
>>USB2 drives?
>>
>
>This is a good question. First of all, the assembler only affects
>"small" parts of ardour, most importantly peak computation and the
>mixing algorithms.
>
>One of the main advantages of writing the assembler by hand is that
>distribution packagers can still compile ardour _without_ SSE, since
>distribution packages normally must run on lower-grade processors as
>well. The hand-written SSE code in the crucial parts still makes these
>heavy computations faster on newer processors.
>
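>On the other hand, when ardour is built with SSE instructions enabled
>everywhere, the question is whether our code still beats the code
>generated by gcc.
>
>.. And it does. At realistic buffer sizes (64 frames or more), the
>hand-written code is this much faster than gcc's output: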
>
>gcc version | peak computation | mixing algorithms
>------------+------------------+-------------------
>3.4.5       | 6 - 20x faster   | 1.3 - 2.2x faster
>4.0.3       | 4.5 - 13x faster | 6x faster
>
>gcc flags: -O2 -msse -mfpmath=sse -march=pentium4
>
>(As the mixing column shows, something nasty has happened to the
>optimizer in 4.0.3. The tester also shows minute differences between the
>results of the ASM code and the compiler-generated code. These
>differences tell us that for some reason 4.0.3 has decided to use the
>x87 floating-point stack instead of SSE instructions. The x87 stack is
>slow, but it's more precise, as calculations are done in 80 bits.)
>
>But here comes the difficult bit. These improvements only affect one
>corner of what Ardour does. They do not affect plugin processing, GUI
>processing, etc. They only make the basic processing done for each
>track/bus faster. So in a session with 10 tracks and a lot of plugins
>you will see a drop from (for example) 50% to 45%: not much, but still a
>bit more headroom. On the other hand, in a session with 40 tracks and no
>plugins, the load drops from 33% to 9%. (The percentages are the DSP load
>shown in the ardour editor.)
>
>Hope this helps.
>
thanks... yes, very helpful.

since many of my sessions involve very few plugins, but quite dense 
regions with crossfades, and a lot of tracks, this might be helpful. 
i'll go re-compile and see what happens.

-p