[Ardour-Dev] Feedback on string-convert branch

Tim Mayberry mojofunk at gmail.com
Tue Apr 18 19:33:06 PDT 2017


The majority of the changes on the string-convert branch have now been
merged
onto master.

The changes that are left to commit are:

- Removal of the LocaleGuards that are no longer necessary (nearly all of
them)
  which should have no impact now that numeric conversion is independent of
  locale other than a slight benefit to performance.

- Resetting the C and C++ locales to the users preferred locale in
LocaleGuard
  destructor so that the C++ locale is not always set to C/POSIX. This
change
  will mean that theoretically we can use the C and C++ API's to generate
  localized strings for display. In reality the compiler/library support for
  C++ locales on non-linux platforms means we should probably avoid using
C++
  API(iostream) for localization, at least for now.

Both changes *should* not cause any change to current behaviour but to
reduce
the amount of change/possible bugs I'll wait until the next release cycle to
apply those.

So basically from now, where any numeric conversion that is required for
de/serialization or where numeric conversion is required to be the the "C"
locale the API defined in pbd/string_convert.h should be used (generally via
XMLNode::get/set_property) or conversely only use snprinf/sscanf,
std::stringstream or PBD::string_compose for numeric conversion where
localization is required.

If any issues are found please create a report in the bug tracker with
Session/config files attached and I'll deal with it ASAP.

On Wed, Jan 4, 2017 at 10:52 PM, Tim Mayberry <mojofunk at gmail.com> wrote:

> I've pushed a branch called string-convert to the main repository that I'd
> like
> some feedback on from the other developers when they get a chance. It
> represents a fair bit of work over the last year or so and is attempt to
> make
> improvements to the de/serialization code in Ardour.
>
> There may be a few further tweaks necessary but I feel it is ready to
> merge (
> assuming there are no issues with that) at an appropriate time.
>
> The changes result in some backwards compatible modifications to the
> Session
> file format and config files, so with that in mind it may be a candidate
> as a
> "feature" for the next major version and an increase in the Session file
> format
> version may be a good idea (although perhaps not necessary).
>
> To summarize the implementation, I've added a new C++ API in libpbd used
> for
> converting types <-> strings for built in types and for other types via
> template specialization. The API is intended to be thread safe and all
> conversions are performed as if in the C locale without changing the
> global C
> or C++ locales. Some requirements of the API were to ensure round-trip
> equality, reduce complexity, improve readability and increase the amount of
> error checking/logging.
>
> The new API is then used in the PBD::XMLNode, PBD::Properties and
> GUIObjectState classes and elsewhere to provide the above benefits and try
> to
> ensure there is only a single implementation for each type <-> string
> conversion. As a result the LocaleGuards that are used to guard internal
> string
> conversions are no longer necessary and are removed.
>
> What follows is some background information and implementation details for
> those that are interested.
>
> # I18n and Serialization
>
> An application like Ardour that has native language support will generally
> use
> a users preferred locale setting from their system environment to determine
> what language is used and aspects like formatting of numeric values.
>
> Ardour uses the GNU gettext utilities and runtime library for retrieval of
> translated messages for display in the GUI. Ardour is written primarily in
> C++,
> so for locale specific formatting of numeric input and output,
> functionality
> from either the C standard library(sscanf/snprintf) or C++ standard
> library(iostreams) would normally be used.
>
> When the data that is part of the application state, like Session and
> configuration data gets converted from an in memory binary format to a
> string/character based format for serialization to disk, the numeric data
> should not be formatted in a locale specific format as it makes correct
> parsing
> dependent on using that locale. The main issue is some locales use a comma
> to
> mark the decimal point in floating point numbers(1.25 vs 1,25) or use
> grouping
> of integral numbers(1,200 vs 1200), so to avoid this all numeric data is
> formatted using the same locale, the "C" locale.
>
> To perform numeric conversion to and from numeric types with the "C" locale
> using the standard scanf/printf family of functions the locale has to be
> changed from the current locale (usually that is the users preferred
> locale set
> at the start of program execution to support NLS/localization) using
> setlocale() and then back again after numeric conversion.
>
> C and C++ use different data structures/classes to represent locale data.
> It is
> possible that the C and C++ API's have a different "setting" for the
> current
> global locale and slightly confusingly setting the C++ locale sets the C
> and
> C++ locale but setting the C locale only sets the C locale and not the C++
> locale.
>
> The standard C++ library supports the concept of imbuing (via
> std::ios::imbue)
> an iostream with a locale so that a specific locale is used when converting
> numeric data to a string or vica verca. Unfortunately this has not been
> well
> supported by C/C++ compilers/libraries and when I recently tested it did
> not
> work with all the platforms/options/compilers we use and is not currently
> used
> in the Ardour codebase.
>
> Also, changing the global C and or C++ locale is not thread-safe as it
> modifies
> shared global data which may be used from various C/C++ run-time functions,
> such as error messages, time/date, translation (gettext), floating point
> numbers formatting (printf/iostream) and parsing (scanf/iostream),
> character
> types (<ctype.h>) and wide to multi-byte characters conversion functions.
>
> This means that if one thread is accessing the global C locale while
> another is
> modifying the locale with ::setlocale it may lead to a crash on some
> systems
> (Windows and at least some versions of OS X). This would not normally be an
> issue if the application ensures that only a single thread uses API's that
> access global locale data. The GUI thread usually needs to use the C and
> or C++
> API's for localization so if the serialization code uses those same API's
> then
> that has to be taken into consideration. If the serialization code is
> executed
> in the GUI thread as it is in Ardour currently then there is not a thread
> safety issue. I think the only case currently where it may be an issue is
> when
> executing arbitrary code loaded via plugins/loadable modules in another
> thread.
>
>
> # Current issues?
>
> ## Round Trip Equality
>
> Round trip equality meaning to convert data from a binary format to a
> string/character representation for serialization and then back again will
> result in data that is considered to be equal to the value of the data
> before
> conversion.
>
> This is mostly relevant for floating point values that (in the worst case)
> need
> to be stored/serialized with a certain level of precision (9 decimal
> digits for
> float, 17 for double) so that the state restoration/deserialization
> results in
> an equality test passing. i.e
> double d = random_double();
> assert (d == string_to<double> (to_string<double> (d)));
>
> Most of the floating point values that are stored as part of Ardour's state
> data do not use enough precision to pass a round trip equality test. I
> think
> this is sub-optimal.
>
> ## Complexity
>
> Using API's like snprinf/sscanf that rely on the global locale mean using
> LocaleGuards to ensure that conversions like float <-> string are
> performed in
> the "C" locale. You have to remember to add a LocaleGuard in the scope
> that the
> conversion occurs as you can not be sure of the current setting of the
> global
> locale. There have been a number of bugs fixed in Ardour caused by issue
> over
> time.
>
> If a C++ class like std::stringstream is used to perform conversion then
> you
> would normally have to ensure that the global C++ locale was set to "C" for
> both float and integer conversions as decimal grouping may also be
> performed
> for numeric formatting of integers with the C++ API.
>
> ## Thread Safety
>
> As mentioned above, changing the global C and C++ locales is not thread
> safe. As
> most DAW's including Ardour support loading plugins and we don't control
> what
> code is executed, it is possible (although unlikely) for a plugin to
> access the
> global C or C++ locale while we are changing the global locale and cause a
> crash.
>
> There are options on some platforms to have per thread global locales but
> it is
> not a portable solution (or a very good one IMO). There are also functions
> on
> some platforms like strtod_l and strtof_l that take a locale as a
> parameter and
> are thread safe.
>
> ## Error checking
>
> If you look at the use of the C and C++ API's for serialization in the
> Ardour
> code base there is a lack of error checking. So any solution would
> hopefully
> improve this situation and allow for any errors to at least be identified.
>
> ## Performance
>
> Not a major issue but changing the global locale on some platforms (at
> least
> Windows) can be a slow operation and some changes have been introduced in
> the
> past to mitigate this.
>
>
> # Implementation
>
> The changes introduce a simple C++ based API in the form of
> PBD::to_string()
> and PBD::string_to() template functions and conversion functions for the
> builtin types(bool,float,int32_t etc) which are declared in the
> pbd/string_convert.h header.
>
> The current implementation of the functions used to convert the builtin
> types
> use snprintf/sscanf for integer types, the glib function g_ascii_strtod and
> g_ascii_dtostr for float/double and simple code for bool.
>
> To simplify creating template specializations of to_string/string_to for
> enum
> types registered with the PBD::EnumWriter class, a simple macro
> DEFINE_ENUM_CONVERT is defined in pbd/types_convert.h. This replaces the
> need
> for i/ostream& operator>>/<<() for instance in ardour/types.h
>
> Many of the changes are related to changing the interface of XMLNode for
> setting properties which is used by much of the serialization code. The
> current XMLNode::property/add_property interface is mostly replaced by a
> new
> template based XMLNode::get/set_property interface that takes the name of
> the
> property and a reference to the variable to be de/serialized to/from and
> then
> internalizes the type <-> string conversion.
>
> For instance in the case of setting an XMLNode property:
>
>   snprintf (buf, sizeof (buf), "%.12g", _min_yval);
>   root->add_property ("min-yval", buf);
>
> Becomes:
>
>   root->set_property ("min-yval", _min_yval);
>
> Where XMLNode::set_property is defined as:
>
> template<class T>
> bool set_property (const char* name, const T& value)
> {
>   std::string str;
>   if (!PBD::to_string<T> (value, str)) {
>     return false;
>   }
>   return set_property(name, str);
> }
>
> And for getting the same property from an XMLNode:
>
> if ((prop = node.property (X_("min-yval"))) != 0) {
>   _min_yval = atof (prop->value ().c_str());
> } else {
>   _min_yval = FLT_MIN;
> }
>
> Becomes:
>
> if (!node.get_property (X_("min-yval"), _min_yval)) {
>   _min_yval = FLT_MIN;
> }
>
> Where XMLNode::get_property is defined as:
>
> template <class T>
> bool get_property (const char* name, T& value) const
> {
>   XMLProperty const* const prop = property (name);
>   if (!prop) {
>     return false;
>   }
>
>   if (!PBD::string_to<T>(prop->value (), value)) {
>     return false;
>   }
>   return true;
> }
>
> Which has the advantage in that it is checking that the property exists and
> that the string conversion was successful as you very rarely need to do
> them
> separately.
>
> The other main type of change in the branch is to replace all direct and
> indirect (via PBD::string_compose) use of std::stringstream for
> serialization
> so that the correct numeric conversion is not dependent on a
> LocaleGuard/global
> locale.
>
> As std::i/ostream derived classes by default use the global locale for
> numeric
> conversion and currently the global C++ locale is always set to use the "C"
> locale via PBD::LocaleGuard::LocaleGuard (and not reset in dtor).
>
> If it is intended that the global C++ locale is set to the users preferred
> locale and is used for conversion of numeric data to strings for display
> then
> perhaps the string_compose template should be renamed/aliased as something
> like
> i18n_compose/i18n_fmt to remind the user of the API that any numeric data
> types
> that are passed as arguments may be converted to a localized format and
> the API
> is not appropriate to use for serialization.
>
> In terms of testing performance of the new code I haven't done much
> conclusive
> testing but from what I have done on Linux there is a reduction in the time
> required to save a session of about 25%.
>
> # Examples of Session/Config file changes
>
> ## Enums are saved as literal strings rather than numbers.
>
> So for instance:
>
>   <Option name="monitoring-model" value="2"/>
>   <Option name="meter-type-master" value="16"/>
>
> Becomes:
>
>   <Option name="monitoring-model" value="ExternalMonitoring"/>
>   <Option name="meter-type-master" value="MeterK20"/>
>
> ## The string representation of all boolean values is the same
>
> At the moment this is yes/no which I think helps distiguish boolean values
> from
> numeric but it could easily be changed so they are 0/1 or whatever.
>
> So for instance:
>
>   <Option name="use-region-fades" value="1"/>
>   <Option name="use-transport-fades" value="1"/>
>   <Option name="use-monitor-fades" value="1"/>
>   <Option name="native-file-data-format" value="0"/>
>   <Option name="native-file-header-format" value="1"/>
>   <Option name="auto-play" value="0"/>
>
> Becomes:
>
>   <Option name="use-region-fades" value="yes"/>
>   <Option name="use-transport-fades" value="yes"/>
>   <Option name="use-monitor-fades" value="yes"/>
>   <Option name="native-file-data-format" value="FormatFloat"/>
>   <Option name="native-file-header-format" value="WAVE"/>
>   <Option name="auto-play" value="no"/>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ardour.org/pipermail/ardour-dev-ardour.org/attachments/20170419/d350d147/attachment-0001.htm>


More information about the Ardour-Dev mailing list