OpenOffice Configuration Concepts


Contents

Introduction

Some History
New Data Organization
Storage Transparency

Configuration Object Model

Value Elements
Group Elements
Set Elements

Configurations, Components and Views



Introduction


OpenOffice now contains a UNO based service for management of application configuration data. This document is intended to describe the basic concepts of the available OpenOffice configuration architecture. In the course of this, we will also explain the motivation and goals that led to the creation of this service and drove its design and implementation.

Configuration Data is data that describes the (persistent) state of an application or component. According to their usage patterns, there are at least three kinds of configuration data:

  • Static configuration: This is data that describes the configuration of the software and hardware environment. This is mostly fixed when a piece of software or hardware is installed or first used. Data may be set to precomputed values from a script, extracted by scanning the environment or solicited from the user by an interactive setup tool. Data will usually be changed only by a setup tool or script, when installing, uninstalling or reinstalling. (Sometimes a site administrator may wish to change such settings after the fact).
    Effectively the data will only be read and can be assumed constant throughout a session.

  • Static (explicit) settings: These are preferences that are explicitly controlled by the user. There is a dedicated UI to change these settings, and the user enters a special mode (e.g. by invoking a menu item) to use this UI. There must be a well-defined default value for each preference item, which is used when the user does not bother to change the behavior. It may be desirable to allow the user to restore that default state.
    Such data will be read much more often than changed. Changes will happen at well-defined instants in time. The state of running components which use these settings must also change at such times.

  • Dynamic (implicit) settings: These are settings that are also controlled by the user, but which the user does not change explicitly (at least during normal operation). This applies to application state that is persisted across application sessions without user interaction (e.g. open window positions or MRU document lists). Changes to data values occur continuously, but saving such state to persistent storage occurs at no particular time (other than at end of session). For some of these items an explicit UI may also be present (either to view the values in force, to add hand-crafted values or just to restore a 'clean' default state), but the UI will not be the primary way the data is changed.
    This data will be read and written with comparable frequency.

The additional category of 'dynamic configuration' data is not relevant for our discussion, as it cannot readily be persisted. This term would apply to environment state that is checked for dynamically, so it is recomputed each time it is needed.

Some History

Formerly (in StarOffice up to version 5.x) configuration data was dispersed over a large number of files (named '.*rc', '*.INI' or '*.CFG' ...) within the Office installation directories. It was simple for a developer to add new entries to such files, which led to an abundance of items with little documentation. Naming and representation of similar data became far from consistent across the various components. As a result, maintenance became more and more troublesome.

This multitude of files, which had to be read or written in toto when reading or modifying just a single item, also proved unsuitable for a distributed, net-centric computing environment (such as Sun(tm) ONE Webtop).

From a user perspective it was hard, if not impossible, to administer settings for a group of users or a whole (networked) installation. The multitude of files and data formats involved (which included undocumented binary file formats) also made it difficult to repair (or even diagnose) corrupted configuration data.

New Data Organization

To overcome these problems, configuration data is now stored in a more organized and unified way.

Configuration data is organized hierarchically. The topmost level of this unified hierarchy consists of 'components' (or 'modules') - logical units that can be used or installed independently. Often they are associated with an actual program component. Each component can freely define the configuration data it needs. Data is structured by combining related settings into groups, thereby forming a hierarchical structure.

Configuration items cannot be added randomly, though, but must be defined in a configuration schema, which needs to be installed with the application or component. This schema defines the names, data types and structural relationship of configuration items. It can also specify default values for settings. Relevant information from the schema is accessible in the API either through explicit queries or implicitly.
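The role of the schema can be illustrated with a minimal Python sketch. It is not the OpenOffice API: the classes ItemDef and Component and the item paths are hypothetical names chosen for illustration only. The sketch shows that an item can only be set if the schema declares it, and that the schema default is returned when no value has been set.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ItemDef:
    """Hypothetical schema entry: a data type plus an optional default."""
    type_: type
    default: Optional[Any] = None


class Component:
    """Holds values only for items that the schema declares."""

    def __init__(self, schema: Dict[str, ItemDef]) -> None:
        self._schema = schema
        self._values: Dict[str, Any] = {}

    def set(self, path: str, value: Any) -> None:
        item = self._schema.get(path)
        if item is None:
            raise KeyError(f"{path!r} is not defined in the schema")
        # Exact type match, so that a bool is not accepted for an int item.
        if type(value) is not item.type_:
            raise TypeError(f"{path!r} expects {item.type_.__name__}")
        self._values[path] = value

    def get(self, path: str) -> Any:
        if path not in self._schema:
            raise KeyError(f"{path!r} is not defined in the schema")
        # Fall back to the schema default when nothing was set explicitly.
        return self._values.get(path, self._schema[path].default)


# Hypothetical schema for a 'View' component.
schema = {
    "View/ShowToolbar": ItemDef(bool, True),
    "View/ZoomPercent": ItemDef(int, 100),
}
view = Component(schema)
view.set("View/ZoomPercent", 150)
```

Here view.get("View/ShowToolbar") yields the schema default, while an attempt to set an undeclared path such as "View/Unknown" is rejected.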

We use an application-specific XML file format for schema descriptions. Schema entries can and should be documented within the schema definition. The schema language supports reusing partial structures as templates, to help achieve consistent naming and structuring for similar data.

Individual data items or whole structures may have default values. A meaningful set of default values should be specified in the schema. Some implementations may have other ways to specify (and maybe change) default values (e.g. by providing an environment for administrators to prescribe standard settings for users or user groups).

Storage Transparency

This hierarchy is accessed through a UNO service, which allows the underlying implementation to vary. In particular, we can now use various backend stores and adjust the format of the stored data accordingly. The schema information and initial data, in the format required by the backend, can be generated from the schema description (e.g. using XSLT).

The initial implementation of this service for OpenOffice uses a file-based configuration store again. The hierarchical data is stored in XML files. The implementation also supports a configuration server to which it talks using an XML-based RPC protocol over a data stream implementation (initially sockets). There is an implementation of a configuration server that serves this protocol and which is used in Sun (TM) Webtop. Currently (early 2001) this server is not available as Open Source and is therefore not part of OpenOffice.org. [NOTE: The author has no information as to whether and when this might change].

It is expected that further backends will be added in the future. This can be done by (a) creating a new implementation of the UNO services described in this document, (b) implementing a server that serves the existing XML protocol or (c) adding a new storage implementation to the existing implementation (this may include a 'plugin' interface for backend implementations). In each case there is the problem of configuring the configuration store: How to specify which storage should be used (at either install or run time).


Configuration Object Model


As mentioned above, configuration elements have to be specified in a configuration schema. The configuration schema describes a hierarchical data structure. It offers several kinds of elements, some of which serve to combine related elements into logical groupings, while others hold the actual data.

Elements have a well-defined Type. The Type of a data element corresponds directly to the type of data it can hold, while the Type of a structural element depends (recursively) on the Types of the Child elements it contains and the way those are combined. A Type may be described in the schema as a separate named entity, called Template, or it may be implicit in the definition of an element. Templates allow reusing a type in multiple places in a schema and dynamic creation of elements of a given type.

Generally the restrictions on types are designed in a way that the schema describing a part of the configuration can be reconstructed from an instance of that schema (or at least that there is a reasonably efficient representation of the instance that allows reconstructing the schema).

The following kinds of elements can be used to build a configuration schema:

Value Elements

Value elements hold actual data values. Values must have one of a number of supported data types.

The following Basic types are supported:

  • string: Plain text (sequence of [printable] Unicode characters)

  • boolean: Boolean value (true/false)

  • short: 16-bit integer number

  • int: 32-bit integer number

  • long: 64-bit integer number

  • double: Floating point number

  • binary: Sequence of uninterpreted octets


Please note that binary values should be used only for small chunks of data that cannot easily be represented otherwise. The configuration service may not handle large BLOBs efficiently.

Also supported are Lists (ordered sequences) of each of the Basic types. [NOTE: This means that the lists must be homogeneous - mixed lists are not supported]. The configuration treats lists as atomic data types - there is no support for accessing lists elementwise.

In addition a value element may be characterized as nullable (or not nullable). If it is nullable it may assume the special value NULL, which indicates the absence of any particular value. NULL is different from any legal value. In particular a string/list/binary value having zero length is not NULL. Nullable values are sometimes also called optional.

An additional value type any can be used within the schema (particularly within templates). An element having this type initially has value NULL. Such an element can be set to a non-NULL value of any of the types enumerated above. It thereby acquires the type of that value. Further changes to that value cannot change the type any more. For example when an element having type any is set to a value of type int, it becomes an ordinary element of type int and will so remain.
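The rules for nullable values and for the type any can be sketched as follows. This is an illustrative model only, not the service's API: the class ValueElement is hypothetical, Python types stand in for the Basic types listed above, and the sketch assumes that an element of type any stays nullable after it has acquired a concrete type (the text above does not say).

```python
from typing import Any, Optional


class ValueElement:
    """Hypothetical value element; type_=None stands for the schema type 'any'."""

    def __init__(self, type_: Optional[type] = None,
                 nullable: bool = False, value: Any = None) -> None:
        self.type_ = type_
        # An 'any' element starts out NULL, so it must be nullable here.
        self.nullable = nullable or type_ is None
        self.value: Any = None
        if type_ is None:
            if value is not None:
                raise ValueError("an element of type 'any' starts as NULL")
        else:
            self.set(value)

    def set(self, value: Any) -> None:
        if value is None:
            if not self.nullable:
                raise ValueError("element is not nullable")
        elif self.type_ is None:
            # The first non-NULL value fixes the type for good.
            self.type_ = type(value)
        elif type(value) is not self.type_:
            raise TypeError(f"expected {self.type_.__name__}")
        self.value = value


# A zero-length string is a legal value and is distinct from NULL.
name = ValueElement(str, nullable=True, value="")

# An element of type 'any' acquires the type of its first non-NULL value.
extra = ValueElement()
extra.set(42)
```

After the last line, extra behaves like an ordinary int element: a later extra.set("text") raises a TypeError.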

A value element may have a default value (which must be of the appropriate type).

To support internationalization, value elements may be marked localized in the schema. Localized elements (and only localized elements) may assume different values for different locales. This is particularly important for text strings intended for UI display, but may apply to other settings as well. The locale is one of the parameters that is used to select a configuration.
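A localized element can be pictured as one value per locale. The sketch below is a hypothetical model, not the actual implementation. In particular, falling back to a single default value when a locale has no value of its own is an assumption made for this example and is not stated above.

```python
from typing import Any, Dict


class LocalizedValue:
    """Hypothetical localized value element: one value per locale."""

    def __init__(self, default: Any) -> None:
        self._default = default
        self._by_locale: Dict[str, Any] = {}

    def set(self, locale: str, value: Any) -> None:
        self._by_locale[locale] = value

    def get(self, locale: str) -> Any:
        # Assumed behavior: use the default if the locale has no own value.
        return self._by_locale.get(locale, self._default)


# Hypothetical UI label with a German translation.
label = LocalizedValue("Save")
label.set("de", "Speichern")
```

With this model, label.get("de") and label.get("fr") select different values from the same element, depending on the locale that was used to select the configuration.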

Group Elements

Group elements are the basic building block for organizing related data items. Top-Level 'component' elements (see below) are always group elements. A Group element groups together a number of child (aka member) elements. The member elements are identified by name. The Type of a Group element encompasses the number of members in the group, their names and their Types.

Member names must be unique within a group. Each element may be of a different type, but that type is fixed by the schema and cannot change (except as described above for values of type any). To change a Group element means to change one of its members. Any kind of element can be a member of a Group. By nesting Groups within Groups within Groups (etc.) hierarchical data structures can be described.

A Group assumes its default state if all its elements assume their respective default states.
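The recursive definition of a Group's default state can be sketched in a few lines. The classes Value and Group and the member names below are hypothetical and only model the rule stated above. They are not part of the configuration API.

```python
from typing import Any, Union


class Value:
    """Hypothetical value element that starts out at its default."""

    def __init__(self, default: Any) -> None:
        self.default = default
        self.value = default

    def is_default(self) -> bool:
        return self.value == self.default


class Group:
    """Hypothetical group element with a fixed set of named members."""

    def __init__(self, **members: Union["Group", Value]) -> None:
        # Names and types of the members are fixed when the group is defined.
        self._members = dict(members)

    def __getitem__(self, name: str) -> Union["Group", Value]:
        return self._members[name]

    def is_default(self) -> bool:
        # A group is in its default state if all of its members are.
        return all(m.is_default() for m in self._members.values())


window = Group(width=Value(800), height=Value(600))
settings = Group(window=window, autosave=Value(True))

# Changing a group means changing one of its members.
settings["window"]["width"].value = 1024
```

After the change, neither window nor the enclosing settings group is in its default state, although the autosave member still is.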

Set Elements

Set elements provide a way to describe extensible or dynamic parts of the configuration. A Set is a dynamic container of elements. All elements of a Set must have the same Type. That Type must have a name, so it is usually described by a Template in the schema. The name of the element-type is given (and fixed) in the schema.

In the current specification (and, of course, in the initial implementation) elements of a set are identified by name. The name must be unique within a container.

The Type of a set is determined by its element type alone. The number and accessors (=names) of elements may change dynamically.

A set can be changed by removing existing elements and by inserting new elements. Elements to be inserted must be of the proper Type. Such elements can be obtained either by removing them from the container (or a similar one) or from an appropriate factory. In our API the Set object itself can serve as the factory.

Other modifications to the set can be described as combinations of the preceding: replacing an element amounts to inserting a new element after removing the prior element of the same name; renaming an element is equivalent to first removing the element and then reinserting it under a different name.
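The set operations described above can be sketched as follows. The class ConfigSet and the Printer template are hypothetical names used for illustration and do not reproduce the real API. The sketch shows the Set acting as factory for its own element type, and replace and rename expressed through remove and insert.

```python
from dataclasses import dataclass
from typing import Dict, Generic, List, Type, TypeVar

T = TypeVar("T")


@dataclass
class Printer:
    """Hypothetical template describing the element type of a set."""
    driver: str = ""
    copies: int = 1


class ConfigSet(Generic[T]):
    """Hypothetical set element: named elements that share one type."""

    def __init__(self, element_type: Type[T]) -> None:
        self._element_type = element_type
        self._elements: Dict[str, T] = {}

    def create(self) -> T:
        # The set itself serves as the factory for new elements.
        return self._element_type()

    def _check(self, element: T) -> None:
        if not isinstance(element, self._element_type):
            raise TypeError(f"expected {self._element_type.__name__}")

    def insert(self, name: str, element: T) -> None:
        self._check(element)
        if name in self._elements:
            raise KeyError(f"{name!r} already exists")
        self._elements[name] = element

    def remove(self, name: str) -> T:
        return self._elements.pop(name)

    def replace(self, name: str, element: T) -> None:
        # Replace = remove the prior element, then insert under the same name.
        self._check(element)
        self.remove(name)
        self.insert(name, element)

    def rename(self, old: str, new: str) -> None:
        # Rename = remove, then reinsert under a different name.
        if new in self._elements:
            raise KeyError(f"{new!r} already exists")
        self.insert(new, self.remove(old))

    def __getitem__(self, name: str) -> T:
        return self._elements[name]

    def names(self) -> List[str]:
        return sorted(self._elements)


printers = ConfigSet(Printer)
laser = printers.create()
laser.driver = "ps"
printers.insert("laser", laser)
printers.rename("laser", "office")
printers.replace("office", Printer(driver="pcl", copies=2))
```

The sketch validates an element before removing anything, so that a failed replace or rename does not lose the existing element. This is a design choice of the example, not a documented property of the service.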

Sets may be used to build recursive data structures, if their element-type is a structural type itself, which directly or indirectly contains a set having the same element-type.

Set elements are the one single area in the design of this component and its services, where possible extensions and variations as well as new use cases and requirements emerged rather late. It can therefore be expected that the specification will evolve so as to add to this concept and that future implementations will explore some of the variations left open by the current specification.

One such idea is to support an extensible element-type (for elements that are Groups). This would mean that multiple Group Types that derive (by extension) from a common base Type are accepted as elements. In another vein it may be possible to add other types of containers in the future, where elements are accessed by position or by simple enumeration. One area where the current implementation may change rather soon is the handling of restrictions on element names.

Sets may have default elements (which in turn must have a default state). If no default elements are given in the schema, the default state of a set is empty (no elements). A set assumes its default state when it contains its default elements (and only those) and those default elements in turn assume their respective default states.

Configurations, Components and Views


At the outermost level the configuration is divided into components. A component is a collection of configuration data that may be in use independently of other components. Often a component is associated with a particular software component, module or product. Some components may be intended to be shared between several cooperating pieces of software.


A component is characterized by a component schema, which contains the component's configuration schema. The configuration schema forms a single hierarchy, the root element of which must be a Group element. The component schema also contains the component's own template schemas. A component schema may depend upon other component schemas, which are then required to be installed when the dependent schema is installed. These dependency relationships must not be circular. Currently this dependency can either be purely semantic (i.e. any client of the dependent component will also need the other settings) or it can be utilized in the schema to import and (re)use template schemas from the other module.

A set of configuration trees covering all configuration schemas that are in effect in a given context is called a configuration. Implementations may restrict the set of components that are part of a particular configuration depending on parameters that define the context. Such parameters may include user identity and rights, locale and more. (In an implementation that supports multiple users, it therefore makes sense to speak of a particular user's configuration.)

A configuration also may specify access restrictions to parts of the configuration data. Currently such restrictions may specify that an element of the configuration is read-only. If a read-only element is a structural element (i.e. it has child elements) then its children will be read-only in turn (and so will be all its descendants, recursively). A read-only element may be considered to assume its default value invariably. This may make sense either for static configuration data (which should not be changed by user code) or in an environment where defaults may be prepared by an administrator, who may wish to enforce adherence to those common settings. How access restrictions are defined and stored, as well as the more general issue of how configurations are parameterized and selected, is mostly dependent on the storage implementation used.
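How a read-only restriction propagates from a structural element to its descendants can be sketched like this. The class Element and the element names are hypothetical. How restrictions are really defined and stored depends on the storage implementation, as noted above.

```python
from typing import Any, Optional


class Element:
    """Hypothetical configuration element with an optional read-only flag."""

    def __init__(self, name: str, parent: Optional["Element"] = None,
                 read_only: bool = False, value: Any = None) -> None:
        self.name = name
        self.parent = parent
        self._read_only = read_only
        self._value = value

    @property
    def read_only(self) -> bool:
        # A read-only ancestor makes all of its descendants read-only.
        if self._read_only:
            return True
        return self.parent is not None and self.parent.read_only

    @property
    def value(self) -> Any:
        return self._value

    def set_value(self, value: Any) -> None:
        if self.read_only:
            raise PermissionError(f"{self.name!r} is read-only")
        self._value = value


# Hypothetical example: an administrator locks the whole 'Proxy' group.
proxy = Element("Proxy", read_only=True)
host = Element("Host", parent=proxy, value="proxy.example.org")
zoom = Element("Zoom", value=100)
zoom.set_value(150)
```

The Host element carries no flag of its own, yet a call to host.set_value(...) fails because its parent is read-only.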

A complete configuration may be quite large and it will usually contain many parts that aren't used at all during a given session. Clients of the configuration management services therefore don't access the configuration as one huge structure, but select a view of the configuration that contains only the needed subset of all elements. The current implementation allows requesting views of specific tree fragments (given by their root element) that may also be restricted in (nesting) depth. It also distinguishes between read-only views and updatable views. Possible future extensions of this concept include views whose elements are selected by a database-like query or which may span multiple components.
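The idea of a depth-restricted view of a tree fragment can be sketched with nested dictionaries. The function view, the '/'-separated path syntax and the sample data are assumptions made for this example and do not describe the real service. The sketch also assumes that groups lying beyond the requested depth appear as empty placeholders.

```python
from typing import Any, Dict, Optional

Tree = Dict[str, Any]


def _copy(node: Any, depth: Optional[int]) -> Any:
    """Copy a subtree, leaving out children below the given nesting depth."""
    if not isinstance(node, dict):
        return node
    if depth is not None and depth <= 0:
        return {}  # children beyond the requested depth are left out
    next_depth = None if depth is None else depth - 1
    return {name: _copy(child, next_depth) for name, child in node.items()}


def view(config: Tree, root: str, depth: Optional[int] = None) -> Any:
    """Return a copy of the fragment rooted at 'root' (a '/'-separated path)."""
    node: Any = config
    for name in root.split("/"):
        node = node[name]
    return _copy(node, depth)


# Hypothetical configuration data.
config = {
    "Office": {
        "View": {"Zoom": 100, "Window": {"Width": 800, "Height": 600}},
        "Print": {"Copies": 1},
    }
}

shallow = view(config, "Office/View", depth=1)
full = view(config, "Office/View")
```

In this example shallow contains the Zoom value and an empty Window group, while full contains the complete View fragment. Neither view includes the unrelated Print settings.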


Author: Jörg Barfurth ($Date: 2004/10/07 08:27:38 $)
Copyright 2000 by Sun Microsystems, Inc.. All Rights Reserved