Naming Elements

The Access Layer APIs define several elements that carry both an unformatted display name and a formatted unique name. The unique name is what identifies the element.

The elements covered by this article are vendors, package families and packages. The following sections describe how formatted unique names are used inside the system, because these names are part of the public interface used by external parties.

To encode important package information and to ensure that names are unique inside the system, names follow a clear hierarchy, so that the name alone shows where an element belongs.

The Access Layer performs active validation on the names before accepting requests to store final processing results. The validation is based on "Name Format" and "Naming Rule" as defined below.

Name Format

Format Definition

  • Lowercased: Names are written entirely in lowercase characters (that is, names are not case sensitive).
  • Reserved Character: A single name artifact may not contain the character ':'. This character is reserved for building hierarchical names, using the same syntax that is used for building URNs.

    Escaping the delimiter is not allowed, so that implementations dealing with names can be kept as simple as possible. If ':' is part of the source name, it has to be substituted by a replacement character.

  • Character Set: Names may be assembled from the full Unicode character set, with no limitations other than the two mentioned above.
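The format rules above can be checked with very little code. The following is a minimal sketch, not the Access Layer's actual validation: the class NameArtifacts and its method are hypothetical, and rejecting empty artifacts is an assumption that the rules above do not state.

```java
import java.util.Locale;

// Hypothetical helper, not part of the Access Layer API.
final class NameArtifacts {
    static final char DELIMITER = ':';

    /** Returns true if the artifact is lowercase and does not contain the reserved ':'. */
    static boolean isValidArtifact(String artifact) {
        // Assumption: null or empty artifacts are treated as invalid.
        if (artifact == null || artifact.isEmpty()) {
            return false;
        }
        boolean lowercase = artifact.equals(artifact.toLowerCase(Locale.ROOT));
        return lowercase && artifact.indexOf(DELIMITER) < 0;
    }

    public static void main(String[] args) {
        System.out.println(isValidArtifact("microsoft corporation")); // true
        System.out.println(isValidArtifact("Microsoft"));             // false: uppercase character
        System.out.println(isValidArtifact("a:b"));                   // false: reserved character
    }
}
```

Any other Unicode character passes the check, which matches the "Character Set" rule.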

Example Names

microsoft
microsoft:office
microsoft corporation:windows:7.0-3912:windows-win7:x86:en_US
novell:evolution:2.30.1.2:linux-ubuntu-karmic:x86:en_US

Example Name Interpretation

The information encoded in the example name "novell:evolution:2.30.1.2:linux-ubuntu-karmic:x86:en_US" can be interpreted as:

  • Product Vendor: novell
  • Product Name: evolution
  • Product Version: 2.30.1.2
  • OS: linux (flavour: ubuntu-karmic)
  • CPU Architecture: x86
  • Language: EN
  • Country: US

Note: Before the extracted vendor and product names can be displayed on screen, they must be translated to display names.
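The interpretation above can be reproduced mechanically. This is a sketch only: the class NameSplitter is hypothetical, and it assumes a complete package name with all six artifacts present.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; the class and method names are illustrative only.
final class NameSplitter {
    /** Splits a formatted unique name on the reserved ':' delimiter. */
    static List<String> artifacts(String uniqueName) {
        // Escaping is not allowed, so a plain split is sufficient.
        return Arrays.asList(uniqueName.split(":", -1));
    }

    public static void main(String[] args) {
        List<String> parts = artifacts("novell:evolution:2.30.1.2:linux-ubuntu-karmic:x86:en_US");
        String[] os = parts.get(3).split("-", 2);      // base OS, then flavour
        String[] locale = parts.get(5).split("[-_]");  // language, then country
        System.out.println("Product Vendor:   " + parts.get(0));
        System.out.println("Product Name:     " + parts.get(1));
        System.out.println("Product Version:  " + parts.get(2));
        System.out.println("OS:               " + os[0] + " (flavour: " + os[1] + ")");
        System.out.println("CPU Architecture: " + parts.get(4));
        System.out.println("Language:         " + locale[0]);
        System.out.println("Country:          " + locale[1]);
    }
}
```

Names with omitted trailing artifacts yield a shorter list, so real code would need to check the list size before indexing.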

Notes on the Name Format

  • Names, in particular package names, may be queried using category, vendor, package family or tag based queries.
  • Based on the hierarchical nature of named elements, the names can be used to build tree structures that may be used to display and select the elements that are of interest.
  • Tree structures may be assembled dynamically by first getting a list of vendors then package families and afterwards the packages below a family.

    Check the public API methods:
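Independently of the public API, the tree idea can also be illustrated on the client side. The sketch below is not one of the API methods: the class NameTree is hypothetical, and it assumes the caller already holds a flat list of package names.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical client-side helper, not part of the public API.
final class NameTree {
    /** Groups package names as vendor -> package family name -> package names. */
    static Map<String, Map<String, Set<String>>> build(List<String> packageNames) {
        Map<String, Map<String, Set<String>>> tree = new TreeMap<>();
        for (String name : packageNames) {
            String[] parts = name.split(":");
            if (parts.length < 3) {
                continue; // not a package name: vendor, content name and version are required
            }
            String vendor = parts[0];
            String family = parts[0] + ":" + parts[1];
            tree.computeIfAbsent(vendor, v -> new TreeMap<>())
                .computeIfAbsent(family, f -> new TreeSet<>())
                .add(name);
        }
        return tree;
    }
}
```

Sorted maps are used here only so that the tree is displayed in a stable order.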

Naming Rule

Naming a Vendor

Vendor names are generated following the default naming format and normalization rules defined in this document.
The vendor name can be read from the static file properties of a package file or, alternatively, from the source that the package came from.

If no vendor information is available, the reserved vendor name "vendorUnknown" is used.

Naming a Package Family

A package family groups a collection of similar package contents (e.g. a product family like "Microsoft Office"). Its name follows the same basic rules as a vendor name, but it is assembled using the pattern: ${vendor}:${packageContentName}.

Name Artifacts Explained:

  • ${vendor}: The name of the vendor that is linked to the package family.
  • ${packageContentName}: The name of the package content. If the package is an installer package of a top-level product, this is the name of the "Product Family".
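The family pattern, together with the "vendorUnknown" fallback from the vendor section, can be sketched as follows. The class FamilyNames is hypothetical, and the sketch assumes both inputs have already been normalized.

```java
// Hypothetical helper; assumes both inputs are already normalized.
final class FamilyNames {
    static final String UNKNOWN_VENDOR = "vendorUnknown";

    /** Builds ${vendor}:${packageContentName}, falling back to the reserved vendor name. */
    static String familyName(String vendor, String packageContentName) {
        String v = (vendor == null || vendor.isEmpty()) ? UNKNOWN_VENDOR : vendor;
        if (v.indexOf(':') >= 0 || packageContentName.indexOf(':') >= 0) {
            throw new IllegalArgumentException("artifacts must not contain the reserved ':'");
        }
        return v + ":" + packageContentName;
    }

    public static void main(String[] args) {
        System.out.println(familyName("microsoft", "office")); // microsoft:office
        System.out.println(familyName(null, "office"));        // vendorUnknown:office
    }
}
```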

Naming a Package

All package names start with the name of their assigned package family and use the following pattern to further identify the package they name: ${familyName}:${version}:${operatingSystem}:${cpuArchitecture}:${isoLocale}

See the example names above for what a valid name looks like under this naming pattern.

Name Artifacts Explained:

  • ${version}: Specifies the version string as it is read out of the file properties or source information delivered by harvesters.

    Note: Because version definitions are vendor specific, no assumption can be made about the format of the version string. Client implementations may, however, get good results by using a natural sort algorithm when sorting versions in ascending or descending order.

  • ${operatingSystem}: The operating system name follows the sub-pattern "${os}-${flavour}-${subFlavour}". If the operating system is specified, at least the value of ${os} must be present. See the data table below for a definition of possible values.
  • ${cpuArchitecture}: The CPU architecture describes which CPU types are supported by the package. In the rare case that multiple CPU types are supported, multiple type identifiers may be delimited by '-'. If the binary consists of byte code that is not bound to a CPU, the 'universal' keyword may be used as the CPU identifier.
  • ${isoLocale}: The locale specifies the language or language and country information using the 2 letter language and country identifiers defined inside the standards ISO 639 and ISO 3166. The locale is assembled using the sub-patterns ${language}-${country} or ${language}.

    Note: The locale should be omitted if a package is multilingual.
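The note on ${version} above suggests a natural sort for version strings. The comparator below is one possible sketch of such a sort, not a prescribed algorithm: the class NaturalVersionOrder is hypothetical, and it simply compares runs of ASCII digits numerically and everything else as plain text.

```java
import java.math.BigInteger;
import java.util.Comparator;

// Hypothetical comparator: digit runs compare as numbers, other runs as text.
final class NaturalVersionOrder implements Comparator<String> {
    @Override
    public int compare(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int iEnd = chunkEnd(a, i);
            int jEnd = chunkEnd(b, j);
            String ca = a.substring(i, iEnd);
            String cb = b.substring(j, jEnd);
            int cmp;
            if (isDigit(ca.charAt(0)) && isDigit(cb.charAt(0))) {
                cmp = new BigInteger(ca).compareTo(new BigInteger(cb));
            } else {
                cmp = ca.compareTo(cb);
            }
            if (cmp != 0) {
                return cmp;
            }
            i = iEnd;
            j = jEnd;
        }
        // The string with remaining chunks sorts after the one that ended first.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    /** Returns the end index of the run of digits or non-digits starting at start. */
    private static int chunkEnd(String s, int start) {
        boolean digits = isDigit(s.charAt(start));
        int end = start + 1;
        while (end < s.length() && isDigit(s.charAt(end)) == digits) {
            end++;
        }
        return end;
    }
}
```

With this comparator, "1.9" sorts before "1.10", which a plain string comparison would get wrong. Because version formats are vendor specific, the result is a heuristic rather than a guaranteed ordering.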

Omitting Artifacts: Name artifacts can be omitted from right to left, which means that an element may be described in less detail. The ${version} artifact may never be omitted.

Order Matters: The order of the name artifacts matters. Using an order different from the one specified in the pattern is not allowed. This also means that artifacts in the middle may not be omitted.
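The omission and ordering rules can be enforced when a name is assembled. The following sketch uses a hypothetical class PackageNames and represents an omitted artifact as null, which is an assumption made for illustration.

```java
// Hypothetical builder; an omitted artifact is passed as null.
final class PackageNames {
    /**
     * Joins the family name and version with the optional trailing artifacts.
     * Artifacts may only be omitted from right to left, so a null value
     * must not be followed by a non-null one.
     */
    static String packageName(String familyName, String version,
                              String operatingSystem, String cpuArchitecture, String isoLocale) {
        if (familyName == null || version == null) {
            throw new IllegalArgumentException("familyName and version are mandatory");
        }
        String[] optional = { operatingSystem, cpuArchitecture, isoLocale };
        StringBuilder name = new StringBuilder(familyName).append(':').append(version);
        boolean omitted = false;
        for (String artifact : optional) {
            if (artifact == null) {
                omitted = true;
            } else if (omitted) {
                throw new IllegalArgumentException("artifacts may not be omitted in the middle");
            } else {
                name.append(':').append(artifact);
            }
        }
        return name.toString();
    }
}
```

Because the argument order mirrors the pattern, the artifacts cannot end up in a different order than the one the pattern specifies.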

Delimiters: Some name artifacts use delimiters to define a local hierarchy of detail. All name artifacts besides version and familyName support the characters '-' or '_' as delimiters for building a local hierarchy. As an example, the OS names "windows" and "windows-winxp" both specify that the named package depends on the Windows OS; however, the latter definition is more exact than the former.
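The local hierarchy makes it possible to test whether one artifact value is a more exact form of another. This is a minimal sketch with a hypothetical class ArtifactHierarchy; it treats a value as covered when it is equal to the general value or extends it after a '-' or '_' delimiter.

```java
// Hypothetical helper for comparing artifact values within a local hierarchy.
final class ArtifactHierarchy {
    /** Returns true if specific equals general or refines it after a '-' or '_' delimiter. */
    static boolean covers(String general, String specific) {
        return specific.equals(general)
                || specific.startsWith(general + "-")
                || specific.startsWith(general + "_");
    }

    public static void main(String[] args) {
        System.out.println(covers("windows", "windows-winxp")); // true
        System.out.println(covers("windows-winxp", "windows")); // false
    }
}
```

Requiring the delimiter after the general value avoids false matches between unrelated values that merely share a prefix.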

Predefined Names

To create predictable names, some values are predefined and validated by the Access Layer, which ensures that no invalid names can be stored inside the CoreDB.

Operating Systems

A possibly incomplete list of well-known operating systems and flavours. Please note that only the bold names are validated. No guarantee is made on the flavour definitions.

Name                Description
windows             All flavours of Microsoft Windows
windows-winxp       Windows XP
windows-winme       Windows Me
windows-winnt       Windows NT
windows-vista       Windows Vista
windows-win7        Windows 7
linux               All flavours of Linux
linux-ubuntu        Ubuntu Linux
linux-centos        CentOS Linux
linux-debian        Debian Linux
linux-rhel          Red Hat Enterprise Linux
linux-suse          Suse Linux Enterprise
macosx              All flavours of MacOSX
macosx-tiger        MacOSX Tiger
macosx-leopard      MacOSX Leopard
macosx-snowleopard  MacOSX Snow Leopard
bsd                 All flavours of BSD
universal           Used if the binary is not OS dependent

CPU Architecture

Name       Description
x86        Intel 32bit x86 compatible CPU
x64        AMD64 / Intel EM64T compatible CPU
ppc        IBM PowerPC compatible CPU
arm        ARM / StrongARM compatible CPU
universal  Used if the binary is not CPU dependent
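A check against the predefined values could look like the sketch below. It is not the Access Layer's validation code: the class PredefinedNames is hypothetical, and it assumes that only the base operating system name (the ${os} part) is validated, since no guarantee is made on flavours.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical validator built from the tables above.
final class PredefinedNames {
    // Assumption: only the base OS names are validated, flavours are not.
    static final Set<String> OS =
            new HashSet<>(Arrays.asList("windows", "linux", "macosx", "bsd", "universal"));
    static final Set<String> CPU =
            new HashSet<>(Arrays.asList("x86", "x64", "ppc", "arm", "universal"));

    /** Checks the ${os} part of an ${operatingSystem} artifact. */
    static boolean isKnownOs(String operatingSystem) {
        return OS.contains(operatingSystem.split("-", 2)[0]);
    }

    /** Checks every '-' delimited CPU type identifier of a ${cpuArchitecture} artifact. */
    static boolean isKnownCpu(String cpuArchitecture) {
        for (String id : cpuArchitecture.split("-", -1)) {
            if (!CPU.contains(id)) {
                return false;
            }
        }
        return true;
    }
}
```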

Name Normalization

To achieve better matching results, names are normalized before they are applied. Multiple levels of normalization may exist: some require no additional input, whereas others may be implemented using either intelligent methods or static mapping tables.

Basic Normalization

Basic normalization only removes differences that do not change the meaning of the name. In particular, runs of whitespace are collapsed to a single space, leading and trailing whitespace is stripped, and the name is converted to lowercase.

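Example Code

import java.util.Locale;

// Collapse whitespace runs, replace the reserved ':' with ';', trim and lowercase.
String displayName = " Microsoft  Corp. ";
String normalizedName = displayName.replaceAll("\\s+", " ").replace(':', ';').trim()
        .toLowerCase(Locale.ROOT); // Locale.ROOT avoids locale-specific case mappings

System.out.println("|" + normalizedName + "|");
// Prints: |microsoft corp.|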

Note: In addition to basic normalization, this code example also removes the reserved character ':' by replacing it with ';'.

Aliasing

Aliasing is a far more complex variant of normalization. It uses a lookup table of alias => name pairs to handle exceptional differences in names that were read from static file properties.

Because aliasing changes names depending on the mapping table in use, the aliases cannot be kept inside the GRID only: external systems need to be aware of changes to data that they have already transferred and stored locally, outside of the GRID.

Important Note: Aliasing is not available in GRID 1.0.

An alias mapping table could look like this:

Alias                  Type      Name
microsoft c.           proposed  microsoft corp.
microsoft              runtime   microsoft corp.
microsoft corporation  applied   microsoft corp.
mindsoft               invalid   microsoft corp.

The Types are defined as:

  • proposed: If implemented, similarity comparison logic could automatically insert "proposed" elements, which are sent for manual review and then labelled either runtime or invalid.
  • runtime: Specifies a valid alias that is applied to all new content but has not yet been applied to existing content.

    External systems need to be aware of the runtime list and either apply it to their locally cached content or map the aliases against all names they handle.

  • applied: Specifies a valid alias that is applied to all new content and has also been applied to all existing GRID content.

    External systems can choose to apply such aliases to their locally cached content, continue to apply the aliases in runtime mode, or re-transfer content from the GRID that already contains the applied aliases.

  • invalid: Marks aliases that are not valid and must not be added or applied.
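How an external system might apply such a table in runtime mode can be sketched as follows. Since aliasing is not available in GRID 1.0, everything here is hypothetical: the class AliasTable, its methods and the in-memory representation are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory alias table; not an existing GRID API.
final class AliasTable {
    enum Type { PROPOSED, RUNTIME, APPLIED, INVALID }

    private static final class Entry {
        final Type type;
        final String name;

        Entry(Type type, String name) {
            this.type = type;
            this.name = name;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    void put(String alias, Type type, String name) {
        entries.put(alias, new Entry(type, name));
    }

    /** Resolves a normalized name; only runtime and applied aliases are used. */
    String resolve(String normalizedName) {
        Entry entry = entries.get(normalizedName);
        if (entry != null && (entry.type == Type.RUNTIME || entry.type == Type.APPLIED)) {
            return entry.name;
        }
        return normalizedName; // proposed and invalid aliases are never applied
    }

    public static void main(String[] args) {
        AliasTable table = new AliasTable();
        table.put("microsoft c.", Type.PROPOSED, "microsoft corp.");
        table.put("microsoft", Type.RUNTIME, "microsoft corp.");
        table.put("microsoft corporation", Type.APPLIED, "microsoft corp.");
        table.put("mindsoft", Type.INVALID, "microsoft corp.");
        System.out.println(table.resolve("microsoft")); // microsoft corp.
        System.out.println(table.resolve("mindsoft"));  // mindsoft
    }
}
```

The lookup expects names that have already passed basic normalization, so that aliases match regardless of case and spacing in the source.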