TODO

Tag Query Expressions & Syntax

The following section describes how tagged files and software products can be queried inside the GRID and for what purpose.

Example Queries

  • Category "Games": (gamevendor -development -productivity) (gameapplication)
  • NFC export: (osexecutable -infected) (oslibrary -infected)
  • NFC pollution check: osfile infected

Concept

The access layer offers multiple SOAP and REST interfaces that directly or indirectly use the tags applied to packages or files to return results. These results range from complete file lists to an answer to the question of whether a file is known good.

The technical concept behind tagging is to run full-text searches on tag fields that are filled by the GRID processing units. From the perspective of a client using the API, such searches are either assembled behind the scenes when using simple tag-based service methods, or assembled manually by following the defined "Tag Query Syntax, Version 1.0" to build a query expression similar to what common search engines accept.

For more details on the use cases, see the examples section below.

Syntax Definition - "Tag Query Syntax, Version 1.0"

A tag query uses a simple search grammar similar to that of most search engines:

  • Tags are delimited by whitespace.
  • Adding "-" in front of a tag negates it (the tag must not be present on any result).
  • Groups are built by placing parentheses around tags. Multiple groups are combined with OR (the same as combining the results of the individual group queries in a union).
  • Tags within one group are combined with AND (all tags must be present, or absent when prefixed with "-").

Example Query

(mustbe1Group1 -mustnotbe1Group1) (mustbe1Group2 -mustnotbe1Group2 mustbe2Group2)
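
The rules above can be illustrated with a minimal parsing sketch in Python. It is not the access layer's actual parser; the function names `parse_tag_query` and `matches` are hypothetical. It also assumes that a query without parentheses (such as "osfile infected" from the example queries) is treated as a single implicit group.

```python
import re


def parse_tag_query(query):
    """Parse a tag query into a list of groups.

    Each group is a list of (tag, negated) pairs. Hypothetical helper,
    not part of the GRID API.
    """
    query = query.strip()
    if not query:
        return []
    if "(" not in query and ")" not in query:
        # Assumption: no parentheses means one implicit group.
        bodies = [query]
    else:
        bodies = re.findall(r"\(([^()]*)\)", query)
        # Anything left outside the groups is treated as a syntax error.
        leftover = re.sub(r"\([^()]*\)", "", query).strip()
        if leftover:
            raise ValueError("unexpected text outside groups: %r" % leftover)
    groups = []
    for body in bodies:
        group = []
        for token in body.split():
            negated = token.startswith("-")
            tag = token[1:] if negated else token
            if not tag:
                raise ValueError("empty tag name")
            group.append((tag, negated))
        if group:
            groups.append(group)
    return groups


def matches(groups, tags):
    """Return True if a set of tags satisfies at least one group (OR),
    where every condition inside that group must hold (AND)."""
    tags = set(tags)
    return any(
        all((tag in tags) != negated for tag, negated in group)
        for group in groups
    )


groups = parse_tag_query(
    "(mustbe1Group1 -mustnotbe1Group1) "
    "(mustbe1Group2 -mustnotbe1Group2 mustbe2Group2)"
)
print(matches(groups, {"mustbe1Group1"}))
print(matches(groups, {"mustbe1Group1", "mustnotbe1Group1"}))
```

In this sketch the first check succeeds through the first group, while the second fails because the excluded tag is present and the second group's required tags are missing.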

Note: The syntax is designed to be easy to parse and to translate into a corresponding SQL query, while remaining powerful enough to support flexible queries.
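
To make the SQL translation concrete, here is a rough sketch in Python using the standard `sqlite3` module. The schema (tables `files` and `file_tags`), the function name `tag_query_to_sql`, and the use of EXISTS subqueries instead of the full-text search used by the GRID are all assumptions made for illustration; the real storage layout is not described here.

```python
import re
import sqlite3


def tag_query_to_sql(query):
    """Translate a tag query into (sql, params).

    One SELECT per group (AND of its conditions), groups joined with UNION (OR).
    Hypothetical schema: files(id, name), file_tags(file_id, tag).
    """
    # A query without parentheses is treated as one implicit group (assumption).
    bodies = re.findall(r"\(([^()]*)\)", query) or [query]
    selects, params = [], []
    for body in bodies:
        conditions = []
        for token in body.split():
            negated = token.startswith("-")
            keyword = "NOT EXISTS" if negated else "EXISTS"
            conditions.append(
                keyword + " (SELECT 1 FROM file_tags t"
                " WHERE t.file_id = f.id AND t.tag = ?)"
            )
            # Tags are passed as bound parameters, never spliced into the SQL.
            params.append(token[1:] if negated else token)
        if conditions:
            selects.append(
                "SELECT f.id FROM files f WHERE " + " AND ".join(conditions)
            )
    if not selects:
        raise ValueError("query contains no tags")
    return " UNION ".join(selects), params


# Small in-memory demonstration with made-up data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT);"
    "CREATE TABLE file_tags (file_id INTEGER, tag TEXT);"
)
conn.executemany(
    "INSERT INTO files VALUES (?, ?)",
    [(1, "a.exe"), (2, "b.dll"), (3, "c.dll")],
)
conn.executemany(
    "INSERT INTO file_tags VALUES (?, ?)",
    [(1, "osexecutable"), (2, "oslibrary"), (2, "infected"), (3, "oslibrary")],
)

sql, params = tag_query_to_sql("(osexecutable -infected) (oslibrary -infected)")
clean_ids = sorted(row[0] for row in conn.execute(sql, params))
print(clean_ids)
```

With the made-up data above, the "NFC export" style query selects the executable and the uninfected library and skips the file tagged "infected".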

Required / WellKnown Tags

As tag queries are used by some predefined methods that identify files and packages by tags, the following list summarizes the tags that are required by the access layer to function correctly:

  • "infected": Has to be applied to all packages and files that should be reported as containing a virus, grayware, or anything else that we label potentially dangerous. The direct dependency on this tag is located inside the REST method "isKnownGood".

    Note: The tags listed above are not only a requirement that the access layer places on the tagging process in the GRID processing units. They are also part of the public interface offered to the outside and must not change suddenly, as external dependencies may exist once the access layer has been released.

Detailed Examples of Tag Query Usage

Use in "Categorization"

The categorization document is summarized inside the FAQ document that can be found on the main site.

Use in "Identification of Files"

For reprocessing

TODO

For exporting them to NFC or similar services

TODO