General Questions on API Usage
Categorization-Specific Questions
Harvesting-Specific Questions
The Access Layer is implemented using "Java Enterprise Edition" technology. All interfaces that are published via SOAP or REST have direct Java delegates, which are documented in the "Public ApiDocs".
WSDL and WADL documents are generated dynamically from the interface definitions; therefore, the "Public ApiDocs" always contain the latest and most accurate information on the interfaces offered over the web.
IMPORTANT NOTE: Interfaces that conform to "Profile Level 0" can be found in the packages "com.trendmicro.grid.acl.l0" and "com.trendmicro.grid.acl.l0.datatypes".
More information on this topic can be found in the module documentation for "WS Server Api".

When TinyJEE starts, the application server prints all servlet entry points to stdout. Opening these paths in a browser reveals most of the available information.
For your reference, here's an incomplete list of WSDL and WADL addresses:
A complete list of WADL and WSDL URLs, including a quick reference, can be found at WSServer Api.
Look at the interfaces:
REST:
- http://host:port/rs/level-0/files/isKnownGood/{sha1}
- http://host:port/rs/level-0/files/isKnownGood/{sha1}-{md5}
SOAP:
- http://host:port/ws/level-0/internal/files
  - isFileTaggedWith(FileIdentifier, tags)
  - getFileInformation(FileIdentifier)
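The REST variant can be called with any HTTP client. The following is a minimal sketch using the JDK's java.net.http client. The class and method names are hypothetical, and two points are assumptions not confirmed by this page: that the response body is a plain "true"/"false" string, and that HTTP 204 (returned when the service method yields 'null') should be read as "unknown".

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical helper for the level-0 REST "isKnownGood" resource (sketch). */
final class KnownGoodClient {

    /** Builds .../isKnownGood/{sha1} or, if md5 is given, .../isKnownGood/{sha1}-{md5}. */
    static URI knownGoodUri(String baseUrl, String sha1, String md5) {
        String id = (md5 == null) ? sha1 : sha1 + "-" + md5;
        return URI.create(baseUrl + "/rs/level-0/files/isKnownGood/" + id);
    }

    /**
     * Maps status code and body to a tri-state answer.
     * Assumption: HTTP 204 means the service returned null, reported here as null ("unknown").
     * Assumption: a 200 response carries a plain "true"/"false" body.
     */
    static Boolean interpret(int status, String body) {
        if (status == 204) {
            return null;
        }
        if (status != 200) {
            throw new IllegalStateException("Unexpected HTTP status " + status);
        }
        String value = body.trim();
        if (value.equalsIgnoreCase("true")) {
            return Boolean.TRUE;
        }
        if (value.equalsIgnoreCase("false")) {
            return Boolean.FALSE;
        }
        throw new IllegalStateException("Unexpected response body: " + value);
    }

    /** Performs the GET request and interprets the answer. */
    static Boolean isKnownGood(HttpClient client, String baseUrl, String sha1, String md5)
            throws Exception {
        HttpRequest request = HttpRequest.newBuilder(knownGoodUri(baseUrl, sha1, md5)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return interpret(response.statusCode(), response.body());
    }
}
```

Returning a nullable Boolean keeps "not known" distinct from "known, but not good", which matters if the 204 case is frequent.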
The information provided by these interfaces typically contains update times, reference counts and tags. The latter can be used to decide whether a file is of a specific type or "infected" by a virus.
Note: REST services may return HTTP 204 (No Content) if the implementing service method returns a 'null' value. Please check the apidocs for details on when 'null' values are returned.

A "software product" is called a "package" inside the interfaces. "Package" is a term that applies to any grouping of files, including "software products".
This requires 2 steps:
SOAP:
- http://host:port/ws/level-0/internal/files
  - isFileTaggedWith(FileIdentifier, tags)
  - getFilesTaggedWith(tags, ...)
  - getMatchingFiles(tagQueryExpression, ...)
  - isPackageTaggedWith...(..., tags, ...)
  - getPackageNamesTaggedWith(tags, ...)
  - getMatchingPackageNames(tagQueryExpression, ...)
Java Client Example:
int pageNumber = 0;
ListPage<FileIdentifier> fileIds = null;
do {
    fileIds = fileService.getFilesTaggedWith(
            new String[] {"systemlibrary"}, pageNumber);
    pageNumber = fileIds.getPageNumber() + 1;
    //TODO: Do something with the IDs.
} while (!fileIds.isLastPage());
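The do/while paging loop above recurs for every paged query, so it can be factored into a small helper. This is a sketch under assumptions: "Page" is a hypothetical stand-in for the API's ListPage (the real accessor for the page items is not shown on this page, so "getItems()" is invented here), and "SimplePage" is an in-memory implementation for trying the helper out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

/** Hypothetical stand-in for ListPage; getItems() is an assumed accessor name. */
interface Page<T> {
    List<T> getItems();
    int getPageNumber();
    boolean isLastPage();
}

/** In-memory Page implementation, e.g. for tests. */
final class SimplePage<T> implements Page<T> {
    private final List<T> items;
    private final int pageNumber;
    private final boolean lastPage;

    SimplePage(List<T> items, int pageNumber, boolean lastPage) {
        this.items = items;
        this.pageNumber = pageNumber;
        this.lastPage = lastPage;
    }

    @Override public List<T> getItems() { return items; }
    @Override public int getPageNumber() { return pageNumber; }
    @Override public boolean isLastPage() { return lastPage; }
}

final class Paging {
    /** Requests pages 0, 1, 2, ... until a page reports isLastPage(), collecting all items. */
    static <T> List<T> fetchAll(IntFunction<Page<T>> fetchPage) {
        List<T> all = new ArrayList<>();
        int pageNumber = 0;
        Page<T> page;
        do {
            page = fetchPage.apply(pageNumber);
            all.addAll(page.getItems());
            // Same continuation rule as the example above: next page = returned page number + 1.
            pageNumber = page.getPageNumber() + 1;
        } while (!page.isLastPage());
        return all;
    }
}
```

With a real service, the lambda would wrap a call such as fileService.getFilesTaggedWith(tags, n). Collecting everything into memory is only sensible for moderate result sizes; for large ones, process each page inside the loop instead.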
Look at the interfaces:
SOAP:
- http://host:port/ws/level-0/packages
  - Parents: getReferencingPackageNames...(...)
  - Children: getFilesContainedInPackage...(...)
There are no general limits regarding the number of recursions, linked files, amount of packages etc.
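Because neither nesting depth nor the number of linked packages is limited, a client walking the parent relation transitively should guard against revisiting names. The sketch below is a breadth-first walk with a visited set. The "next" function is a hypothetical wrapper that the client would implement around one of the getReferencingPackageNames... variants (the exact overloads are not listed here).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

final class PackageWalker {
    /**
     * Breadth-first walk from 'root', following 'next' (e.g. referencing/parent packages).
     * Returns every reachable package name exactly once, excluding the root itself.
     * The visited set protects against cycles and repeated sub-trees.
     */
    static Set<String> reachable(String root, Function<String, List<String>> next) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (String neighbour : next.apply(current)) {
                // add() returns false for names already seen, so each name is expanded once.
                if (!neighbour.equals(root) && visited.add(neighbour)) {
                    queue.add(neighbour);
                }
            }
        }
        return visited;
    }
}
```

An iterative walk is used instead of recursion so that deep package hierarchies cannot overflow the call stack.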
Names, metadata, tags and some other values are subject to limits enforced by the Access Layer. The following snippet shows the compiled-in (and configurable) limits:
/**
 * Defines the maximum amount of characters that may be used in the various
 * name fields (Note: The size was chosen based on the maximum index size
 * supported by MSSQL, it should not be increased).
 */
int MAX_NAME_LENGTH = 432;

/**
 * Defines the maximum amount of characters that may be used in the various
 * display name fields.
 */
int MAX_DISPLAY_NAME_LENGTH = 256;

/**
 * Defines the maximum amount of characters that all tags may consume when
 * serialized to a whitespace-delimited string (hard limit, compiled-in).
 */
int MAX_TAG_STRING_LENGTH = 1024 * 1024;

/**
 * Defines the actually applied limit for the tag string length.
 * Configurable; via command line parameter "-Dgacl.max.tag.string.length=value".
 */
int TAG_STRING_LENGTH = Math.min(MAX_TAG_STRING_LENGTH,
        Integer.getInteger("gacl.max.tag.string.length", 64 * 1024));

/**
 * Defines the maximum amount of bytes that the serialized metadata element
 * may consume. (Note: "serialized metadata" means the length of the UTF-8
 * encoded XML representation excluding any unnecessary whitespace)
 * (hard limit, compiled-in).
 */
int MAX_SERIALIZED_METADATA_LENGTH = 1024 * 1024;

/**
 * Defines the actually applied limit for the serialized metadata length.
 * Configurable; use command line parameter "-Dgacl.max.serialized.metadata.length=value".
 */
int SERIALIZED_METADATA_LENGTH = Math.min(MAX_SERIALIZED_METADATA_LENGTH,
        Integer.getInteger("gacl.max.serialized.metadata.length", 256 * 1024));

/**
 * Defines the actually applied limit for incoming batch requests.
 * <p/>
 * If the limit is reached, web services stop parsing the request and return
 * an error in order to protect the server from DoS attacks.
 * <p/>
 * Configurable; use command line parameter "-Dgacl.max.incoming.request.batch.size=value".
 */
int MAX_INCOMING_REQUEST_BATCH_SIZE =
        Integer.getInteger("gacl.max.incoming.request.batch.size", 100);

/**
 * Defines the maximum amount of characters that may be used in the remote
 * (public) URI field (using ASCII URI encoding).
 */
int SOURCE_MAX_REMOTE_URI_LENGTH = 2 * 1024;

/**
 * Defines the maximum amount of characters that may be used in the internal
 * URI field (using ASCII URI encoding).
 */
int SOURCE_MAX_INTERNAL_URI_LENGTH = 1024;

/**
 * Defines the maximum amount of characters that may be used in the content
 * tag field.
 */
int SOURCE_MAX_CONTENT_TAG_LENGTH = 128;

/**
 * Defines the maximum amount of characters that may be used in the domain
 * name field.
 */
int SOURCE_DOMAIN_MAX_NAME_LENGTH = 256;
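A client can pre-check values against these limits before sending them, which avoids a round trip that is bound to fail. The sketch below hard-codes the snippet's default values; this is an assumption, since a server started with different "-Dgacl.*" properties will enforce other limits. Joining tags with single spaces is one way to produce the "whitespace-delimited string" the limit refers to, and "characters" is approximated by Java String length (UTF-16 units).

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical client-side pre-checks mirroring the default Access Layer limits. */
final class LimitChecks {
    static final int MAX_NAME_LENGTH = 432;
    static final int MAX_DISPLAY_NAME_LENGTH = 256;
    static final int TAG_STRING_LENGTH = 64 * 1024;            // default applied limit (characters)
    static final int SERIALIZED_METADATA_LENGTH = 256 * 1024;  // default applied limit (bytes)

    static boolean nameFits(String name) {
        return name.length() <= MAX_NAME_LENGTH;
    }

    static boolean displayNameFits(String displayName) {
        return displayName.length() <= MAX_DISPLAY_NAME_LENGTH;
    }

    /** Measures all tags as one string joined by single spaces. */
    static boolean tagsFit(String[] tags) {
        return String.join(" ", tags).length() <= TAG_STRING_LENGTH;
    }

    /** Measures UTF-8 bytes; the caller must pass XML already stripped of unnecessary whitespace. */
    static boolean metadataFits(String serializedXml) {
        return serializedXml.getBytes(StandardCharsets.UTF_8).length <= SERIALIZED_METADATA_LENGTH;
    }
}
```

Note that the metadata limit counts bytes, not characters: non-ASCII text consumes the budget faster than its character count suggests.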
In addition, processing has a per-session limit to avoid out-of-memory errors:
// The max prepared source count is MAX_PREPARED_JOBS * MAX_SOURCES_PER_JOB per session.
//
// This limit applies only to jobs & sources that were prepared but not yet started.
// Exceeding the limit will log and discard the eldest job or source but it will not
// cause an error. Therefore it is in general safe to prepare jobs and not further
// process them as the data is either cleaned when the session times out or when more
// than the declared limit of jobs are prepared.
// However it's the responsibility of the client to not exceed the limit with jobs or
// sources that must not get lost.
//
// Attention, every source may consume several KB of RAM, depending directly on the
// size of the metadata element. Adjust these values with care and only when needed.

/**
 * Defines the actually applied limit for the amount of prepared jobs within a single
 * session.
 * Configurable; via command line parameter "-Dgacl.max.prepared.jobs=value".
 */
public static final int MAX_PREPARED_JOBS =
        Math.max(1, Integer.getInteger("gacl.max.prepared.jobs", 256));

/**
 * Defines the actually applied limit for the amount of sources that may be assigned to
 * a single job.
 * Configurable; via command line parameter "-Dgacl.max.sources.per.job=value".
 */
public static final int MAX_SOURCES_PER_JOB =
        Math.max(1, Integer.getInteger("gacl.max.sources.per.job", 16));
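With the defaults, a session can hold at most 256 * 16 = 4096 prepared but unstarted sources. One way for a client to stay within MAX_SOURCES_PER_JOB is to split independent sources into chunks and prepare/start one job per chunk. The sketch below assumes the default values (the server may be configured differently) and assumes the sources are independent; linked files that belong together must not be split across jobs this way.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for batching independent sources into jobs (sketch). */
final class JobBatching {
    static final int MAX_PREPARED_JOBS = 256;   // default of gacl.max.prepared.jobs
    static final int MAX_SOURCES_PER_JOB = 16;  // default of gacl.max.sources.per.job

    /** Upper bound of prepared-but-not-started sources per session. */
    static int maxPreparedSources() {
        return MAX_PREPARED_JOBS * MAX_SOURCES_PER_JOB;
    }

    /** Splits sources into consecutive chunks of at most MAX_SOURCES_PER_JOB elements. */
    static <T> List<List<T>> splitIntoJobs(List<T> sources) {
        List<List<T>> jobs = new ArrayList<>();
        for (int i = 0; i < sources.size(); i += MAX_SOURCES_PER_JOB) {
            int end = Math.min(i + MAX_SOURCES_PER_JOB, sources.size());
            jobs.add(new ArrayList<>(sources.subList(i, end)));
        }
        return jobs;
    }
}
```

Starting each job as soon as its sources are assigned keeps the number of prepared jobs low, so the "discard the eldest" behaviour never affects data that must not get lost.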
Views
Categories are generally organized in tree-structures called views. Views can be regional to satisfy regional differences in the classification for an item falling under a certain category.
Categories
Categories are represented by "category definitions" that consist mainly of a name and a tag query expression, which is used to query tagged packages or files from the GRID. A category is never directly assigned to a "package" or "file"; instead, the wiring is always performed by evaluating the "tag expression".
Example - Category "Games":
Category { name = "Games"; tagQueryExpression = "(gamevendor -development -productivity) (gameapplication)"; }
Tag Query Expressions
Tag query expressions can be used to get the files and package names that satisfy a given query. Whenever categorization is used, either package names or files must be queried at some point to support any further operation.
Example query:
(mustbe1Group1 -mustnotbe1Group1) (mustbe1Group2 -mustnotbe1Group2 mustbe2Group2)

The syntax is designed to be easy to parse and to translate into a corresponding SQL query, while still being powerful enough to satisfy categorization needs.
Note: Details on the syntax definition can be found in the module WSServerApi under the page "Tag Queries".
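To illustrate how such an expression selects items, here is a small client-side evaluator. Its semantics are an assumption inferred from the placeholder names and the "Games" example, not the authoritative grammar (see "Tag Queries" for that): each parenthesised group is one alternative (groups are OR-combined), and within a group every plain tag must be present while every "-"-prefixed tag must be absent. Expressions without parentheses are not handled.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical evaluator for the assumed "(a -b) (c)" semantics (sketch, not the GRID's parser). */
final class TagQuery {
    private static final Pattern GROUP = Pattern.compile("\\(([^()]*)\\)");

    /** True if the tag set satisfies at least one parenthesised group. */
    static boolean matches(String expression, Set<String> tags) {
        Matcher m = GROUP.matcher(expression);
        while (m.find()) {
            if (groupMatches(m.group(1), tags)) {
                return true;
            }
        }
        return false;
    }

    /** All plain terms must be present; all "-" terms must be absent. */
    private static boolean groupMatches(String group, Set<String> tags) {
        for (String term : group.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            if (term.startsWith("-")) {
                if (tags.contains(term.substring(1))) {
                    return false; // excluded tag is present
                }
            } else if (!tags.contains(term)) {
                return false; // required tag is missing
            }
        }
        return true;
    }
}
```

Under these assumptions, a package tagged only "gamevendor" falls into "Games", while one tagged "gamevendor development" does not, unless it also carries "gameapplication".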
A view can be obtained via a SOAP interface and is generally speaking a tree of category definitions:
SOAP:
- http://host:port/ws/level-0/categories
  - String[] viewName getCategoryViewNames(Locale)
  - CategoryView getCategoryView(Locale, String viewName)
Generally speaking, there are two possibilities: one involves some client-side logic, the other is completely GRID-controlled.
GRID controlled variant:
Java Client Example:
Locale locale = Locale.getDefault();
Category games = categoryService.getPlainCategory(locale, "Games");

int pageNumber = 0;
ListPage<String> packageNames = null;
do {
    packageNames = packageService.getMatchingPackageNames(
            games.getTagQueryExpression(), games.getTagQueryExpressionVersion(), pageNumber);
    pageNumber = packageNames.getPageNumber() + 1;
    //TODO: Do something with the names.
} while (!packageNames.isLastPage());
Custom variant:
The custom variant works essentially the same way; however, it presumes that the client application understands the query format used by categories and knows the underlying "tag pool", so that it can build custom categories.
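A client building custom categories needs to produce expressions in the same "(required -excluded) (...)" form as the "Games" example. The builder below is a hypothetical helper that only assembles the string; it rejects tags containing whitespace or parentheses, or starting with "-", on the assumption that such characters would collide with the query syntax.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical builder for tag query expressions in the "(a -b) (c)" form (sketch). */
final class TagQueryBuilder {
    private final List<String> groups = new ArrayList<>();

    /** Adds one parenthesised group with required tags followed by "-"-prefixed excluded tags. */
    TagQueryBuilder group(List<String> required, List<String> excluded) {
        List<String> terms = new ArrayList<>();
        for (String tag : required) {
            terms.add(checkTag(tag));
        }
        for (String tag : excluded) {
            terms.add("-" + checkTag(tag));
        }
        groups.add("(" + String.join(" ", terms) + ")");
        return this;
    }

    /** Joins all groups with single spaces. */
    String build() {
        return String.join(" ", groups);
    }

    private static String checkTag(String tag) {
        if (tag.isEmpty() || tag.startsWith("-") || tag.matches(".*[\\s()].*")) {
            throw new IllegalArgumentException("Unsupported tag: '" + tag + "'");
        }
        return tag;
    }
}
```

The resulting string takes the place of games.getTagQueryExpression() in the GRID-controlled example; what to pass as the expression version for a client-built query is not documented on this page.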
To query details on a source URL, the Access Layer offers several interfaces, some more lightweight than others.
Within the normal harvesting workflow, many queries are issued against the GRID, so retrieving only as much information as required is highly recommended.
Start with the following interfaces, which relate specifically to source URLs, processing and file information:
SOAP:
- http://host:port/ws/level-0/internal/sources
  - (all methods)
- http://host:port/ws/level-0/internal/processing
  - (all methods)
- http://host:port/ws/level-0/processing
  - getFileInformation(fileId)
The Access Layer offers a data type called "SourceInformation" that mainly consists of the "Last-Modified" value and a custom "ContentTag" (e.g. usable as an ETag). The harvester can use both to decide whether a remote URL should be processed further.
Java Client Example:
URI remoteURI = URI.create("http://remoteHost/path/to/file/to/harvest");
HttpURLConnection huc = (HttpURLConnection) remoteURI.toURL().openConnection();

// Ask the GRID what it already knows about this URL and send conditional request headers.
SourceInformation info = sourceService.getSourceInformationForURL(remoteURI);
if (info != null) {
    String contentTag = info.getContentTag();
    if (contentTag != null)
        huc.setRequestProperty("If-None-Match", contentTag);
    Date lastModified = info.getLastModified();
    if (lastModified != null)
        huc.setIfModifiedSince(lastModified.getTime());
}

// Runs inside the harvesting loop: skip URLs that did not change since the last harvest.
if (huc.getResponseCode() == HttpURLConnection.HTTP_NOT_MODIFIED)
    continue; // continue with the next URL
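The same conditional request can be expressed with the JDK 11+ java.net.http API, which keeps header construction separate from sending and is easy to unit-test. This is an alternative sketch, not part of the Access Layer; the class name is hypothetical. It formats "If-Modified-Since" as an HTTP date in GMT and passes the stored content tag through unchanged, assuming it was stored in the exact form the remote server sent (including quotes, if any).

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Locale;

/** Hypothetical helper building a conditional GET from SourceInformation values (sketch). */
final class ConditionalRequests {
    // IMF-fixdate as used in HTTP headers, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.ENGLISH);

    /** Either argument may be null, in which case the corresponding header is omitted. */
    static HttpRequest conditionalGet(URI remoteUri, String contentTag, Date lastModified) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(remoteUri).GET();
        if (contentTag != null) {
            builder.header("If-None-Match", contentTag);
        }
        if (lastModified != null) {
            builder.header("If-Modified-Since",
                    HTTP_DATE.format(lastModified.toInstant().atZone(ZoneOffset.UTC)));
        }
        return builder.build();
    }
}
```

Sending it with HttpClient.send(...) and checking for status 304 then corresponds to the HTTP_NOT_MODIFIED check above.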
Information that is specific to the source URI can be exchanged with the processing modules by converting it into Metadata packages as defined inside the module Metadata Handler.
Once converted, the Metadata package can be attached to the source when adding or updating it inside the GRID system. See more details below.
Note: Source information can be exchanged in both directions, as the interfaces allow reading and writing metadata on sources.

When dealing with information that is common to a complete domain (e.g. microsoft.com), use the data type "SourceDomain" and look for the domain-related retrieval and update services inside the SOAP service: http://host:port/ws/level-0/internal/sources
Technically, this works the same way as for sources.
Sending a single file or multiple linked files for processing always involves 5 major steps:
Java Client Example:
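URI remoteURI = URI.create("http://remoteHost/path/to/setup.exe");
File localFile = new File("setup.exe");
String contentTag = "xyfre3sfds442";
Metadata metadata = ...;

// Step 1: prepare a new job
UUID jobId = processingService.prepareJob();

// Step 2: assign the source (URI, last-modified date, content tag, metadata) to the job
SourceIdentifier sourceId = processingService.assignProcessSource(
        jobId, remoteURI, new Date(localFile.lastModified()), contentTag, metadata);

// Step 3: obtain a transfer URL for the source and upload the file content via HTTP PUT
URL transferURL = processingService.assignContentToProcessSource(sourceId);
HttpURLConnection huc = (HttpURLConnection) transferURL.openConnection();
huc.setDoOutput(true);
huc.setRequestMethod("PUT");
try (OutputStream out = huc.getOutputStream()) {
    Files.copy(localFile.toPath(), out);
}
int status = huc.getResponseCode(); // completes the upload; check for a 2xx status

// Step 4: start processing the job
processingService.startJob(jobId);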
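Step 3 can also be written with the JDK 11+ java.net.http client, which streams the file from disk and makes the status check explicit. This is a sketch with a hypothetical class name; treating any non-2xx status from the transfer URL as a failed upload is an assumption, since the expected success code is not documented here.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

/** Hypothetical helper for uploading content to the transfer URL from Step 3 (sketch). */
final class ContentUpload {
    /** Builds a PUT request whose body is streamed from the local file. */
    static HttpRequest putRequest(URI transferUri, Path localFile) throws IOException {
        return HttpRequest.newBuilder(transferUri)
                .PUT(HttpRequest.BodyPublishers.ofFile(localFile))
                .build();
    }

    /** Sends the upload; assumption: success is signalled by any 2xx status. */
    static void upload(HttpClient client, URI transferUri, Path localFile)
            throws IOException, InterruptedException {
        HttpResponse<Void> response = client.send(putRequest(transferUri, localFile),
                HttpResponse.BodyHandlers.discarding());
        int status = response.statusCode();
        if (status < 200 || status >= 300) {
            throw new IOException("Upload to " + transferUri + " failed with HTTP " + status);
        }
    }
}
```

Usage: ContentUpload.upload(HttpClient.newHttpClient(), transferURL.toURI(), localFile.toPath()) would replace the HttpURLConnection lines of Step 3, before processingService.startJob(jobId) is called.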