Module - Jpa Datasource (Jpa Datasource)

Implements generic database bindings using the JPA technology.

About

This module implements nearly all DB-related operations using JPA without any DB-specific code (no native SQL queries; only standardized JPQL queries are used).
As a result, 99% of the DB functions are not bound to the underlying DB server technology, which makes it possible to test nearly the entire persistence functionality with a unit test suite that runs on an embedded HSQL2 DB (see Reports).

The only area where DB operations become DB-specific is the full-text search needed to query files or packages by tags. These specific operations are implemented inside the modules HSQL Datasource and MSSQL Datasource, together with the code that is required to attach the specific database to JPA.

The paragraph above was true for versions prior to 1.2.3; see the next section for how tag queries were made DB independent.

Tag Queries & Local Fulltext Index

Starting with version 1.2.3 this module contains 2 generic tag query implementations that support different levels of indexed access to tagged elements using tag matching API methods (= methods containing "getMatching*" in the API method name):

Implementation Description
SearchTagQueryProvider Implements a tag query provider that retrieves results from a local fulltext index that is created and queried using Hibernate Search (backed by Apache Lucene).

This provider uses Hibernate Search to build a DB independent local fulltext index of a subset of information that is required to support tag matching queries that do not require the DB for searching. Thus all information required for search and result building is stored within the local fulltext index.

When enabled, this provider creates a local FT index below the resource folder and registers an incremental update task that runs every 5 minutes. Indexing is performed by querying changed files and packages from the core DB and placing them into the local index. Entries are processed from most to least recent, using an internal persistent processing queue so that the process survives restarts.
The first time a new ACL is installed, indexing all existing entries may take a long time. See the logs and JMX exports to check the progress (relevant log IDs are start:TMACL-02130, subrange-processed:TMACL-02120, finished:TMACL-02140, errors:TMACL-02090). During the indexing process, queries return the results of already indexed sub-ranges, where a sub-range covers a time range of 6 hours measured by "CREATED" for existing content and "LAST-PROCESSED" for newly added content.

This provider is NOT enabled by default. To use it edit "startup.params.(sh|bat)" and make sure the system property "-Dgacl.use.local.fulltext-index=true" is set.
Be aware that when using this provider, the local storage requirements may grow by multiple tens of GB on top of the normal space requirements, depending on the size of the core DB.

If the index becomes corrupted or re-indexing needs to be forced, it is safe to delete the folders "fulltext-index" and "fulltext-index-queue" located below the folder "resource" while the service is stopped. Forcing re-indexing without deleting the existing index can currently only be triggered using the methods that are exported via JMX. Under normal circumstances re-indexing does not need to be triggered manually.
GenericTagQueryProvider Implements a generic tag query provider that acts as the default provider when no other provider is enabled.

This provider uses JPA's JPQL to be database neutral and implements tag queries by scanning for matching tags using LIKE. Thus no index will be used on tags, however an index is used for limiting the scanned rows by date.

For most tags the cardinality of an index is low as many tags can be found in almost all rows. Indexed access to tags is most useful if either all information can be read purely out of the index or when the requested tags exist only for a small subset of rows. Therefore this generic provider performs well for most queries especially when used with small date ranges and small result sets.

Using this provider may however not be the best solution when tag queries need to be more scalable (e.g. produce no DB hit) or are executed without date range limitation.
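
The scanning approach of the generic provider can be pictured roughly as follows. This is only a minimal sketch, not the module's actual query: the entity name `FileEntity`, the fields `tags` and `created`, the parameter names and the `!` escape character are all assumptions made for illustration. It only builds the JPQL text and the LIKE patterns, so it runs without a JPA provider.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: builds JPQL text resembling an un-indexed tag scan limited by a date range. */
final class GenericTagQuerySketch {

    /** Escapes the LIKE wildcards of a tag (using '!' as escape character) and wraps it in '%'. */
    static String toLikePattern(String tag) {
        String escaped = tag.replace("!", "!!").replace("%", "!%").replace("_", "!_");
        return "%" + escaped + "%";
    }

    /**
     * Builds the JPQL text for the given number of tags. The date range is the
     * predicate that can be served by an index; every tag adds a LIKE scan on
     * the rows that remain. "FileEntity", "tags" and "created" are hypothetical.
     */
    static String buildJpql(int tagCount) {
        StringBuilder jpql = new StringBuilder(
                "SELECT f FROM FileEntity f WHERE f.created >= :from AND f.created < :to");
        for (int i = 0; i < tagCount; i++) {
            jpql.append(" AND f.tags LIKE :tag").append(i).append(" ESCAPE '!'");
        }
        return jpql.append(" ORDER BY f.created DESC").toString();
    }

    public static void main(String[] args) {
        List<String> patterns = new ArrayList<>();
        for (String tag : new String[] {"pdf", "100%_safe"}) {
            patterns.add(toLikePattern(tag));
        }
        // The patterns would be bound to :tag0, :tag1, ... on a real JPA query.
        System.out.println(buildJpql(patterns.size()));
        System.out.println(patterns);
    }
}
```

A narrow date range keeps the number of rows that the LIKE predicates have to scan small, which is why this kind of query degrades when no date limit is given.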

Spring

This module uses Spring Beans implementing most of the repository APIs defined inside the module Datasource Api. Implementations are detected by default (prefix is "jpa") and made available inside the repository-selection.[mode].xml files.

Transactions and DB Connections

Please note that all DB operations are performed in a transaction-safe manner even without explicit use of COMMIT and ROLLBACK. This is because all Spring Beans are wrapped in auto-proxies that handle transaction boundaries as well as connection retrieval and release.

A call to a method inside this module automatically gets a DB connection from the pool and commits and releases it on a normal method termination. Throwing any exception inside the managed code will cause a rollback (even if the method exits normally).
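
The effect of such a proxy can be illustrated with a plain JDK dynamic proxy. This is a simplified sketch and not the Spring mechanism the module actually uses: the `FileRepository` interface is hypothetical, and the "transaction" is only a list of recorded events standing in for a pooled connection.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

/** Sketch only: mimics a transactional auto-proxy with a plain JDK dynamic proxy. */
final class TransactionProxySketch {

    /** Hypothetical repository API; not part of the module. */
    interface FileRepository {
        void save(String name);
    }

    /** Records transaction events in order, standing in for a real connection pool. */
    static final List<String> EVENTS = new ArrayList<>();

    /** Wraps the target so that every call is surrounded by begin/commit or begin/rollback. */
    static FileRepository transactional(FileRepository target) {
        return (FileRepository) Proxy.newProxyInstance(
                FileRepository.class.getClassLoader(),
                new Class<?>[] {FileRepository.class},
                (proxy, method, methodArgs) -> {
                    EVENTS.add("acquire+begin");
                    try {
                        Object result = method.invoke(target, methodArgs);
                        EVENTS.add("commit");
                        return result;
                    } catch (InvocationTargetException e) {
                        // The target threw: roll back and rethrow the original exception.
                        EVENTS.add("rollback");
                        throw e.getCause();
                    } finally {
                        EVENTS.add("release");
                    }
                });
    }

    public static void main(String[] args) {
        FileRepository repo = transactional(name -> {
            if (name.isEmpty()) {
                throw new IllegalArgumentException("empty name");
            }
        });
        repo.save("report.pdf");
        try {
            repo.save("");
        } catch (IllegalArgumentException expected) {
            // The proxy has already rolled back and released the connection.
        }
        System.out.println(EVENTS);
    }
}
```

The caller never sees begin, commit or rollback; it only calls the repository method, which is the property the paragraph above describes.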

The complete toolchain responsible for the JPA 2.0 functionality is: Spring 3.0, Hibernate 3.5 and TinyJEE 1.0.

Advanced Storage Options (System Properties)

/**
 * Specifies the default list of metadata keys that are tried when extracting
 * the filename of a package from the source metadata element. The information
 * is used to link packages against the package "root".
 */
String META_FILENAME_DEFAULT_KEYS = "downloadName contentName originalFileName contentFileName";
  
/**
 * Is the value that holds the actual search list of meta-data keys.
 * (Can be customized via "-Dgacl.source.filename.metakeys=filenameKey1 filenameKey2 ...")
 */
List<String> META_FILENAME_KEYS = unmodifiableList(asList(
        System.getProperty("gacl.source.filename.metakeys", META_FILENAME_DEFAULT_KEYS).split(" ")));
  
/**
 * Specifies the maximum amount of source links to fetch in a fast lookup buffer when
 * checking the requirement of linking single package members against sources.
 * <p/>
 * Packages with more members than this will use slower single-member checks.
 */
int MAX_SOURCE_LINK_FETCH_SIZE = Integer.getInteger("gacl.source.link.fetchsize", 4 * 1024);
  
/**
 * Is the amount of time to keep the information on a package that more than
 * {@link StorageOptions#MAX_SOURCE_LINK_FETCH_SIZE} source links were returned when creating the
 * fast lookup buffer.
 */
long KEEP_OVERSIZED_SOURCE_LINK_INFO_INTERVAL = Long.getLong("gacl.source.link.oversized.ttl", 5 * 60 * 1000);
  
/**
 * Defines a delta time window in MS that is used to update timestamps with a single
 * update command instead of an expensive per-member basis. (If timestamps of all members
 * are within this window, the timestamps are set as if all members contained the timestamp
 * of the first entry)
 */
int TIME_UPDATES_ACCURACY = Integer.getInteger("gacl.timeupdate.accuracy", 5 * 60 * 1000);
  
/**
 * Defines the characters to consider as path delimiters when checking whether files
 * were renamed, removed or added. Setting this value to an empty string causes the
 * system to disable handling paths. If the same file is then contained more than
 * once in a single package it will be logged as invalid input data.
 */
String PATH_DELIMITERS = System.getProperty("gacl.path.delimiters", "/!\\");
  
/**
 * Controls whether hibernate search is enabled to index tag fields locally using
 * an Apache Lucene backend for all tag matching queries.
 * <p/>
 * When enabled, all "getMatching**" methods do not produce a DB hit.
 */
boolean USE_LOCAL_FULLTEXT_INDEX = parseBoolean(
        System.getProperty("gacl.use.local.fulltext-index", "false"));
  
/**
 * Specifies whether the generic tag query provider is used when the fulltext index
 * is queried for very small time ranges that are below the resolution of the FT index or
 * when the FT index is currently being built and more than one day of content is still
 * pending to be indexed.
 * <p/>
 * Time range queries are below the resolution when they are smaller than a DAY for FIRST_SEEN,
 * LAST_REQUESTED and are below an hour for LAST_PROCESSED.
 * <p/>
 * When enabled, all "getMatching**" methods query tags using un-indexed filtering on a date range
 * when the specified range was below the limit mentioned above.
 */
boolean USE_GENERIC_FALLBACK_WITH_FULLTEXT_INDEX = parseBoolean(
        System.getProperty("gacl.use.generic.fallback.with.fulltext-index", "true"));
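
The resolution rule behind the last option can be sketched as follows. This is an illustration under stated assumptions, not the module's code: the class, enum and method names are hypothetical, and the second trigger (an index that is still being built with more than one day of content pending) is left out.

```java
import java.time.Duration;
import java.time.Instant;

/** Sketch only: decides whether a time range is below the resolution of the fulltext index. */
final class FulltextResolutionSketch {

    /** Time fields named in the option description. */
    enum TimeField { FIRST_SEEN, LAST_REQUESTED, LAST_PROCESSED }

    /** Resolution per field: one hour for LAST_PROCESSED, one day for the other fields. */
    static Duration resolutionOf(TimeField field) {
        return field == TimeField.LAST_PROCESSED ? Duration.ofHours(1) : Duration.ofDays(1);
    }

    /** Returns true when the queried range is shorter than the resolution of the field. */
    static boolean belowResolution(TimeField field, Instant from, Instant to) {
        return Duration.between(from, to).compareTo(resolutionOf(field)) < 0;
    }

    /**
     * Combines the check with the two switches: without a local index the generic
     * provider is always used; with it, only when the fallback is enabled and the
     * range is below the resolution.
     */
    static boolean useGenericProvider(boolean useLocalIndex, boolean useFallback,
                                      TimeField field, Instant from, Instant to) {
        if (!useLocalIndex) {
            return true;
        }
        return useFallback && belowResolution(field, from, to);
    }

    public static void main(String[] args) {
        Instant to = Instant.parse("2020-01-02T00:00:00Z");
        Instant from = to.minus(Duration.ofHours(2));
        // A two-hour range is below one day but not below one hour.
        System.out.println(useGenericProvider(true, true, TimeField.FIRST_SEEN, from, to));
        System.out.println(useGenericProvider(true, true, TimeField.LAST_PROCESSED, from, to));
    }
}
```

With both switches enabled, the two-hour range in `main` goes to the generic provider for FIRST_SEEN and to the fulltext index for LAST_PROCESSED.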

About JPA

JPA is a standard for persisting object graphs to a relational database and querying them back. Object graphs are persisted as relational tables, so the data can still be reused with traditional methods.

The Java Persistence API draws on ideas from leading persistence frameworks and APIs such as Hibernate, Oracle TopLink, and Java Data Objects (JDO), as well as on the earlier EJB container-managed persistence. The Expert Group for the Enterprise JavaBeans 3.0 Specification (JSR 220) has representation from experts in all of these areas as well as from other individuals of note in the persistence community.

More information on JPA can be gathered from: