Distributed Data Retrieval Protocol
Portal Reference Implementation Design
$revision$
Introduction
This document describes the reference implementation
of the Portal Component of the Distributed Data Retrieval Protocol (DDRP). Please
refer to the Requirements Document for an
overview of the entire system. The Portal Component is an application that communicates
with multiple providers and performs operations to retrieve and integrate data. The
reference implementation of the portal follows the protocol. It is a set of classes
built to interact with distributed providers, and clients access it through a set of
well-defined API calls.
Object Model
Objects
- PortalServices
-
The entry point into the portal. All "clients" interface with the portal via API calls
made to PortalServices or via streamed HTTP requests. This can be thought of as the main class of the
portal component.
- _providers - list of providers for querying. The list could be all available providers (found
during discovery) or a list of a select few specified at the time of construction (via config?).
- getProviderList() - returns the list of providers configured for this instance.
- getCompleteProviderList() - returns the full list of providers (from the registry). This could
be the same data as getProviderList() if there is no distinct subset of providers already specified.
- setProviderList() - sets the providers available. This may be done completely internally
after provider discovery or may be done at the time of construction.
- process() - accepts a request of X format and hands it off to the PortalRequestHandler for processing.
During construction, the following occurs:
- the configuration file is read
- logging services are established
- ProviderCache is constructed and, if necessary, provider discovery takes place
- provider metadata is filled in via a request to each provider of concern
- the pool of handlers is constructed
- listening for requests begins
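The relationship between the configured provider list and the complete provider list described for PortalServices could be sketched as follows. This is only an illustration under stated assumptions: providers are represented as plain strings, the class name PortalServicesSketch is hypothetical, and the rule that a missing subset falls back to all discovered providers is one reading of the description above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative sketch only; providers are plain strings here (an assumption). */
class PortalServicesSketch {
    private final List<String> completeProviders; // everything found during discovery
    private List<String> providers;               // _providers: the list used for querying

    PortalServicesSketch(List<String> discovered, List<String> configuredSubset) {
        this.completeProviders = new ArrayList<>(discovered);
        // Assumption: with no configured subset, query every discovered provider.
        boolean noSubset = configuredSubset == null || configuredSubset.isEmpty();
        setProviderList(noSubset ? discovered : configuredSubset);
    }

    /** Returns the providers configured for this instance. */
    List<String> getProviderList() {
        return Collections.unmodifiableList(providers);
    }

    /** Returns the full list of providers known from the registry. */
    List<String> getCompleteProviderList() {
        return Collections.unmodifiableList(completeProviders);
    }

    /** Replaces the list of providers used for querying. */
    final void setProviderList(List<String> newProviders) {
        this.providers = new ArrayList<>(newProviders);
    }
}
```

With no subset supplied, both accessors return the same data, which matches the case noted for getCompleteProviderList().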
- PortalConfig
-
The configuration object containing required values. The configuration object is generally loaded from a configuration file.
- RegistryAccess
-
The API into the provider registry. Under current assumptions, this will be a wrapper around UDDI SOAP
requests (i.e., the caller makes API calls that are converted internally to the appropriate SOAP requests).
- discoverProviders() - queries the UDDI registry for available providers.
- ProviderCache
-
Manages provider data obtained from the registry (via RegistryAccess) and from the providers directly (i.e.
metadata information).
- _providers - cached list of providers from discovery.
- _fetchDate - the datetime the providers were discovered, null if no discovery has occurred yet.
- _expiryTime - time, in seconds, to cache data for. Set to any value less than 1 to prohibit
caching and force discovery with every request.
- _defaultExpiryTime - constant, defaults to 86,400 seconds (1 day).
- getProviders() - returns the providers discovered if cached and still valid, otherwise retrieves
the providers, sets the cache and then returns them.
- setProviders() - sets the _providers variable (simply a modifier).
- getDefaultExpiryTime() - returns the configured default expiry time for informational purposes.
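The expiry behaviour of ProviderCache could look roughly like the sketch below. It is a minimal illustration, not the reference implementation: providers are plain strings, discovery is injected as a Supplier standing in for RegistryAccess.discoverProviders(), and the current time is passed in explicitly so the behaviour is easy to exercise. All of these choices are assumptions.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Sketch of a time-based provider cache; names other than the documented fields are hypothetical. */
class ProviderCacheSketch {
    static final long DEFAULT_EXPIRY_SECONDS = 86_400; // _defaultExpiryTime: 1 day

    private final Supplier<List<String>> discovery; // stand-in for registry discovery (assumption)
    private final long expirySeconds;               // _expiryTime: values < 1 disable caching
    private List<String> providers = new ArrayList<>(); // _providers
    private Instant fetchDate;                      // _fetchDate: null until the first discovery

    ProviderCacheSketch(Supplier<List<String>> discovery, long expirySeconds) {
        this.discovery = discovery;
        this.expirySeconds = expirySeconds;
    }

    /** Returns cached providers while still valid; otherwise rediscovers and refreshes the cache. */
    synchronized List<String> getProviders(Instant now) {
        boolean cachingEnabled = expirySeconds >= 1;
        boolean fresh = fetchDate != null
                && Duration.between(fetchDate, now).getSeconds() < expirySeconds;
        if (!cachingEnabled || !fresh) {
            providers = new ArrayList<>(discovery.get());
            fetchDate = now;
        }
        return new ArrayList<>(providers); // defensive copy
    }

    long getDefaultExpiryTime() {
        return DEFAULT_EXPIRY_SECONDS;
    }
}
```

Passing the clock value as a parameter is a convenience for this sketch; a real implementation would more likely read the system clock internally.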
- Provider
-
Data object (a bean) representing an individual datasource. Note that the
requirements allow each physical provider to host many databases. A Provider here corresponds to
exactly one database; the Provider object does not hold an array of databases. Perhaps this
object should be called Datasource to avoid confusion.
- _ip - the IP address of the provider.
- _institution - the known name of the institution (e.g. California Academy of Sciences)
- _database - the name of the database in which the data resides
- _metadata - object containing the provider offerings
Accessors and Modifiers are provided for each of the above attributes but are omitted here for brevity.
- Metadata
-
Data object representing the metadata associated with an individual datasource. This object encapsulates
a particular provider's classification and offerings. The metadata requirements have yet to be defined.
- types - the classification of the provider (e.g. Mammals). A provider can be classified in N types.
- supportedOperations - the operations supported by the provider.
Accessors and modifiers are provided for each of the above attributes but are omitted for brevity. An
assumption is made that a class of constants (or constant keys) will exist for type data such as supportedOperations. Acceptable or valid values for such type data will be specified in the protocol.
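One possible shape for the Metadata object and its companion class of constant keys is sketched below. The operation names "search" and "inventory" are placeholders invented for illustration; the valid values are to be specified by the protocol, and the class names are likewise hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical constant keys; the real values are to be specified by the protocol. */
final class OperationKeys {
    static final String SEARCH = "search";       // placeholder, not taken from the protocol
    static final String INVENTORY = "inventory"; // placeholder, not taken from the protocol

    private OperationKeys() {
        // constants only, no instances
    }
}

/** Sketch of the Metadata data object: classification types plus supported operations. */
class MetadataSketch {
    private final Set<String> types = new LinkedHashSet<>();               // e.g. "Mammals"
    private final Set<String> supportedOperations = new LinkedHashSet<>(); // keys from OperationKeys

    void addType(String type) {
        types.add(type);
    }

    void addSupportedOperation(String operation) {
        supportedOperations.add(operation);
    }

    Set<String> getTypes() {
        return Collections.unmodifiableSet(types);
    }

    Set<String> getSupportedOperations() {
        return Collections.unmodifiableSet(supportedOperations);
    }
}
```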
- PortalRequestHandler
-
The individual processor of a request. The PortalRequestHandler is the basic workflow component. It takes
such steps as paring down the available providers based on their offerings, marshalling the request into
protocol-compliant XML, submitting requests to the various providers on separate threads, and collecting and
unmarshalling the responses. Not all of this work is done in this class alone; rather, the PortalRequestHandler
acts as a controller for these processes. It will be a runnable object, since the Portal must be able to handle
any number of requests at a given time. Pooling of this object will likely be implemented in order to
manage resources.
- PortalRequestHandlerThread
-
A worker thread controlled by a PortalRequestHandler. An instance of this thread is instantiated per request
to a single database of a provider. It streams the request to the provider and awaits the response. It
is possible some pooling could occur here, or at least, some maximum limit placed on the number of individual
threads spawned at a time.
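The idea of one worker per datasource with an upper bound on concurrent threads could be sketched with a fixed-size thread pool from java.util.concurrent. This is an assumption about how the limit might be enforced, not the design's stated mechanism; the class name, the string-based datasource identifiers, and the send function standing in for the network call are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Sketch: fan a request out to several datasources on a bounded pool of worker threads. */
class FanOutSketch {
    /** Runs one task per datasource and returns the responses in datasource order. */
    static List<String> submitAll(List<String> datasources,
                                  Function<String, String> send,
                                  int maxThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads); // caps concurrent workers
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String datasource : datasources) {
                tasks.add(() -> send.apply(datasource)); // one task per database of a provider
            }
            List<String> responses = new ArrayList<>();
            for (Future<String> future : pool.invokeAll(tasks)) { // blocks until all tasks finish
                responses.add(future.get());
            }
            return responses;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for providers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("provider request failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

A production handler would probably share one long-lived pool rather than create one per request; the per-call pool here only keeps the sketch self-contained.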
- ProviderFilterer
-
An interface specifying the required methods for a provider filter to implement. The interface accounts for
abstracting the filter component such that any number of filters, based on varying schemas and rulesets, may be
implemented.
- Darwin2ProviderFilter
-
An implementation of ProviderFilterer based on the Darwin Core v.2.0 Federation Schema.
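Since the methods of ProviderFilterer are not yet specified, the following is only a guess at what such an interface might look like, with a simple example implementation that filters on supported operations. It is not the Darwin2ProviderFilter; the interface name, the single filter method, and the string-based provider identifiers are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical shape for the filter interface; the real method set is not yet defined. */
interface ProviderFiltererSketch {
    /** Returns the candidates able to serve the requested operation, in their original order. */
    List<String> filter(List<String> candidates, String requestedOperation);
}

/** Example filter: keeps providers whose advertised operations include the requested one. */
class OperationFilterSketch implements ProviderFiltererSketch {
    private final Map<String, Set<String>> operationsByProvider; // provider id -> supported operations

    OperationFilterSketch(Map<String, Set<String>> operationsByProvider) {
        this.operationsByProvider = operationsByProvider;
    }

    @Override
    public List<String> filter(List<String> candidates, String requestedOperation) {
        List<String> kept = new ArrayList<>();
        for (String provider : candidates) {
            Set<String> operations = operationsByProvider.get(provider);
            // Providers with no known metadata are dropped rather than queried blindly.
            if (operations != null && operations.contains(requestedOperation)) {
                kept.add(provider);
            }
        }
        return kept;
    }
}
```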
- RequestMarshaller
-
Simply translates requests in one form into another form. This class could possibly be static.
- marshallRequest() - converts the original request structure to an ArrayList of protocol compliant
requests. Each of these requests corresponds to a query for an individual database on a physical
provider.
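A static marshallRequest() along the lines described could be sketched as below. The XML element and attribute names are invented placeholders, since the protocol's request format is not given here, and the original request is simplified to a query string plus a list of target database names.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a static request marshaller; the XML vocabulary is a placeholder, not the protocol's. */
final class RequestMarshallerSketch {
    private RequestMarshallerSketch() {
        // static utility, no instances
    }

    /** Produces one request document per target database. */
    static ArrayList<String> marshallRequest(String query, List<String> databases) {
        ArrayList<String> requests = new ArrayList<>();
        for (String database : databases) {
            requests.add("<request database=\"" + escape(database) + "\">"
                    + "<query>" + escape(query) + "</query></request>");
        }
        return requests;
    }

    /** Escapes the characters that are unsafe in XML text and double-quoted attribute values. */
    private static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }
}
```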
- ResponseMarshaller
-
Simply translates responses in one form into another form. Largely, ResponseMarshaller may just append
numerous responses together for streaming back to the caller. This class could possibly be static.
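The "append and stream" behaviour mentioned for ResponseMarshaller might be as simple as the sketch below. The wrapping responses element is a hypothetical placeholder, and the provider responses are assumed to arrive as already well-formed XML fragments.

```java
import java.io.PrintWriter;
import java.util.List;

/** Sketch of a static response marshaller that concatenates provider responses onto a stream. */
final class ResponseMarshallerSketch {
    private ResponseMarshallerSketch() {
        // static utility, no instances
    }

    /** Writes every response, in order, inside a single wrapper element (placeholder name). */
    static void appendResponses(List<String> responses, PrintWriter out) {
        out.print("<responses>");
        for (String response : responses) {
            out.print(response); // assumed to be a well-formed fragment already
        }
        out.print("</responses>");
    }
}
```

Writing to a stream rather than building one large string lets the portal start returning data before it has buffered every provider's response.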
Notes
Some additional design considerations:
- a common utility package will be developed containing things like global constants, logger classes, and
configuration classes.