DiGIR
Distributed Generic Information Retrieval
Initial Specification and Implementation
Revision 0.91
As participating members of a TDWG subgroup, our goal is to contribute to the specification of a protocol for retrieving structured data from multiple, heterogeneous databases. In this project, we intend to inform the protocol specification by developing the software that implements it. The purpose of this document is to record the requirements and assumptions, as we currently understand them, for the protocol and its initial implementations.
Key Terminology:
The initial purpose and scope of this project is to support distributed data retrieval across one or more loosely coupled federations of biological collections databases. Many such databases exist (perhaps more than 1,000), and a growing subset (more than 100) have been made publicly available via the Web. Several client-server systems exist that allow a user to query several databases at once, but in each of these systems the protocols, semantics and software are tightly coupled. There is no standard or unified method for performing distributed queries. This project aims to establish an open standard and lay the groundwork for a generic protocol capable of supporting many communities, regardless of discipline or domain (data semantics). Our design goals include:
to use open protocols and standards, such as HTTP, XML, and UDDI, to leverage existing and emerging IT infrastructure;
to decouple the protocol, software and semantics; [Portal and provider software can be developed independently. We expect each portal to cater to different (sub)communities and data integration functions (e.g., collection data with geographic layers). Different implementations of providers and portals may target different operating systems.]
to automate the establishment of a new provider as much as possible; automatable tasks include installation of the provider software, testing, and registration of the provider in a centralized, global registry.
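The decoupled arrangement described in these goals (independent providers, a central registry, and portals that query all registered providers at once) can be illustrated with a minimal in-memory sketch. It is not the DiGIR protocol: the `Provider`, `Registry` and `portal_search` names, the exact-match query, and the record fields are all hypothetical, and a real deployment would exchange XML messages over HTTP rather than call Python methods directly.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical record shape: a flat mapping of field name -> value.
Record = Dict[str, str]


@dataclass
class Provider:
    """Hypothetical provider wrapping one collection database."""
    name: str
    records: List[Record]

    def search(self, field_name: str, value: str) -> List[Record]:
        # Each provider evaluates the query against its own local data only.
        return [r for r in self.records if r.get(field_name) == value]


@dataclass
class Registry:
    """Hypothetical central registry that portals consult to find providers."""
    providers: Dict[str, Provider] = field(default_factory=dict)

    def register(self, provider: Provider) -> None:
        self.providers[provider.name] = provider


def portal_search(registry: Registry, field_name: str, value: str) -> List[Record]:
    """Fan the same query out to every registered provider and merge the results."""
    results: List[Record] = []
    for name, provider in registry.providers.items():
        for record in provider.search(field_name, value):
            # Tag each record with its source so the portal can attribute it.
            results.append({**record, "source": name})
    return results


# Usage: two independently run providers, one portal query.
registry = Registry()
registry.register(Provider("museum_a", [
    {"scientific_name": "Puma concolor", "country": "US"},
]))
registry.register(Provider("herbarium_b", [
    {"scientific_name": "Quercus alba", "country": "US"},
    {"scientific_name": "Puma concolor", "country": "MX"},
]))
hits = portal_search(registry, "scientific_name", "Puma concolor")
for hit in hits:
    print(hit)
```

Because the portal only depends on the registry and the `search` interface, a provider can be added or replaced without changing portal code, which is the kind of decoupling the goals above call for.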