DiGIR Schema Readme
Explanation, Usage, and Tips
$Id: schemaReadme.html,v 1.2 2003/04/25 02:03:15 peejinator Exp $
DiGIR request and response formats are specified via a number of XML Schema
documents. The primary structure of a request or response is specified in
the protocol XML Schema (digir.xsd). For the latest version(s) see:
http://digir.sourceforge.net/schema/protocol
The data elements available for searching and for returning, along with
their types, are specified in a conceptual (or content) schema. A current
conceptual schema, and an excellent example, is the one representing the
Darwin Core V2 dataset (darwin2.xsd). For the latest version(s) see:
http://digir.sourceforge.net/schema/conceptual/darwin
Additionally, predefined record structures for responses can be defined.
Some examples (brief and full) can be found in subdirectories of the above location.
Understanding the DiGIR Schema
The DiGIR schemas make use of a variety of XML Schema features. Let's take
a look at some of the less straightforward techniques here:
- In the schema you will see a number of abstract elements for various kinds of data. For example:
  - searchableData
  - returnableData
  - searchableReturnableData
Because these elements are abstract, one cannot create concrete instances of
them. They are provided to be used as substitutionGroups when content data
elements are defined in a conceptual schema. They act as placeholders in the
protocol schema within the filter structure (particularly within the
comparison operators, or COPs).
Data defined within a conceptual schema can be searchable, returnable, or both. Because XML Schema does not allow more than one substitutionGroup per element, each combination had to be created as its own element. One might ask why we did not create types and a complex inheritance tree to achieve the sort of typing we desired. The answer is twofold: 1) instance documents would not be as clean without the use of substitutionGroups and 2) XML Schema does not support multiple inheritance.
Looking at how substitutionGroups are used here, one can think of them as acting like tag (marker) interfaces in Java. Specifically, Java has an interface called Serializable, which classes implement to indicate that their objects are serializable. Although there is no contract about a supported implementation here, we have, in effect, tagged a data element as searchable, returnable, or both by specifying one of the above as its substitutionGroup. This is an important technique which one might find used elsewhere within the protocol schema.
- Similar to the data elements above, we've defined the comparison operators (COPs) and made each of them part of a substitutionGroup within the larger filter structure (LOP). With the COPs, however, each one does extend a COP (or multiCOP) type. This was important to ensure a common structure for each one (i.e. each contains only one element and the appropriate attribute).
- Also, a few other placeholder elements are defined, to be substituted or used in a request. These include:
  - record: to be used as the container in custom (user-defined) record structures
  - requiredList: to be used as a flag (via substitutionGroup) for the element within a conceptual schema that lists, as a simple sequence, the data elements required for each record within a result set
- For the purposes of the multiCOP, a listType is defined. This is necessary to limit the contents of a multiCOP (such as IN) to N occurrences of any one element.
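The tagging technique from the first point above can be sketched in a small, self-contained schema. This is only an illustration under stated assumptions: the demo namespace (http://example.org/tagdemo), the Remarks element, and the searchSlot element are hypothetical, and the abstract heads are re-declared locally rather than copied from digir.xsd, whose actual declarations may differ.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical demo schema; names and namespace are not taken from digir.xsd. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:demo="http://example.org/tagdemo"
            targetNamespace="http://example.org/tagdemo"
            elementFormDefault="qualified">

  <!-- Abstract heads: they act as tags and can never appear in an instance. -->
  <xsd:element name="searchableData" abstract="true"/>
  <xsd:element name="returnableData" abstract="true"/>
  <xsd:element name="searchableReturnableData" abstract="true"/>

  <!-- Concrete data elements, each tagged through exactly one substitutionGroup. -->
  <xsd:element name="Collector" type="xsd:string"
               substitutionGroup="demo:searchableReturnableData" nillable="true"/>
  <xsd:element name="Remarks" type="xsd:string"
               substitutionGroup="demo:returnableData" nillable="true"/>

  <!-- Hypothetical placeholder slot: only elements tagged as searchable fit here. -->
  <xsd:element name="searchSlot">
    <xsd:complexType>
      <xsd:choice>
        <xsd:element ref="demo:searchableData"/>
        <xsd:element ref="demo:searchableReturnableData"/>
      </xsd:choice>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
```

Against this sketch, a searchSlot containing demo:Collector would validate, while one containing demo:Remarks would be rejected, because Remarks is tagged as returnable only. The two-branch choice is one way to work around the single-substitutionGroup restriction in a demo; it is not a claim about how the protocol schema structures its filter.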
Using the DiGIR Schema to create a Conceptual (content) schema
Given how the protocol schema is defined, all conceptual schemas must follow
certain definition techniques. Here are some tips on creating a conceptual schema for use with the DiGIR protocol.
- All data elements must be defined with the nillable attribute set to true (i.e. nillable="true"). In result data, if a column value is null, the record should be returned with xsi:nil="true" on that element and the element should be left empty. Likewise, if an element is not supported in results, it can only be left empty if it is marked as nil. For example, here is how one would specify a nillable data element:
<xsd:element name="Collector" type="digir:stringData" substitutionGroup="digir:searchableReturnableData" nillable="true"/>
And how the element looks as null in the results:
<darwin:Collector xsi:nil="true"/>
- All data elements must define a substitutionGroup that tags them as searchable, returnable, or both. For example:
<xsd:element name="Collector" type="digir:stringData" substitutionGroup="digir:searchableReturnableData" nillable="true"/>
- Data elements that are meant to be instantiated must not be abstract. An abstract element should not be used as a searchable concept.
- It is highly recommended that each data element carry an annotation describing such things as data types, possible ranges, domains, etc. This will encourage more valid instance documents.
- Required elements (returnable data elements that are required in all result sets) are noted by creating an abstract element with the substitutionGroup of digir:requiredList. The element is abstract, and therefore can never be instantiated, and we flag it as the requiredList via the substitutionGroup. An example is:
<xsd:element name="requiredList" substitutionGroup="digir:requiredList" abstract="true">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="DateLastModified"/>
      <xsd:element ref="InstitutionCode"/>
      <xsd:element ref="CollectionCode"/>
      <xsd:element ref="CatalogNumber"/>
      <xsd:element ref="ScientificName"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
Every conceptual schema should define a requiredList in this way, although the list (sequence) may contain no elements.
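Putting the tips above together, a minimal conceptual schema might look like the following sketch. The target namespace (http://example.org/schema/conceptual/demo/2003/1.0), the demo prefix, and the SpecimenLabel element are hypothetical names chosen for illustration. The digir namespace, the digir.xsd location, the digir:stringData type, and the substitutionGroup names are taken from this document. The sketch has not been validated against the actual protocol schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical conceptual schema skeleton; not an official DiGIR schema. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:digir="http://digir.net/schema/protocol/2003/1.0"
            xmlns:demo="http://example.org/schema/conceptual/demo/2003/1.0"
            targetNamespace="http://example.org/schema/conceptual/demo/2003/1.0"
            elementFormDefault="qualified">

  <!-- Pull in the protocol schema so the digir: types and heads resolve. -->
  <xsd:import namespace="http://digir.net/schema/protocol/2003/1.0"
              schemaLocation="http://digir.sourceforge.net/schema/protocol/2003/1.0/digir.xsd"/>

  <!-- A data element: nillable, tagged as searchable and returnable, annotated. -->
  <xsd:element name="SpecimenLabel" type="digir:stringData"
               substitutionGroup="digir:searchableReturnableData" nillable="true">
    <xsd:annotation>
      <xsd:documentation>
        Hypothetical example: free-text label transcribed from the specimen.
      </xsd:documentation>
    </xsd:annotation>
  </xsd:element>

  <!-- The required list: abstract, flagged via substitutionGroup. -->
  <xsd:element name="requiredList" substitutionGroup="digir:requiredList" abstract="true">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="demo:SpecimenLabel"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
```

The element references here carry the demo: prefix because the sketch declares no default namespace. Unprefixed references work only when the default namespace of the schema document is its target namespace.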
DiGIR Namespaces
The DiGIR namespace naming convention follows that of the W3C. The protocol schema namespaces are assigned as follows:
http://digir.net/schema/protocol/year/version
For example:
http://digir.net/schema/protocol/2003/1.0
To be consistent, many authors of conceptual schemas have followed the same
convention. It is important to note that although the namespaces look like
URLs, they do not necessarily indicate the physical location of the schema
file, nor are they required to. At the time of writing this document, the
protocol schema is stored on SourceForge and the schema location to be
used in referring documents is:
http://digir.sourceforge.net/schema/protocol/2003/1.0/digir.xsd
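The difference between the namespace and the physical location shows up in the xsi:schemaLocation attribute of a referring instance document, which pairs the two. The fragment below is a sketch only: the root element name (request) is an assumption made for illustration, and the content is elided. The namespace and location values are the ones given above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the root element name is assumed, and its content is omitted. -->
<digir:request
    xmlns:digir="http://digir.net/schema/protocol/2003/1.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://digir.net/schema/protocol/2003/1.0
                        http://digir.sourceforge.net/schema/protocol/2003/1.0/digir.xsd">
  <!-- First value in the pair: the namespace name (an identifier only). -->
  <!-- Second value in the pair: where a validator can fetch the schema file. -->
</digir:request>
```

If the schema file were moved to another server, only the second value of the pair would change. The namespace, and therefore every prefixed element name in the document, would stay the same.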
Understanding DiGIR Information Domains
Information Domains are an addition to the DiGIR protocol. An
Information Domain is intended to link together related schemas for the
purposes of application configuration. For example, the Darwin Core version
2 folks also specify two predefined result set schemas as well as some other
default information. In order to associate these various schemas and values
universally, an Information Domain schema was created as an addition to the
protocol schema. Information Domains are commonly referred to simply as
infodos. The schema for an infodo is currently available here:
http://digir.sourceforge.net/schema/protocol/2003/1.0/infodo.xsd
As an example, please see the Darwin Core 2 infodo instance available here:
http://digir.sourceforge.net/schema/conceptual/darwin/2003/1.0/darwin2Infodo.xml
Schema Versioning Process
When dealing with any evolving specification, it is important to version it appropriately. Given the distributed nature of this project, a clear process should be followed. The suggested process is:
- Use the DiGIR CVS repository on the SourceForge site. The schema documents, for both the protocol and conceptual schemas, as well as some examples, are in the xml module in CVS. All edits to a schema should be made in a checked-out copy of the document, and commits should be made regularly. Keeping a document checked out with uncommitted changes for an extended period is not recommended, as that can introduce conflicts in a multi-developer environment.
- Once you are satisfied with your changes and they have been tested,
you are ready to "release" the schema. After the schema(s) are committed to
CVS, you should tag (via cvs tag) the schema, as well as any dependent
schemas, with the release name (e.g. beta1, r1_0, etc.). All dependent
schemas should use the same tag. Because of cvs restrictions (e.g. tags
must start with a letter and cannot contain a "."), 1.0 will be tagged as
r1_0.
- After tagging the files, create a directory with the name of the
release version (e.g. beta1, 1.0, etc.) in the htdocs (webspace) area of the
DiGIR SourceForge site under the appropriate directory. The directory
structure also contains a year and follows the namespace paradigm. For
example, schema/protocol/2003/1.0 and schema/conceptual/darwin/2003/1.0.
- Then scp (secure copy) the same versions of the schema(s) to the new directory or directories. Snapshots of all released versions will reside here. This location acts as the schemaLocation and should be used as such in other schema or instance documents. You should not remove previous versions of the schemas, as other people may still be referencing them.