This is [not yet] a W3C NOTE for review by W3C members and other interested parties. It is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to use W3C NOTEs as reference material or to cite them as other than "work in progress". A list of current W3C notes can be found at: http://www.w3.org/pub/WWW/TR
Note: since notes are subject to frequent change, you are advised to reference the above address, rather than the addresses of notes themselves.
This document is a technical specification of Resource Description Messages (RDM), an Internet Resource Discovery mechanism. It contains a basic overview of the RDM model, describes the syntax and protocol, and provides some example usage. It also contains a technical description of Harvest's Summary Object Interchange Format (SOIF).
Resource Description Messages (RDM) is a mechanism to discover and retrieve metadata about network-accessible resources, known as Resource Descriptions (RD). A Resource Description consists of a list of attribute-value pairs (e.g., Author = Darren Hardy, Title = RDM) and is associated with a resource via a URL. Agents can generate RD's automatically (e.g., a WWW robot), or people can write RD's manually (e.g., a librarian or author). Once a repository of Resource Descriptions is assembled, the server can export it via RDM, giving WWW agents a programmatic way to discover and retrieve the RD's.
RDM is a messaging format that two processes can use to exchange resource descriptions across a network. In RDM, one process (a client or agent) sends a request RDM message to another process (a server), which processes the request and then sends back a response RDM message, similar to the HTTP/1.0 request/response model. The most common use of RDM is for the requester to select a set of RD's to transfer in bulk based on some scoping criteria, for example, a request to send all RD's that have changed since the previous week. In addition, RDM supports the notion of a catalog to which the scoping criteria apply; this enables a single RDM server to provide access to many different catalogs.
RDM allows access to a schema definition to which the RD's conform, a taxonomy description by which the RD's are organized, and a server description used for feature negotiation (e.g., which scoping criteria syntax is accepted).
RDM uses Harvest's SOIF format to encode the RD's, and the Harvest Gatherer command language to specify incremental retrievals. The RDM format is implemented as an HTTP/1.0 Content-type called application/x-rdm. In practice, RDM uses HTTP/1.0 as its bulk transport layer.
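To summarize, RDM supports access to resource descriptions, schema definitions, taxonomy descriptions, server descriptions, and server status.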
Harvest's Summary Object Interchange Format (SOIF) is a syntax for transmitting resource descriptions and other kinds of structured objects. Each RD is represented in SOIF as a list of attribute-value pairs (e.g., Company = 'Netscape'). SOIF handles both textual and binary data as values and, with some minor extensions, multivalued attributes. Finally, SOIF is a streaming format, which allows bulk transfer of many RD's in a single, efficient stream.
RDM supports the following scenarios:
Scenario | Description |
---|---|
RD Retrieval | requester retrieves one or more RDs from the server |
RD Submission | requester sends one or more RDs and an optional Schema definition to the server for update or deletion |
Server Description Retrieval | requester retrieves one Server Description from the server |
Schema Description Retrieval | requester retrieves one Schema definition from the server |
Taxonomy Description Retrieval | requester retrieves one Taxonomy definition from the server |
Status Retrieval | requester retrieves the current server status from the server |
Note that RD Submission is useful for sending unsolicited RD's to a server (e.g., push/advertise model).
Each RDM message contains a header and a body. The header identifies the nature of the RDM message and the catalog to which it applies, and the body contains the data required to carry out the request (e.g., scoping criteria). Both the header and the body of an RDM message are encoded using SOIF, as described below.
The RDM header section begins with a SOIF object of type RDMHEADER, which must contain at least the following attributes:
RDM-Type value | Meaning |
---|---|
RD-Request | Request to send RD's based on the given query |
RD-Request-Deleted | Request to send RD's that have been deleted, based on the given query |
RD-Response | Contains RD's |
RD-Response-Deleted | Contains RD's which should be deleted |
Schema-Description-Request | Request to send the schema definition |
Schema-Description-Response | Contains the schema definition |
Server-Description-Request | Request to send the server description |
Server-Description-Response | Contains the server description |
Taxonomy-Description-Request | Request to send the taxonomy description |
Taxonomy-Description-Response | Contains the taxonomy description |
Status-Request | Request to send a status message; used to test existence |
Status-Response | Contains general status information about the result of the request |
RDM-Query-Language: (required only for *Request* messages)
A string which identifies which query language is used in the given request (e.g., gatherer). How the scoping criteria in the RDM body is defined depends on the RDM-Query-Language used.
The following RDM header attributes are optional:
Some example RDM headers include:
    @RDMHEADER { -
    RDM-Version{3}: 1.0
    RDM-Type{14}: status-request
    }

    @RDMHEADER { -
    RDM-Version{3}: 1.0
    RDM-Type{10}: rd-request
    RDM-Query-Language{8}: gatherer
    Catalog-Service-ID{40}: x-catalog://www.netscape.com:80/techpubs
    }

    @RDMHEADER { -
    RDM-Version{3}: 1.0
    RDM-Type{11}: rd-response
    RDM-Error-Message{9}: 0 results
    }
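Because each {VALUE-SIZE} must match its value exactly, it is safer to compute the counts than to write them by hand. The following is a minimal Python sketch, not part of RDM itself: soif_object and rdm_header are hypothetical helpers, sizes are assumed to be byte counts of UTF-8 encoded values, and the delimiter is the colon-TAB pair from the SOIF grammar.

```python
def soif_object(template_type: str, attributes: list, url: str = "-") -> str:
    """Serialize one SOIF object from (name, value) pairs.

    Each attribute is written as Name{size}:<TAB>value, where size is the
    length of the value in bytes (values assumed to be UTF-8 encoded).
    """
    lines = [f"@{template_type} {{ {url}"]
    for name, value in attributes:
        size = len(value.encode("utf-8"))
        lines.append(f"{name}{{{size}}}:\t{value}")
    lines.append("}")
    return "\n".join(lines) + "\n"


def rdm_header(rdm_type: str, query_language=None, catalog_service_id=None) -> str:
    """Build an RDMHEADER object; optional attributes are omitted when None."""
    attributes = [("RDM-Version", "1.0"), ("RDM-Type", rdm_type)]
    if query_language is not None:
        attributes.append(("RDM-Query-Language", query_language))
    if catalog_service_id is not None:
        attributes.append(("Catalog-Service-ID", catalog_service_id))
    return soif_object("RDMHEADER", attributes)


if __name__ == "__main__":
    print(rdm_header("rd-request", "gatherer",
                     "x-catalog://www.netscape.com:80/techpubs"), end="")
```

A complete request is then the header followed by its body object, for example rdm_header("rd-request", "gatherer") + soif_object("RDMQUERY", [("Scope", "all")]).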
The content of the RDM message body depends on the values from the RDM-Type and the RDM-Query-Language attributes in the RDM header. The Reply RDM Type is the type of RDM message that the RDM server would return in response to the given RDM request.
RDM-Type | RDM-Query-Language / RDM-Response-Interpret | Body Content | Reply RDM Type |
---|---|---|---|
RD-Request, RD-Request-Deleted | Gatherer | Gatherer Scope/View | RD-Response |
RD-Response, RD-Response-Deleted | none | Schema Description (optional), SOIF stream | Status-Response |
Server-Description-Request | none | empty | Server-Description-Response |
Server-Description-Response | none | Server Description | Status-Response |
Schema-Description-Request | Schema-Basic | Schema-Basic Query | Schema-Description-Response |
Schema-Description-Response | none | Schema Description | Status-Response |
Taxonomy-Description-Request | Taxonomy-Basic | Taxonomy-Basic Query | Taxonomy-Description-Response |
Taxonomy-Description-Response | none | Taxonomy Description | Status-Response |
Status-Request | none | empty | Status-Response |
Status-Response | none | Status message | none |
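A server or client can encode the Reply RDM Type column as a lookup from message type to expected reply. The sketch below is a direct Python transcription of the table; lowercasing the type names (the examples use lowercase values such as rd-request) and raising an error for unknown types are assumptions, and expected_reply is a hypothetical name.

```python
# Reply RDM-Type for each RDM-Type, transcribed from the table.
# None means the message gets no RDM reply.
REPLY_TYPE = {
    "rd-request": "rd-response",
    "rd-request-deleted": "rd-response",
    "rd-response": "status-response",
    "rd-response-deleted": "status-response",
    "server-description-request": "server-description-response",
    "server-description-response": "status-response",
    "schema-description-request": "schema-description-response",
    "schema-description-response": "status-response",
    "taxonomy-description-request": "taxonomy-description-response",
    "taxonomy-description-response": "status-response",
    "status-request": "status-response",
    "status-response": None,
}


def expected_reply(rdm_type: str):
    """Return the reply RDM-Type; matching is case-insensitive (an assumption)."""
    key = rdm_type.strip().lower()
    if key not in REPLY_TYPE:
        raise ValueError(f"unknown RDM-Type: {rdm_type!r}")
    return REPLY_TYPE[key]
```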
Catalog Service IDs (CSID) identify the specific Catalog Server by name, host, and port number. CSIDs are encoded in URL syntax using the access method x-catalog, as follows:

    x-catalog://host:port/name

(e.g., x-catalog://www.netscape.com:80/techpubs). If no CSID is present, the RDM server will use its default catalog service.
The CSID has a dual purpose: (1) to identify the host and port of the RDM server, and (2) to identify the catalog to which the request applies.
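Because a CSID is ordinary URL syntax, a standard URL parser can split it into its two roles. A sketch using Python's urllib.parse.urlsplit; parse_csid is a hypothetical helper, and defaulting to port 80 when the port is omitted is an assumption (it matches the HTTP transport described later), not something this note specifies.

```python
from urllib.parse import urlsplit


def parse_csid(csid: str, default_port: int = 80):
    """Split an x-catalog://host:port/name CSID into (host, port, name)."""
    parts = urlsplit(csid)
    if parts.scheme != "x-catalog" or not parts.hostname:
        raise ValueError(f"not a Catalog Service ID: {csid!r}")
    # The port may be omitted in this sketch; 80 is an assumed default.
    port = parts.port if parts.port is not None else default_port
    # The path (minus the leading slash) names the catalog on that server.
    name = parts.path.lstrip("/")
    return parts.hostname, port, name
```

The host and port locate the RDM server to contact, and the name selects the catalog within it.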
A SOIF stream contains one or more SOIF objects each of which contains the structured content of a resource description. SOIF is a simple machine-readable syntax as defined here.
The SOIF grammar is as follows (also see the Harvest User's Manual):
    SOIF           ::= OBJECT SOIF | OBJECT
    OBJECT         ::= @ TEMPLATE-TYPE { URL ATTRIBUTE-LIST }
    ATTRIBUTE-LIST ::= ATTRIBUTE ATTRIBUTE-LIST | ATTRIBUTE
    ATTRIBUTE      ::= IDENTIFIER {VALUE-SIZE} DELIMITER VALUE
    URL            ::= RFC1738-URL-Syntax | "-"
    TEMPLATE-TYPE  ::= Alpha-Numeric-String
    IDENTIFIER     ::= Alpha-Numeric-String
    VALUE          ::= Arbitrary-Data
    VALUE-SIZE     ::= Number
    DELIMITER      ::= ":<TAB>" | "\072\011"
An example SOIF object:
    @DOCUMENT { http://www.netscape.com:80/
    Title{20}: Welcome to Netscape!
    Last-Modified{29}: Thu, 16 May 1996 11:45:39 GMT
    }
In RDM, we allow SOIF objects with zero or more attributes. Specifically, we extend the syntax to include:
ATTRIBUTE-LIST ::= ATTRIBUTE ATTRIBUTE-LIST | ATTRIBUTE | NULL
The ATTRIBUTE IDENTIFIER in a SOIF stream is defined to be one of the SOIF-Attribute values in the schema description.
SOIF does not explicitly allow for a single attribute name to have multiple values. So, to handle multiple values for the same attribute name, RDM uses the attribute naming convention of appending a hyphen and number to identify the value. For example, the attribute names title-1, title-2, and title-3 all refer to the 3 values associated with the attribute title.
This convention is also useful for embedding tuples within a single SOIF object. For example, title-1, author-1, title-2, author-2, title-3, author-3, ... would represent 3 records each with 2 attributes (title and author).
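The naming convention can be applied mechanically in both directions. A minimal Python sketch with hypothetical helper names; it assumes indexes start at 1, as in the examples, and matches attribute names case-insensitively, which this note does not state explicitly.

```python
import re


def flatten_multivalued(name: str, values: list) -> dict:
    """Encode several values of one attribute as name-1, name-2, ..."""
    return {f"{name}-{i}": value for i, value in enumerate(values, start=1)}


def collect_records(attributes: dict, fields: list) -> list:
    """Regroup name-N attributes into one record per index N, in index order.

    Matching ignores case (an assumption); records use the field names as given.
    """
    records = {}
    for field in fields:
        pattern = re.compile(re.escape(field) + r"-(\d+)", re.IGNORECASE)
        for key, value in attributes.items():
            match = pattern.fullmatch(key)
            if match:
                records.setdefault(int(match.group(1)), {})[field] = value
    return [records[index] for index in sorted(records)]
```

Calling collect_records with a single field recovers the plain list of values for a multivalued attribute; with several fields it recovers the embedded tuples.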
If a RDM request contains a scope specification, then it may also contain a view specification. The server will filter the result set of resource descriptions through the view before returning them to the requester.
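Each view has three possible attributes: View-Attributes (a comma-separated list of the attributes to return for each RD), View-Hits (the maximum number of RDs in the result set), and View-Order (a comma-separated list of attributes by which to order the result set, each optionally prefixed with + for ascending or - for descending order).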
The view is encoded using SOIF and the SOIF type RDMQUERY.
To send only a specific set of attributes for each RD rather than the full RD, use View-Attributes. In this example, we're only interested in the URL, Title, Author, and Last-Modified attributes; the result set contains at most 10 hits and is ordered by the Title attribute:
    @RDMQUERY { -
    View-Attributes{x}: URL,Title,Author,Last-Modified
    View-Hits{x}: 10
    View-Order{x}: Title
    }
For IR-based searches, you want to sort by the score (or relevance ranking):
    @RDMQUERY { -
    View-Attributes{x}: URL,Title,Author,Last-Modified,Score
    View-Hits{x}: 10
    View-Order{x}: -Score,+Title
    }
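A server might apply a view to its result set roughly as follows. This Python sketch illustrates one reading, not a normative algorithm: RDs are dicts of strings that include a URL key, a + or - prefix on a View-Order entry means ascending or descending, values that parse as numbers (such as Score) compare numerically, and parse_view and apply_view are hypothetical names.

```python
def _sort_key(value):
    """Compare numerically when possible (e.g. Score), otherwise as text."""
    try:
        return (0, float(value))
    except ValueError:
        return (1, value)


def parse_view(query: dict) -> dict:
    """Read View-Attributes, View-Hits and View-Order from an RDMQUERY object."""
    def split(text):
        return [part.strip() for part in text.split(",") if part.strip()]
    return {
        "attributes": split(query["View-Attributes"]) if "View-Attributes" in query else None,
        "hits": int(query["View-Hits"]) if "View-Hits" in query else None,
        "order": split(query["View-Order"]) if "View-Order" in query else None,
    }


def apply_view(rds: list, attributes=None, hits=None, order=None) -> list:
    """Order, truncate, and project a result set of RDs."""
    result = list(rds)
    # Sort by the last key first; Python's sort is stable, so earlier keys win.
    for key in reversed(order or []):
        name = key.lstrip("+-")
        result.sort(key=lambda rd: _sort_key(rd.get(name, "")),
                    reverse=key.startswith("-"))
    if hits is not None:
        result = result[:hits]  # keep the top hits after ordering
    if attributes is not None:
        result = [{a: rd[a] for a in attributes if a in rd} for rd in result]
    return result
```

Ordering before truncating matters: View-Hits should select the best-ranked RDs, not an arbitrary subset.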
This is a very basic query language based on the Harvest Gatherer command language. It allows two queries: send every RD, and send any RD that has been modified since a given date (based on the RD-Last-Modified attribute). It also uses the standard RDM view specification. The query is encoded using SOIF and the SOIF type RDMQUERY, which may also contain a view specification.
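To send every RD:

    @RDMQUERY { -
    Scope{3}: all
    }

To send every RD modified since a given date:

    @RDMQUERY { -
    Scope{35}: since Sun, 06 Nov 1994 08:49:37 GMT
    }

To send every RD, but only its URL, Title, and Last-Modified attributes:

    @RDMQUERY { -
    Scope{x}: all
    View-Attributes{x}: URL,Title,Last-Modified
    }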
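A server-side sketch of evaluating the two gatherer scopes in Python. The function name is hypothetical; treating "since" as strictly later than the given date, and skipping RDs without an RD-Last-Modified attribute, are assumptions. Dates are parsed with the standard library's email.utils.parsedate_to_datetime, which accepts the HTTP/1.0 date format used here.

```python
from email.utils import parsedate_to_datetime


def gatherer_select(rds: list, scope: str) -> list:
    """Evaluate a gatherer Scope ("all" or "since <HTTP date>") over RDs."""
    scope = scope.strip()
    if scope == "all":
        return list(rds)
    if scope.startswith("since "):
        cutoff = parsedate_to_datetime(scope[len("since "):])
        # "Modified since" is read as strictly after the cutoff (an assumption);
        # RDs with no RD-Last-Modified attribute are skipped.
        return [rd for rd in rds
                if "RD-Last-Modified" in rd
                and parsedate_to_datetime(rd["RD-Last-Modified"]) > cutoff]
    raise ValueError(f"unsupported gatherer scope: {scope!r}")
```

Any view specification in the same RDMQUERY object would then be applied to the selected RDs.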
Server Descriptions provide system-level access information to the client (such as supported RDM query languages), help distributed query routing clients better select promising catalog services to search, and provide human-readable descriptions of a catalog service. A server description is written as a single SOIF object with the object type of RDMSERVER.
A server description must contain the required attributes below and may contain the optional ones:
@RDMSERVER attribute | Description | Required? |
---|---|---|
Supported-RDM-Type | a comma-separated list of the supported RDM types | Yes |
Supported-RDM-Query-Language | a comma-separated list of the supported RDM query languages | Yes |
SD-Last-Modified | date when the Server Description was last modified | Yes |
SD-Expires | date when the Server Description will expire | Yes |
Description | human-readable description of the content | Optional |
Supported-Catalog-Service-ID | a comma-separated list of the supported catalog-service-id's | Optional |
Maintainer | RFC822 email address for the maintainer of the service | Optional |
Sample-RD-* | sample RD's from the server | Optional |
    @RDMSERVER { x-catalog://powell.mcom.com:80/wwwdvl
    Description{x}: Contains information intended for World Wide Web developers
    Supported-RDM-Type{x}: rd-request,taxonomy-description-request,server-description-request,schema-description-request,status-request
    Supported-Catalog-Service-ID{x}: x-catalog://powell.mcom.com:80/wwwdvl
    Supported-RDM-Query-Language{x}: gatherer,sample-keyword-ql
    RD-Sample-1{x}: @DOCUMENT ...
    RD-Sample-2{x}: @DOCUMENT ...
    RD-Sample-3{x}: @DOCUMENT ...
    SD-Last-Modified{x}: Thu, 16 May 1996 11:29:00 GMT
    SD-Expires{x}: Wed, 01 Jan 1997 00:00:00 GMT
    }
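A client could use the required attributes for feature negotiation before sending any request. A minimal Python sketch, assuming the server description has already been parsed into a dict of attribute values; the helper name, the lowercasing of list items, and treating a description as expired once SD-Expires has passed are assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

REQUIRED_SD_ATTRIBUTES = ("Supported-RDM-Type", "Supported-RDM-Query-Language",
                          "SD-Last-Modified", "SD-Expires")


def _split(text: str) -> set:
    """Turn a comma-separated list into a set of lowercase items."""
    return {item.strip().lower() for item in text.split(",") if item.strip()}


def check_server_description(sd: dict, now=None) -> dict:
    """Validate an RDMSERVER object and summarize what the server supports."""
    missing = [name for name in REQUIRED_SD_ATTRIBUTES if name not in sd]
    if missing:
        raise ValueError(f"server description lacks: {', '.join(missing)}")
    if now is None:
        now = datetime.now(timezone.utc)
    return {
        "expired": parsedate_to_datetime(sd["SD-Expires"]) <= now,
        "rdm_types": _split(sd["Supported-RDM-Type"]),
        "query_languages": _split(sd["Supported-RDM-Query-Language"]),
    }
```

For the example above, a client checking before 1 January 1997 would find the description unexpired, with gatherer among the supported query languages; after that date it would re-fetch the description.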
The data model that SOIF provides is a flat name space for the attributes, and treats all values as blobs. The RDM schema definition language extends this data model by providing:
The schema definition language consists of the following information:
Attribute | Description | Default | Required? |
---|---|---|---|
Schema-Definition-Language-Version | Version of the schema definition language used. | 1.0 | Yes |
Schema-Name | Name of the schema. Used in the SOIF TEMPLATE-TYPE field. | None | Yes |
URL | URL that can be used to retrieve the schema definition. Located in the URL field of the SOIF object. | None | Yes |
Last-Modified | Date when the schema definition was last modified, formatted as specified for HTTP/1.0 dates. | None | Yes |
Number-of-Entries | The number of entries in this schema description. | None | Optional |
Maintainer | Email address of those responsible for maintaining the schema. | None | Optional |
SOIF-Attribute[-#] | Attribute name used in the SOIF stream. SOIF-Attribute should be used as a lookup key into the schema description to access the description for that value. The SOIF-Attribute names must be unique within a single schema description. | None | Yes |
Description[-#] | Textual description of the column. | None | Optional |
Data-Type[-#] | Data type for the column. Follows standard RDBMS data types such as varchar, date, blob, int, serial, etc. | varchar | Yes |
Content-Type[-#] | Content type of the column, used to determine the format of the data, as in HTTP/1.0 Content-types such as text/plain or image/jpeg. | text/plain | Yes |
Enforce-Uniqueness[-#] | Boolean flag indicating whether the values for this column should be enforced as unique (0 = not unique, 1 = unique). | Not unique | Optional |
Index-Attribute[-#] | Integer indicating whether the column should be indexed, and in which priority order (0 = not present, 1 = first, 2 = second, etc.). | Not indexed | Optional |
Is-Internal[-#] | Boolean flag indicating whether the column should be surfaced to the user or is for internal use only (0 = surface to user, 1 = for internal use only). Items like foreign key columns fall into this category. | Surface to user | Optional |
Default-View-Order[-#] | Integer indicating whether the column is part of the default view's order, and in which order (0 = not present, 1 = first, 2 = second, etc.). | Not present | Optional |
Default-View-Attribute[-#] | Integer indicating whether the column is part of the default view's attributes, and in which order (0 = not present, 1 = first, 2 = second, etc.). | Not present | Optional |
Table-Name[-#] | User-level table name. | None | Optional |
Column-Name[-#] | User-level column name. | None | Optional |
System-Table-Name[-#] | Internal name used to refer to the table, which allows duplicate user-level names. | None | Optional |
System-Column-Name[-#] | Internal name used to refer to the column, which allows duplicate user-level names. | None | Optional |
Foreign-Key-System-Table-Name[-#] | Name of the related foreign key table, if any. Because the data model is assumed to be a single-rooted branching hierarchy, this always refers to a single parent. | None | Optional |
Foreign-Key-System-Column-Name[-#] | Name of the related foreign key column, if any. Because the data model is assumed to be a single-rooted branching hierarchy, this always refers to a single parent. | None | Optional |
In-Root-Table[-#] | Boolean flag indicating whether the table is the root table (0 = not in root table, 1 = in root table). | Not in root table | Optional |
Below is a very simple schema description:
    @SCHEMA { -
    Schema-Definition-Language-Version{x}: 1.0
    Last-Modified{x}: Thu, 16 May 1996 00:00:00 GMT
    Number-of-Entries{x}: 3
    Maintainer{x}: dhardy@netscape.com
    SOIF-Attribute-1{x}: Title
    Description-1{x}: Contains the Title of the resource.
    SOIF-Attribute-2{x}: Author
    Index-Attribute-2{x}: 1
    Description-2{x}: Full name of the resource authors.
    SOIF-Attribute-3{x}: Abstract
    Data-Type-3{x}: blob
    Description-3{x}: Brief description of the resource.
    }
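The [-#] suffixes can be regrouped into one record per schema entry, keyed by SOIF-Attribute as the table suggests. This Python sketch assumes the SCHEMA object has already been parsed into a dict, handles only the numbered form of the per-entry attributes, and fills in only the two defaults the table gives as concrete values (varchar and text/plain); schema_entries is a hypothetical name.

```python
import re

# Concrete defaults from the schema definition language table.
ATTRIBUTE_DEFAULTS = {"Data-Type": "varchar", "Content-Type": "text/plain"}


def schema_entries(schema: dict) -> dict:
    """Group Name-N attributes of a SCHEMA object by N, keyed by SOIF-Attribute."""
    by_index = {}
    for key, value in schema.items():
        match = re.fullmatch(r"(.+)-(\d+)", key)
        if match:  # schema-level attributes such as Maintainer are skipped
            by_index.setdefault(int(match.group(2)), {})[match.group(1)] = value
    entries = {}
    for index in sorted(by_index):
        fields = {**ATTRIBUTE_DEFAULTS, **by_index[index]}
        name = fields.pop("SOIF-Attribute", None)
        if name is None:
            raise ValueError(f"entry {index} has no SOIF-Attribute")
        if name in entries:
            raise ValueError(f"duplicate SOIF-Attribute: {name!r}")
        entries[name] = fields
    return entries
```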
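Below is a very simple query language intended to return the pieces of the schema that have a particular attribute defined. The Scope has the form "defined Attribute", where Attribute is one of the schema definition language attributes (e.g., SOIF-Attribute or Default-View-Attribute).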
To send the entire schema:
    @RDMHEADER { -
    RDM-Version{x}: 1.0
    RDM-Type{x}: schema-description-request
    RDM-Query-Language{x}: schema-basic
    }
    @RDMQUERY { -
    Scope{x}: defined SOIF-Attribute
    }
To send only those pieces in the default view:
    @RDMHEADER { -
    RDM-Version{x}: 1.0
    RDM-Type{x}: schema-description-request
    RDM-Query-Language{x}: schema-basic
    }
    @RDMQUERY { -
    Scope{x}: defined Default-View-Attribute
    }
To send only those pieces in the default view, and impose a view:
    @RDMHEADER { -
    RDM-Version{x}: 1.0
    RDM-Type{x}: schema-description-request
    RDM-Query-Language{x}: schema-basic
    }
    @RDMQUERY { -
    Scope{x}: defined Default-View-Attribute
    View-Attributes{x}: soif-attribute,default-view-attribute
    }
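A possible evaluation of the schema-basic Scope, taking entries keyed by SOIF-Attribute name (each entry a dict of its other schema fields). Reading "defined" as "the attribute is present in the entry", and matching names case-insensitively (the last example uses lowercase soif-attribute), are assumptions in this Python sketch; the function name is hypothetical.

```python
def schema_basic_select(entries: dict, scope: str) -> dict:
    """Evaluate a schema-basic Scope of the form "defined Attribute".

    entries maps each SOIF-Attribute name to its other schema fields.
    """
    verb, _, attribute = scope.strip().partition(" ")
    attribute = attribute.strip().lower()
    if verb.lower() != "defined" or not attribute:
        raise ValueError(f"unsupported schema-basic scope: {scope!r}")
    if attribute == "soif-attribute":
        return dict(entries)  # every entry defines SOIF-Attribute
    return {name: fields for name, fields in entries.items()
            if any(key.lower() == attribute for key in fields)}
```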
A Taxonomy description defines a hierarchical taxonomic structure in which documents can be organized. The Taxonomy description is represented as a stream of SOIF objects that consists of one TAXONOMY object and one or more CLASSIFICATION objects. The @TAXONOMY object defines the identifier of the taxonomy, as well as a human-readable description of the taxonomy. The @CLASSIFICATION object defines a single classification or category within the taxonomy. Below is an example taxonomy with the following layout:

    ROOT
      Education/Training
        Internal Training Classes
          Course Descriptions
          Schedule
    @TAXONOMY { -
    Id{x}: Netscape Sample 1
    Description{x}: Sample Taxonomy which captures a basic Business organizational structure.
    }
    @CLASSIFICATION { -
    Id{x}: Education/Training
    Parent-Id{x}: ROOT
    Taxonomy-Id{x}: Netscape Sample 1
    }
    @CLASSIFICATION { -
    Id{x}: Education/Training:Internal Training Classes
    Parent-Id{x}: Education/Training
    Taxonomy-Id{x}: Netscape Sample 1
    }
    @CLASSIFICATION { -
    Id{x}: Education/Training:Internal Training Classes:Course Descriptions
    Parent-Id{x}: Education/Training:Internal Training Classes
    Taxonomy-Id{x}: Netscape Sample 1
    }
    @CLASSIFICATION { -
    Id{x}: Education/Training:Internal Training Classes:Schedule
    Parent-Id{x}: Education/Training:Internal Training Classes
    Taxonomy-Id{x}: Netscape Sample 1
    }
A taxonomy description contains the following attributes:
@TAXONOMY attribute | Description | Required? |
---|---|---|
Id | a string identifying the taxonomy | Yes |
Description | human-readable one-line description of the content | Optional |
A classification description contains the following attributes:
@CLASSIFICATION attribute | Description | Required? |
---|---|---|
Id | a string identifying the classification | Yes |
Parent-Id | a string identifying the parent classification | Yes |
Taxonomy-Id | a string identifying the taxonomy to which the classification belongs | Yes |
Description | human-readable one-line description of the content | Optional |
The following query language allows the client to retrieve only the pieces of a Taxonomy that are a given distance away from the given Classification. For example, return only the classifications that are immediately under the root.
The query language has only two commands, descendant and children (plus their aliases); its syntax is as follows:

    Scope = descendant Classification
          | descendant/N Classification
          | children Classification
          | anklebiter Classification
    where Classification = ROOT | Classification-Id
To send the entire taxonomy (node and everything below):
    @RDMHEADER { -
    RDM-Version{x}: 1.0
    RDM-Type{x}: taxonomy-description-request
    RDM-Query-Language{x}: taxonomy-basic
    }
    @RDMQUERY { -
    Scope{x}: descendant ROOT
    }
To send only the children (i.e., the node and the level directly below it):
    @RDMHEADER { -
    RDM-Version{x}: 1.0
    RDM-Type{x}: taxonomy-description-request
    RDM-Query-Language{x}: taxonomy-basic
    }
    @RDMQUERY { -
    Scope{x}: anklebiter ROOT
    }

or

    @RDMHEADER { -
    RDM-Version{x}: 1.0
    RDM-Type{x}: taxonomy-description-request
    RDM-Query-Language{x}: taxonomy-basic
    }
    @RDMQUERY { -
    Scope{x}: anklebiter Education/Training:Internal Training Classes
    }
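The two commands amount to a bounded breadth-first walk from the named classification over Parent-Id links. A Python sketch under these assumptions: classifications are dicts with Id and Parent-Id, anklebiter is an alias of children (as the examples above suggest), N in descendant/N counts levels below the node, and the node itself is included in the result while ROOT, which is not a CLASSIFICATION object, is not. taxonomy_select is a hypothetical name.

```python
def taxonomy_select(classifications: list, scope: str) -> list:
    """Evaluate a taxonomy-basic Scope over CLASSIFICATION objects (dicts).

    Returns the named classification (if it is an object) followed by its
    descendants in breadth-first order, down to the requested depth.
    """
    command, _, target = scope.strip().partition(" ")
    target = target.strip()
    if command in ("children", "anklebiter"):
        depth = 1
    elif command == "descendant":
        depth = None  # unlimited
    elif command.startswith("descendant/"):
        depth = int(command.split("/", 1)[1])  # N read as a level count (assumption)
    else:
        raise ValueError(f"unsupported taxonomy-basic scope: {scope!r}")

    children = {}
    for item in classifications:
        children.setdefault(item["Parent-Id"], []).append(item)
    result = [item for item in classifications if item["Id"] == target]

    # Walk level by level; `seen` guards against cycles in malformed data.
    seen, frontier, level = {target}, [target], 0
    while frontier and (depth is None or level < depth):
        next_level = [child for node in frontier for child in children.get(node, [])
                      if child["Id"] not in seen]
        seen.update(child["Id"] for child in next_level)
        result.extend(next_level)
        frontier = [child["Id"] for child in next_level]
        level += 1
    return result
```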
A status message is an HTML 2.0 document.
RDM's request/response model is well-suited to HTTP. RDM messages can be transferred between processes across a network via HTTP/1.0, using the HTTP/1.0 Content-type application/x-rdm.
Clients can submit RDM messages via HTTP/1.0 in two ways: with the POST method or with the GET method.
In both cases, the client receives back an HTTP/1.0 response. The response's Content-type entity header must be set to application/x-rdm, and the entity body contains the RDM response message. We recommend that the response also contain the Expires and Content-length entity headers.
A client can send an RDM message by submitting an HTTP/1.0 request using:
Request line element or header | Value |
---|---|
Method | POST |
Request-URI | /rdm/incoming |
Content-type | application/x-rdm |
Content-length | recommended |
Content-encoding | optional |
User-agent | recommended |
Authorization | optional |
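A client-side sketch of the POST variant using Python's urllib.request. It builds the request separately from sending it so the headers can be inspected; the helper names and the User-Agent string are hypothetical, and note that urllib sends HTTP/1.1 rather than the HTTP/1.0 named here.

```python
import urllib.request

RDM_CONTENT_TYPE = "application/x-rdm"


def rdm_post_request(host: str, port: int, message: bytes,
                     user_agent: str = "rdm-sketch/0.1") -> urllib.request.Request:
    """Build (but do not send) a POST of an RDM message to /rdm/incoming."""
    return urllib.request.Request(
        f"http://{host}:{port}/rdm/incoming",
        data=message,
        method="POST",
        headers={
            "Content-Type": RDM_CONTENT_TYPE,
            "Content-Length": str(len(message)),  # recommended
            "User-Agent": user_agent,              # recommended; value is hypothetical
        },
    )


def send_rdm(request: urllib.request.Request, timeout: float = 30.0) -> bytes:
    """Send the request and return the body of the RDM response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        if response.headers.get_content_type() != RDM_CONTENT_TYPE:
            raise ValueError("response is not an RDM message")
        return response.read()
```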
A client can also send an RDM request for one of the RDM types below by submitting an HTTP/1.0 GET request whose Request-URI starts with /rdm/incoming and carries the content of the RDM request in its query string. Encode the RDM attributes as per the application/x-www-form-urlencoded specification (e.g., %ab hex encodings):
RDM Type | WWW Form Attribute=Value |
---|---|
RD-Request | |
Status-Request | |
Schema-Request | |
Server-Request | |
Taxonomy-Request | |
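The GET variant reduces to form-encoding the attributes into the query string. A Python sketch using urllib.parse.urlencode; the parameter names type, ql, scope, view-hits, and view-attributes are taken from the examples that follow, while the helper name and the underscore-to-hyphen mapping for keyword arguments are assumptions.

```python
from urllib.parse import urlencode


def rdm_get_url(host: str, port: int, **params) -> str:
    """Encode RDM attributes as an application/x-www-form-urlencoded GET URL.

    Keyword names use underscores (view_hits) and are sent hyphenated (view-hits).
    """
    query = urlencode({name.replace("_", "-"): value for name, value in params.items()})
    return f"http://{host}:{port}/rdm/incoming?{query}"
```

Note that urlencode percent-encodes the commas in a view-attributes value (as %2C); standard form decoding on the server restores them.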
To test the existence/readiness of an RDM server, use the status request RDM message:
http://host:port/rdm/incoming?type=status-request
To collect all of the RDs from an RDM server, use the RD Request RDM message and Gatherer query language:
http://host:port/rdm/incoming?type=rd-request&ql=gatherer&scope=all
To send a query to the RDM server, use the RD Request RDM message and a toy "sample-keyword-ql" query language:
http://host:port/rdm/incoming?type=rd-request&ql=sample-keyword-ql&scope=netscape&view-hits=10
http://host:port/rdm/incoming?type=rd-request&ql=sample-keyword-ql&scope=netscape&view-hits=10&view-attributes=url,title,author
Rather than using application/x-rdm for the RDM Content-Type, we'd like to register a type name officially with the appropriate Internet standards bodies, perhaps application/rdm or protocol/rdm.
Steve Pennebaker and Don Eastman helped define and stress-test the schema and taxonomy definition languages. Many people from the Harvest community have contributed to SOIF, and provided a great deal of input on the Harvest architecture and mechanisms. Much of this input formed the basis of RDM (e.g., HTTP-based transport layer, query language negotiation, etc.). They are acknowledged in detail on the Harvest home page.