W3C: NOTE-rdm-960724

Resource Description Messages (RDM)

W3C NOTE 24-Jul-96

This version:
http://www.w3.org/pub/WWW/TR/NOTE-rdm.html
$Id: NOTE-rdm.html,v 1.4 1996/12/09 03:45:06 jigsaw Exp $
Latest version:
http://www.w3.org/pub/WWW/TR/NOTE-rdm.html
Author:
Darren Hardy <dhardy@netscape.com>


Status of this document

This is [not yet] a W3C NOTE for review by W3C members and other interested parties. It is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to use W3C NOTEs as reference material or to cite them as other than "work in progress". A list of current W3C notes can be found at: http://www.w3.org/pub/WWW/TR

Note: since notes are subject to frequent change, you are advised to reference the above address, rather than the addresses of notes themselves.

Abstract

This document is a technical specification of Resource Description Messages (RDM), an Internet Resource Discovery mechanism. It contains a basic overview of the RDM model, describes the syntax and protocol, and provides some example usage. It also contains a technical description of Harvest's Summary Object Interchange Format (SOIF).



What is RDM?

The Basics

Resource Description Messages (RDM) is a mechanism to discover and retrieve metadata about network-accessible resources. This metadata takes the form of Resource Descriptions (RD). A Resource Description consists of a list of attribute-value pairs (e.g., Author = Darren Hardy, Title = RDM) and is associated with a resource via a URL. Agents can generate RD's automatically (e.g., a WWW robot), or people can write RD's manually (e.g., a librarian or author). Once a repository of Resource Descriptions is assembled, the server can export it via RDM as a programmatic way for WWW agents to discover and retrieve the RD's.

The Details

RDM is a messaging format which two processes can use to exchange resource descriptions across a network. In RDM, one process (a client or agent) sends a request RDM message to another process (a server), which processes the request and then sends a response RDM message, similar to the HTTP/1.0 request/response model. The most common use of RDM is when the requester selects a set of RD's to transfer in bulk based on some scoping criteria. For example, a requester can ask for all RD's that have changed since the previous week. In addition, RDM supports the notion of a catalog to which the scoping criteria apply; this enables a single RDM server to provide access to many different catalogs.

RDM allows access to a schema definition to which the RD's conform, a taxonomy description by which the RD's are organized, and a server description used for feature negotiation (e.g., which scoping criteria syntax is accepted).

RDM uses Harvest's SOIF format to encode the RD's, and the Harvest Gatherer command language to specify incremental retrievals. The RDM format is implemented as an HTTP/1.0 Content-type called application/x-rdm. In practice, RDM uses HTTP/1.0 as its bulk transport layer.

To summarize, RDM supports access to:

What is SOIF?

Harvest's Summary Object Interchange Format (SOIF) is a syntax for transmitting resource descriptions and other kinds of structured objects. Each RD is represented in SOIF as a list of attribute-value pairs (e.g., Company = 'Netscape'). SOIF handles both textual and binary data as values and, with some minor extensions, multivalued attributes. Finally, SOIF is a streaming format which allows bulk transfer of many RD's in a single, efficient stream.


RDM Format

RDM Request/Response Model

RDM supports the following scenarios:

RD Retrieval: requester retrieves one or more RDs from the server
RD Submission: requester sends one or more RDs and an optional Schema definition to the server for update or deletion
Server Description Retrieval: requester retrieves one Server Description from the server
Schema Description Retrieval: requester retrieves one Schema definition from the server
Taxonomy Description Retrieval: requester retrieves one Taxonomy definition from the server
Status Retrieval: requester retrieves the current server status from the server

Note that RD Submission is useful for sending unsolicited RD's to a server (e.g., push/advertise model).

RDM Format Syntax

Each RDM message contains a header and a body. The header identifies the nature of the RDM message and the catalog to which it applies, and the body contains the data required to carry out the request (e.g., scoping criteria). Both the header and the body of an RDM message are encoded using SOIF as described below.

RDM Header

The RDM header section begins with a SOIF object of type RDMHEADER which must contain at least the following attributes:

RDM-Query-Language: (required only for *Request* messages)

A string which identifies which query language is used in the given request (e.g., gatherer). How the scoping criteria in the RDM body is defined depends on the RDM-Query-Language used.

The following RDM header attributes are optional:

Some example RDM headers include:
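
As a rough illustration of the header described above, the Python sketch below assembles a request header as a SOIF object of type RDMHEADER. Only the RDM-Type and RDM-Query-Language attributes are taken from this document. The "Name{size}:<tab>value" attribute layout is assumed from the SOIF syntax in the Harvest User's Manual, and both the helper name soif_object and the "-" placeholder in the URL position are assumptions made for this example.

```python
def soif_object(template_type, url, attributes):
    """Serialize one SOIF object (assumed layout: Name{size}:<tab>value)."""
    lines = ["@%s { %s" % (template_type, url)]
    for name, value in attributes:
        # The size is the byte length of the encoded value.
        size = len(value.encode("utf-8"))
        lines.append("%s{%d}:\t%s" % (name, size, value))
    lines.append("}")
    return "\n".join(lines) + "\n"


header = soif_object(
    "RDMHEADER",
    "-",  # assumed placeholder: the header describes no particular resource
    [("RDM-Type", "RD-Request"), ("RDM-Query-Language", "Gatherer")],
)
print(header)
```

A response header would be built the same way with a response RDM-Type and, per the text above, without RDM-Query-Language.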


RDM Body

The content of the RDM message body depends on the values from the RDM-Type and the RDM-Query-Language attributes in the RDM header. The Reply RDM Type is the type of RDM message that the RDM server would return in response to the given RDM request.

RDM-Type | RDM-Query-Language / RDM-Response-Interpret | Body Content | Reply RDM Type
RD-Request, RD-Request-Deleted | Gatherer | Gatherer Scope/View | RD-Response
RD-Response, RD-Response-Deleted | none | Schema Description (optional), SOIF stream | Status-Response
Server-Description-Request | none | empty | Server-Description-Response
Server-Description-Response | none | Server Description | Status-Response
Schema-Description-Request | Schema-Basic | Schema-Basic Query | Schema-Description-Response
Schema-Description-Response | none | Schema Description | Status-Response
Taxonomy-Description-Request | Taxonomy-Basic | Taxonomy-Basic Query | Taxonomy-Description-Response
Taxonomy-Description-Response | none | Taxonomy Description | Status-Response
Status-Request | none | empty | Status-Response
Status-Response | none | Status message | none


Catalog Service IDs (CSID)

Catalog Service IDs (CSID) identify the specific Catalog Server by name, host, and port number. CSIDs are encoded in the URL syntax using the access method x-catalog, as follows: x-catalog://host:port/name (e.g., x-catalog://www.netscape.com:80/techpubs). If no CSID is present, the RDM server will use its default catalog service.

The CSID has a dual purpose: (1) to identify the host and port of the RDM server, and (2) to identify the catalog to which the request applies.
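
Because a CSID follows ordinary URL syntax, a standard URL parser can split it into its parts. The following is a minimal Python sketch; the function name parse_csid is hypothetical and not part of RDM.

```python
from urllib.parse import urlsplit


def parse_csid(csid):
    """Split x-catalog://host:port/name into (host, port, catalog name)."""
    parts = urlsplit(csid)
    if parts.scheme != "x-catalog":
        raise ValueError("not a CSID: %r" % csid)
    # parts.port is None when the CSID carries no explicit port.
    return parts.hostname, parts.port, parts.path.lstrip("/")


print(parse_csid("x-catalog://www.netscape.com:80/techpubs"))
```

The first two elements identify the RDM server to contact, and the third names the catalog on that server.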


SOIF Stream

A SOIF stream contains one or more SOIF objects each of which contains the structured content of a resource description. SOIF is a simple machine-readable syntax as defined here.

The SOIF grammar is as follows (also see the Harvest User's Manual):

An example SOIF object:
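
The sketch below is illustrative only and is not taken from the specification. It assumes the object layout given in the Harvest User's Manual: "@TYPE { URL" on the first line, one "Name{size}:<tab>value" entry per attribute, and a closing "}". The template type DOCUMENT and the function names are hypothetical. Reading each value by its declared byte size is what lets a value contain newlines or binary data.

```python
import re

ATTRIBUTE = re.compile(rb"([^{\s]+)\{(\d+)\}:\t")


def format_soif(template_type, url, attributes):
    """Serialize (name, bytes value) pairs as one SOIF object."""
    out = [("@%s { %s\n" % (template_type, url)).encode("utf-8")]
    for name, value in attributes:
        prefix = "%s{%d}:\t" % (name, len(value))
        out.append(prefix.encode("utf-8") + value + b"\n")
    out.append(b"}\n")
    return b"".join(out)


def parse_soif(data):
    """Parse one SOIF object; each value is read by its declared size."""
    head, _, rest = data.partition(b"\n")
    match = re.match(rb"@(\S+) \{ (\S+)$", head)
    if match is None:
        raise ValueError("malformed SOIF object header")
    template_type, url = match.group(1).decode(), match.group(2).decode()
    attributes, pos = [], 0
    while not rest.startswith(b"}", pos):
        match = ATTRIBUTE.match(rest, pos)
        if match is None:
            raise ValueError("malformed attribute at offset %d" % pos)
        size, start = int(match.group(2)), match.end()
        attributes.append((match.group(1).decode(), rest[start:start + size]))
        pos = start + size + 1  # skip the newline that follows the value
    return template_type, url, attributes


attrs = [("Title", b"RDM"), ("Abstract", b"line one\nline two")]
encoded = format_soif("DOCUMENT", "http://www.w3.org/pub/WWW/TR/NOTE-rdm.html", attrs)
print(encoded.decode("utf-8"))
```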

Number of SOIF attributes

In RDM, we allow SOIF objects with zero or more attributes. Specifically, we extend the syntax to include:

The ATTRIBUTE IDENTIFIER in a SOIF stream is defined to be one of the SOIF-Attribute values in the schema description.

Multivalue SOIF

SOIF does not explicitly allow for a single attribute name to have multiple values. So, to handle multiple values for the same attribute name, RDM uses the attribute naming convention of appending a hyphen and number to identify the value. For example, the attribute names title-1, title-2, and title-3 all refer to the 3 values associated with the attribute title.

This convention is also useful for embedding tuples within a single SOIF object. For example, title-1, author-1, title-2, author-2, title-3, author-3, ... would represent 3 records each with 2 attributes (title and author).
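
The naming convention can be sketched in a few lines of Python. The helper names are hypothetical, and the code assumes that any attribute name ending in a hyphen and digits is a numbered value.

```python
import re


def expand_multivalued(name, values):
    """Turn one attribute with several values into name-1, name-2, ..."""
    return [("%s-%d" % (name, i), value) for i, value in enumerate(values, start=1)]


def collapse_multivalued(attributes):
    """Group name-N attributes back into {name: [values in numeric order]}."""
    grouped = {}
    for name, value in attributes:
        match = re.fullmatch(r"(.+)-(\d+)", name)
        base, index = (match.group(1), int(match.group(2))) if match else (name, 0)
        grouped.setdefault(base, []).append((index, value))
    return {
        base: [value for _, value in sorted(pairs, key=lambda pair: pair[0])]
        for base, pairs in grouped.items()
    }


print(expand_multivalued("title", ["A", "B", "C"]))
print(collapse_multivalued([("title-2", "B"), ("author-1", "X"), ("title-1", "A")]))
```

For the tuple use, the values at the same position in each collapsed list (title-1 with author-1, and so on) form one record.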


RDM View Specification

If an RDM request contains a scope specification, then it may also contain a view specification. The server will filter the result set of resource descriptions through the view before returning them to the requester.

Each view has 3 possible attributes:

The view is encoded using SOIF and the SOIF type RDMQUERY.

To send only a specific set of attributes for each RD, rather than the full RD, use Attribute. In this case, only the URL, Title, Author, and Last-Modified attributes are of interest, the result set contains at most 10 hits, and the result set is ordered by the Title attribute:
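
One way to picture what the server does with such a view is the Python sketch below. It is an illustration under assumptions, not the RDM wire format: RD's are modeled as plain dictionaries, and the parameter names attributes, hits, and order are hypothetical names that mirror the view-attributes, view-hits, and view-order form parameters of the GET method.

```python
def apply_view(rds, attributes=None, hits=None, order=None):
    """Filter a result set through a view: order, truncate, then project."""
    result = list(rds)
    if order is not None:
        # Sort before truncating so the first `hits` entries are kept.
        result.sort(key=lambda rd: rd.get(order, ""))
    if hits is not None:
        result = result[:hits]
    if attributes is not None:
        result = [{k: rd[k] for k in attributes if k in rd} for rd in result]
    return result


rds = [
    {"URL": "http://b", "Title": "Zebra", "Author": "B", "Keywords": "z"},
    {"URL": "http://a", "Title": "Apple", "Author": "A", "Keywords": "a"},
]
view = apply_view(
    rds,
    attributes=["URL", "Title", "Author", "Last-Modified"],
    hits=10,
    order="Title",
)
print(view)
```

Projection runs last so that a view can be ordered by an attribute that is not itself returned.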

For IR-based searches, you want to sort by the score (or relevance ranking):


RDM-Query-Language: Gatherer

This is a very basic query language based on the Harvest Gatherer command language. It allows two queries: send every RD, and send any RD which has been modified since a given date (based on the RD-Last-Modified attribute). It also uses the standard RDM view specification. The query is encoded using SOIF and the SOIF type RDMQUERY, which may also contain a view specification.
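
The semantics of the two queries can be sketched as follows. This is a hedged illustration of the selection logic only, not of the Gatherer scope syntax. It assumes that RD-Last-Modified values use the HTTP/1.0 (RFC 1123) date format, which this document does not state, and the function name select_rds is hypothetical.

```python
from email.utils import parsedate_to_datetime


def select_rds(rds, modified_since=None):
    """Return every RD, or only those modified after the given date."""
    if modified_since is None:
        return list(rds)
    cutoff = parsedate_to_datetime(modified_since)
    return [
        rd for rd in rds
        if parsedate_to_datetime(rd["RD-Last-Modified"]) > cutoff
    ]


rds = [
    {"URL": "http://a", "RD-Last-Modified": "Mon, 15 Jul 1996 12:00:00 GMT"},
    {"URL": "http://b", "RD-Last-Modified": "Tue, 23 Jul 1996 08:30:00 GMT"},
]
recent = select_rds(rds, "Wed, 17 Jul 1996 00:00:00 GMT")
print([rd["URL"] for rd in recent])
```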


Server Descriptions

Server Descriptions provide system-level access information to the client (such as supported RDM query languages), help distributed query routing clients better select promising catalog services to search, and provide human-readable descriptions of a catalog service. A server description is written as a single SOIF object with the object type of RDMSERVER.

A server description must contain the following attributes:

@RDMSERVER (required)
Attribute Description
Supported-RDM-Type a comma-separated list of the supported RDM types
Supported-RDM-Query-Language a comma-separated list of the supported RDM query languages
SD-Last-Modified date when the Server Description was last modified
SD-Expires date when the Server Description will expire
@RDMSERVER (optional)
Description human-readable description of the content
Supported-Catalog-Service-ID a comma-separated list of the supported catalog-service-id's
Maintainer RFC822 email address for the maintainer of the service
Sample-RD-* sample RD's from the server

Example Server Description
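
The Python sketch below shows how a client might use a server description for feature negotiation. It is an illustration, not the original example: the server description is modeled as a dictionary, the attribute values (including the dates) are made up, and the helper names are hypothetical. Matching names case-insensitively is also an assumption.

```python
REQUIRED = (
    "Supported-RDM-Type",
    "Supported-RDM-Query-Language",
    "SD-Last-Modified",
    "SD-Expires",
)


def missing_required(server_description):
    """List the required RDMSERVER attributes that are absent."""
    return [name for name in REQUIRED if name not in server_description]


def supports(server_description, attribute, wanted):
    """Check a comma-separated capability list for one entry."""
    offered = server_description.get(attribute, "").split(",")
    return wanted.lower() in [item.strip().lower() for item in offered]


sd = {
    "Supported-RDM-Type": "RD-Request, Status-Request",
    "Supported-RDM-Query-Language": "Gatherer",
    "SD-Last-Modified": "Wed, 24 Jul 1996 00:00:00 GMT",  # made-up value
    "SD-Expires": "Wed, 31 Jul 1996 00:00:00 GMT",  # made-up value
}
print(missing_required(sd))
print(supports(sd, "Supported-RDM-Query-Language", "gatherer"))
```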


Schema Description

The data model that SOIF provides is a flat name space for the attributes, and treats all values as blobs. The RDM schema definition language extends this data model by providing:

  1. Data type and format information for the values (e.g., varchar and application/rfc822-address, or blob and text/html).
  2. Hints to the RDM client as to which attributes should be surfaced to the user-level, and attributes which are included in the default view.
  3. Hints to an indexer as to which attributes should be indexed, and attributes which should be used to suppress duplicates.
  4. A mapping between attribute names and (table name, column name) tuples, which helps an RDM client to place this data into the relational data model to support RDBMS backends.
  5. Other semantic information, such as indexable columns and foreign keys, which helps in mapping the SOIF objects into the relational data model.

The schema definition language consists of the following information:

Attribute | Description | Default | Required?

Global parameters
Schema-Definition-Language-Version | Version of the schema definition language used. | 1.0 | Yes
Schema-Name | Name of the schema. Used in the SOIF TEMPLATE-TYPE field. | None | Yes
URL | URL that can be used to retrieve the schema definition. Located in the URL field of the SOIF. | None | Yes
Last-Modified | Date when the schema definition was last modified. Formatted as in the HTTP/1.0 date format specification. | None | Yes
Number-of-Entries | The number of entries in this schema description. | None | Optional
Maintainer | Email address of those responsible for maintaining the schema. | None | Optional

Identity
SOIF-Attribute[-#] | Attribute name used in the SOIF stream. SOIF-Attribute should be used as a lookup key into the schema description to access the description for that value. The SOIF-Attribute names must be unique within a single schema description. | None | Yes
Description[-#] | Textual description of the column. | None | Optional

Richer Values
Data-Type[-#] | Data type for the column. Follow standard RDBMS data types like varchar, date, blob, int, serial, etc. | varchar | Yes
Content-Type[-#] | Content type of the column used to determine the format of the data as in HTTP/1.0's Content-types like text/plain, image/jpeg. | text/plain | Yes

Indexer Hints
Enforce-Uniqueness[-#] | Boolean flag indicating that the values for this column should be enforced as unique (0 = not unique, 1 = unique). | Not unique | Optional
Index-Attribute[-#] | Integer indicating if the column should be indexed or not, and in which priority order (0 = not present, 1 = first, 2 = second, etc.) | Not indexed | Optional

View Information
Is-Internal[-#] | Boolean flag indicating whether the column should be surfaced to the user or is for internal use only (0 = surface to user, 1 = for internal use only). Items like foreign key columns fall into this category. | Surface to user | Optional
Default-View-Order[-#] | Integer indicating if the column is part of the default view's order and in which order (0 = not present, 1 = first, 2 = second, etc.) | Not present | Optional
Default-View-Attribute[-#] | Integer indicating if the column is part of the default view's attribute and in which order (0 = not present, 1 = first, 2 = second, etc.) | Not present | Optional

Relational Model Mappings
Table-Name[-#] | User-level table name. | None | Optional
Column-Name[-#] | User-level column name. | None | Optional
System-Table-Name[-#] | Internal name used to refer to the table which allows duplicate user-level names. | None | Optional
System-Column-Name[-#] | Internal name used to refer to the column which allows duplicate user-level names. | None | Optional
Foreign-Key-System-Table-Name[-#] | Name of the related foreign key table, if any. Because of the assumption of a single rooted branching hierarchical data model, this is assumed to always refer to a single parent. | None | Optional
Foreign-Key-System-Column-Name[-#] | Name of the related foreign key column, if any. Because of the assumption of a single rooted branching hierarchical data model, this is assumed to always refer to a single parent. | None | Optional
In-Root-Table[-#] | Boolean flag indicating whether the table is the root table or not (0 = not in root table, 1 = in root table). | Not in root table | Optional

Example Schema Description

Below is a very simple schema description:
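
The following Python sketch is an illustration rather than the original example. It shows how a client might turn the numbered per-entry attributes of a schema description (SOIF-Attribute-1, Data-Type-1, and so on) into a lookup table keyed by SOIF-Attribute, applying the defaults listed above for Data-Type and Content-Type. The schema is modeled as a dictionary, and the schema name and entries are made up.

```python
import re

DEFAULTS = {"Data-Type": "varchar", "Content-Type": "text/plain"}


def schema_entries(attributes):
    """Group 'Name-#' attributes by entry number, keyed by SOIF-Attribute."""
    by_number = {}
    for name, value in attributes.items():
        match = re.fullmatch(r"(.+)-(\d+)", name)
        if match:  # global parameters carry no number and are skipped
            by_number.setdefault(int(match.group(2)), {})[match.group(1)] = value
    entries = {}
    for number in sorted(by_number):
        entry = {**DEFAULTS, **by_number[number]}
        entries[entry["SOIF-Attribute"]] = entry
    return entries


schema = {
    "Schema-Name": "DOCUMENT",  # made-up schema name
    "SOIF-Attribute-1": "Title",
    "SOIF-Attribute-2": "Author",
    "Content-Type-2": "application/rfc822-address",
}
entries = schema_entries(schema)
print(entries["Author"])
```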


RDM-Query-Language: Schema-Basic

Below is a very simple query language intended to return the pieces of the schema that have a particular attribute defined. The syntax is that Scope is defined Attribute.

To send the entire schema:

To send only those pieces in the default view:

To send only those pieces in the default view, and impose a view:


Taxonomy Description

A Taxonomy description defines a hierarchical taxonomic structure in which documents can be organized. The Taxonomy description is represented as a stream of SOIF objects that consists of one TAXONOMY object and one or more CLASSIFICATION objects. The @TAXONOMY object defines the identifier of the taxonomy, as well as a human-readable description of the taxonomy. The @CLASSIFICATION object defines a single classification or category within the taxonomy. Below is an example taxonomy with the following layout:

A taxonomy description contains the following attributes:

@TAXONOMY (required)
Attribute Description
Id a string identifying the taxonomy.
@TAXONOMY (optional)
Description human-readable one-line description of the content

A classification description contains the following attributes:

@CLASSIFICATION (required)
Attribute Description
Id a string identifying the classification.
Parent-Id a string identifying the parent classification
Taxonomy-Id a string identifying to which taxonomy the classification belongs.
@CLASSIFICATION (optional)
Description human-readable one-line description of the content
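
Since each classification names its parent, a client can rebuild the hierarchy from the flat stream of @CLASSIFICATION objects. The Python sketch below illustrates this with made-up classifications modeled as dictionaries. How the top-level classifications mark their parent is not specified here, so the value "ROOT" is purely an assumption, as are the function names.

```python
def children_by_parent(classifications):
    """Index classification Ids by their Parent-Id."""
    index = {}
    for c in classifications:
        index.setdefault(c["Parent-Id"], []).append(c["Id"])
    return index


def outline(index, parent, depth=0):
    """Render the subtree under `parent` as indented lines."""
    lines = []
    for child in index.get(parent, []):
        lines.append("  " * depth + child)
        lines.extend(outline(index, child, depth + 1))
    return lines


classifications = [  # made-up sample taxonomy
    {"Id": "Science", "Parent-Id": "ROOT", "Taxonomy-Id": "Sample"},
    {"Id": "Physics", "Parent-Id": "Science", "Taxonomy-Id": "Sample"},
    {"Id": "Arts", "Parent-Id": "ROOT", "Taxonomy-Id": "Sample"},
]
index = children_by_parent(classifications)
print("\n".join(outline(index, "ROOT")))
```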


RDM-Query-Language: Taxonomy-Basic

The following query language allows the client to retrieve only the pieces of a Taxonomy that are a given distance away from the given Classification. For example, return only the classifications that are immediately under the root.

The syntax of the query language (only 2 commands: descendant and children and their aliases) is as follows:

To send the entire taxonomy (node and everything below):

To send only the children (i.e., the node and the level directly below it):
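
The intended result of the two commands can be sketched in Python. This illustrates the selection semantics only, not the query syntax. The taxonomy is modeled as a hypothetical mapping from each classification Id to its Parent-Id, with None marking a top-level classification (an assumption), and is assumed to be a tree with no cycles.

```python
def descendant(parents, node):
    """Return the node and everything below it (breadth-first)."""
    selected = [node]
    for current in selected:  # the list grows while it is being walked
        selected.extend(i for i, p in parents.items() if p == current)
    return selected


def children(parents, node):
    """Return the node and the classifications directly below it."""
    return [node] + [i for i, p in parents.items() if p == node]


parents = {  # made-up sample taxonomy: Id -> Parent-Id
    "Science": None,
    "Physics": "Science",
    "Optics": "Physics",
    "Arts": None,
}
print(descendant(parents, "Science"))
print(children(parents, "Science"))
```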


Status Message

A status message is an HTML 2.0 document.


How does RDM work over HTTP?

RDM's request/response model is well-suited for HTTP. RDM messages can be transferred between processes across a network via HTTP/1.0. RDM uses the HTTP/1.0 Content-type application/x-rdm to transfer an RDM message via HTTP.

Client Access

Clients can submit RDM messages in 2 ways via HTTP/1.0:

  1. Using the POST method; or
  2. Using the GET method

In both cases, the client receives back an HTTP/1.0 response. The response's entity header Content-type must be set to application/x-rdm and the entity body contains the RDM response message. We encourage servers to include the Expires and Content-length entity headers in the response as well.

Using the POST method

A client can send an RDM message by submitting an HTTP/1.0 request using:

Request Line / Entity Header | Value
Method | POST
Request-URI | /rdm/incoming
Content-type | application/x-rdm
Content-length | recommended
Content-encoding | optional
User-agent | recommended
Authorization | optional
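
As a hedged sketch, the request described by the table above could be assembled with Python's standard library as follows. The host name, the User-agent string, and the function name build_post are hypothetical, and the body is a placeholder status request whose SOIF layout is assumed from the Harvest User's Manual. The code only builds the request object and does not send it.

```python
import urllib.request


def build_post(host, port, rdm_message):
    """Build (but do not send) an HTTP POST carrying one RDM message."""
    body = rdm_message.encode("utf-8")
    return urllib.request.Request(
        "http://%s:%d/rdm/incoming" % (host, port),
        data=body,
        headers={
            "Content-type": "application/x-rdm",
            "Content-length": str(len(body)),
            "User-agent": "rdm-sketch/0.1",  # hypothetical agent name
        },
        method="POST",
    )


message = "@RDMHEADER { -\nRDM-Type{14}:\tStatus-Request\n}\n"
request = build_post("catalog.example.com", 80, message)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would submit it; the response body would then carry the RDM response message.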

Using the GET method

When submitting RDM requests using the GET method, a client can send an RDM request for one of the supported RDM-Types below by submitting an HTTP/1.0 request. Use the GET method and encode the content of the RDM request inside of the Request-URI which starts with /rdm/incoming. Encode the RDM attributes as per the application/x-www-form-urlencoded specification (e.g., %ab hex encodings):

RDM Type, followed by its WWW Form Attribute=Value pairs:

RD-Request
  type=rd-request
  csid=csid
  ql=query-language
  scope=scope
  view-attributes=view-attributes
  view-hits=view-hits
  view-order=view-order
Status-Request
  type=status-request
  csid=csid
Schema-Request
  type=schema-description-request
  csid=csid
Server-Request
  type=server-description-request
  csid=csid
Taxonomy-Request
  type=taxonomy-description-request
  csid=csid

Example URLs

To test the existence/readiness of an RDM server, use the status request RDM message:

To collect all of the RDs from an RDM server, use the RD Request RDM message and Gatherer query language:

To send a query to the RDM server, use the RD Request RDM message and a toy "sample-keyword-ql" query language:
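
URLs of this kind can be generated with a few lines of Python. The sketch below is illustrative, not the original examples. It assumes that the form-encoded attributes follow /rdm/incoming after a "?" separator, which this document does not spell out; the function name rdm_get_url and the scope string are made up.

```python
from urllib.parse import urlencode


def rdm_get_url(host, port, params):
    """Encode RDM request attributes into a GET URL (assumes '?' separator)."""
    return "http://%s:%d/rdm/incoming?%s" % (host, port, urlencode(params))


status_url = rdm_get_url("www.netscape.com", 80, {"type": "status-request"})

query_url = rdm_get_url("www.netscape.com", 80, {
    "type": "rd-request",
    "csid": "x-catalog://www.netscape.com:80/techpubs",
    "ql": "sample-keyword-ql",
    "scope": "resource discovery",  # made-up scope for the toy language
    "view-hits": "10",
})
print(status_url)
print(query_url)
```

urlencode applies the application/x-www-form-urlencoded rules, so reserved characters in the CSID become %XX escapes and the space in the scope becomes "+".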


MIME Type Registration

Rather than using application/x-rdm for the RDM Content-Type, we'd like to register a type name officially with the appropriate Internet standards bodies, perhaps application/rdm or protocol/rdm.


Acknowledgements

Steve Pennebaker and Don Eastman helped define and stress-test the schema and taxonomy definition languages. Many people from the Harvest community have contributed to SOIF, and provided a great deal of input on the Harvest architecture and mechanisms. Much of this input formed the basis of RDM (e.g., HTTP-based transport layer, query language negotiation, etc.). They are acknowledged in detail on the Harvest home page.


References

  1. T. Berners-Lee, L. Masinter, and M. McCahill, Uniform Resource Locators, RFC 1738.
  2. T. Berners-Lee et al., HTML 2.0 specification.
  3. T. Berners-Lee et al., HTTP 1.0 specification.
  4. D. Hardy, M. Schwartz, and D. Wessels, Harvest User's Manual -- Version 1.3.
  5. D. Hardy, M. Schwartz, and D. Wessels, Summary Object Interchange Format (SOIF).