NODAL: Introduction

Lee Iverson
OHS Design Group

Lee Iverson <leei@ai.sri.com>
Last modified: Wed May 2 10:20:02 2001

What Do We Want?

Through years of discussion and months of consensus building within Doug Engelbart's Open Hyperdocument System meeting groups, we have agreed on a set of design principles for a collaborative document repository. It must be:

Document-Oriented: We start from a point of view in which we want to support the sharing and interchange of hyperlinked documents. Traditionally, this means SGML, HTML and XML, so support for those standards is a minimum. However, the ubiquity of Microsoft Word and the Office toolset suggests that we want to support their data formats as well. Consider also plain text documents and various multimedia types such as image, audio and video formats. Supporting "documents" begins to look like a general data modelling problem.
Filesystem-Like: In order to be usable as a subsystem for publication and interoperability with existing systems, the repository will need to appear as a traditional filesystem in certain contexts. The whole-document interfaces this will provide should be addressable as per a traditional hierarchical filesystem.
Database-Like: Database systems were a response to the lack of granular shareability and searchability of standard filesystems. The repository must support a wide variety of methods for accessing content and properly handle interlocking of updates from a variety of users. Maybe what we are developing is a kind of next generation SQL for hyperdocuments?
Shared and Reusable: One of the basic tenets of collaboration is to share knowledge and allow teams to build on the knowledge of any of their members. Clearly then, one of the most basic goals of a repository designed to support collaborative activity is to allow for the sharing and reuse of data and knowledge within communities of users.
Adaptable Interfaces: Clearly if we wish to attain ubiquity it is necessary to build the system around interfaces which are adaptable to a wide range of implementation languages and network interfaces. We wish to develop an abstract specification which can be mapped to implementations in C/C++, Java, LISP, and Python, various component architectures such as COM, CORBA and Gnome and a variety of different network protocols such as HTTP, WDDX, FTP, POP/IMAP, and IRC.
Granular Interfaces: One of the greatest problems with using traditional filesystems as shareable document repositories is the lack of granularity of the units for sharing (e.g. if I update a single line in a file, another user should not need to reload the entire file). We assume then that the basic interfaces into the document data models should allow for navigation and editing within the document as easily as between documents.
Addressable and Linkable: In order to build higher level organizations of knowledge and dialog, it is clearly necessary to be able to reference and reuse information from other documents at arbitrary granularity. It is thus necessary to be able to insert links between documents (and of course hyperlinks are essential components of HTML and XML in any case). We must thus provide the ability to address any point (or range) within one of our documents and build links to and from that point.
Synchronous and Asynchronous: Collaboration comes in many forms, from face-to-face meetings to telephone conferences to independent development of source code. If we wish the repository to be an intermediary for all of these types of activity, it is necessary to understand and support the ability to share documents both synchronously (for live collaboration) and asynchronously.
Versioned and Attributed: If one clear observation has emerged from both the sociological and the computational analysis of what makes collaborative communities work, it is that trust is primary. Sharing knowledge or information is a much more trustable activity if a user has confidence that it will be used and his/her contribution will be properly acknowledged. From a system point of view, tracking changes with fine-grained version control and detailed attribution can be seen as a means to that end. When documents are more active, such as with computer source code, this versioning and attribution becomes critical in tracking enhancements and bugs.
Control of Security and Privacy: Once again, trust is essential for a system to promote community and collaboration. Without the ability of users and managers to control access to and modification of their contributions, that trust is difficult to establish. Moreover, many different levels of publication of selected materials may be necessary within a single repository (e.g. team-only, department-only, organization-only, or public). It should be possible to easily control such access and have confidence in the security of the protections provided.

In essence, we are describing a need for a new kind of database language which has a standard, language-independent API, a document-oriented data modelling language, a fully addressable and navigable object hierarchy, and an extensible security model. In the next section we will propose just such an architecture.
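To make the "Addressable and Linkable" principle concrete, the following is a minimal Python sketch of addressing a point or range inside a document and linking between two such addresses. It is an illustration only and not part of the NODAL design: the names Address, Link and resolve, and the nested dictionary standing in for a document, are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# A toy document: sections hold paragraphs, paragraphs hold text.
document = {
    "title": "Design Notes",
    "sections": [
        {
            "heading": "Goals",
            "paragraphs": ["Share knowledge.", "Reuse content at any granularity."],
        },
    ],
}


@dataclass(frozen=True)
class Address:
    """Hypothetical address: a path of keys/indices plus an optional character range."""
    path: Tuple[object, ...]
    span: Optional[Tuple[int, int]] = None  # (start, end) offsets within a text leaf


@dataclass(frozen=True)
class Link:
    """Hypothetical typed link between two addresses."""
    source: Address
    target: Address
    kind: str


def resolve(doc, address):
    """Walk the path from the document root, then slice if a range was given."""
    target = doc
    for step in address.path:
        target = target[step]
    if address.span is not None:
        start, end = address.span
        target = target[start:end]
    return target


# Address a whole paragraph, and a range of five characters inside it.
whole_paragraph = Address(("sections", 0, "paragraphs", 1))
one_word = Address(("sections", 0, "paragraphs", 1), span=(0, 5))

# A link from the first paragraph to a word in the second one.
citation = Link(Address(("sections", 0, "paragraphs", 0)), one_word, kind="cites")
```

Here resolve(document, one_word) yields the word "Reuse" without the caller ever loading or copying the enclosing paragraph, which is the kind of granularity the principles above ask for.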

NODAL: An Object-Oriented SQL for Documents

Careful readers will notice that many of the motivating requirements outlined above are exactly those which led to the design and development of relational database management systems (RDBMS) 20 years ago and the development of SQL 10-15 years ago. There was a need for an interoperable, shareable, secure resource behind many kinds of enterprise-level applications. These database management systems filled that need admirably and SQL became a standard language for modelling data in RDBMSs and formulating updates and queries to those databases.

Unfortunately, RDBMSs do not adapt well to the kinds of graph-like document structures that are represented by modern markup languages and various forms of knowledge representation languages (e.g. XTM, RDF/S, DAML+OIL, etc.). They typically do not handle tree or graph structures well, do not track change histories for table rows, and do not provide granular and adaptable control over security and privacy at levels lower than the individual table. Moreover, their networking models severely limit the ability to maintain small-scale local caches of data and to operate well when transactions and queries are distributed over a wide-area network.

We would like to suggest a new paradigm. NODAL is a language for data modelling that directly and efficiently supports arbitrarily complicated typed graph structures with a very small number of general building blocks. This language is able to express the internal structure of a wide variety of document formats from markup languages to multimedia files and will allow applications and knowledge bases to access and share their contents heedless of the containing data format.
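As a rough illustration of what "typed graph structures with a very small number of general building blocks" might look like, here is a minimal Python sketch. It is not the NODAL language itself: the classes NodeType and Node, the scalar kinds "string" and "int", and the conforms check are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class NodeType:
    """Hypothetical type: maps each field name to "string", "int", or a node type name."""
    name: str
    fields: Dict[str, str]


@dataclass(eq=False)  # identity comparison, so graphs with cycles are safe to compare
class Node:
    """Hypothetical graph node: a type plus a value for each of its fields."""
    type: NodeType
    values: Dict[str, object] = field(default_factory=dict)


def conforms(node):
    """Check that every declared field is present and holds a value of the declared kind."""
    for name, kind in node.type.fields.items():
        if name not in node.values:
            return False
        value = node.values[name]
        if kind == "string":
            ok = isinstance(value, str)
        elif kind == "int":
            ok = isinstance(value, int)
        else:  # any other kind names a node type, i.e. an edge in the graph
            ok = isinstance(value, Node) and value.type.name == kind
        if not ok:
            return False
    return True


# Two types are enough to describe a figure from a markup document
# together with the multimedia object it refers to.
IMAGE = NodeType("Image", {"format": "string", "width": "int", "height": "int"})
FIGURE = NodeType("Figure", {"caption": "string", "image": "Image"})

image = Node(IMAGE, {"format": "png", "width": 640, "height": 480})
figure = Node(FIGURE, {"caption": "System overview", "image": image})
broken = Node(FIGURE, {"caption": "No picture", "image": "overview.png"})
```

In this sketch conforms(figure) holds, while conforms(broken) does not, because its "image" field holds a string where the type declares an Image node. Nothing in the model says whether the figure came from XML, a word processor file or an image container.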

NODAL implementations will automatically manage distributed, multi-user change tracking, attribution and historical recovery of individual nodes in these graphs. The design is expressed using modern object-oriented principles and will allow server implementations to support a variety of access protocols. The client APIs will allow applications to be built directly on top of this data modelling language so as to painlessly support synchronous and asynchronous collaboration in the development and exploitation of shared documents and knowledge bases in a wide variety of user-oriented tools.
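The per-node change tracking, attribution and historical recovery described above can be sketched in a few lines of Python. This is a single-process illustration under stated assumptions, not the NODAL implementation: the names Revision and VersionedNode and the methods update, current and at are hypothetical, and distribution across users and servers is left out entirely.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Revision:
    """One immutable state of a node, attributed to the author who produced it."""
    number: int
    author: str
    value: str


@dataclass
class VersionedNode:
    """Hypothetical node that never overwrites: each update appends a revision."""
    history: List[Revision] = field(default_factory=list)

    def update(self, author, value):
        """Record a new revision and return it."""
        revision = Revision(len(self.history) + 1, author, value)
        self.history.append(revision)
        return revision

    def current(self):
        """Return the latest revision (raises IndexError if the node was never set)."""
        return self.history[-1]

    def at(self, number):
        """Recover the node as it was at a given revision number (1-based)."""
        if not 1 <= number <= len(self.history):
            raise KeyError(f"no revision {number}")
        return self.history[number - 1]


paragraph = VersionedNode()
paragraph.update("alice", "Draft")
paragraph.update("bob", "Draft, revised")
```

After these two updates, paragraph.current() is attributed to "bob", while paragraph.at(1) still returns the original text written by "alice". Because history is kept per node rather than per file, a change to one paragraph leaves the histories of its neighbours untouched.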

Why Not Just XML?

Many proponents of XML technology (and XML databases) suggest that XML, and more recently the XML Schema language, are a basis for supporting exactly these capabilities. They claim that XML is a general data modelling language and that Document Object Model (DOM) interfaces to XML database implementations form a general, interoperable basis for shared document management. If that is so, then why hasn't this revolution already started? The simple answer is that XML has both problems and limitations that restrict its applicability to the range of problems we hope to solve. One of the most fundamental problems is the lack of separation between the XML data model and the expressive syntax. This complicates many aspects of the design of XML-aware applications and libraries to the point that XML-based specifications have become enormously complicated (e.g. XML Schema). Pointedly, the XML language does not even have a broadly recognized data model of its own. The XML Infoset is still an area of debate and the lack of consensus on its structure makes the continued development of such things as the DOM itself increasingly difficult.

In the NODAL design, we have expressly separated the data modelling language and APIs from the serialization of said data so that we may support a wide variety of serializations for the same document model. In this context, we see XML both as a tool we can use to build NODAL serializations and protocols and as a particular target application which may be built using the NODAL tools. We might thus define an XML Infoset using the NODAL language and use the client APIs as a basis for a DOM implementation.
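The separation of model from serialization can be illustrated with a small Python sketch in which one in-memory model is written out both as XML and as JSON and read back unchanged. The dictionary layout ("type", "attrs", "text", "children") and the helper names to_xml and from_xml are assumptions made for this example; only the standard json and xml.etree.ElementTree modules are real APIs.

```python
import json
import xml.etree.ElementTree as ET

# One document model, independent of any syntax (the layout is hypothetical).
model = {
    "type": "note",
    "attrs": {"author": "example"},
    "text": None,
    "children": [
        {"type": "para", "attrs": {}, "text": "Hello", "children": []},
    ],
}


def to_xml(node):
    """Serialize a model node to an ElementTree element."""
    element = ET.Element(node["type"], node["attrs"])
    element.text = node["text"]
    for child in node["children"]:
        element.append(to_xml(child))
    return element


def from_xml(element):
    """Rebuild the model node from an ElementTree element."""
    return {
        "type": element.tag,
        "attrs": dict(element.attrib),
        "text": element.text,
        "children": [from_xml(child) for child in element],
    }


# Two serializations of the same model.
as_xml = ET.tostring(to_xml(model), encoding="unicode")
as_json = json.dumps(model)

# Both round-trip back to the same model.
assert from_xml(ET.fromstring(as_xml)) == model
assert json.loads(as_json) == model
```

The application code that navigates or edits the model never needs to know which of the two syntaxes a document arrived in, which is the property argued for above.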

Related Work

Kimber's Groves

Subversion

INXAR

Castor

Requirements Table

Over the course of discussions in Douglas Engelbart's OHS group and eventually a small group of collaborators referred to as Nodeland, we have come to some consensus on a minimal set of architectural requirements for implementing an Open Hyperdocument System. I have selected from amongst those requirements a subset which I feel is directly addressed by the NODAL design. I list these below, with some explanation, and will provide hyperlinks to the sections of the design documents in which these requirements are addressed.

IO: Interoperability
- LI: Language Independence
- II: Implementation Independence
- AI: Application Independence
- LD: Legacy Document Support
TC: Transclusion (Content Reusability)
GA: Granular Addressability
- PA: Path addressing
- OA: Object addressing
GV: Granular Versioning
- AU: Audit trail
- RV: Revisions, versions and history
- AT: Attribution
HL: Hyperlinks
- BL: Bidirectional Linking
- EL: External Links
- TL: Typed Links
DS: Distributed & Synchronized
- SC: Synchronous Collaboration
- AC: Asynchronous Collaboration
- EM: EMail Integration
- NM: Notification & Messaging
ON: Ontologies
OT: Ontology Translations
SA: Secure, Access-controlled
DT: Data translator API