NODAL: Introduction
Lee Iverson
OHS Design Group
Lee Iverson <leei@ai.sri.com>
Last modified: Wed May 2 10:20:02 2001
What Do We Want?
Through years of discussion and months of consensus building within
Doug Engelbart's Open Hyperdocument System meeting groups, we have
agreed on a set of design principles for
a collaborative document repository. It must be:
- Document-Oriented:
We start from a point-of-view in which we want
to support the sharing and interchange of hyperlinked documents.
Traditionally, this means SGML, HTML and XML, so support for those
standards is a minimum. However, the ubiquity of Microsoft Word and
the Office toolset suggests that we want to support their data formats
as well. Consider also plain text documents, and various multimedia
types such as images, audio and video formats.
Supporting "documents" thus begins to look like a general
data modelling problem.
- Filesystem-Like:
In order to be usable as a subsystem for publication
and interoperability with existing systems, the repository will need to
appear as a traditional filesystem in certain contexts. The
whole-document interfaces this will provide
should be addressable as per a traditional
hierarchical filesystem.
- Database-like:
Database systems were a response to the lack of granular
shareability and
searchability of standard filesystems. The repository must support
a wide variety
of methods for accessing content and properly handle interlocking of
updates from a variety of users. Maybe what we are developing is
a kind of next generation SQL for hyperdocuments?
- Shared and Reusable:
One of the basic tenets of collaboration is to
share knowledge and allow teams to build on the knowledge of any of
their members. Clearly then, one of the most basic goals of a repository
designed to support collaborative activity is to allow for the sharing
and reuse of data and knowledge within communities of users.
- Adaptable Interfaces:
Clearly, if we wish to attain ubiquity,
it is necessary to build the system around interfaces which are adaptable
to a wide range of implementation languages and network interfaces. We
wish to develop an abstract specification which can be mapped to
implementations in C/C++, Java, LISP, and Python, various component
architectures such as COM, CORBA and Gnome and a variety of different
network protocols such as HTTP, WDDX, FTP, POP/IMAP, and IRC.
- Granular Interfaces:
One of the greatest problems with using traditional filesystems
as shareable document repositories is the lack of granularity of
the units for sharing (e.g. if I update a single line in a file,
another user should not need to reload the entire file). We
assume then that the basic interfaces into the document data
models should allow for navigation and editing within
the document as easily as between documents.
- Addressable and Linkable:
In order to build higher level organizations of knowledge and
dialog, it is clearly necessary to be able to reference and
reuse information from other documents at arbitrary granularity.
It is thus necessary to be able to insert links between
documents (and of course hyperlinks are essential components of
HTML and XML in any case). We must thus provide the ability to
address any point (or range) within one of our
documents and build links to and from that point.
- Synchronous and Asynchronous:
Collaboration comes in many forms, from face-to-face meetings,
to telephone conferences to independent development of source
code. If we wish the repository to be an intermediary for all
of these types of activity, it is necessary to understand and
support the ability to share documents both synchronously (for
live collaboration) and asynchronously.
- Versioned and Attributed:
If one clear observation has emerged from both the
sociological and computational analysis of what makes
collaborative communities work, it is that trust is primary.
Sharing knowledge or information is a much more trustable
activity if a user has confidence that it will be used and
his/her contribution will be properly acknowledged. From a
system point-of-view, tracking changes with fine-grained version
control and detailed attribution can be seen as a means to that
end. When documents are more active, such as with computer
source code, this versioning and attribution becomes critical in
tracking enhancements and bugs.
- Control of Security and Privacy:
Once again, trust is essential for a system to promote community
and collaboration. Without the ability of users and managers to
control access to and modification of their contributions, that
trust cannot develop.
Moreover, many different levels of publication of selected
materials may be necessary within a single repository
(e.g. team-only, department-only, organization-only, or public).
It should be possible to easily control such access and have
confidence in the security of the protections provided.
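The "Granular Interfaces" and "Addressable and Linkable" principles
above can be illustrated with a minimal sketch. It assumes a document
is a tree of nodes and that an address is a path of child indices plus
an optional character range. The names Node, Address and resolve are
hypothetical and chosen only for illustration; they are not part of
any NODAL specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union


@dataclass
class Node:
    """Hypothetical document node: a name, optional text, ordered children."""
    name: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


@dataclass(frozen=True)
class Address:
    """Hypothetical address: child indices from the root, plus an
    optional character range within the target node's text."""
    path: Tuple[int, ...]
    start: Optional[int] = None
    end: Optional[int] = None


def resolve(root: Node, addr: Address) -> Union[Node, str]:
    """Walk the path from the root; return the node, or the addressed
    slice of its text when a range is given."""
    node = root
    for index in addr.path:
        node = node.children[index]
    if addr.start is None:
        return node
    return node.text[addr.start:addr.end]


doc = Node("doc", children=[
    Node("title", "NODAL: Introduction"),
    Node("section", children=[
        Node("para", "Trust is primary."),
        Node("para", "Links need stable targets."),
    ]),
])

whole_para = Address(path=(1, 0))                  # an entire paragraph
word_range = Address(path=(1, 1), start=0, end=5)  # a range inside one

print(resolve(doc, whole_para).text)  # Trust is primary.
print(resolve(doc, word_range))       # Links
```

Index-based paths are the simplest choice for a sketch, but they shift
when siblings are inserted. A real repository would presumably need
stable node identifiers so that links survive edits.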
In essence, we are describing a need for a new kind of database
language which has a standard, language-independent API,
a document-oriented data modelling language, a fully addressable
and navigable object hierarchy, and an extensible security
model. In the next section we will propose just such an
architecture.
NODAL: An Object-Oriented SQL for Documents
Careful readers will notice that many of the motivating
requirements outlined above are exactly those which led to the
design and development of relational database management systems
(RDBMS) 20 years ago and the development of SQL 10-15 years ago.
There was a need for an interoperable, shareable, secure
resource behind many kinds of enterprise-level applications.
These database management systems filled that need admirably and
SQL became a standard language for modelling data in RDBMSs and
formulating updates and queries to those databases.
Unfortunately, RDBMSs do not adapt well to the kinds of
graph-like document structures that are represented by modern
markup languages and various forms of knowledge representation
languages (e.g. XTM, RDF/S, DAML+OIL, etc.). They typically do
not handle tree or graph structures well, do not track change
histories for table rows, and do not provide granular and
adaptable control over security and privacy at levels lower than
the individual table. Moreover, their networking models
severely limit the ability to maintain small-scale local caches
of data and operate well when transactions and queries are
distributed over a wide-area network.
We would like to suggest a new paradigm. NODAL is a language
for data modelling that directly and efficiently supports
arbitrarily complicated typed graph structures with a very small
number of general building blocks. This language is able to
express the internal structure of a wide variety of document
formats from markup languages to multimedia files and will allow
applications and knowledge bases to access and share their
contents heedless of the containing data format.
NODAL implementations will automatically
manage distributed, multi-user change tracking, attribution and
historical recovery of individual nodes in these graphs. The
design is expressed using modern object-oriented principles
and will allow server implementations to support a variety of
access protocols. The client APIs will allow applications to be
built directly on top of this data modelling language so as to
painlessly support synchronous and asynchronous collaboration in
the development and exploitation of shared documents and
knowledge bases in a wide variety of user-oriented tools.
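As a rough illustration of per-node change tracking, attribution and
historical recovery, the following sketch keeps an append-only list of
revisions for a single node. It is a single-process toy under stated
assumptions: the names Revision and VersionedNode, and the authors
"alice" and "bob", are hypothetical, and nothing here reflects the
actual NODAL API or its distributed behaviour.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass(frozen=True)
class Revision:
    """One immutable change record: who set which value, and in what order."""
    number: int
    author: str
    value: Any


@dataclass
class VersionedNode:
    """Hypothetical node whose value is never overwritten, only appended to."""
    history: List[Revision] = field(default_factory=list)

    def update(self, author: str, value: Any) -> Revision:
        """Record a new value, attributed to its author."""
        rev = Revision(len(self.history) + 1, author, value)
        self.history.append(rev)
        return rev

    @property
    def current(self) -> Any:
        """Latest value; raises IndexError if the node was never updated."""
        return self.history[-1].value

    def at(self, number: int) -> Any:
        """Recover the value as of a given revision number (1-based)."""
        if not 1 <= number <= len(self.history):
            raise ValueError(f"no such revision: {number}")
        return self.history[number - 1].value

    def authors(self) -> List[str]:
        """Contributors in order of first contribution."""
        seen: List[str] = []
        for rev in self.history:
            if rev.author not in seen:
                seen.append(rev.author)
        return seen


para = VersionedNode()
para.update("alice", "Trust is primary")
para.update("bob", "Trust is primary.")

print(para.current)    # Trust is primary.
print(para.at(1))      # Trust is primary
print(para.authors())  # ['alice', 'bob']
```

Keeping history at the level of the individual node, rather than the
whole file, is what lets attribution and recovery work at the same
granularity as addressing.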
Why Not Just XML?
Many proponents of XML technology (and XML databases) suggest
that XML and, more recently, the XML Schema language are a basis for
supporting exactly these capabilities. They claim that XML is a
general data modelling language and that Document Object Model
(DOM) interfaces to XML database implementations form a general,
interoperable basis for shared document management. If that is
so, then why hasn't this revolution already started? The simple
answer is that XML has both problems and limitations that
restrict its applicability to the range of problems we hope to
solve. One of the most fundamental problems is the
lack of separation between the XML data model and the expressive
syntax. This complicates many aspects of the design of
XML-aware applications and libraries to the point that XML-based
specifications have become enormously complicated (e.g. XML
Schema). Pointedly, the XML language does not even have a broadly
recognized data model of its own. The XML Infoset is still an
area of debate and the lack of consensus on its structure makes
the continued development of such things as the DOM itself
increasingly difficult.
In the NODAL design, we have expressly separated the data
modelling language and APIs from the serialization of said data
so that we may support a wide variety of serializations for the
same document model. In this context, we see XML both as a tool
we can use to build NODAL serializations and protocols and as
a particular target application which may be built using the
NODAL tools. We might thus define an XML Infoset using the
NODAL language and use the client APIs as a basis for a DOM
implementation.
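The separation of data model from serialization can be sketched as
follows: one in-memory node structure, two independent writers. This
is only an illustration using Python's standard xml.etree.ElementTree
and json modules. The Node class and the to_xml and to_json functions
are hypothetical names, not part of the NODAL design.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical format-neutral node; it knows nothing about any syntax."""
    name: str
    attrs: Dict[str, str] = field(default_factory=dict)
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def to_xml(node: Node) -> str:
    """Serialize the model as XML markup."""
    def build(n: Node) -> ET.Element:
        elem = ET.Element(n.name, n.attrs)
        elem.text = n.text or None
        for child in n.children:
            elem.append(build(child))
        return elem
    return ET.tostring(build(node), encoding="unicode")


def to_json(node: Node) -> str:
    """Serialize the same model as JSON."""
    def build(n: Node) -> dict:
        return {
            "name": n.name,
            "attrs": n.attrs,
            "text": n.text,
            "children": [build(c) for c in n.children],
        }
    return json.dumps(build(node), sort_keys=True)


doc = Node("doc", children=[Node("para", {"id": "p1"}, "Hello")])

print(to_xml(doc))   # <doc><para id="p1">Hello</para></doc>
print(to_json(doc))
```

Because neither writer touches the other, a further serialization can
be added without changing the model or any application built on it.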
Related Work
Kimber's Groves
Subversion
INXAR
Castor
Requirements Table
Over the course of discussions in Douglas Engelbart's OHS group
and eventually a small group of collaborators referred to as
Nodeland, we have come to some consensus on a minimal set of
architectural requirements for implementing an Open Hyperdocument
System. I have selected from amongst those requirements a subset
which I feel is directly addressed by the NODAL design. I list
these below, with some explanation, and will provide hyperlinks to
the sections of the design documents in which these requirements
are addressed.
- IO: Interoperability
- LI: Language Independence
- II: Implementation Independence
- AI: Application Independence
- LD: Legacy Document Support
- TC: Transclusion (Content Reusability)
- GA: Granular Addressability
- PA: Path addressing
- OA: Object addressing
- GV: Granular Versioning
- AU: Audit trail
- RV: Revisions, versions and history
- AT: Attribution
- HL: Hyperlinks
- BL: Bidirectional Linking
- EL: External Links
- TL: Typed Links
- DS: Distributed & Synchronized
- SC: Synchronous Collaboration
- AC: Asynchronous Collaboration
- EM: EMail Integration
- NM: Notification & Messaging
- ON: Ontologies
- OT: Ontology Translations
- SA: Secure, Access-controlled
- DT: Data translator API