NODAL: Introduction
    Lee Iverson
OHS Design Group
    
      
	Lee Iverson <leei@ai.sri.com>
      
      
Last modified: Wed May  2 10:20:02 2001
    
    
    What Do We Want?
    Through years of discussion and months of consensus building within
      Doug Engelbart's Open Hyperdocument System meeting groups, we have
      developed a consensus on a certain set of design principles for
      a collaborative document repository.  It must be:
    
    - Document-Oriented:
      We start from a point-of-view in which we want
      to support the sharing and interchange of hyperlinked documents.
      Traditionally, this means SGML, HTML and XML, so support for those
      standards is a minimum.  However, the ubiquity of Microsoft Word and
      the Office toolset suggests that we want to support their data formats
      as well.  Consider also plain text documents, and various multimedia
      types such as images, audio and video formats.
      Supporting "documents" begins to appear as if it is a general
      data modeling problem.
- Filesystem-Like:
      In order to be usable as a subsystem for publication
      and interoperability with existing systems, the repository will need to
      appear as a traditional filesystem in certain contexts.  The
      whole-document interfaces this will provide
      should be addressable as per a traditional
      hierarchical filesystem.
- Database-like:
      Database systems were a response to the lack of granular
      shareability and
      searchability of standard filesystems.  The repository must support
      a wide variety
      of methods for accessing content and properly handle interlocking of
      updates from a variety of users. Maybe what we are developing is
      a kind of next generation SQL for hyperdocuments?
- Shared and Reusable:
      One of the basic tenets of collaboration is to
      share knowledge and allow teams to build on the knowledge of any of
      their members.  Clearly then, one of the most basic goals of a repository
      designed to support collaborative activity is to allow for the sharing
      and reuse of data and knowledge within communities of users.
- Adaptable Interfaces:  Clearly if we wish to attain ubiquity
      it is necessary to build the system around interfaces which are adaptable
      to a wide range of implementation languages and network interfaces.  We
      wish to develop an abstract specification which can be mapped to
      implementations in C/C++, Java, LISP, and Python, various component
      architectures such as COM, CORBA and Gnome and a variety of different
      network protocols such as HTTP, WDDX, FTP, POP/IMAP, and IRC.
- Granular Interfaces:
      One of the greatest problems with using traditional filesystems
      as shareable document repositories is the lack of granularity of
      the units for sharing (e.g. if I update a single line in a file,
      another user should not need to reload the entire file).  We
      assume then that the basic interfaces into the document data
      models should allow for navigation and editing within
      the document as easily as between documents.
- Addressable and Linkable:
      In order to build higher level organizations of knowledge and
      dialog, it is clearly necessary to be able to reference and
      reuse information from other documents at arbitrary granularity.
      It is thus necessary to be able to insert links between
      documents (and of course hyperlinks are essential components of
      HTML and XML in any case).  We must thus provide the ability to
      address any point (or range) within one of our
      documents and build links to and from that point.
- Synchronous and Asynchronous:
      Collaboration comes in many forms, from face-to-face meetings,
      to telephone conferences to independent development of source
      code.  If we wish the repository to be an intermediary for all
      of these types of activity, it is necessary to understand and
      support the ability to share documents both synchronously (for
      live collaboration) and asynchronously.
- Versioned and Attributed:
      If one clear observation has emerged from both the
      sociological and computational analyses of what makes
      collaborative communities work, it is that trust is primary.
      Sharing knowledge or information is a much more trustable
      activity if a user has confidence that it will be used and
      his/her contribution will be properly acknowledged.  From a
      system point-of-view, tracking changes with fine-grained version
      control and detailed attribution can be seen as a means to that
      end.  When documents are more active, such as with computer
      source code, this versioning and attribution becomes critical in
      tracking enhancements and bugs.
- Control of Security and Privacy:
      Once again, trust is essential for a system to promote community
      and collaboration.  Without the ability of users and managers to
      control access to and modifiability of their contributions, that
      trust is hard to establish.
      Moreover, many different levels of publication of selected
      materials may be necessary within a single repository
      (e.g. team-only, department-only, organization-only, or public).
      It should be possible to easily control such access and have
      confidence in the security of the protections provided.
In essence, we are describing a need for a new kind of database
      language which has a standard, language-independent API,
      a document-oriented data modelling language, a fully addressable
      and navigable object hierarchy, and an extensible security
      model. In the next section we will propose just such an
      architecture.
    
    NODAL: An Object-Oriented SQL for Documents
     Careful readers will notice that many of the motivating
      requirements outlined above are exactly those which led to the
      design and development of relational database management systems
      (RDBMS) 20 years ago and the development of SQL 10-15 years ago.
      There was a need for an interoperable, shareable, secure
      resource behind many kinds of enterprise-level applications.
      These database management systems filled that need admirably and
      SQL became a standard language for modelling data in RDBMSs and
      formulating updates and queries to those databases.
      
     Unfortunately, RDBMSs do not adapt well to the kinds of
      graph-like document structures that are represented by modern
      markup languages and various forms of knowledge representation
      languages (e.g. XTM, RDF/S, DAML+OIL, etc.). They typically do
      not handle tree or graph structures well, do not track change
      histories for table rows, and do not provide granular and
      adaptable control over security and privacy at levels lower than
      the individual table.  Moreover, their networking models
      severely limit the ability to maintain small-scale local caches
      of data and operate well when transactions and queries are
      distributed over a wide-area network. 
     We would like to suggest a new paradigm.  NODAL is a language
      for data modelling that directly and efficiently supports
      arbitrarily complicated typed graph structures with a very small
      number of general building blocks.  This language is able to
      express the internal structure of a wide variety of document
      formats from markup languages to multimedia files and will allow
      applications and knowledge bases to access and share their
      contents heedless of the containing data format. 
     NODAL implementations will automatically
      manage distributed, multi-user change tracking, attribution and
      historical recovery of individual nodes in these graphs. The
      design is expressed using modern object-oriented principles
      and will allow server implementations to support a variety of
      access protocols. The client APIs will allow applications to be
      built directly on top of this data modelling language so as to
      painlessly support synchronous and asynchronous collaboration in
      the development and exploitation of shared documents and
      knowledge bases in a wide variety of user-oriented tools.
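
     To make the idea of per-node change tracking concrete, here is a
      minimal sketch in Python.  It is not the NODAL API: the names
      Node, Revision, set, get and attribution are hypothetical and
      chosen only for illustration.  The sketch assumes that every
      update to a node is appended to that node's own history, so that
      attribution and historical recovery fall out of the data model
      rather than being bolted on.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass(frozen=True)
class Revision:
    """One recorded change to a single node (hypothetical structure)."""
    number: int
    author: str
    value: Any


@dataclass
class Node:
    """A typed graph node that keeps its own change history."""
    node_type: str
    history: List[Revision] = field(default_factory=list)

    def set(self, value: Any, author: str) -> Revision:
        # Every update appends a revision, so nothing is overwritten.
        rev = Revision(len(self.history) + 1, author, value)
        self.history.append(rev)
        return rev

    def get(self, revision: Optional[int] = None) -> Any:
        # Default to the latest revision; otherwise recover an older one.
        if not self.history:
            raise LookupError("node has no revisions")
        if revision is None:
            return self.history[-1].value
        if not 1 <= revision <= len(self.history):
            raise LookupError(f"no revision {revision}")
        return self.history[revision - 1].value

    def attribution(self) -> List[str]:
        # Authors in the order in which their changes were made.
        return [rev.author for rev in self.history]


# Two users edit one paragraph; a section node refers to it by reference,
# which is how the graph structure arises in this sketch.
para = Node("paragraph")
para.set("NODAL is a langauge.", author="alice")
para.set("NODAL is a language.", author="bob")
section = Node("section")
section.set([para], author="alice")
```

     In this sketch only the edited paragraph gains a revision; the
      section that contains it is untouched, which is the kind of
      granularity the requirements above ask for.  A real
      implementation would also need distribution and access control,
      which are omitted here.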
    
    Why Not Just XML?
    Many proponents of XML technology (and XML databases) suggest
      that XML and, more recently, the XML Schema language are a basis for
      supporting exactly these capabilities.  They claim that XML is a
      general data modelling language and that Document Object Model
      (DOM) interfaces to XML database implementations form a general,
      interoperable basis for shared document management.  If that is
      so, then why hasn't this revolution already started?  The simple
      answer is that XML has both problems and limitations that make
      its applicability to the range of problems we hope to solve
      somewhat limited.  One of the most fundamental problems is the
      lack of separation between the XML data model and the expressive
      syntax.  This complicates many aspects of the design of
      XML-aware applications and libraries to the point that XML-based
      specifications have become enormously complicated (e.g. XML
      Schema).  Pointedly, the XML language does not even have a broadly
      recognized data model of its own.  The XML Infoset is still an
      area of debate and the lack of consensus on its structure makes
      the continued development of such things as the DOM itself
      increasingly difficult.
    In the NODAL design, we have expressly separated the data
      modelling language and APIs from the serialization of said data
      so that we may support a wide variety of serializations for the
      same document model. In this context, we see XML both as a tool
      we can use to build NODAL serializations and protocols and as
      a particular target application which may be built using the
      NODAL tools. We might thus define an XML Infoset using the
      NODAL language and use the client APIs as a basis for a DOM
      implementation. 
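
    The separation of data model from serialization can be
      illustrated with a small Python sketch.  The DocNode class and
      the to_xml, to_json and from_json functions are hypothetical and
      are not part of any NODAL specification; they only show one
      in-memory model written out in two different syntaxes using the
      Python standard library.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DocNode:
    """Serialization-independent document node (illustrative only)."""
    kind: str
    attrs: Dict[str, str] = field(default_factory=dict)
    text: str = ""
    children: List["DocNode"] = field(default_factory=list)


def to_xml(node: DocNode) -> str:
    """Write the model as XML; the model itself knows nothing about XML."""
    def build(n: DocNode) -> ET.Element:
        elem = ET.Element(n.kind, dict(n.attrs))
        elem.text = n.text or None
        for child in n.children:
            elem.append(build(child))
        return elem
    return ET.tostring(build(node), encoding="unicode")


def to_json(node: DocNode) -> str:
    """Write the same model as JSON."""
    def build(n: DocNode) -> dict:
        return {
            "kind": n.kind,
            "attrs": n.attrs,
            "text": n.text,
            "children": [build(c) for c in n.children],
        }
    return json.dumps(build(node), sort_keys=True)


def from_json(data: str) -> DocNode:
    """Rebuild the model from its JSON serialization."""
    def build(d: dict) -> DocNode:
        return DocNode(d["kind"], d["attrs"], d["text"],
                       [build(c) for c in d["children"]])
    return build(json.loads(data))


doc = DocNode("note", {"id": "n1"}, children=[DocNode("p", text="Hello")])
xml_text = to_xml(doc)
json_text = to_json(doc)
```

    Applications written against DocNode never see angle brackets or
      braces; adding a third serialization means adding one more
      function and leaves the model unchanged.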
    Related Work
    Kimber's Groves
    Subversion
    INXAR
    Castor
    Requirements Table
     Over the course of discussions in Douglas Engelbart's OHS group
    and eventually a small group of collaborators referred to as
    Nodeland, we have come to some consensus on a minimal set of
    architectural requirements for implementing an Open Hyperdocument
    System.  I have selected from amongst those requirements a subset
    which I feel is directly addressed by the NODAL design.  I list
    these, with some explanation, below and will provide hyperlinks to
    the sections of the design documents in which these requirements
    are addressed.
    
    
    - IO: Interoperability
      - LI: Language Independence
      - II: Implementation Independence
      - AI: Application Independence
      - LD: Legacy Document Support
    - TC: Transclusion (Content Reusability)
    - GA: Granular Addressability
      - PA: Path addressing
      - OA: Object addressing
    - GV: Granular Versioning
      - AU: Audit trail
      - RV: Revisions, versions and history
      - AT: Attribution
    - HL: Hyperlinks
      - BL: Bidirectional Linking
      - EL: External Links
      - TL: Typed Links
    - DS: Distributed & Synchronized
      - SC: Synchronous Collaboration
      - AC: Asynchronous Collaboration
      - EM: EMail Integration
      - NM: Notification & Messaging
    - ON: Ontologies
    - OT: Ontology Translations
    - SA: Secure, Access-controlled
    - DT: Data translator API
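
    The difference between path addressing (PA) and object addressing
    (OA) in the list above can be sketched in a few lines of Python.
    This is one possible reading of those two requirements, not the
    NODAL design itself; the Repository and Item classes and their
    methods are hypothetical.  The sketch assumes that a path names a
    location in the hierarchy while an object identifier names the
    object regardless of where it currently lives.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Item:
    """An addressable object with a stable identifier (illustrative)."""
    oid: int
    children: Dict[str, "Item"] = field(default_factory=dict)


class Repository:
    def __init__(self) -> None:
        self._objects: Dict[int, Item] = {}
        self.root = self._new_item()

    def _new_item(self) -> Item:
        # Identifiers are assigned once and never reused in this sketch.
        item = Item(oid=len(self._objects) + 1)
        self._objects[item.oid] = item
        return item

    def add(self, parent: Item, name: str) -> Item:
        child = self._new_item()
        parent.children[name] = child
        return child

    def by_path(self, path: str) -> Item:
        # Path addressing: walk the hierarchy one name at a time.
        item = self.root
        for part in filter(None, path.split("/")):
            item = item.children[part]  # KeyError if the path is absent
        return item

    def by_id(self, oid: int) -> Item:
        # Object addressing: look the object up directly by identifier.
        return self._objects[oid]

    def move(self, old_parent: Item, name: str,
             new_parent: Item, new_name: str) -> None:
        new_parent.children[new_name] = old_parent.children.pop(name)


repo = Repository()
docs = repo.add(repo.root, "docs")
archive = repo.add(repo.root, "archive")
intro = repo.add(docs, "intro")
found_before_move = repo.by_path("/docs/intro")
repo.move(docs, "intro", archive, "intro-2001")
```

    After the move, the old path no longer resolves, but a link that
    stored the object identifier still reaches the same item.  This is
    why a repository that wants durable links plausibly needs both
    forms of address.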