NODAL: Motivation
    Lee Iverson <leei@ai.sri.com>
OHS Design Group
Last modified: Wed May  2 10:21:38 2001
    
    
    Collaboration is a Basic Need
    Mainstream Document Collaboration
    One of the most fundamental barriers to collaboration between both
      users and applications is the granularity of data access.  File
      systems organize information in directories of "files". Sharing
      files is a simple process of placing them on a networked file
      system or simply sending them to a group via electronic mail or
      some other message passing system. This model assumes either that
      files change rarely or that they have single authors or both.
      Sometimes file sharing is augmented or replaced with email
      attachments.
    What happens, however, when we try to use this shared-file
      and email approach to collaboratively create documents or
      accumulate group knowledge?  Quite simply, the result is a
      difficult, cumbersome process. A few examples illustrate some
      of the problems:
    Microsoft Word
    Microsoft has made some effort to allow multiple users to
      exchange and edit documents produced with Microsoft Word.
      The sharing model is simple: a user sends (or makes available) a
      copy of their Word document to another user who edits (or
      reviews) it and sends it back.  The modified Word document
      contains a change history which may be navigated by the first
      user (the change integrator) and individual changes made by the
      other user may be approved or rejected via a GUI.
    Word Advantages
    
      - Integrated: The version tracking is completely
	integrated into 
	the document and application, so a user need learn no other
	tools than the additional menu entries provided by the Version
	Control submenu.
Word Problems
    
      - Hidden: Since all of the outstanding version history is
	stored in the document itself, many users not familiar with
	the details of the Version Control toolbox are unaware that
	this content is still available in their file.  This is a huge
	information security hole and has been anecdotally responsible
	for significant problems when "uncleaned" files are passed on
	or delivered to outsiders.
- Bloat: There are two ways in which file systems can become
	bloated with these "fat" Word files.  The first has to do with
	the fact that Word documents (especially single user
	documents) can become quite large even when their immediately
	visible content is fairly small, simply because of the change
	history being carried around.  The second has to do with people's
	email folders, which often fill up with many versions of the
	same set of "shared" Word documents as they are passed around
	for editing or for integration of contributions.  Since one
	often wants to retain a record of the reasons for changes,
	these emails are often kept in mail folders to provide a trace
	of the document history.  This behaviour, which Word docs
	explicitly recommend, can lead to an enormous waste of file
	system resources on Word attachments in mail folders.
- Asynchronous: Clearly the model does not allow for live
	document sharing between users, with an email exchange or
	shared file system store required as part of any exchange of
	information.
- Ad Hoc: Clearly people-management skills must be an integral
	part of any attempt to use Word as a basis for collaboration
	as it provides no support for anything other than a single
	user as change integrator.  It does provide some ability to
	enforce this strategy by allowing that user to "lock" the
	document and require others to submit "review" comments
	instead of actual edits, but that can quickly become
	cumbersome as well.
CVS
    The Concurrent Versions System (CVS) was developed
      as a response to problems perceived by distributed groups of
      developers when using the Revision Control System (RCS) for
      collaborative software development (using plain text files).
      RCS maintained the integrity of its version control by requiring
      users to check out
      files in the repository, modify them and then check them back
      in.  While a file was checked out by one user, it was
      unavailable for modification by other developers (locked). In
      many kinds of scenarios, especially software development, a
      single developer may need to change many files in concert before
      a new consistent (and thus check-in-able) state is achieved.
      Locking all of these files while this one developer is
      attempting to attain consistency is untenable as soon as more
      than two or three developers are working on a single project.
    
    
      CVS solves this problem by removing the locking requirement and
      allowing multiple users to all be editing the same set of files
      simultaneously.  Users maintain local versions of the repository
      and make changes as they wish.  A user occasionally "updates"
      his local copy by asking CVS to apply those changes made by
      other users to his local copy. This may lead to conflicts, in
      which a section of a file which the user has modified has also
      been modified by someone else.  It is then up to the user to
      resolve these conflicts by editing the file with a text editor.
      When checking changes back in to the
      repository, a user must first "update" after which an automatic
      process detects the changes made to a particular file and then
      sends these to the repository, which updates its database.
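    The update-and-conflict cycle described above can be sketched
      in a few lines of Python. This is a deliberately simplified
      illustration, not CVS's actual merge algorithm: it assumes
      that edits only replace lines in place (no insertions or
      deletions), and the function name merge_update and its
      arguments are hypothetical.

```python
def merge_update(base, mine, theirs):
    """Merge two edited copies of a file, line by line.

    base   -- lines of the version both users started from
    mine   -- lines of the local working copy
    theirs -- lines of the latest repository version
    Returns (merged_lines, conflict_indexes).
    Assumption: all three lists have the same length, i.e. users
    only rewrote existing lines.
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")

    merged, conflicts = [], []
    for index, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t:
            # Both copies agree: unchanged, or the same change twice.
            merged.append(m)
        elif m == b:
            # Only the other user changed this line: take their text.
            merged.append(t)
        elif t == b:
            # Only the local user changed this line: keep local text.
            merged.append(m)
        else:
            # Both changed the line differently: a conflict that the
            # user must resolve by hand. Local text is kept for now.
            conflicts.append(index)
            merged.append(m)
    return merged, conflicts


base = ["title", "intro", "body"]
clean_merge = merge_update(base,
                           ["title", "INTRO", "body"],
                           ["TITLE", "intro", "body"])
conflicting = merge_update(base,
                           ["title", "intro v2", "body"],
                           ["title", "intro v3", "body"])
```

    In the first call the two users touched different lines, so
      the merge succeeds without conflicts; in the second both
      rewrote the same line, so its index is reported for manual
      resolution.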
    CVS Advantages
    
      - Locking: CVS identified and eliminated the clear problem
	posed by locking when multiple files are being shared by
	multiple users.
- Flexible security: The CVS repository is accessible in a
	number of different ways, including via password-protected
	network channels and SSH-secured logins.  Moreover, read and
	write access can be controlled independently, although not
	granularly.
CVS Problems
    
      - Text files only: While CVS does handle non-text files, it
	does so by treating them as unitary data items and doesn't
	track differences within them at all.
- Line oriented: The algorithms that CVS uses to determine the
	differences between versions of a file are not exact and flag
	blocks of lines which have changed. This leaves the system
	very sensitive to changes in formatting or layout of
	structured text files.  It is thus almost completely useless
	for Web development if any of the users adopts editing tools
	which don't maintain "extraneous" text formatting information
	(e.g. virtually any structured XML or HTML
	editor).
- Not quite live: Since the notification methods that CVS uses
	are completely ad-hoc and asynchronous, it cannot be used for
	live collaboration at all.  Moreover, even within its
	asynchronous model, conflict management and workflow
	management remain big problems.
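    The "line oriented" problem above can be made concrete with a
      small sketch. It uses Python's difflib as a stand-in for a
      line-based differencing tool (it is not the algorithm CVS
      itself uses) and compares the result with a structural
      comparison of the parsed markup. The sample fragment and the
      helper names changed_line_count, structure and same_structure
      are assumptions made for illustration.

```python
import difflib
import xml.etree.ElementTree as ET

original = """<ul>
  <li>alpha</li>
  <li>beta</li>
</ul>
"""

# The same markup after a structured editor re-serialized it onto a
# single line: no content changed, only the layout.
reformatted = "<ul><li>alpha</li><li>beta</li></ul>\n"


def changed_line_count(old_text, new_text):
    """Count the lines a line-based comparison reports as changed."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines,
                                      autojunk=False)
    count = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            count += (i2 - i1) + (j2 - j1)
    return count


def structure(elem):
    """Reduce an element to tag, attributes, trimmed text, children."""
    return (elem.tag, dict(elem.attrib), (elem.text or "").strip(),
            [structure(child) for child in elem])


def same_structure(old_text, new_text):
    """True if both documents parse to the same element structure."""
    return (structure(ET.fromstring(old_text))
            == structure(ET.fromstring(new_text)))


line_changes = changed_line_count(original, reformatted)
structurally_equal = same_structure(original, reformatted)
```

    Here the line-based comparison reports every line as changed,
      while the structural comparison finds the two versions
      identical, which is the sensitivity to formatting described
      above.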
A New Model for Collaborative Work