Lee Iverson <email@example.com>
OHS Design Group
Last modified: Wed May 2 10:21:38 2001
Collaboration is a Basic Need
Mainstream Document Collaboration
One of the most fundamental barriers to collaboration between both
users and applications is the granularity of data access. File
systems organize information in directories of "files". Sharing
files is a simple process of placing them on a networked file
system or simply sending them to a group via electronic mail or
some other message passing system. This model assumes either that
files change rarely or that they have single authors or both.
Sometimes file sharing is augmented or replaced with email
What happens, however, when we try to use this shared file
and email approach to collaboratively create documents or
accumulate group knowledge? Quite simply, the result is a
difficult, cumbersome process. A few examples will illustrate some of the
Microsoft has made some effort to allow multiple users to
exchange and edit documents produced with Microsoft Word.
The sharing model is simple: a user sends (or makes available) a
copy of their Word document to another user who edits (or
reviews) it and sends it back. The modified Word document
contains a change history which may be navigated by the first
user (the change integrator) and individual changes made by the
other user may be approved or rejected via a GUI.
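The accept/reject step performed by the change integrator can be sketched in a few lines. This is only an illustration of the model described above, not how Word stores or applies its change history: the names `TrackedChange` and `integrate` are hypothetical, and the sketch assumes that changes are non-overlapping spans expressed as offsets into the original text.

```python
from dataclasses import dataclass

@dataclass
class TrackedChange:
    author: str
    start: int     # offset in the base text where the change begins
    end: int       # end offset of the replaced span (start == end for an insertion)
    new_text: str  # replacement text ("" for a deletion)

def integrate(base, changes, accepted):
    """Return the base text with only the accepted changes applied.

    `accepted` is a set of indexes into `changes`; everything else is
    treated as rejected. Assumes the changes do not overlap.
    """
    result = base
    # Apply from the end of the text backwards so earlier offsets stay valid.
    for i in sorted(accepted, key=lambda i: changes[i].start, reverse=True):
        change = changes[i]
        result = result[:change.start] + change.new_text + result[change.end:]
    return result

base = "The quick brown fox"
changes = [
    TrackedChange("reviewer", 4, 9, "slow"),    # "quick" -> "slow"
    TrackedChange("reviewer", 16, 19, "dog"),   # "fox" -> "dog"
]
# The integrator rejects the first change and approves the second.
print(integrate(base, changes, accepted={1}))
```

Note that the rejected change still travels with the list of changes, which is the same property that makes the "Hidden" problem below possible.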
- Integrated: The version tracking is completely integrated into
the document and application, so a user need learn no other
tools than the additional menu entries provided by the Version
- Hidden: Since all of the outstanding version history is
stored in the document itself, many users not familiar with
the details of the Version Control toolbox are unaware that
this content is still available in their file. This is a huge
information security hole and has been anecdotally responsible
for significant problems when "uncleaned" files are passed on
or delivered to outsiders.
- Bloat: There are two ways in which file systems can become
bloated with these "fat" Word files. The first has to do with
the fact that Word documents (especially single user
documents) can become quite large even when their immediately
visible content is fairly small, simply because of the change
history being carried around. The second has to do with people's
email folders, which often fill up with many versions of the
same set of "shared" Word documents as they are passed around
for editing or for integration of contributions. Since one
often wants to retain a record of the reasons for changes,
these emails are often kept in mail folders to provide a trace
of the document history. This behaviour, which the Word
documentation explicitly recommends, can lead to an enormous waste
of file system resources on Word attachments in mail folders.
- Asynchronous: Clearly the model does not allow for live
document sharing between users, with an email exchange or
shared file system store required as part of any exchange of
- Ad Hoc: Clearly people-management skills must be an integral
part of any attempt to use Word as a basis for collaboration
as it provides no support for anything other than a single
user as change integrator. It does provide some ability to
enforce this strategy by allowing that user to "lock" the
document and require others to submit "review" comments
instead of actual edits, but that can quickly become
cumbersome as well.
The Concurrent Versions System (CVS) was developed
as a response to problems perceived by distributed groups of
developers when using the Revision Control System (RCS) for
collaborative software development (using plain text files).
RCS maintained the integrity of its version control by requiring
users to check out
files in the repository, modify them and then check them back
in. While a file was checked out by one user, it was
unavailable for modification by other developers (locked). In
many kinds of scenarios, especially software development, a
single developer may need to change many files in concert before
a new consistent (and thus check-in-able) state is achieved.
Locking all of these files while this one developer is
attempting to attain consistency is untenable as soon as more
than two or three developers are working on a single project.
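The locking discipline is easy to model, and the model makes the bottleneck visible. The following is a minimal sketch rather than a description of RCS internals; `LockingRepository`, `check_out` and `check_in` are hypothetical names chosen for illustration.

```python
class LockingRepository:
    """Toy model of lock-based version control: one writer per file."""

    def __init__(self, files):
        self.files = dict(files)  # filename -> current content
        self.locks = {}           # filename -> user holding the lock

    def check_out(self, user, name):
        holder = self.locks.get(name)
        if holder is not None and holder != user:
            raise PermissionError(f"{name} is locked by {holder}")
        self.locks[name] = user
        return self.files[name]

    def check_in(self, user, name, content):
        if self.locks.get(name) != user:
            raise PermissionError(f"{user} does not hold the lock on {name}")
        self.files[name] = content
        del self.locks[name]  # release the lock for other developers

repo = LockingRepository({"parser.c": "v1", "lexer.c": "v1"})

# Alice needs both files to reach a consistent state, so she locks both.
repo.check_out("alice", "parser.c")
repo.check_out("alice", "lexer.c")

# Bob only wants a one-line fix in lexer.c, but has to wait for Alice.
try:
    repo.check_out("bob", "lexer.c")
    bob_blocked = False
except PermissionError:
    bob_blocked = True
print(bob_blocked)
```

Every additional file in Alice's change set is one more file that nobody else can touch until she has finished.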
CVS solves this problem by removing the locking requirement and
allowing multiple users to all be editing the same set of files
simultaneously. Users maintain local versions of the repository
and make changes as they wish. A user occasionally "updates"
his local copy by asking CVS to apply those changes made by
other users to his local copy. This may lead to conflicts, in
which a section of a file which the user has modified has also
been modified by someone else. It is then up to the user to
resolve these conflicts by editing the file in a text editor.
When checking changes back in to the
repository, a user must first "update" after which an automatic
process detects the changes made to a particular file and then
sends these to the repository, which updates its database.
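The "update" step amounts to a three-way merge between the common ancestor, the user's local copy and the latest repository version. The sketch below is a simplified line-based merge in the spirit of diff3, built on Python's standard `difflib`; it is not the code CVS actually runs, and the function names `merge3` and `_matches` as well as the conflict marker labels are assumptions made for illustration.

```python
import difflib

def _matches(base, other):
    """Map each base line index to the index of its match in `other`."""
    matcher = difflib.SequenceMatcher(None, base, other, autojunk=False)
    mapping = {}
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            mapping[block.a + k] = block.b + k
    return mapping

def merge3(base, mine, theirs):
    """Merge two edited copies of `base` (all lists of lines).

    Returns (merged_lines, number_of_conflicts).
    """
    in_mine, in_theirs = _matches(base, mine), _matches(base, theirs)
    merged, conflicts = [], 0
    i = a = b = 0  # start of the current chunk in base, mine, theirs
    for s in range(len(base) + 1):
        if s == len(base):                    # end of file closes the last chunk
            sa, sb = len(mine), len(theirs)
        elif s in in_mine and s in in_theirs:  # line untouched by both sides
            sa, sb = in_mine[s], in_theirs[s]
        else:
            continue
        base_part, mine_part, theirs_part = base[i:s], mine[a:sa], theirs[b:sb]
        if mine_part == theirs_part or theirs_part == base_part:
            merged += mine_part               # only I changed it, or we agree
        elif mine_part == base_part:
            merged += theirs_part             # only the other user changed it
        else:
            conflicts += 1                    # both changed it, differently
            merged += ["<<<<<<< mine", *mine_part,
                       "=======", *theirs_part, ">>>>>>> theirs"]
        if s < len(base):
            merged.append(base[s])
        i, a, b = s + 1, sa + 1, sb + 1
    return merged, conflicts

base = ["a", "b", "c", "d"]
clean, n_clean = merge3(base, ["a", "B", "c", "d"], ["a", "b", "c", "D"])
clash, n_clash = merge3(base, ["a", "X", "c", "d"], ["a", "Y", "c", "d"])
print(clean, n_clean)
print(clash, n_clash)
```

Edits to different regions combine silently, while edits to the same region are handed back to the user between conflict markers, which is the manual resolution step described above.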
- Locking: CVS identified the clear problem posed by locking
when multiple files are being shared by multiple users.
- Flexible security: The CVS repository is accessible in a
number of different ways, including via password-protected
network channels and SSH-secured logins. Moreover, read and
write access can be controlled independently, although not
- Text files only: While CVS does handle non-text files, it
does so by treating them as unitary data items and doesn't track
differences at all.
- Line oriented: The algorithms that CVS uses to determine the
differences between versions of a file are not exact and flag
blocks of lines which have changed. This leaves the system
very sensitive to changes in formatting or layout of
structured text files. It is thus almost completely useless
for Web development if any of the users adopts editing tools
which don't maintain "extraneous" text formatting information
(e.g. virtually any structured XML or HTML
- Not quite live: Since the notification methods that CVS uses
are completely ad-hoc and asynchronous, it cannot be used for
live collaboration at all. Moreover, even within its
asynchronous model, conflict management and workflow
management remain big problems.
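The sensitivity of line-oriented differencing to layout is easy to demonstrate. The sketch below compares two serializations of the same small XML document, once line by line with Python's standard `difflib` and once after parsing. The helper names `changed_lines` and `structure` are hypothetical, and the structural comparison assumes that whitespace around element text is not significant, which does not hold for every document type.

```python
import difflib
import xml.etree.ElementTree as ET

original = """<list>
  <item>alpha</item>
  <item>beta</item>
</list>
"""
# The same document after an editor has discarded the "extraneous" layout.
reformatted = "<list><item>alpha</item><item>beta</item></list>\n"

def changed_lines(old, new):
    """Count the lines a line-oriented diff reports as added or removed."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return sum(1 for line in diff if line.startswith(("+ ", "- ")))

def structure(text):
    """Reduce XML to nested (tag, text, children) tuples, ignoring layout."""
    def walk(element):
        return (element.tag,
                (element.text or "").strip(),
                tuple(walk(child) for child in element))
    return walk(ET.fromstring(text))

print(changed_lines(original, reformatted))              # every line differs
print(structure(original) == structure(reformatted))     # same content
```

A line-based tool sees a wholesale rewrite here, so any real edit made at the same time is buried in the noise and is likely to conflict with every concurrent change to the file.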
A New Model for Collaborative Work