September 05, 2008

Relating Documents via User Activity

Elin Pedersen (Google) and David McDonald (University of Washington) wrote an interesting paper entitled "Relating Documents via User Activity: The Missing Link". The research was "carried out as part of a project in the Office of the CTO, Microsoft".

The Abstract:

"In this paper we describe a system for creating and exposing relationships between documents: a user’s interaction with digital objects (like documents) is interpreted as links – to be discovered and maintained by the system. Such relationships are created automatically, requiring no priming by the user. Using a very simple set of heuristics, we demonstrate the uniquely useful relationships that can be established between documents that have been touched by the user. Furthermore, this mechanism for relationship building is media agnostic, thus discovering relationships that would not be found by conventional content based approaches. We describe a proof-of-concept implementation of this basic idea and discuss a couple of natural expansions of the scope of user activity monitoring."

They use a system called Ivan, which monitors a user's activity during a task, taking particular note of when documents are on the screen together, when the user switches between them, or when the user performs other manipulations such as cutting and pasting from one document to another. Ivan helps the user find clusters of documents that were repeatedly used at the same time, and it also finds relationships between documents. The inventors describe it as a mixture of a recommendation system like Amazon's and Google's page ranking. Ivan captures user activity and then builds relationships.

Activity is captured through message spying, and they focus on symmetrical relationships between pairs of documents. A relationship is established when one document is opened while another is already open. When a user performs actions between those documents, the relationship is strengthened.
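
To make the heuristic concrete, here is a minimal sketch in Python of how such symmetric pairwise relationships might be tracked. This is my own illustration, not Ivan's actual implementation: the class name, method names, and the choice of adding 1 per event are all assumptions, and the paper may weight events quite differently.

```python
from collections import defaultdict


class RelationshipGraph:
    """Hypothetical tracker for symmetric document relationships."""

    def __init__(self):
        self.open_docs = set()
        # Keys are unordered pairs, so (a, b) and (b, a) share one score.
        self.strength = defaultdict(int)

    @staticmethod
    def _pair(a, b):
        return frozenset((a, b))

    def open(self, doc):
        """Opening a document relates it to every document already open."""
        if doc in self.open_docs:
            return
        for other in self.open_docs:
            self.strength[self._pair(doc, other)] += 1
        self.open_docs.add(doc)

    def close(self, doc):
        self.open_docs.discard(doc)

    def interact(self, a, b, weight=1):
        """An action between two documents (a switch, a cut and paste)
        strengthens their relationship. The weight is an assumed value."""
        if a != b:
            self.strength[self._pair(a, b)] += weight

    def related(self, doc):
        """Return (other_document, strength) pairs, strongest first."""
        scores = {}
        for pair, score in self.strength.items():
            if doc in pair:
                (other,) = pair - {doc}
                scores[other] = score
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


graph = RelationshipGraph()
graph.open("report.doc")
graph.open("chart.png")   # related to report.doc
graph.open("notes.txt")   # related to report.doc and chart.png
graph.interact("report.doc", "chart.png")  # e.g. a paste between the two
print(graph.related("report.doc"))
# [('chart.png', 2), ('notes.txt', 1)]
```

Note that nothing here looks at file contents, which is why an image and a text document can end up related just as easily as two text documents.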

They had problems matching file system events with window events, and also found it a challenge to get a reliable and non-invasive stream of user interaction events. They've decided to look at web apps instead of the desktop, as they feel that the desktop will in time become redundant.

It's interesting work, as it touches on another way of discovering relevant documents. User interaction hasn't been used enough, I think, and this is a nice piece of work going in that direction. Not all documents are text only, so an algorithm based on activity rather than on content alone may be very effective.

