October 13, 2008

Cognition - a short interview

I've been playing with the Cognition search engine for a while now, and I also sent the link on to some colleagues, including my friend Dan, who is a proper algorithm geek like I am. Dr Kathleen Dahlgren from Cognition answered some questions for us; here they are:

- How does Cognition feel about personalised search?

Personalized search can be augmented when the search engine understands language and can automatically see relationships that are opaque to pattern-matchers.  For example, if a person is interested in rhythm and blues, they are also interested in R&B, and probably blues as well.  But not blues meaning a bad mood.  These subtleties are all handled by Cognition.

- Are there plans for a multilingual solution?

There are plans.  The semantic map is relevant in all languages; it is universal.  But linguists need to tie concepts to the words of other languages.

- How are the ontologies constructed?

Originally they were constructed by hand.  Currently Cognition adds digitized ontologies automatically.

- Cognition claims that no other NLP technology comes close in breadth and depth of understanding of English... how so?

The closest semantic map, WordNet, has 2.5 times fewer word stems and 20 times less semantic information.

- What exactly is meant by the "context" of the text they are processing?

The context is the other words in a sentence.  So in “strike a match”, “strike” means “ignite” and “match” means “phosphorus-tipped stick”.  But in “striking workers”, “strike” means “walkout”.
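
To make the idea of context concrete, here is a toy Python sketch that picks a sense for "strike" by counting how many cue words for each sense appear among the other words of the sentence. This is a simplified overlap heuristic of my own, not Cognition's method; the `SENSES` table, its cue words and the `pick_sense` function are all made up for illustration.

```python
# Hypothetical sense inventory: each sense lists context words that hint at it.
SENSES = {
    "strike": {
        "ignite": {"match", "lighter", "flint"},
        "walkout": {"workers", "union", "pay"},
    },
}


def pick_sense(word, sentence):
    """Return the sense of `word` whose cue words overlap most with the
    words of the sentence, or None if no cue word appears at all."""
    context = {w.strip(".,").lower() for w in sentence.split()}
    best, best_overlap = None, 0
    for sense, cues in SENSES.get(word, {}).items():
        overlap = len(cues & context)
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best


print(pick_sense("strike", "strike a match"))    # ignite
print(pick_sense("strike", "striking workers"))  # walkout
```

A real system would need a far larger sense inventory and proper stemming; the point here is only that the surrounding words decide which meaning wins.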

- What metrics are used to measure the quality of the engine?

We have many different metrics and regression tests.  Our main method is to index identical content with another search engine, produce 50 typical queries, and test them for relevance using the two search engines.  Recall is measured as relative recall, lacking a gold standard in which all documents have been inspected.  In relative recall, the total of relevant search results by the two search engines is counted as full recall.  In such tests, Cognition always performs with over 90% precision and recall.  Google, for example, in 3 such tests had 20% precision and 20% recall.
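
The relative recall described above can be sketched in a few lines of Python. This is only my reading of the method, not Cognition's actual test harness: the function name, the document IDs and the relevance judgments below are invented for the example.

```python
def relative_metrics(results_a, results_b, relevant):
    """Precision and relative recall for two engines on one query.

    results_a, results_b: sets of document IDs returned by each engine.
    relevant: set of document IDs judged relevant by a human assessor.
    """
    # Relevant documents found by either engine. This pool stands in for
    # "full recall" because the whole collection is never inspected.
    pool = (results_a | results_b) & relevant

    def score(results):
        hits = results & relevant
        precision = len(hits) / len(results) if results else 0.0
        recall = len(hits) / len(pool) if pool else 0.0
        return precision, recall

    return score(results_a), score(results_b)


# Hypothetical results and judgments for a single query.
engine_a = {"d1", "d2", "d3", "d4"}
engine_b = {"d3", "d5", "d6", "d7", "d8"}
judged_relevant = {"d1", "d2", "d3", "d5"}

(a_p, a_r), (b_p, b_r) = relative_metrics(engine_a, engine_b, judged_relevant)
print(a_p, a_r)  # 0.75 0.75
print(b_p, b_r)  # 0.4 0.5
```

Averaging these per-query figures over the 50 test queries would give one precision and one recall number per engine. Note that relative recall can only overstate true recall, since relevant documents that neither engine returned are never counted.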

- What exactly is meant by a "phrase" in the stat database?

A phrase is a frequently-occurring set of terms that are always juxtaposed, such as The Bill of Rights, U.S. Congress, United Airlines, or Securities and Exchange Commission.  
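
As a rough illustration of treating such phrases as single units, the Python sketch below merges the longest known phrase found at each position in a list of tokens. The tiny `PHRASES` lexicon and the `group_phrases` function are my own assumptions for the example; nothing here reflects how Cognition actually stores or matches its phrases.

```python
# Hypothetical phrase lexicon, stored as tuples of lowercase tokens.
PHRASES = {
    ("bill", "of", "rights"),
    ("u.s.", "congress"),
    ("united", "airlines"),
    ("securities", "and", "exchange", "commission"),
}
MAX_LEN = max(len(p) for p in PHRASES)


def group_phrases(tokens):
    """Greedily merge the longest known phrase starting at each position."""
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate window first, down to two tokens.
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            window = tuple(t.lower() for t in tokens[i:i + n])
            if window in PHRASES:
                out.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No phrase starts here; keep the single token.
            out.append(tokens[i])
            i += 1
    return out


sentence = "United Airlines sued the Securities and Exchange Commission"
print(group_phrases(sentence.split()))
# ['United Airlines', 'sued', 'the', 'Securities and Exchange Commission']
```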

- Are there prebuilt macros for common phrases?

Yes – 200,000 of them.

It's really a very interesting system to use, and I reckon it'll improve by leaps and bounds in the future as well. We will be playing with this a great deal and I'll blog about it again, so watch this space!

Creative Commons License
Science for SEO by Marie-Claire Jenkins is licensed under a Creative Commons Attribution-Non-Commercial-No Derivative Works 2.0 UK: England & Wales License.