August 08, 2008

Cuil vs Powerset

There's been an awful lot of talk about Cuil, the alternative search engine to Google that ex-Googlers ("Xooglers") launched recently. There hasn't been much noise about Powerset (founded in 2005), which was recently acquired by Microsoft. It's important to note that Powerset's engine only runs on Wikipedia for now.

The guys from Powerset get to join Microsoft's core Search Relevance team. Powerset is a "semantic" search engine, meaning that it moves towards intent-based search. The engine does not judge the importance of documents by links but rather by content, using dictionaries, thesauri, syntax, sentence structure, and a whole host of other NLP tools to extract meaning.
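To make the contrast with link-based ranking a bit more concrete, here is a minimal Python sketch of scoring documents purely on their content, with a tiny hand-made thesaurus standing in for the dictionaries and thesauri mentioned above. Everything in it (the THESAURUS table, expand_query, content_score, the two sample documents) is made up for illustration; it is not how Powerset actually works, just the general idea under very simple assumptions.

```python
import re

# Hypothetical toy thesaurus; a real engine would use far richer resources.
THESAURUS = {
    "make": {"bake", "prepare"},
    "cake": {"gateau", "sponge"},
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def expand_query(query):
    """Return the query terms plus any synonyms listed in the toy thesaurus."""
    terms = set(tokenize(query))
    for term in list(terms):
        terms |= THESAURUS.get(term, set())
    return terms

def content_score(query, document):
    """Fraction of document tokens that match the expanded query (no link data used)."""
    tokens = tokenize(document)
    if not tokens:
        return 0.0
    terms = expand_query(query)
    return sum(1 for t in tokens if t in terms) / len(tokens)

# Illustrative documents only.
docs = {
    "recipe": "Bake the sponge for thirty minutes, then ice the gateau.",
    "band": "The band released a new album last year.",
}
ranked = sorted(docs, key=lambda name: content_score("make cake", docs[name]), reverse=True)
print(ranked)
```

Note that the recipe page never contains the words "make" or "cake", so a plain keyword match would miss it; it only ranks first here because of the synonym expansion.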

Powerset people state: "Powerset is first applying its natural language processing to search, aiming to improve the way we find information by unlocking the meaning encoded in ordinary human language."

Cuil owners state that it is a "contextual" engine. They also state: "When we find a page with your keywords, we stay on that page and analyze the rest of its content, its concepts, their inter-relationships and the page’s coherency."

Sounds pretty similar, doesn't it?

What's the difference?

Well, Powerset allows you to use natural language queries (phrased the way you would in normal conversation), and it also aggregates information across multiple articles.

The interface is pretty cool, and very different from Google, or Cuil for that matter. You type in a query (cake making) and you get a list of Wikipedia results. There is a little drop-down button to the left of each result which, when pressed, displays the actual text (and images if you want) of that result beneath it. I don't have to go to the actual website to see the info. I can also click on an outline of the article to jump to the part of the document that I want.

There are also links at the bottom suggesting related searches, and lo and behold, they are all relevant.

It doesn't do so well with natural language queries, which I am not surprised about. No one, as far as I know, has managed to do a properly good job in this problem area yet. It requires natural language understanding, which we haven't found a good solution for yet. It isn't rubbish though: I tried "How do I make cake" and I did get some instructions as the first result. Of course "Cake" is also a band, and there was no disambiguation. The very relevant links at the bottom do help a great deal though.

Cuil is a lot more traditional in that it gives you a number of individual documents to look at. They don't appear to be ranked in any particular order, and they are laid out in columns. "How do I make cake" returns nothing, because Cuil doesn't support natural language querying. I find that a bit concerning, because this is something that should eventually become the norm, and there is loads of research in this area. So I enter "cake making", and for this I get a mixed bag of results. A few are relevant and one is perfect, but the others are a bit of a mess. I don't get any related searches suggested, which limits my options.

I do see Powerset behaving in a way that suggests they really are working hard to "unlock meaning": it suggests a load of related things, and the results are varied enough for me but still very relevant for the most part. Cuil, however, doesn't seem to be doing anything apart from throwing out a bag of results ranging from relevant to totally not, and it doesn't help me to get better results.

Powerset is the superior engine, in my opinion. Cuil, to me, is in need of a trip back to the drawing board. One very important thing to note, however, is that Powerset is working in a "closed domain", which is Wikipedia. Cuil apparently has 3 times more documents than Google. That's a really big index. In IR, people mostly try to keep the index small because of cost but also performance. With an index that big, I think you'd need to be pretty confident that your super tough and clever algorithm can handle all the intricacies of language and all of the analysis of individual documents, as well as clusters of documents.

What I really like about Powerset is that it doesn't actually take you to the webpage unless you request it. You can read the info from the results page. How does this affect SEO? Content, content, content... I think there are going to be more and more openings for writers and the like in the future. Optimisation becomes based not around individual pages but rather around your entire site, which is checked for relevancy. If only one of your pages is about "cake" and the others are about "candles", I don't think you'd be returned for "cake" even if that particular page was really well optimised. You might be returned for "birthdays" though, if you see what I mean.
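The cake and candles idea can be sketched in a few lines of Python. This is only my guess at what site-wide relevancy could look like, assuming the engine simply averages a per-page term score across the whole site; the functions page_score and site_score and the sample pages are hypothetical, and neither Powerset nor Cuil has said they do this.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def page_score(term, page_text):
    """Share of a page's tokens that are exactly the query term."""
    tokens = tokenize(page_text)
    return tokens.count(term) / len(tokens) if tokens else 0.0

def site_score(term, pages):
    """Average the page scores, so off-topic pages dilute the whole site."""
    if not pages:
        return 0.0
    return sum(page_score(term, p) for p in pages) / len(pages)

# Illustrative sites only: one heavily optimised cake page among candle pages,
# versus a site that is about cake throughout.
mixed_site = [
    "cake cake recipe cake",
    "candles wax wick candles",
    "scented candles candles gift",
]
cake_site = [
    "cake cake recipe cake",
    "chocolate cake icing cake",
    "cake tins and cake stands",
]
print(site_score("cake", mixed_site))
print(site_score("cake", cake_site))
```

Both sites contain the same "really optimised" cake page, but under this assumption the mixed site scores lower for "cake" because the candle pages drag its average down.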

Creative Commons License
Science for SEO by Marie-Claire Jenkins is licensed under a Creative Commons Attribution-Non-Commercial-No Derivative Works 2.0 UK: England & Wales License.
Based on a work at scienceforseo.blogspot.com.