January 06, 2009

Document clustering - a short intro

Clustering is really important in any system that deals with information. Information retrieval systems like digital libraries and search engines use it to group documents into clusters, where each cluster holds documents that are similar to one another.
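To make the idea concrete, here is a minimal sketch of grouping documents by similarity. It is not taken from the presentation. It assumes plain bag-of-words vectors, cosine similarity, and a simple single-pass grouping, and the function names, the 0.3 threshold and the toy documents are all made up for illustration. Real systems use far more refined methods.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a document into a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(docs, threshold=0.3):
    """Single-pass clustering (illustrative only).

    Each document joins the first cluster whose seed (first member) is
    similar enough; otherwise it starts a new cluster.
    The threshold is an arbitrary assumption.
    """
    vectors = [vectorize(d) for d in docs]
    clusters = []  # each cluster is a list of document indices
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([i])
    return clusters


docs = [
    "search engines rank web pages",
    "web search engines index pages",
    "cats sleep most of the day",
    "cats sleep all day",
]
print(cluster(docs))  # [[0, 1], [2, 3]]
```

Raising the threshold splits the collection into more, tighter clusters, and lowering it merges them, which is one reason the same documents can be grouped quite differently depending on the settings.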

This can get really complex very quickly, and there are loads of different clustering methods, applied at different stages, to produce sufficiently accurate results. Here I'm sharing a presentation on the topic which isn't too involved and is quite high level. There is some maths, but you can skip it if you like; you won't lose out on much.

Enjoy :)


Chris McGiffen said...

This presentation gives a good overview for understanding clustering, although I am wondering if any SEOs actually use clustering in their work.

At MediaCo we have looked at using it in query space analysis to some effect and content analysis to little effect (structural issues generally override any recommendations we come up with from clustering). However, as your presentation shows, there are many ways to skin a cat (or, indeed, cluster) and perhaps the effectiveness can yet be improved upon :)

So, if I may ask, do any other SEOs make use of clustering?

Chris McG

CJ said...

Hi Chris,

I've used clustering on big sites, with topic detection, and I made it more sensitive so I could see how the system thought it was split up. This sometimes sheds light on how your site content is spread and how it actually fits together.

Assumption is not my friend!

Alex said...

Hi CJ.
That's a really good and clean overview with the right level of complexity.

Things an SEO can spot in a (large) website thanks to clustering:
- topic not well covered
- topic covered in too many pages
- similar pages/topics that compete against each other for certain queries
- others..?

My 2 cents :-)

CJ said...

Nice one Alex. I like your blog BTW :)

Alex said...

Thanks :-)
So you understand Italian?

Creative Commons License
Science for SEO by Marie-Claire Jenkins is licensed under a Creative Commons Attribution-Non-Commercial-No Derivative Works 2.0 UK: England & Wales License.
Based on a work at