December 11, 2008

Google research beyond LSI

Google picked up Amit Gruber, who is doing an internship with them. He's pretty valuable because of his PhD research in statistical text analysis (which is what LSI is). His method uses Hidden Topic Markov Models (HTMM), and a working version was released in 2007.

In this post Google mentions PLSI (probabilistic latent semantic indexing) and also latent Dirichlet allocation (LDA) as examples of variants of LSI.

HTMM is different because instead of treating the document as a bag of words, it uses a temporal Markov structure, so the order in which words appear matters.
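
To make the bag-of-words point concrete, here is a tiny Python sketch. It is not HTMM or anything from Google's release, just a toy illustration under my own assumptions: the function names `bag_of_words` and `transition_counts` and the example sentence are made up for this post. It shows that reordering a document leaves its bag of words unchanged, while even the simplest order-aware view (counts of adjacent word pairs, a first-order Markov view) changes.

```python
from collections import Counter


def bag_of_words(tokens):
    """Word counts only; the position of each word is discarded."""
    return Counter(tokens)


def transition_counts(tokens):
    """Counts of adjacent (previous, next) word pairs.

    This is a first-order Markov view of the text, used here only as a
    stand-in for "a model that cares about order". It is not HTMM.
    """
    return Counter(zip(tokens, tokens[1:]))


doc = "the bank raised rates so the river bank flooded".split()
shuffled = sorted(doc)  # same words, different order

same_bag = bag_of_words(doc) == bag_of_words(shuffled)
same_transitions = transition_counts(doc) == transition_counts(shuffled)

print(same_bag, same_transitions)  # prints: True False
```

A bag-of-words model cannot tell the original sentence from the shuffled one, whereas anything built on word order can.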

Read the Google post here, and OpenHTMM is available here. Good old Google, thanks for sharing.

This supports my earlier post arguing that LSI in its very basic form, as summarized in various places including the excellent Wikipedia, is not the variety used at Google, whatever Matt Cutts says. Yes, it is used, but he doesn't give away the important information; what he presents is a very, very basic version. It's like saying "Yes, we use glue in our computer chips" or "Yes, here at NASA we use glue as an adhesive for our rockets". It's unlikely to be the glue your child uses at playschool :)

Creative Commons License
Science for SEO by Marie-Claire Jenkins is licensed under a Creative Commons Attribution-Non-Commercial-No Derivative Works 2.0 UK: England & Wales License.
Based on a work at