
October 21, 2008

Internet progresses fast - CS slow

I wrote a rather long post at High Rankings and decided it deserved a place on the blog.  Randy made a very good point about how fast the Internet moves.  There are new developments almost daily; new systems and new ways of doing things emerge, and we all keep up with the new trends and algorithms.  It's true that computer science research that is four years old (or even older!) isn't as current.

This is because it takes ages for a lot of methods to be evaluated properly before they can safely be used in public systems like search engines or social networks.  Some systems aren't designed to use certain methods, and only after they have gone through many iterations do their builders suddenly see the need to incorporate a particular method, or even a few.

Stemming, for example, is quite old: it goes back to 1968, when the Lovins stemmer was published.  Google, I believe (though I'm not totally sure of the exact date), applied stemming to queries in 2003.  That's 35 years!  I think they were already using it in the internal system though; it's a pretty standard method in IR after all.  I wrote a stemmer in 2005 and it only started being used in 2007.  Not many people saw any use for a stemmer that stemmed to exact words, but now that's pretty standard too.  That took two years.
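The basic idea is easy to sketch. Here's a toy longest-match suffix stripper in Python, in the spirit of Lovins: note the suffix list and the minimum stem length are my own illustrative choices, not the real Lovins table, which is far larger and also applies recoding rules to the stripped stem.

```python
# Toy longest-match suffix stripper (illustrative suffix list, not Lovins's).
SUFFIXES = sorted(
    ["ational", "iveness", "fulness", "ation", "ing", "ness", "ers", "ed", "es", "s"],
    key=len, reverse=True,  # try the longest suffixes first
)

def stem(word, min_stem=3):
    """Strip the longest matching suffix, keeping at least min_stem characters."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= min_stem:
            return word[: -len(suf)]
    return word
```

So stem("walking") gives "walk" and stem("effectiveness") gives "effect". Real stemmers add many more rules precisely because naive stripping conflates unrelated words, which is one reason deploying them in a live query pipeline took so long.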

PageRank came about in 1995 and was implemented when Google was publicly released in 1998: three years.
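The core of PageRank fits in a few lines of Python. This is a minimal power-iteration sketch over a made-up three-page link graph, using the damping factor 0.85 from the original paper; a production implementation would work on sparse matrices over billions of pages.

```python
# Minimal PageRank by power iteration on a toy link graph.
def pagerank(links, d=0.85, iters=50):
    """links: {page: [pages it links to]}. Returns ranks summing to ~1."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start uniform
    for _ in range(iters):
        new = {p: (1 - d) / n for p in pages}  # random-jump share
        for p, outs in links.items():
            if outs:
                share = rank[p] / len(outs)  # split rank over outlinks
                for q in outs:
                    new[q] += d * share
            else:  # dangling page: spread its rank evenly
                for q in pages:
                    new[q] += d * rank[p] / n
        rank = new
    return rank

ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```

In this toy graph, "c" ends up with the highest rank because it is linked from both "a" and "b", which is exactly the intuition: a page is important if important pages link to it.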

I work in conversational systems, and it has taken a while for the scientific community and the industry to see why they could be useful.  There's now a lot of research in this area, yet the first chatbot, ELIZA, was invented in 1966.  It's only recently that companies have started using chatbots on their websites (IKEA, for example), and suddenly the potential for such systems in IR is being realised.  Long wait!  We don't even have all the technology needed yet to make something really good.

I think it's really important for the SEO community to keep track of papers released by IR researchers, and by NLP/AI researchers too when the work relates to search engines in particular.  It's useful to learn about the methods being developed, and it gives some insight into how they might be implemented (although this could take some time!).  You can use CiteSeer or DBLP to find them, and checking the references can be useful too.  That's where my massive reading list comes from!

Of course, some methods do get implemented quite quickly, and I think this happens when they are built specifically for a system under active development.  The big search engines have people working solely on this, as do companies like IBM.  What I mean is that you shouldn't discount methods that were published a few years ago.  A lot of the social media work was published quite some years ago too.

Happy reading :)


Anonymous said...

Check out this Web 2.0 approach to chatbots:

Just as Deep Blue brute-forced chess with sheer speed, the idea behind the Chatbot Game is to brute-force conversation with a huge number of user-submitted Google-like chat rules.

CJ said...

Hi Anon,

Thanks for the link!

Yes, Deep Blue used brute force, and most chatbots do too, but there is plenty of other stuff going on.

Conversational agent research is not necessarily related to chatbots; interestingly, it's also Q&A and IR. I have seen some very cool systems that are definitely not using brute-force AI :)

Yey for progress!

Creative Commons License
Science for SEO by Marie-Claire Jenkins is licensed under a Creative Commons Attribution-Non-Commercial-No Derivative Works 2.0 UK: England & Wales License.