
Showing posts with label rankings. Show all posts

January 15, 2009

Search Engine Result Evaluation

Search engines are often evaluated using information retrieval metrics such as precision and recall.  These are effective measures in classic IR systems but less so for search engines, because high precision isn't necessarily a good measure of user satisfaction.  The quality of the resources is of course a factor, but what users class as authoritative varies.
This really shows that results are personal to each user: we're not looking for the same things every time, and when we are, maybe not for the same reasons.  This is why personalisation is a good solution, but that's a topic for another day.
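As a quick refresher, here is a minimal sketch of precision and recall; the result list and relevance judgements below are invented examples:

```python
def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved documents that are relevant.
    Recall: fraction of all relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: the engine returns 10 documents, 4 of them
# relevant, out of 8 relevant documents in the whole collection.
p, r = precision_recall(range(10), [0, 2, 5, 7, 11, 12, 13, 14])
# p = 4/10 = 0.4, r = 4/8 = 0.5
```

Both numbers can be perfect and the user still unhappy, which is exactly the limitation described above.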

Queries can usually be classified as navigational or informational, and this also affects the evaluation of the search engine.  Informational queries are the hardest because you're looking for a set of relevant documents, but the query usually isn't rich enough to establish exactly what is needed.  Navigational queries, such as looking for the Sofitel in Bangkok, are much easier because they're more exact.

You can use human evaluators or automated methods to check how good the results are.  Human evaluators are of course biased towards their own motivations, which has in the past been shown to make results vary widely.  Automated testing isn't biased, of course, the machine doesn't care, but it isn't always very representative of how humans actually search.  Google uses human evaluators and also live traffic experiments.

Here I'll introduce a few papers you might find interesting on the subject.  I've chosen a bit of a mixture but of course there are many more ways to do this.

"Search Engine Ranking Efficiency Evaluation Tool" by Alhalabi, Kubat and Tapia from the University of Miami.

They also note that precision and recall don't take ranking quality into consideration.  They propose SEREET (Search Engine Ranking Efficiency Evaluation Tool).
 
They compare a known correctly ordered list to the search engine's ranking.  The method starts at 100 points and deducts points each time a relevant document is missing from the search engine's rankings, and also each time an irrelevant document is returned.  The deduction is basically (the number of misses / RankLength) x 100, where RankLength is the number of links in the ranked list.  They found the measure more sensitive to change and efficient in space and time.
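From that description, the score can be sketched like this; note this is my simplified reading of the paper, not the authors' code, and the example lists are invented:

```python
def sereet_score(engine_ranking, correct_ranking):
    """SEREET as described above: start from 100 and deduct for each
    'miss', i.e. a relevant document absent from the engine's list or
    an irrelevant document present in it.
    Score = 100 - (misses / RankLength) * 100."""
    relevant = set(correct_ranking)
    returned = set(engine_ranking)
    misses = len(relevant - returned) + len(returned - relevant)
    return 100.0 - (misses / len(engine_ranking)) * 100.0

# Hypothetical: the engine returns [a, b, x, c] but the known correct
# list is [a, b, c, d].  Misses: 'd' absent (1) + 'x' irrelevant (1)
# = 2; RankLength = 4, so the score is 100 - (2/4)*100 = 50.
score = sereet_score(["a", "b", "x", "c"], ["a", "b", "c", "d"])
```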

"Automatic Search Engine Performance Evaluation with Click-through Data Analysis" by Liu, Fu, Zhang, Ru from Tsinghua University.

They note that human evaluation is too time-consuming to be an efficient method of evaluation.  Their click-through data analysis method lets them evaluate automatically: navigational-type queries, query topics and answers are generated by the system from user query and click behaviour.  They found that they got similar results to those of human evaluators.
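A heuristic in the spirit of their approach (not their exact algorithm; the threshold and the click log below are invented) is to treat a query as navigational when its clicks concentrate on one URL:

```python
from collections import Counter

def looks_navigational(clicked_urls, threshold=0.8):
    """If most clicks for a query land on a single URL, treat the
    query as navigational and that URL as its presumed answer.
    Returns the URL, or None for spread-out (likely informational)
    click patterns.  Threshold is an invented illustration value."""
    if not clicked_urls:
        return None
    url, top = Counter(clicked_urls).most_common(1)[0]
    if top / len(clicked_urls) >= threshold:
        return url
    return None

# Hypothetical click log for the query "sofitel bangkok":
clicks = ["sofitel.com"] * 9 + ["tripadvisor.com"]
# 90% of clicks on one URL, so the query reads as navigational.
```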


They also looked at "user-effort-sensitive evaluation measures", namely search length, rank correlation and first 20 full precision.  They say this is better because it focuses on the quality of the ranking, and found the three measures to be consistent overall.  "Search length" is the number of non-relevant documents the user has to sift through, "rank correlation" compares the user's ranking to the search engine's ranking, and "first 20 full precision" is the ratio of relevant documents within the first 20 results returned.
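The three measures, as described, can be sketched like this; using the Spearman coefficient for rank correlation is my choice of a common option, not necessarily the paper's, and the relevance sets are invented:

```python
def search_length(results, relevant, n=1):
    """Number of non-relevant documents the user must sift through
    before seeing n relevant ones (n=1 by default)."""
    seen_bad = found = 0
    for doc in results:
        if doc in relevant:
            found += 1
            if found == n:
                return seen_bad
        else:
            seen_bad += 1
    return seen_bad  # ran out of results before finding n relevant

def first_k_full_precision(results, relevant, k=20):
    """Ratio of relevant documents within the first k results."""
    top = results[:k]
    return sum(1 for d in top if d in relevant) / len(top)

def rank_correlation(user_rank, engine_rank):
    """Spearman rank correlation between the user's ordering and the
    engine's ordering of the same documents: 1 = identical orderings,
    -1 = exactly reversed."""
    n = len(user_rank)
    pos = {doc: i for i, doc in enumerate(engine_rank)}
    d2 = sum((i - pos[doc]) ** 2 for i, doc in enumerate(user_rank))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```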


Why should you care?

Well, obviously, if search engines are not showing the best results to the user, your very content-rich, useful and perfect website will always have difficulty ranking well.  If the results were very credible and accurate, spam and rubbish sites ranking higher would never happen.  It's in your interest as a user, a webmaster, a site owner or an SEO to evaluate these results for yourself too, and knowing about some of the methods gives you some insight into this.

December 15, 2008

Identifying the Influential Bloggers in a Community

This paper was presented at WSDM 08 by Nitin Agarwal, Huan Liu, Lei Tang (Arizona State University) & Philip S. Yu (University of Illinois at Chicago).  "Identifying the Influential Bloggers in a Community" can be read at the ACM.

They look at the very important research area of how we deal with the huge amount of data generated by bloggers and how we rank blog posts.

I've presented you with a short summary of the main points:

Whether a blogger is active or not does not determine whether s/he is influential: very active bloggers can be influential, and just as easily not.  The influential ones, however, are very important because they can help companies develop new business ideas and identify key concerns, trends and competitive products.  Bloggers can become product advocates; basically, they are market movers.  The blogging around the recent US electoral campaign shows how bloggers can have influence over social and political issues too.

The researchers say that 64% of companies have identified the importance of the blogosphere for their business.  Instead of trawling through endless posts in the relevant community, the best entry point is the most influential posts.

Technorati reports a 100% increase in the size of the blogosphere every month.  This is huge and means that methods need to be developed to deal with this enormous amount of data.

You can't (as we've seen before) simply apply PageRank or HITS or other search engine methods to the blogosphere, because blogs are sparsely linked and the Random Surfer model just doesn't work here.  Web pages can gain authority over time, but this is not necessarily true of blogs.  As the authors say, a blog post's, and a blogger's, influence actually decreases over time, because ever more sparsely linked posts come into existence.

They say there is ongoing research on ranking by topic similarity, but this is still very much on the drawing board.  You could use traffic information, number of comments and other such statistics, but you'd be leaving out all of the inactive bloggers.

They identify 4 groups of bloggers: "active and influential, active and non-influential, inactive and influential, and inactive and non-influential".  They create an influence score based on whether the blogger has any influential posts.

You're influential in the following circumstances (obviously you could probably add quite a few more):
  1. Recognition - An influential blog post is recognized by many.
  2. Activity Generation - A blog post's capability of generating activity (comments, follow-up discussions...).
  3. Novelty - Novel ideas exert more influence (lots of outlinks suggests the post is not novel).
  4. Eloquence - Blog post length is positively correlated with the number of comments, which means longer blog posts attract people's attention.
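A toy sketch of how those four signals might combine into a post score and an influential-blogger test; the paper's actual model (InfluenceFlow) is more involved, and all the weights, the threshold and the function names here are invented for illustration:

```python
def post_influence(inlinks, comments, outlinks, length,
                   w_in=1.0, w_com=1.0, w_out=1.0, w_len=0.01):
    """Toy score: citations (recognition) and comments (activity)
    push influence up, outlinks (non-novelty) push it down, and
    post length (eloquence) acts as a weight.  Floored at zero."""
    raw = w_in * inlinks + w_com * comments - w_out * outlinks
    return max(0.0, w_len * length * raw)

def is_influential(posts, threshold=1.0):
    """A blogger counts as influential if any of their posts scores
    above the chosen threshold, mirroring the idea above that the
    score rests on having at least one influential post."""
    return any(post_influence(**p) >= threshold for p in posts)

# Hypothetical posts: one heavily cited and commented, one barely.
big = dict(inlinks=80, comments=75, outlinks=5, length=600)
small = dict(inlinks=0, comments=1, outlinks=20, length=200)
```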
For example:

Active & influential: 
"‘Erica Sadun’ submitted 152 posts in the last 30 days, among which 9 of them are influential, attracting a large number of readers evidenced by 75 comments and 80 citations".

Inactive but influential: 
"‘Dan Lurie’ published only 16 posts (much fewer than 152 posts comparing with ‘Erica Sadun’, an active influential blogger) in the last 30 days".

This is a very good example of a paper addressing the issues we're encountering in blog post retrieval, categorisation and so on.  It's a very important area of research and, imho, needs to receive a lot more attention, and budget dare I say :)


Creative Commons License
Science for SEO by Marie-Claire Jenkins is licensed under a Creative Commons Attribution-Non-Commercial-No Derivative Works 2.0 UK: England & Wales License.
Based on a work at scienceforseo.blogspot.com.