July 21, 2008

Natural language querying

A natural language query is expressed in conversational syntax, for example "What is a tornado?". In keyword search you would enter the term "tornado", and some engines support commands such as "define:tornado".

Natural language querying is different in two ways: the user can simply ask for information, and the search engine has a better chance of returning the right information.

Google recognises some natural language queries, like our example, because of the question-type pattern. These questions start with what we call Wh-words: who, what, where and when (and "how" slips in there too). Because we can tell the system to recognise the pattern that these sentences follow, our example is translated into pretty much "define:tornado".
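To make the pattern idea concrete, here is a minimal sketch of that kind of rewrite. It is only an illustration, not how Google does it: the regular expression, the function name rewrite_query and the choice to fall back to the original text are all my own assumptions.

```python
import re

# Hypothetical pattern for definition-style questions:
# "what is/are [a|an|the] <term>[?]"
DEFINITION_PATTERN = re.compile(
    r"^\s*what\s+(?:is|are)\s+(?:(?:a|an|the)\s+)?(?P<term>.+?)\s*\??\s*$",
    re.IGNORECASE,
)


def rewrite_query(query: str) -> str:
    """Turn a recognised definition question into a define: command.

    Anything that does not fit the pattern is returned unchanged
    (apart from surrounding whitespace), as a plain keyword query.
    """
    match = DEFINITION_PATTERN.match(query)
    if match:
        return "define:" + match.group("term").lower()
    return query.strip()


print(rewrite_query("What is a tornado"))
print(rewrite_query("if i exercise will i lose weight?"))
```

The first call prints define:tornado, while the second question falls straight through the pattern untouched, which is exactly the kind of query that gets tricky.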

Things get trickier if you enter, for example, "if i exercise will i lose weight?". Ideally we'd want a list of documents explaining the benefits of exercise for weight loss and so on. Google doesn't get it completely wrong, but the results are quite clumsy: they are centered around the right topic, but my answer isn't there. That's because Google isn't a natural language querying system right now.

These systems are also called question-answering (Q&A) systems. They are expected not only to retrieve the correct information from the index or knowledge base, but also to formulate a natural language answer. For our example, "If I exercise will I lose weight?", a good answer from the system would be "Yes, exercising increases metabolic rate and helps burn fat". The user can then continue the conversation to learn more, and the relevant documents remain available.

How do these systems work? They use natural language processing techniques, question classifiers, information retrieval techniques, and natural language generation techniques such as grammars and taxonomies of constructions, along with named entity recognition, tagging and parsing. There is quite an arsenal of tools here.
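As a rough illustration of one of those tools, a question classifier in its very simplest form might just map the first Wh-word it finds to the kind of answer expected. This is a toy sketch under my own assumptions: the labels in ANSWER_TYPES and the function classify_question are made up for the example, and real classifiers are far more sophisticated.

```python
# Hypothetical mapping from Wh-word to the type of answer expected.
ANSWER_TYPES = {
    "who": "PERSON",
    "where": "LOCATION",
    "when": "DATE",
    "what": "DEFINITION_OR_ENTITY",
    "how": "MANNER_OR_QUANTITY",
}


def classify_question(question: str) -> str:
    """Return the expected answer type for the first Wh-word found.

    Questions with no Wh-word at all are labelled OTHER; the system
    would need deeper analysis to handle them.
    """
    tokens = question.lower().strip(" ?").split()
    for token in tokens:
        if token in ANSWER_TYPES:
            return ANSWER_TYPES[token]
    return "OTHER"


print(classify_question("Who wrote Hamlet?"))
print(classify_question("What is a tornado"))
print(classify_question("if i exercise will i lose weight?"))
```

Note that the exercise question comes back as OTHER: it has no Wh-word, so a lookup like this has nothing to go on.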

The big question is whether we actually want a natural language answer, and a conversation with a machine, every day just to get our information. Google researchers are, I am sure, working on natural language querying, but I don't think they are working on generating a natural language answer, simply on a better collection of results.

An interesting thing to consider is the use of anaphora of sorts: how the engine keeps track of the questions you've asked to get a feel for the area you're working on. If you've asked about strawberries, then jars, then whatever else, the engine might think "right, this is about jam and cooking". It should also know when you change the subject. This creates some kind of continuity in the querying, which means that context is retained by the engine.
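Here is a small sketch of what retaining context could look like. Everything in it is an assumption made for illustration: the hand-written TOPIC_HINTS table stands in for the world knowledge a real engine would need, and the rule "no shared topic means the subject has changed" is a deliberately crude heuristic.

```python
from collections import Counter

# Hypothetical term -> topic hints. A real engine would need a far
# larger knowledge source than a hand-written table.
TOPIC_HINTS = {
    "strawberries": {"cooking", "gardening"},
    "jars": {"cooking", "storage"},
    "sugar": {"cooking"},
    "tornado": {"weather"},
}


class QuerySession:
    """Keeps a running tally of the topics seen in one user's queries."""

    def __init__(self):
        self.topic_counts = Counter()

    def add_query(self, query):
        # Collect the topics hinted at by the words in this query.
        topics = set()
        for word in query.lower().split():
            topics |= TOPIC_HINTS.get(word.strip("?.,!"), set())
        # Crude subject-change rule: if the new query shares no topic
        # with the session so far, start the context again.
        if topics and self.topic_counts and not (topics & set(self.topic_counts)):
            self.topic_counts.clear()
        self.topic_counts.update(topics)
        return self.current_topic()

    def current_topic(self):
        # The most frequently hinted topic so far, or None if unknown.
        if not self.topic_counts:
            return None
        return self.topic_counts.most_common(1)[0][0]


session = QuerySession()
session.add_query("strawberries")
print(session.add_query("jars"))
print(session.add_query("tornado"))
```

After "strawberries" and then "jars", cooking is the only topic hinted at twice, so that becomes the session's context. The "tornado" query shares nothing with it, so the session treats it as a change of subject and starts again with weather.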

But it's really, really hard to do. I haven't used a single system that worked properly in an open domain. Some questions are interrogative and some are assertive, and the system needs to recognise this. Understanding the syntax and semantics of a question isn't easy, despite the tools available. The system needs to understand not only what is in the index, but also what exists in the world of the user. For example, it might need to know that water is wet in order to understand what we're talking about around a particular topic.

Bottom line? It's hard. Do we need it? Yes, I think so. It's one of the best ways for engines to work with us, and for them to deal with our information more accurately.

How does it impact SEO work? Seeing as the machines still index and retrieve the documents the same way, I don't think it's much of an issue in that sense. On the other hand, having well-written and coherent content becomes more and more important.

Want to try some? Try QuALiM, START and OpenEphyra.

Creative Commons License
Science for SEO by Marie-Claire Jenkins is licensed under a Creative Commons Attribution-Non-Commercial-No Derivative Works 2.0 UK: England & Wales License.
Based on a work at scienceforseo.blogspot.com.