It supports all major languages, stems with the Porter and Krovetz stemmers, indexes a wide range of file formats, uses part-of-speech tagging and named entity recognition, and provides an API for C++, C# and Java.
- Supports major language modeling approaches such as Indri and KL-divergence, as well as vector space, tf.idf, Okapi and InQuery
- Relevance- and pseudo-relevance feedback
- Wildcard term expansion (using Indri)
- Passage and XML element retrieval
- Cross-lingual retrieval
- Smoothing via Dirichlet priors and Markov chains
- Supports arbitrary document priors (e.g., PageRank, URL depth)
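
To give a feel for one item in the list, here is a minimal sketch of query-likelihood scoring with Dirichlet-prior smoothing. It is plain Python and does not use the Lemur API. The function names, the toy documents, the value `mu=2000` and the choice to skip query terms that never occur in the collection are all assumptions made for illustration.

```python
import math
from collections import Counter

def dirichlet_score(query_terms, doc_terms, collection_tf, collection_len, mu=2000.0):
    """Log query likelihood of a document under Dirichlet-prior smoothing.

    p(w | d) = (tf(w, d) + mu * p(w | C)) / (|d| + mu)
    """
    doc_tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        p_coll = collection_tf.get(term, 0) / collection_len
        if p_coll == 0.0:
            # Assumption: terms unseen in the whole collection are ignored.
            continue
        p_doc = (doc_tf[term] + mu * p_coll) / (doc_len + mu)
        score += math.log(p_doc)
    return score

# Hypothetical toy collection, for illustration only.
docs = {
    "d1": "the quick brown fox".split(),
    "d2": "the lazy dog sleeps".split(),
}
collection_tf = Counter(t for terms in docs.values() for t in terms)
collection_len = sum(collection_tf.values())

def rank(query):
    """Return document ids ordered from best to worst score."""
    q = query.split()
    return sorted(
        docs,
        key=lambda d: dirichlet_score(q, docs[d], collection_tf, collection_len),
        reverse=True,
    )

print(rank("quick fox"))
```

A larger `mu` pulls every document's term probabilities closer to the collection-wide frequencies, so it matters most for short documents.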
There is also a newer engine from the Lemur project, called Indri, which is based on inference networks.