There are a number of packages that let you use LSA/I and that also offer other useful features for semantic analysis, such as information extraction (IE) and information retrieval (IR).
(LSI/A can also be applied to source code and to images.)
If you want to code your own, in short you'll need to:
- Have a stopword file
- Process each file
- Compute the weights
- Normalize
- Print your data
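The steps above can be sketched roughly as follows. This is only a minimal illustration, not a reference implementation: it assumes TF-IDF as the weighting scheme and L2 (unit-length) normalization, uses a tiny hard-coded stopword set in place of a real stopword file, and takes documents as in-memory strings rather than files. The names `STOPWORDS`, `tokenize` and `build_matrix` are made up for this example.

```python
import math
import re
from collections import Counter

# Hypothetical stopword set; in practice this would be read from a stopword file.
STOPWORDS = {"the", "a", "of", "and", "is", "in", "to"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def build_matrix(docs):
    """Return (vocab, rows); rows[i][j] is the normalized weight of vocab[j] in docs[i].

    Weighting is TF-IDF (raw count * log(N / document frequency)), an assumption
    made for this sketch. Each document row is then scaled to unit length.
    """
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = {t: sum(1 for c in counts if t in c) for t in vocab}
    rows = []
    for c in counts:
        row = [c[t] * math.log(n_docs / df[t]) for t in vocab]
        norm = math.sqrt(sum(w * w for w in row))
        # Guard against all-zero rows (e.g. every term occurs in every document).
        rows.append([w / norm if norm else 0.0 for w in row])
    return vocab, rows


if __name__ == "__main__":
    docs = ["the cat sat", "the dog sat", "a cat and a dog"]
    vocab, rows = build_matrix(docs)
    # Print the data: one line per document, one column per term.
    print("\t".join(vocab))
    for row in rows:
        print("\t".join(f"{w:.3f}" for w in row))
```

Note that this only builds the weighted, normalized matrix. For LSA proper, that matrix would then go through a truncated singular value decomposition, which the sketch leaves out.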
There's a MATLAB toolbox called TMG which handles clustering, retrieval, indexing, dimensionality reduction and classification. It's a powerful package indeed! (Most unis will have licences that let you get a free copy of MATLAB.) MATLAB can also do a whole load of other things, because plenty of extensions are freely available, such as the SVM Toolbox.
JLSI is a freely available Java implementation.
The semantic-engine project (on Google Code) also implements LSI/A, in C++.
The Semantic Vectors package is also available; it is written in Java and built on Lucene.
There's a working online tool from the LSA group at the University of Colorado. It also does other types of classification.
There's gCLUTO, which has a nice interface and gives you a graphical representation of the clusters.
There's a demo here from Telcordia.
There's also a PLSI parser here, if you want to try the other variant and compare.
I think that will do for now. I hope you have fun with these :)