Software
Links
N-gram Statistics Package (NSP)
http://www.d.umn.edu/~tpederse/nsp.html
Suite of Perl tools for counting and analyzing word n-grams in text; provides standard tests of association for identifying word n-grams in large corpora and allows users to implement other tests with minimal Perl knowledge.
Observable Operator Modeling Kit
Machine learning library for Observable Operator Models (OOMs), suitable for classification and prediction of time-series and sequence data. OOMs are similar to, but more powerful than, HMMs. [C++, BSD license]
Pattern Recognition Application Programmer's Interface (PRAPI)
http://www.ee.oulu.fi/~topiolli/?section=cpplibs
A C++ library for many pattern recognition tasks; the main focus is on image analysis, but a general architecture and an XML-based data interchange format allow it to be used for many other tasks as well.
Pfam
A large collection of multiple sequence alignments and trained hidden Markov models covering many common protein domains.
PRODIGY System
http://www.cs.cmu.edu/afs/cs.cmu.edu/project/prodigy/Web/prodigy-home.html
An architecture for planning and learning. [Free]
Sequence Alignment and Modeling System (SAM)
http://www.cse.ucsc.edu/research/compbio/sam.html
A collection of tools for creating and using HMMs for biological sequences. Free license for academic and nonprofit use.
Software Packages for Graphical Models/Bayesian Networks
http://www.ai.mit.edu/~murphyk/Software/bnsoft.html
Directory of software tools for graphical models and Bayesian networks. Some have learning capabilities.
Statistical Decision Trees
http://www.isip.msstate.edu/projects/speech/software/legacy/decision_tree/index.html
A program for inducing Bayesian decision trees. Applications to speech. [Free]
SUBDUE Knowledge Discovery in Structural Databases
The program discovers interesting and repetitive subgraphs in a labeled graph representation using the minimum description length principle. Applications to molecular biology. [Free]
The Bow Toolkit
http://www.cs.cmu.edu/~mccallum/bow/
A library of C code useful for writing statistical text analysis, language modeling and information retrieval programs. The current distribution includes the library, as well as front-ends for document classification (rainbow), document retrieval (arrow) and document clustering (crossbow). [Free]



