The accuracy of the recognition hypotheses produced by the acoustic model
can be further enhanced using a language model. The acoustic model
might produce several similar alternative words that the language model
helps to disambiguate. Language models are also useful in limiting
search time for beam-search-based acoustic models. N-gram models, which
predict the probability of a word based on the previous $N-1$ words,
are a common and effective approach. Current systems like Sphinx and
HTK favor models with N=3, which are called trigrams. While there
are alternatives to N-gram models that rely on grammar, syntax,
subject-verb agreement, and trigger words, N-gram models have the distinct
advantage of being easy to train, since N-gram probabilities can be
estimated automatically from a large corpus of text. A trigram
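model may be trained simply by using the equation:

$$P(w_3 \mid w_1, w_2) = \frac{C(w_1, w_2, w_3)}{C(w_1, w_2)}$$

Here, $C(w_1, w_2, w_3)$ refers to the frequency of occurrence
of the trigram $w_1 w_2 w_3$ in the training text and $C(w_1, w_2)$
refers to the frequency of occurrence of the bigram $w_1 w_2$.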
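
As a minimal sketch of this estimate, the following Python computes each trigram probability as the trigram's count divided by the count of its leading bigram. The function name `train_trigram_mle` and the toy corpus are illustrative assumptions, not part of any toolkit mentioned here, and a real trainer would also handle sentence boundaries and smoothing.

```python
from collections import Counter

def train_trigram_mle(tokens):
    """Map each observed trigram (w1, w2, w3) to count(w1 w2 w3) / count(w1 w2).

    Hypothetical helper for illustration only: no smoothing, no
    sentence-boundary markers, whitespace-tokenized input assumed.
    """
    trigram_counts = Counter(zip(tokens, tokens[1:], tokens[2:]))
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    return {
        trigram: count / bigram_counts[trigram[:2]]
        for trigram, count in trigram_counts.items()
    }

# Toy corpus (an assumption for the example).
tokens = "the cat sat on the mat the cat ran".split()
probs = train_trigram_mle(tokens)

# "the cat" occurs twice, followed once by "sat" and once by "ran".
print(probs[("the", "cat", "sat")])  # 0.5
```

Trigrams that never occur in the corpus are simply absent from the returned dictionary.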
In practice, for a large vocabulary, not all possible trigrams will
be present in the training corpus. In that case, bigram or unigram
probabilities are used in the place of trigram probabilities after
reducing the probability by a back-off weight, which accounts
for the fact that the next higher n-gram has not been seen and therefore
has a lower chance of occurring.
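
The back-off idea can be sketched roughly as follows. This is a deliberately simplified variant that assumes a single constant back-off weight (the value 0.4 is an arbitrary placeholder), so the scores it returns are not normalized probabilities. Practical toolkits typically compute a separate back-off weight for each context together with discounting. The names `ngram_counts` and `backoff_score` are hypothetical.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count all n-grams of order n; keys are tuples of length n."""
    return Counter(zip(*(tokens[i:] for i in range(n))))

def backoff_score(w1, w2, w3, uni, bi, tri, total, weight=0.4):
    """Score w3 after (w1, w2), backing off when the higher-order n-gram is unseen.

    Assumption: one constant `weight` is applied per back-off step.
    """
    if tri[(w1, w2, w3)] > 0:
        # Trigram seen: use its relative frequency directly.
        return tri[(w1, w2, w3)] / bi[(w1, w2)]
    if bi[(w2, w3)] > 0:
        # Trigram unseen: fall back to the bigram, reduced by the weight.
        return weight * bi[(w2, w3)] / uni[(w2,)]
    # Bigram also unseen: fall back to the unigram, reduced twice.
    return weight * weight * uni[(w3,)] / total

# Toy corpus (an assumption for the example).
tokens = "the cat sat on the mat the cat ran".split()
uni, bi, tri = (ngram_counts(tokens, n) for n in (1, 2, 3))
total = len(tokens)

seen = backoff_score("the", "cat", "sat", uni, bi, tri, total)
via_bigram = backoff_score("on", "the", "cat", uni, bi, tri, total)
via_unigram = backoff_score("sat", "on", "cat", uni, bi, tri, total)
print(seen, via_bigram, via_unigram)
```

On this toy corpus the seen trigram scores highest, the bigram fallback scores lower, and the unigram fallback scores lowest, which mirrors the reduction described above.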