Package org.apache.nutch.analysis.lang

Text document language identifier.


Class Summary
HTMLLanguageParser Adds metadata identifying the language of the document, if found. Statistical analysis could also be run here, but that would miss all other formats.
LanguageIdentifier Identifies the language of content, based on statistical analysis.
LanguageIndexingFilter An IndexingFilter that adds a lang (language) field to the document.
NGramProfile Runs an n-gram analysis over submitted text; the results might be used for automatic language identification.

Package org.apache.nutch.analysis.lang Description

Text document language identifier.
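
As a rough illustration of how n-gram statistics can drive language identification, the sketch below counts character n-grams, ranks them by frequency, and compares two rankings with an "out-of-place" distance. It is a minimal example of one common rank-order technique, not the code in this package: the class NGramSketch, its method names, and the fixed penalty for missing n-grams are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration; not part of org.apache.nutch.analysis.lang. */
final class NGramSketch {

    /** Counts character n-grams of lengths minLen..maxLen in the lower-cased text. */
    static Map<String, Integer> count(String text, int minLen, int maxLen) {
        Map<String, Integer> counts = new HashMap<>();
        String s = text.toLowerCase(Locale.ROOT);
        for (int n = minLen; n <= maxLen; n++) {
            for (int i = 0; i + n <= s.length(); i++) {
                counts.merge(s.substring(i, i + n), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns the n-grams ordered by descending count, ties broken alphabetically. */
    static List<String> ranked(Map<String, Integer> counts) {
        List<String> grams = new ArrayList<>(counts.keySet());
        grams.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return grams;
    }

    /**
     * Out-of-place distance between a document ranking and a language ranking:
     * the sum of rank differences, with an assumed penalty of lang.size()
     * for n-grams that the language ranking does not contain.
     */
    static int distance(List<String> doc, List<String> lang) {
        int penalty = lang.size();
        int total = 0;
        for (int i = 0; i < doc.size(); i++) {
            int j = lang.indexOf(doc.get(i));
            total += (j < 0) ? penalty : Math.abs(i - j);
        }
        return total;
    }
}
```

Under these assumptions, a caller would build one ranking per known language from sample text, build a ranking for the incoming document, and pick the language whose ranking gives the smallest distance.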

Language profiles are based on material from

Copyright © 2011 The Apache Software Foundation