Package org.apache.nutch.analysis.lang

Text document language identifier.


Class Summary
HTMLLanguageParser Adds metadata identifying the language of the document, if found. We could also run statistical analysis here, but we'd miss all other formats.
LanguageIdentifier Identifies the language of a piece of content, based on statistical analysis.
LanguageIndexingFilter An IndexingFilter that adds a lang (language) field to the document.
LanguageQueryFilter Handles "lang:" query clauses, causing them to search the "lang" field indexed by LanguageIdentifier.
NGramProfile Runs an n-gram analysis over submitted text; the results might be used for automatic language identification.
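
To illustrate the kind of n-gram analysis that NGramProfile describes, here is a minimal sketch. It is not the Nutch implementation and does not use its API. The class name NGramSketch, its method names, the "_" word padding, and the rank-based "out-of-place" distance are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration class; not part of org.apache.nutch.analysis.lang.
public class NGramSketch {

    // Count character n-grams of length minLen..maxLen within each word.
    // Words are lowercased and padded with "_" so that word boundaries
    // contribute their own n-grams (an assumption of this sketch).
    public static Map<String, Integer> countNGrams(String text, int minLen, int maxLen) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (word.isEmpty()) {
                continue;
            }
            String padded = "_" + word + "_";
            for (int n = minLen; n <= maxLen; n++) {
                for (int i = 0; i + n <= padded.length(); i++) {
                    counts.merge(padded.substring(i, i + n), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Order n-grams by descending count (ties broken alphabetically)
    // and keep at most 'limit' of them.
    public static List<String> rank(Map<String, Integer> counts, int limit) {
        List<String> grams = new ArrayList<>(counts.keySet());
        grams.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(grams.subList(0, Math.min(limit, grams.size())));
    }

    // Out-of-place distance between two ranked profiles: for each n-gram in
    // 'doc', add the difference between its rank in 'doc' and in 'lang',
    // or lang.size() as a penalty when 'lang' does not contain it.
    public static int distance(List<String> doc, List<String> lang) {
        int total = 0;
        for (int i = 0; i < doc.size(); i++) {
            int j = lang.indexOf(doc.get(i));
            total += (j < 0) ? lang.size() : Math.abs(i - j);
        }
        return total;
    }

    public static void main(String[] args) {
        // Tiny made-up training samples; real profiles need far more text.
        List<String> en = rank(countNGrams("the quick brown fox jumps over the lazy dog", 1, 3), 50);
        List<String> de = rank(countNGrams("der schnelle braune fuchs springt ueber den hund", 1, 3), 50);
        List<String> doc = rank(countNGrams("the dog jumps over the fox", 1, 3), 50);
        System.out.println("distance to en sample: " + distance(doc, en));
        System.out.println("distance to de sample: " + distance(doc, de));
    }
}
```

In a sketch like this, the language whose profile has the smallest distance to the document profile would be taken as the guess. The profile size (50 here) and the n-gram length range are arbitrary choices.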

Package org.apache.nutch.analysis.lang Description

Text document language identifier.

Language profiles are based on material from

Copyright © 2006 The Apache Software Foundation