Package org.apache.nutch.parsefilter.naivebayes

HTML parse filter that classifies the outlinks of a parse result as relevant or irrelevant, based on the relevancy of the page's parse text.

Package org.apache.nutch.parsefilter.naivebayes Description

HTML parse filter that classifies the outlinks of a parse result as relevant or irrelevant, based on the relevancy of the page's parse text. Relevancy is judged using a training file in which you supply positive and negative example texts (see the description of parsefilter.naivebayes.trainfile). If the text is found irrelevant, each outlink gets a second chance: it is kept if it contains any of the words from the list given in parsefilter.naivebayes.wordlist. CAUTION: When using this classifier, set parser.timeout to -1 or to a value greater than 30 (the timeout is given in seconds).
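The two-tier decision described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the class OutlinkSecondChanceSketch, the methods filterOutlinks and containsAnyWord, and the isRelevant predicate (standing in for the trained Naive Bayes classifier) are all hypothetical names, and case-insensitive substring matching against the word list is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

// Hypothetical sketch of the "second chance" logic; not the Nutch implementation.
final class OutlinkSecondChanceSketch {

  /**
   * Returns the outlinks that survive the two-tier check.
   * isRelevant stands in for the trained classifier (assumption).
   */
  static List<String> filterOutlinks(String parseText, List<String> outlinks,
      Predicate<String> isRelevant, List<String> wordlist) {
    // Tier 1: a page whose text is judged relevant keeps all of its outlinks.
    if (isRelevant.test(parseText)) {
      return new ArrayList<>(outlinks);
    }
    // Tier 2: the page is irrelevant, so each outlink gets a second chance
    // and is kept only if it contains a word from the word list.
    List<String> kept = new ArrayList<>();
    for (String link : outlinks) {
      if (containsAnyWord(link, wordlist)) {
        kept.add(link);
      }
    }
    return kept;
  }

  /** Case-insensitive substring match (the matching rule is an assumption). */
  static boolean containsAnyWord(String link, List<String> wordlist) {
    String lower = link.toLowerCase(Locale.ROOT);
    for (String word : wordlist) {
      if (!word.isEmpty() && lower.contains(word.toLowerCase(Locale.ROOT))) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    List<String> links = Arrays.asList(
        "http://example.org/sports/football",
        "http://example.org/about");
    List<String> words = Arrays.asList("football");
    // Stub classifier that treats every page as irrelevant, to exercise tier 2.
    System.out.println(filterOutlinks("some page text", links, text -> false, words));
  }
}
```

In this sketch a relevant page passes all of its outlinks through unchanged, while an irrelevant page only passes the outlinks that match the word list.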

Copyright © 2017 The Apache Software Foundation