Uses of Interface
org.apache.nutch.parse.Parse
-
Packages that use Parse
org.apache.nutch.analysis.lang: Text document language identifier.
org.apache.nutch.crawl: Crawl control code and tools to run the crawler.
org.apache.nutch.indexer: Index content, configure and run indexing and cleaning jobs to add, update, and delete documents from an index.
org.apache.nutch.indexer.anchor: An indexing plugin for inbound anchor text.
org.apache.nutch.indexer.arbitrary: Indexing filter to add arbitrary document data to the index from the output of a user-specified class.
org.apache.nutch.indexer.basic: A basic indexing plugin, adds basic fields: url, host, title, content, etc.
org.apache.nutch.indexer.feed: Indexing filter to index meta data from RSS feeds.
org.apache.nutch.indexer.filter
org.apache.nutch.indexer.geoip: This plugin implements an indexing filter which takes advantage of the GeoIP2-java API.
org.apache.nutch.indexer.jexl: This plugin implements a dynamic indexing filter which uses JEXL expressions to allow filtering based on the page's metadata.
org.apache.nutch.indexer.links
org.apache.nutch.indexer.metadata: Indexing filter to add document metadata to the index.
org.apache.nutch.indexer.more: A more indexing plugin, adds "more" index fields: last modified date, MIME type, content length.
org.apache.nutch.indexer.replace: Indexing filter to allow pattern replacements on metadata.
org.apache.nutch.indexer.staticfield: A simple plugin called at indexing that adds fields with static data.
org.apache.nutch.indexer.subcollection: Indexing filter to assign documents to subcollections.
org.apache.nutch.indexer.tld: Top Level Domain Indexing plugin.
org.apache.nutch.indexer.urlmeta: URL Meta Tag Indexing Plugin.
org.apache.nutch.microformats.reltag: A microformats Rel-Tag Parser/Indexer/Querier plugin.
org.apache.nutch.parse: The Parse interface and related classes.
org.apache.nutch.scoring: The ScoringFilter interface.
org.apache.nutch.scoring.depth: Scoring filter to stop crawling at a configurable depth (number of "hops" from seed URLs).
org.apache.nutch.scoring.link: Scoring filter used in conjunction with WebGraph.
org.apache.nutch.scoring.metadata: Metadata Scoring Plugin.
org.apache.nutch.scoring.opic: Scoring filter implementing a variant of the Online Page Importance Computation (OPIC) algorithm.
org.apache.nutch.scoring.similarity
org.apache.nutch.scoring.similarity.cosine: Implements the cosine similarity metric for scoring relevant documents.
org.apache.nutch.scoring.tld: Top Level Domain Scoring plugin.
org.apache.nutch.scoring.urlmeta: URL Meta Tag Scoring Plugin.
org.creativecommons.nutch: Sample plugins that parse and index Creative Commons metadata.
-
Uses of Parse in org.apache.nutch.analysis.lang
Methods in org.apache.nutch.analysis.lang with parameters of type Parse
  NutchDocument LanguageIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
-
Uses of Parse in org.apache.nutch.crawl
Methods in org.apache.nutch.crawl with parameters of type Parse
  byte[] MD5Signature.calculate(Content content, Parse parse)
  abstract byte[] Signature.calculate(Content content, Parse parse)
  byte[] TextMD5Signature.calculate(Content content, Parse parse)
  byte[] TextProfileSignature.calculate(Content content, Parse parse)
-
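The names of the calculate implementations above suggest that one signature hashes the raw content while another hashes the parsed text. The following is a minimal sketch of that distinction using only the JDK. The class SignatureSketch and its method names are hypothetical stand-ins, not Nutch API, and the real implementations may normalize or fall back differently.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical stand-in for illustration; not a Nutch class. */
final class SignatureSketch {

  /** MD5 over the raw bytes of a page (assumed behaviour of a raw-content signature). */
  static byte[] rawSignature(byte[] rawContent) {
    return md5(rawContent);
  }

  /** MD5 over the extracted text, so markup-only changes do not alter the digest. */
  static byte[] textSignature(String parsedText) {
    return md5(parsedText.getBytes(StandardCharsets.UTF_8));
  }

  private static byte[] md5(byte[] data) {
    try {
      return MessageDigest.getInstance("MD5").digest(data);
    } catch (NoSuchAlgorithmException e) {
      // MD5 is a required algorithm on every Java platform.
      throw new IllegalStateException("MD5 not available", e);
    }
  }

  /** Lower-case hex rendering of a digest. */
  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      sb.append(String.format("%02x", b));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    byte[] a = "<p>hello</p>".getBytes(StandardCharsets.UTF_8);
    byte[] b = "<div>hello</div>".getBytes(StandardCharsets.UTF_8);
    // Different markup gives different raw digests.
    System.out.println(hex(rawSignature(a)) + " vs " + hex(rawSignature(b)));
    // The same extracted text gives the same text digest.
    System.out.println(hex(textSignature("hello")));
  }
}
```

A text-based signature of this kind treats two pages as duplicates when their extracted text matches, even if the surrounding markup differs.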
Uses of Parse in org.apache.nutch.indexer
Methods in org.apache.nutch.indexer with parameters of type Parse
  NutchDocument IndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
    Adds fields or otherwise modifies the document that will be indexed for a parse.
  NutchDocument IndexingFilters.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
    Run all defined filters.
-
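The relationship between IndexingFilter.filter (modify one document) and IndexingFilters.filter (run all defined filters) can be sketched roughly as a chain. Everything below is a hypothetical stand-in written against plain JDK types, not the Nutch classes, and it assumes that a filter returning null means the document is discarded.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-ins only; none of these types are Nutch classes. */
final class FilterChainSketch {

  /** Stand-in for a document to be indexed: field name to value. */
  static final class Doc {
    final Map<String, String> fields = new LinkedHashMap<>();
  }

  /** Stand-in for a single filter. Assumption: returning null discards the document. */
  interface DocFilter {
    Doc filter(Doc doc, String parseText, String url);
  }

  /** Runs every filter in order and stops early once a filter discards the document. */
  static Doc runAll(List<DocFilter> filters, Doc doc, String parseText, String url) {
    for (DocFilter f : filters) {
      doc = f.filter(doc, parseText, url);
      if (doc == null) {
        return null;
      }
    }
    return doc;
  }

  /** Example filter that adds the URL and the parsed text as fields. */
  static DocFilter basicFields() {
    return (doc, parseText, url) -> {
      doc.fields.put("url", url);
      doc.fields.put("content", parseText);
      return doc;
    };
  }

  /** Example filter that discards documents with no parsed text. */
  static DocFilter dropEmpty() {
    return (doc, parseText, url) -> parseText.isEmpty() ? null : doc;
  }

  public static void main(String[] args) {
    List<DocFilter> chain = Arrays.asList(dropEmpty(), basicFields());
    Doc kept = runAll(chain, new Doc(), "hello world", "http://example.org/");
    Doc dropped = runAll(chain, new Doc(), "", "http://example.org/empty");
    System.out.println(kept.fields);
    System.out.println(dropped == null);
  }
}
```

Because each filter receives the output of the previous one, the order of the chain matters in this sketch: a filter placed after dropEmpty() never sees a document with empty text.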
Uses of Parse in org.apache.nutch.indexer.anchor
Methods in org.apache.nutch.indexer.anchor with parameters of type Parse
  NutchDocument AnchorIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
    The AnchorIndexingFilter filter object which supports boolean configuration settings for the deduplication of anchors.
-
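As an illustration of what a boolean deduplication setting for anchors might do, here is a small sketch. The class AnchorDedupSketch, its method, and the choice to compare anchors case-insensitively are all assumptions made for this example and are not taken from the Nutch source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical stand-in for illustration; not a Nutch class. */
final class AnchorDedupSketch {

  /**
   * Returns the anchors to index. When deduplicate is true, only the first
   * occurrence of each anchor text is kept (compared case-insensitively,
   * which is an assumption of this sketch).
   */
  static List<String> anchors(List<String> inbound, boolean deduplicate) {
    List<String> kept = new ArrayList<>();
    Set<String> seen = new HashSet<>();
    for (String anchor : inbound) {
      if (deduplicate && !seen.add(anchor.toLowerCase(Locale.ROOT))) {
        continue; // already indexed an equivalent anchor
      }
      kept.add(anchor);
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> inbound = Arrays.asList("Home", "home", "About");
    System.out.println(anchors(inbound, true));
    System.out.println(anchors(inbound, false));
  }
}
```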
Uses of Parse in org.apache.nutch.indexer.arbitrary
Methods in org.apache.nutch.indexer.arbitrary with parameters of type Parse
  NutchDocument ArbitraryIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
    The ArbitraryIndexingFilter filter object uses reflection to instantiate the configured class and invoke the configured method.
-
Uses of Parse in org.apache.nutch.indexer.basic
Methods in org.apache.nutch.indexer.basic with parameters of type Parse
  NutchDocument BasicIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
    The BasicIndexingFilter filter object which supports a few configuration settings for adding basic searchable fields.
-
Uses of Parse in org.apache.nutch.indexer.feed
Methods in org.apache.nutch.indexer.feed with parameters of type Parse
  NutchDocument FeedIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
    Extracts the relevant fields (FEED_AUTHOR, FEED_TAGS, FEED_PUBLISHED, FEED_UPDATED, FEED) and sends them to the Indexer for indexing within the Nutch index.
-
Uses of Parse in org.apache.nutch.indexer.filter
Methods in org.apache.nutch.indexer.filter with parameters of type Parse
  NutchDocument MimeTypeIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
-
Uses of Parse in org.apache.nutch.indexer.geoip
Methods in org.apache.nutch.indexer.geoip with parameters of type Parse
  NutchDocument GeoIPIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
-
Uses of Parse in org.apache.nutch.indexer.jexl
Methods in org.apache.nutch.indexer.jexl with parameters of type Parse
  NutchDocument JexlIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
-
Uses of Parse in org.apache.nutch.indexer.links
Methods in org.apache.nutch.indexer.links with parameters of type Parse
  NutchDocument LinksIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
-
Uses of Parse in org.apache.nutch.indexer.metadata
Methods in org.apache.nutch.indexer.metadata with parameters of type Parse
  NutchDocument MetadataIndexer.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
-
Uses of Parse in org.apache.nutch.indexer.more
Methods in org.apache.nutch.indexer.more with parameters of type Parse
  NutchDocument MoreIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
-
Uses of Parse in org.apache.nutch.indexer.replace
Methods in org.apache.nutch.indexer.replace with parameters of type Parse
  NutchDocument ReplaceIndexer.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
-
Uses of Parse in org.apache.nutch.indexer.staticfield
Methods in org.apache.nutch.indexer.staticfield with parameters of type Parse
  NutchDocument StaticFieldIndexer.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
    The StaticFieldIndexer filter object which adds fields as per configuration setting.
-
Uses of Parse in org.apache.nutch.indexer.subcollection
Methods in org.apache.nutch.indexer.subcollection with parameters of type Parse
  NutchDocument SubcollectionIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
-
Uses of Parse in org.apache.nutch.indexer.tld
Methods in org.apache.nutch.indexer.tld with parameters of type Parse
  NutchDocument TLDIndexingFilter.filter(NutchDocument doc, Parse parse, Text urlText, CrawlDatum datum, Inlinks inlinks)
-
Uses of Parse in org.apache.nutch.indexer.urlmeta
Methods in org.apache.nutch.indexer.urlmeta with parameters of type Parse
  NutchDocument URLMetaIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
    Takes the metatags listed in your "urlmeta.tags" property and looks for them inside the CrawlDatum object.
-
Uses of Parse in org.apache.nutch.microformats.reltag
Methods in org.apache.nutch.microformats.reltag with parameters of type Parse
  NutchDocument RelTagIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
-
Uses of Parse in org.apache.nutch.parse
Classes in org.apache.nutch.parse that implement Parse
  class ParseImpl
    The result of parsing a page's raw content.
Methods in org.apache.nutch.parse that return Parse
  Parse ParseResult.get(String key)
    Retrieve a single parse output.
  Parse ParseResult.get(Text key)
    Retrieve a single parse output.
  Parse ParseStatus.getEmptyParse(Configuration conf)
    Creates an empty Parse instance containing the status.
Methods in org.apache.nutch.parse that return types with arguments of type Parse
  RecordWriter<Text,Parse> ParseOutputFormat.getRecordWriter(TaskAttemptContext context)
  Iterator<Map.Entry<Text,Parse>> ParseResult.iterator()
    Iterate over all entries in the <url, Parse> map.
Methods in org.apache.nutch.parse with parameters of type Parse
  static ParseResult ParseResult.createParseResult(String url, Parse parse)
    Convenience method for obtaining ParseResult from a single Parse output.
Constructors in org.apache.nutch.parse with parameters of type Parse
  ParseImpl(Parse parse)
-
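The ParseResult entries above describe a <url, Parse> map with keyed lookup, iteration, and a convenience factory for the single-output case. A rough model of that shape, using String in place of both Text and Parse, might look as follows. ParseResultSketch and its members are hypothetical and only mirror the listed method names loosely.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of a url-to-parse map; not the Nutch ParseResult class. */
final class ParseResultSketch implements Iterable<Map.Entry<String, String>> {

  // Insertion-ordered so iteration follows the order in which parses were added.
  private final Map<String, String> parses = new LinkedHashMap<>();

  /** Convenience factory for the common case of exactly one parse output. */
  static ParseResultSketch single(String url, String parseText) {
    ParseResultSketch result = new ParseResultSketch();
    result.put(url, parseText);
    return result;
  }

  void put(String url, String parseText) {
    parses.put(url, parseText);
  }

  /** Retrieves a single parse output, or null if the URL is unknown. */
  String get(String url) {
    return parses.get(url);
  }

  @Override
  public Iterator<Map.Entry<String, String>> iterator() {
    return parses.entrySet().iterator();
  }

  public static void main(String[] args) {
    ParseResultSketch result = single("http://example.org/", "page text");
    result.put("http://example.org/feed#item1", "item text");
    for (Map.Entry<String, String> entry : result) {
      System.out.println(entry.getKey() + " -> " + entry.getValue());
    }
  }
}
```

Modelling the result as a map allows one fetched resource to yield several parse outputs, each stored under its own URL key.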
Uses of Parse in org.apache.nutch.scoring
Methods in org.apache.nutch.scoring with parameters of type Parse
  float AbstractScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
  float ScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
    This method calculates an indexed document score/boost.
  float ScoringFilters.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
  void AbstractScoringFilter.passScoreAfterParsing(Text url, Content content, Parse parse)
  void ScoringFilter.passScoreAfterParsing(Text url, Content content, Parse parse)
    Currently a part of score distribution is performed using only data coming from the parsing process.
  void ScoringFilters.passScoreAfterParsing(Text url, Content content, Parse parse)
-
Uses of Parse in org.apache.nutch.scoring.depth
Methods in org.apache.nutch.scoring.depth with parameters of type Parse
  float DepthScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
  void DepthScoringFilter.passScoreAfterParsing(Text url, Content content, Parse parse)
-
Uses of Parse in org.apache.nutch.scoring.link
Methods in org.apache.nutch.scoring.link with parameters of type Parse
  float LinkAnalysisScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
  void LinkAnalysisScoringFilter.passScoreAfterParsing(Text url, Content content, Parse parse)
-
Uses of Parse in org.apache.nutch.scoring.metadata
Methods in org.apache.nutch.scoring.metadata with parameters of type Parse
  void MetadataScoringFilter.passScoreAfterParsing(Text url, Content content, Parse parse)
    Takes the metadata, which was lumped inside the content, and replicates it within your parse data.
-
Uses of Parse in org.apache.nutch.scoring.opic
Methods in org.apache.nutch.scoring.opic with parameters of type Parse
  float OPICScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
    Dampen the boost value by scorePower.
  void OPICScoringFilter.passScoreAfterParsing(Text url, Content content, Parse parse)
    Copy the value from Content metadata under Fetcher.SCORE_KEY to parseData.
-
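One plausible reading of "Dampen the boost value by scorePower" in the OPIC indexerScore entry is that the stored score is raised to the power scorePower before being combined with the initial score. The sketch below illustrates that reading only. DampenSketch, its parameter names, and the exact formula are assumptions, not a statement of what OPICScoringFilter does.

```java
/** Hypothetical illustration of power-based dampening; not a Nutch class. */
final class DampenSketch {

  /**
   * Raises the stored score to scorePower and multiplies it into initScore.
   * With 0 < scorePower < 1, large stored scores are compressed, so a few
   * highly scored pages dominate the boost less.
   */
  static float dampen(float storedScore, float scorePower, float initScore) {
    return (float) Math.pow(storedScore, scorePower) * initScore;
  }

  public static void main(String[] args) {
    float[] scores = {1f, 10f, 100f};
    for (float s : scores) {
      // Compare an undampened boost (power 1.0) with a dampened one (power 0.5).
      System.out.println(s + ": " + dampen(s, 1.0f, 1f) + " vs " + dampen(s, 0.5f, 1f));
    }
  }
}
```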
Uses of Parse in org.apache.nutch.scoring.similarity
Methods in org.apache.nutch.scoring.similarity with parameters of type Parse
  void SimilarityScoringFilter.passScoreAfterParsing(Text url, Content content, Parse parse)
  float SimilarityModel.setURLScoreAfterParsing(Text url, Content content, Parse parse)
-
Uses of Parse in org.apache.nutch.scoring.similarity.cosine
Methods in org.apache.nutch.scoring.similarity.cosine with parameters of type Parse
  float CosineSimilarity.setURLScoreAfterParsing(Text url, Content content, Parse parse)
-
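The org.apache.nutch.scoring.similarity.cosine package is described as implementing the cosine similarity metric for scoring relevant documents. As background, here is a generic sketch of cosine similarity over term-frequency vectors. CosineSketch, its tokenization rule, and its method names are illustrative assumptions and do not reflect how CosineSimilarity builds its vectors.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Generic cosine similarity over term counts; not the Nutch implementation. */
final class CosineSketch {

  /** Builds a term-frequency vector by splitting on non-word characters. */
  static Map<String, Integer> termFreq(String text) {
    Map<String, Integer> tf = new HashMap<>();
    for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
      if (!token.isEmpty()) {
        tf.merge(token, 1, Integer::sum);
      }
    }
    return tf;
  }

  /** cos(a, b) = (a . b) / (|a| |b|); defined here as 0 when either vector is empty. */
  static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
    double dot = 0.0;
    for (Map.Entry<String, Integer> e : a.entrySet()) {
      Integer other = b.get(e.getKey());
      if (other != null) {
        dot += (double) e.getValue() * other;
      }
    }
    double normA = norm(a);
    double normB = norm(b);
    return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
  }

  private static double norm(Map<String, Integer> v) {
    double sum = 0.0;
    for (int count : v.values()) {
      sum += (double) count * count;
    }
    return Math.sqrt(sum);
  }

  public static void main(String[] args) {
    Map<String, Integer> goal = termFreq("open source web crawler");
    Map<String, Integer> page = termFreq("Nutch is an open source web crawler");
    Map<String, Integer> other = termFreq("recipes for apple pie");
    System.out.println(cosine(goal, page));
    System.out.println(cosine(goal, other));
  }
}
```

In a scoring setting, a value near 1 would indicate that a parsed page is close to the reference text, and a value of 0 would indicate that the two share no terms.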
Uses of Parse in org.apache.nutch.scoring.tld
Methods in org.apache.nutch.scoring.tld with parameters of type Parse
  float TLDScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
-
Uses of Parse in org.apache.nutch.scoring.urlmeta
Methods in org.apache.nutch.scoring.urlmeta with parameters of type Parse
  void URLMetaScoringFilter.passScoreAfterParsing(Text url, Content content, Parse parse)
    Takes the metadata, which was lumped inside the content, and replicates it within your parse data.
-
Uses of Parse in org.creativecommons.nutch
Methods in org.creativecommons.nutch with parameters of type Parse
  NutchDocument CCIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
-