Uses of Package
org.apache.nutch.crawl
Packages that use org.apache.nutch.crawl
Package  Description
org.apache.nutch.analysis.lang  Text document language identifier.
org.apache.nutch.crawl  Crawl control code and tools to run the crawler.
org.apache.nutch.fetcher  The Nutch multi-threaded fetching module.
org.apache.nutch.hostdb
org.apache.nutch.indexer  Index content, configure and run indexing and cleaning jobs to add, update, and delete documents from an index.
org.apache.nutch.indexer.anchor  An indexing plugin for inbound anchor text.
org.apache.nutch.indexer.arbitrary  Indexing filter to add arbitrary document data to the index from the output of a user-specified class.
org.apache.nutch.indexer.basic  A basic indexing plugin, adds basic fields: url, host, title, content, etc.
org.apache.nutch.indexer.feed  Indexing filter to index meta data from RSS feeds.
org.apache.nutch.indexer.filter
org.apache.nutch.indexer.geoip  This plugin implements an indexing filter which takes advantage of the GeoIP2-java API.
org.apache.nutch.indexer.jexl  This plugin implements a dynamic indexing filter which uses JEXL expressions to allow filtering based on the page's metadata.
org.apache.nutch.indexer.links
org.apache.nutch.indexer.metadata  Indexing filter to add document metadata to the index.
org.apache.nutch.indexer.more  A "more" indexing plugin, adds "more" index fields: last modified date, MIME type, content length.
org.apache.nutch.indexer.replace  Indexing filter to allow pattern replacements on metadata.
org.apache.nutch.indexer.staticfield  A simple plugin called at indexing that adds fields with static data.
org.apache.nutch.indexer.subcollection  Indexing filter to assign documents to subcollections.
org.apache.nutch.indexer.tld  Top Level Domain Indexing plugin.
org.apache.nutch.indexer.urlmeta  URL Meta Tag Indexing Plugin.
org.apache.nutch.metadata  A Multi-valued Metadata container, and set of constant fields for Nutch Metadata.
org.apache.nutch.microformats.reltag  A microformats Rel-Tag Parser/Indexer/Querier plugin.
org.apache.nutch.protocol  Classes related to the Protocol interface, see also org.apache.nutch.net.protocols.
org.apache.nutch.protocol.file  Protocol plugin which supports retrieving local file resources.
org.apache.nutch.protocol.ftp  Protocol plugin which supports retrieving documents via the ftp protocol.
org.apache.nutch.protocol.htmlunit  Protocol plugin which supports retrieving documents via HTTP/HTTPS using Selenium and the HtmlUnitDriver web driver for the HtmlUnit headless browser.
org.apache.nutch.protocol.http  Protocol plugin which supports retrieving documents via the http protocol.
org.apache.nutch.protocol.http.api  Common API used by HTTP plugins (http, httpclient, etc.)
org.apache.nutch.protocol.httpclient  Protocol plugin which supports retrieving documents via the HTTP and HTTPS protocols, optionally with Basic, Digest and NTLM authentication schemes for web server as well as proxy server.
org.apache.nutch.protocol.interactiveselenium  Protocol plugin which supports retrieving documents using and interacting with Selenium.
org.apache.nutch.protocol.okhttp  Protocol plugin for HTTP/HTTPS based on okhttp, supports HTTP 1.1 and/or HTTP/2.
org.apache.nutch.protocol.selenium  Protocol plugin which supports retrieving documents via Selenium.
org.apache.nutch.scoring  The ScoringFilter interface.
org.apache.nutch.scoring.depth  Scoring filter to stop crawling at a configurable depth (number of "hops" from seed URLs).
org.apache.nutch.scoring.link  Scoring filter used in conjunction with WebGraph.
org.apache.nutch.scoring.metadata  Metadata Scoring Plugin.
org.apache.nutch.scoring.opic  Scoring filter implementing a variant of the Online Page Importance Computation (OPIC) algorithm.
org.apache.nutch.scoring.orphan  Scoring filter to modify score or status of orphaned pages (no inlinks found for a configurable amount of time).
org.apache.nutch.scoring.similarity
org.apache.nutch.scoring.similarity.cosine  Implements the cosine similarity metric for scoring relevant documents.
org.apache.nutch.scoring.tld  Top Level Domain Scoring plugin.
org.apache.nutch.scoring.urlmeta  URL Meta Tag Scoring Plugin.
org.apache.nutch.scoring.webgraph
org.apache.nutch.segment  A segment stores all data from one generate/fetch/update cycle: fetch list, protocol status, raw content, parsed content, and extracted outgoing links.
org.apache.nutch.tools  Miscellaneous tools.
org.apache.nutch.tools.warc  Tools to import / export between Nutch segments and WARC archives.
org.apache.nutch.util  Miscellaneous utility classes.
org.creativecommons.nutch  Sample plugins that parse and index Creative Commons metadata.

Classes in org.apache.nutch.crawl used by org.apache.nutch.analysis.lang
Class  Description
CrawlDatum
Inlinks  A list of Inlinks.

Classes in org.apache.nutch.crawl used by org.apache.nutch.crawl
Class  Description
AbstractFetchSchedule  This class provides common methods for implementations of FetchSchedule.
AdaptiveFetchSchedule  This class implements an adaptive re-fetch algorithm.
CrawlDatum
FetchSchedule  This interface defines the contract for implementations that manipulate fetch times and re-fetch intervals.
Generator.SelectorEntry
Inlink  An incoming link to a page.
Inlinks  A list of Inlinks.
NutchWritable
Signature

Classes in org.apache.nutch.crawl used by org.apache.nutch.fetcher
Class  Description
CrawlDatum
NutchWritable

Classes in org.apache.nutch.crawl used by org.apache.nutch.hostdb
Class  Description
CrawlDatum
NutchWritable

Classes in org.apache.nutch.crawl used by org.apache.nutch.indexer
Class  Description
CrawlDatum
Inlinks  A list of Inlinks.
NutchWritable

Classes in org.apache.nutch.crawl used by org.apache.nutch.indexer.anchor
Class  Description
CrawlDatum
Inlinks  A list of Inlinks.

Classes in org.apache.nutch.crawl used by org.apache.nutch.indexer.arbitrary
Class  Description
CrawlDatum
Inlinks  A list of Inlinks.

Classes in org.apache.nutch.crawl used by org.apache.nutch.indexer.basic
Class  Description
CrawlDatum
Inlinks  A list of Inlinks.

Classes in org.apache.nutch.crawl used by org.apache.nutch.indexer.feed
Class  Description
CrawlDatum
Inlinks  A list of Inlinks.

Classes in org.apache.nutch.crawl used by org.apache.nutch.indexer.filter
Class  Description
CrawlDatum
Inlinks  A list of Inlinks.

Classes in org.apache.nutch.crawl used by org.apache.nutch.indexer.geoip
Class  Description
CrawlDatum
Inlinks  A list of Inlinks.

Classes in org.apache.nutch.crawl used by org.apache.nutch.indexer.jexl
Class  Description
CrawlDatum
Inlinks  A list of Inlinks.

Classes in org.apache.nutch.crawl used by org.apache.nutch.indexer.links
Class  Description
CrawlDatum
Inlinks  A list of Inlinks.

Classes in org.apache.nutch.crawl used by org.apache.nutch.indexer.metadata
Class  Description
CrawlDatum
Inlinks  A list of Inlinks.

Classes in org.apache.nutch.crawl used by org.apache.nutch.indexer.more
Class  Description
CrawlDatum
Inlinks  A list of Inlinks.

Classes in org.apache.nutch.crawl used by org.apache.nutch.indexer.replace
Class  Description
CrawlDatum
Inlinks  A list of Inlinks.

Classes in org.apache.nutch.crawl used by org.apache.nutch.indexer.staticfield
Class  Description
CrawlDatum
Inlinks  A list of Inlinks.

Classes in org.apache.nutch.crawl used by org.apache.nutch.indexer.subcollection
Class  Description
CrawlDatum
Inlinks  A list of Inlinks.

Classes in org.apache.nutch.crawl used by org.apache.nutch.indexer.tld
Class  Description
CrawlDatum
Inlinks  A list of Inlinks.

Classes in org.apache.nutch.crawl used by org.apache.nutch.indexer.urlmeta
Class  Description
CrawlDatum
Inlinks  A list of Inlinks.

Classes in org.apache.nutch.crawl used by org.apache.nutch.metadata
Class  Description
NutchWritable

Classes in org.apache.nutch.crawl used by org.apache.nutch.microformats.reltag
Class  Description
CrawlDatum
Inlinks  A list of Inlinks.

Classes in org.apache.nutch.crawl used by org.apache.nutch.protocol
Class  Description
CrawlDatum

Classes in org.apache.nutch.crawl used by org.apache.nutch.protocol.file
Class  Description
CrawlDatum

Classes in org.apache.nutch.crawl used by org.apache.nutch.protocol.ftp
Class  Description
CrawlDatum

Classes in org.apache.nutch.crawl used by org.apache.nutch.protocol.htmlunit
Class  Description
CrawlDatum

Classes in org.apache.nutch.crawl used by org.apache.nutch.protocol.http
Class  Description
CrawlDatum

Classes in org.apache.nutch.crawl used by org.apache.nutch.protocol.http.api
Class  Description
CrawlDatum

Classes in org.apache.nutch.crawl used by org.apache.nutch.protocol.httpclient
Class  Description
CrawlDatum

Classes in org.apache.nutch.crawl used by org.apache.nutch.protocol.interactiveselenium
Class  Description
CrawlDatum

Classes in org.apache.nutch.crawl used by org.apache.nutch.protocol.okhttp
Class  Description
CrawlDatum

Classes in org.apache.nutch.crawl used by org.apache.nutch.protocol.selenium
Class  Description
CrawlDatum

Classes in org.apache.nutch.crawl used by org.apache.nutch.scoring
Class  Description
CrawlDatum
Inlinks  A list of Inlinks.

Classes in org.apache.nutch.crawl used by org.apache.nutch.scoring.depth
Class  Description
CrawlDatum
Inlinks  A list of Inlinks.

Classes in org.apache.nutch.crawl used by org.apache.nutch.scoring.link
Class  Description
CrawlDatum
Inlinks  A list of Inlinks.

Classes in org.apache.nutch.crawl used by org.apache.nutch.scoring.metadata
Class  Description
CrawlDatum

Classes in org.apache.nutch.crawl used by org.apache.nutch.scoring.opic
Class  Description
CrawlDatum
Inlinks  A list of Inlinks.

Classes in org.apache.nutch.crawl used by org.apache.nutch.scoring.orphan
Class  Description
CrawlDatum

Classes in org.apache.nutch.crawl used by org.apache.nutch.scoring.similarity
Class  Description
CrawlDatum

Classes in org.apache.nutch.crawl used by org.apache.nutch.scoring.similarity.cosine
Class  Description
CrawlDatum

Classes in org.apache.nutch.crawl used by org.apache.nutch.scoring.tld
Class  Description
CrawlDatum
Inlinks  A list of Inlinks.

Classes in org.apache.nutch.crawl used by org.apache.nutch.scoring.urlmeta
Class  Description
CrawlDatum

Classes in org.apache.nutch.crawl used by org.apache.nutch.scoring.webgraph
Class  Description
NutchWritable

Classes in org.apache.nutch.crawl used by org.apache.nutch.segment
Class  Description
CrawlDatum
NutchWritable

Classes in org.apache.nutch.crawl used by org.apache.nutch.tools
Class  Description
Generator.SelectorEntry

Classes in org.apache.nutch.crawl used by org.apache.nutch.tools.warc
Class  Description
NutchWritable

Classes in org.apache.nutch.crawl used by org.apache.nutch.util
Class  Description
CrawlDatum

Classes in org.apache.nutch.crawl used by org.creativecommons.nutch
Class  Description
CrawlDatum
Inlinks  A list of Inlinks.