Uses of Interface
org.apache.nutch.plugin.Pluggable
-
Packages that use Pluggable
Package: Description
org.apache.nutch.analysis.lang: Text document language identifier.
org.apache.nutch.collection: Subcollection is a subset of an index.
org.apache.nutch.exchange: Control code for the exchange component, which acts in the indexing job and decides to which index writer a document should be routed, based on plugin behavior.
org.apache.nutch.exchange.jexl: Plugin of Exchange component based on JEXL expressions.
org.apache.nutch.indexer: Index content, configure and run indexing and cleaning jobs to add, update, and delete documents from an index.
org.apache.nutch.indexer.anchor: An indexing plugin for inbound anchor text.
org.apache.nutch.indexer.arbitrary: Indexing filter to add arbitrary document data to the index from the output of a user-specified class.
org.apache.nutch.indexer.basic: A basic indexing plugin, adds basic fields: url, host, title, content, etc.
org.apache.nutch.indexer.feed: Indexing filter to index metadata from RSS feeds.
org.apache.nutch.indexer.filter
org.apache.nutch.indexer.geoip: This plugin implements an indexing filter which takes advantage of the GeoIP2-java API.
org.apache.nutch.indexer.jexl: This plugin implements a dynamic indexing filter which uses JEXL expressions to allow filtering based on the page's metadata.
org.apache.nutch.indexer.links
org.apache.nutch.indexer.metadata: Indexing filter to add document metadata to the index.
org.apache.nutch.indexer.more: A more indexing plugin, adds "more" index fields: last modified date, MIME type, content length.
org.apache.nutch.indexer.replace: Indexing filter to allow pattern replacements on metadata.
org.apache.nutch.indexer.staticfield: A simple plugin called at indexing that adds fields with static data.
org.apache.nutch.indexer.subcollection: Indexing filter to assign documents to subcollections.
org.apache.nutch.indexer.tld: Top Level Domain Indexing plugin.
org.apache.nutch.indexer.urlmeta: URL Meta Tag Indexing Plugin.
org.apache.nutch.indexwriter.cloudsearch
org.apache.nutch.indexwriter.csv: Index writer plugin to write a plain CSV file.
org.apache.nutch.indexwriter.dummy: Index writer plugin for debugging, writes pairs of <action, url> to a text file; action is one of "add", "update", or "delete".
org.apache.nutch.indexwriter.elastic: Index writer plugin for Elasticsearch.
org.apache.nutch.indexwriter.kafka: Index writer plugin to produce JSON messages to Kafka.
org.apache.nutch.indexwriter.opensearch1x: Index writer plugin for OpenSearch.
org.apache.nutch.indexwriter.rabbit
org.apache.nutch.indexwriter.solr: Index writer plugin for Apache Solr.
org.apache.nutch.microformats.reltag: A microformats Rel-Tag Parser/Indexer/Querier plugin.
org.apache.nutch.net: Web-related interfaces: URL filters and normalizers.
org.apache.nutch.parse: The Parse interface and related classes.
org.apache.nutch.parse.ext: Parse wrapper to run an external command to do the parsing.
org.apache.nutch.parse.feed: Parse RSS feeds.
org.apache.nutch.parse.headings: Parse filter to extract headings (h1, h2, etc.) from DOM parse tree.
org.apache.nutch.parse.html: An HTML document parsing plugin.
org.apache.nutch.parse.js: Parser and parse filter plugin to extract all (possible) links from JavaScript files and embedded JavaScript code snippets.
org.apache.nutch.parse.metatags: Parse filter to extract meta tags: keywords, description, etc.
org.apache.nutch.parse.tika: Parse various document formats with help of Apache Tika.
org.apache.nutch.parse.zip: Parse ZIP files: embedded files are recursively passed to appropriate parsers.
org.apache.nutch.parsefilter.debug: Adds serialized DOM to parse data, useful for debugging, to understand how the parser implementation interprets a document (not only HTML).
org.apache.nutch.parsefilter.naivebayes: HTML parse filter that classifies the outlinks from the parse result as relevant or irrelevant based on the relevancy of the parse text (using a training file in which you can give positive and negative example texts; see the description of parsefilter.naivebayes.trainfile). If a link is found irrelevant, it gets a second chance if it contains any of the words from the list given in parsefilter.naivebayes.wordlist.
org.apache.nutch.parsefilter.regex: RegexParseFilter.
org.apache.nutch.protocol: Classes related to the Protocol interface, see also org.apache.nutch.net.protocols.
org.apache.nutch.protocol.file: Protocol plugin which supports retrieving local file resources.
org.apache.nutch.protocol.ftp: Protocol plugin which supports retrieving documents via the ftp protocol.
org.apache.nutch.protocol.htmlunit: Protocol plugin which supports retrieving documents via HTTP/HTTPS using Selenium and the HtmlUnitDriver web driver for the HtmlUnit headless browser.
org.apache.nutch.protocol.http: Protocol plugin which supports retrieving documents via the http protocol.
org.apache.nutch.protocol.http.api: Common API used by HTTP plugins (http, httpclient, etc.)
org.apache.nutch.protocol.httpclient: Protocol plugin which supports retrieving documents via the HTTP and HTTPS protocols, optionally with Basic, Digest and NTLM authentication schemes for web server as well as proxy server.
org.apache.nutch.protocol.interactiveselenium: Protocol plugin which supports retrieving documents using, and interacting with, Selenium.
org.apache.nutch.protocol.okhttp: Protocol plugin for HTTP/HTTPS based on okhttp, supports HTTP/1.1 and/or HTTP/2.
org.apache.nutch.protocol.selenium: Protocol plugin which supports retrieving documents via Selenium.
org.apache.nutch.publisher
org.apache.nutch.publisher.rabbitmq: Publisher package to implement queues.
org.apache.nutch.scoring: The ScoringFilter interface.
org.apache.nutch.scoring.depth: Scoring filter to stop crawling at a configurable depth (number of "hops" from seed URLs).
org.apache.nutch.scoring.link: Scoring filter used in conjunction with WebGraph.
org.apache.nutch.scoring.metadata: Metadata Scoring Plugin.
org.apache.nutch.scoring.opic: Scoring filter implementing a variant of the Online Page Importance Computation (OPIC) algorithm.
org.apache.nutch.scoring.orphan: Scoring filter to modify score or status of orphaned pages (no inlinks found for a configurable amount of time).
org.apache.nutch.scoring.similarity
org.apache.nutch.scoring.tld: Top Level Domain Scoring plugin.
org.apache.nutch.scoring.urlmeta: URL Meta Tag Scoring Plugin.
org.apache.nutch.urlfilter.api: Generic URL filter library, abstracting away from regular expression implementations.
org.apache.nutch.urlfilter.automaton: URL filter plugin based on dk.brics.automaton Finite-State Automata for Java.
org.apache.nutch.urlfilter.domain: URL filter plugin to include only URLs which match an element in a given list of domain suffixes, domain names, and/or host names.
org.apache.nutch.urlfilter.domaindenylist: URL filter plugin to exclude URLs by domain suffixes, domain names, and/or host names.
org.apache.nutch.urlfilter.fast: URL filter plugin that first does fast exact suffix matches on host/domain names before applying regular expressions to the path component of a URL.
org.apache.nutch.urlfilter.ignoreexempt: URL filter plugin which identifies exemptions to external URLs when external URLs are set to ignore.
org.apache.nutch.urlfilter.prefix: URL filter plugin to include only URLs which match one of a given list of URL prefixes.
org.apache.nutch.urlfilter.regex: URL filter plugin to include and/or exclude URLs matching Java regular expressions.
org.apache.nutch.urlfilter.suffix: URL filter plugin to either exclude or include only URLs which match one of the given (path) suffixes.
org.apache.nutch.urlfilter.validator: URL filter plugin that validates given URLs.
org.creativecommons.nutch: Sample plugins that parse and index Creative Commons metadata.
-
Uses of Pluggable in org.apache.nutch.analysis.lang
Classes in org.apache.nutch.analysis.lang that implement Pluggable Modifier and Type Class Description class
HTMLLanguageParser
class
LanguageIndexingFilter
An IndexingFilter that adds a lang (language) field to the document. -
Uses of Pluggable in org.apache.nutch.collection
Classes in org.apache.nutch.collection that implement Pluggable Modifier and Type Class Description class
Subcollection
SubCollection represents a subset of an index; you can define URL patterns that indicate that a particular page (URL) is part of a SubCollection. -
Uses of Pluggable in org.apache.nutch.exchange
Subinterfaces of Pluggable in org.apache.nutch.exchange Modifier and Type Interface Description interface
Exchange
-
Uses of Pluggable in org.apache.nutch.exchange.jexl
Classes in org.apache.nutch.exchange.jexl that implement Pluggable Modifier and Type Class Description class
JexlExchange
-
Uses of Pluggable in org.apache.nutch.indexer
Subinterfaces of Pluggable in org.apache.nutch.indexer Modifier and Type Interface Description interface
IndexingFilter
Extension point for indexing.
interface
IndexWriter
-
Uses of Pluggable in org.apache.nutch.indexer.anchor
Classes in org.apache.nutch.indexer.anchor that implement Pluggable Modifier and Type Class Description class
AnchorIndexingFilter
Indexing filter that offers an option to either index all inbound anchor text for a document or deduplicate anchors. -
Uses of Pluggable in org.apache.nutch.indexer.arbitrary
Classes in org.apache.nutch.indexer.arbitrary that implement Pluggable Modifier and Type Class Description class
ArbitraryIndexingFilter
Adds arbitrary searchable fields to a document from the class and method the user identifies in the config. -
Uses of Pluggable in org.apache.nutch.indexer.basic
Classes in org.apache.nutch.indexer.basic that implement Pluggable Modifier and Type Class Description class
BasicIndexingFilter
Adds basic searchable fields to a document. -
Uses of Pluggable in org.apache.nutch.indexer.feed
Classes in org.apache.nutch.indexer.feed that implement Pluggable Modifier and Type Class Description class
FeedIndexingFilter
-
Uses of Pluggable in org.apache.nutch.indexer.filter
Classes in org.apache.nutch.indexer.filter that implement Pluggable Modifier and Type Class Description class
MimeTypeIndexingFilter
An IndexingFilter that allows filtering of documents based on the MIME Type detected by Tika. -
Uses of Pluggable in org.apache.nutch.indexer.geoip
Classes in org.apache.nutch.indexer.geoip that implement Pluggable Modifier and Type Class Description class
GeoIPIndexingFilter
This plugin implements an indexing filter which takes advantage of the GeoIP2-java API. -
Uses of Pluggable in org.apache.nutch.indexer.jexl
Classes in org.apache.nutch.indexer.jexl that implement Pluggable Modifier and Type Class Description class
JexlIndexingFilter
An IndexingFilter that allows filtering of documents based on a JEXL expression. -
Uses of Pluggable in org.apache.nutch.indexer.links
Classes in org.apache.nutch.indexer.links that implement Pluggable Modifier and Type Class Description class
LinksIndexingFilter
-
Uses of Pluggable in org.apache.nutch.indexer.metadata
Classes in org.apache.nutch.indexer.metadata that implement Pluggable Modifier and Type Class Description class
MetadataIndexer
Indexer which can be configured to extract metadata from the crawldb, parse metadata or content metadata. -
Uses of Pluggable in org.apache.nutch.indexer.more
Classes in org.apache.nutch.indexer.more that implement Pluggable Modifier and Type Class Description class
MoreIndexingFilter
Add (or reset) a few metaData properties as respective fields (if they are available), so that they can be accurately used within the search index. -
Uses of Pluggable in org.apache.nutch.indexer.replace
Classes in org.apache.nutch.indexer.replace that implement Pluggable Modifier and Type Class Description class
ReplaceIndexer
Do pattern replacements on selected field contents prior to indexing. -
Uses of Pluggable in org.apache.nutch.indexer.staticfield
Classes in org.apache.nutch.indexer.staticfield that implement Pluggable Modifier and Type Class Description class
StaticFieldIndexer
A simple plugin called at indexing that adds fields with static data. -
Uses of Pluggable in org.apache.nutch.indexer.subcollection
Classes in org.apache.nutch.indexer.subcollection that implement Pluggable Modifier and Type Class Description class
SubcollectionIndexingFilter
-
Uses of Pluggable in org.apache.nutch.indexer.tld
Classes in org.apache.nutch.indexer.tld that implement Pluggable Modifier and Type Class Description class
TLDIndexingFilter
Adds the top-level domain extensions to the index -
Uses of Pluggable in org.apache.nutch.indexer.urlmeta
Classes in org.apache.nutch.indexer.urlmeta that implement Pluggable Modifier and Type Class Description class
URLMetaIndexingFilter
This is part of the URL Meta plugin. -
Uses of Pluggable in org.apache.nutch.indexwriter.cloudsearch
Classes in org.apache.nutch.indexwriter.cloudsearch that implement Pluggable Modifier and Type Class Description class
CloudSearchIndexWriter
Writes documents to CloudSearch. -
Uses of Pluggable in org.apache.nutch.indexwriter.csv
Classes in org.apache.nutch.indexwriter.csv that implement Pluggable Modifier and Type Class Description class
CSVIndexWriter
Write Nutch documents to a CSV file (comma separated values), i.e., dump index as CSV or tab-separated plain text table. -
Uses of Pluggable in org.apache.nutch.indexwriter.dummy
Classes in org.apache.nutch.indexwriter.dummy that implement Pluggable Modifier and Type Class Description class
DummyIndexWriter
DummyIndexWriter. -
Uses of Pluggable in org.apache.nutch.indexwriter.elastic
Classes in org.apache.nutch.indexwriter.elastic that implement Pluggable Modifier and Type Class Description class
ElasticIndexWriter
Sends NutchDocuments to a configured Elasticsearch index. -
Uses of Pluggable in org.apache.nutch.indexwriter.kafka
Classes in org.apache.nutch.indexwriter.kafka that implement Pluggable Modifier and Type Class Description class
KafkaIndexWriter
Sends Nutch documents to a configured Kafka Cluster -
Uses of Pluggable in org.apache.nutch.indexwriter.opensearch1x
Classes in org.apache.nutch.indexwriter.opensearch1x that implement Pluggable Modifier and Type Class Description class
OpenSearch1xIndexWriter
Sends NutchDocuments to a configured OpenSearch index. -
Uses of Pluggable in org.apache.nutch.indexwriter.rabbit
Classes in org.apache.nutch.indexwriter.rabbit that implement Pluggable Modifier and Type Class Description class
RabbitIndexWriter
-
Uses of Pluggable in org.apache.nutch.indexwriter.solr
Classes in org.apache.nutch.indexwriter.solr that implement Pluggable Modifier and Type Class Description class
SolrIndexWriter
-
Uses of Pluggable in org.apache.nutch.microformats.reltag
Classes in org.apache.nutch.microformats.reltag that implement Pluggable Modifier and Type Class Description class
RelTagIndexingFilter
An IndexingFilter that adds tag field(s) to the document.
class
RelTagParser
Adds microformat rel-tags of document if found. -
Uses of Pluggable in org.apache.nutch.net
Subinterfaces of Pluggable in org.apache.nutch.net Modifier and Type Interface Description interface
URLExemptionFilter
Interface used to allow exemptions to external domain resources by overriding db.ignore.external.links.
interface
URLFilter
Interface used to limit which URLs enter Nutch. -
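The URLFilter description above can be illustrated with a minimal, self-contained sketch. It assumes the contract that a filter returns the URL to accept it and null to reject it. SimpleUrlFilter, prefixFilter and applyAll are hypothetical stand-ins written for this illustration; they are not the Nutch API, and a real plugin would implement org.apache.nutch.net.URLFilter instead.

```java
import java.util.Arrays;
import java.util.List;

public class UrlFilterSketch {

    /** Hypothetical stand-in for a URL filter: return the URL to keep it, null to drop it. */
    public interface SimpleUrlFilter {
        String filter(String url);
    }

    /** Builds a filter that accepts only URLs starting with one of the given prefixes. */
    public static SimpleUrlFilter prefixFilter(List<String> prefixes) {
        return url -> {
            for (String prefix : prefixes) {
                if (url.startsWith(prefix)) {
                    return url; // accepted: pass the URL through unchanged
                }
            }
            return null; // rejected: no prefix matched
        };
    }

    /** Applies the filters in order and stops as soon as one of them rejects the URL. */
    public static String applyAll(List<SimpleUrlFilter> filters, String url) {
        String current = url;
        for (SimpleUrlFilter f : filters) {
            current = f.filter(current);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    public static void main(String[] args) {
        // The prefixes below are made-up example values.
        List<SimpleUrlFilter> chain = Arrays.asList(
                prefixFilter(Arrays.asList("https://", "http://")),
                prefixFilter(Arrays.asList("https://example.org/")));
        System.out.println(applyAll(chain, "https://example.org/docs"));
        System.out.println(applyAll(chain, "ftp://example.org/file"));
    }
}
```

Chaining filters this way means a URL enters the crawl only if every configured filter accepts it.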
Uses of Pluggable in org.apache.nutch.parse
Subinterfaces of Pluggable in org.apache.nutch.parse Modifier and Type Interface Description interface
HtmlParseFilter
Extension point for DOM-based HTML parsers.
interface
Parser
A parser for content generated by a Protocol implementation. -
Uses of Pluggable in org.apache.nutch.parse.ext
Classes in org.apache.nutch.parse.ext that implement Pluggable Modifier and Type Class Description class
ExtParser
A wrapper that invokes an external command to do the real parsing job. -
Uses of Pluggable in org.apache.nutch.parse.feed
Classes in org.apache.nutch.parse.feed that implement Pluggable Modifier and Type Class Description class
FeedParser
-
Uses of Pluggable in org.apache.nutch.parse.headings
Classes in org.apache.nutch.parse.headings that implement Pluggable Modifier and Type Class Description class
HeadingsParseFilter
HtmlParseFilter to retrieve h1 and h2 values from the DOM. -
Uses of Pluggable in org.apache.nutch.parse.html
Classes in org.apache.nutch.parse.html that implement Pluggable Modifier and Type Class Description class
HtmlParser
-
Uses of Pluggable in org.apache.nutch.parse.js
Classes in org.apache.nutch.parse.js that implement Pluggable Modifier and Type Class Description class
JSParseFilter
This class is a heuristic link extractor for JavaScript files and code snippets. -
Uses of Pluggable in org.apache.nutch.parse.metatags
Classes in org.apache.nutch.parse.metatags that implement Pluggable Modifier and Type Class Description class
MetaTagsParser
Parse HTML meta tags (keywords, description) and store them in the parse metadata so that they can be indexed with the index-metadata plugin with the prefix 'metatag.'. -
Uses of Pluggable in org.apache.nutch.parse.tika
Classes in org.apache.nutch.parse.tika that implement Pluggable Modifier and Type Class Description class
TikaParser
Wrapper for Tika parsers. -
Uses of Pluggable in org.apache.nutch.parse.zip
Classes in org.apache.nutch.parse.zip that implement Pluggable Modifier and Type Class Description class
ZipParser
ZipParser class based on MSPowerPointParser class by Stephan Strittmatter. -
Uses of Pluggable in org.apache.nutch.parsefilter.debug
Classes in org.apache.nutch.parsefilter.debug that implement Pluggable Modifier and Type Class Description class
DebugParseFilter
Adds serialized DOM to parse data, useful for debugging, to understand how the parser implementation interprets a document (not only HTML). -
Uses of Pluggable in org.apache.nutch.parsefilter.naivebayes
Classes in org.apache.nutch.parsefilter.naivebayes that implement Pluggable Modifier and Type Class Description class
NaiveBayesParseFilter
HTML parse filter that classifies the outlinks from the parse result as relevant or irrelevant based on the relevancy of the parse text (using a training file in which you can give positive and negative example texts; see the description of parsefilter.naivebayes.trainfile). If a link is found irrelevant, it gets a second chance if it contains any of the words from the list given in parsefilter.naivebayes.wordlist. -
Uses of Pluggable in org.apache.nutch.parsefilter.regex
Classes in org.apache.nutch.parsefilter.regex that implement Pluggable Modifier and Type Class Description class
RegexParseFilter
RegexParseFilter. -
Uses of Pluggable in org.apache.nutch.protocol
Subinterfaces of Pluggable in org.apache.nutch.protocol Modifier and Type Interface Description interface
Protocol
A retriever of url content. -
Uses of Pluggable in org.apache.nutch.protocol.file
Classes in org.apache.nutch.protocol.file that implement Pluggable Modifier and Type Class Description class
File
This class is a protocol plugin used for file: scheme. -
Uses of Pluggable in org.apache.nutch.protocol.ftp
Classes in org.apache.nutch.protocol.ftp that implement Pluggable Modifier and Type Class Description class
Ftp
This class is a protocol plugin used for ftp: scheme. -
Uses of Pluggable in org.apache.nutch.protocol.htmlunit
Classes in org.apache.nutch.protocol.htmlunit that implement Pluggable Modifier and Type Class Description class
Http
-
Uses of Pluggable in org.apache.nutch.protocol.http
Classes in org.apache.nutch.protocol.http that implement Pluggable Modifier and Type Class Description class
Http
-
Uses of Pluggable in org.apache.nutch.protocol.http.api
Classes in org.apache.nutch.protocol.http.api that implement Pluggable Modifier and Type Class Description class
HttpBase
-
Uses of Pluggable in org.apache.nutch.protocol.httpclient
Classes in org.apache.nutch.protocol.httpclient that implement Pluggable Modifier and Type Class Description class
Http
This class is a protocol plugin that configures an HTTP client for Basic, Digest and NTLM authentication schemes for web server as well as proxy server. -
Uses of Pluggable in org.apache.nutch.protocol.interactiveselenium
Classes in org.apache.nutch.protocol.interactiveselenium that implement Pluggable Modifier and Type Class Description class
Http
-
Uses of Pluggable in org.apache.nutch.protocol.okhttp
Classes in org.apache.nutch.protocol.okhttp that implement Pluggable Modifier and Type Class Description class
OkHttp
-
Uses of Pluggable in org.apache.nutch.protocol.selenium
Classes in org.apache.nutch.protocol.selenium that implement Pluggable Modifier and Type Class Description class
Http
-
Uses of Pluggable in org.apache.nutch.publisher
Subinterfaces of Pluggable in org.apache.nutch.publisher Modifier and Type Interface Description interface
NutchPublisher
All publisher subscriber model implementations should implement this interface.
Classes in org.apache.nutch.publisher that implement Pluggable Modifier and Type Class Description class
NutchPublishers
-
Uses of Pluggable in org.apache.nutch.publisher.rabbitmq
Classes in org.apache.nutch.publisher.rabbitmq that implement Pluggable Modifier and Type Class Description class
RabbitMQPublisherImpl
-
Uses of Pluggable in org.apache.nutch.scoring
Subinterfaces of Pluggable in org.apache.nutch.scoring Modifier and Type Interface Description interface
ScoringFilter
A contract defining behavior of scoring plugins.
Classes in org.apache.nutch.scoring that implement Pluggable Modifier and Type Class Description class
AbstractScoringFilter
class
ScoringFilters
Creates and caches ScoringFilter implementing plugins. -
Uses of Pluggable in org.apache.nutch.scoring.depth
Classes in org.apache.nutch.scoring.depth that implement Pluggable Modifier and Type Class Description class
DepthScoringFilter
This scoring filter limits the number of hops from the initial seed urls. -
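The hop limit described above can be sketched as follows. This is only an illustration under stated assumptions: that seed pages start at depth 1 and that each outlink is one hop deeper than the page it was found on. The class and method names are hypothetical and do not reflect how DepthScoringFilter is implemented.

```java
public class DepthLimitSketch {

    /** Depth assigned to an outlink: one hop more than the page it was found on. */
    public static int outlinkDepth(int parentDepth) {
        return parentDepth + 1;
    }

    /** Returns true if the outlinks of a page at parentDepth stay within maxDepth. */
    public static boolean withinLimit(int parentDepth, int maxDepth) {
        return outlinkDepth(parentDepth) <= maxDepth;
    }

    public static void main(String[] args) {
        int maxDepth = 3; // hypothetical limit chosen for the example
        // Assumption: seed URLs are at depth 1.
        for (int depth = 1; depth <= 4; depth++) {
            System.out.println("page depth " + depth
                    + " -> follow outlinks: " + withinLimit(depth, maxDepth));
        }
    }
}
```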
Uses of Pluggable in org.apache.nutch.scoring.link
Classes in org.apache.nutch.scoring.link that implement Pluggable Modifier and Type Class Description class
LinkAnalysisScoringFilter
-
Uses of Pluggable in org.apache.nutch.scoring.metadata
Classes in org.apache.nutch.scoring.metadata that implement Pluggable Modifier and Type Class Description class
MetadataScoringFilter
For documentation: org.apache.nutch.scoring.metadata
-
Uses of Pluggable in org.apache.nutch.scoring.opic
Classes in org.apache.nutch.scoring.opic that implement Pluggable Modifier and Type Class Description class
OPICScoringFilter
This plugin implements a variant of an Online Page Importance Computation (OPIC) score, described in this paper: Abiteboul, Serge and Preda, Mihai and Cobena, Gregory (2003), Adaptive On-Line Page Importance Computation. -
Uses of Pluggable in org.apache.nutch.scoring.orphan
Classes in org.apache.nutch.scoring.orphan that implement Pluggable Modifier and Type Class Description class
OrphanScoringFilter
Orphan scoring filter that determines whether a page has become orphaned. -
Uses of Pluggable in org.apache.nutch.scoring.similarity
Classes in org.apache.nutch.scoring.similarity that implement Pluggable Modifier and Type Class Description class
SimilarityScoringFilter
-
Uses of Pluggable in org.apache.nutch.scoring.tld
Classes in org.apache.nutch.scoring.tld that implement Pluggable Modifier and Type Class Description class
TLDScoringFilter
Scoring filter to boost top-level domains (TLDs). -
Uses of Pluggable in org.apache.nutch.scoring.urlmeta
Classes in org.apache.nutch.scoring.urlmeta that implement Pluggable Modifier and Type Class Description class
URLMetaScoringFilter
For documentation: org.apache.nutch.scoring.urlmeta
-
Uses of Pluggable in org.apache.nutch.urlfilter.api
Classes in org.apache.nutch.urlfilter.api that implement Pluggable Modifier and Type Class Description class
RegexURLFilterBase
Generic URLFilter based on regular expressions. -
Uses of Pluggable in org.apache.nutch.urlfilter.automaton
Classes in org.apache.nutch.urlfilter.automaton that implement Pluggable Modifier and Type Class Description class
AutomatonURLFilter
RegexURLFilterBase implementation based on the dk.brics.automaton Finite-State Automata for Java. -
Uses of Pluggable in org.apache.nutch.urlfilter.domain
Classes in org.apache.nutch.urlfilter.domain that implement Pluggable Modifier and Type Class Description class
DomainURLFilter
Filters URLs based on a file containing domain suffixes, domain names, and hostnames. -
Uses of Pluggable in org.apache.nutch.urlfilter.domaindenylist
Classes in org.apache.nutch.urlfilter.domaindenylist that implement Pluggable Modifier and Type Class Description class
DomainDenylistURLFilter
Filters URLs based on a file containing domain suffixes, domain names, and hostnames. -
Uses of Pluggable in org.apache.nutch.urlfilter.fast
Classes in org.apache.nutch.urlfilter.fast that implement Pluggable Modifier and Type Class Description class
FastURLFilter
Filters URLs based on a file of regular expressions using host/domains matching first. -
Uses of Pluggable in org.apache.nutch.urlfilter.ignoreexempt
Classes in org.apache.nutch.urlfilter.ignoreexempt that implement Pluggable Modifier and Type Class Description class
ExemptionUrlFilter
This implementation of URLExemptionFilter uses regex configuration to check if a URL is eligible for exemption from 'db.ignore.external'. -
Uses of Pluggable in org.apache.nutch.urlfilter.prefix
Classes in org.apache.nutch.urlfilter.prefix that implement Pluggable Modifier and Type Class Description class
PrefixURLFilter
Filters URLs based on a file of URL prefixes. -
Uses of Pluggable in org.apache.nutch.urlfilter.regex
Classes in org.apache.nutch.urlfilter.regex that implement Pluggable Modifier and Type Class Description class
RegexURLFilter
Filters URLs based on a file of regular expressions using the Java Regex implementation. -
Uses of Pluggable in org.apache.nutch.urlfilter.suffix
Classes in org.apache.nutch.urlfilter.suffix that implement Pluggable Modifier and Type Class Description class
SuffixURLFilter
Filters URLs based on a file of URL suffixes. -
Uses of Pluggable in org.apache.nutch.urlfilter.validator
Classes in org.apache.nutch.urlfilter.validator that implement Pluggable Modifier and Type Class Description class
UrlValidator
Validates URLs. -
Uses of Pluggable in org.creativecommons.nutch
Classes in org.creativecommons.nutch that implement Pluggable Modifier and Type Class Description class
CCIndexingFilter
Adds basic searchable fields to a document.
class
CCParseFilter
Adds metadata identifying the Creative Commons license used, if any.
-