Uses of Package
org.apache.nutch.parse
Packages that use org.apache.nutch.parse
Package                                     Description
org.apache.nutch.analysis.lang              Text document language identifier.
org.apache.nutch.crawl                      Crawl control code and tools to run the crawler.
org.apache.nutch.fetcher                    The Nutch multi-threaded fetching module
org.apache.nutch.indexer                    Index content, configure and run indexing and cleaning jobs to add, update, and delete documents from an index.
org.apache.nutch.indexer.anchor             An indexing plugin for inbound anchor text.
org.apache.nutch.indexer.arbitrary          Indexing filter to add document arbitrary data to the index from the output of a user-specified class.
org.apache.nutch.indexer.basic              A basic indexing plugin, adds basic fields: url, host, title, content, etc.
org.apache.nutch.indexer.feed               Indexing filter to index meta data from RSS feeds.
org.apache.nutch.indexer.filter
org.apache.nutch.indexer.geoip              This plugin implements an indexing filter which takes advantage of the GeoIP2-java API.
org.apache.nutch.indexer.jexl               This plugin implements a dynamic indexing filter which uses JEXL expressions to allow filtering based on the page's metadata
org.apache.nutch.indexer.links
org.apache.nutch.indexer.metadata           Indexing filter to add document metadata to the index.
org.apache.nutch.indexer.more               A more indexing plugin, adds "more" index fields: last modified date, MIME type, content length.
org.apache.nutch.indexer.replace            Indexing filter to allow pattern replacements on metadata.
org.apache.nutch.indexer.staticfield        A simple plugin called at indexing that adds fields with static data.
org.apache.nutch.indexer.subcollection      Indexing filter to assign documents to subcollections.
org.apache.nutch.indexer.tld                Top Level Domain Indexing plugin.
org.apache.nutch.indexer.urlmeta            URL Meta Tag Indexing Plugin
org.apache.nutch.microformats.reltag        A microformats Rel-Tag Parser/Indexer/Querier plugin.
org.apache.nutch.parse                      The Parse interface and related classes.
org.apache.nutch.parse.ext                  Parse wrapper to run external command to do the parsing.
org.apache.nutch.parse.feed                 Parse RSS feeds.
org.apache.nutch.parse.headings             Parse filter to extract headings (h1, h2, etc.) from DOM parse tree.
org.apache.nutch.parse.html                 An HTML document parsing plugin.
org.apache.nutch.parse.js                   Parser and parse filter plugin to extract all (possible) links from JavaScript files and embedded JavaScript code snippets.
org.apache.nutch.parse.metatags             Parse filter to extract meta tags: keywords, description, etc.
org.apache.nutch.parse.tika                 Parse various document formats with help of Apache Tika.
org.apache.nutch.parse.zip                  Parse ZIP files: embedded files are recursively passed to appropriate parsers.
org.apache.nutch.parsefilter.debug          Adds serialized DOM to parse data, useful for debugging, to understand how the parser implementation interprets a document (not only HTML).
org.apache.nutch.parsefilter.naivebayes     HTML parse filter that classifies the outlinks from the parse result as relevant or irrelevant based on the parseText's relevancy (using a training file where you can give positive and negative example texts, see the description of parsefilter.naivebayes.trainfile), and if found irrelevant it gives the link a second chance if it contains any of the words from the list given in parsefilter.naivebayes.wordlist.
org.apache.nutch.parsefilter.regex          RegexParseFilter.
org.apache.nutch.scoring                    The ScoringFilter interface.
org.apache.nutch.scoring.depth              Scoring filter to stop crawling at a configurable depth (number of "hops" from seed URLs).
org.apache.nutch.scoring.link               Scoring filter used in conjunction with WebGraph.
org.apache.nutch.scoring.metadata           Metadata Scoring Plugin
org.apache.nutch.scoring.opic               Scoring filter implementing a variant of the Online Page Importance Computation (OPIC) algorithm.
org.apache.nutch.scoring.similarity
org.apache.nutch.scoring.similarity.cosine  Implements the cosine similarity metric for scoring relevant documents
org.apache.nutch.scoring.tld                Top Level Domain Scoring plugin.
org.apache.nutch.scoring.urlmeta            URL Meta Tag Scoring Plugin
org.apache.nutch.segment                    A segment stores all data from one generate/fetch/update cycle: fetch list, protocol status, raw content, parsed content, and extracted outgoing links.
org.apache.nutch.service.model.response
org.apache.nutch.tools                      Miscellaneous tools.
org.creativecommons.nutch                   Sample plugins that parse and index Creative Commons metadata.

Classes in org.apache.nutch.parse used by org.apache.nutch.analysis.lang
Class            Description
HTMLMetaTags     This class holds the information about HTML "meta" tags extracted from a page.
HtmlParseFilter  Extension point for DOM-based HTML parsers.
Parse            The result of parsing a page's raw content.
ParseResult      A utility class that stores result of a parse.

Classes in org.apache.nutch.parse used by org.apache.nutch.crawl
Class            Description
Parse            The result of parsing a page's raw content.
ParseData        Data extracted from a page's content.

Classes in org.apache.nutch.parse used by org.apache.nutch.fetcher
Class            Description
Outlink          An outgoing link from a page.

Classes in org.apache.nutch.parse used by org.apache.nutch.indexer
Class            Description
Parse            The result of parsing a page's raw content.

Classes in org.apache.nutch.parse used by org.apache.nutch.indexer.anchor
Class            Description
Parse            The result of parsing a page's raw content.

Classes in org.apache.nutch.parse used by org.apache.nutch.indexer.arbitrary
Class            Description
Parse            The result of parsing a page's raw content.

Classes in org.apache.nutch.parse used by org.apache.nutch.indexer.basic
Class            Description
Parse            The result of parsing a page's raw content.

Classes in org.apache.nutch.parse used by org.apache.nutch.indexer.feed
Class            Description
Parse            The result of parsing a page's raw content.

Classes in org.apache.nutch.parse used by org.apache.nutch.indexer.filter
Class            Description
Parse            The result of parsing a page's raw content.

Classes in org.apache.nutch.parse used by org.apache.nutch.indexer.geoip
Class            Description
Parse            The result of parsing a page's raw content.

Classes in org.apache.nutch.parse used by org.apache.nutch.indexer.jexl
Class            Description
Parse            The result of parsing a page's raw content.

Classes in org.apache.nutch.parse used by org.apache.nutch.indexer.links
Class            Description
Parse            The result of parsing a page's raw content.

Classes in org.apache.nutch.parse used by org.apache.nutch.indexer.metadata
Class            Description
Parse            The result of parsing a page's raw content.

Classes in org.apache.nutch.parse used by org.apache.nutch.indexer.more
Class            Description
Parse            The result of parsing a page's raw content.

Classes in org.apache.nutch.parse used by org.apache.nutch.indexer.replace
Class            Description
Parse            The result of parsing a page's raw content.

Classes in org.apache.nutch.parse used by org.apache.nutch.indexer.staticfield
Class            Description
Parse            The result of parsing a page's raw content.

Classes in org.apache.nutch.parse used by org.apache.nutch.indexer.subcollection
Class            Description
Parse            The result of parsing a page's raw content.

Classes in org.apache.nutch.parse used by org.apache.nutch.indexer.tld
Class            Description
Parse            The result of parsing a page's raw content.

Classes in org.apache.nutch.parse used by org.apache.nutch.indexer.urlmeta
Class            Description
Parse            The result of parsing a page's raw content.

Classes in org.apache.nutch.parse used by org.apache.nutch.microformats.reltag
Class            Description
HTMLMetaTags     This class holds the information about HTML "meta" tags extracted from a page.
HtmlParseFilter  Extension point for DOM-based HTML parsers.
Parse            The result of parsing a page's raw content.
ParseResult      A utility class that stores result of a parse.

Classes in org.apache.nutch.parse used by org.apache.nutch.parse
Class            Description
HTMLMetaTags     This class holds the information about HTML "meta" tags extracted from a page.
Outlink          An outgoing link from a page.
Parse            The result of parsing a page's raw content.
ParseData        Data extracted from a page's content.
ParseException
ParseImpl        The result of parsing a page's raw content.
Parser           A parser for content generated by a Protocol implementation.
ParseResult      A utility class that stores result of a parse.
ParserNotFound
ParseStatus
ParseText

Classes in org.apache.nutch.parse used by org.apache.nutch.parse.ext
Class            Description
Parser           A parser for content generated by a Protocol implementation.
ParseResult      A utility class that stores result of a parse.

Classes in org.apache.nutch.parse used by org.apache.nutch.parse.feed
Class            Description
Parser           A parser for content generated by a Protocol implementation.
ParseResult      A utility class that stores result of a parse.

Classes in org.apache.nutch.parse used by org.apache.nutch.parse.headings
Class            Description
HTMLMetaTags     This class holds the information about HTML "meta" tags extracted from a page.
HtmlParseFilter  Extension point for DOM-based HTML parsers.
ParseResult      A utility class that stores result of a parse.

Classes in org.apache.nutch.parse used by org.apache.nutch.parse.html
Class            Description
HTMLMetaTags     This class holds the information about HTML "meta" tags extracted from a page.
Outlink          An outgoing link from a page.
Parser           A parser for content generated by a Protocol implementation.
ParseResult      A utility class that stores result of a parse.

Classes in org.apache.nutch.parse used by org.apache.nutch.parse.js
Class            Description
HTMLMetaTags     This class holds the information about HTML "meta" tags extracted from a page.
HtmlParseFilter  Extension point for DOM-based HTML parsers.
Parser           A parser for content generated by a Protocol implementation.
ParseResult      A utility class that stores result of a parse.

Classes in org.apache.nutch.parse used by org.apache.nutch.parse.metatags
Class            Description
HTMLMetaTags     This class holds the information about HTML "meta" tags extracted from a page.
HtmlParseFilter  Extension point for DOM-based HTML parsers.
ParseResult      A utility class that stores result of a parse.

Classes in org.apache.nutch.parse used by org.apache.nutch.parse.tika
Class            Description
HTMLMetaTags     This class holds the information about HTML "meta" tags extracted from a page.
Outlink          An outgoing link from a page.
Parser           A parser for content generated by a Protocol implementation.
ParseResult      A utility class that stores result of a parse.

Classes in org.apache.nutch.parse used by org.apache.nutch.parse.zip
Class            Description
Outlink          An outgoing link from a page.
Parser           A parser for content generated by a Protocol implementation.
ParseResult      A utility class that stores result of a parse.

Classes in org.apache.nutch.parse used by org.apache.nutch.parsefilter.debug
Class            Description
HTMLMetaTags     This class holds the information about HTML "meta" tags extracted from a page.
HtmlParseFilter  Extension point for DOM-based HTML parsers.
ParseResult      A utility class that stores result of a parse.

Classes in org.apache.nutch.parse used by org.apache.nutch.parsefilter.naivebayes
Class            Description
HTMLMetaTags     This class holds the information about HTML "meta" tags extracted from a page.
HtmlParseFilter  Extension point for DOM-based HTML parsers.
ParseResult      A utility class that stores result of a parse.

Classes in org.apache.nutch.parse used by org.apache.nutch.parsefilter.regex
Class            Description
HTMLMetaTags     This class holds the information about HTML "meta" tags extracted from a page.
HtmlParseFilter  Extension point for DOM-based HTML parsers.
ParseResult      A utility class that stores result of a parse.

Classes in org.apache.nutch.parse used by org.apache.nutch.scoring
Class            Description
Parse            The result of parsing a page's raw content.
ParseData        Data extracted from a page's content.

Classes in org.apache.nutch.parse used by org.apache.nutch.scoring.depth
Class            Description
Parse            The result of parsing a page's raw content.
ParseData        Data extracted from a page's content.

Classes in org.apache.nutch.parse used by org.apache.nutch.scoring.link
Class            Description
Parse            The result of parsing a page's raw content.

Classes in org.apache.nutch.parse used by org.apache.nutch.scoring.metadata
Class            Description
Parse            The result of parsing a page's raw content.
ParseData        Data extracted from a page's content.

Classes in org.apache.nutch.parse used by org.apache.nutch.scoring.opic
Class            Description
Parse            The result of parsing a page's raw content.
ParseData        Data extracted from a page's content.

Classes in org.apache.nutch.parse used by org.apache.nutch.scoring.similarity
Class            Description
Parse            The result of parsing a page's raw content.
ParseData        Data extracted from a page's content.

Classes in org.apache.nutch.parse used by org.apache.nutch.scoring.similarity.cosine
Class            Description
Parse            The result of parsing a page's raw content.
ParseData        Data extracted from a page's content.

Classes in org.apache.nutch.parse used by org.apache.nutch.scoring.tld
Class            Description
Parse            The result of parsing a page's raw content.

Classes in org.apache.nutch.parse used by org.apache.nutch.scoring.urlmeta
Class            Description
Parse            The result of parsing a page's raw content.
ParseData        Data extracted from a page's content.

Classes in org.apache.nutch.parse used by org.apache.nutch.segment
Class            Description
ParseData        Data extracted from a page's content.
ParseText

Classes in org.apache.nutch.parse used by org.apache.nutch.service.model.response
Class            Description
Outlink          An outgoing link from a page.

Classes in org.apache.nutch.parse used by org.apache.nutch.tools
Class            Description
ParseData        Data extracted from a page's content.

Classes in org.apache.nutch.parse used by org.creativecommons.nutch
Class            Description
HTMLMetaTags     This class holds the information about HTML "meta" tags extracted from a page.
HtmlParseFilter  Extension point for DOM-based HTML parsers.
Parse            The result of parsing a page's raw content.
ParseException
ParseResult      A utility class that stores result of a parse.