Package org.apache.nutch.parse
The Parse interface and related classes.
Interface Summary

Interface        Description
HtmlParseFilter  Extension point for DOM-based HTML parsers.
Parse            The result of parsing a page's raw content.
Parser           A parser for content generated by a Protocol implementation.
Class Summary

Class                             Description
HTMLMetaTags                      Holds information about HTML "meta" tags extracted from a page.
HtmlParseFilters                  Creates and caches plugins that implement HtmlParseFilter.
Outlink                           An outgoing link from a page.
OutlinkExtractor                  Extracts Outlinks / URLs from plain text using regular expressions.
ParseData                         Data extracted from a page's content.
ParseImpl                         The result of parsing a page's raw content.
ParseOutputFormat
ParserChecker                     Parser checker, useful for testing parsers.
ParseResult                       A utility class that stores the result of a parse.
ParserFactory                     Creates and caches Parser plugins.
ParseSegment
ParseSegment.ParseSegmentMapper
ParseSegment.ParseSegmentReducer
ParseStatus
ParseText
ParseUtil
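The OutlinkExtractor entry describes pulling URLs out of plain text with regular expressions. The following is a minimal, self-contained sketch of that idea in plain Java; it is not Nutch's implementation and does not use any Nutch API. The class name PlainTextLinkSketch, the method extractUrls, and the URL pattern are all hypothetical, and the pattern is deliberately much simpler than a production URL matcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not Nutch's OutlinkExtractor): finds http, https
 * and ftp URLs in plain text with a single regular expression.
 */
public class PlainTextLinkSketch {

    // Assumed pattern: a scheme, "://", then a run of characters that are
    // not whitespace, angle brackets, quotes or parentheses.
    private static final Pattern URL_PATTERN =
        Pattern.compile("(?i)\\b(?:https?|ftp)://[^\\s<>\"'()]+");

    /** Returns URLs in order of appearance, with trailing punctuation trimmed. */
    public static List<String> extractUrls(String plainText) {
        List<String> urls = new ArrayList<>();
        if (plainText == null) {
            return urls;
        }
        Matcher m = URL_PATTERN.matcher(plainText);
        while (m.find()) {
            String url = m.group();
            // Drop sentence punctuation the greedy match swallowed,
            // e.g. the final "." in "see http://example.com."
            while (!url.isEmpty()
                    && ".,;:!?".indexOf(url.charAt(url.length() - 1)) >= 0) {
                url = url.substring(0, url.length() - 1);
            }
            urls.add(url);
        }
        return urls;
    }

    public static void main(String[] args) {
        String text = "Docs at https://nutch.apache.org/ and "
            + "ftp://ftp.example.org/pub, see also http://example.com.";
        // prints [https://nutch.apache.org/, ftp://ftp.example.org/pub, http://example.com]
        System.out.println(extractUrls(text));
    }
}
```

Trimming punctuation after the match, rather than excluding "." from the character class, keeps the dots inside host names and paths while still dropping a sentence-ending period. A real extractor would also need to handle bare host names, IDNs and URLs wrapped in parentheses, which this sketch ignores.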
Exception Summary

Exception       Description
ParseException
ParserNotFound