Package org.apache.nutch.parse.html
An HTML document parsing plugin.
This package relies on NekoHTML.
Class Summary

Class                        Description
DOMBuilder                   This class takes SAX events (in addition to some extra events that SAX doesn't handle yet) and adds the result to a document or document fragment.
DOMContentUtils              A collection of methods for extracting content from DOM trees.
DOMContentUtils.LinkParams
HTMLMetaProcessor            Class for parsing META Directives from DOM trees.
HtmlParser
XMLCharacterRecognizer       Class used to verify whether the specified ch conforms to the XML 1.0 definition of whitespace.
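
The XMLCharacterRecognizer entry refers to the XML 1.0 definition of whitespace, which is the production S: one or more of space (#x20), tab (#x9), carriage return (#xD), and line feed (#xA). The following is a minimal sketch of such a check, not the Nutch source; the class name XmlWhitespaceSketch and its method names are hypothetical and chosen only for illustration.

```java
// Hypothetical illustration class; it is not part of org.apache.nutch.parse.html.
final class XmlWhitespaceSketch {

    private XmlWhitespaceSketch() {
        // Static helpers only.
    }

    /**
     * Returns true if ch is one of the four characters that XML 1.0
     * treats as whitespace: #x20, #x9, #xD, #xA.
     */
    static boolean isWhiteSpace(char ch) {
        return ch == 0x20 || ch == 0x09 || ch == 0x0D || ch == 0x0A;
    }

    /**
     * Returns true if every character in text is XML 1.0 whitespace.
     * An empty sequence yields true because no character fails the check.
     */
    static boolean isWhiteSpace(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            if (!isWhiteSpace(text.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

Note that this set is narrower than Java's Character.isWhitespace, which also accepts characters such as form feed (#xC); a check based on the XML 1.0 definition rejects those.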