Package org.apache.nutch.parse.html

An HTML document parsing plugin.


Class Summary

Class                    Description
DOMBuilder               Takes SAX events (plus some extra events that SAX does not yet handle) and adds the result to a document or document fragment.
DOMContentUtils          A collection of methods for extracting content from DOM trees.
HTMLMetaProcessor        Class for parsing META directives from DOM trees.
XMLCharacterRecognizer   Class used to verify whether a specified character (ch) conforms to the XML 1.0 definition of whitespace.
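The XML 1.0 specification defines whitespace (the S production) as exactly four characters: space (#x20), tab (#x9), carriage return (#xD) and line feed (#xA). The following is a minimal sketch of the kind of check XMLCharacterRecognizer is described as performing; XmlWhitespace and its isWhiteSpace overloads are hypothetical names chosen for illustration, not the class's actual API.

```java
/**
 * Hypothetical sketch (not the Nutch/Xalan API) of an XML 1.0 whitespace check.
 * Only the four characters of the XML 1.0 "S" production count as whitespace:
 * space (0x20), tab (0x09), carriage return (0x0D) and line feed (0x0A).
 */
final class XmlWhitespace {

    private XmlWhitespace() {}

    /** Returns true if ch is one of the four XML 1.0 whitespace characters. */
    static boolean isWhiteSpace(char ch) {
        return ch == 0x20 || ch == 0x09 || ch == 0x0D || ch == 0x0A;
    }

    /** Returns true if every character of s is XML whitespace (true for an empty sequence). */
    static boolean isWhiteSpace(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            if (!isWhiteSpace(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

The explicit comparison is deliberate: Character.isWhitespace is broader and also accepts form feed and vertical tab, neither of which is whitespace under XML 1.0.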

This package relies on NekoHTML.
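NekoHTML's role is to turn real-world, possibly malformed HTML into a DOM tree that DOM-walking classes like the ones listed above can then traverse. The sketch below is not Nutch code. To stay dependency-free it uses the JDK's DocumentBuilder in place of NekoHTML, so it only accepts well-formed XHTML with no DOCTYPE. DomSketch, getText and robotsDirectives are hypothetical names. It shows two DOM walks of the sort DOMContentUtils and HTMLMetaProcessor are described as doing: collecting text while skipping script and style elements, and reading the directives of a robots META tag (treating robots as the META directive of interest is an assumption made for illustration).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical sketch; the JDK XML parser stands in for NekoHTML (well-formed XHTML only). */
final class DomSketch {

    private DomSketch() {}

    /** Parses well-formed XHTML into a DOM Document. Input should not declare an external DOCTYPE. */
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XHTML", e);
        }
    }

    /** Recursively appends text nodes under node, skipping script and style subtrees. */
    private static void collectText(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            String name = node.getNodeName().toLowerCase(Locale.ROOT);
            if (name.equals("script") || name.equals("style")) {
                return;
            }
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(node.getNodeValue());
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collectText(child, out);
        }
    }

    /** Returns the text under root with runs of whitespace collapsed to single spaces. */
    static String getText(Node root) {
        StringBuilder sb = new StringBuilder();
        collectText(root, sb);
        return sb.toString().replaceAll("\\s+", " ").trim();
    }

    /** Collects lower-cased, comma-separated directives from meta elements named "robots". */
    static Set<String> robotsDirectives(Document doc) {
        Set<String> directives = new LinkedHashSet<>();
        NodeList metas = doc.getElementsByTagName("meta");
        for (int i = 0; i < metas.getLength(); i++) {
            Element meta = (Element) metas.item(i);
            if ("robots".equalsIgnoreCase(meta.getAttribute("name"))) {
                for (String token : meta.getAttribute("content").split(",")) {
                    String directive = token.trim().toLowerCase(Locale.ROOT);
                    if (!directive.isEmpty()) {
                        directives.add(directive);
                    }
                }
            }
        }
        return directives;
    }
}
```

For example, parsing a page whose head contains a robots META tag with content "NOINDEX, nofollow" yields the directive set [noindex, nofollow], and getText applied to the body element returns its visible text without any script contents. A real HTML parser such as NekoHTML is still needed for pages that are not well-formed XML.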

