Package org.apache.nutch.parse.html

An HTML document parsing plugin.


Class Summary
DOMBuilder This class takes SAX events (in addition to some extra events that SAX doesn't handle yet) and adds the result to a document or document fragment.
DOMContentUtils A collection of methods for extracting content from DOM trees.
HTMLMetaProcessor Class for parsing META Directives from DOM trees.
XMLCharacterRecognizer Class used to verify whether a specified character (the ch argument) conforms to the XML 1.0 definition of whitespace.
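
The XML 1.0 whitespace check described for XMLCharacterRecognizer can be sketched as follows. This is a minimal, standalone illustration and not the plugin's own source: the class name XmlWhitespaceSketch and its method signatures are assumptions made for this example. The only fact it relies on is that XML 1.0 (production S) treats exactly four characters as whitespace: space (#x20), tab (#x9), line feed (#xA), and carriage return (#xD).

```java
// Hypothetical helper for illustration only; not part of org.apache.nutch.parse.html.
public final class XmlWhitespaceSketch {

    private XmlWhitespaceSketch() {
        // Static utility class, no instances.
    }

    /** Returns true if ch is one of the four XML 1.0 whitespace characters. */
    public static boolean isWhiteSpace(char ch) {
        return ch == 0x20 || ch == 0x09 || ch == 0x0A || ch == 0x0D;
    }

    /** Returns true if every character in s is XML 1.0 whitespace (vacuously true for an empty sequence). */
    public static boolean isWhiteSpace(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            if (!isWhiteSpace(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isWhiteSpace(' '));       // prints "true"
        System.out.println(isWhiteSpace('\u00A0'));  // prints "false": no-break space is not XML whitespace
        System.out.println(isWhiteSpace(" \t\r\n")); // prints "true"
    }
}
```

Note that this definition is narrower than Java's Character.isWhitespace, which also accepts characters such as form feed and various Unicode separators, so the two checks are not interchangeable when deciding whether a text node is ignorable.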

Package org.apache.nutch.parse.html Description

This package relies on NekoHTML.

Copyright © 2013 The Apache Software Foundation