Class MSPowerPointParser

java.lang.Object
  extended by MSBaseParser
      extended by org.apache.nutch.parse.mspowerpoint.MSPowerPointParser
All Implemented Interfaces:
Configurable, Parser, Pluggable

public class MSPowerPointParser
extends MSBaseParser

Nutch parser for MS PowerPoint slides (MIME type: application/…).

It is based on org.apache.poi.*.

Author:
Stephan Strittmatter, Jérôme Charron
See Also:
Jakarta POI

Field Summary
static String MIME_TYPE
          Associated MIME type for PowerPoint files (application/…).
Fields inherited from class MSBaseParser
Fields inherited from interface org.apache.nutch.parse.Parser
Constructor Summary
MSPowerPointParser()
Method Summary
 ParseResult getParse(Content content)
           This method parses the given content and returns a map of <key, parse> pairs.
static void main(String[] args)
          Main for testing.
Methods inherited from class MSBaseParser
getConf, getParse, main, setConf
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail


public static final String MIME_TYPE
Associated MIME type for PowerPoint files (application/…).

See Also:
Constant Field Values
Constructor Detail


public MSPowerPointParser()
Method Detail


public ParseResult getParse(Content content)
Description copied from interface: Parser

This method parses the given content and returns a map of <key, parse> pairs. Parse instances will be persisted under the given key.
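The "map of <key, parse> pairs" contract can be modelled with plain Java collections. The sketch below is hypothetical: `SimpleParse`, `STORE` and `persistAll` are illustrative stand-ins, not Nutch's `ParseResult` or `Parse` classes. It shows only the idea that every returned `Parse` is persisted under the key it was returned with (requires Java 16+ for records).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of "a map of <key, parse> pairs".
 * None of these names are Nutch API; they only model the contract.
 */
public class ParseMapSketch {

    /** Hypothetical minimal parse result: extracted title and text. */
    record SimpleParse(String title, String text) {}

    /** Hypothetical in-memory store standing in for real persistence. */
    static final Map<String, SimpleParse> STORE = new LinkedHashMap<>();

    /** Persists every entry of a parse map under the key it was returned with. */
    static void persistAll(Map<String, SimpleParse> parseMap) {
        for (Map.Entry<String, SimpleParse> e : parseMap.entrySet()) {
            STORE.put(e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) {
        // A parser would typically return one entry keyed by the fetched URL.
        Map<String, SimpleParse> result = new LinkedHashMap<>();
        result.put("http://example.com/slides.ppt",
                   new SimpleParse("Quarterly review", "Slide 1 text ..."));
        persistAll(result);
        System.out.println(STORE.keySet()); // [http://example.com/slides.ppt]
    }
}
```

Keying by URL (rather than returning a single `Parse`) is what lets a parser report more than one result per fetched document; the example URL above is made up.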

Note: Meta-redirects should be followed only when they come from the original URL. That is:
Assume the fetcher is in parsing mode and is currently processing a given URL. If that URL contains a meta redirect to another URL, the fetcher should only follow the redirect if the map contains an entry keyed by the original URL, of the form <original URL, Parse with a ParseStatus indicating the redirect>.
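The redirect rule above reduces to a single lookup: follow a meta redirect only if the entry stored under the URL being fetched carries a redirect status. This is a hypothetical, self-contained sketch. `Status`, `ParseEntry` and `shouldFollowRedirect` are illustrative names, not Nutch's `ParseStatus` API (requires Java 16+ for records).

```java
import java.util.Map;

/**
 * Hypothetical sketch of the meta-redirect rule; not Nutch API.
 */
public class RedirectRuleSketch {

    /** Hypothetical stand-in for a parse status code. */
    enum Status { SUCCESS, REDIRECT, FAILED }

    /** Hypothetical parse entry: a status plus an optional redirect target. */
    record ParseEntry(Status status, String redirectTarget) {}

    /** Follow only if the entry keyed by the original URL signals a redirect. */
    static boolean shouldFollowRedirect(String originalUrl, Map<String, ParseEntry> parseMap) {
        ParseEntry entry = parseMap.get(originalUrl);
        return entry != null && entry.status() == Status.REDIRECT;
    }

    public static void main(String[] args) {
        String fetched = "http://example.com/a.html";
        // Redirect reported under the original URL: follow it.
        Map<String, ParseEntry> own = Map.of(fetched,
                new ParseEntry(Status.REDIRECT, "http://example.com/b.html"));
        // Redirect reported only under some other key: ignore it.
        Map<String, ParseEntry> other = Map.of("http://example.com/embedded.html",
                new ParseEntry(Status.REDIRECT, "http://example.com/c.html"));
        System.out.println(shouldFollowRedirect(fetched, own));   // true
        System.out.println(shouldFollowRedirect(fetched, other)); // false
    }
}
```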

Parameters:
content - Content to be parsed
Returns:
a map containing <key, parse> pairs


public static void main(String[] args)
Main for testing. Pass a PowerPoint document as the argument.
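This page only says that `main` takes a PowerPoint document; what it does internally is not documented here. As a hypothetical stand-alone alternative that needs neither Nutch nor POI on the classpath, the driver below reads the file named on the command line and checks for the OLE2 compound-file signature (`D0 CF 11 E0 A1 B1 1A E1`), the container format of binary `.ppt` files. The class and method names are illustrative assumptions (requires Java 11+ for `Path.of`).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical test driver sketch; not the real MSPowerPointParser.main. */
public class PptTestDriverSketch {

    // Signature at offset 0 of every OLE2 compound file (binary .ppt container).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    /** True if the data starts with the OLE2 compound-file signature. */
    static boolean looksLikeOle2(byte[] data) {
        return data.length >= OLE2_MAGIC.length
            && Arrays.equals(Arrays.copyOf(data, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: PptTestDriverSketch <file.ppt>");
            System.exit(1);
        }
        byte[] data = Files.readAllBytes(Path.of(args[0]));
        System.out.println(args[0] + ": " + data.length + " bytes, OLE2 header "
            + (looksLikeOle2(data) ? "present" : "missing"));
        // A real driver would presumably wrap these bytes in a Nutch Content
        // object and pass it to getParse(Content); that step is omitted here.
    }
}
```

Note that newer `.pptx` files are ZIP archives, so this check reports "missing" for them.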

