Interface IndexingFilter

All Superinterfaces:
org.apache.hadoop.conf.Configurable, FieldPluggable, Pluggable
All Known Implementing Classes:
AnchorIndexingFilter, BasicIndexingFilter, CCIndexingFilter, FeedIndexingFilter, LanguageIndexingFilter, MoreIndexingFilter, RelTagIndexingFilter, SubcollectionIndexingFilter, TLDIndexingFilter

public interface IndexingFilter
extends FieldPluggable, org.apache.hadoop.conf.Configurable

Extension point for indexing. Implementations can add metadata fields to the document being indexed. All plugins that implement this extension point are run sequentially on the parse.
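As a rough illustration of running the filters sequentially, the sketch below feeds the result of each filter into the next. It does not use the Nutch classes: FilterChainSketch, its nested Filter interface, and runAll are hypothetical names, and the document is reduced to a list of strings. Stopping the chain as soon as a filter returns null is an assumption made for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FilterChainSketch {

    /** Hypothetical stand-in for the filter contract; the document is just a list of field strings. */
    interface Filter {
        List<String> filter(List<String> doc, String url);
    }

    /**
     * Runs each filter in order, passing the output of one filter to the next.
     * Assumption for this sketch: a null result ends the chain and discards the document.
     */
    static List<String> runAll(List<Filter> filters, List<String> doc, String url) {
        for (Filter f : filters) {
            doc = f.filter(doc, url);
            if (doc == null) {
                return null;
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        // Three illustrative filters: two add a field, one rejects ftp URLs.
        Filter addUrl = (doc, url) -> { doc.add("url=" + url); return doc; };
        Filter rejectFtp = (doc, url) -> url.startsWith("ftp:") ? null : doc;
        Filter addLength = (doc, url) -> { doc.add("urlLength=" + url.length()); return doc; };

        List<Filter> chain = Arrays.asList(addUrl, rejectFtp, addLength);
        System.out.println(runAll(chain, new ArrayList<>(), "http://example.org/"));
        System.out.println(runAll(chain, new ArrayList<>(), "ftp://example.org/"));
    }
}
```

Because each filter receives the document produced by the previous one, the order in which filters run can affect the fields a later filter sees.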

Field Summary
static String X_POINT_ID
          The name of the extension point.
Method Summary
 NutchDocument filter(NutchDocument doc, String url, WebPage page)
          Adds fields or otherwise modifies the document that will be indexed for a parse.
Methods inherited from interface org.apache.nutch.plugin.FieldPluggable
getFields
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf

Field Detail


static final String X_POINT_ID
The name of the extension point.

Method Detail


NutchDocument filter(NutchDocument doc,
                     String url,
                     WebPage page)
                     throws IndexingException
Adds fields or otherwise modifies the document that will be indexed for a parse. Unwanted documents can be removed from indexing by returning a null value.

Parameters:
doc - document instance for collecting fields
url - URL of the page
page - the WebPage being indexed
Returns:
modified (or a new) document instance, or null (meaning the document should be discarded)
Throws:
IndexingException
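A minimal sketch of the filter contract follows. To stay self-contained it does not depend on Nutch: Doc and Page are hypothetical stand-ins for NutchDocument and WebPage, the nested Filter interface is a simplified stand-in for IndexingFilter (it omits the Configurable and FieldPluggable methods and the IndexingException declaration), and the "host" field name and the empty-title rule are chosen only for illustration.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HostFieldFilterSketch {

    /** Hypothetical stand-in for NutchDocument: a map from field name to values. */
    static class Doc {
        final Map<String, List<String>> fields = new LinkedHashMap<>();

        void add(String name, String value) {
            fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
    }

    /** Hypothetical stand-in for WebPage; only a title is modelled here. */
    static class Page {
        final String title;

        Page(String title) {
            this.title = title;
        }
    }

    /** Simplified stand-in for the filter method; the real method also declares IndexingException. */
    interface Filter {
        Doc filter(Doc doc, String url, Page page);
    }

    /** Adds a "host" field and discards documents whose page has no title. */
    static class HostFieldFilter implements Filter {
        @Override
        public Doc filter(Doc doc, String url, Page page) {
            if (page.title == null || page.title.isEmpty()) {
                return null; // null tells the caller to drop this document
            }
            String host = URI.create(url).getHost();
            if (host != null) {
                doc.add("host", host);
            }
            return doc; // the same instance, now carrying the extra field
        }
    }

    public static void main(String[] args) {
        Filter filter = new HostFieldFilter();
        Doc kept = filter.filter(new Doc(), "http://example.org/a.html", new Page("A"));
        System.out.println(kept.fields);
        Doc dropped = filter.filter(new Doc(), "http://example.org/b.html", new Page(""));
        System.out.println(dropped);
    }
}
```

The sketch returns the document instance it was given. Returning a new instance would also satisfy the contract described above.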

Copyright © 2013 The Apache Software Foundation