org.apache.nutch.indexer (apache-nutch 1.21 API)

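Index content, configure and run indexing and cleaning jobs to add, update, and delete documents from an index. Two tasks are delegated to plugins: indexing filters populate the fields of each document, and index writer plugins send the documents to the index back ends.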

Interface Summary
Interface	Description
IndexingFilter	Extension point for indexing.
IndexWriter
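The division of labour between the two interfaces can be illustrated with a small, self-contained sketch. Every name in it (`Doc`, `Filter`, `Writer`, `index`) is hypothetical and only loosely modeled on the roles of `IndexingFilter` and `IndexWriter`; it does not use the Nutch API. It assumes that filters run in sequence, that a filter can drop a document by returning `null`, and that each surviving document is handed to every configured writer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IndexingPipelineSketch {

  /** Hypothetical document: field name -> list of values (fields are multi-valued). */
  static class Doc {
    final Map<String, List<Object>> fields = new LinkedHashMap<>();

    void add(String name, Object value) {
      fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }
  }

  /** Hypothetical filter role (not Nutch's IndexingFilter): may modify a doc or drop it (null). */
  interface Filter {
    Doc filter(Doc doc, String url);
  }

  /** Hypothetical writer role (not Nutch's IndexWriter): sends a doc to some back end. */
  interface Writer {
    void write(Doc doc);
  }

  /** Runs every filter in order; a null result drops the document before any writer sees it. */
  static boolean index(Doc doc, String url, List<Filter> filters, List<Writer> writers) {
    for (Filter f : filters) {
      doc = f.filter(doc, url);
      if (doc == null) {
        return false; // document skipped, nothing is written
      }
    }
    for (Writer w : writers) {
      w.write(doc);
    }
    return true;
  }

  public static void main(String[] args) {
    List<Doc> backend = new ArrayList<>();
    List<Filter> filters = List.of(
        (d, url) -> { d.add("url", url); return d; },  // populate a field
        (d, url) -> url.endsWith(".pdf") ? null : d);  // hypothetical exclusion rule
    List<Writer> writers = List.of(backend::add);      // in-memory stand-in for a back end

    index(new Doc(), "https://example.org/", filters, writers);
    index(new Doc(), "https://example.org/a.pdf", filters, writers);
    System.out.println(backend.size());                   // 1
    System.out.println(backend.get(0).fields.get("url")); // [https://example.org/]
  }
}
```

Keeping field population and delivery behind separate interfaces means supporting a new back end only requires a new writer, while the filters stay reusable across back ends.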

Class Summary
Class	Description
CleaningJob	The class scans CrawlDB looking for entries with status DB_GONE (404) or DB_DUPLICATE and sends delete requests to indexers for those documents.
CleaningJob.DBFilter
CleaningJob.DeleterReducer
IndexerMapReduce	This class is typically invoked from within `IndexingJob` and handles all MapReduce functionality required when undertaking indexing.
IndexerMapReduce.IndexerMapper
IndexerMapReduce.IndexerReducer
IndexerOutputFormat
IndexingFilters	Creates and caches `IndexingFilter` implementing plugins.
IndexingFiltersChecker	Reads and parses a URL and runs the indexing filters on it.
IndexingJob	Generic indexer that relies on plugins implementing `IndexWriter`.
IndexWriterConfig
IndexWriterParams
IndexWriters	Creates and caches `IndexWriter` implementing plugins.
NutchDocument	A `NutchDocument` is the unit of indexing.
NutchField	This class represents a multi-valued field with a weight.
NutchIndexAction	A `NutchIndexAction` is the new unit of indexing holding the document and action information.
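`NutchField` is described as a multi-valued field with a weight. A minimal sketch of that idea follows; the class `WeightedField` and its methods are hypothetical and are not Nutch's `NutchField` API. It assumes a single weight per field, shared by all of its values, and does not model how a back end might interpret that weight (for example, as a relevance boost).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical multi-valued field with one weight; not Nutch's NutchField class. */
public class WeightedField {
  private final List<Object> values = new ArrayList<>();
  private float weight;

  public WeightedField(Object value, float weight) {
    this.values.add(value);
    this.weight = weight;
  }

  /** Appends another value; the weight applies to the field as a whole, not per value. */
  public void add(Object value) {
    values.add(value);
  }

  /** Read-only view of the values, in insertion order. */
  public List<Object> getValues() {
    return Collections.unmodifiableList(values);
  }

  public float getWeight() {
    return weight;
  }

  public void setWeight(float weight) {
    this.weight = weight;
  }

  public static void main(String[] args) {
    WeightedField anchor = new WeightedField("home page", 1.0f);
    anchor.add("example site");
    anchor.setWeight(2.0f);
    System.out.println(anchor.getValues() + " weight=" + anchor.getWeight());
    // prints [home page, example site] weight=2.0
  }
}
```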

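The selection rule stated for `CleaningJob` can be sketched in plain Java. The class summary lists `CleaningJob.DBFilter` and `CleaningJob.DeleterReducer`, which suggests the real job splits selection and deletion across map and reduce phases; this sketch collapses both into one loop over an in-memory map. `Status`, `Kind`, `IndexAction`, and `deletions` are hypothetical names: `Status` mirrors only the status names mentioned above, and `Kind` mirrors the add, update, and delete operations named in the package description, roughly the information a `NutchIndexAction` pairs with a document.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CleaningSketch {

  /** Hypothetical subset of CrawlDB statuses; not Nutch's CrawlDatum constants. */
  enum Status { DB_FETCHED, DB_UNFETCHED, DB_GONE, DB_DUPLICATE }

  /** Hypothetical action kinds, standing in for what an index action carries. */
  enum Kind { ADD, UPDATE, DELETE }

  /** Pairs an action kind with the key (URL) of the document it applies to. */
  static final class IndexAction {
    final Kind kind;
    final String url;

    IndexAction(Kind kind, String url) {
      this.kind = kind;
      this.url = url;
    }

    @Override
    public String toString() {
      return kind + " " + url;
    }
  }

  /** Emits one DELETE action per entry whose status is DB_GONE or DB_DUPLICATE. */
  static List<IndexAction> deletions(Map<String, Status> crawlDb) {
    List<IndexAction> actions = new ArrayList<>();
    for (Map.Entry<String, Status> e : crawlDb.entrySet()) {
      Status s = e.getValue();
      if (s == Status.DB_GONE || s == Status.DB_DUPLICATE) {
        actions.add(new IndexAction(Kind.DELETE, e.getKey()));
      }
    }
    return actions;
  }

  public static void main(String[] args) {
    Map<String, Status> crawlDb = new LinkedHashMap<>(); // keeps insertion order for the printout
    crawlDb.put("https://example.org/", Status.DB_FETCHED);
    crawlDb.put("https://example.org/old", Status.DB_GONE);
    crawlDb.put("https://example.org/copy", Status.DB_DUPLICATE);
    System.out.println(deletions(crawlDb));
    // prints [DELETE https://example.org/old, DELETE https://example.org/copy]
  }
}
```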