Class CustomFields

  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.nutch.indexer.field.CustomFields
All Implemented Interfaces:
Configurable, Tool

public class CustomFields
extends Configured
implements Tool

Creates custom FieldWritable objects from a text file containing field information, including the field name, value, and an optional boost and field type (as needed by FieldWritable objects). An input text file to CustomFields is tab-separated and looks similar to this, where <url> stands for the page URL and \t for a tab character:

<url>\tlang\ten\t5.0\tCONTENT
<url>\tlang\tde

The only required fields are url, name, and value. Custom fields are configured through the custom-fields.xml file in the classpath. The config file allows you to set defaults for whether a field is indexed, stored, and tokenized, for boosts on a field, and for whether a field can output multiple values under the same key.

The purpose of the CustomFields job is to allow better integration with technologies such as Hadoop Streaming. Streaming jobs can be written in any programming language and can output the text file needed by the CustomFields job; those fields can then be included in the index.

The concept of custom fields requires two separate pieces: the indexing piece and the query piece. The indexing piece is handled by the CustomFields job. The query piece is handled by the query-custom plugin.

Important: Currently, because of the way the query plugin architecture works, custom field names must be added to the fields parameter in the query-custom plugin's plugin.xml file in order to be queried.

The CustomFields tool accepts one or more directories containing text files in the appropriate custom field format. These files are then turned into FieldWritable objects to be included in the index.
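
As an illustration of the input format, the following is a minimal sketch of how a producer (for example, the output side of a streaming job) might build such records. CustomFieldLineWriter and its format method are hypothetical names, not part of Nutch. The column order (url, name, value, optional boost, optional type) is an assumption inferred from the example record and the list of required fields, and the rule that a type needs a preceding boost column is likewise an assumption about positional columns.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Nutch) that formats CustomFields input records. */
final class CustomFieldLineWriter {

  private CustomFieldLineWriter() {}

  /**
   * Builds one tab-separated record.
   * Assumed column order: url, name, value, optional boost, optional type.
   */
  static String format(String url, String name, String value, Float boost, String type) {
    if (url == null || name == null || value == null) {
      throw new IllegalArgumentException("url, name and value are required");
    }
    if (boost == null && type != null) {
      // Assumption: columns are positional, so a type cannot appear without a boost.
      throw new IllegalArgumentException("type requires a boost column before it");
    }
    List<String> columns = new ArrayList<>();
    columns.add(url);
    columns.add(name);
    columns.add(value);
    if (boost != null) {
      columns.add(Float.toString(boost));
    }
    if (type != null) {
      columns.add(type);
    }
    for (String column : columns) {
      // A tab or newline inside a column would corrupt the record layout.
      if (column.indexOf('\t') >= 0 || column.indexOf('\n') >= 0) {
        throw new IllegalArgumentException("columns must not contain tabs or newlines");
      }
    }
    return String.join("\t", columns);
  }

  public static void main(String[] args) {
    // The example.com URLs are placeholders chosen for this sketch.
    System.out.println(format("http://example.com/", "lang", "en", 5.0f, "CONTENT"));
    System.out.println(format("http://example.com/de", "lang", "de", null, null));
  }
}
```

Lines produced this way would be written to text files in one of the directories passed to the CustomFields tool.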

Nested Class Summary
static class CustomFields.Collector
          Aggregates FieldWritable objects by the same name for the same URL.
static class CustomFields.Converter
          Converts text values into FieldWritable objects.
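
The two roles summarized above (converting text into field objects, then aggregating fields with the same name for the same URL) can be illustrated in plain Java without Hadoop. This is only a sketch of the idea, not the Nutch implementation: CustomFieldSketch, ParsedField, parse, and groupByUrlAndName are hypothetical names, ParsedField is a simplified stand-in for FieldWritable, and the column order is assumed to be url, name, value, optional boost, optional type.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, Hadoop-free sketch of the Converter and Collector roles. */
final class CustomFieldSketch {

  private CustomFieldSketch() {}

  /** Simplified stand-in for a FieldWritable: only the parsed columns. */
  static final class ParsedField {
    final String url;
    final String name;
    final String value;
    final Float boost;  // null when the optional boost column is absent
    final String type;  // null when the optional type column is absent

    ParsedField(String url, String name, String value, Float boost, String type) {
      this.url = url;
      this.name = name;
      this.value = value;
      this.boost = boost;
      this.type = type;
    }
  }

  /** Converter role: turns one tab-separated line into a ParsedField. */
  static ParsedField parse(String line) {
    String[] cols = line.split("\t", -1); // -1 keeps trailing empty columns
    if (cols.length < 3 || cols.length > 5) {
      throw new IllegalArgumentException("expected 3 to 5 columns: " + line);
    }
    if (cols[0].isEmpty() || cols[1].isEmpty() || cols[2].isEmpty()) {
      throw new IllegalArgumentException("url, name and value are required: " + line);
    }
    Float boost = cols.length > 3 ? Float.valueOf(cols[3]) : null;
    String type = cols.length > 4 ? cols[4] : null;
    return new ParsedField(cols[0], cols[1], cols[2], boost, type);
  }

  /** Collector role: groups values by URL, then by field name, in input order. */
  static Map<String, Map<String, List<String>>> groupByUrlAndName(List<String> lines) {
    Map<String, Map<String, List<String>>> byUrl = new LinkedHashMap<>();
    for (String line : lines) {
      ParsedField field = parse(line);
      byUrl.computeIfAbsent(field.url, u -> new LinkedHashMap<>())
          .computeIfAbsent(field.name, n -> new ArrayList<>())
          .add(field.value);
    }
    return byUrl;
  }
}
```

In the real job, whether several values may be kept under the same key is governed by the defaults in custom-fields.xml; this sketch simply keeps every value.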
Field Summary
static org.apache.commons.logging.Log LOG
Constructor Summary
CustomFields()
Method Summary
static void main(String[] args)
 int run(String[] args)
          Runs the CustomFields job.
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf

Field Detail


public static final org.apache.commons.logging.Log LOG
Constructor Detail


public CustomFields()
Method Detail


public static void main(String[] args)
                 throws Exception


public int run(String[] args)
        throws Exception
Runs the CustomFields job.

Specified by:
run in interface Tool

Copyright © 2006 The Apache Software Foundation