Class BasicFields

  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.nutch.indexer.field.BasicFields
All Implemented Interfaces:
Configurable, Tool

public class BasicFields
extends Configured
implements Tool

Creates the basic FieldWritable objects. The basic fields are the main fields used in indexing segments. Many other field jobs rely on the urls being present in the basic fields output to create their own fields for indexing. Basic fields are extracted from segments. Only urls that were successfully fetched and parsed are converted.

This job also implements a portion of the redirect logic. If a url has a redirect or an orig url, the url and its orig are compared by their link analysis scores, and the higher-scoring one is used as the display url in the index. This ensures that content is indexed under the best, most popular url, which is most often the one users expect.

The BasicFields tool accepts one or more segments to convert to fields. If multiple segments have overlapping content, only the latest successfully fetched content is converted.
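The redirect rule described above (prefer whichever of the url and its orig has the higher link analysis score) can be illustrated with a minimal, self-contained sketch. It is not the Nutch implementation: the class name `RedirectChoiceSketch`, the method `chooseDisplayUrl`, the in-memory score map, the default score of 0 for unknown urls, and the tie-breaking rule (keep the url) are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Nutch.
class RedirectChoiceSketch {

    /**
     * Returns whichever of url and origUrl has the higher score.
     * Assumptions: a missing score counts as 0, and a tie keeps url.
     */
    static String chooseDisplayUrl(String url, String origUrl, Map<String, Float> scores) {
        if (origUrl == null) {
            return url; // no redirect involved, nothing to compare
        }
        float urlScore = scores.getOrDefault(url, 0.0f);
        float origScore = scores.getOrDefault(origUrl, 0.0f);
        return origScore > urlScore ? origUrl : url;
    }

    public static void main(String[] args) {
        Map<String, Float> scores = new HashMap<>();
        scores.put("http://example.com/", 0.8f);
        scores.put("http://www.example.com/home", 0.3f);

        // The orig url scores higher, so it is chosen for display.
        System.out.println(chooseDisplayUrl(
                "http://www.example.com/home", "http://example.com/", scores));
        // prints "http://example.com/"
    }
}
```

In the real job the scores would come from link analysis output rather than a hand-built map; the sketch only shows the comparison step.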

Nested Class Summary
static class BasicFields.Flipper
          Runs the first part of redirect logic.
static class BasicFields.Merger
          Merges the field output of all segments, keeping only the most recent set of fields for any given url.
static class BasicFields.Scorer
          The Scorer job sets the boost field from the NodeDb score.
Field Summary
static org.apache.commons.logging.Log LOG
Constructor Summary
BasicFields()
Method Summary
 void createFields(Path nodeDb, Path[] segments, Path output)
          Runs the BasicFields jobs for every segment and aggregates and filters the output to create a final database of FieldWritable objects.
static void main(String[] args)
 int run(String[] args)
          Runs the BasicFields tool.
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf

Field Detail


public static final org.apache.commons.logging.Log LOG
Constructor Detail


public BasicFields()
Method Detail


public void createFields(Path nodeDb,
                         Path[] segments,
                         Path output)
                  throws IOException
Runs the BasicFields jobs for every segment and aggregates and filters the output to create a final database of FieldWritable objects.

nodeDb - The node database.
segments - The array of segments to process.
output - The BasicFields output.
IOException - If an error occurs while processing the segments.
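
The aggregation step, in which overlapping content from several segments is reduced to the most recent successful fetch per url, can be sketched in plain Java as follows. This is a simplified, in-memory illustration and not the MapReduce code used by BasicFields: the `SegmentEntry` type, its `fetchTime` field, and the `mergeLatest` method are hypothetical names introduced here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Nutch.
class LatestFetchMergeSketch {

    /** Stand-in for the fields produced for one url by one segment. */
    static final class SegmentEntry {
        final String url;
        final long fetchTime; // assumed: larger value means a more recent fetch
        final String content;

        SegmentEntry(String url, long fetchTime, String content) {
            this.url = url;
            this.fetchTime = fetchTime;
            this.content = content;
        }
    }

    /** Keeps, for each url, only the entry with the greatest fetchTime. */
    static Map<String, SegmentEntry> mergeLatest(List<SegmentEntry> entries) {
        Map<String, SegmentEntry> latest = new LinkedHashMap<>();
        for (SegmentEntry e : entries) {
            SegmentEntry seen = latest.get(e.url);
            if (seen == null || e.fetchTime > seen.fetchTime) {
                latest.put(e.url, e);
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        List<SegmentEntry> entries = new ArrayList<>();
        entries.add(new SegmentEntry("http://example.com/a", 100L, "old a"));
        entries.add(new SegmentEntry("http://example.com/b", 150L, "only b"));
        entries.add(new SegmentEntry("http://example.com/a", 200L, "new a"));

        for (SegmentEntry e : mergeLatest(entries).values()) {
            System.out.println(e.url + " -> " + e.content);
        }
    }
}
```

Grouping by url and keeping the newest entry mirrors what a reducer keyed on url would do; the sketch assumes every entry already represents a successful fetch and parse.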


public static void main(String[] args)
                 throws Exception


public int run(String[] args)
        throws Exception
Runs the BasicFields tool.

Specified by:
run in interface Tool

Copyright © 2006 The Apache Software Foundation