Class AnchorFields

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.nutch.indexer.field.AnchorFields
All Implemented Interfaces:
Configurable, Tool

public class AnchorFields
extends Configured
implements Tool

Creates FieldWritable objects for inbound anchor text. These FieldWritable objects are included in the input to the FieldIndexer, which converts them to Lucene Field objects and indexes them. Null or empty anchor text is ignored. Anchors are sorted in descending order by the score of their parent pages. Settings control the maximum number of anchors to index and whether those anchors are stored and tokenized. Because anchors are sorted by descending score and capped at that maximum, only the best anchors are indexed, on the assumption that a higher link-analysis score indicates a better page and therefore better inbound anchor text.
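The selection rule described above (skip empty anchors, sort by parent-page score, keep at most N) can be sketched in plain Java. This is a minimal, hypothetical illustration, not the Nutch implementation: the `Anchor` record, the `selectAnchors` method, and its `maxAnchors` parameter are assumptions standing in for the configured maximum, and the stored/tokenized settings are not modeled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the anchor selection rule; not Nutch's actual code. */
public class AnchorSelection {

    /** Hypothetical type: inbound anchor text plus the score of the page containing the link. */
    public record Anchor(String text, float parentScore) {}

    /**
     * Keeps at most maxAnchors anchor texts, highest parent-page score first.
     * maxAnchors is an assumed stand-in for the configured maximum number of anchors.
     */
    public static List<String> selectAnchors(List<Anchor> inbound, int maxAnchors) {
        List<Anchor> candidates = new ArrayList<>();
        for (Anchor a : inbound) {
            // Null, empty, or whitespace-only anchor text is skipped (treating blank as empty is an assumption).
            if (a.text() != null && !a.text().trim().isEmpty()) {
                candidates.add(a);
            }
        }
        // Descending order by the score of the parent (linking) page.
        candidates.sort((x, y) -> Float.compare(y.parentScore(), x.parentScore()));
        List<String> selected = new ArrayList<>();
        for (int i = 0; i < candidates.size() && i < maxAnchors; i++) {
            selected.add(candidates.get(i).text());
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Anchor> inbound = List.of(
            new Anchor("apache nutch", 0.9f),
            new Anchor("", 5.0f),
            new Anchor(null, 3.0f),
            new Anchor("web crawler", 2.5f),
            new Anchor("click here", 0.1f));
        System.out.println(selectAnchors(inbound, 2)); // prints [web crawler, apache nutch]
    }
}
```

Sorting before truncating is what makes the cap meaningful: the lowest-scoring anchors ("click here" above) are the ones dropped.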

Nested Class Summary
static class AnchorFields.Collector
          Collects and creates FieldWritable objects from the inlinks.
static class AnchorFields.Extractor
          Extracts outlinks to be created as FieldWritable objects.
Field Summary
static org.apache.commons.logging.Log LOG
Constructor Summary
AnchorFields()
Method Summary
 void createFields(Path webGraphDb, Path basicFields, Path output)
          Creates the FieldsWritable object from the anchors.
static void main(String[] args)
 int run(String[] args)
          Runs the AnchorFields job.
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf

Field Detail

LOG

public static final org.apache.commons.logging.Log LOG
Constructor Detail

AnchorFields

public AnchorFields()
Method Detail

createFields

public void createFields(Path webGraphDb,
                         Path basicFields,
                         Path output)
                  throws IOException
Creates the FieldsWritable object from the anchors.

Parameters:
webGraphDb - The WebGraph from which to pull outlinks.
basicFields - The BasicFields output; anchor fields are created only for pages present here, so that no orphan anchor fields are produced.
output - The AnchorFields output.
Throws:
IOException - If an error occurs while creating the fields.
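One way to picture how createFields combines its inputs is as an in-memory join: outlinks from the WebGraph are regrouped by target page, and targets that have no basic fields are dropped. This sketch assumes that simplified model; `AnchorJoinSketch`, `Outlink`, and `anchorsByTarget` are hypothetical names, and the real method works on the `Path` inputs listed above rather than on in-memory collections.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory analogy of createFields: regroup outlinks by target, drop orphans. */
public class AnchorJoinSketch {

    /** Hypothetical type: one WebGraph outlink (source page, target page, anchor text). */
    public record Outlink(String fromUrl, String toUrl, String anchor) {}

    /**
     * Groups anchor text by target URL. A target's outlink anchors are its inbound anchors.
     * Targets missing from urlsWithBasicFields are skipped, so no orphan anchor fields appear.
     */
    public static Map<String, List<String>> anchorsByTarget(List<Outlink> outlinks,
                                                            Set<String> urlsWithBasicFields) {
        Map<String, List<String>> result = new HashMap<>();
        for (Outlink link : outlinks) {
            if (!urlsWithBasicFields.contains(link.toUrl())) {
                continue; // no basic fields for this page: its anchors would be orphans
            }
            result.computeIfAbsent(link.toUrl(), k -> new ArrayList<>()).add(link.anchor());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Outlink> outlinks = List.of(
            new Outlink("http://a.example/", "http://b.example/", "b home"),
            new Outlink("http://c.example/", "http://b.example/", "site b"),
            new Outlink("http://a.example/", "http://gone.example/", "dead link"));
        Set<String> withBasicFields = Set.of("http://a.example/", "http://b.example/");
        System.out.println(anchorsByTarget(outlinks, withBasicFields));
        // prints {http://b.example/=[b home, site b]}
    }
}
```

The anchor pointing at http://gone.example/ is discarded because that page has no basic fields and will not be indexed on its own.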

main

public static void main(String[] args)
                 throws Exception

run

public int run(String[] args)
        throws Exception
Runs the AnchorFields job.

Specified by:
run in interface Tool

Copyright © 2006 The Apache Software Foundation