org.apache.nutch.tools
Class ResolveUrls

java.lang.Object
  extended by org.apache.nutch.tools.ResolveUrls

public class ResolveUrls
extends Object

A simple tool that spins up multiple threads to resolve URLs to IP addresses. It can be used to verify that pages failing with an UnknownHostException during fetching really point at bad hosts, rather than failing because of a DNS problem during fetching.
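The check this tool automates can be sketched for a single URL. The following is a minimal, hypothetical helper (HostCheck is not a Nutch class). It assumes the host is taken from the URL with java.net.URI and looked up with InetAddress.getByName, and it separates an UnknownHostException from a URL that has no usable host at all.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Nutch: classifies what happens when the
// host of one URL is looked up.
class HostCheck {

    enum Outcome { RESOLVED, UNKNOWN_HOST, MALFORMED_URL }

    static Outcome check(String url) {
        String host;
        try {
            host = new URI(url).getHost();
        } catch (URISyntaxException e) {
            // The string is not even a syntactically valid URI.
            return Outcome.MALFORMED_URL;
        }
        if (host == null) {
            // e.g. "mailto:..." or a relative reference: nothing to resolve.
            return Outcome.MALFORMED_URL;
        }
        try {
            // Uses the JVM's configured name service (normally the OS resolver);
            // IP literals such as "127.0.0.1" are parsed without a lookup.
            InetAddress.getByName(host);
            return Outcome.RESOLVED;
        } catch (UnknownHostException e) {
            // The same exception a fetcher sees for a host that does not resolve.
            return Outcome.UNKNOWN_HOST;
        }
    }
}
```

Running the same URL through this check on the fetching machine and on a machine with known-good DNS helps tell a truly dead host apart from a local resolver problem.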


Field Summary
static org.slf4j.Logger LOG
           
 
Constructor Summary
ResolveUrls(String urlsFile)
          Create a new ResolveUrls with a file from the local file system.
ResolveUrls(String urlsFile, int numThreads)
          Create a new ResolveUrls with a urls file and a number of threads for the thread pool.
 
Method Summary
static void main(String[] args)
          Runs the resolve urls tool.
 void resolveUrls()
          Creates a thread pool for resolving urls.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

public static final org.slf4j.Logger LOG
Constructor Detail

ResolveUrls

public ResolveUrls(String urlsFile)
Create a new ResolveUrls with a file from the local file system.

Parameters:
urlsFile - The local urls file, one url per line.
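The urls file is described only as a local file with one URL per line. A minimal sketch of loading such a file follows. It assumes UTF-8 text and that blank lines are skipped and surrounding whitespace trimmed; both are assumptions, and UrlsFile is a hypothetical name rather than part of Nutch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Nutch's code: loads a local "one URL per line" file.
class UrlsFile {

    static List<String> read(Path file) throws IOException {
        List<String> urls = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String url = line.trim();
            // Assumption: blank lines are ignored rather than counted as errors.
            if (!url.isEmpty()) {
                urls.add(url);
            }
        }
        return urls;
    }
}
```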

ResolveUrls

public ResolveUrls(String urlsFile,
                   int numThreads)
Create a new ResolveUrls with a urls file and a number of threads for the thread pool. When the number of threads is not given (single-argument constructor), it defaults to 100.

Parameters:
urlsFile - The local urls file, one url per line.
numThreads - The number of threads used to resolve urls in parallel.
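The relationship between the two constructors can be sketched with constructor chaining, where the one-argument form delegates to the two-argument form. ResolverSettings, its fields, and newPool are hypothetical; only the default of 100 threads comes from the description above, and the argument check is an addition of this sketch.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical class mirroring the two documented constructors.
class ResolverSettings {

    static final int DEFAULT_NUM_THREADS = 100;

    final String urlsFile;
    final int numThreads;

    ResolverSettings(String urlsFile) {
        // Delegate to the two-argument form with the documented default.
        this(urlsFile, DEFAULT_NUM_THREADS);
    }

    ResolverSettings(String urlsFile, int numThreads) {
        // Added check: a fixed thread pool needs at least one thread.
        if (numThreads < 1) {
            throw new IllegalArgumentException("numThreads must be >= 1");
        }
        this.urlsFile = urlsFile;
        this.numThreads = numThreads;
    }

    // Size the worker pool from the configured thread count.
    ExecutorService newPool() {
        return Executors.newFixedThreadPool(numThreads);
    }
}
```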
Method Detail

resolveUrls

public void resolveUrls()
Creates a thread pool for resolving URLs and reads the URL file from the local file system. For each URL, it attempts to resolve it, keeping a running total of the number resolved, the number that errored, and the time taken.
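A minimal sketch of the procedure described above, assuming a fixed-size pool from java.util.concurrent.Executors, AtomicInteger counters shared by the worker tasks, and wall-clock time measured around the whole run. ParallelResolver and Summary are hypothetical names; this is not Nutch's implementation, and it folds every kind of failure into a single "errored" count.

```java
import java.net.InetAddress;
import java.net.URI;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: resolve every URL on a fixed thread pool while
// tallying successes, failures and elapsed wall-clock time.
class ParallelResolver {

    static final class Summary {
        final int resolved;
        final int errored;
        final long elapsedMillis;

        Summary(int resolved, int errored, long elapsedMillis) {
            this.resolved = resolved;
            this.errored = errored;
            this.elapsedMillis = elapsedMillis;
        }
    }

    static Summary resolveAll(List<String> urls, int numThreads) throws InterruptedException {
        AtomicInteger resolved = new AtomicInteger();
        AtomicInteger errored = new AtomicInteger();
        long start = System.currentTimeMillis();

        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        for (String url : urls) {
            pool.execute(() -> {
                try {
                    String host = new URI(url).getHost();
                    if (host == null) {
                        throw new IllegalArgumentException("no host in " + url);
                    }
                    InetAddress.getByName(host);
                    resolved.incrementAndGet();
                } catch (Exception e) {
                    // Bad syntax, missing host, or UnknownHostException all count as errors.
                    errored.incrementAndGet();
                }
            });
        }

        // Stop accepting work and wait for the queued lookups to finish.
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
            // Give up on lookups that are still hanging; they are not counted.
            pool.shutdownNow();
        }

        long elapsed = System.currentTimeMillis() - start;
        return new Summary(resolved.get(), errored.get(), elapsed);
    }
}
```

AtomicInteger is used because many pool threads update the counters concurrently; plain int fields incremented from several threads could lose updates.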


main

public static void main(String[] args)
Runs the resolve urls tool.



Copyright © 2011 The Apache Software Foundation