Class DomainURLFilter

java.lang.Object
  extended by org.apache.nutch.urlfilter.domain.DomainURLFilter
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, URLFilter, Pluggable

public class DomainURLFilter
extends Object
implements URLFilter

Filters URLs based on a file containing domain suffixes, domain names, and hostnames. Only a URL that matches one of the suffixes, domains, or hosts listed in the file is allowed.
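As an illustration of the allow-list check described above, here is a minimal, self-contained sketch. It is not the Nutch implementation: the class DomainMatchSketch and its filter method are hypothetical, and the sketch approximates the domain suffix as the last host label and the domain name as the last two labels. A real filter needs a suffix-aware lookup, so multi-label suffixes such as co.uk are not handled here. Following the URLFilter convention, it returns the URL when allowed and null when rejected.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of suffix / domain name / hostname allow-list matching.
// Simplification (assumption): suffix = last host label, domain = last two labels.
public class DomainMatchSketch {

    // Returns the URL if its suffix, domain, or host is in the allow list; otherwise null.
    static String filter(String url, Set<String> entries) {
        String host;
        try {
            host = URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            return null; // unparseable URL: rejected in this sketch
        }
        if (host == null || host.isEmpty()) {
            return null; // no host component (e.g. mailto:): rejected in this sketch
        }
        host = host.toLowerCase(Locale.ROOT);
        String[] labels = host.split("\\.");
        if (labels.length == 0) {
            return null; // degenerate host made only of dots
        }
        String suffix = labels[labels.length - 1];
        String domain = labels.length >= 2
                ? labels[labels.length - 2] + "." + suffix
                : host;
        // Same order as the documentation: suffix, then domain name, then hostname.
        if (entries.contains(suffix) || entries.contains(domain) || entries.contains(host)) {
            return url;
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical allow list: one suffix, one domain name, one hostname.
        Set<String> entries = Set.of("org", "example.com", "www.example.net");
        System.out.println(filter("http://lucene.apache.org/", entries)); // allowed by suffix "org"
        System.out.println(filter("http://docs.example.com/a", entries)); // allowed by domain "example.com"
        System.out.println(filter("http://www.example.net/", entries));   // allowed by hostname
        System.out.println(filter("http://mail.example.net/", entries));  // null: no entry matches
    }
}
```

Note that the suffix entry "org" admits any .org host even though no narrower entry mentions it, which is the "more general overrides more specific" behavior in miniature.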

URLs are checked against the entries in the domain file in the order domain suffix, domain name, hostname. The domain file is set up with one entry per line, and each entry is one of these three kinds. A domain-suffix entry allows every domain under that suffix (for example, an entry for the com suffix allows all .com domains). A domain-name entry allows all URLs from that domain and all of its subdomains. A hostname entry allows only URLs from that exact host.

Entries may appear in the file in any order. They range from more general (suffixes) to more specific (hostnames), and the more general entry overrides the more specific one: once a URL's suffix or domain name matches, the URL is allowed regardless of any narrower entry.
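The file format described above (one entry per line, no required ordering) maps naturally onto a set. The sketch below shows one way to load such a file. DomainFileSketch and readEntries are hypothetical names, and skipping blank lines and lines that start with # is an assumption about comment handling, not something this page specifies. Entries are lowercased so that matching is case-insensitive.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: load a domain file (one entry per line) into a set.
public class DomainFileSketch {

    static Set<String> readEntries(Reader in) {
        Set<String> entries = new HashSet<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String entry = line.trim();
                if (entry.isEmpty() || entry.startsWith("#")) {
                    continue; // blank line or comment (comment syntax is an assumption)
                }
                entries.add(entry.toLowerCase(Locale.ROOT));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    public static void main(String[] args) {
        // Hypothetical domain file contents: a suffix, a domain name, and a hostname.
        String file = String.join("\n",
                "# example allow list",
                "com",
                "Example.org",
                "",
                "www.example.net");
        Set<String> entries = readEntries(new StringReader(file));
        System.out.println(entries.size());                  // 3
        System.out.println(entries.contains("example.org")); // true
    }
}
```

Because the entries end up in a set, their order in the file has no effect on the result, which is consistent with there being no required ordering of entries.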

The domain file defaults to domain-urlfilter.txt on the classpath but can be overridden. If the plugin's "file" attribute is defined, it takes precedence over the other override settings.
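The precedence rule can be written as a small resolution function. This is a hypothetical sketch: resolveDomainFile and its parameters are illustrative names, configuredFile stands in for whichever configuration setting overrides the default (its name is not given on this page), and checking the constructor argument before the "file" attribute is an assumption. Blank values are treated as undefined.

```java
// Hypothetical sketch of choosing which domain file to load.
public class DomainFileResolutionSketch {

    static final String DEFAULT_FILE = "domain-urlfilter.txt";

    // Treat null or whitespace-only values as "not defined".
    static boolean isDefined(String value) {
        return value != null && !value.trim().isEmpty();
    }

    // constructorFile: argument passed to DomainURLFilter(String) (assumed to win outright)
    // fileAttribute:   the plugin's "file" attribute (documented to take precedence)
    // configuredFile:  stand-in for a configuration override (hypothetical)
    static String resolveDomainFile(String constructorFile, String fileAttribute, String configuredFile) {
        if (isDefined(constructorFile)) {
            return constructorFile;
        }
        if (isDefined(fileAttribute)) {
            return fileAttribute;
        }
        if (isDefined(configuredFile)) {
            return configuredFile;
        }
        return DEFAULT_FILE;
    }

    public static void main(String[] args) {
        System.out.println(resolveDomainFile(null, null, null));                      // domain-urlfilter.txt
        System.out.println(resolveDomainFile(null, "my-domains.txt", "other.txt"));   // my-domains.txt
        System.out.println(resolveDomainFile("ctor.txt", "my-domains.txt", null));    // ctor.txt
    }
}
```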

Field Summary
Fields inherited from interface
Constructor Summary
DomainURLFilter()
          Default constructor.
DomainURLFilter(String domainFile)
          Constructor that specifies the domain file to use.
Method Summary
 String filter(String url)
 org.apache.hadoop.conf.Configuration getConf()
 void setConf(org.apache.hadoop.conf.Configuration conf)
          Sets the configuration.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail


public DomainURLFilter()
Default constructor.


public DomainURLFilter(String domainFile)
Constructor that specifies the domain file to use.

Parameters:
domainFile - The domain file; overrides the domain-urlfilter.txt default.
Method Detail


public void setConf(org.apache.hadoop.conf.Configuration conf)
Sets the configuration.

Specified by:
setConf in interface org.apache.hadoop.conf.Configurable


public org.apache.hadoop.conf.Configuration getConf()
Specified by:
getConf in interface org.apache.hadoop.conf.Configurable


public String filter(String url)
Specified by:
filter in interface URLFilter
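Under the URLFilter contract, a filter returns the URL (possibly rewritten) to keep it and null to reject it. The sketch below shows how a caller might combine several such filters, stopping at the first rejection. SimpleUrlFilter and applyAll are hypothetical stand-ins so the example runs without Nutch or Hadoop on the classpath; this is not Nutch's own filter-chaining code.

```java
import java.util.List;

public class FilterChainSketch {

    // Hypothetical stand-in mirroring URLFilter.filter: return the URL to keep it, null to drop it.
    interface SimpleUrlFilter {
        String filter(String url);
    }

    // Applies each filter in turn; the first null ends the chain and rejects the URL.
    static String applyAll(String url, List<SimpleUrlFilter> filters) {
        String current = url;
        for (SimpleUrlFilter f : filters) {
            current = f.filter(current);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    public static void main(String[] args) {
        // Two toy filters: keep only http(s) URLs, and drop PDF links.
        SimpleUrlFilter httpOnly = u -> u.startsWith("http://") || u.startsWith("https://") ? u : null;
        SimpleUrlFilter noPdf = u -> u.endsWith(".pdf") ? null : u;
        List<SimpleUrlFilter> chain = List.of(httpOnly, noPdf);

        System.out.println(applyAll("https://example.com/index.html", chain)); // kept
        System.out.println(applyAll("ftp://example.com/file.txt", chain));     // null
        System.out.println(applyAll("http://example.com/report.pdf", chain));  // null
    }
}
```

Passing each filter's output to the next one lets a filter rewrite a URL as well as accept or reject it.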

Copyright © 2013 The Apache Software Foundation