Package org.apache.nutch.net
Interface URLFilter
All Superinterfaces:
Configurable, Pluggable
All Known Implementing Classes:
AutomatonURLFilter, DomainDenylistURLFilter, DomainURLFilter, ExemptionUrlFilter, FastURLFilter, PrefixURLFilter, RegexURLFilter, RegexURLFilterBase, Subcollection, SuffixURLFilter, UrlValidator
public interface URLFilter extends Pluggable, Configurable
Interface used to limit which URLs enter Nutch. By default, the injector, fetcher, and parser apply it to every URL they see for the first time (seeds, outlinks, redirects). URL filters can optionally be enabled for many other Nutch tools.
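Several URL filter plugins are usually active at the same time, so a URL is kept only if it survives all of them. The following is a minimal sketch of that composition, assuming the filters run in a fixed order, each receiving the previous filter's output, and that the first null rejects the URL. SimpleUrlFilter and UrlFilterChain are hypothetical names; SimpleUrlFilter stands in for URLFilter so the example compiles without Nutch or Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.apache.nutch.net.URLFilter, so the sketch
// compiles without Nutch or Hadoop on the classpath.
interface SimpleUrlFilter {
    /** Returns the URL if accepted, or null if rejected. */
    String filter(String urlString);
}

// Hypothetical helper: applies every filter in order. A null from any
// filter rejects the URL, and later filters are not consulted.
final class UrlFilterChain {
    private final List<SimpleUrlFilter> filters;

    UrlFilterChain(List<SimpleUrlFilter> filters) {
        this.filters = new ArrayList<>(filters);
    }

    String filter(String urlString) {
        String current = urlString;
        for (SimpleUrlFilter f : filters) {
            if (current == null) {
                return null; // rejected by an earlier filter
            }
            current = f.filter(current);
        }
        return current; // null if the last filter rejected it
    }
}
```

For example, chaining a filter that accepts only http(s) URLs with one that rejects URLs ending in .pdf keeps "https://example.org/" and drops both "ftp://example.org/" and "http://example.org/a.pdf". Stopping at the first null means later filters never see a URL that has already been rejected.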
Field Summary
Modifier and Type    Field         Description
static String        X_POINT_ID    The name of the extension point.
Method Summary
Modifier and Type    Method                      Description
String               filter(String urlString)    Applies the filter to a URL: passes the original URL through, or "deletes" it by returning null.
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
Field Detail
X_POINT_ID
static final String X_POINT_ID
The name of the extension point.
Method Detail
filter
String filter(String urlString)
Applies the filter to a URL: passes the original URL through, or "deletes" it by returning null.
Parameters:
urlString - the URL string the filter is applied to
Returns:
the original URL string if the URL is accepted by the filter, or null if the URL is rejected
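The filter() contract can be illustrated with a small implementation. This sketch assumes a filter that accepts only well-formed http or https URLs whose host is not on a fixed denylist. HostDenylistFilter and SimpleUrlFilter are hypothetical names; a real plugin would instead implement org.apache.nutch.net.URLFilter (including getConf() and setConf() inherited from Configurable) and would be registered under the extension point named by X_POINT_ID.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for org.apache.nutch.net.URLFilter (no Nutch dependency).
interface SimpleUrlFilter {
    /** Returns the URL if accepted, or null if rejected. */
    String filter(String urlString);
}

// Hypothetical example filter: rejects unparsable URLs, non-HTTP(S) schemes,
// and hosts on a fixed denylist. Accepted URLs are returned unchanged.
final class HostDenylistFilter implements SimpleUrlFilter {
    private final Set<String> deniedHosts = new HashSet<>();

    HostDenylistFilter(Set<String> deniedHosts) {
        for (String h : deniedHosts) {
            this.deniedHosts.add(h.toLowerCase(Locale.ROOT)); // compare hosts case-insensitively
        }
    }

    @Override
    public String filter(String urlString) {
        if (urlString == null) {
            return null;
        }
        URI uri;
        try {
            uri = new URI(urlString);
        } catch (URISyntaxException e) {
            return null; // malformed URL: reject
        }
        String scheme = uri.getScheme();
        String host = uri.getHost();
        if (scheme == null || host == null) {
            return null; // e.g. relative or opaque URIs such as mailto:
        }
        String s = scheme.toLowerCase(Locale.ROOT);
        if (!s.equals("http") && !s.equals("https")) {
            return null;
        }
        if (deniedHosts.contains(host.toLowerCase(Locale.ROOT))) {
            return null;
        }
        return urlString; // accepted: pass the original string through
    }
}
```

Returning the original string, rather than a cleaned-up variant, keeps the filter a pure accept/reject decision, as the Returns description specifies.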