Uses of Class
org.apache.nutch.crawl.CrawlDatum

Packages that use CrawlDatum
org.apache.nutch.analysis.lang Text document language identifier. 
org.apache.nutch.crawl Crawl control code. 
org.apache.nutch.fetcher The Nutch robot. 
org.apache.nutch.indexer Maintain Lucene full-text indexes. 
org.apache.nutch.indexer.basic A basic indexing plugin. 
org.apache.nutch.indexer.more An indexing plugin that adds more document metadata fields (content type, length, date). 
org.apache.nutch.indexer.solr   
org.apache.nutch.microformats.reltag A microformats Rel-Tag Parser/Indexer/Querier plugin. 
org.apache.nutch.protocol   
org.apache.nutch.protocol.file Protocol plugin which supports retrieving local file resources. 
org.apache.nutch.protocol.ftp Protocol plugin which supports retrieving documents via the ftp protocol. 
org.apache.nutch.protocol.http Protocol plugin which supports retrieving documents via the http protocol. 
org.apache.nutch.protocol.http.api Common API used by HTTP plugins (http, httpclient). 
org.apache.nutch.protocol.httpclient Protocol plugin which supports retrieving documents via the HTTP and HTTPS protocols, optionally with Basic, Digest and NTLM authentication schemes for web server as well as proxy server. 
org.apache.nutch.scoring   
org.apache.nutch.scoring.opic   
org.apache.nutch.scoring.webgraph   
org.apache.nutch.segment   
org.apache.nutch.tools   
org.apache.nutch.util.domain   
org.creativecommons.nutch Sample plugins that parse and index Creative Commons metadata. 
 

Uses of CrawlDatum in org.apache.nutch.analysis.lang
 

Methods in org.apache.nutch.analysis.lang with parameters of type CrawlDatum
 NutchDocument LanguageIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
           
 

Uses of CrawlDatum in org.apache.nutch.crawl
 

Fields in org.apache.nutch.crawl declared as CrawlDatum
 CrawlDatum Generator.SelectorEntry.datum
           
 

Methods in org.apache.nutch.crawl that return CrawlDatum
 CrawlDatum FetchSchedule.forceRefetch(Text url, CrawlDatum datum, boolean asap)
          This method resets fetchTime, fetchInterval, modifiedTime and page signature, so that it forces refetching.
 CrawlDatum AbstractFetchSchedule.forceRefetch(Text url, CrawlDatum datum, boolean asap)
          This method resets fetchTime, fetchInterval, modifiedTime, retriesSinceFetch and page signature, so that it forces refetching.
 CrawlDatum CrawlDbReader.get(String crawlDb, String url, Configuration config)
           
 CrawlDatum FetchSchedule.initializeSchedule(Text url, CrawlDatum datum)
          Initialize fetch schedule related data.
 CrawlDatum AbstractFetchSchedule.initializeSchedule(Text url, CrawlDatum datum)
          Initialize fetch schedule related data.
static CrawlDatum CrawlDatum.read(DataInput in)
           
 CrawlDatum FetchSchedule.setFetchSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime, long modifiedTime, int state)
          Sets the fetchInterval and fetchTime on a successfully fetched page.
 CrawlDatum DefaultFetchSchedule.setFetchSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime, long modifiedTime, int state)
           
 CrawlDatum AdaptiveFetchSchedule.setFetchSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime, long modifiedTime, int state)
           
 CrawlDatum AbstractFetchSchedule.setFetchSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime, long modifiedTime, int state)
          Sets the fetchInterval and fetchTime on a successfully fetched page.
 CrawlDatum FetchSchedule.setPageGoneSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime)
          This method specifies how to schedule refetching of pages marked as GONE.
 CrawlDatum AbstractFetchSchedule.setPageGoneSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime)
          This method specifies how to schedule refetching of pages marked as GONE.
 CrawlDatum FetchSchedule.setPageRetrySchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime)
          This method adjusts the fetch schedule if fetching needs to be re-tried due to transient errors.
 CrawlDatum AbstractFetchSchedule.setPageRetrySchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime)
          This method adjusts the fetch schedule if fetching needs to be re-tried due to transient errors.
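
The FetchSchedule methods above define the contract for (re)scheduling fetches. Below is a minimal sketch of a custom schedule built on the AbstractFetchSchedule methods listed here; the class name and the fixed one-day interval are illustrative assumptions, not part of Nutch:

import org.apache.hadoop.io.Text;
import org.apache.nutch.crawl.AbstractFetchSchedule;
import org.apache.nutch.crawl.CrawlDatum;

// Hypothetical schedule: always refetch after a fixed interval.
public class FixedIntervalFetchSchedule extends AbstractFetchSchedule {

  private static final int ONE_DAY_SECONDS = 24 * 60 * 60;

  @Override
  public CrawlDatum setFetchSchedule(Text url, CrawlDatum datum,
      long prevFetchTime, long prevModifiedTime,
      long fetchTime, long modifiedTime, int state) {
    // Let the base class apply the default handling described above,
    // then pin the interval and the next fetch time.
    datum = super.setFetchSchedule(url, datum, prevFetchTime, prevModifiedTime,
        fetchTime, modifiedTime, state);
    datum.setFetchInterval(ONE_DAY_SECONDS);                 // interval is in seconds
    datum.setFetchTime(fetchTime + ONE_DAY_SECONDS * 1000L); // fetch time is in milliseconds
    return datum;
  }
}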
 

Methods in org.apache.nutch.crawl that return types with arguments of type CrawlDatum
 RecordWriter<Text,CrawlDatum> CrawlDbReader.CrawlDatumCsvOutputFormat.getRecordWriter(FileSystem fs, JobConf job, String name, Progressable progress)
           
 

Methods in org.apache.nutch.crawl with parameters of type CrawlDatum
 long FetchSchedule.calculateLastFetchTime(CrawlDatum datum)
          Calculates last fetch time of the given CrawlDatum.
 long AbstractFetchSchedule.calculateLastFetchTime(CrawlDatum datum)
          Returns the last fetch time of the given CrawlDatum.
 int CrawlDatum.compareTo(CrawlDatum that)
          Sort by decreasing score.
 CrawlDatum FetchSchedule.forceRefetch(Text url, CrawlDatum datum, boolean asap)
          This method resets fetchTime, fetchInterval, modifiedTime and page signature, so that it forces refetching.
 CrawlDatum AbstractFetchSchedule.forceRefetch(Text url, CrawlDatum datum, boolean asap)
          This method resets fetchTime, fetchInterval, modifiedTime, retriesSinceFetch and page signature, so that it forces refetching.
static boolean CrawlDatum.hasDbStatus(CrawlDatum datum)
           
static boolean CrawlDatum.hasFetchStatus(CrawlDatum datum)
           
 CrawlDatum FetchSchedule.initializeSchedule(Text url, CrawlDatum datum)
          Initialize fetch schedule related data.
 CrawlDatum AbstractFetchSchedule.initializeSchedule(Text url, CrawlDatum datum)
          Initialize fetch schedule related data.
 void Generator.Selector.map(Text key, CrawlDatum value, OutputCollector<FloatWritable,Generator.SelectorEntry> output, Reporter reporter)
          Select & invert subset due for fetch.
 void CrawlDbReader.CrawlDbTopNMapper.map(Text key, CrawlDatum value, OutputCollector<FloatWritable,Text> output, Reporter reporter)
           
 void Generator.CrawlDbUpdater.map(Text key, CrawlDatum value, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
           
 void CrawlDbFilter.map(Text key, CrawlDatum value, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
           
 void CrawlDbReader.CrawlDbStatMapper.map(Text key, CrawlDatum value, OutputCollector<Text,LongWritable> output, Reporter reporter)
           
 void CrawlDatum.putAllMetaData(CrawlDatum other)
          Add all metadata from other CrawlDatum to this CrawlDatum.
 void CrawlDatum.set(CrawlDatum that)
          Copy the contents of another instance into this instance.
 CrawlDatum FetchSchedule.setFetchSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime, long modifiedTime, int state)
          Sets the fetchInterval and fetchTime on a successfully fetched page.
 CrawlDatum DefaultFetchSchedule.setFetchSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime, long modifiedTime, int state)
           
 CrawlDatum AdaptiveFetchSchedule.setFetchSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime, long modifiedTime, int state)
           
 CrawlDatum AbstractFetchSchedule.setFetchSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime, long modifiedTime, int state)
          Sets the fetchInterval and fetchTime on a successfully fetched page.
 CrawlDatum FetchSchedule.setPageGoneSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime)
          This method specifies how to schedule refetching of pages marked as GONE.
 CrawlDatum AbstractFetchSchedule.setPageGoneSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime)
          This method specifies how to schedule refetching of pages marked as GONE.
 CrawlDatum FetchSchedule.setPageRetrySchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime)
          This method adjusts the fetch schedule if fetching needs to be re-tried due to transient errors.
 CrawlDatum AbstractFetchSchedule.setPageRetrySchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime)
          This method adjusts the fetch schedule if fetching needs to be re-tried due to transient errors.
 boolean FetchSchedule.shouldFetch(Text url, CrawlDatum datum, long curTime)
          This method provides information whether the page is suitable for selection in the current fetchlist.
 boolean AbstractFetchSchedule.shouldFetch(Text url, CrawlDatum datum, long curTime)
          This method provides information whether the page is suitable for selection in the current fetchlist.
 void CrawlDbReader.CrawlDatumCsvOutputFormat.LineRecordWriter.write(Text key, CrawlDatum value)
           
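
CrawlDbReader.get and the CrawlDatum accessors listed above can be combined to inspect a single CrawlDb entry. A minimal sketch, assuming a default CrawlDbReader constructor; the crawldb path and URL are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.nutch.crawl.CrawlDatum;
import org.apache.nutch.crawl.CrawlDbReader;
import org.apache.nutch.util.NutchConfiguration;

public class CrawlDatumLookup {
  public static void main(String[] args) throws Exception {
    Configuration conf = NutchConfiguration.create();
    CrawlDbReader reader = new CrawlDbReader();
    try {
      // Look up one URL in the CrawlDb (placeholder path and URL).
      CrawlDatum datum = reader.get("crawl/crawldb", "http://example.com/", conf);
      if (datum != null) {
        System.out.println("has db status:  " + CrawlDatum.hasDbStatus(datum));
        System.out.println("score:          " + datum.getScore());
        System.out.println("fetch time:     " + datum.getFetchTime());
        System.out.println(datum);  // CrawlDatum.toString() dumps all fields
      }
    } finally {
      reader.close();
    }
  }
}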
 

Method parameters in org.apache.nutch.crawl with type arguments of type CrawlDatum
 void Generator.CrawlDbUpdater.map(Text key, CrawlDatum value, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
           
 void CrawlDbFilter.map(Text key, CrawlDatum value, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
           
 void Injector.InjectMapper.map(WritableComparable key, Text value, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
           
 void Injector.InjectReducer.reduce(Text key, Iterator<CrawlDatum> values, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
           
 void Generator.CrawlDbUpdater.reduce(Text key, Iterator<CrawlDatum> values, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
           
 void CrawlDbReducer.reduce(Text key, Iterator<CrawlDatum> values, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
           
 void CrawlDbMerger.Merger.reduce(Text key, Iterator<CrawlDatum> values, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
           
 void Generator.PartitionReducer.reduce(Text key, Iterator<Generator.SelectorEntry> values, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
           
 

Uses of CrawlDatum in org.apache.nutch.fetcher
 

Methods in org.apache.nutch.fetcher that return CrawlDatum
 CrawlDatum FetcherOutput.getCrawlDatum()
           
 

Method parameters in org.apache.nutch.fetcher with type arguments of type CrawlDatum
 void Fetcher.run(RecordReader<Text,CrawlDatum> input, OutputCollector<Text,NutchWritable> output, Reporter reporter)
           
 

Constructors in org.apache.nutch.fetcher with parameters of type CrawlDatum
FetcherOutput(CrawlDatum crawlDatum, Content content, ParseImpl parse)
           
 

Uses of CrawlDatum in org.apache.nutch.indexer
 

Methods in org.apache.nutch.indexer with parameters of type CrawlDatum
 NutchDocument IndexingFilters.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
          Run all defined filters.
 NutchDocument IndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
          Adds fields or otherwise modifies the document that will be indexed for a parse.
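
The filter method above is the core of the indexing-filter extension point. A minimal sketch of an implementation, assuming the interface declares only this method plus the Configurable accessors (some Nutch versions may require additional methods); the class and field names are illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.nutch.crawl.CrawlDatum;
import org.apache.nutch.crawl.Inlinks;
import org.apache.nutch.indexer.IndexingException;
import org.apache.nutch.indexer.IndexingFilter;
import org.apache.nutch.indexer.NutchDocument;
import org.apache.nutch.parse.Parse;

// Hypothetical filter: expose the CrawlDatum's fetch time as an indexed field.
public class FetchTimeIndexingFilter implements IndexingFilter {
  private Configuration conf;

  public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
      CrawlDatum datum, Inlinks inlinks) throws IndexingException {
    doc.add("fetchTime", Long.toString(datum.getFetchTime()));
    // Returning null would drop the document from the index entirely.
    return doc;
  }

  public void setConf(Configuration conf) { this.conf = conf; }
  public Configuration getConf() { return conf; }
}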
 

Uses of CrawlDatum in org.apache.nutch.indexer.basic
 

Methods in org.apache.nutch.indexer.basic with parameters of type CrawlDatum
 NutchDocument BasicIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
           
 

Uses of CrawlDatum in org.apache.nutch.indexer.more
 

Methods in org.apache.nutch.indexer.more with parameters of type CrawlDatum
 NutchDocument MoreIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
           
 

Uses of CrawlDatum in org.apache.nutch.indexer.solr
 

Methods in org.apache.nutch.indexer.solr with parameters of type CrawlDatum
 void SolrClean.DBFilter.map(Text key, CrawlDatum value, OutputCollector<ByteWritable,Text> output, Reporter reporter)
           
 

Uses of CrawlDatum in org.apache.nutch.microformats.reltag
 

Methods in org.apache.nutch.microformats.reltag with parameters of type CrawlDatum
 NutchDocument RelTagIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
           
 

Uses of CrawlDatum in org.apache.nutch.protocol
 

Methods in org.apache.nutch.protocol with parameters of type CrawlDatum
 ProtocolOutput Protocol.getProtocolOutput(Text url, CrawlDatum datum)
          Returns the Content for a fetchlist entry.
 RobotRules Protocol.getRobotRules(Text url, CrawlDatum datum)
          Retrieve robot rules applicable for this url.
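
Protocol.getProtocolOutput pairs a fetchlist URL with its CrawlDatum. A minimal sketch of driving it through ProtocolFactory, roughly as the fetcher does; it assumes a working plugin configuration, and the URL is a placeholder:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.nutch.crawl.CrawlDatum;
import org.apache.nutch.protocol.Content;
import org.apache.nutch.protocol.Protocol;
import org.apache.nutch.protocol.ProtocolFactory;
import org.apache.nutch.protocol.ProtocolOutput;
import org.apache.nutch.util.NutchConfiguration;

public class ProtocolDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = NutchConfiguration.create();
    String url = "http://example.com/";   // placeholder URL
    CrawlDatum datum = new CrawlDatum();  // empty fetchlist entry

    // Pick the protocol plugin registered for this URL's scheme.
    Protocol protocol = new ProtocolFactory(conf).getProtocol(url);
    ProtocolOutput output = protocol.getProtocolOutput(new Text(url), datum);

    Content content = output.getContent();
    System.out.println("status: " + output.getStatus());
    System.out.println("type:   " + content.getContentType());
  }
}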
 

Uses of CrawlDatum in org.apache.nutch.protocol.file
 

Methods in org.apache.nutch.protocol.file with parameters of type CrawlDatum
 ProtocolOutput File.getProtocolOutput(Text url, CrawlDatum datum)
           
 RobotRules File.getRobotRules(Text url, CrawlDatum datum)
           
 

Constructors in org.apache.nutch.protocol.file with parameters of type CrawlDatum
FileResponse(URL url, CrawlDatum datum, File file, Configuration conf)
           
 

Uses of CrawlDatum in org.apache.nutch.protocol.ftp
 

Methods in org.apache.nutch.protocol.ftp with parameters of type CrawlDatum
 ProtocolOutput Ftp.getProtocolOutput(Text url, CrawlDatum datum)
           
 RobotRules Ftp.getRobotRules(Text url, CrawlDatum datum)
           
 

Constructors in org.apache.nutch.protocol.ftp with parameters of type CrawlDatum
FtpResponse(URL url, CrawlDatum datum, Ftp ftp, Configuration conf)
           
 

Uses of CrawlDatum in org.apache.nutch.protocol.http
 

Methods in org.apache.nutch.protocol.http with parameters of type CrawlDatum
protected  Response Http.getResponse(URL url, CrawlDatum datum, boolean redirect)
           
 

Constructors in org.apache.nutch.protocol.http with parameters of type CrawlDatum
HttpResponse(HttpBase http, URL url, CrawlDatum datum)
           
 

Uses of CrawlDatum in org.apache.nutch.protocol.http.api
 

Methods in org.apache.nutch.protocol.http.api with parameters of type CrawlDatum
 ProtocolOutput HttpBase.getProtocolOutput(Text url, CrawlDatum datum)
           
protected abstract  Response HttpBase.getResponse(URL url, CrawlDatum datum, boolean followRedirects)
           
 RobotRules HttpBase.getRobotRules(Text url, CrawlDatum datum)
           
 

Uses of CrawlDatum in org.apache.nutch.protocol.httpclient
 

Methods in org.apache.nutch.protocol.httpclient with parameters of type CrawlDatum
protected  Response Http.getResponse(URL url, CrawlDatum datum, boolean redirect)
          Fetches the url with a configured HTTP client and gets the response.
 

Uses of CrawlDatum in org.apache.nutch.scoring
 

Methods in org.apache.nutch.scoring that return CrawlDatum
 CrawlDatum ScoringFilters.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
           
 CrawlDatum ScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
          Distribute score value from the current page to all its outlinked pages.
 

Methods in org.apache.nutch.scoring with parameters of type CrawlDatum
 CrawlDatum ScoringFilters.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
           
 CrawlDatum ScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
          Distribute score value from the current page to all its outlinked pages.
 float ScoringFilters.generatorSortValue(Text url, CrawlDatum datum, float initSort)
          Calculate a sort value for Generate.
 float ScoringFilter.generatorSortValue(Text url, CrawlDatum datum, float initSort)
          This method prepares a sort value for the purpose of sorting and selecting top N scoring pages during fetchlist generation.
 float ScoringFilters.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
           
 float ScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
          This method calculates a Lucene document boost.
 void ScoringFilters.initialScore(Text url, CrawlDatum datum)
          Calculate a new initial score, used when adding newly discovered pages.
 void ScoringFilter.initialScore(Text url, CrawlDatum datum)
          Set an initial score for newly discovered pages.
 void ScoringFilters.injectedScore(Text url, CrawlDatum datum)
          Calculate a new initial score, used when injecting new pages.
 void ScoringFilter.injectedScore(Text url, CrawlDatum datum)
          Set an initial score for newly injected pages.
 void ScoringFilters.passScoreBeforeParsing(Text url, CrawlDatum datum, Content content)
           
 void ScoringFilter.passScoreBeforeParsing(Text url, CrawlDatum datum, Content content)
          This method takes all relevant score information from the current datum (coming from a generated fetchlist) and stores it into Content metadata.
 void ScoringFilters.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinked)
          Calculate updated page score during CrawlDb.update().
 void ScoringFilter.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinked)
          This method calculates a new score of CrawlDatum during CrawlDb update, based on the initial value of the original CrawlDatum, and also score values contributed by inlinked pages.
 

Method parameters in org.apache.nutch.scoring with type arguments of type CrawlDatum
 CrawlDatum ScoringFilters.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
           
 CrawlDatum ScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
          Distribute score value from the current page to all its outlinked pages.
 void ScoringFilters.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinked)
          Calculate updated page score during CrawlDb.update().
 void ScoringFilter.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinked)
          This method calculates a new score of CrawlDatum during CrawlDb update, based on the initial value of the original CrawlDatum, and also score values contributed by inlinked pages.
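
The ScoringFilters methods above are invoked at fixed points of the crawl cycle (inject, generate, parse, update, index). A minimal sketch of calling two of them the way the injector and generator might, with a placeholder URL and default configuration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.nutch.crawl.CrawlDatum;
import org.apache.nutch.scoring.ScoringFilterException;
import org.apache.nutch.scoring.ScoringFilters;
import org.apache.nutch.util.NutchConfiguration;

public class ScoringDemo {
  public static void main(String[] args) {
    Configuration conf = NutchConfiguration.create();
    ScoringFilters filters = new ScoringFilters(conf);

    Text url = new Text("http://example.com/");  // placeholder URL
    CrawlDatum datum = new CrawlDatum();

    try {
      // Injection time: let the scoring plugins assign an initial score.
      filters.injectedScore(url, datum);
      // Generation time: compute the sort value used to select the top-N URLs.
      float sort = filters.generatorSortValue(url, datum, datum.getScore());
      System.out.println("score=" + datum.getScore() + " sort=" + sort);
    } catch (ScoringFilterException e) {
      e.printStackTrace();
    }
  }
}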
 

Uses of CrawlDatum in org.apache.nutch.scoring.opic
 

Methods in org.apache.nutch.scoring.opic that return CrawlDatum
 CrawlDatum OPICScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
          Get a float value from Fetcher.SCORE_KEY, divide it by the number of outlinks and apply.
 

Methods in org.apache.nutch.scoring.opic with parameters of type CrawlDatum
 CrawlDatum OPICScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
          Get a float value from Fetcher.SCORE_KEY, divide it by the number of outlinks and apply.
 float OPICScoringFilter.generatorSortValue(Text url, CrawlDatum datum, float initSort)
          Use getScore().
 float OPICScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
          Dampen the boost value by scorePower.
 void OPICScoringFilter.initialScore(Text url, CrawlDatum datum)
          Set to 0.0f (unknown value) - inlink contributions will bring it to a correct level.
 void OPICScoringFilter.injectedScore(Text url, CrawlDatum datum)
           
 void OPICScoringFilter.passScoreBeforeParsing(Text url, CrawlDatum datum, Content content)
          Store a float value of CrawlDatum.getScore() under Fetcher.SCORE_KEY.
 void OPICScoringFilter.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List inlinked)
          Increase the score by a sum of inlinked scores.
 

Method parameters in org.apache.nutch.scoring.opic with type arguments of type CrawlDatum
 CrawlDatum OPICScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
          Get a float value from Fetcher.SCORE_KEY, divide it by the number of outlinks and apply.
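
The OPICScoringFilter descriptions above amount to simple arithmetic on CrawlDatum scores. A small standalone sketch of one plausible reading; the numbers and the dampening exponent are assumptions, not Nutch defaults:

public class OpicScoreMath {
  public static void main(String[] args) {
    float pageScore = 1.0f;   // value carried under Fetcher.SCORE_KEY
    int outlinkCount = 4;     // assumed number of outlinks on the page

    // distributeScoreToOutlinks: each outlink target receives an equal share.
    float share = pageScore / outlinkCount;

    // updateDbScore: the stored score grows by the sum of inlinked contributions
    // (initialScore starts unknown pages at 0.0f).
    float dbScore = 0.0f;
    int inlinkContributions = 3;  // assumed number of inlinks seen during update
    dbScore += inlinkContributions * share;

    // indexerScore: the boost is the score dampened by the scorePower exponent.
    double scorePower = 0.5;      // assumed dampening exponent
    double boost = Math.pow(dbScore, scorePower);

    System.out.println("share=" + share + " dbScore=" + dbScore + " boost=" + boost);
  }
}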
 

Uses of CrawlDatum in org.apache.nutch.scoring.webgraph
 

Method parameters in org.apache.nutch.scoring.webgraph with type arguments of type CrawlDatum
 void ScoreUpdater.reduce(Text key, Iterator<ObjectWritable> values, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
          Creates new CrawlDatum objects with the updated score from the NodeDb or with a cleared score.
 

Uses of CrawlDatum in org.apache.nutch.segment
 

Methods in org.apache.nutch.segment with parameters of type CrawlDatum
 boolean SegmentMergeFilters.filter(WritableComparable key, CrawlDatum generateData, CrawlDatum fetchData, CrawlDatum sigData, Content content, ParseData parseData, ParseText parseText, Collection<CrawlDatum> linked)
          Iterates over all SegmentMergeFilter extensions and if any of them returns false, it will return false as well.
 boolean SegmentMergeFilter.filter(WritableComparable key, CrawlDatum generateData, CrawlDatum fetchData, CrawlDatum sigData, Content content, ParseData parseData, ParseText parseText, Collection<CrawlDatum> linked)
          The filtering method which gets all information being merged for a given key (URL).
 

Method parameters in org.apache.nutch.segment with type arguments of type CrawlDatum
 boolean SegmentMergeFilters.filter(WritableComparable key, CrawlDatum generateData, CrawlDatum fetchData, CrawlDatum sigData, Content content, ParseData parseData, ParseText parseText, Collection<CrawlDatum> linked)
          Iterates over all SegmentMergeFilter extensions and if any of them returns false, it will return false as well.
 boolean SegmentMergeFilter.filter(WritableComparable key, CrawlDatum generateData, CrawlDatum fetchData, CrawlDatum sigData, Content content, ParseData parseData, ParseText parseText, Collection<CrawlDatum> linked)
          The filtering method which gets all information being merged for a given key (URL).
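
A speculative sketch of a SegmentMergeFilter extension implementing the contract described above, keeping an entry only if fetch data is present; it assumes the interface declares just this filter method, and the plugin.xml wiring is omitted:

import java.util.Collection;
import org.apache.hadoop.io.WritableComparable;
import org.apache.nutch.crawl.CrawlDatum;
import org.apache.nutch.parse.ParseData;
import org.apache.nutch.parse.ParseText;
import org.apache.nutch.protocol.Content;
import org.apache.nutch.segment.SegmentMergeFilter;

public class RequireFetchDataMergeFilter implements SegmentMergeFilter {
  public boolean filter(WritableComparable key, CrawlDatum generateData,
      CrawlDatum fetchData, CrawlDatum sigData, Content content,
      ParseData parseData, ParseText parseText, Collection<CrawlDatum> linked) {
    // Returning false drops this URL from the merged segment; the composite
    // SegmentMergeFilters rejects the entry if any extension returns false.
    return fetchData != null;
  }
}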
 

Uses of CrawlDatum in org.apache.nutch.tools
 

Methods in org.apache.nutch.tools with parameters of type CrawlDatum
 void CrawlDBScanner.map(Text url, CrawlDatum crawlDatum, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
           
 

Method parameters in org.apache.nutch.tools with type arguments of type CrawlDatum
 void CrawlDBScanner.map(Text url, CrawlDatum crawlDatum, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
           
 void CrawlDBScanner.reduce(Text key, Iterator<CrawlDatum> values, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
           
 void FreeGenerator.FG.reduce(Text key, Iterator<Generator.SelectorEntry> values, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
           
 

Uses of CrawlDatum in org.apache.nutch.util.domain
 

Methods in org.apache.nutch.util.domain with parameters of type CrawlDatum
 void DomainStatistics.map(Text urlText, CrawlDatum datum, OutputCollector<Text,LongWritable> output, Reporter reporter)
           
 

Uses of CrawlDatum in org.creativecommons.nutch
 

Methods in org.creativecommons.nutch with parameters of type CrawlDatum
 NutchDocument CCIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
           
 



Copyright © 2011 The Apache Software Foundation