Uses of Class
org.apache.nutch.crawl.CrawlDatum

Packages that use CrawlDatum
org.apache.nutch.analysis.lang Text document language identifier. 
org.apache.nutch.crawl Crawl control code. 
org.apache.nutch.fetcher The Nutch robot. 
org.apache.nutch.indexer Maintain Lucene full-text indexes. 
org.apache.nutch.indexer.anchor An indexing plugin for inbound anchor text. 
org.apache.nutch.indexer.basic A basic indexing plugin. 
org.apache.nutch.indexer.feed   
org.apache.nutch.indexer.metadata   
org.apache.nutch.indexer.more A more indexing plugin. 
org.apache.nutch.indexer.solr   
org.apache.nutch.indexer.staticfield A simple plugin called at indexing that adds fields with static data. 
org.apache.nutch.indexer.subcollection   
org.apache.nutch.indexer.tld Top Level Domain Indexing plugin. 
org.apache.nutch.indexer.urlmeta URL Meta Tag Indexing Plugin 
org.apache.nutch.microformats.reltag A microformats Rel-Tag Parser/Indexer/Querier plugin. 
org.apache.nutch.protocol   
org.apache.nutch.protocol.file Protocol plugin which supports retrieving local file resources. 
org.apache.nutch.protocol.ftp Protocol plugin which supports retrieving documents via the ftp protocol. 
org.apache.nutch.protocol.http Protocol plugin which supports retrieving documents via the http protocol. 
org.apache.nutch.protocol.http.api Common API used by HTTP plugins (http, httpclient) 
org.apache.nutch.protocol.httpclient Protocol plugin which supports retrieving documents via the HTTP and HTTPS protocols, optionally with Basic, Digest and NTLM authentication schemes for web server as well as proxy server. 
org.apache.nutch.scoring   
org.apache.nutch.scoring.link   
org.apache.nutch.scoring.opic   
org.apache.nutch.scoring.tld Top Level Domain Scoring plugin. 
org.apache.nutch.scoring.urlmeta URL Meta Tag Scoring Plugin 
org.apache.nutch.scoring.webgraph   
org.apache.nutch.segment   
org.apache.nutch.tools   
org.creativecommons.nutch Sample plugins that parse and index Creative Commons metadata. 
 

Uses of CrawlDatum in org.apache.nutch.analysis.lang
 

Methods in org.apache.nutch.analysis.lang with parameters of type CrawlDatum
 NutchDocument LanguageIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
           
 

Uses of CrawlDatum in org.apache.nutch.crawl
 

Fields in org.apache.nutch.crawl declared as CrawlDatum
 CrawlDatum Generator.SelectorEntry.datum
           
 

Methods in org.apache.nutch.crawl that return CrawlDatum
 CrawlDatum AbstractFetchSchedule.forceRefetch(Text url, CrawlDatum datum, boolean asap)
          This method resets fetchTime, fetchInterval, modifiedTime, retriesSinceFetch and page signature, so that it forces refetching.
 CrawlDatum FetchSchedule.forceRefetch(Text url, CrawlDatum datum, boolean asap)
          This method resets fetchTime, fetchInterval, modifiedTime and page signature, so that it forces refetching.
 CrawlDatum CrawlDbReader.get(String crawlDb, String url, Configuration config)
           
 CrawlDatum AbstractFetchSchedule.initializeSchedule(Text url, CrawlDatum datum)
          Initialize fetch schedule related data.
 CrawlDatum FetchSchedule.initializeSchedule(Text url, CrawlDatum datum)
          Initialize fetch schedule related data.
static CrawlDatum CrawlDatum.read(DataInput in)
           
 CrawlDatum AbstractFetchSchedule.setFetchSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime, long modifiedTime, int state)
          Sets the fetchInterval and fetchTime on a successfully fetched page.
 CrawlDatum FetchSchedule.setFetchSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime, long modifiedTime, int state)
          Sets the fetchInterval and fetchTime on a successfully fetched page.
 CrawlDatum AdaptiveFetchSchedule.setFetchSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime, long modifiedTime, int state)
           
 CrawlDatum DefaultFetchSchedule.setFetchSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime, long modifiedTime, int state)
           
 CrawlDatum AbstractFetchSchedule.setPageGoneSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime)
          This method specifies how to schedule refetching of pages marked as GONE.
 CrawlDatum FetchSchedule.setPageGoneSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime)
          This method specifies how to schedule refetching of pages marked as GONE.
 CrawlDatum AbstractFetchSchedule.setPageRetrySchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime)
          This method adjusts the fetch schedule if fetching needs to be re-tried due to transient errors.
 CrawlDatum FetchSchedule.setPageRetrySchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime)
          This method adjusts the fetch schedule if fetching needs to be re-tried due to transient errors.
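
The scheduling descriptions above can be illustrated with a minimal sketch. It does not use the Nutch classes: DatumSketch and FetchScheduleSketch are hypothetical stand-ins, the 30-day default interval and 1-day retry delay are assumed values, and the simplified parameter lists are not the real signatures. The real FetchSchedule implementations may behave differently.

```java
// Hypothetical stand-in for CrawlDatum: only the fields named in the descriptions above.
final class DatumSketch {
    long fetchTime;        // next scheduled fetch, in ms since the epoch
    long modifiedTime;     // last known modification time, in ms
    int fetchInterval;     // seconds between fetches
    int retriesSinceFetch; // consecutive failed attempts
    byte[] signature;      // page content signature
}

final class FetchScheduleSketch {
    static final int DEFAULT_INTERVAL_SEC = 30 * 24 * 60 * 60; // assumed default: 30 days
    static final long RETRY_DELAY_MS = 24L * 60 * 60 * 1000;   // assumed retry delay: 1 day

    // Successful fetch: schedule the next fetch one interval after this one.
    static DatumSketch setFetchSchedule(DatumSketch d, long fetchTime, long modifiedTime) {
        d.fetchTime = fetchTime + d.fetchInterval * 1000L;
        d.modifiedTime = modifiedTime;
        d.retriesSinceFetch = 0;
        return d;
    }

    // Transient error: retry after a fixed delay and count the attempt.
    static DatumSketch setPageRetrySchedule(DatumSketch d, long fetchTime) {
        d.fetchTime = fetchTime + RETRY_DELAY_MS;
        d.retriesSinceFetch++;
        return d;
    }

    // Forced refetch: reset the scheduling state. In this sketch fetchTime is
    // only moved to "now" when asap is set (an assumption).
    static DatumSketch forceRefetch(DatumSketch d, long now, boolean asap) {
        d.fetchInterval = DEFAULT_INTERVAL_SEC;
        d.retriesSinceFetch = 0;
        d.signature = null;
        d.modifiedTime = 0L;
        if (asap) {
            d.fetchTime = now;
        }
        return d;
    }

    // A page is due when its scheduled fetch time is not in the future.
    static boolean shouldFetch(DatumSketch d, long curTime) {
        return d.fetchTime <= curTime;
    }

    public static void main(String[] args) {
        DatumSketch d = new DatumSketch();
        d.fetchInterval = 3600; // one hour
        long now = System.currentTimeMillis();
        setFetchSchedule(d, now, now);
        System.out.println("due now? " + shouldFetch(d, now));
        System.out.println("due in an hour? " + shouldFetch(d, now + 3_600_000L));
    }
}
```

Returning the same datum from each method mirrors the signatures listed above, which all take and return a CrawlDatum.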
 

Methods in org.apache.nutch.crawl that return types with arguments of type CrawlDatum
 RecordWriter<Text,CrawlDatum> CrawlDbReader.CrawlDatumCsvOutputFormat.getRecordWriter(FileSystem fs, JobConf job, String name, Progressable progress)
           
 

Methods in org.apache.nutch.crawl with parameters of type CrawlDatum
 long AbstractFetchSchedule.calculateLastFetchTime(CrawlDatum datum)
          Returns the last fetch time of the CrawlDatum.
 long FetchSchedule.calculateLastFetchTime(CrawlDatum datum)
          Calculates last fetch time of the given CrawlDatum.
 int CrawlDatum.compareTo(CrawlDatum that)
          Sort by decreasing score.
 CrawlDatum AbstractFetchSchedule.forceRefetch(Text url, CrawlDatum datum, boolean asap)
          This method resets fetchTime, fetchInterval, modifiedTime, retriesSinceFetch and page signature, so that it forces refetching.
 CrawlDatum FetchSchedule.forceRefetch(Text url, CrawlDatum datum, boolean asap)
          This method resets fetchTime, fetchInterval, modifiedTime and page signature, so that it forces refetching.
static boolean CrawlDatum.hasDbStatus(CrawlDatum datum)
           
static boolean CrawlDatum.hasFetchStatus(CrawlDatum datum)
           
 CrawlDatum AbstractFetchSchedule.initializeSchedule(Text url, CrawlDatum datum)
          Initialize fetch schedule related data.
 CrawlDatum FetchSchedule.initializeSchedule(Text url, CrawlDatum datum)
          Initialize fetch schedule related data.
 void Generator.Selector.map(Text key, CrawlDatum value, OutputCollector<FloatWritable,Generator.SelectorEntry> output, Reporter reporter)
          Select & invert subset due for fetch.
 void CrawlDbReader.CrawlDbTopNMapper.map(Text key, CrawlDatum value, OutputCollector<FloatWritable,Text> output, Reporter reporter)
           
 void Generator.CrawlDbUpdater.map(Text key, CrawlDatum value, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
           
 void CrawlDbReader.CrawlDbDumpMapper.map(Text key, CrawlDatum value, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
           
 void CrawlDbFilter.map(Text key, CrawlDatum value, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
           
 void CrawlDbReader.CrawlDbStatMapper.map(Text key, CrawlDatum value, OutputCollector<Text,LongWritable> output, Reporter reporter)
           
 void CrawlDatum.putAllMetaData(CrawlDatum other)
          Add all metadata from other CrawlDatum to this CrawlDatum.
 void CrawlDatum.set(CrawlDatum that)
          Copy the contents of another instance into this instance.
 CrawlDatum AbstractFetchSchedule.setFetchSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime, long modifiedTime, int state)
          Sets the fetchInterval and fetchTime on a successfully fetched page.
 CrawlDatum FetchSchedule.setFetchSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime, long modifiedTime, int state)
          Sets the fetchInterval and fetchTime on a successfully fetched page.
 CrawlDatum AdaptiveFetchSchedule.setFetchSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime, long modifiedTime, int state)
           
 CrawlDatum DefaultFetchSchedule.setFetchSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime, long modifiedTime, int state)
           
 CrawlDatum AbstractFetchSchedule.setPageGoneSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime)
          This method specifies how to schedule refetching of pages marked as GONE.
 CrawlDatum FetchSchedule.setPageGoneSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime)
          This method specifies how to schedule refetching of pages marked as GONE.
 CrawlDatum AbstractFetchSchedule.setPageRetrySchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime)
          This method adjusts the fetch schedule if fetching needs to be re-tried due to transient errors.
 CrawlDatum FetchSchedule.setPageRetrySchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime)
          This method adjusts the fetch schedule if fetching needs to be re-tried due to transient errors.
 boolean AbstractFetchSchedule.shouldFetch(Text url, CrawlDatum datum, long curTime)
          This method provides information whether the page is suitable for selection in the current fetchlist.
 boolean FetchSchedule.shouldFetch(Text url, CrawlDatum datum, long curTime)
          This method provides information whether the page is suitable for selection in the current fetchlist.
 void CrawlDbReader.CrawlDatumCsvOutputFormat.LineRecordWriter.write(Text key, CrawlDatum value)
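
As a small illustration of the "Sort by decreasing score" ordering described for CrawlDatum.compareTo, the sketch below uses a hypothetical ScoredDatum class that carries only a URL and a score. Any tie-breaking the real class performs is not modeled here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for CrawlDatum, reduced to a URL and a score.
final class ScoredDatum implements Comparable<ScoredDatum> {
    final String url;
    final float score;

    ScoredDatum(String url, float score) {
        this.url = url;
        this.score = score;
    }

    @Override
    public int compareTo(ScoredDatum that) {
        // Operands are reversed so that higher scores sort first.
        return Float.compare(that.score, this.score);
    }
}

final class CompareToSketch {
    // Returns the URLs ordered by decreasing score, leaving the input untouched.
    static List<String> sortedUrls(List<ScoredDatum> data) {
        List<ScoredDatum> copy = new ArrayList<>(data);
        Collections.sort(copy);
        List<String> urls = new ArrayList<>();
        for (ScoredDatum d : copy) {
            urls.add(d.url);
        }
        return urls;
    }

    public static void main(String[] args) {
        List<ScoredDatum> data = new ArrayList<>();
        data.add(new ScoredDatum("a", 0.5f));
        data.add(new ScoredDatum("b", 2.0f));
        data.add(new ScoredDatum("c", 1.0f));
        System.out.println(sortedUrls(data)); // prints [b, c, a]
    }
}
```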
           
 

Method parameters in org.apache.nutch.crawl with type arguments of type CrawlDatum
 void Generator.CrawlDbUpdater.map(Text key, CrawlDatum value, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
           
 void CrawlDbReader.CrawlDbDumpMapper.map(Text key, CrawlDatum value, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
           
 void CrawlDbFilter.map(Text key, CrawlDatum value, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
           
 void Injector.InjectMapper.map(WritableComparable key, Text value, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
           
 void Generator.CrawlDbUpdater.reduce(Text key, Iterator<CrawlDatum> values, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
           
 void CrawlDbReducer.reduce(Text key, Iterator<CrawlDatum> values, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
           
 void Injector.InjectReducer.reduce(Text key, Iterator<CrawlDatum> values, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
           
 void CrawlDbMerger.Merger.reduce(Text key, Iterator<CrawlDatum> values, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
           
 void Generator.PartitionReducer.reduce(Text key, Iterator<Generator.SelectorEntry> values, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
           
 

Uses of CrawlDatum in org.apache.nutch.fetcher
 

Methods in org.apache.nutch.fetcher that return CrawlDatum
 CrawlDatum FetcherOutput.getCrawlDatum()
           
 

Method parameters in org.apache.nutch.fetcher with type arguments of type CrawlDatum
 void Fetcher.run(RecordReader<Text,CrawlDatum> input, OutputCollector<Text,NutchWritable> output, Reporter reporter)
           
 

Constructors in org.apache.nutch.fetcher with parameters of type CrawlDatum
FetcherOutput(CrawlDatum crawlDatum, Content content, ParseImpl parse)
           
 

Uses of CrawlDatum in org.apache.nutch.indexer
 

Methods in org.apache.nutch.indexer with parameters of type CrawlDatum
 NutchDocument IndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
          Adds fields or otherwise modifies the document that will be indexed for a parse.
 NutchDocument IndexingFilters.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
          Run all defined filters.
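
The relationship between IndexingFilter.filter (one filter modifies the document) and IndexingFilters.filter (run all defined filters) can be sketched as a simple chain. The types below are hypothetical stand-ins: a Map replaces NutchDocument, and the Parse, CrawlDatum and Inlinks parameters are omitted. Treating a null result as "discard the document" is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, reduced form of an indexing filter.
interface IndexingFilterSketch {
    // Returns the (possibly modified) document, or null to discard it.
    Map<String, String> filter(Map<String, String> doc, String url);
}

final class IndexingFiltersSketch {
    // Runs every filter in order, feeding each one the previous result.
    static Map<String, String> runAll(List<IndexingFilterSketch> filters,
                                      Map<String, String> doc, String url) {
        for (IndexingFilterSketch f : filters) {
            doc = f.filter(doc, url);
            if (doc == null) {
                return null; // assumed: one filter can drop the document
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        List<IndexingFilterSketch> filters = new ArrayList<>();
        filters.add((doc, url) -> { doc.put("url", url); return doc; });
        filters.add((doc, url) -> { doc.put("lang", "en"); return doc; });
        System.out.println(runAll(filters, new HashMap<>(), "http://example.org/"));
    }
}
```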
 

Uses of CrawlDatum in org.apache.nutch.indexer.anchor
 

Methods in org.apache.nutch.indexer.anchor with parameters of type CrawlDatum
 NutchDocument AnchorIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
           
 

Uses of CrawlDatum in org.apache.nutch.indexer.basic
 

Methods in org.apache.nutch.indexer.basic with parameters of type CrawlDatum
 NutchDocument BasicIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
           
 

Uses of CrawlDatum in org.apache.nutch.indexer.feed
 

Methods in org.apache.nutch.indexer.feed with parameters of type CrawlDatum
 NutchDocument FeedIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
          Extracts the relevant fields (FEED_AUTHOR, FEED_TAGS, FEED_PUBLISHED, FEED_UPDATED, FEED) and sends them to the Indexer for indexing within the Nutch index.
 

Uses of CrawlDatum in org.apache.nutch.indexer.metadata
 

Methods in org.apache.nutch.indexer.metadata with parameters of type CrawlDatum
 NutchDocument MetadataIndexer.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
           
 

Uses of CrawlDatum in org.apache.nutch.indexer.more
 

Methods in org.apache.nutch.indexer.more with parameters of type CrawlDatum
 NutchDocument MoreIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
           
 

Uses of CrawlDatum in org.apache.nutch.indexer.solr
 

Methods in org.apache.nutch.indexer.solr with parameters of type CrawlDatum
 void SolrClean.DBFilter.map(Text key, CrawlDatum value, OutputCollector<ByteWritable,Text> output, Reporter reporter)
           
 

Uses of CrawlDatum in org.apache.nutch.indexer.staticfield
 

Methods in org.apache.nutch.indexer.staticfield with parameters of type CrawlDatum
 NutchDocument StaticFieldIndexer.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
           
 

Uses of CrawlDatum in org.apache.nutch.indexer.subcollection
 

Methods in org.apache.nutch.indexer.subcollection with parameters of type CrawlDatum
 NutchDocument SubcollectionIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
           
 

Uses of CrawlDatum in org.apache.nutch.indexer.tld
 

Methods in org.apache.nutch.indexer.tld with parameters of type CrawlDatum
 NutchDocument TLDIndexingFilter.filter(NutchDocument doc, Parse parse, Text urlText, CrawlDatum datum, Inlinks inlinks)
           
 

Uses of CrawlDatum in org.apache.nutch.indexer.urlmeta
 

Methods in org.apache.nutch.indexer.urlmeta with parameters of type CrawlDatum
 NutchDocument URLMetaIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
          Takes the metatags listed in your "urlmeta.tags" property and looks for them inside the CrawlDatum object.
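
A minimal sketch of that lookup, under stated assumptions: the "urlmeta.tags" value is treated as a comma-separated list, plain Maps stand in for the CrawlDatum metadata and the NutchDocument, and UrlMetaSketch with its methods is a hypothetical name, not part of the plugin.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class UrlMetaSketch {
    // Splits the property value into tag names (assumed comma-separated).
    static List<String> parseTags(String property) {
        List<String> tags = new ArrayList<>();
        if (property == null) {
            return tags;
        }
        for (String raw : property.split(",")) {
            String tag = raw.trim();
            if (!tag.isEmpty()) {
                tags.add(tag);
            }
        }
        return tags;
    }

    // Copies each configured tag found in the datum metadata into the document.
    static Map<String, String> copyTags(String property,
                                        Map<String, String> datumMeta,
                                        Map<String, String> doc) {
        for (String tag : parseTags(property)) {
            String value = datumMeta.get(tag);
            if (value != null) {
                doc.put(tag, value);
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> meta = new HashMap<>();
        meta.put("category", "news");
        meta.put("unlisted", "ignored");
        System.out.println(copyTags("category, author", meta, new LinkedHashMap<>()));
    }
}
```

Tags that are configured but absent from the datum are skipped, and metadata keys that are not configured are never copied.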
 

Uses of CrawlDatum in org.apache.nutch.microformats.reltag
 

Methods in org.apache.nutch.microformats.reltag with parameters of type CrawlDatum
 NutchDocument RelTagIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
           
 

Uses of CrawlDatum in org.apache.nutch.protocol
 

Methods in org.apache.nutch.protocol with parameters of type CrawlDatum
 ProtocolOutput Protocol.getProtocolOutput(Text url, CrawlDatum datum)
          Returns the Content for a fetchlist entry.
 RobotRules Protocol.getRobotRules(Text url, CrawlDatum datum)
          Retrieve robot rules applicable for this url.
 

Uses of CrawlDatum in org.apache.nutch.protocol.file
 

Methods in org.apache.nutch.protocol.file with parameters of type CrawlDatum
 ProtocolOutput File.getProtocolOutput(Text url, CrawlDatum datum)
           
 RobotRules File.getRobotRules(Text url, CrawlDatum datum)
           
 

Constructors in org.apache.nutch.protocol.file with parameters of type CrawlDatum
FileResponse(URL url, CrawlDatum datum, File file, Configuration conf)
           
 

Uses of CrawlDatum in org.apache.nutch.protocol.ftp
 

Methods in org.apache.nutch.protocol.ftp with parameters of type CrawlDatum
 ProtocolOutput Ftp.getProtocolOutput(Text url, CrawlDatum datum)
           
 RobotRules Ftp.getRobotRules(Text url, CrawlDatum datum)
           
 

Constructors in org.apache.nutch.protocol.ftp with parameters of type CrawlDatum
FtpResponse(URL url, CrawlDatum datum, Ftp ftp, Configuration conf)
           
 

Uses of CrawlDatum in org.apache.nutch.protocol.http
 

Methods in org.apache.nutch.protocol.http with parameters of type CrawlDatum
protected  Response Http.getResponse(URL url, CrawlDatum datum, boolean redirect)
           
 

Constructors in org.apache.nutch.protocol.http with parameters of type CrawlDatum
HttpResponse(HttpBase http, URL url, CrawlDatum datum)
           
 

Uses of CrawlDatum in org.apache.nutch.protocol.http.api
 

Methods in org.apache.nutch.protocol.http.api with parameters of type CrawlDatum
 ProtocolOutput HttpBase.getProtocolOutput(Text url, CrawlDatum datum)
           
protected abstract  Response HttpBase.getResponse(URL url, CrawlDatum datum, boolean followRedirects)
           
 RobotRules HttpBase.getRobotRules(Text url, CrawlDatum datum)
           
 

Uses of CrawlDatum in org.apache.nutch.protocol.httpclient
 

Methods in org.apache.nutch.protocol.httpclient with parameters of type CrawlDatum
protected  Response Http.getResponse(URL url, CrawlDatum datum, boolean redirect)
          Fetches the url with a configured HTTP client and gets the response.
 

Uses of CrawlDatum in org.apache.nutch.scoring
 

Methods in org.apache.nutch.scoring that return CrawlDatum
 CrawlDatum ScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
          Distribute score value from the current page to all its outlinked pages.
 CrawlDatum ScoringFilters.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
           
 

Methods in org.apache.nutch.scoring with parameters of type CrawlDatum
 CrawlDatum ScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
          Distribute score value from the current page to all its outlinked pages.
 CrawlDatum ScoringFilters.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
           
 float ScoringFilter.generatorSortValue(Text url, CrawlDatum datum, float initSort)
          This method prepares a sort value for the purpose of sorting and selecting top N scoring pages during fetchlist generation.
 float ScoringFilters.generatorSortValue(Text url, CrawlDatum datum, float initSort)
          Calculate a sort value for Generate.
 float ScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
          This method calculates a Lucene document boost.
 float ScoringFilters.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
           
 void ScoringFilter.initialScore(Text url, CrawlDatum datum)
          Set an initial score for newly discovered pages.
 void ScoringFilters.initialScore(Text url, CrawlDatum datum)
          Calculate a new initial score, used when adding newly discovered pages.
 void ScoringFilter.injectedScore(Text url, CrawlDatum datum)
          Set an initial score for newly injected pages.
 void ScoringFilters.injectedScore(Text url, CrawlDatum datum)
          Calculate a new initial score, used when injecting new pages.
 void ScoringFilter.passScoreBeforeParsing(Text url, CrawlDatum datum, Content content)
          This method takes all relevant score information from the current datum (coming from a generated fetchlist) and stores it into Content metadata.
 void ScoringFilters.passScoreBeforeParsing(Text url, CrawlDatum datum, Content content)
           
 void ScoringFilter.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinked)
          This method calculates a new score of CrawlDatum during CrawlDb update, based on the initial value of the original CrawlDatum, and also score values contributed by inlinked pages.
 void ScoringFilters.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinked)
          Calculate updated page score during CrawlDb.update().
 

Method parameters in org.apache.nutch.scoring with type arguments of type CrawlDatum
 CrawlDatum ScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
          Distribute score value from the current page to all its outlinked pages.
 CrawlDatum ScoringFilters.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
           
 void ScoringFilter.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinked)
          This method calculates a new score of CrawlDatum during CrawlDb update, based on the initial value of the original CrawlDatum, and also score values contributed by inlinked pages.
 void ScoringFilters.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinked)
          Calculate updated page score during CrawlDb.update().
 

Uses of CrawlDatum in org.apache.nutch.scoring.link
 

Methods in org.apache.nutch.scoring.link that return CrawlDatum
 CrawlDatum LinkAnalysisScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
           
 

Methods in org.apache.nutch.scoring.link with parameters of type CrawlDatum
 CrawlDatum LinkAnalysisScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
           
 float LinkAnalysisScoringFilter.generatorSortValue(Text url, CrawlDatum datum, float initSort)
           
 float LinkAnalysisScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
           
 void LinkAnalysisScoringFilter.initialScore(Text url, CrawlDatum datum)
           
 void LinkAnalysisScoringFilter.injectedScore(Text url, CrawlDatum datum)
           
 void LinkAnalysisScoringFilter.passScoreBeforeParsing(Text url, CrawlDatum datum, Content content)
           
 void LinkAnalysisScoringFilter.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinked)
           
 

Method parameters in org.apache.nutch.scoring.link with type arguments of type CrawlDatum
 CrawlDatum LinkAnalysisScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
           
 void LinkAnalysisScoringFilter.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinked)
           
 

Uses of CrawlDatum in org.apache.nutch.scoring.opic
 

Methods in org.apache.nutch.scoring.opic that return CrawlDatum
 CrawlDatum OPICScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
          Get a float value from Fetcher.SCORE_KEY, divide it by the number of outlinks and apply.
 

Methods in org.apache.nutch.scoring.opic with parameters of type CrawlDatum
 CrawlDatum OPICScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
          Get a float value from Fetcher.SCORE_KEY, divide it by the number of outlinks and apply.
 float OPICScoringFilter.generatorSortValue(Text url, CrawlDatum datum, float initSort)
          Use getScore().
 float OPICScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
          Dampen the boost value by scorePower.
 void OPICScoringFilter.initialScore(Text url, CrawlDatum datum)
          Set to 0.0f (unknown value) - inlink contributions will bring it to a correct level.
 void OPICScoringFilter.injectedScore(Text url, CrawlDatum datum)
           
 void OPICScoringFilter.passScoreBeforeParsing(Text url, CrawlDatum datum, Content content)
          Store a float value of CrawlDatum.getScore() under Fetcher.SCORE_KEY.
 void OPICScoringFilter.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List inlinked)
          Increase the score by a sum of inlinked scores.
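
The arithmetic behind the OPICScoringFilter descriptions above can be sketched as follows. This is a simplified illustration working on plain floats: OpicSketch and its method names are hypothetical, the Fetcher.SCORE_KEY metadata round trip is not modeled, and raising the score to scorePower is an assumed reading of "dampen the boost value by scorePower".

```java
final class OpicSketch {
    // distributeScoreToOutlinks: each outlink receives an equal share of the page score.
    static float perOutlinkShare(float pageScore, int outlinkCount) {
        if (outlinkCount <= 0) {
            return 0.0f; // nothing to distribute to
        }
        return pageScore / outlinkCount;
    }

    // updateDbScore: the current score grows by the sum of inlinked contributions.
    static float updateDbScore(float currentScore, float[] inlinkedScores) {
        float sum = 0.0f;
        for (float s : inlinkedScores) {
            sum += s;
        }
        return currentScore + sum;
    }

    // indexerScore: dampen the stored score with an exponent, then apply it to the initial boost.
    static float indexerScore(float dbScore, float scorePower, float initScore) {
        return (float) Math.pow(dbScore, scorePower) * initScore;
    }

    public static void main(String[] args) {
        float share = perOutlinkShare(1.0f, 4);
        System.out.println("share per outlink: " + share);
        // A newly discovered page starts at 0.0f and is raised by inlink contributions.
        System.out.println("updated score: " + updateDbScore(0.0f, new float[] {share, share}));
        System.out.println("index boost: " + indexerScore(4.0f, 0.5f, 1.0f));
    }
}
```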
 

Method parameters in org.apache.nutch.scoring.opic with type arguments of type CrawlDatum
 CrawlDatum OPICScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
          Get a float value from Fetcher.SCORE_KEY, divide it by the number of outlinks and apply.
 

Uses of CrawlDatum in org.apache.nutch.scoring.tld
 

Methods in org.apache.nutch.scoring.tld that return CrawlDatum
 CrawlDatum TLDScoringFilter.distributeScoreToOutlink(Text fromUrl, Text toUrl, ParseData parseData, CrawlDatum target, CrawlDatum adjust, int allCount, int validCount)
           
 CrawlDatum TLDScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
           
 

Methods in org.apache.nutch.scoring.tld with parameters of type CrawlDatum
 CrawlDatum TLDScoringFilter.distributeScoreToOutlink(Text fromUrl, Text toUrl, ParseData parseData, CrawlDatum target, CrawlDatum adjust, int allCount, int validCount)
           
 CrawlDatum TLDScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
           
 float TLDScoringFilter.generatorSortValue(Text url, CrawlDatum datum, float initSort)
           
 float TLDScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
           
 void TLDScoringFilter.initialScore(Text url, CrawlDatum datum)
           
 void TLDScoringFilter.injectedScore(Text url, CrawlDatum datum)
           
 void TLDScoringFilter.passScoreBeforeParsing(Text url, CrawlDatum datum, Content content)
           
 void TLDScoringFilter.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinked)
           
 

Method parameters in org.apache.nutch.scoring.tld with type arguments of type CrawlDatum
 CrawlDatum TLDScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
           
 void TLDScoringFilter.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinked)
           
 

Uses of CrawlDatum in org.apache.nutch.scoring.urlmeta
 

Methods in org.apache.nutch.scoring.urlmeta that return CrawlDatum
 CrawlDatum URLMetaScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
          Takes the metatags listed in your "urlmeta.tags" property and looks for them inside the parseData object.
 

Methods in org.apache.nutch.scoring.urlmeta with parameters of type CrawlDatum
 CrawlDatum URLMetaScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
          Takes the metatags listed in your "urlmeta.tags" property and looks for them inside the parseData object.
 float URLMetaScoringFilter.generatorSortValue(Text url, CrawlDatum datum, float initSort)
          Boilerplate
 float URLMetaScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
          Boilerplate
 void URLMetaScoringFilter.initialScore(Text url, CrawlDatum datum)
          Boilerplate
 void URLMetaScoringFilter.injectedScore(Text url, CrawlDatum datum)
          Boilerplate
 void URLMetaScoringFilter.passScoreBeforeParsing(Text url, CrawlDatum datum, Content content)
          Takes the metadata, specified in your "urlmeta.tags" property, from the datum object and injects it into the content.
 void URLMetaScoringFilter.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List inlinked)
          Boilerplate
 

Method parameters in org.apache.nutch.scoring.urlmeta with type arguments of type CrawlDatum
 CrawlDatum URLMetaScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
          Takes the metatags listed in your "urlmeta.tags" property and looks for them inside the parseData object.
 

Uses of CrawlDatum in org.apache.nutch.scoring.webgraph
 

Method parameters in org.apache.nutch.scoring.webgraph with type arguments of type CrawlDatum
 void ScoreUpdater.reduce(Text key, Iterator<ObjectWritable> values, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
          Creates new CrawlDatum objects with the updated score from the NodeDb or with a cleared score.
 

Uses of CrawlDatum in org.apache.nutch.segment
 

Methods in org.apache.nutch.segment with parameters of type CrawlDatum
 boolean SegmentMergeFilters.filter(WritableComparable key, CrawlDatum generateData, CrawlDatum fetchData, CrawlDatum sigData, Content content, ParseData parseData, ParseText parseText, Collection<CrawlDatum> linked)
          Iterates over all SegmentMergeFilter extensions and if any of them returns false, it will return false as well.
 boolean SegmentMergeFilter.filter(WritableComparable key, CrawlDatum generateData, CrawlDatum fetchData, CrawlDatum sigData, Content content, ParseData parseData, ParseText parseText, Collection<CrawlDatum> linked)
          The filtering method which gets all information being merged for a given key (URL).
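
The "any filter returning false rejects the entry" rule described for SegmentMergeFilters.filter can be sketched as below. The interface is a hypothetical reduction that receives only the key (URL). The real method also receives the CrawlDatum, Content, ParseData and ParseText values being merged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, reduced form of a segment merge filter.
interface MergeFilterSketch {
    boolean filter(String url);
}

final class MergeFiltersSketch {
    // Returns false as soon as any filter rejects the entry, true otherwise.
    static boolean filter(List<MergeFilterSketch> filters, String url) {
        for (MergeFilterSketch f : filters) {
            if (!f.filter(url)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<MergeFilterSketch> filters = new ArrayList<>();
        filters.add(url -> url.startsWith("http"));
        filters.add(url -> !url.endsWith(".tmp"));
        System.out.println(filter(filters, "http://example.org/page.html"));
        System.out.println(filter(filters, "http://example.org/scratch.tmp"));
    }
}
```

With no filters configured, this sketch accepts every entry, since there is nothing to reject it.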
 

Method parameters in org.apache.nutch.segment with type arguments of type CrawlDatum
 boolean SegmentMergeFilters.filter(WritableComparable key, CrawlDatum generateData, CrawlDatum fetchData, CrawlDatum sigData, Content content, ParseData parseData, ParseText parseText, Collection<CrawlDatum> linked)
          Iterates over all SegmentMergeFilter extensions and if any of them returns false, it will return false as well.
 boolean SegmentMergeFilter.filter(WritableComparable key, CrawlDatum generateData, CrawlDatum fetchData, CrawlDatum sigData, Content content, ParseData parseData, ParseText parseText, Collection<CrawlDatum> linked)
          The filtering method which gets all information being merged for a given key (URL).
 

Uses of CrawlDatum in org.apache.nutch.tools
 

Methods in org.apache.nutch.tools with parameters of type CrawlDatum
 void CrawlDBScanner.map(Text url, CrawlDatum crawlDatum, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
           
 

Method parameters in org.apache.nutch.tools with type arguments of type CrawlDatum
 void CrawlDBScanner.map(Text url, CrawlDatum crawlDatum, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
           
 void CrawlDBScanner.reduce(Text key, Iterator<CrawlDatum> values, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
           
 void FreeGenerator.FG.reduce(Text key, Iterator<Generator.SelectorEntry> values, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
           
 

Uses of CrawlDatum in org.creativecommons.nutch
 

Methods in org.creativecommons.nutch with parameters of type CrawlDatum
 NutchDocument CCIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
           
 



Copyright © 2012 The Apache Software Foundation