public class AdaptiveFetchSchedule extends AbstractFetchSchedule
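The schedule calculates delta = fetchTime - modifiedTime.
It then tries to synchronize with the time of change by shifting the next fetch time by a fraction of delta: the next fetch time is set to fetchTime + fetchInterval - delta * SYNC_DELTA_RATE.
If the adjusted fetch interval is bigger than delta, then fetchInterval = delta.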
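The shift formula above can be sketched in a few lines of Java. This is a minimal illustration, not the Nutch implementation: the class `SyncDeltaSketch`, its method names, and the choice to express every value in milliseconds are assumptions made for this example, and the rate is passed as a parameter because its default value is not given here.

```java
/** Hypothetical helper for illustration only; not part of Nutch. */
final class SyncDeltaSketch {

    /** delta = fetchTime - modifiedTime (both assumed to be in milliseconds). */
    static long delta(long fetchTimeMs, long modifiedTimeMs) {
        return fetchTimeMs - modifiedTimeMs;
    }

    /**
     * Next fetch time = fetchTime + fetchInterval - delta * syncDeltaRate.
     * The interval is assumed to be in milliseconds for this sketch.
     */
    static long nextFetchTime(long fetchTimeMs, long fetchIntervalMs,
                              long modifiedTimeMs, double syncDeltaRate) {
        long delta = delta(fetchTimeMs, modifiedTimeMs);
        return fetchTimeMs + fetchIntervalMs - Math.round(delta * syncDeltaRate);
    }

    public static void main(String[] args) {
        long fetchTime = 1_000_000L;    // page fetched at t = 1,000,000 ms
        long modifiedTime = 400_000L;   // page last modified at t = 400,000 ms
        long fetchInterval = 3_600_000L;
        // delta = 600,000; shift = 300,000 at a rate of 0.5
        System.out.println(nextFetchTime(fetchTime, fetchInterval, modifiedTime, 0.5)); // prints 4300000
    }
}
```

With a rate of 0 the shift disappears and the next fetch time is simply fetchTime + fetchInterval; larger rates pull the next fetch closer to the observed time of change.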
NOTE: values of DEC_FACTOR and INC_FACTOR higher than 0.4f may destabilize
the algorithm, so that the fetch interval either increases or decreases
infinitely, with little relevance to the page changes. Please use the
main(String[]) method to test the values before applying them in a
production system.
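To get a feel for how the two factors interact before running the class's own test method, a toy simulation can help. The sketch below is not the Nutch code: it assumes the interval is multiplied by (1 - DEC_FACTOR) when a change is detected and by (1 + INC_FACTOR) otherwise, then clamped to a minimum and maximum. That update rule, the class `FactorSimulationSketch`, and all parameter names are assumptions made for this example.

```java
import java.util.Arrays;

/** Hypothetical toy simulation for illustration only; not part of Nutch. */
final class FactorSimulationSketch {

    /** Assumed update rule: shrink on change, grow otherwise, clamp to [min, max]. */
    static double adjust(double interval, boolean changed,
                         double incFactor, double decFactor,
                         double min, double max) {
        double next = changed ? interval * (1.0 - decFactor)
                              : interval * (1.0 + incFactor);
        return Math.max(min, Math.min(max, next));
    }

    /**
     * Simulates repeated fetches of a page that changes every changePeriod
     * seconds and returns the interval chosen after each fetch.
     */
    static double[] simulate(double startInterval, double changePeriod, int fetches,
                             double incFactor, double decFactor,
                             double min, double max) {
        double[] history = new double[fetches];
        double interval = startInterval;
        double lastFetch = 0.0;
        for (int i = 0; i < fetches; i++) {
            double now = lastFetch + interval;
            // The page counts as changed if a change instant fell in (lastFetch, now].
            boolean changed =
                Math.floor(now / changePeriod) > Math.floor(lastFetch / changePeriod);
            interval = adjust(interval, changed, incFactor, decFactor, min, max);
            history[i] = interval;
            lastFetch = now;
        }
        return history;
    }

    public static void main(String[] args) {
        double min = 60.0;             // assumed lower bound, seconds
        double max = 365.0 * 86400.0;  // assumed upper bound, seconds
        // Compare a moderate factor with one above the 0.4 threshold from the note.
        System.out.println(Arrays.toString(simulate(3600.0, 86400.0, 20, 0.2, 0.2, min, max)));
        System.out.println(Arrays.toString(simulate(3600.0, 86400.0, 20, 0.6, 0.6, min, max)));
    }
}
```

Comparing the two printed histories against the page's real change period gives a rough indication of how closely the interval tracks it for each factor setting.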
Methods inherited from class AbstractFetchSchedule:
calculateLastFetchTime, forceRefetch, initializeSchedule, setPageGoneSchedule, setPageRetrySchedule, shouldFetch
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
public void setConf(Configuration conf)
public CrawlDatum setFetchSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime, long modifiedTime, int state)
Sets the fetchInterval and fetchTime on a successfully fetched page. NOTE: this implementation resets the retry counter; extending classes should call super.setFetchSchedule() to preserve this behavior.
url - URL of the page
datum - page description to be adjusted. NOTE: this instance, passed by reference, may be modified inside the method.
prevFetchTime - previous value of fetch time, or 0 if not available.
prevModifiedTime - previous value of modifiedTime, or 0 if not available.
fetchTime - the time at which the page was most recently re-fetched. Most FetchSchedule implementations should update the value in the CrawlDatum to something greater than this value.
modifiedTime - last time the content was modified. This information comes from the protocol implementations, or is set to < 0 if not available. Most FetchSchedule implementations should update the value in the CrawlDatum to this value.
state - if FetchSchedule.STATUS_MODIFIED, then the content is considered to be "changed" before the fetchTime; if FetchSchedule.STATUS_NOTMODIFIED, then the content is known to be unchanged. This information may be obtained by comparing page signatures before and after fetching. If this is set to FetchSchedule.STATUS_UNKNOWN, then it is unknown whether the page was changed; implementations are free to follow a sensible default behavior.
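As a rough illustration of the signature comparison mentioned above, a caller might derive the state argument as follows. The class `ChangeStateSketch`, the `classify` method, and the numeric values of the constants are hypothetical stand-ins for this example; real code would use the constants defined on FetchSchedule.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration only; not part of Nutch. */
final class ChangeStateSketch {

    // Stand-ins for FetchSchedule.STATUS_*; the numeric values are assumptions.
    static final int STATUS_UNKNOWN = 0;
    static final int STATUS_MODIFIED = 1;
    static final int STATUS_NOTMODIFIED = 2;

    /** Chooses a state by comparing page signatures from before and after the fetch. */
    static int classify(byte[] previousSignature, byte[] currentSignature) {
        if (previousSignature == null || currentSignature == null) {
            // Without both signatures nothing can be said about a change.
            return STATUS_UNKNOWN;
        }
        return Arrays.equals(previousSignature, currentSignature)
                ? STATUS_NOTMODIFIED
                : STATUS_MODIFIED;
    }

    public static void main(String[] args) {
        byte[] before = {1, 2, 3};
        byte[] after = {1, 2, 4};
        System.out.println(classify(before, after) == STATUS_MODIFIED);     // prints true
        System.out.println(classify(before, before) == STATUS_NOTMODIFIED); // prints true
        System.out.println(classify(null, after) == STATUS_UNKNOWN);        // prints true
    }
}
```

Falling back to the unknown state when a signature is missing leaves the decision to the schedule's default behavior rather than guessing.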
Copyright © 2017 The Apache Software Foundation