Extension of AdaptiveFetchSchedule that allows for more flexible
configuration of the DEC and INC factors for different MIME-types.
This class is typically used when a recrawl covers many different
MIME-types. Documents with MIME-types other than text/html rarely change
often, so this class lets you configure separate factors per MIME-type in
order to prefer frequently changing MIME-types over others.
For this to work, the class relies on the Content-Type metadata key being
present in the CrawlDB. The key can be added either when injecting new URLs
or, for newly discovered URLs, by adding "Content-Type" to the
db.parsemeta.to.crawldb configuration setting so that their MIME-types are
copied into the CrawlDB.
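The per-MIME-type mechanism can be sketched as follows. This is a minimal, self-contained illustration, not the Nutch implementation. It assumes the usual adaptive rule: an unchanged page has its interval multiplied by (1 + INC), a modified page by (1 - DEC), and the result is clamped to a minimum and maximum. The class `MimeFactorsSketch`, the `Factors` record, the default rates, the example factor values and the interval bounds are all hypothetical. A plain `Map` stands in for the CrawlDB metadata, and parameters such as `; charset=UTF-8` are stripped from the Content-Type value before the lookup.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class MimeFactorsSketch {

    // Hypothetical pair of increase/decrease rates for one MIME-type.
    record Factors(float inc, float dec) {}

    // Hypothetical defaults used when the MIME-type is unknown or has no entry.
    static final Factors DEFAULT = new Factors(0.25f, 0.25f);

    // Hypothetical interval bounds in seconds (1 minute to 365 days).
    static final float MIN_INTERVAL = 60f;
    static final float MAX_INTERVAL = 365f * 24 * 3600;

    static final Map<String, Factors> FACTORS = new HashMap<>();
    static {
        // text/html changes often: shrink its interval strongly when modified.
        FACTORS.put("text/html", new Factors(0.25f, 0.5f));
        // PDFs rarely change: grow their interval quickly when unchanged.
        FACTORS.put("application/pdf", new Factors(0.5f, 0.125f));
    }

    // Extracts the bare MIME-type, e.g. "text/html; charset=UTF-8" -> "text/html".
    static String mimeType(Map<String, String> metadata) {
        String value = metadata.get("Content-Type");
        if (value == null) {
            return null; // key missing: the schedule falls back to defaults
        }
        int semicolon = value.indexOf(';');
        String bare = semicolon >= 0 ? value.substring(0, semicolon) : value;
        return bare.trim().toLowerCase(Locale.ROOT);
    }

    // Picks the factors for the page's MIME-type, or the defaults.
    static Factors factorsFor(Map<String, String> metadata) {
        String mime = mimeType(metadata);
        return mime == null ? DEFAULT : FACTORS.getOrDefault(mime, DEFAULT);
    }

    // Shrinks the interval if the page changed, grows it if not, then clamps it.
    static float adjustInterval(float interval, boolean modified, Factors f) {
        float next = modified ? interval * (1.0f - f.dec()) : interval * (1.0f + f.inc());
        return Math.max(MIN_INTERVAL, Math.min(MAX_INTERVAL, next));
    }

    public static void main(String[] args) {
        Map<String, String> html = Map.of("Content-Type", "text/html; charset=UTF-8");
        Map<String, String> unknown = new HashMap<>(); // no Content-Type key
        float day = 86400f;
        System.out.println(adjustInterval(day, true, factorsFor(html)));     // 43200.0
        System.out.println(adjustInterval(day, false, factorsFor(unknown))); // 108000.0
    }
}
```

The sketch shows why the Content-Type key matters. If the key is missing, every page falls back to the same default factors, and the per-MIME-type tuning has no effect.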
Sets the fetchInterval and fetchTime on a successfully fetched page.
NOTE: this implementation resets the retry counter; extending classes
should call super.setFetchSchedule() to preserve this behavior.
Parameters:
datum - page description to be adjusted. NOTE: this instance, passed by
reference, may be modified inside the method.
prevFetchTime - previous value of fetchTime, or 0 if not available.
prevModifiedTime - previous value of modifiedTime, or 0 if not available.
fetchTime - the time when the page was most recently re-fetched. Most
FetchSchedule implementations should set the fetch time in CrawlDatum to
a value greater than this one.
modifiedTime - the time the content was last modified. This information
comes from the protocol implementations, or is set to a value < 0 if not
available. Most FetchSchedule implementations should set the modified
time in CrawlDatum to this value.
Returns: adjusted page information, including all original information.
NOTE: this may be a different instance than the CrawlDatum passed in, but
implementations should make sure that it contains at least all
information from that CrawlDatum.
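The contract above can be illustrated with the following simplified sketch. `SimpleDatum`, `BaseSchedule` and `AdaptiveSketch` are hypothetical stand-ins, not Nutch's CrawlDatum or FetchSchedule classes. The boolean `modified` parameter is a hypothetical simplification of how the page's change status is reported. The sketch also assumes times in milliseconds and intervals in seconds, and uses fixed rates in place of per-MIME-type factors. The base method resets the retry counter. The subclass calls super.setFetchSchedule() first so that this reset still happens, then sets a fetch time greater than the given fetchTime. It copies modifiedTime into the datum only when that value is not < 0.

```java
public class FetchScheduleSketch {

    // Hypothetical, simplified stand-in for a CrawlDatum.
    static class SimpleDatum {
        int retries;
        long fetchTime;      // ms since epoch
        long modifiedTime;   // ms since epoch
        int fetchInterval;   // seconds
    }

    // Hypothetical base schedule: here it only resets the retry counter.
    static class BaseSchedule {
        SimpleDatum setFetchSchedule(SimpleDatum datum, long prevFetchTime,
                long prevModifiedTime, long fetchTime, long modifiedTime, boolean modified) {
            datum.retries = 0; // a successful fetch clears earlier failures
            return datum;
        }
    }

    // Hypothetical subclass; fixed rates stand in for per-MIME-type factors.
    static class AdaptiveSketch extends BaseSchedule {
        static final float INC = 0.25f;
        static final float DEC = 0.5f;

        @Override
        SimpleDatum setFetchSchedule(SimpleDatum datum, long prevFetchTime,
                long prevModifiedTime, long fetchTime, long modifiedTime, boolean modified) {
            // Call super first so the retry counter is still reset.
            datum = super.setFetchSchedule(datum, prevFetchTime, prevModifiedTime,
                    fetchTime, modifiedTime, modified);
            float factor = modified ? 1.0f - DEC : 1.0f + INC;
            datum.fetchInterval = Math.max(1, Math.round(datum.fetchInterval * factor));
            // The new fetch time is strictly greater than the given fetchTime.
            datum.fetchTime = fetchTime + datum.fetchInterval * 1000L;
            if (modifiedTime >= 0) { // < 0 means the protocol did not report it
                datum.modifiedTime = modifiedTime;
            }
            return datum;
        }
    }

    public static void main(String[] args) {
        SimpleDatum d = new SimpleDatum();
        d.retries = 3;
        d.fetchInterval = 86400;
        long now = 1_000_000_000_000L;
        new AdaptiveSketch().setFetchSchedule(d, now - 86_400_000L, 0L, now, -1L, false);
        // prints "0 108000 108000000"
        System.out.println(d.retries + " " + d.fetchInterval + " " + (d.fetchTime - now));
    }
}
```

If the subclass skipped the super call, the retry counter would keep its old value after a successful fetch. The NOTE above warns against exactly this behavior.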