Class NutchMetrics


  • public final class NutchMetrics
    extends Object
    Centralized constants for Hadoop metrics counter groups and names.

    Follows Prometheus naming conventions:

    • Counter groups use the nutch_ prefix namespace
    • Counter names use snake_case
    • Accumulating counters use the _total suffix
    • Units are included in counter names where applicable (e.g., _bytes)
    Since:
    1.22
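The naming conventions listed above can be checked mechanically. The following is a minimal sketch, not part of NutchMetrics: the class CounterNameCheck, its methods, and the sample strings are hypothetical and chosen only for illustration. The actual values of the constants are listed under Constant Field Values.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: validates names against the
// conventions described in the class comment.
final class CounterNameCheck {

  // snake_case: lowercase words (letters or digits) joined by single underscores
  private static final Pattern SNAKE_CASE =
      Pattern.compile("[a-z][a-z0-9]*(_[a-z0-9]+)*");

  // Counter groups are expected to start with the nutch_ prefix
  static boolean isValidGroup(String group) {
    return group.startsWith("nutch_") && SNAKE_CASE.matcher(group).matches();
  }

  // Accumulating counters are expected to end with the _total suffix
  static boolean isValidAccumulatingCounter(String name) {
    return name.endsWith("_total") && SNAKE_CASE.matcher(name).matches();
  }

  public static void main(String[] args) {
    // Sample strings are made up for this sketch
    System.out.println(isValidGroup("nutch_example"));                        // true
    System.out.println(isValidAccumulatingCounter("bytes_downloaded_total")); // true
    System.out.println(isValidAccumulatingCounter("bytesDownloaded"));        // false
  }
}
```

A check like this could run in a unit test over every constant, so that a newly added counter that breaks the convention is caught early.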
    • Field Detail

      • GROUP_FETCHER_OUTLINKS

        public static final String GROUP_FETCHER_OUTLINKS
        Counter group for fetcher outlink processing.
        See Also:
        Constant Field Values
      • GROUP_CRAWLDB_FILTER

        public static final String GROUP_CRAWLDB_FILTER
        Counter group for CrawlDb filter operations.
        See Also:
        Constant Field Values
      • GROUP_SITEMAP

        public static final String GROUP_SITEMAP
        Counter group for sitemap processing operations.
        See Also:
        Constant Field Values
      • GROUP_WARC_EXPORTER

        public static final String GROUP_WARC_EXPORTER
        Counter group for WARC export operations.
        See Also:
        Constant Field Values
      • GROUP_DOMAIN_STATS

        public static final String GROUP_DOMAIN_STATS
        Counter group for domain statistics operations.
        See Also:
        Constant Field Values
      • FETCHER_BYTES_DOWNLOADED_TOTAL

        public static final String FETCHER_BYTES_DOWNLOADED_TOTAL
        Total bytes downloaded by fetcher.
        See Also:
        Constant Field Values
      • FETCHER_ROBOTS_DENIED_TOTAL

        public static final String FETCHER_ROBOTS_DENIED_TOTAL
        URLs denied by robots.txt.
        See Also:
        Constant Field Values
      • FETCHER_ROBOTS_DENIED_MAXCRAWLDELAY_TOTAL

        public static final String FETCHER_ROBOTS_DENIED_MAXCRAWLDELAY_TOTAL
        URLs denied due to crawl delay exceeding maximum.
        See Also:
        Constant Field Values
      • FETCHER_ROBOTS_DEFER_VISITS_DROPPED_TOTAL

        public static final String FETCHER_ROBOTS_DEFER_VISITS_DROPPED_TOTAL
        URLs dropped due to robots.txt deferred visits.
        See Also:
        Constant Field Values
      • FETCHER_REDIRECT_COUNT_EXCEEDED_TOTAL

        public static final String FETCHER_REDIRECT_COUNT_EXCEEDED_TOTAL
        Redirects that exceeded maximum redirect count.
        See Also:
        Constant Field Values
      • FETCHER_REDIRECT_DEDUPLICATED_TOTAL

        public static final String FETCHER_REDIRECT_DEDUPLICATED_TOTAL
        Redirects deduplicated (already seen).
        See Also:
        Constant Field Values
      • FETCHER_REDIRECT_NOT_CREATED_TOTAL

        public static final String FETCHER_REDIRECT_NOT_CREATED_TOTAL
        FetchItems not created for redirects.
        See Also:
        Constant Field Values
      • FETCHER_HIT_BY_TIMELIMIT_TOTAL

        public static final String FETCHER_HIT_BY_TIMELIMIT_TOTAL
        URLs dropped from the fetch queues because the fetcher time limit was reached.
        See Also:
        Constant Field Values
      • FETCHER_HIT_BY_TIMEOUT_TOTAL

        public static final String FETCHER_HIT_BY_TIMEOUT_TOTAL
        URLs dropped from the fetch queues because fetching was aborted on timeout.
        See Also:
        Constant Field Values
      • FETCHER_HIT_BY_THROUGHPUT_THRESHOLD_TOTAL

        public static final String FETCHER_HIT_BY_THROUGHPUT_THRESHOLD_TOTAL
        URLs dropped from the fetch queues because throughput fell below the configured threshold.
        See Also:
        Constant Field Values
      • FETCHER_HUNG_THREADS_TOTAL

        public static final String FETCHER_HUNG_THREADS_TOTAL
        Threads that hung during fetching.
        See Also:
        Constant Field Values
      • FETCHER_FILTERED_TOTAL

        public static final String FETCHER_FILTERED_TOTAL
        URLs filtered during fetching.
        See Also:
        Constant Field Values
      • FETCHER_ABOVE_EXCEPTION_THRESHOLD_TOTAL

        public static final String FETCHER_ABOVE_EXCEPTION_THRESHOLD_TOTAL
        URLs dropped due to exception threshold in queue.
        See Also:
        Constant Field Values
      • FETCHER_OUTLINKS_DETECTED_TOTAL

        public static final String FETCHER_OUTLINKS_DETECTED_TOTAL
        Outlinks detected during parsing.
        See Also:
        Constant Field Values
      • FETCHER_OUTLINKS_FOLLOWING_TOTAL

        public static final String FETCHER_OUTLINKS_FOLLOWING_TOTAL
        Outlinks being followed.
        See Also:
        Constant Field Values
      • GENERATOR_URL_FILTERS_REJECTED_TOTAL

        public static final String GENERATOR_URL_FILTERS_REJECTED_TOTAL
        URLs rejected by URL filters.
        See Also:
        Constant Field Values
      • GENERATOR_SCHEDULE_REJECTED_TOTAL

        public static final String GENERATOR_SCHEDULE_REJECTED_TOTAL
        URLs rejected by fetch schedule.
        See Also:
        Constant Field Values
      • GENERATOR_WAIT_FOR_UPDATE_TOTAL

        public static final String GENERATOR_WAIT_FOR_UPDATE_TOTAL
        URLs waiting for CrawlDb update.
        See Also:
        Constant Field Values
      • GENERATOR_EXPR_REJECTED_TOTAL

        public static final String GENERATOR_EXPR_REJECTED_TOTAL
        URLs rejected by JEXL expression.
        See Also:
        Constant Field Values
      • GENERATOR_STATUS_REJECTED_TOTAL

        public static final String GENERATOR_STATUS_REJECTED_TOTAL
        URLs rejected due to status restriction.
        See Also:
        Constant Field Values
      • GENERATOR_SCORE_TOO_LOW_TOTAL

        public static final String GENERATOR_SCORE_TOO_LOW_TOTAL
        URLs rejected due to score below threshold.
        See Also:
        Constant Field Values
      • GENERATOR_INTERVAL_REJECTED_TOTAL

        public static final String GENERATOR_INTERVAL_REJECTED_TOTAL
        URLs rejected due to fetch interval exceeding threshold.
        See Also:
        Constant Field Values
      • GENERATOR_URLS_SKIPPED_PER_HOST_OVERFLOW_TOTAL

        public static final String GENERATOR_URLS_SKIPPED_PER_HOST_OVERFLOW_TOTAL
        URLs skipped due to per-host overflow.
        See Also:
        Constant Field Values
      • GENERATOR_HOSTS_AFFECTED_PER_HOST_OVERFLOW_TOTAL

        public static final String GENERATOR_HOSTS_AFFECTED_PER_HOST_OVERFLOW_TOTAL
        Hosts affected by per-host overflow.
        See Also:
        Constant Field Values
      • INDEXER_DELETED_ROBOTS_NOINDEX_TOTAL

        public static final String INDEXER_DELETED_ROBOTS_NOINDEX_TOTAL
        Documents deleted due to robots noindex.
        See Also:
        Constant Field Values
      • INDEXER_DELETED_GONE_TOTAL

        public static final String INDEXER_DELETED_GONE_TOTAL
        Documents deleted because they are gone.
        See Also:
        Constant Field Values
      • INDEXER_DELETED_REDIRECTS_TOTAL

        public static final String INDEXER_DELETED_REDIRECTS_TOTAL
        Documents deleted due to redirects.
        See Also:
        Constant Field Values
      • INDEXER_DELETED_DUPLICATES_TOTAL

        public static final String INDEXER_DELETED_DUPLICATES_TOTAL
        Documents deleted as duplicates.
        See Also:
        Constant Field Values
      • INDEXER_DELETED_BY_INDEXING_FILTER_TOTAL

        public static final String INDEXER_DELETED_BY_INDEXING_FILTER_TOTAL
        Documents deleted by indexing filter.
        See Also:
        Constant Field Values
      • INDEXER_SKIPPED_NOT_MODIFIED_TOTAL

        public static final String INDEXER_SKIPPED_NOT_MODIFIED_TOTAL
        Documents skipped (not modified).
        See Also:
        Constant Field Values
      • INDEXER_SKIPPED_BY_INDEXING_FILTER_TOTAL

        public static final String INDEXER_SKIPPED_BY_INDEXING_FILTER_TOTAL
        Documents skipped by indexing filter.
        See Also:
        Constant Field Values
      • INDEXER_INDEXED_TOTAL

        public static final String INDEXER_INDEXED_TOTAL
        Documents indexed (added or updated).
        See Also:
        Constant Field Values
      • CRAWLDB_URLS_FILTERED_TOTAL

        public static final String CRAWLDB_URLS_FILTERED_TOTAL
        URLs filtered during CrawlDb operations.
        See Also:
        Constant Field Values
      • CRAWLDB_GONE_RECORDS_REMOVED_TOTAL

        public static final String CRAWLDB_GONE_RECORDS_REMOVED_TOTAL
        Gone (404) records removed during CrawlDb operations.
        See Also:
        Constant Field Values
      • CRAWLDB_ORPHAN_RECORDS_REMOVED_TOTAL

        public static final String CRAWLDB_ORPHAN_RECORDS_REMOVED_TOTAL
        Orphan records removed during CrawlDb operations.
        See Also:
        Constant Field Values
      • INJECTOR_URLS_FILTERED_TOTAL

        public static final String INJECTOR_URLS_FILTERED_TOTAL
        URLs filtered during injection.
        See Also:
        Constant Field Values
      • INJECTOR_URLS_INJECTED_UNIQUE_TOTAL

        public static final String INJECTOR_URLS_INJECTED_UNIQUE_TOTAL
        Unique URLs injected.
        See Also:
        Constant Field Values
      • INJECTOR_URLS_MERGED_TOTAL

        public static final String INJECTOR_URLS_MERGED_TOTAL
        URLs merged with existing CrawlDb entries.
        See Also:
        Constant Field Values
      • INJECTOR_URLS_PURGED_404_TOTAL

        public static final String INJECTOR_URLS_PURGED_404_TOTAL
        URLs purged due to 404 status.
        See Also:
        Constant Field Values
      • INJECTOR_URLS_PURGED_FILTER_TOTAL

        public static final String INJECTOR_URLS_PURGED_FILTER_TOTAL
        URLs purged by filter.
        See Also:
        Constant Field Values
      • HOSTDB_FILTERED_RECORDS_TOTAL

        public static final String HOSTDB_FILTERED_RECORDS_TOTAL
        Records filtered in HostDb.
        See Also:
        Constant Field Values
      • HOSTDB_SKIPPED_NOT_ELIGIBLE_TOTAL

        public static final String HOSTDB_SKIPPED_NOT_ELIGIBLE_TOTAL
        Hosts skipped (not eligible).
        See Also:
        Constant Field Values
      • HOSTDB_URL_LIMIT_NOT_REACHED_TOTAL

        public static final String HOSTDB_URL_LIMIT_NOT_REACHED_TOTAL
        Hosts where URL limit was not reached.
        See Also:
        Constant Field Values
      • HOSTDB_NEW_KNOWN_HOST_TOTAL

        public static final String HOSTDB_NEW_KNOWN_HOST_TOTAL
        New known hosts discovered.
        See Also:
        Constant Field Values
      • HOSTDB_REDISCOVERED_HOST_TOTAL

        public static final String HOSTDB_REDISCOVERED_HOST_TOTAL
        Rediscovered hosts.
        See Also:
        Constant Field Values
      • HOSTDB_EXISTING_KNOWN_HOST_TOTAL

        public static final String HOSTDB_EXISTING_KNOWN_HOST_TOTAL
        Existing known hosts.
        See Also:
        Constant Field Values
      • HOSTDB_NEW_UNKNOWN_HOST_TOTAL

        public static final String HOSTDB_NEW_UNKNOWN_HOST_TOTAL
        New unknown hosts.
        See Also:
        Constant Field Values
      • HOSTDB_EXISTING_UNKNOWN_HOST_TOTAL

        public static final String HOSTDB_EXISTING_UNKNOWN_HOST_TOTAL
        Existing unknown hosts.
        See Also:
        Constant Field Values
      • HOSTDB_PURGED_UNKNOWN_HOST_TOTAL

        public static final String HOSTDB_PURGED_UNKNOWN_HOST_TOTAL
        Purged unknown hosts.
        See Also:
        Constant Field Values
      • DEDUP_DOCUMENTS_MARKED_DUPLICATE_TOTAL

        public static final String DEDUP_DOCUMENTS_MARKED_DUPLICATE_TOTAL
        Documents marked as duplicate.
        See Also:
        Constant Field Values
      • CLEANING_DELETED_DOCUMENTS_TOTAL

        public static final String CLEANING_DELETED_DOCUMENTS_TOTAL
        Documents deleted during cleaning.
        See Also:
        Constant Field Values
      • WEBGRAPH_ADDED_LINKS_TOTAL

        public static final String WEBGRAPH_ADDED_LINKS_TOTAL
        Links added to WebGraph.
        See Also:
        Constant Field Values
      • WEBGRAPH_REMOVED_LINKS_TOTAL

        public static final String WEBGRAPH_REMOVED_LINKS_TOTAL
        Links removed from WebGraph.
        See Also:
        Constant Field Values
      • SITEMAP_FILTERED_RECORDS_TOTAL

        public static final String SITEMAP_FILTERED_RECORDS_TOTAL
        Filtered records in sitemap processing.
        See Also:
        Constant Field Values
      • SITEMAP_FROM_HOSTNAME_TOTAL

        public static final String SITEMAP_FROM_HOSTNAME_TOTAL
        Sitemaps discovered from hostname.
        See Also:
        Constant Field Values
      • SITEMAP_FILTERED_FROM_HOSTNAME_TOTAL

        public static final String SITEMAP_FILTERED_FROM_HOSTNAME_TOTAL
        Sitemaps filtered from hostname.
        See Also:
        Constant Field Values
      • SITEMAP_FAILED_FETCHES_TOTAL

        public static final String SITEMAP_FAILED_FETCHES_TOTAL
        Failed sitemap fetches.
        See Also:
        Constant Field Values
      • SITEMAP_EXISTING_ENTRIES_TOTAL

        public static final String SITEMAP_EXISTING_ENTRIES_TOTAL
        Existing sitemap entries.
        See Also:
        Constant Field Values
      • WARC_MISSING_CONTENT_TOTAL

        public static final String WARC_MISSING_CONTENT_TOTAL
        Missing content in WARC export.
        See Also:
        Constant Field Values
      • WARC_MISSING_METADATA_TOTAL

        public static final String WARC_MISSING_METADATA_TOTAL
        Missing metadata in WARC export.
        See Also:
        Constant Field Values
      • WARC_OMITTED_EMPTY_RESPONSE_TOTAL

        public static final String WARC_OMITTED_EMPTY_RESPONSE_TOTAL
        Omitted empty responses in WARC export.
        See Also:
        Constant Field Values
      • WARC_RECORDS_GENERATED_TOTAL

        public static final String WARC_RECORDS_GENERATED_TOTAL
        WARC records generated.
        See Also:
        Constant Field Values
      • DOMAIN_STATS_FETCHED_TOTAL

        public static final String DOMAIN_STATS_FETCHED_TOTAL
        Fetched URLs in domain statistics.
        See Also:
        Constant Field Values
      • DOMAIN_STATS_NOT_FETCHED_TOTAL

        public static final String DOMAIN_STATS_NOT_FETCHED_TOTAL
        Unfetched URLs in domain statistics.
        See Also:
        Constant Field Values
      • DOMAIN_STATS_EMPTY_RESULT_TOTAL

        public static final String DOMAIN_STATS_EMPTY_RESULT_TOTAL
        Empty results in domain statistics.
        See Also:
        Constant Field Values
      • ERROR_TOTAL

        public static final String ERROR_TOTAL
        Total errors across all categories. This is incremented alongside any category-specific error counter.
        See Also:
        Constant Field Values
      • ERROR_NETWORK_TOTAL

        public static final String ERROR_NETWORK_TOTAL
        Network-related errors. Includes: IOException, SocketException, ConnectException, UnknownHostException.
        See Also:
        Constant Field Values
      • ERROR_PROTOCOL_TOTAL

        public static final String ERROR_PROTOCOL_TOTAL
        Protocol errors. Includes: ProtocolException, ProtocolNotFound.
        See Also:
        Constant Field Values
      • ERROR_PARSING_TOTAL

        public static final String ERROR_PARSING_TOTAL
        Parsing errors. Includes: ParseException, ParserNotFound.
        See Also:
        Constant Field Values
      • ERROR_URL_TOTAL

        public static final String ERROR_URL_TOTAL
        URL-related errors. Includes: MalformedURLException, URLFilterException.
        See Also:
        Constant Field Values
      • ERROR_SCORING_TOTAL

        public static final String ERROR_SCORING_TOTAL
        Scoring filter errors. Includes: ScoringFilterException.
        See Also:
        Constant Field Values
      • ERROR_INDEXING_TOTAL

        public static final String ERROR_INDEXING_TOTAL
        Indexing filter errors. Includes: IndexingException.
        See Also:
        Constant Field Values
      • ERROR_TIMEOUT_TOTAL

        public static final String ERROR_TIMEOUT_TOTAL
        Timeout errors. Includes: SocketTimeoutException and connection timeouts.
        See Also:
        Constant Field Values
      • ERROR_OTHER_TOTAL

        public static final String ERROR_OTHER_TOTAL
        Other uncategorized errors. Used as fallback for exceptions not matching any specific category.
        See Also:
        Constant Field Values
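The error counters above describe a two-level scheme: each error increments the overall total and exactly one category counter, with an "other" category as fallback. The sketch below illustrates that scheme using only JDK exception types. The class ErrorCounterSketch, the Category enum, the "TOTAL" key, and the in-memory map are hypothetical stand-ins for the Hadoop counters. The order of the checks is also an assumption of this sketch: SocketTimeoutException and MalformedURLException are both subclasses of IOException, so they are tested before the general network case.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.SocketTimeoutException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the Nutch implementation.
final class ErrorCounterSketch {

  // Illustrative category labels; the real counter names are the values
  // of the ERROR_* constants.
  enum Category { NETWORK, URL, TIMEOUT, OTHER }

  // Most specific exception types first, because the timeout and URL
  // exceptions used here also extend IOException.
  static Category categorize(Throwable t) {
    if (t instanceof SocketTimeoutException) return Category.TIMEOUT;
    if (t instanceof MalformedURLException) return Category.URL;
    if (t instanceof IOException) return Category.NETWORK;
    return Category.OTHER; // fallback for anything not matched above
  }

  // In-memory stand-in for a Hadoop counter group
  private final Map<String, Long> counters = new HashMap<>();

  // Increment the overall total alongside the category-specific counter
  void record(Throwable t) {
    counters.merge("TOTAL", 1L, Long::sum);
    counters.merge(categorize(t).name(), 1L, Long::sum);
  }

  long get(String name) {
    return counters.getOrDefault(name, 0L);
  }
}
```

With this design, the category counters always sum to the total, which makes it easy to compute the share of each error type in a job report.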