Class ErrorTracker


  • public class ErrorTracker
    extends Object
    A utility class for tracking errors by category with automatic classification.

    This class provides thread-safe error counting with automatic categorization based on exception type. It uses a fixed, bounded set of error categories so that the number of counters it creates stays well within Hadoop's counter limit (120 counters per job by default).

    Usage:

     // In mapper/reducer setup or thread initialization
     errorTracker = new ErrorTracker(NutchMetrics.GROUP_FETCHER);
     
     // When catching exceptions
     try {
         // ... operation ...
     } catch (Exception e) {
         errorTracker.recordError(e);  // Auto-categorizes
     }
     
     // Or with manual categorization
     errorTracker.recordError(ErrorTracker.ErrorType.NETWORK);
     
     // In cleanup - emit all error counters
     errorTracker.emitCounters(context);
     

    Emits the following counters:

    • errors_total - total number of errors across all categories
    • errors_network_total - network-related errors
    • errors_protocol_total - protocol errors
    • errors_parsing_total - parsing errors
    • errors_url_total - URL-related errors
    • errors_scoring_total - scoring filter errors
    • errors_indexing_total - indexing filter errors
    • errors_timeout_total - timeout errors
    • errors_other_total - uncategorized errors
    Since:
    1.22
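
    The thread-safe, bounded counting described above could be sketched as follows. This is a stand-alone illustration, not the Nutch implementation: the class name LocalErrorCounts is hypothetical, and every enum constant other than NETWORK is an assumption inferred from the counter names listed above.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in; the real ErrorTracker internals are not shown on this page.
class LocalErrorCounts {
    // Assumed categories, mirroring the counter names listed above.
    enum ErrorType { NETWORK, PROTOCOL, PARSING, URL, SCORING, INDEXING, TIMEOUT, OTHER }

    // One counter per category, created up front so the set stays bounded.
    private final Map<ErrorType, AtomicLong> counts = new EnumMap<>(ErrorType.class);
    private final AtomicLong total = new AtomicLong();

    LocalErrorCounts() {
        for (ErrorType type : ErrorType.values()) {
            counts.put(type, new AtomicLong());
        }
    }

    // Safe to call from several threads: the map itself is never modified
    // after construction, only the atomic values inside it.
    void recordError(ErrorType type) {
        counts.get(type).incrementAndGet();
        total.incrementAndGet();
    }

    long getCount(ErrorType type) {
        return counts.get(type).get();
    }

    long getTotalCount() {
        return total.get();
    }
}
```

    Because the categories are an enum, the number of counters is fixed at compile time no matter how many distinct exception classes are seen at run time.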
    • Constructor Detail

      • ErrorTracker

        public ErrorTracker(String group)
        Creates a new ErrorTracker for the specified counter group.

        This constructor creates an ErrorTracker without cached counters. Call initCounters(TaskInputOutputContext) in setup() to cache counter references for better performance.

        Parameters:
        group - the Hadoop counter group name (e.g., NutchMetrics.GROUP_FETCHER)
      • ErrorTracker

        public ErrorTracker(String group,
                            TaskInputOutputContext<?,?,?,?> context)
        Creates a new ErrorTracker with cached counter references.

        This constructor caches all counter references at creation time, avoiding repeated counter lookups in hot paths.

        Parameters:
        group - the Hadoop counter group name
        context - the Hadoop task context for caching counters
    • Method Detail

      • initCounters

        public void initCounters(TaskInputOutputContext<?,?,?,?> context)
        Initializes cached counter references from the Hadoop context.

        Call this method in the mapper/reducer setup() method to cache counter references and avoid repeated lookups during processing.

        Parameters:
        context - the Hadoop task context
      • recordError

        public void recordError(Throwable t)
        Records an error with automatic categorization based on the throwable type.
        Parameters:
        t - the throwable to categorize and record
      • recordError

        public void recordError(ErrorTracker.ErrorType type)
        Records an error with an explicit category.
        Parameters:
        type - the error type category
      • getCount

        public long getCount(ErrorTracker.ErrorType type)
        Returns the count for a specific error type.
        Parameters:
        type - the error type
        Returns:
        the count for that error type
      • getTotalCount

        public long getTotalCount()
        Returns the total count of all errors.
        Returns:
        the total error count
      • emitCounters

        public void emitCounters(TaskInputOutputContext<?,?,?,?> context)
        Emits all error counters to the Hadoop context.

        Should be called once during cleanup to emit aggregated metrics. Only emits counters for error types that have non-zero counts.

        If counters were cached via initCounters(TaskInputOutputContext), uses the cached references for better performance.

        Parameters:
        context - the Hadoop task context
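
        The "emit once, skip zero counts" behaviour could look roughly like the sketch below. It is a stand-alone illustration under stated assumptions: a plain Map stands in for the Hadoop counter group, the class name EmitSketch and the shortened enum are hypothetical, and whether the real method also skips errors_total when it is zero is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; a Map<String, Long> stands in for Hadoop counters.
class EmitSketch {
    // Shortened, assumed category list.
    enum ErrorType { NETWORK, TIMEOUT, OTHER }

    // Copies locally accumulated counts into the sink, skipping zero counts.
    static Map<String, Long> emit(Map<ErrorType, Long> localCounts) {
        Map<String, Long> sink = new LinkedHashMap<>();
        long total = 0;
        for (Map.Entry<ErrorType, Long> entry : localCounts.entrySet()) {
            long count = entry.getValue();
            if (count == 0) {
                continue; // nothing recorded for this category, so no counter is created
            }
            // Name pattern taken from the counter list in the class description.
            String name = "errors_" + entry.getKey().name().toLowerCase(Locale.ROOT) + "_total";
            sink.put(name, count);
            total += count;
        }
        if (total > 0) {
            sink.put("errors_total", total); // assumption: total is skipped when zero
        }
        return sink;
    }
}
```

        Skipping zero counts keeps unused categories out of the job's counter list, which matters when the overall counter budget is limited.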
      • incrementCounters

        public void incrementCounters(Throwable t)
        Directly increments cached error counters without local accumulation.

        Use this method when you want to immediately update Hadoop counters rather than accumulating locally and emitting in cleanup. Requires initCounters(TaskInputOutputContext) to have been called.

        Parameters:
        t - the throwable to categorize and count
        Throws:
        IllegalStateException - if counters have not been initialized
      • incrementCounters

        public void incrementCounters(ErrorTracker.ErrorType type)
        Directly increments cached error counters without local accumulation.

        Use this method when you want to immediately update Hadoop counters rather than accumulating locally and emitting in cleanup. Requires initCounters(TaskInputOutputContext) to have been called.

        Parameters:
        type - the error type to count
        Throws:
        IllegalStateException - if counters have not been initialized
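
        The relationship between counter caching and the IllegalStateException can be sketched as below. This is an illustration only: CachedCountersSketch is a hypothetical class, the lookup function stands in for a per-name counter lookup on the Hadoop context, AtomicLong stands in for a Hadoop counter, and the sketch leaves out the errors_total counter for brevity.

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Hypothetical sketch of "look up once, increment many times".
class CachedCountersSketch {
    // Shortened, assumed category list.
    enum ErrorType { NETWORK, OTHER }

    // Stays null until initCounters has been called.
    private Map<ErrorType, AtomicLong> cached;

    // The lookup function is a stand-in for fetching a counter by name.
    void initCounters(Function<String, AtomicLong> lookup) {
        Map<ErrorType, AtomicLong> resolved = new EnumMap<>(ErrorType.class);
        for (ErrorType type : ErrorType.values()) {
            String name = "errors_" + type.name().toLowerCase(Locale.ROOT) + "_total";
            resolved.put(type, lookup.apply(name)); // one lookup per category
        }
        cached = resolved;
    }

    // Hot path: no name lookup, only an increment on the cached reference.
    void incrementCounters(ErrorType type) {
        if (cached == null) {
            throw new IllegalStateException("initCounters has not been called");
        }
        cached.get(type).incrementAndGet();
    }
}
```

        Failing fast when the cache is missing makes a forgotten initCounters call visible on the first error rather than silently dropping counts.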
      • categorize

        public static ErrorTracker.ErrorType categorize(Throwable t)
        Categorizes a throwable into an error type.

        The categorization checks the exception class hierarchy to determine the most appropriate category. Timeout exceptions are checked first because they are subclasses of IOException and could otherwise be matched by a broader IOException check.

        Parameters:
        t - the throwable to categorize
        Returns:
        the appropriate ErrorType for the throwable
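
        Why the order of checks matters can be shown with a small stand-alone sketch. The class CategorizeSketch, the shortened enum, and the specific exception-to-category mapping are all assumptions made for illustration; the actual rules used by ErrorTracker are not listed on this page.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.SocketTimeoutException;

// Hypothetical sketch of hierarchy-based categorization.
class CategorizeSketch {
    // Shortened, assumed category list.
    enum ErrorType { NETWORK, URL, TIMEOUT, OTHER }

    static ErrorType categorize(Throwable t) {
        // Most specific first: SocketTimeoutException is itself an IOException,
        // so it must be tested before the general IOException branch.
        if (t instanceof SocketTimeoutException) {
            return ErrorType.TIMEOUT;
        }
        // MalformedURLException is also an IOException, same reasoning.
        if (t instanceof MalformedURLException) {
            return ErrorType.URL;
        }
        // Assumed fallback for any remaining I/O failure.
        if (t instanceof IOException) {
            return ErrorType.NETWORK;
        }
        // Everything else, including null, is uncategorized.
        return ErrorType.OTHER;
    }
}
```

        If the IOException test came first in this sketch, every timeout would be reported as a network error and the timeout category would never be used.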
      • getCounterName

        public static String getCounterName(ErrorTracker.ErrorType type)
        Gets the counter name constant for a given error type.
        Parameters:
        type - the error type
        Returns:
        the counter name constant from NutchMetrics
      • getCounterName

        public static String getCounterName(Throwable t)
        Gets the counter name for a throwable based on its categorization.

        This is a convenience method for direct use in catch blocks:

         } catch (Exception e) {
             context.getCounter(group, ErrorTracker.getCounterName(e)).increment(1);
         }
         
        Parameters:
        t - the throwable to get the counter name for
        Returns:
        the counter name constant from NutchMetrics