Interface Protocol

    • Field Detail

      • X_POINT_ID

        static final String X_POINT_ID
        The name of the extension point.
    • Method Detail

      • getRobotRules

        crawlercommons.robots.BaseRobotRules getRobotRules(Text url,
                                                           CrawlDatum datum,
                                                           List<Content> robotsTxtContent)
        Retrieves the robot rules that apply to this URL.
        Parameters:
        url - URL to check
        datum - page datum
        robotsTxtContent - list that collects the responses received while fetching the robots.txt file, for debugging or archival purposes. In place of a robots.txt file, the list may contain redirects or an error page (404, etc.). Each response Content is appended to the passed list. If null is passed, nothing is stored.
        Returns:
        robot rules (specific to this URL, or the default rules), never null
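
The contract described above (responses are appended to the caller's list, a null list disables capture, and the result is never null) can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in: `Content` and `Rules` are simplified placeholders rather than the real Nutch `Content` or crawler-commons `BaseRobotRules` classes, the `datum` parameter is omitted, the fetch is simulated as a 404, and falling back to allow-all rules on that 404 is an assumption made for the example, not a statement about how any particular Nutch protocol plugin behaves.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the getRobotRules contract; not the Nutch implementation.
class RobotRulesSketch {

    /** Stand-in for a fetched response: a robots.txt file, a redirect, or an error page. */
    static final class Content {
        final String url;
        final int status;
        final String body;

        Content(String url, int status, String body) {
            this.url = url;
            this.status = status;
            this.body = body;
        }
    }

    /** Stand-in for crawlercommons.robots.BaseRobotRules. */
    interface Rules {
        boolean isAllowed(String url);
    }

    /** Default rules used when no robots.txt is available (assumed allow-all here). */
    static final Rules ALLOW_ALL = url -> true;

    /** Derives the robots.txt location on the same host as the given URL. */
    static String robotsUrl(String url) {
        return URI.create(url).resolve("/robots.txt").toString();
    }

    /**
     * Mirrors the documented contract: the response is appended to
     * robotsTxtContent only when the list is non-null, and the returned
     * rules are never null.
     */
    static Rules getRobotRules(String url, List<Content> robotsTxtContent) {
        // Simulated fetch: pretend the server answered 404 for robots.txt.
        Content response = new Content(robotsUrl(url), 404, "Not Found");

        // Store the response for debugging or archival, if the caller asked for it.
        if (robotsTxtContent != null) {
            robotsTxtContent.add(response);
        }

        // No usable robots.txt, so fall back to the default rules instead of null.
        return ALLOW_ALL;
    }

    public static void main(String[] args) {
        String page = "http://example.com/a/b.html";

        // Capture the robots.txt responses alongside the rules.
        List<Content> captured = new ArrayList<>();
        Rules rules = getRobotRules(page, captured);
        System.out.println("allowed: " + rules.isAllowed(page));
        for (Content c : captured) {
            System.out.println(c.status + " " + c.url);
        }

        // Passing null skips the capture; the rules are still returned.
        Rules withoutCapture = getRobotRules(page, null);
        System.out.println("non-null without capture: " + (withoutCapture != null));
    }
}
```

Because the method appends to the list rather than replacing it, a caller can reuse one list across several calls and inspect the accumulated responses afterwards.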