Interface Protocol

    • Field Detail

      • X_POINT_ID

        static final String X_POINT_ID
        The name of the extension point.
    • Method Detail

      • getRobotRules

        crawlercommons.robots.BaseRobotRules getRobotRules(Text url,
                                                           CrawlDatum datum,
                                                           List<Content> robotsTxtContent)
        Retrieve robot rules applicable for this URL.
        Parameters:
        url - URL to check
        datum - page datum
        robotsTxtContent - container that stores the responses received while fetching the robots.txt file, for debugging or archival purposes. Instead of a robots.txt file, it may contain redirects or an error page (404, etc.). Response Content is appended to the passed list. If null is passed, nothing is stored.
        Returns:
        robot rules (specific for this URL or default), never null
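The contract above has two caller-visible parts: every response seen while fetching robots.txt (redirects and error pages included) is appended to robotsTxtContent unless it is null, and the return value is never null. The following is a minimal, dependency-free sketch of that contract, not Nutch's implementation. Rules, Response, SERVER, fetch and robotsUrlFor are hypothetical stand-ins for BaseRobotRules, Content and a real HTTP fetch; Text and CrawlDatum are collapsed into a plain String URL. The redirect limit, the naive "Disallow: /" parsing and the allow-all default are assumptions made for illustration.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class RobotRulesSketch {

    // Hypothetical stand-in for crawlercommons.robots.BaseRobotRules.
    static final class Rules {
        private final boolean allowAll;
        Rules(boolean allowAll) { this.allowAll = allowAll; }
        boolean isAllowed(String url) { return allowAll; }
    }

    // Hypothetical stand-in for org.apache.nutch.protocol.Content.
    static final class Response {
        final String url; final int status; final String body;
        Response(String url, int status, String body) {
            this.url = url; this.status = status; this.body = body;
        }
    }

    // Canned "server" (assumption): robots.txt URL -> response; a 301 body holds the redirect target.
    static final Map<String, Response> SERVER = Map.of(
        "http://a.example/robots.txt",
        new Response("http://a.example/robots.txt", 301, "http://b.example/robots.txt"),
        "http://b.example/robots.txt",
        new Response("http://b.example/robots.txt", 200, "User-agent: *\nDisallow: /"));

    static Response fetch(String url) {
        return SERVER.getOrDefault(url, new Response(url, 404, "Not Found"));
    }

    // Derive http(s)://host[:port]/robots.txt from a page URL.
    static String robotsUrlFor(String pageUrl) {
        URI u = URI.create(pageUrl);
        return u.getScheme() + "://" + u.getAuthority() + "/robots.txt";
    }

    // Simplified stand-in for Protocol.getRobotRules(url, datum, robotsTxtContent).
    static Rules getRobotRules(String url, List<Response> robotsTxtContent) {
        String target = robotsUrlFor(url);
        for (int hops = 0; hops <= 5; hops++) { // follow at most 5 redirects (assumption)
            Response resp = fetch(target);
            if (robotsTxtContent != null) {
                robotsTxtContent.add(resp); // append every response, redirects and errors included
            }
            if (resp.status == 301) {
                target = resp.body; // follow the redirect
                continue;
            }
            if (resp.status == 200) {
                // Naive parse: only recognizes a blanket "Disallow: /".
                boolean blocked = resp.body.lines().anyMatch(l -> l.trim().equals("Disallow: /"));
                return new Rules(!blocked);
            }
            break; // 404 or other error: fall back to default rules
        }
        return new Rules(true); // never null: default (allow-all) rules
    }

    public static void main(String[] args) {
        List<Response> log = new ArrayList<>();
        Rules rules = getRobotRules("http://a.example/page.html", log);
        System.out.println(rules.isAllowed("http://a.example/page.html")); // false
        System.out.println(log.size()); // 2 (the 301 and the 200)

        // Passing null stores nothing but still yields non-null rules.
        Rules fallback = getRobotRules("http://c.example/", null);
        System.out.println(fallback.isAllowed("http://c.example/")); // true
    }
}
```

Taking a caller-supplied list rather than returning the responses keeps the common path (passing null) allocation-free, while a debugging or archiving caller can collect the whole redirect chain in one place.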