Uses of Interface
org.apache.nutch.protocol.Protocol
Packages that use Protocol
Package    Description
org.apache.nutch.protocol
Classes related to the Protocol interface, see also org.apache.nutch.net.protocols.
org.apache.nutch.protocol.file
Protocol plugin which supports retrieving local file resources.
org.apache.nutch.protocol.ftp
Protocol plugin which supports retrieving documents via the ftp protocol.
org.apache.nutch.protocol.htmlunit
Protocol plugin which supports retrieving documents via HTTP/HTTPS using Selenium and the HtmlUnitDriver web driver for the HtmlUnit headless browser.
org.apache.nutch.protocol.http
Protocol plugin which supports retrieving documents via the http protocol.
org.apache.nutch.protocol.http.api
Common API used by HTTP plugins (http, httpclient, etc.)
org.apache.nutch.protocol.httpclient
Protocol plugin which supports retrieving documents via the HTTP and HTTPS protocols, optionally with Basic, Digest and NTLM authentication schemes for web server as well as proxy server.
org.apache.nutch.protocol.interactiveselenium
Protocol plugin which supports retrieving documents using and interacting with Selenium.
org.apache.nutch.protocol.okhttp
Protocol plugin for HTTP/HTTPS based on okhttp, supports HTTP/1.1 and/or HTTP/2.
org.apache.nutch.protocol.selenium
Protocol plugin which supports retrieving documents via Selenium.
Uses of Protocol in org.apache.nutch.protocol
Methods in org.apache.nutch.protocol that return Protocol
Modifier and Type    Method    Description
Protocol    ProtocolFactory.getProtocol(String urlString)
Returns the appropriate Protocol implementation for a url.
Protocol    ProtocolFactory.getProtocol(URL url)
Returns the appropriate Protocol implementation for a url.
Protocol    ProtocolFactory.getProtocolById(String id)
Methods in org.apache.nutch.protocol with parameters of type Protocol
Modifier and Type    Method    Description
abstract crawlercommons.robots.BaseRobotRules    RobotRulesParser.getRobotRulesSet(Protocol protocol, URL url, List<Content> robotsTxtContent)
Fetch robots.txt (or its protocol-specific equivalent) which applies to the given URL, parse it and return the set of robot rules applicable for the configured agent name(s).
crawlercommons.robots.BaseRobotRules    RobotRulesParser.getRobotRulesSet(Protocol protocol, Text url, List<Content> robotsTxtContent)
Fetch robots.txt (or its protocol-specific equivalent) which applies to the given URL, parse it and return the set of robot rules applicable for the configured agent name(s).
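The getProtocol methods listed above return a Protocol implementation chosen from the URL. As a rough illustration of that idea only, the sketch below dispatches on the URL scheme using a plain map. Everything in it (SchemeDispatchSketch, SketchProtocol, lookup) is a hypothetical stand-in and not part of Nutch; the real ProtocolFactory resolves implementations through Nutch's plugin system and configuration, which this sketch does not model.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for scheme-based protocol lookup; NOT the Nutch API.
final class SchemeDispatchSketch {

  /** Minimal stand-in for a protocol plugin; the real Protocol interface differs. */
  interface SketchProtocol {
    String id();
  }

  // Assumption: one implementation per URL scheme, kept in a simple map.
  private static final Map<String, SketchProtocol> BY_SCHEME = new HashMap<>();

  static {
    register("file");
    register("ftp");
    register("http");
    register("https");
  }

  private static void register(String scheme) {
    // Each stand-in implementation just reports the scheme it handles.
    BY_SCHEME.put(scheme, () -> scheme);
  }

  /** String overload: parse the URL first, then delegate to the URL overload. */
  static SketchProtocol lookup(String urlString) {
    try {
      return lookup(URI.create(urlString).toURL());
    } catch (MalformedURLException e) {
      throw new IllegalArgumentException("Malformed URL: " + urlString, e);
    }
  }

  /** URL overload: pick the implementation registered for the URL's scheme. */
  static SketchProtocol lookup(URL url) {
    String scheme = url.getProtocol().toLowerCase(Locale.ROOT);
    SketchProtocol protocol = BY_SCHEME.get(scheme);
    if (protocol == null) {
      throw new IllegalArgumentException("No protocol registered for scheme: " + scheme);
    }
    return protocol;
  }

  public static void main(String[] args) {
    System.out.println(lookup("ftp://example.org/pub/readme.txt").id());
    System.out.println(lookup("https://example.org/").id());
  }
}
```

The two overloads mirror the String and URL variants in the table above. An unknown scheme is reported with an unchecked exception here purely for brevity.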
Uses of Protocol in org.apache.nutch.protocol.file
Classes in org.apache.nutch.protocol.file that implement Protocol
Modifier and Type    Class    Description
class    File
This class is a protocol plugin used for the file: scheme.
Uses of Protocol in org.apache.nutch.protocol.ftp
Classes in org.apache.nutch.protocol.ftp that implement Protocol
Modifier and Type    Class    Description
class    Ftp
This class is a protocol plugin used for the ftp: scheme.
Methods in org.apache.nutch.protocol.ftp with parameters of type Protocol
Modifier and Type    Method    Description
crawlercommons.robots.BaseRobotRules    FtpRobotRulesParser.getRobotRulesSet(Protocol ftp, URL url, List<Content> robotsTxtContent)
For hosts whose robots rules are not yet cached, it sends an FTP request to the host corresponding to the URL passed, gets the robots file, parses the rules and caches the rules object to avoid re-work in future.
Uses of Protocol in org.apache.nutch.protocol.htmlunit
Classes in org.apache.nutch.protocol.htmlunit that implement Protocol
Modifier and Type    Class    Description
class    Http
Uses of Protocol in org.apache.nutch.protocol.http
Classes in org.apache.nutch.protocol.http that implement Protocol
Modifier and Type    Class    Description
class    Http
Uses of Protocol in org.apache.nutch.protocol.http.api
Classes in org.apache.nutch.protocol.http.api that implement Protocol
Modifier and Type    Class    Description
class    HttpBase
Methods in org.apache.nutch.protocol.http.api with parameters of type Protocol
Modifier and Type    Method    Description
crawlercommons.robots.BaseRobotRules    HttpRobotRulesParser.getRobotRulesSet(Protocol http, URL url, List<Content> robotsTxtContent)
Get the rules from robots.txt which apply to the given url.
Uses of Protocol in org.apache.nutch.protocol.httpclient
Classes in org.apache.nutch.protocol.httpclient that implement Protocol
Modifier and Type    Class    Description
class    Http
This class is a protocol plugin that configures an HTTP client for Basic, Digest and NTLM authentication schemes for web server as well as proxy server.
Uses of Protocol in org.apache.nutch.protocol.interactiveselenium
Classes in org.apache.nutch.protocol.interactiveselenium that implement Protocol
Modifier and Type    Class    Description
class    Http
Uses of Protocol in org.apache.nutch.protocol.okhttp
Classes in org.apache.nutch.protocol.okhttp that implement Protocol
Modifier and Type    Class    Description
class    OkHttp
Uses of Protocol in org.apache.nutch.protocol.selenium
Classes in org.apache.nutch.protocol.selenium that implement Protocol
Modifier and Type    Class    Description
class    Http