Uses of Class
org.apache.nutch.metadata.Metadata
-
Packages that use Metadata
Package  Description
org.apache.nutch.indexer
    Index content, configure and run indexing and cleaning jobs to add, update, and delete documents from an index.
org.apache.nutch.metadata
    A Multi-valued Metadata container, and set of constant fields for Nutch Metadata.
org.apache.nutch.net.protocols
    Helper classes related to the Protocol interface, see also org.apache.nutch.protocol.
org.apache.nutch.parse
    The Parse interface and related classes.
org.apache.nutch.protocol
    Classes related to the Protocol interface, see also org.apache.nutch.net.protocols.
org.apache.nutch.protocol.htmlunit
    Protocol plugin which supports retrieving documents via HTTP/HTTPS using Selenium and the HtmlUnitDriver web driver for the HtmlUnit headless browser.
org.apache.nutch.protocol.http
    Protocol plugin which supports retrieving documents via the HTTP protocol.
org.apache.nutch.protocol.httpclient
    Protocol plugin which supports retrieving documents via the HTTP and HTTPS protocols, optionally with Basic, Digest and NTLM authentication schemes for web server as well as proxy server.
org.apache.nutch.protocol.interactiveselenium
    Protocol plugin which supports retrieving documents using and interacting with Selenium.
org.apache.nutch.protocol.okhttp
    Protocol plugin for HTTP/HTTPS based on okhttp, supports HTTP 1.1 and/or HTTP/2.
org.apache.nutch.protocol.selenium
    Protocol plugin which supports retrieving documents via Selenium.
org.apache.nutch.scoring.webgraph
org.apache.nutch.segment
    A segment stores all data from one generate/fetch/update cycle: fetch list, protocol status, raw content, parsed content, and extracted outgoing links.
org.apache.nutch.tools
    Miscellaneous tools.
org.apache.nutch.tools.warc
    Tools to import / export between Nutch segments and WARC archives.
org.creativecommons.nutch
    Sample plugins that parse and index Creative Commons metadata.
-
Uses of Metadata in org.apache.nutch.indexer
Methods in org.apache.nutch.indexer that return Metadata
Modifier and Type  Method  Description
Metadata  NutchDocument.getDocumentMeta()
-
Uses of Metadata in org.apache.nutch.metadata
Subclasses of Metadata in org.apache.nutch.metadata
Modifier and Type  Class  Description
class  CaseInsensitiveMetadata
    A decorator to Metadata that adds case-insensitive lookup of keys.
class  SpellCheckedMetadata
    A decorator to Metadata that adds spellchecking capabilities to property names.
Methods in org.apache.nutch.metadata that return Metadata
Modifier and Type  Method  Description
Metadata  MetaWrapper.getMetadata()
    Get all metadata.
Methods in org.apache.nutch.metadata with parameters of type Metadata
Modifier and Type  Method  Description
void  Metadata.addAll(Metadata metadata)
    Add all name/value mappings (merge two metadata mappings).
Constructors in org.apache.nutch.metadata with parameters of type Metadata
Constructor  Description
MetaWrapper(Metadata metadata, Writable instance, Configuration conf)
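The merge described for addAll(Metadata) can be illustrated with a small stand-in. The sketch below does not use Nutch's Metadata class: MultiValuedMap, MergeDemo, and their methods are hypothetical names, and it assumes that merging appends incoming values to any values already stored under the same name rather than replacing them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a multi-valued metadata container.
// This is NOT org.apache.nutch.metadata.Metadata; it only illustrates the idea.
class MultiValuedMap {
  private final Map<String, List<String>> values = new LinkedHashMap<>();

  // Append a value under the given name, keeping earlier values.
  void add(String name, String value) {
    values.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
  }

  // Return all values stored under the name (empty array if none).
  String[] getValues(String name) {
    List<String> list = values.get(name);
    return list == null ? new String[0] : list.toArray(new String[0]);
  }

  // Merge: copy every name/value mapping of "other" into this container.
  // Assumption: values are appended rather than replaced.
  void addAll(MultiValuedMap other) {
    for (Map.Entry<String, List<String>> e : other.values.entrySet()) {
      // Iterate over a copy so that merging a container into itself is safe.
      for (String v : new ArrayList<>(e.getValue())) {
        add(e.getKey(), v);
      }
    }
  }
}

class MergeDemo {
  public static void main(String[] args) {
    MultiValuedMap a = new MultiValuedMap();
    a.add("Content-Type", "text/html");
    MultiValuedMap b = new MultiValuedMap();
    b.add("Content-Type", "text/plain");
    b.add("Server", "example");
    a.addAll(b);
    // prints [text/html, text/plain]
    System.out.println(Arrays.toString(a.getValues("Content-Type")));
  }
}
```

Whether the real addAll appends or overwrites should be checked against the Metadata class documentation before relying on either behavior.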
-
Uses of Metadata in org.apache.nutch.net.protocols
Methods in org.apache.nutch.net.protocols that return Metadata
Modifier and Type  Method  Description
Metadata  Response.getHeaders()
    Get all the headers.
-
Uses of Metadata in org.apache.nutch.parse
Methods in org.apache.nutch.parse that return Metadata
Modifier and Type  Method  Description
Metadata  ParseData.getContentMeta()
    The original Metadata retrieved from content
Metadata  HTMLMetaTags.getGeneralTags()
Metadata  ParseData.getParseMeta()
    Other content properties.
Methods in org.apache.nutch.parse with parameters of type Metadata
Modifier and Type  Method  Description
void  ParseData.setParseMeta(Metadata parseMeta)
Constructors in org.apache.nutch.parse with parameters of type Metadata
Constructor  Description
ParseData(ParseStatus status, String title, Outlink[] outlinks, Metadata contentMeta)
ParseData(ParseStatus status, String title, Outlink[] outlinks, Metadata contentMeta, Metadata parseMeta)
-
Uses of Metadata in org.apache.nutch.protocol
Methods in org.apache.nutch.protocol that return Metadata
Modifier and Type  Method  Description
Metadata  Content.getMetadata()
    Other protocol-specific data.
Methods in org.apache.nutch.protocol with parameters of type Metadata
Modifier and Type  Method  Description
void  Content.setMetadata(Metadata metadata)
    Other protocol-specific data.
Constructors in org.apache.nutch.protocol with parameters of type Metadata
Constructor  Description
Content(String url, String base, byte[] content, String contentType, Metadata metadata, Configuration conf)
Content(String url, String base, byte[] content, String contentType, Metadata metadata, MimeUtil mimeTypes)
-
Uses of Metadata in org.apache.nutch.protocol.htmlunit
Methods in org.apache.nutch.protocol.htmlunit that return Metadata
Modifier and Type  Method  Description
Metadata  HttpResponse.getHeaders()
-
Uses of Metadata in org.apache.nutch.protocol.http
Methods in org.apache.nutch.protocol.http that return Metadata
Modifier and Type  Method  Description
Metadata  HttpResponse.getHeaders()
-
Uses of Metadata in org.apache.nutch.protocol.httpclient
Methods in org.apache.nutch.protocol.httpclient that return Metadata
Modifier and Type  Method  Description
Metadata  HttpResponse.getHeaders()
Methods in org.apache.nutch.protocol.httpclient with parameters of type Metadata
Modifier and Type  Method  Description
HttpAuthentication  HttpAuthenticationFactory.findAuthentication(Metadata header)
-
Uses of Metadata in org.apache.nutch.protocol.interactiveselenium
Methods in org.apache.nutch.protocol.interactiveselenium that return Metadata
Modifier and Type  Method  Description
Metadata  HttpResponse.getHeaders()
-
Uses of Metadata in org.apache.nutch.protocol.okhttp
Methods in org.apache.nutch.protocol.okhttp that return Metadata
Modifier and Type  Method  Description
Metadata  OkHttpResponse.getHeaders()
-
Uses of Metadata in org.apache.nutch.protocol.selenium
Methods in org.apache.nutch.protocol.selenium that return Metadata
Modifier and Type  Method  Description
Metadata  HttpResponse.getHeaders()
-
Uses of Metadata in org.apache.nutch.scoring.webgraph
Methods in org.apache.nutch.scoring.webgraph that return Metadata
Modifier and Type  Method  Description
Metadata  Node.getMetadata()
Methods in org.apache.nutch.scoring.webgraph with parameters of type Metadata
Modifier and Type  Method  Description
void  Node.setMetadata(Metadata metadata)
-
Uses of Metadata in org.apache.nutch.segment
Methods in org.apache.nutch.segment with parameters of type Metadata
Modifier and Type  Method  Description
static Charset  SegmentReader.getCharset(Metadata parseMeta)
    Try to get HTML encoding from parse metadata.
-
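The idea behind getCharset(Metadata parseMeta), resolving a character set from a metadata entry, can be sketched with standard Java classes only. This is an illustration under stated assumptions, not the SegmentReader implementation: CharsetLookup is a hypothetical class, a plain Map stands in for Metadata, the key name "encoding" is made up, and the fallback to UTF-8 is an assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class CharsetLookup {
  // Hypothetical key name; the key Nutch actually reads is not shown in this listing.
  static final String ENCODING_KEY = "encoding";

  // Resolve a charset from parse metadata, falling back to UTF-8 (an assumption)
  // when the entry is missing or names a charset this JVM does not support.
  static Charset getCharset(Map<String, String> parseMeta) {
    String name = parseMeta.get(ENCODING_KEY);
    if (name == null || name.trim().isEmpty()) {
      return StandardCharsets.UTF_8;
    }
    try {
      return Charset.forName(name.trim());
    } catch (IllegalArgumentException e) {
      // Covers IllegalCharsetNameException and UnsupportedCharsetException.
      return StandardCharsets.UTF_8;
    }
  }

  public static void main(String[] args) {
    Map<String, String> meta = new HashMap<>();
    meta.put(ENCODING_KEY, "ISO-8859-1");
    System.out.println(getCharset(meta));
  }
}
```

Catching IllegalArgumentException keeps the lookup from failing on malformed or unknown encoding names, which matches the "try to get" wording of the method description.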
Uses of Metadata in org.apache.nutch.tools
Fields in org.apache.nutch.tools declared as Metadata
Modifier and Type  Field  Description
protected Metadata  AbstractCommonCrawlFormat.metadata
Methods in org.apache.nutch.tools with parameters of type Metadata
Modifier and Type  Method  Description
static CommonCrawlFormat  CommonCrawlFormatFactory.getCommonCrawlFormat(String formatType, String url, Content content, Metadata metadata, Configuration nutchConf, CommonCrawlConfig config)
    Deprecated.
String  AbstractCommonCrawlFormat.getJsonData(String url, Content content, Metadata metadata)
String  AbstractCommonCrawlFormat.getJsonData(String url, Content content, Metadata metadata, ParseData parseData)
String  CommonCrawlFormat.getJsonData(String url, Content content, Metadata metadata)
    Returns a string representation of the JSON structure of the URL content.
String  CommonCrawlFormat.getJsonData(String url, Content content, Metadata metadata, ParseData parseData)
    Returns a string representation of the JSON structure of the URL content.
String  CommonCrawlFormatWARC.getJsonData(String url, Content content, Metadata metadata, ParseData parseData)
Constructors in org.apache.nutch.tools with parameters of type Metadata
Constructor  Description
AbstractCommonCrawlFormat(String url, Content content, Metadata metadata, Configuration nutchConf, CommonCrawlConfig config)
CommonCrawlFormatJackson(String url, Content content, Metadata metadata, Configuration nutchConf, CommonCrawlConfig config)
CommonCrawlFormatJettinson(String url, Content content, Metadata metadata, Configuration nutchConf, CommonCrawlConfig config)
CommonCrawlFormatSimple(String url, Content content, Metadata metadata, Configuration nutchConf, CommonCrawlConfig config)
CommonCrawlFormatWARC(String url, Content content, Metadata metadata, Configuration nutchConf, CommonCrawlConfig config, ParseData parseData)
-
Uses of Metadata in org.apache.nutch.tools.warc
Methods in org.apache.nutch.tools.warc with parameters of type Metadata
Modifier and Type  Method  Description
protected com.google.gson.JsonObject  WARCExporter.WARCMapReduce.WARCReducer.metadataToJson(Metadata meta)
    Adds keys/values of a Nutch metadata container to a JsonObject.
-
Uses of Metadata in org.creativecommons.nutch
Methods in org.creativecommons.nutch with parameters of type Metadata
Modifier and Type  Method  Description
static void  CCParseFilter.Walker.walk(Node doc, URL base, Metadata metadata, Configuration conf)
    Scan the document adding attributes to metadata.
-