LinkDatum |
A class for holding link information including the url, anchor text, a score,
the timestamp of the link and a link type.
|
LinkDumper |
The LinkDumper tool creates a database of node to inlink information that can
be read using the nested Reader class.
|
LinkDumper.Inverter |
Inverts outlinks from the WebGraph to inlinks and attaches node
information.
|
LinkDumper.Inverter.InvertMapper |
Wraps all values in ObjectWritables.
|
LinkDumper.Inverter.InvertReducer |
Inverts outlinks to inlinks while attaching node information to the
outlink.
|
LinkDumper.LinkNode |
Bean class which holds url to node information.
|
LinkDumper.LinkNodes |
Writable class which holds an array of LinkNode objects.
|
LinkDumper.Merger |
Merges LinkNode objects into a single array value per url.
|
LinkDumper.Reader |
Reader class which will print out the url and all of its inlinks to system
out.
|
LinkRank |
|
Node |
A class which holds the number of inlinks and outlinks for a given url along
with an inlink score from a link analysis program and any metadata.
|
NodeDumper |
A tool that dumps out the top urls by number of inlinks, number of outlinks,
or by score, to a text file.
|
NodeDumper.Dumper |
Outputs the hosts or domains with an associated value.
|
NodeDumper.Dumper.DumperMapper |
Outputs the host or domain as key for this record and numInlinks,
numOutlinks or score as the value.
|
NodeDumper.Dumper.DumperReducer |
Outputs either the sum or the top value for this record.
|
NodeDumper.Sorter |
Outputs the top urls sorted in descending order.
|
NodeDumper.Sorter.SorterMapper |
Outputs the url with the appropriate number of inlinks, number of outlinks,
or score.
|
NodeDumper.Sorter.SorterReducer |
Flips and collects the url and numeric sort value.
|
NodeReader |
Reads information for a single node from the NodeDb in the WebGraph and
prints it to system out.
|
ScoreUpdater |
Updates the score from the WebGraph node database into the crawl database.
|
ScoreUpdater.ScoreUpdaterMapper |
Changes input into ObjectWritables.
|
ScoreUpdater.ScoreUpdaterReducer |
Creates new CrawlDatum objects with the updated score from the NodeDb or
with a cleared score.
|
WebGraph |
Creates three databases: one for inlinks, one for outlinks, and a node
database that holds the number of inlinks and outlinks for a url and the
current score for the url.
|
WebGraph.OutlinkDb |
The OutlinkDb creates a database of all outlinks.
|
WebGraph.OutlinkDb.OutlinkDbMapper |
Passes through existing LinkDatum objects from an existing OutlinkDb and
maps out new LinkDatum objects from the ParseData of new crawls.
|
WebGraph.OutlinkDb.OutlinkDbReducer |
|
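The summaries above describe two recurring operations: inverting outlinks into inlinks, and keeping per-url counts of inlinks and outlinks in a node record. The following is a minimal sketch of both ideas using plain Java collections. It is not the package's implementation: the class `WebGraphSketch`, the `Counts` type, and the methods `invert` and `countLinks` are hypothetical names chosen for illustration, and no Hadoop or crawler APIs are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of the documented package.
final class WebGraphSketch {

    /** Hypothetical stand-in for a node record, holding link counts only. */
    static final class Counts {
        int numInlinks;
        int numOutlinks;
    }

    /** Inverts a (url -> outlink targets) map into a (url -> inlink sources) map. */
    static Map<String, List<String>> invert(Map<String, List<String>> outlinks) {
        Map<String, List<String>> inlinks = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : outlinks.entrySet()) {
            String source = entry.getKey();
            for (String target : entry.getValue()) {
                // Each outlink source -> target becomes an inlink recorded under target.
                inlinks.computeIfAbsent(target, k -> new ArrayList<>()).add(source);
            }
        }
        return inlinks;
    }

    /** Counts inlinks and outlinks for every url that appears as a source or a target. */
    static Map<String, Counts> countLinks(Map<String, List<String>> outlinks) {
        Map<String, Counts> nodes = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : outlinks.entrySet()) {
            Counts sourceCounts = nodes.computeIfAbsent(entry.getKey(), k -> new Counts());
            sourceCounts.numOutlinks += entry.getValue().size();
            for (String target : entry.getValue()) {
                nodes.computeIfAbsent(target, k -> new Counts()).numInlinks++;
            }
        }
        return nodes;
    }

    public static void main(String[] args) {
        // Made-up example graph: a links to b and c, b links to c.
        Map<String, List<String>> outlinks = new TreeMap<>();
        outlinks.put("http://a.example/", List.of("http://b.example/", "http://c.example/"));
        outlinks.put("http://b.example/", List.of("http://c.example/"));

        invert(outlinks).forEach((url, sources) ->
            System.out.println(url + " <- " + sources));
        countLinks(outlinks).forEach((url, c) ->
            System.out.println(url + " in=" + c.numInlinks + " out=" + c.numOutlinks));
    }
}
```

Sorted maps are used here only so that the printed order is deterministic. In a distributed job the same grouping by target url would typically be done by the shuffle between a mapper and a reducer rather than by an in-memory map.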