A class for holding link information including the url, anchor text, a score, the timestamp of the link and a link type.
The LinkDumper tool creates a database of node to inlink information that can be read using the nested Reader class.
Inverts outlinks from the WebGraph to inlinks and attaches node information.
Bean class which holds url to node information.
Writable class which holds an array of LinkNode objects.
Merges LinkNode objects into a single array value per url.
Reader class which will print out the url and all of its inlinks to system out.
The LoopReader tool prints the loopset information for a single url.
The Loops job identifies link cycles (loops) inside the web graph.
Finishes the Loops job by aggregating and collecting any found routes.
Initializes the Loop routes.
Follows a route path looking for the start url of the route.
A set of loops.
A link path or route looking to identify a link cycle.
A class which holds the number of inlinks and outlinks for a given url along with an inlink score from a link analysis program and any metadata.
A tool that dumps the top urls by number of inlinks, number of outlinks, or score to a text file.
Outputs the hosts or domains with an associated value.
Outputs the top urls sorted in descending order.
Reads information for a single node from the NodeDb in the WebGraph and prints it to system out.
Updates the score from the WebGraph node database into the crawl database.
Creates three databases: one for inlinks, one for outlinks, and a node database that holds the number of inlinks and outlinks for a url and the current score for the url.
The OutlinkDb creates a database of all outlinks.
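The inversion of outlinks to inlinks and the per-url inlink and outlink counts described above can be sketched in plain Java. This is a minimal illustration only, assuming an in-memory map from each url to its outlink targets. The class InlinkInversionSketch and its methods invert and countLinks are hypothetical names chosen for this example; they are not part of the Nutch API and do not reflect how the WebGraph databases are stored or processed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of Apache Nutch.
final class InlinkInversionSketch {

    // Turns "source url -> outlink targets" into "target url -> inlink sources".
    static Map<String, List<String>> invert(Map<String, List<String>> outlinks) {
        Map<String, List<String>> inlinks = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : outlinks.entrySet()) {
            String source = entry.getKey();
            for (String target : entry.getValue()) {
                inlinks.computeIfAbsent(target, k -> new ArrayList<>()).add(source);
            }
        }
        return inlinks;
    }

    // Returns url -> {number of inlinks, number of outlinks}.
    static Map<String, int[]> countLinks(Map<String, List<String>> outlinks) {
        Map<String, int[]> counts = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : outlinks.entrySet()) {
            // Index 1 holds the outlink count of the source url.
            counts.computeIfAbsent(entry.getKey(), k -> new int[2])[1] += entry.getValue().size();
            for (String target : entry.getValue()) {
                // Index 0 holds the inlink count of each target url.
                counts.computeIfAbsent(target, k -> new int[2])[0]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Example urls are made up for the demonstration.
        Map<String, List<String>> outlinks = new TreeMap<>();
        outlinks.put("http://a.example/", Arrays.asList("http://b.example/", "http://c.example/"));
        outlinks.put("http://b.example/", Arrays.asList("http://c.example/"));

        for (Map.Entry<String, List<String>> e : invert(outlinks).entrySet()) {
            System.out.println(e.getKey() + " <- " + e.getValue());
        }
        for (Map.Entry<String, int[]> e : countLinks(outlinks).entrySet()) {
            System.out.println(e.getKey() + " inlinks=" + e.getValue()[0]
                + " outlinks=" + e.getValue()[1]);
        }
    }
}
```

A sorted map is used here only to keep the printed order stable; the sketch makes no claim about the ordering used by the real databases.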
Copyright © 2015 The Apache Software Foundation