Learning Web Page Scores by Error Back-Propagation
Title:Learning Web Page Scores by Error Back-Propagation
Content Preview:
-
In this paper we propose an algorithm able to learn a generic distribution of values over the nodes of a graph with symbolic labels. In many applications, the input data is naturally represented by a labeled graph. However, the actual lack of adequate algorithms to directly process this structured data commonly forces to convert the input patterns to vectors of real numbers, loosing significant information. In particular, “Web like” data structures are common in many pattern recognition tasks and possible applications range from object recognition in segmented images to molecular analysis in bioinformatics, and so on.
-
Like in the PageRank algorithm [Page et al., 1998], the score of a node is assumed to correspond to the probability that a random walker will visit that node at any given time. The score distribution computed by the random walker depends both on the connectivity graph and the labels assigned to each node. Markov Chain theory allows to define a set of constrains on the parameters of the random walker in order to guarantee some desired features like convergence and independence of the final distribution on the initial state. A learning algorithm which is based on error back-propagation through the graph is used to optimize the parameters of the random walker minimizing the distance of the scores assigned by the walker from the target values provided by a supervisor for some nodes in the graph.
-
Previous work in the same field achieved very limited results. For example, in [Kondor and Lafferty, 2002] the authors present an algorithm that processes only non labeled graphs, while the technique presented in [Chang et al., 2000] is very application specific and features very limited generalization capabilities. In [Tsoi et al., 2003] a method to modify PageRank scores computed over a collection of pages on the basis of a supervisor’s feedback is proposed. This algorithm has limited approximation capabilities, since it controls only a term of the PageRank equation and it does not optimize how the scores flow though the connectivity graph.
………..[ ]
File Type: PDF
URL Download: Learning Web Page Scores by Error Back-Propagation