Ranking
on Data Manifold with Sink Points
Abstract
Ranking is an important
problem in various applications, such as Information Retrieval (IR), natural
language processing, computational biology, and social sciences. Many ranking
approaches have been proposed to rank objects according to their degrees of
relevance or importance. Beyond these two goals, diversity has also been
recognized as a crucial criterion in ranking. Top ranked results are expected
to convey as little redundant information as possible, and cover as many
aspects as possible. However, existing ranking approaches either take no
account of diversity, or handle it separately with some heuristics. In this
paper, we introduce a novel approach, Manifold Ranking with Sink Points
(MRSPs), to address diversity as well as relevance and importance in ranking. Specifically,
our approach uses a manifold ranking process over the data manifold, which can
naturally find the most relevant and important data objects. Meanwhile, by
turning ranked objects into sink points on data manifold, we can effectively
prevent redundant objects from receiving a high rank. MRSP not only shows a
nice convergence property, but also has an interesting and satisfying optimization
explanation. We applied MRSP on two application tasks, update summarization and
query recommendation, where diversity is of great concern in ranking.
Experimental results on both tasks present a strong empirical performance of
MRSP as compared to existing ranking approaches.
A mass of relevant objects may contain highly redundant, even duplicated information, which is undesirable for users. Furthermore, the user’s needs might be multifaceted or ambiguous. The redundance in top ranked results will reduce the chance to satisfy different users. For example, given a query “zeppelin,” if the top ranked search results were all similar articles about the “Zeppelin iPod speaker,” it would be a waste of the output space and largely degrade users’ search experience even though the results are all highly relevant to the query. Obviously, such top ranked results would not satisfy the users who want to know about the rigid airship “Zeppelin” or the rock band “Zeppelin.” Thus, it is important to reduce redundancy in these top search results. Top ranked results are expected to convey as little redundant information as possible, and cover as many aspects as possible. In this way, we are able to minimize the risk that the information need of the user will not be satisfied. Many real application tasks demand diversity in ranking. For example, in query recommendation, the recommended queries should capture different query intents of different users. In text summarization, candidate sentences of a summary are expected to be less redundant and cover different aspects of information delivered by the document. In e-commerce, a list of relevant but distinctive products is useful for users to browse and make a purchase.
Disadvantages
·
However, these methods often treat
relevance and diversity separately in the ranking algorithm, sometimes with
additional heuristic procedures.
·
Therefore, relevance and importance are
well balanced in manifold ranking, similar to Personalized PageRank
Proposed
System
The ranking approaches
have been proposed to rank objects according to their degrees of relevance or
importance. Beyond these two goals, diversity has also been recognized as a
crucial criterion in ranking. The issue of diversity in ranking has been widely
studied recently. Researchers from various domains have proposed many
approaches to address this problem, such as Maximum Marginal Relevance (MMR) subtopic diversity cluster-based centroids
selecting, categorization- based approach, and many other redundancy penalty
approaches. However, these methods often treat relevance and diversity
separately in the ranking algorithm, sometimes with additional heuristic
procedures. Our proposed approach MRSP has not only a nice convergence
property, but also a satisfying optimization explanation. The manifold ranking
algorithm is proposed based on the following two key assumptions: 1) nearby
data are likely to have close ranking scores; and 2) data on the same structure
are likely to have close ranking scores. An intuitive description of the
ranking algorithm is described as follows: a weighted network is constructed
first, where nodes represent all the data and query points, and an edge is put
between two nodes if they are “close.
Advantages
·
Manifold ranking gives high ranks to
nodes that are close to the queries on the manifold and that have strong
centrality.
·
The best of our knowledge, the challenge
of addressing relevance, importance and diversity simultaneously in a unified
way is still far from being well resolved.
·
At each iteration, we use manifold ranking
to find one or more most relevant points.
Modules
·
Data Manifolds
·
Manifold ranking
·
Update Summarization
·
Query Recommendation
·
Qualitative Comparison
·
Parameter Tuning
Module
Description
1. Data Manifolds
Ranking on data
manifolds is proposed In their approach, data objects are assumed to be points
sampled from a low-dimensional manifold embedded in a high-dimensional
euclidean space (ambient space). Hereafter, object and point will not be
discriminated unless otherwise specified. Manifold ranking is then to rank the
data points with respect to the intrinsic global manifold structure given a set
of query points.
2. Manifold ranking
Query nodes are then
initiated with a positive ranking score, while the nodes to be ranked are
assigned with a zero initial score. All the nodes then propagate their ranking
scores to their neighbors via the weighted network. The propagation process is
repeated until a global stable state is achieved, and all the nodes except the
queries are ranked according to their final scores. Manifold ranking gives high
ranks to nodes that are close to the queries on the manifold (which reflects
high relevance) and that have strong centrality (which reflects high
importance). Therefore, relevance and importance are well balanced in manifold
ranking, similar to Personalized . However, diversity is not considered in manifold
ranking.
3. Update Summarization
Update summarization is
a temporal extension of topicfocused multidocument summarization, by focusing
on summarizing up-to-date information contained in the new document set given a
past document set. There are mainly two kinds of approaches for update
summarization, one is abstractive summarization
in which some deep natural language processing techniques are leveraged to
compress sentences or to reorganize phrases to produce a summary of the text.
Another one is extractive summarization . In the extractive approach, update summarization
is reduced to a sentence ranking problem, which composes a summary by
extracting the most representative sentences from target document set.
4. Query Recommendation
Query recommendation
aims to provide alternative queries to help users search and also improve the
usability of search engines. It has been employed as a core utility by many industrial
search engines. Most of the work on query recommendation focuses on measures of
query similarity, where query log data has been widely used in these approaches.
For example, Beeferman and Berger applied agglomerative clustering to the
click-through bipartite graph to identify related queries for recommendation.
Wen et al. proposed to combine both user
click-through data and query content information to determine query similarity.
5. Qualitative Comparison
A qualitative
comparison to gain some intuition on the differences between the summaries generated
by our approach and the baseline methods. We randomly selected one topic from
the 48 topics of TAC2008 data set as an example, which is about “the
investigation of Jack Abramoff and others related to lobbying activities.” We show
the four reference summaries, provided by NIST as the ground truth, and the
summaries generated by our approach and other three baselines.
6. Parameter Tuning
There is only one
parameter in the MRSP algorithm, which is a balance factor between the
influence of the intrinsic manifold structure and the prior knowledge on each
sentence. the influence of on the
summarization performance. As we can see, the summarization approach performs
not so well when is small, which may be due to over emphasis on the prior
knowledge.
Flow
Chart
Conclusion
In this paper, we
propose a novel MRSP approach to address diversity as well as relevance and
importance in ranking. MRSP uses a manifold ranking process over the data manifold,
which can naturally find the most relevant and important objects. Meanwhile, by
turning ranked objects into sink points on data manifold, MRSP can effectively
prevent redundant objects from receiving a high rank. The integrated MSRP
approach can achieve relevance, importance, diversity, and novelty in a unified
process. Experiments on tasks of update summarization and query recommendation present
strong empirical performance of MRSP. Experiments for update summarization show
that MRSP can achieve comparable performance to the existing best performing systems
in TAC competitions and outperform other baseline methods. Experiments for
query recommendation also demonstrate that our approach can effectively
generate diverse and highly relevant query recommendations.