A Generalized Flow-Based Method for Analysis of Implicit Relationships on Wikipedia
ABSTRACT:
We focus on measuring relationships between pairs of
objects in Wikipedia whose pages can be regarded as individual objects. Two
kinds of relationships between two objects exist: in Wikipedia, an explicit
relationship is represented by a single link between the two pages for the
objects, and an implicit relationship is represented by a link structure
containing the two pages. Some previously proposed methods for measuring
relationships are cohesion-based methods, which underestimate objects having
high degrees, although such objects could be important in constituting
relationships in Wikipedia. The other methods are inadequate for measuring
implicit relationships because they use only one or two of the following three
important factors: distance, connectivity, and cocitation. We propose a new
method using a generalized maximum flow, which reflects all three factors
and does not underestimate objects having high degrees. We confirm through
experiments that our method measures the strength of a relationship more
appropriately than these previously proposed methods do. Another notable
aspect of our method is mining elucidatory objects, that is, objects constituting
a relationship. We argue that mining elucidatory objects opens a novel
way to understand a relationship in depth.
EXISTING SYSTEM:
Several methods have been proposed for measuring the strength of a relationship
between two objects on an information network G = (V, E), a directed graph where
V is a set of objects and an edge (u, v) ∈ E exists if and only if object u ∈ V
has an explicit relationship to object v ∈ V. We can define a Wikipedia
information network whose vertices are pages of Wikipedia and whose edges are
links between pages. Previously proposed methods can then be applied to
Wikipedia by using this Wikipedia information network. The concept of
"cohesion" has been used for measuring the strength of an implicit relationship.
CFEC, proposed by Koren et al. [1], and PFIBF, proposed by Nakayama et al., are
based on cohesion. We do not adopt the idea of cohesion-based methods, because
they always penalize objects having high degrees, although such objects could be
important to some relationships in Wikipedia. Other previously proposed methods
use only one or two of the three representative concepts for measuring a
relationship: distance, connectivity, and cocitation, although all three
concepts are important factors for implicit relationships. Using all three
concepts together would be appropriate for measuring an implicit relationship
and mining elucidatory objects.
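To make the three concepts concrete, the following Java sketch builds a small information network and computes textbook versions of each: distance as the length of a shortest directed path (breadth-first search), connectivity as the number of edge-disjoint paths (unit-capacity maximum flow, Edmonds-Karp), and cocitation as the number of pages that link to both objects. These are standard graph measures chosen for illustration, not the exact definitions used by the authors, CFEC, or PFIBF. The class name `InfoNetwork` and the page names are hypothetical, and the code assumes Java 8 or later.

```java
import java.util.*;

/** Sketch of an information network G = (V, E) with three textbook measures. */
public class InfoNetwork {
    // Adjacency sets: out.get(u) holds every v such that page u links to page v.
    private final Map<String, Set<String>> out = new HashMap<>();

    /** Adds the directed edge (u, v): page u links to page v. */
    public void addLink(String u, String v) {
        out.computeIfAbsent(u, k -> new LinkedHashSet<>()).add(v);
        out.computeIfAbsent(v, k -> new LinkedHashSet<>());
    }

    /** Distance: hop count of a shortest directed path from s to t, or -1 if none. */
    public int distance(String s, String t) {
        Map<String, Integer> dist = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        dist.put(s, 0);
        queue.add(s);
        while (!queue.isEmpty()) {
            String u = queue.poll();
            if (u.equals(t)) return dist.get(u);
            for (String v : out.getOrDefault(u, Collections.<String>emptySet())) {
                if (!dist.containsKey(v)) {
                    dist.put(v, dist.get(u) + 1);
                    queue.add(v);
                }
            }
        }
        return -1;
    }

    /** Cocitation: number of pages that link to both a and b. */
    public int cocitation(String a, String b) {
        int count = 0;
        for (Set<String> links : out.values()) {
            if (links.contains(a) && links.contains(b)) count++;
        }
        return count;
    }

    /** Connectivity: number of edge-disjoint s-t paths, via unit-capacity max flow. */
    public int connectivity(String s, String t) {
        if (s.equals(t)) return 0;
        // Residual capacities: 1 per link, plus 0-capacity reverse entries.
        Map<String, Map<String, Integer>> cap = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : out.entrySet()) {
            String u = e.getKey();
            for (String v : e.getValue()) {
                cap.computeIfAbsent(u, k -> new HashMap<>()).merge(v, 1, Integer::sum);
                cap.computeIfAbsent(v, k -> new HashMap<>()).putIfAbsent(u, 0);
            }
        }
        int flow = 0;
        while (true) {
            // Breadth-first search for an augmenting path in the residual network.
            Map<String, String> parent = new HashMap<>();
            Deque<String> queue = new ArrayDeque<>();
            parent.put(s, s);
            queue.add(s);
            while (!queue.isEmpty() && !parent.containsKey(t)) {
                String u = queue.poll();
                Map<String, Integer> row = cap.getOrDefault(u, Collections.<String, Integer>emptyMap());
                for (Map.Entry<String, Integer> e : row.entrySet()) {
                    if (e.getValue() > 0 && !parent.containsKey(e.getKey())) {
                        parent.put(e.getKey(), u);
                        queue.add(e.getKey());
                    }
                }
            }
            if (!parent.containsKey(t)) return flow;
            // Push one unit along the path and update residual capacities.
            for (String v = t; !v.equals(s); v = parent.get(v)) {
                String u = parent.get(v);
                cap.get(u).merge(v, -1, Integer::sum);
                cap.get(v).merge(u, 1, Integer::sum);
            }
            flow++;
        }
    }

    public static void main(String[] args) {
        // Hypothetical link structure for illustration, not real Wikipedia data.
        InfoNetwork g = new InfoNetwork();
        g.addLink("Petroleum", "OPEC");
        g.addLink("Petroleum", "Oil_refinery");
        g.addLink("OPEC", "USA");
        g.addLink("Oil_refinery", "USA");
        g.addLink("Energy", "Petroleum");
        g.addLink("Energy", "USA");
        System.out.println("distance     = " + g.distance("Petroleum", "USA"));     // 2
        System.out.println("connectivity = " + g.connectivity("Petroleum", "USA")); // 2
        System.out.println("cocitation   = " + g.cocitation("Petroleum", "USA"));   // 1
    }
}
```

Each measure sees only one aspect of the link structure: here distance reports the two-hop path, connectivity counts both routes through OPEC and Oil_refinery, and cocitation counts the single page (Energy) that links to both objects.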
DISADVANTAGES OF EXISTING SYSTEM:
· It is difficult for a user to discover an implicit relationship and its
elucidatory objects without investigating many pages and links.
· Therefore, measuring and explaining the strength of an implicit relationship
between two objects in Wikipedia is an interesting problem.
PROPOSED SYSTEM:
We propose a new method for measuring a relationship on Wikipedia that reflects
all three concepts: distance, connectivity, and cocitation. We measure
relationships rather than similarities, because a relationship is a more
general concept than similarity. For example, it is hard to say that petroleum
is similar to the USA, yet a relationship clearly exists between petroleum and
the USA. Our method computes the strength of a relationship from object s to
object t as the value of a "generalized maximum flow" on an information network
whose source is s and whose sink is t. A generalized flow introduces a gain for
every edge of the network: the value of a flow sent along an edge is multiplied
by the gain of the edge. Assigning a gain to each edge is therefore important
for measuring a relationship using a generalized maximum flow. We propose a
heuristic gain function that utilizes the category structure of Wikipedia, and
we confirm through experiments that this gain function is sufficient to measure
relationships appropriately.
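The role of the gains can be illustrated with a small Java sketch. It assumes that each edge has a capacity bounding the flow that enters it and a gain in (0, 1], and it greedily augments along the path whose product of gains is largest (Dijkstra on edge weights −ln γ) until s and t are disconnected. This greedy procedure is a deliberate simplification: it never uses reverse residual edges, so it returns a feasible generalized flow (a lower bound) rather than an exact generalized maximum flow, and it is not the authors' algorithm. The class `GainFlowSketch`, the capacities, and the gain values are hypothetical, and the category-based gain function is not modeled.

```java
import java.util.*;

/**
 * Greedy sketch of a generalized flow with edge gains. Not an exact
 * generalized maximum flow algorithm and not the authors' algorithm:
 * reverse residual edges are never used, so the result is a lower bound.
 */
public class GainFlowSketch {
    private static final double EPS = 1e-12;

    /** Directed edge; residual bounds the flow that may still enter the edge. */
    static final class Edge {
        final int from, to;
        final double gain; // flow leaving the edge = gain * flow entering it
        double residual;
        Edge(int from, int to, double capacity, double gain) {
            this.from = from; this.to = to; this.residual = capacity; this.gain = gain;
        }
    }

    private final List<List<Edge>> adj = new ArrayList<>();

    public GainFlowSketch(int vertexCount) {
        for (int i = 0; i < vertexCount; i++) adj.add(new ArrayList<Edge>());
    }

    public void addEdge(int u, int v, double capacity, double gain) {
        if (gain <= 0 || gain > 1) throw new IllegalArgumentException("gain must be in (0, 1]");
        adj.get(u).add(new Edge(u, v, capacity, gain));
    }

    /** Total flow arriving at t; consumes the residual capacities. */
    public double flowValue(int s, int t) {
        double delivered = 0;
        while (true) {
            // Dijkstra with weights -ln(gain) >= 0: the cheapest path has the largest gain product.
            int n = adj.size();
            double[] cost = new double[n];
            Edge[] via = new Edge[n];
            Arrays.fill(cost, Double.POSITIVE_INFINITY);
            cost[s] = 0;
            PriorityQueue<double[]> pq = new PriorityQueue<>(Comparator.comparingDouble((double[] a) -> a[0]));
            pq.add(new double[] {0, s});
            while (!pq.isEmpty()) {
                double[] top = pq.poll();
                int u = (int) top[1];
                if (top[0] > cost[u]) continue; // stale queue entry
                for (Edge e : adj.get(u)) {
                    if (e.residual <= EPS) continue;
                    double c = cost[u] - Math.log(e.gain);
                    if (c < cost[e.to]) {
                        cost[e.to] = c;
                        via[e.to] = e;
                        pq.add(new double[] {c, e.to});
                    }
                }
            }
            if (via[t] == null) return delivered; // t unreachable in the residual network
            List<Edge> path = new ArrayList<>();
            for (int v = t; v != s; v = via[v].from) path.add(0, via[v]);
            // Sending x from s puts x * (product of earlier gains) onto each edge.
            double x = Double.POSITIVE_INFINITY, prefix = 1;
            Edge bottleneck = null;
            for (Edge e : path) {
                if (e.residual / prefix < x) {
                    x = e.residual / prefix;
                    bottleneck = e;
                }
                prefix *= e.gain;
            }
            prefix = 1;
            for (Edge e : path) {
                e.residual -= x * prefix;
                prefix *= e.gain;
            }
            bottleneck.residual = 0; // saturate exactly, so the loop terminates
            delivered += x * prefix; // prefix is now the product of all gains on the path
        }
    }

    public static void main(String[] args) {
        // Hypothetical network: 0 = source s, 3 = sink t, two paths with different gains.
        GainFlowSketch g = new GainFlowSketch(4);
        g.addEdge(0, 1, 1.0, 0.5);
        g.addEdge(1, 3, 1.0, 0.5);
        g.addEdge(0, 2, 1.0, 0.8);
        g.addEdge(2, 3, 1.0, 0.5);
        System.out.println(g.flowValue(0, 3)); // 0.4 via 0-2-3 plus 0.25 via 0-1-3
    }
}
```

In this sketch, gains multiply along a path, so a longer path or lower-gain path delivers less flow to t, while every additional path with spare capacity adds to the total. Each iteration saturates at least one edge, so the loop runs at most |E| times.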
ADVANTAGES OF PROPOSED SYSTEM:
· Computes the strength of the relationship between a source object and each
of its destination objects, and ranks the destination objects by that strength.
· Assigns a gain to each edge, which is important for measuring a relationship
using a generalized maximum flow.
· Experiments on Wikipedia show that the method measures the strength of a
relationship more appropriately than previously proposed methods.
SYSTEM CONFIGURATION:
HARDWARE CONFIGURATION:
· Processor - Pentium IV
· Speed - 1.1 GHz
· RAM - 256 MB (min)
· Hard Disk - 20 GB
· Keyboard - Standard Windows Keyboard
· Mouse - Two or Three Button Mouse
· Monitor - SVGA
SOFTWARE CONFIGURATION:
· Operating System : Windows XP
· Programming Language : Java/J2EE
· Java Version : JDK 1.6 and above
· Database : MySQL
REFERENCE:
Xinpeng Zhang, Yasuhito Asano, and Masatoshi Yoshikawa, "A Generalized
Flow-Based Method for Analysis of Implicit Relationships on Wikipedia," IEEE
Transactions on Knowledge and Data Engineering, vol. 25, no. 2, February 2013.