Anonymization of Centralized and Distributed Social Networks by Sequential Clustering
ABSTRACT:
We study the problem of privacy-preservation in
social networks. We consider the distributed setting in which the network data
is split between several data holders. The goal is to arrive at an anonymized
view of the unified network without revealing to any of the data holders
information about links between nodes that are controlled by other data
holders. To that end, we start with the centralized setting and offer two
variants of an anonymization algorithm which is based on sequential clustering
(Sq). Our algorithms significantly outperform the SaNGreeA algorithm of
Campan and Truta, which is the leading algorithm for achieving anonymity in networks
by means of clustering. We then devise secure distributed versions of our
algorithms. To the best of our knowledge, this is the first study of privacy
preservation in distributed social networks. We conclude by outlining future
research proposals in that direction.
EXISTING SYSTEM:
The study of anonymizing social networks
has concentrated so far on centralized networks, i.e., networks that are held
by one data holder.
A social network, for example, provides
information on individuals in some population and the links between them, which
may describe relations of friendship, collaboration, correspondence, and so
forth. An information network, as another example, may describe scientific
publications and their citation links. In their most basic form, networks are
modeled by a graph, where the nodes of the graph correspond to the entities,
while edges denote relations between them. Real social networks may be more
complex or contain additional information. For example, in networks where the
described interaction is asymmetric (e.g., a financial transaction network),
the graph would be directed; if the interaction involves more than two parties
(e.g., a social network that describes comembership in social clubs) then the
network would be modeled as a hypergraph; in cases where there are several types
of interaction, the edges would be labeled; or the nodes in the graph could be
accompanied by attributes that provide demographic information such as age,
gender, location, or occupation, which could enrich and shed light on the
structure of the network. Such social networks are of interest to researchers
from many disciplines, be it sociology, psychology, market research, or epidemiology.
However, the data in such social networks cannot be released as is, since it
might contain sensitive information. The data must therefore be anonymized
before publication, to respect the privacy of the individuals whose sensitive
information it includes. Data anonymization typically trades off with utility.
Hence, a balance must be found in which the released anonymized data still
holds enough utility, on the one hand, and preserves privacy to some accepted
degree, on the other hand.
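The basic graph model described above (entities as nodes, relations as edges, optional descriptive attributes on the nodes) can be illustrated with a minimal Java sketch. The class name `SocialGraph` and its methods are hypothetical and are not taken from the referenced paper; the sketch covers only the undirected case, not the directed, labeled, or hypergraph variants.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Minimal undirected social graph whose nodes may carry descriptive attributes. */
final class SocialGraph {
    // Node id -> attribute map (e.g. "age" -> "34"); empty if nothing is known.
    private final Map<Integer, Map<String, String>> attributes = new HashMap<>();
    // Node id -> ids of adjacent nodes.
    private final Map<Integer, Set<Integer>> adjacency = new HashMap<>();

    /** Adds a node (or replaces the attributes of an existing one). */
    void addNode(int id, Map<String, String> attrs) {
        attributes.put(id, new HashMap<>(attrs));
        adjacency.putIfAbsent(id, new HashSet<>());
    }

    /** Adds an undirected edge; both endpoints must already exist. */
    void addEdge(int u, int v) {
        if (!adjacency.containsKey(u) || !adjacency.containsKey(v)) {
            throw new IllegalArgumentException("unknown node");
        }
        if (u == v) {
            throw new IllegalArgumentException("self-loops are not modeled");
        }
        adjacency.get(u).add(v);
        adjacency.get(v).add(u);
    }

    boolean hasEdge(int u, int v) {
        return adjacency.containsKey(u) && adjacency.get(u).contains(v);
    }

    int nodeCount() {
        return adjacency.size();
    }

    int edgeCount() {
        int sum = 0;
        for (Set<Integer> neighbours : adjacency.values()) {
            sum += neighbours.size();
        }
        return sum / 2; // every undirected edge is stored at both endpoints
    }

    /** Returns the attribute value, or null if the node has no such attribute. */
    String attribute(int id, String key) {
        Map<String, String> attrs = attributes.get(id);
        return attrs == null ? null : attrs.get(key);
    }
}
```

Storing each edge at both endpoints keeps neighbour lookups simple; a directed network would store it at the source only.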
DISADVANTAGES OF EXISTING SYSTEM:
Existing methods handle only centralized networks. In a social network whose
data is split between several holders, they incur higher communication
complexity and provide a low level of security.
PROPOSED SYSTEM:
In this study, we deal with social networks where
the nodes could be accompanied by descriptive data, and propose two novel
anonymization methods that work by clustering the nodes (the third category
in the source paper's classification of anonymization methods).
Our algorithms issue anonymized views of the graph with significantly smaller
information losses than anonymizations issued by the existing algorithms.
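The text above does not spell out the algorithms, so the following Java sketch only illustrates the general idea of sequential clustering as it is commonly described: start from some assignment of nodes to clusters, visit the nodes one at a time, and move a node to another cluster whenever that lowers the information loss. Everything here is an assumption made for illustration: the class and method names, the structural loss measure, and the rule that a cluster may never drop below k nodes. It is not the authors' implementation and it ignores node attributes.

```java
import java.util.Arrays;

/** Illustrative local search in the spirit of sequential clustering (not the paper's code). */
final class SequentialClusteringSketch {

    /**
     * Assumed structural loss: for every cluster and every pair of clusters,
     * 2 * e * (1 - e / p), where e is the number of edges and p the number of
     * possible node pairs. It is 0 when a block is empty or complete.
     */
    static double loss(boolean[][] adj, int[] cluster, int numClusters) {
        int n = adj.length;
        int[] size = new int[numClusters];
        int[][] edges = new int[numClusters][numClusters];
        for (int u = 0; u < n; u++) {
            size[cluster[u]]++;
            for (int v = u + 1; v < n; v++) {
                if (adj[u][v]) {
                    int a = Math.min(cluster[u], cluster[v]);
                    int b = Math.max(cluster[u], cluster[v]);
                    edges[a][b]++;
                }
            }
        }
        double total = 0.0;
        for (int a = 0; a < numClusters; a++) {
            for (int b = a; b < numClusters; b++) {
                long pairs = (a == b)
                        ? (long) size[a] * (size[a] - 1) / 2
                        : (long) size[a] * size[b];
                if (pairs > 0) {
                    double e = edges[a][b];
                    total += 2 * e * (1 - e / pairs);
                }
            }
        }
        return total;
    }

    /**
     * Repeatedly visits the nodes in order and moves each one to the cluster
     * that lowers the loss the most, until a full pass makes no move.
     * Simplification: a node may leave only a cluster with more than k nodes,
     * so clusters that start with at least k nodes stay that large.
     */
    static int[] improve(boolean[][] adj, int[] start, int numClusters, int k) {
        int[] cluster = Arrays.copyOf(start, start.length);
        int[] size = new int[numClusters];
        for (int c : cluster) {
            size[c]++;
        }
        boolean moved = true;
        while (moved) {
            moved = false;
            for (int v = 0; v < cluster.length; v++) {
                int home = cluster[v];
                if (size[home] <= k) {
                    continue; // moving v would leave its cluster too small
                }
                int best = home;
                double bestLoss = loss(adj, cluster, numClusters);
                for (int c = 0; c < numClusters; c++) {
                    if (c == home) {
                        continue;
                    }
                    cluster[v] = c; // tentatively move v and re-evaluate
                    double candidate = loss(adj, cluster, numClusters);
                    if (candidate < bestLoss - 1e-9) {
                        bestLoss = candidate;
                        best = c;
                    }
                }
                cluster[v] = best; // keep the best move, or restore the home cluster
                if (best != home) {
                    size[home]--;
                    size[best]++;
                    moved = true;
                }
            }
        }
        return cluster;
    }
}
```

For a graph made of two disjoint triangles {0,1,2} and {3,4,5}, with k = 2 and the start assignment [0,0,0,0,1,1], this sketch moves node 3 to the second cluster and stops at [0,0,0,1,1,1] with loss 0. Recomputing the full loss for every tentative move keeps the code short but is wasteful; a practical version would update the loss incrementally.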
The study of anonymizing social networks has
concentrated so far on centralized networks, i.e., networks that are held by
one data holder. However, in some settings, the network data is split between
several data holders, or players. For example, the data in a network of email accounts,
where two nodes are connected if the number of email messages that they
exchanged was greater than some given threshold, might be split between several
email service providers. As another example, consider a transaction network
where an edge denotes a financial transaction between two individuals; such a
network would be split between several banks. In such settings, each player controls
some of the nodes (its clients) and knows only the edges that are adjacent
to the nodes under its control. Secure distributed protocols are needed that
would allow the players to arrive at an anonymized version of the unified
network, namely, protocols that would not disclose to any of the
interacting players more information than that which is implied by its own
input (the structure of edges adjacent to the nodes under the control of
that player) and the final output (the anonymized view of the entire unified
network). The recent survey by Wu et al. on privacy preservation in graphs
and social networks concludes with recommendations for future research in this emerging
area. One of the proposed directions is distributed privacy-preserving social
network analysis, which “has not been well reported in literature.”
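The input side of the distributed setting described above can be made concrete with a small Java sketch: given which player controls each node, it derives the edges each player knows, that is, those adjacent to at least one of its own nodes. The class `PlayerViews`, the method `localViews`, and the player labels are hypothetical. The sketch only shows how the data is split; it is not a secure protocol and performs no anonymization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrates how a unified network is split between data holders (players). */
final class PlayerViews {

    /**
     * @param owner node id -> name of the player that controls the node
     * @param edges undirected edges of the unified network, each as {u, v}
     * @return for every player, the edges adjacent to at least one of its nodes
     */
    static Map<String, List<int[]>> localViews(Map<Integer, String> owner, List<int[]> edges) {
        Map<String, List<int[]>> views = new TreeMap<>();
        for (String player : owner.values()) {
            views.putIfAbsent(player, new ArrayList<>());
        }
        for (int[] e : edges) {
            String pu = owner.get(e[0]);
            String pv = owner.get(e[1]);
            if (pu == null || pv == null) {
                throw new IllegalArgumentException("edge endpoint has no owner");
            }
            views.get(pu).add(e);
            if (!pu.equals(pv)) {
                views.get(pv).add(e); // an edge between two players is known to both
            }
        }
        return views;
    }
}
```

With nodes 1 and 2 controlled by player "A", nodes 3 and 4 by player "B", and edges 1-2, 2-3, 3-4, player A sees 1-2 and 2-3 while player B sees 2-3 and 3-4; neither sees the edge that lies entirely inside the other's part of the network.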
ADVANTAGES OF PROPOSED SYSTEM:
We deal with social networks where the nodes could
be accompanied by descriptive data, and propose two novel anonymization methods
that work by clustering the nodes. Our algorithms issue
anonymized views of the graph with significantly smaller information losses
than anonymizations issued by the existing algorithms. We also devise distributed versions
of our algorithms and analyze their privacy and communication complexity.
SYSTEM CONFIGURATION:
HARDWARE CONFIGURATION:
- Processor: Pentium IV
- Speed: 1.1 GHz
- RAM: 256 MB (min)
- Hard Disk: 20 GB
- Keyboard: Standard Windows Keyboard
- Mouse: Two or Three Button Mouse
- Monitor: SVGA
SOFTWARE CONFIGURATION:
- Operating System: Windows XP
- Programming Language: Java/J2EE
- Java Version: JDK 1.6 and above
- Database: MySQL
REFERENCE:
Tamir Tassa and Dror J. Cohen, “Anonymization of Centralized and Distributed
Social Networks by Sequential Clustering,” IEEE Transactions on Knowledge and
Data Engineering, vol. 25, no. 2, February 2013.