A Privacy Leakage Upper Bound Constraint-Based Approach for Cost-Effective Privacy Preserving of Intermediate Data Sets in Cloud
ABSTRACT:
Cloud computing provides massive computation power and storage capacity, which enable users to deploy computation- and data-intensive applications without infrastructure investment. During the processing of such applications, a large volume of intermediate data sets is generated, and these data sets are often stored to save the cost of recomputing them. However, preserving the privacy of intermediate data sets becomes a challenging problem, because adversaries may recover privacy-sensitive information by analyzing multiple intermediate data sets. Encrypting all data sets in cloud is widely adopted in existing approaches to address this challenge. However, we argue that encrypting all intermediate data sets is neither efficient nor cost-effective, because it is very time-consuming and costly for data-intensive applications to encrypt and decrypt data sets frequently while performing operations on them. In this paper, we propose a novel upper bound privacy leakage constraint-based approach to identify which intermediate data sets need to be encrypted and which do not, so that privacy-preserving cost can be saved while the privacy requirements of data holders are still satisfied. Evaluation results demonstrate that our approach significantly reduces the privacy-preserving cost of intermediate data sets compared with existing approaches in which all data sets are encrypted.
EXISTING SYSTEM:
The privacy concerns caused by retaining intermediate data sets in cloud are important, but they have received little attention. Storage and computation services in cloud are equivalent from an economic perspective because both are charged in proportion to their usage. Thus, cloud users can selectively store valuable intermediate data sets when processing original data sets in data-intensive applications such as medical diagnosis, in order to curtail overall expenses by avoiding frequent recomputation of these data sets. Such scenarios are quite common because data users often reanalyze results, conduct new analyses on intermediate data sets, or share some intermediate results with others for collaboration. Without loss of generality, the notion of intermediate data set herein refers to both intermediate and resultant data sets.
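The selective-storage decision described above can be sketched as a per-data-set cost comparison. This is a minimal sketch assuming a hypothetical pay-as-you-go price model (storage billed per GB-month, recomputation billed per CPU-hour); the prices, the `IntermediateDataSet` fields, and the `should_store` helper are illustrative assumptions, not part of the original approach.

```python
from dataclasses import dataclass


@dataclass
class IntermediateDataSet:
    """Hypothetical description of one intermediate data set."""
    name: str
    size_gb: float            # storage footprint in GB
    regen_cpu_hours: float    # compute needed to regenerate it once
    uses_per_month: float     # expected number of accesses per month


def monthly_storage_cost(ds: IntermediateDataSet, price_per_gb_month: float) -> float:
    return ds.size_gb * price_per_gb_month


def monthly_recompute_cost(ds: IntermediateDataSet, price_per_cpu_hour: float) -> float:
    # If the data set is not stored, every access triggers a full regeneration.
    return ds.regen_cpu_hours * ds.uses_per_month * price_per_cpu_hour


def should_store(ds: IntermediateDataSet,
                 price_per_gb_month: float = 0.02,
                 price_per_cpu_hour: float = 0.10) -> bool:
    """Store the data set if keeping it is cheaper than recomputing it on demand."""
    return (monthly_storage_cost(ds, price_per_gb_month)
            < monthly_recompute_cost(ds, price_per_cpu_hour))


if __name__ == "__main__":
    features = IntermediateDataSet("diagnosis_features", size_gb=50.0,
                                   regen_cpu_hours=8.0, uses_per_month=4.0)
    print(should_store(features))  # prints True (1.0 storage vs 3.2 recompute)
```

A real cost model would also account for transfer and request charges; the comparison is kept to two terms here only to show the trade-off.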
However, storing intermediate data enlarges the attack surface, so the privacy requirements of data holders are at risk of being violated. Intermediate data sets in cloud are usually accessed and processed by multiple parties but rarely controlled by the original data set holders. This enables an adversary to collect intermediate data sets together and infer privacy-sensitive information from them, bringing considerable economic loss or severe damage to the social reputation of data owners. Nevertheless, little attention has been paid to this cloud-specific privacy issue.
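To make the multi-data-set threat concrete, the toy sketch below shows an intersection attack: each release on its own leaves an adversary with two candidate diagnoses for a target, but combining both releases pins the diagnosis down. It assumes the adversary knows the target's age and ZIP code and that the target appears in both releases; the data, the generalized keys, and the function names are hypothetical. It illustrates why joint leakage can exceed the leakage of any single data set and is not the paper's leakage measure.

```python
# Hypothetical generalized releases: each maps a generalized quasi-identifier
# to the set of diagnoses that appear in that group.
RELEASE_BY_AGE = {"30-39": {"flu", "HIV"}, "40-49": {"cancer", "flu"}}
RELEASE_BY_ZIP = {"130**": {"HIV", "cancer"}, "148**": {"flu", "cancer"}}


def candidate_values(release, key):
    """Sensitive values an adversary cannot rule out from a single release."""
    return set(release.get(key, set()))


def intersection_attack(observations):
    """Intersect candidate values across releases that all cover the same target.

    `observations` is a list of (release, generalized_key) pairs.
    """
    candidates = None
    for release, key in observations:
        values = candidate_values(release, key)
        candidates = values if candidates is None else candidates & values
    return candidates if candidates is not None else set()


if __name__ == "__main__":
    # Target is known to be 35 years old and to live in ZIP 13053.
    age_only = intersection_attack([(RELEASE_BY_AGE, "30-39")])
    both = intersection_attack([(RELEASE_BY_AGE, "30-39"), (RELEASE_BY_ZIP, "130**")])
    print(sorted(age_only))  # prints ['HIV', 'flu']
    print(sorted(both))      # prints ['HIV']
```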
DISADVANTAGES OF EXISTING SYSTEM:
Existing technical approaches for preserving the privacy of data sets stored in cloud mainly include encryption and anonymization. On one hand, encrypting all data sets, a straightforward and effective approach, is widely adopted in current research.
However, processing encrypted data sets efficiently is quite a challenging task, because most existing applications only run on unencrypted data sets. Although recent progress has been made in homomorphic encryption, which theoretically allows computation to be performed on encrypted data sets, applying current algorithms is rather expensive due to their inefficiency.
On the other hand, partial information of data sets, e.g., aggregate information, must be exposed to data users in most cloud applications such as data mining and analytics. In such cases, data sets are anonymized rather than encrypted to ensure both data utility and privacy preservation. Current privacy-preserving techniques such as generalization can withstand most privacy attacks on a single data set, but preserving privacy for multiple data sets is still a challenging problem.
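Generalization replaces precise quasi-identifier values with coarser ones so that records become indistinguishable within groups. The sketch below assumes a simple hypothetical hierarchy (10-year age bands, 3-digit ZIP prefixes) and uses k-anonymity as the check; the paper does not prescribe this particular hierarchy or criterion, and all names here are illustrative.

```python
from collections import Counter

# Hypothetical raw records: (age, zip) are quasi-identifiers, diagnosis is sensitive.
SAMPLE_RECORDS = [
    {"age": 34, "zip": "13053", "diagnosis": "flu"},
    {"age": 37, "zip": "13068", "diagnosis": "HIV"},
    {"age": 45, "zip": "14853", "diagnosis": "cancer"},
    {"age": 41, "zip": "14850", "diagnosis": "flu"},
]


def generalize_age(age: int, width: int = 10) -> str:
    """Map an exact age to a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code: str, keep: int = 3) -> str:
    """Keep a ZIP prefix and mask the rest, e.g. '13053' -> '130**'."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def generalize(records):
    return [
        {"age": generalize_age(r["age"]), "zip": generalize_zip(r["zip"]),
         "diagnosis": r["diagnosis"]}
        for r in records
    ]


def is_k_anonymous(records, k, quasi_identifiers=("age", "zip")) -> bool:
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


if __name__ == "__main__":
    for row in generalize(SAMPLE_RECORDS):
        print(row)
    print(is_k_anonymous(SAMPLE_RECORDS, 2),
          is_k_anonymous(generalize(SAMPLE_RECORDS), 2))  # prints False True
```

Each generalized release can pass such a single-data-set check and still be vulnerable when combined with other releases, which is the multi-data-set problem noted above.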
PROPOSED SYSTEM:
Encrypting all intermediate data sets leads to high overhead and low efficiency when they are frequently accessed or processed. As such, we propose to encrypt only part of the intermediate data sets rather than all of them, in order to reduce privacy-preserving cost.
In this paper, we propose a novel approach to identify which intermediate data sets need to be encrypted and which do not, while still satisfying the privacy requirements given by data holders. A tree structure is modeled from the generation relationships of intermediate data sets to analyze privacy propagation among data sets.
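A minimal sketch of such a tree, assuming each intermediate data set is generated from exactly one parent so that the generation relationships form a tree rooted at the original data set. `DataSetNode` and `layers` are hypothetical names; grouping nodes by depth is shown because a layer-by-layer view is a natural way to walk the tree when analyzing how privacy propagates from the original data set to its descendants.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class DataSetNode:
    """One data set in the generation tree (hypothetical structure)."""
    name: str
    children: list[DataSetNode] = field(default_factory=list)

    def add_child(self, child: DataSetNode) -> DataSetNode:
        """Record that `child` was generated from this data set."""
        self.children.append(child)
        return child


def layers(root: DataSetNode) -> list[list[str]]:
    """Group data set names by depth; layer 0 is the original data set."""
    result: list[list[str]] = []
    queue = deque([(root, 0)])
    while queue:  # breadth-first, so depths are visited in increasing order
        node, depth = queue.popleft()
        if depth == len(result):
            result.append([])
        result[depth].append(node.name)
        queue.extend((child, depth + 1) for child in node.children)
    return result


if __name__ == "__main__":
    original = DataSetNode("d0")
    d1 = original.add_child(DataSetNode("d1"))
    d2 = original.add_child(DataSetNode("d2"))
    d1.add_child(DataSetNode("d3"))
    d2.add_child(DataSetNode("d4"))
    print(layers(original))  # prints [['d0'], ['d1', 'd2'], ['d3', 'd4']]
```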
As quantifying the joint privacy leakage of multiple data sets efficiently is challenging, we exploit an upper bound constraint to confine privacy disclosure. Based on such a constraint, we model the problem of saving privacy-preserving cost as a constrained optimization problem. This problem is then divided into a series of sub-problems by decomposing the privacy leakage constraints. Finally, we design a practical heuristic algorithm accordingly to identify the data sets that need to be encrypted.
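The constrained optimization problem can be written down as follows. This is a simplified formulation under stated assumptions: $D^{E}$ and $D^{U}$ are the encrypted and unencrypted subsets, $C_i$ is a hypothetical per-data-set cost of keeping $d_i$ encrypted, $PL_s(d_i)$ is the privacy leakage of $d_i$ alone, $PL_m(\cdot)$ is the joint leakage of a group of data sets, and $\varepsilon$ is the threshold set by the data holder. The key assumption behind the upper bound constraint is that joint leakage never exceeds the sum of individual leakages, so bounding the sum bounds the joint leakage without computing it directly. The decomposition into sub-problems is omitted from this sketch.

```latex
\begin{aligned}
&D = \{d_1, \dots, d_n\}, \qquad D^{U} = D \setminus D^{E} \\
&\min_{D^{E} \subseteq D} \; \sum_{d_i \in D^{E}} C_i
  \quad \text{subject to} \quad \sum_{d_i \in D^{U}} PL_s(d_i) \le \varepsilon \\
&\text{hence} \quad PL_m\!\left(D^{U}\right) \le \sum_{d_i \in D^{U}} PL_s(d_i) \le \varepsilon
\end{aligned}
```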
ADVANTAGES OF PROPOSED SYSTEM:
The major contributions of our research are threefold.
· First, we formally demonstrate the possibility of ensuring privacy leakage requirements without encrypting all intermediate data sets when encryption is incorporated with anonymization to preserve privacy.
· Second, we design a practical heuristic algorithm to identify which data sets need to be encrypted for preserving privacy and which do not.
· Third, experimental results demonstrate that our approach can significantly reduce privacy-preserving cost over existing approaches, which is quite beneficial for cloud users who utilize cloud services in a pay-as-you-go fashion.
ALGORITHM USED:
Algorithm 1: Privacy-Preserving Cost Reducing Heuristic
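The steps of Algorithm 1 are not reproduced in this summary. As a rough stand-in, the sketch below applies a plain greedy knapsack heuristic to the problem of minimizing total encryption cost while the summed privacy leakage of unencrypted data sets stays within a threshold ε: data sets whose encryption is most expensive per unit of leakage are left unencrypted first. This is not the paper's heuristic, which works over the tree of intermediate data sets; `Candidate`, `choose_encrypted`, `encryption_cost`, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    leakage: float  # hypothetical privacy leakage of this data set alone
    cost: float     # hypothetical cost of keeping this data set encrypted


def choose_encrypted(candidates, epsilon):
    """Greedy sketch: return (encrypted names, unencrypted names).

    Data sets with the highest encryption cost per unit of leakage are left
    unencrypted first, as long as the summed leakage of all unencrypted data
    sets stays within epsilon (the upper bound constraint).
    """
    def ratio(c):
        # A zero-leakage data set never consumes budget, so prefer leaving it unencrypted.
        return float("inf") if c.leakage == 0 else c.cost / c.leakage

    budget = epsilon
    unencrypted = []
    for c in sorted(candidates, key=ratio, reverse=True):  # stable for equal ratios
        if c.leakage <= budget:
            unencrypted.append(c.name)
            budget -= c.leakage
    skipped = set(unencrypted)
    encrypted = [c.name for c in candidates if c.name not in skipped]
    return encrypted, unencrypted


def encryption_cost(candidates, encrypted_names):
    chosen = set(encrypted_names)
    return sum(c.cost for c in candidates if c.name in chosen)


if __name__ == "__main__":
    data_sets = [Candidate("d1", 0.5, 6.0), Candidate("d2", 0.25, 4.0),
                 Candidate("d3", 0.25, 1.0), Candidate("d4", 0.5, 2.0)]
    encrypted, unencrypted = choose_encrypted(data_sets, epsilon=0.75)
    print("encrypt:", encrypted)              # prints encrypt: ['d3', 'd4']
    print("leave unencrypted:", unencrypted)  # prints leave unencrypted: ['d2', 'd1']
    all_names = [c.name for c in data_sets]
    print("cost:", encryption_cost(data_sets, encrypted), "vs",
          encryption_cost(data_sets, all_names))  # prints cost: 3.0 vs 13.0
```

Ratio-based greedy selection is a common knapsack heuristic and is not guaranteed to be optimal; with these values it keeps the total encryption cost at 3.0 instead of 13.0 for encrypting everything.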
SYSTEM REQUIREMENTS:
HARDWARE REQUIREMENTS:
· System: Pentium IV 2.4 GHz.
· Hard Disk: 40 GB.
· Monitor: 15 inch VGA Colour.
· Mouse: Logitech Mouse.
· RAM: 512 MB.
· Keyboard: Standard Keyboard.
SOFTWARE REQUIREMENTS:
· Operating System: Windows XP.
· Coding Language: ASP.NET, C#.Net.
· Database: SQL Server 2005.
REFERENCE:
Xuyun Zhang, Chang Liu, Surya Nepal, Suraj Pandey, and Jinjun Chen, “A Privacy Leakage Upper Bound Constraint-Based Approach for Cost-Effective Privacy Preserving of Intermediate Data Sets in Cloud,” IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 6, June 2013.