Outsourced
Similarity Search on Metric Data Assets
ABSTRACT:
This
paper considers a cloud computing setting in which similarity querying of
metric data is outsourced to a service provider. The data is to be revealed
only to trusted users, not to the service provider or anyone else. Users query
the server for the most similar data objects to a query example. Outsourcing
offers the data owner scalability and a low-initial investment. The need for privacy
may be due to the data being sensitive (e.g., in medicine), valuable (e.g., in
astronomy), or otherwise confidential. Given this setting, the paper presents
techniques that transform the data prior to supplying it to the service
provider for similarity queries on the transformed data. Our techniques provide
interesting trade-offs between query cost and accuracy. They are then further
extended to offer an intuitive privacy guarantee. Empirical studies with real
data demonstrate that the techniques are capable of offering privacy while
enabling efficient and accurate processing of similarity queries.
ARCHITECTURE:
ALGORITHM
Encrypted Hierarchical Index Search (EHI)
This section presents a client
algorithm, called encrypted hierarchical index (EHI), for performing NN search
on an encrypted hierarchical index stored at the server. This method offers
perfect data privacy for the data owner, but it incurs multiple communication
round trips during query processing. In the literature, algorithms have been
developed for processing range queries on encrypted B+-tree and encrypted
R-tree; however, no solutions were proposed for the NN query on those encrypted
indexes.
Flexible Distance-based Hashing (FDH)
In this section, we propose a
hashing-based technique, called flexible distance-based hashing (FDH), for
processing the NN query. The main advantage of this technique is that the
server always returns a constant-sized candidate set (in one communication
round). The client then refines the candidate set to obtain the final result.
Even though FDH is not guaranteed to return the exact result, the final result
is very close to the actual NN in practice.
EXISTING SYSTEM
In the literature, a number of
concepts for securing databases have been studied. Private information
retrieval techniques hide the user’s query, e.g., the data item searched for,
but not the data being queried. To outsource valuable data to an insecure
server, such techniques are clearly not appropriate. Digital watermarking
establishes the data owner’s identity on the data. Additional information
stored in the data helps prove ownership, but it cannot prevent an attacker
from illegally copying the dataset.
DISADVANTAGE OF EXISTING SYSTEM:
1.
Existing
solutions either offer query efficiency at no Privacy.
2.
No
Security
PROPOSED SYSTEM
We
introduce approaches that shift search functionality to the server. The
proposed Metric Preserving Transformation (MPT) stores relative distance
information at the server with respect to a private set of anchor objects. This
method guarantees correctness of the final search result, but at the cost of
two rounds of communication. The proposed Flexible Distance-based Hashing (FDH)
methods finishes in just a single round of communication, but does not
guarantee retrieval of the exact result.
ADVANTAGE OF PROPOSED SYSTEM:
1.
privacy
guarantee
2.
Security
3.
Approximation
of the query result
4.
Encrypted
index-based technique.
5.
Low
storage costs for large databases.
MODULES:
1.
Outsourcing
Data
2.
Nearest
Neighbor Query
3.
Brute-force
Secure Solution (BRUTE)
4.
Anonymization
- based Solution (ANONY)
MODULES DESCRIPTION:
Outsourcing Data
It
consists of three entities: a data owner, a trusted query user, and an
untrusted server. On the one hand, the data owner wishes to upload his data to
the server so that users are able to execute queries on those data. On the
other hand, the data owner trusts only the users, and nobody else (including
the server). The data owner has a set P of (original) objects (e.g., actual
time series, graphs, strings), and a key to be used for transformation. First,
the data owner applies a transformation function (with a key) to convert P into
a set P0 of transformed objects, and uploads the set P0 to the server (see step
A1 in the figure 1.1). The server builds an index structure on the set P0 in
order to facilitate efficient search. In addition, the data owner applies a
standard encryption method (e.g., AES) on the set of original objects; the
resulting encrypted objects (with their IDs) are uploaded to the server and
stored in a relational table (or in the file system).
Nearest Neighbor Query
In
this module, our research objective is to design a transformation method that
meets the following requirements:
1)
Even in the worst case where the attacker knows the inverse of the
transformation function, the attacker can only estimate the original object p
from the transformed object p’ with bounded precision.
2) It enables high query accuracy.
3) It
enables efficient query processing in terms of communication cost.
4) It supports the insertion and
deletion of objects.
Brute-force Secure Solution (BRUTE)
This
brute-force solution is the one we mentioned in the Introduction. In the
offline construction phase, the data owner applies conventional encryption
(e.g., AES) on each object and then uploads those encrypted objects to the
server. At query time, the user needs to download all encrypted objects from
the server, decrypt them and then compute the actual result. As mentioned, it
is perfectly secure, but its query and communication cost are both
prohibitively high, and pay-as you- go is not supported.
Anonymization - based Solution (ANONY)
This anonymization-based solution
achieves data privacy by means of k-anonymity — the objects are generalized in
such a way that each generalized object cannot be distinguished from k - 1
other generalized objects. In the context of similarity search, it is able to
confuse the ranking of transformed objects because k - 1 of them have the same
rank as the transformed object of the actual nearest neighbor. Thus, we still
consider this solution as a competitor, even though k-anonymity is not a
suitable privacy guarantee for our applications.
SYSTEM CONFIGURATION:-
HARDWARE REQUIREMENTS:-
ü Processor -Pentium –III
ü Speed - 1.1 Ghz
ü RAM - 256 MB(min)
ü Hard Disk - 20 GB
ü Floppy Drive - 1.44 MB
ü Key Board - Standard
Windows Keyboard
ü Mouse - Two or Three Button Mouse
ü Monitor - SVGA
SOFTWARE REQUIREMENTS:-
v Operating System : Windows95/98/2000/XP
v Application Server :
Tomcat5.0/6.X
v Front End : HTML, Java, JSP
v Script :
JavaScript.
v Server side Script : Java Server Pages.
v Database : MYSQL
REFERENCE:
Man Lung Yiu, Ira Assent, Christian
S. Jensen, and Panos Kalnis, “Outsourced Similarity Search on Metric Data
Assets”, IEEE TRANSACTIONS ON KNOWLEDGE
AND DATA ENGINEERING, VOL. 24, NO.2, FEBRUARY 2012.