Follow us on Facebook

Header Ads

Outsourced Similarity Search on Metric Data Assets


Outsourced Similarity Search on Metric Data Assets

ABSTRACT:

This paper considers a cloud computing setting in which similarity querying of metric data is outsourced to a service provider. The data is to be revealed only to trusted users, not to the service provider or anyone else. Users query the server for the most similar data objects to a query example. Outsourcing offers the data owner scalability and a low-initial investment. The need for privacy may be due to the data being sensitive (e.g., in medicine), valuable (e.g., in astronomy), or otherwise confidential. Given this setting, the paper presents techniques that transform the data prior to supplying it to the service provider for similarity queries on the transformed data. Our techniques provide interesting trade-offs between query cost and accuracy. They are then further extended to offer an intuitive privacy guarantee. Empirical studies with real data demonstrate that the techniques are capable of offering privacy while enabling efficient and accurate processing of similarity queries.



ARCHITECTURE:


ALGORITHM
Encrypted Hierarchical Index Search (EHI)
This section presents a client algorithm, called encrypted hierarchical index (EHI), for performing NN search on an encrypted hierarchical index stored at the server. This method offers perfect data privacy for the data owner, but it incurs multiple communication round trips during query processing. In the literature, algorithms have been developed for processing range queries on encrypted B+-tree and encrypted R-tree; however, no solutions were proposed for the NN query on those encrypted indexes.


Flexible Distance-based Hashing (FDH)
In this section, we propose a hashing-based technique, called flexible distance-based hashing (FDH), for processing the NN query. The main advantage of this technique is that the server always returns a constant-sized candidate set (in one communication round). The client then refines the candidate set to obtain the final result. Even though FDH is not guaranteed to return the exact result, the final result is very close to the actual NN in practice.

EXISTING SYSTEM
In the literature, a number of concepts for securing databases have been studied. Private information retrieval techniques hide the user’s query, e.g., the data item searched for, but not the data being queried. To outsource valuable data to an insecure server, such techniques are clearly not appropriate. Digital watermarking establishes the data owner’s identity on the data. Additional information stored in the data helps prove ownership, but it cannot prevent an attacker from illegally copying the dataset.
DISADVANTAGE OF EXISTING SYSTEM:
1.     Existing solutions either offer query efficiency at no Privacy.
2.     No Security





PROPOSED SYSTEM
We introduce approaches that shift search functionality to the server. The proposed Metric Preserving Transformation (MPT) stores relative distance information at the server with respect to a private set of anchor objects. This method guarantees correctness of the final search result, but at the cost of two rounds of communication. The proposed Flexible Distance-based Hashing (FDH) methods finishes in just a single round of communication, but does not guarantee retrieval of the exact result.

ADVANTAGE OF PROPOSED SYSTEM:
1.     privacy guarantee
2.     Security
3.     Approximation of the query result
4.     Encrypted index-based technique.
5.     Low storage costs for large databases.

MODULES:
1.     Outsourcing Data
2.     Nearest Neighbor Query
3.     Brute-force Secure Solution (BRUTE)
4.     Anonymization - based Solution (ANONY)

MODULES DESCRIPTION:
Outsourcing Data
It consists of three entities: a data owner, a trusted query user, and an untrusted server. On the one hand, the data owner wishes to upload his data to the server so that users are able to execute queries on those data. On the other hand, the data owner trusts only the users, and nobody else (including the server). The data owner has a set P of (original) objects (e.g., actual time series, graphs, strings), and a key to be used for transformation. First, the data owner applies a transformation function (with a key) to convert P into a set P0 of transformed objects, and uploads the set P0 to the server (see step A1 in the figure 1.1). The server builds an index structure on the set P0 in order to facilitate efficient search. In addition, the data owner applies a standard encryption method (e.g., AES) on the set of original objects; the resulting encrypted objects (with their IDs) are uploaded to the server and stored in a relational table (or in the file system).

Nearest Neighbor Query
In this module, our research objective is to design a transformation method that meets the following requirements:

1) Even in the worst case where the attacker knows the inverse of the transformation function, the attacker can only estimate the original object p from the transformed object p’ with bounded precision.

2)  It enables high query accuracy.

3) It enables efficient query processing in terms of communication cost.

4) It supports the insertion and deletion of objects.

Brute-force Secure Solution (BRUTE)
This brute-force solution is the one we mentioned in the Introduction. In the offline construction phase, the data owner applies conventional encryption (e.g., AES) on each object and then uploads those encrypted objects to the server. At query time, the user needs to download all encrypted objects from the server, decrypt them and then compute the actual result. As mentioned, it is perfectly secure, but its query and communication cost are both prohibitively high, and pay-as you- go is not supported.

Anonymization - based Solution (ANONY)
This anonymization-based solution achieves data privacy by means of k-anonymity — the objects are generalized in such a way that each generalized object cannot be distinguished from k - 1 other generalized objects. In the context of similarity search, it is able to confuse the ranking of transformed objects because k - 1 of them have the same rank as the transformed object of the actual nearest neighbor. Thus, we still consider this solution as a competitor, even though k-anonymity is not a suitable privacy guarantee for our applications.

 

SYSTEM CONFIGURATION:-

HARDWARE REQUIREMENTS:-


ü Processor             -Pentium –III

ü Speed                             -    1.1 Ghz
ü RAM                    -    256 MB(min)
ü Hard Disk            -   20 GB
ü Floppy Drive       -    1.44 MB
ü Key Board            -    Standard Windows Keyboard
ü Mouse                  -    Two or Three Button Mouse
ü Monitor                -    SVGA

 

SOFTWARE REQUIREMENTS:-


v   Operating System          : Windows95/98/2000/XP
v   Application  Server       : Tomcat5.0/6.X                     
v   Front End                      : HTML, Java, JSP
v    Script                            : JavaScript.
v   Server side Script          : Java Server Pages.
v   Database                        : MYSQL

REFERENCE:

Man Lung Yiu, Ira Assent, Christian S. Jensen, and Panos Kalnis, “Outsourced Similarity Search on Metric Data Assets”, IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 24, NO.2, FEBRUARY 2012.