Spatial Approximate String Search
ABSTRACT:
This work deals with the approximate
string search in large spatial databases. Specifically, we investigate range
queries augmented with a string similarity search predicate in both Euclidean
space and road networks. We dub this query the spatial approximate string (SAS)
query. In Euclidean space, we propose an approximate solution, the MHR-tree,
which embeds min-wise signatures into an R-tree. The min-wise signature for an
index node u keeps a concise representation of the union of q-grams from
strings under the sub-tree of u. We analyze the pruning functionality of such
signatures based on the set resemblance between the query string and the
q-grams from the sub-trees of index nodes. We also discuss how to estimate the
selectivity of a SAS query in Euclidean space, for which we present a novel
adaptive algorithm to find balanced partitions using both the spatial and string
information stored in the tree. For queries on road networks, we propose a novel
exact method, RSASSOL, which significantly outperforms the baseline algorithm in
practice. The RSASSOL combines the q-gram based inverted lists and the
reference nodes based pruning. Extensive experiments on large real data sets
demonstrate the efficiency and effectiveness of our approaches.
EXISTING SYSTEM
Keyword search over a large
amount of data is an important operation in a wide range of domains. Felipe et
al. has recently extended its study to spatial databases, where keyword search
becomes a fundamental building block for an increasing number of real-world
applications, and proposed the IR -Tree.
DISADVANTAGES OF
EXISTING SYSTEM:
v A main limitation of the IR -Tree is that it only supports exact
keyword search.
v Exact
Keyword Require For Searching the Results.
PROPOSED SYSTEM:
For RSAS queries, the baseline spatial solution is based on the
Dijkstra’s algorithm. Given a query point q, the query range radius r, and a
string predicate, we expand from q on the road network using the Dijkstra
algorithm until we reach the points distance r away from q and verify the
string predicate either in a post-processing step or on the intermediate
results of the expansion. We denote this approach as the Dijkstra solution. Its
performance degrades quickly when the query range enlarges and/or the data on
the network increases. This motivates us to find a novel method to avoid the
unnecessary road network expansions, by combining the prunings from both the
spatial and the string predicates simultaneously.
We demonstrate the efficiency and effectiveness of our proposed
methods for SAS queries using a comprehensive experimental evaluation. For ESAS
queries, our experimental evaluation covers both synthetic and real data sets
of up to 10 millions points and 6 dimensions. For RSAS queries, our evaluation
is based on two large, real road network datasets, that contain up to 175,813
nodes, 179,179 edges, and 2 millions points on the road network. In both cases,
our methods have significantly outperformed the respective baseline methods.
ADVANTAGES OF PROPOSED SYSTEM:
This is very helpful for Exact Result
from Non Exact keywords .
SYSTEM CONFIGURATION:-
HARDWARE CONFIGURATION:-
ü Processor - Pentium –IV
ü Speed - 1.1
Ghz
ü RAM - 256
MB(min)
ü Hard Disk -
20 GB
ü Key Board -
Standard Windows Keyboard
ü Mouse - Two
or Three Button Mouse
ü Monitor - SVGA
SOFTWARE CONFIGURATION:-
ü Operating System :
Windows XP
ü Programming Language :
JAVA/J2EE.
ü Java Version :
JDK 1.6 & above.
ü Database :
MYSQL
REFERENCE:
Feifei Li, Member, IEEE, Bin Yao, Mingwang Tang, and
Marios Hadjieleftheriou, “Spatial Approximate String Search”, IEEE TRANSACTIONS ON KNOWLEDGE AND DATA
ENGINEERING, VOL. 25, NO. 6, JUNE 2013.