Distributed Processing of Probabilistic
Top-k
Queries
in Wireless Sensor Networks
Abstract
we introduce the notion
of sufficient set and necessary set for distributed processing of probabilistic
top-k queries in cluster-based wireless sensor networks. These two concepts
have very nice properties that can facilitate localized data pruning in
clusters. Accordingly, we develop a suite of algorithms, namely, sufficient
set-based (SSB), necessary set-based (NSB), and boundary-based (BB), for
intercluster query processing with bounded rounds of communications. Moreover,
in responding to dynamic changes of data distribution in the network, we
develop an adaptive algorithm that dynamically switches among the three
proposed
algorithms to minimize the transmission
cost. We show the applicability of sufficient set and necessary set to wireless
sensor networks with both two-tier hierarchical and tree-structured network
topologies. Experimental results show that the proposed algorithms reduce data
transmissions significantly and incur only small constant rounds of data
communications. The experimental results also demonstrate the superiority of
the adaptive algorithm, which achieves a near-optimal performance under various
conditions.
Existing
System
This new technology has
resulted in significant impacts on a wide array of applications in various
fields, including military, science, industry, commerce, transportation, and
health-care. However, the quality of sensors varies significantly in terms of
their sensing precision, accuracy, tolerance to hardware/external noise, and so
on. For example, studies show that the distribution of noise varies widely in
different photovoltic sensors, precision and accuracy of readings usually vary
significantly in humidity sensors, and the errors in GPS devices can be up to several
meters. Nevertheless, they have mostly been studied under a centralized system
setting. In this paper, we explore the problem of processing probabilistic
top-k queries in distributed wireless sensor networks. Here, we first use an
environmental monitoring application of wireless sensor network to introduce
some basics of probabilistic databases. Due to sensing imprecision and environmental
interferences, the sensor readings are usually noisy. Thus, multiple sensors
are deployed at certain zones in order to improve monitoring quality. In this network,
sensor nodes are grouped into clusters, within each of which one of sensors is
selected as the cluster head for performing localized data processing.boudary
based algorithm using hts to the data processing.
Disadvantage
·
We explore the problem of processing
probabilistic top-k queries in distributed wireless sensor networks.
·
The wind station very slowly
·
Data is not accuracy purify
·
The one station to another station delay
the communication rate
Propossed
System
There are three
proposed algorithms to minimize the transmission cost. We show the
applicability of sufficient set and necessary set to wireless sensor networks with
both two-tier hierarchical and tree-structured network topologies. There are
several top-k query semantics and solutions proposed recently, including U-Topk
and UkRanks in PT-Topk in PK-Topk in expected rank in and so on. A common way to process probabilistic
top-k queries is to first sort all tuples based on the scoring attribute, and
then process tuples in the sorted order
to compute the final answer set. Nevertheless, while focusing on
optimizing the transmission bandwidth, the proposed techniques require numerous
iterations of computation and communication, introducing tremendous
communication overhead and resulting in long latency. As argued in this is not desirable for many distributed
applications, e.g., network monitoring, that require the queries to be answered
in a good response time, with a minimized energy consumption. In this paper, we
aim at developing energy efficient algorithms optimized for fixed rounds of
communications.
Advantage
·
Additionally, NSB and BB take advantage
of the skewed necessary sets and necessary boundaries among local clusters to
obtain their global boundaries, respectively, which are very effective for
intercluster pruning.
·
The transmission cost increases for all
algorithms because the number of tuples needed for query processing is
increased.
Modules
1.
PT-Topk Query Processing
2.
Sensor Networks
3.
Data pruning
4.
Structured network topology
5.
Data transmission
6.
performance evaluation
Module
Description
1.
PT-Topk
Query Processing
The PT-Topk
queries in a centralized uncertain database, which provides a good background
for the targeted distributed processing problem. The query answer can be
obtained by examining the tuples in descending ranking order from the sorted
table (which is still denoted as T for simplicity). We can easily determine
that the highest ranked k tuples are definitely in the answer set as long as
their confidences are greater than p since their qualifications as PT-Topk
answers are not dependent on the existence of any other tuples.
2. Sensor Networks
The extensive number of research work in this
area has appeared in the literature. Due to the limited energy budget available
at sensor nodes, the primary issue is how to develop energy-efficient
techniques to reduce communication and energy costs in the networks.
Approximate-based data aggregation techniques have also been proposed. The idea
is to tradeoff some data quality for improved energy efficiency. Silberstein et
al. develop a sampling-based approach to evaluate approximate top-k queries in
wireless sensor networks. Based on statistical modeling techniques, a
model-driven approach was proposed in to
balance the confidence of the query answer against the communication cost in
the network. Moreover, continuous top-k queries for sensor networks have been
studied in and . In addition, a
distributed threshold join algorithm has been developed for top-k queries.
These studies, considering no uncertain data, have a different focus from our
study.
3.
Data
pruning
The
cluster heads are responsible for generating uncertain data tuples from the
collected raw sensor readings within their clusters. To answer a query, it’s
natural for the cluster heads to prune redundant uncertain data tuples before
delivery to the base station in order to reduce communication and energy cost.
The key issue here is how to derive a compact set of tuples essential for the
base station to answer the probabilistic top-k queries.
4.
Structured
network topology
To perform in-network
query processing, a routing tree is often formed among sensor nodes and the base station. A query
is issued at the root of the routing tree and propagated along the tree to all
sensor nodes. Although the concepts of sufficient set and necessary set
introduced earlier are based on two-tier hierarchical sensor networks, they are
applicable to tree-structured sensor network.
5.
Data
transmission
The total amount of
data transmission as the performance metrics. Notice that, response time is
another important metrics to evaluate query processing algorithms in wireless
sensor networks. All of those three algorithms, i.e., SSB, NSB, and BB, perform
at most two rounds of message exchange there is not much difference among SSB,
NSB, and BB in terms of query response time, thus we focus on the data
transmission costin the evaluation. Finally, we also conduct experiments to
evaluate algorithms, SSB-T, NSB-T, and NSB-T-Opt under the tree-structured
network topology.
6.
performance
evaluation
The performance evaluation on the distributed
algorithms for processing PT-topk queries in two-tier hierarchical clusterbased
wireless sensor monitoring system. As discussed, limited energy budget is a
critical issue for wireless sensor network and radio transmission is the most
dominate source of energy consumption. Thus, we measure the total amount of
data transmission as the performance metrics.
Notice that, response time is another important metrics to evaluate
query processing algorithms in wireless sensor networks.
Flow
chart
CONCLUSION
The notion of
sufficient set and necessary set for efficient in-network pruning of
distributed uncertain data in probabilistic top-k query processing. Accordingly,
we systematically derive sufficient and necessary boundaries and propose a
suite of algorithms, namely SSB, NSB, and BB algorithms, for in-network
processing of PT-Topk queries. Additionally, we derive a cost model on communication
cost of the three proposed algorithms and propose a cost-based adaptive
algorithm that adapts to the application dynamics. Although our work in this
paper is based mainly under the setting of two-tier hierarchical network, the
concepts of sufficient set and necessary set are universal and can be easily
extend to a network with tree topology. The performance evaluation validates
our ideas and shows that the proposed algorithms reduce data transmissions
significantly. While focusing on PT-Topk query in this paper, the developed
concepts can be applied to other top-k query variants. We plan to develop
algorithms to support other probabilistic top-k queries in the future
REFERENCES
[1] V.
Bychkovskiy, S. Megerian, D. Estrin, and M. Potkonjak, “A Collaborative
Approach to in-Place Sensor Calibration,” Proc. Second Int’l Conf. Information
Processing in Sensor Networks (IPSN), pp. 301-316, 2003.
[2] E. Elnahrawy
and B. Nath, “Poster Abstract: Online Data Cleaning in Wireless Sensor
Networks,” Proc. First Int’l Conf. EmbeddedNetworked Sensor Systems (SenSys
’03), pp. 294-295, 2003.
[3] A.
Deshpande, C. Guestrin, S.R. Madden, J.M. Hellerstein, and W. Hong,
“Model-Driven Data Acquisition in Sensor Networks,” Proc. 13th Int’l Conf. Very
Large Data Bases (VLDB ’04), pp. 588-599, 2004.
[4] R. Cheng,
D.V. Kalashnikov, and S. Prabhakar, “Evaluating Probabilistic Queries over
Imprecise Data,” Proc. ACM SIGMOD Int’l Conf. Management of Data (SIGMOD ’03),
pp. 551-562, 2003.
[5] S.
Abiteboul, P. Kanellakis, and G. Grahne, “On the Representation and Querying of
Sets of Possible Worlds,” Proc. ACM SIGMOD Int’l Conf. Management of Data
(SIGMOD ’87), pp. 34- 48, 1987.
[6] N. Dalvi and
D. Suciu, “Efficient Query Evaluation on Probabilistic Databases,” Proc. 30th
Int’l Conf. Very Large Data Bases (VLDB ’04), pp. 864-875, 2004.
[7] A.D. Sarma,
O. Benjelloun, A. Halevy, and J. Widom, “Working Models for Uncertain Data,”
Proc. 22nd Int’l Conf. Data Eng. (ICDE ’06), p. 7, 2006.
[8] R. Cheng, Y.
Xia, S. Prabhakar, R. Shah, and J.S. Vitter, “Efficient Indexing Methods for
Probabilistic Threshold Queries over Uncertain Data,” Proc. 30th Int’l Conf.
Very Large Data Bases (VLDB ’04), pp. 876-887, 2004.