Distributed Processing of Probabilistic Top-k
Queries in Wireless Sensor Networks

Abstract
we introduce the notion of sufficient set and necessary set for distributed processing of probabilistic top-k queries in cluster-based wireless sensor networks. These two concepts have very nice properties that can facilitate localized data pruning in clusters. Accordingly, we develop a suite of algorithms, namely, sufficient set-based (SSB), necessary set-based (NSB), and boundary-based (BB), for intercluster query processing with bounded rounds of communications. Moreover, in responding to dynamic changes of data distribution in the network, we develop an adaptive algorithm that dynamically switches among the three proposed
algorithms to minimize the transmission cost. We show the applicability of sufficient set and necessary set to wireless sensor networks with both two-tier hierarchical and tree-structured network topologies. Experimental results show that the proposed algorithms reduce data transmissions significantly and incur only small constant rounds of data communications. The experimental results also demonstrate the superiority of the adaptive algorithm, which achieves a near-optimal performance under various conditions.













Existing System
This new technology has resulted in significant impacts on a wide array of applications in various fields, including military, science, industry, commerce, transportation, and health-care. However, the quality of sensors varies significantly in terms of their sensing precision, accuracy, tolerance to hardware/external noise, and so on. For example, studies show that the distribution of noise varies widely in different photovoltic sensors, precision and accuracy of readings usually vary significantly in humidity sensors, and the errors in GPS devices can be up to several meters. Nevertheless, they have mostly been studied under a centralized system setting. In this paper, we explore the problem of processing probabilistic top-k queries in distributed wireless sensor networks. Here, we first use an environmental monitoring application of wireless sensor network to introduce some basics of probabilistic databases. Due to sensing imprecision and environmental interferences, the sensor readings are usually noisy. Thus, multiple sensors are deployed at certain zones in order to improve monitoring quality. In this network, sensor nodes are grouped into clusters, within each of which one of sensors is selected as the cluster head for performing localized data processing.boudary based algorithm using hts to the data processing.

Disadvantage
·         We explore the problem of processing probabilistic top-k queries in distributed wireless sensor networks.
·         The wind station very slowly
·         Data is not accuracy purify
·         The one station to another station delay the communication rate









Propossed System
There are three proposed algorithms to minimize the transmission cost. We show the applicability of sufficient set and necessary set to wireless sensor networks with both two-tier hierarchical and tree-structured network topologies. There are several top-k query semantics and solutions proposed recently, including U-Topk and UkRanks in PT-Topk in PK-Topk in expected rank in  and so on. A common way to process probabilistic top-k queries is to first sort all tuples based on the scoring attribute, and then process tuples in the sorted order  to compute the final answer set. Nevertheless, while focusing on optimizing the transmission bandwidth, the proposed techniques require numerous iterations of computation and communication, introducing tremendous communication overhead and resulting in long latency. As argued in  this is not desirable for many distributed applications, e.g., network monitoring, that require the queries to be answered in a good response time, with a minimized energy consumption. In this paper, we aim at developing energy efficient algorithms optimized for fixed rounds of communications.

Advantage
·         Additionally, NSB and BB take advantage of the skewed necessary sets and necessary boundaries among local clusters to obtain their global boundaries, respectively, which are very effective for intercluster pruning.
·         The transmission cost increases for all algorithms because the number of tuples needed for query processing is increased.










Modules
1.      PT-Topk Query Processing
2.      Sensor Networks
3.      Data pruning
4.       Structured network topology
5.      Data transmission
6.      performance evaluation


Module Description

1.      PT-Topk Query Processing
              The PT-Topk queries in a centralized uncertain database, which provides a good background for the targeted distributed processing problem. The query answer can be obtained by examining the tuples in descending ranking order from the sorted table (which is still denoted as T for simplicity). We can easily determine that the highest ranked k tuples are definitely in the answer set as long as their confidences are greater than p since their qualifications as PT-Topk answers are not dependent on the existence of any other tuples.

2.      Sensor Networks
                        The  extensive number of research work in this area has appeared in the literature. Due to the limited energy budget available at sensor nodes, the primary issue is how to develop energy-efficient techniques to reduce communication and energy costs in the networks. Approximate-based data aggregation techniques have also been proposed. The idea is to tradeoff some data quality for improved energy efficiency. Silberstein et al. develop a sampling-based approach to evaluate approximate top-k queries in wireless sensor networks. Based on statistical modeling techniques, a model-driven approach was proposed in  to balance the confidence of the query answer against the communication cost in the network. Moreover, continuous top-k queries for sensor networks have been studied in  and . In addition, a distributed threshold join algorithm has been developed for top-k queries. These studies, considering no uncertain data, have a different focus from our study.
3.      Data pruning
                        The cluster heads are responsible for generating uncertain data tuples from the collected raw sensor readings within their clusters. To answer a query, it’s natural for the cluster heads to prune redundant uncertain data tuples before delivery to the base station in order to reduce communication and energy cost. The key issue here is how to derive a compact set of tuples essential for the base station to answer the probabilistic top-k queries.

4.      Structured network topology
                         To perform in-network query processing, a routing tree is often formed among       sensor nodes and the base station. A query is issued at the root of the routing tree and propagated along the tree to all sensor nodes. Although the concepts of sufficient set and necessary set introduced earlier are based on two-tier hierarchical sensor networks, they are applicable to tree-structured sensor network.

5.      Data transmission
             The total amount of data transmission as the performance metrics. Notice that, response time is another important metrics to evaluate query processing algorithms in wireless sensor networks. All of those three algorithms, i.e., SSB, NSB, and BB, perform at most two rounds of message exchange there is not much difference among SSB, NSB, and BB in terms of query response time, thus we focus on the data transmission costin the evaluation. Finally, we also conduct experiments to evaluate algorithms, SSB-T, NSB-T, and NSB-T-Opt under the tree-structured network topology.

6.      performance evaluation
             The performance evaluation on the distributed algorithms for processing PT-topk queries in two-tier hierarchical clusterbased wireless sensor monitoring system. As discussed, limited energy budget is a critical issue for wireless sensor network and radio transmission is the most dominate source of energy consumption. Thus, we measure the total amount of data transmission as the performance metrics.  Notice that, response time is another important metrics to evaluate query processing algorithms in wireless sensor networks.

Flow chart
Text Box: Intracluster data pruningText Box: Top-k Queries












                                












CONCLUSION
The notion of sufficient set and necessary set for efficient in-network pruning of distributed uncertain data in probabilistic top-k query processing. Accordingly, we systematically derive sufficient and necessary boundaries and propose a suite of algorithms, namely SSB, NSB, and BB algorithms, for in-network processing of PT-Topk queries. Additionally, we derive a cost model on communication cost of the three proposed algorithms and propose a cost-based adaptive algorithm that adapts to the application dynamics. Although our work in this paper is based mainly under the setting of two-tier hierarchical network, the concepts of sufficient set and necessary set are universal and can be easily extend to a network with tree topology. The performance evaluation validates our ideas and shows that the proposed algorithms reduce data transmissions significantly. While focusing on PT-Topk query in this paper, the developed concepts can be applied to other top-k query variants. We plan to develop algorithms to support other probabilistic top-k queries in the future

















REFERENCES

[1] V. Bychkovskiy, S. Megerian, D. Estrin, and M. Potkonjak, “A Collaborative Approach to in-Place Sensor Calibration,” Proc. Second Int’l Conf. Information Processing in Sensor Networks (IPSN), pp. 301-316, 2003.

[2] E. Elnahrawy and B. Nath, “Poster Abstract: Online Data Cleaning in Wireless Sensor Networks,” Proc. First Int’l Conf. EmbeddedNetworked Sensor Systems (SenSys ’03), pp. 294-295, 2003.

[3] A. Deshpande, C. Guestrin, S.R. Madden, J.M. Hellerstein, and W. Hong, “Model-Driven Data Acquisition in Sensor Networks,” Proc. 13th Int’l Conf. Very Large Data Bases (VLDB ’04), pp. 588-599, 2004.

[4] R. Cheng, D.V. Kalashnikov, and S. Prabhakar, “Evaluating Probabilistic Queries over Imprecise Data,” Proc. ACM SIGMOD Int’l Conf. Management of Data (SIGMOD ’03), pp. 551-562, 2003.

[5] S. Abiteboul, P. Kanellakis, and G. Grahne, “On the Representation and Querying of Sets of Possible Worlds,” Proc. ACM SIGMOD Int’l Conf. Management of Data (SIGMOD ’87), pp. 34- 48, 1987.

[6] N. Dalvi and D. Suciu, “Efficient Query Evaluation on Probabilistic Databases,” Proc. 30th Int’l Conf. Very Large Data Bases (VLDB ’04), pp. 864-875, 2004.

[7] A.D. Sarma, O. Benjelloun, A. Halevy, and J. Widom, “Working Models for Uncertain Data,” Proc. 22nd Int’l Conf. Data Eng. (ICDE ’06), p. 7, 2006.

[8] R. Cheng, Y. Xia, S. Prabhakar, R. Shah, and J.S. Vitter, “Efficient Indexing Methods for Probabilistic Threshold Queries over Uncertain Data,” Proc. 30th Int’l Conf. Very Large Data Bases (VLDB ’04), pp. 876-887, 2004.