Tuesday, June 14, 2016

A Scalable and Reliable Matching Service for Content-Based Publish/Subscribe Systems

Characterized by the increasing arrival rate of live content, emergency applications pose a great challenge: how to disseminate large-scale live content to interested users in a scalable and reliable manner. The publish/subscribe (pub/sub) model is widely used for data dissemination because of its capacity to expand the system seamlessly to massive size. However, most event matching services of existing pub/sub systems either suffer low matching throughput when matching a large number of skewed subscriptions, or interrupt dissemination when a large number of servers fail. Cloud computing provides great opportunities to meet the requirements of complex computing and reliable communication. In this paper, we propose SREM, a scalable and reliable event matching service for content-based pub/sub systems in cloud computing environments. To achieve low routing latency and reliable links among servers, we propose a distributed overlay, SkipCloud, to organize the servers of SREM. Through a hybrid space partitioning technique, HPartition, large-scale skewed subscriptions are mapped into multiple subspaces, which ensures high matching throughput and provides multiple candidate servers for each event. Moreover, a series of dynamics maintenance mechanisms are extensively studied. To evaluate the performance of SREM, 64 servers are deployed and millions of live content items are tested in a CloudStack testbed. Under various parameter settings, the experimental results demonstrate that the traffic overhead of routing events in SkipCloud is at least 60 percent smaller than in the Chord overlay, and that the matching rate in SREM is at least 3.7 times and at most 40.4 times larger than that of the single-dimensional partitioning technique of BlueDove. In addition, SREM enables the event loss rate to drop back to 0 in tens of seconds even if a large number of servers fail simultaneously.
  • In traditional data dissemination applications, live content is generated by publishers at a low speed, which leads many pub/sub systems to adopt multi-hop routing techniques to disseminate events.
  • A large body of broker-based pub/sub systems forward events and subscriptions by organizing nodes into diverse distributed overlays, such as tree-based, cluster-based and DHT-based designs.

  • Existing systems cannot scale to support the large amount of live content.
  • The multi-hop routing techniques in these broker-based systems lead to low matching throughput, which is inadequate for the current high arrival rate of live content.
  • Most of them are unsuitable for matching live content with high data dimensionality because of the limitations of their subscription space partitioning techniques, which bring either low matching throughput or high memory overhead.

  • Specifically, we mainly focus on two problems. One is how to organize servers in the cloud computing environment to achieve scalable and reliable routing. The other is how to manage subscriptions and events to achieve parallel matching among these servers.
  • We propose a distributed overlay protocol, called SkipCloud, to organize servers in the cloud computing environment. SkipCloud enables subscriptions and events to be forwarded among brokers in a scalable and reliable manner. It is also easy to implement and maintain.
  • To achieve scalable and reliable event matching among multiple servers, we propose a hybrid multidimensional space partitioning technique, called HPartition. It allows similar subscriptions to be assigned to the same server and provides multiple candidate matching servers for each event. Moreover, it adaptively alleviates hot spots and keeps the workload balanced among all servers.

  • We propose a scalable and reliable matching service, called SREM, for content-based pub/sub systems in cloud computing environments.
  • We propose a hybrid multidimensional space partitioning technique, called HPartition.
  • To alleviate the hot spots whose subscriptions fall into a narrow space, we propose a subscription set partitioning technique, called SSPartition.
  • Through the hybrid multidimensional space partitioning technique, SREM achieves scalable and balanced clustering of high-dimensional skewed subscriptions.

  • Datacenter / Broker creation
  • Clustering Method
  • Content Space Partitioning
  • Event Matching
  • Routing Method
Datacenter / Broker creation:
In the first module, we implement datacenter creation and broker creation. To support large-scale users, we consider a cloud computing environment with a set of geographically distributed datacenters. Each datacenter contains a large number of servers (brokers), which are managed by a datacenter management service. Our approach is suitable for large and reasonably stable environments, such as an enterprise or a data center, where reliable publication delivery is desired in spite of failures. As future work, we would like to extend our scheme to allow multi-path load balancing and to support pub/sub optimization techniques such as subscription covering. The pub/sub service provides an abstract, high-level interface through which data producers (publishers) publish messages and consumers (subscribers) receive the messages that match their interests.
Clustering Method:
A cluster is a group of objects that belong to the same class. In other words, similar objects are grouped in one cluster and dissimilar objects are grouped in other clusters. Suppose we are given a database of ‘n’ objects and the partitioning method constructs ‘k’ partitions of the data. Each partition represents a cluster, and k ≤ n. That is, the method classifies the data into k groups that satisfy the following requirements:
  • Each group contains at least one object.
  • Each object must belong to exactly one group.
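
The two requirements above can be checked mechanically. The following is a minimal sketch, assuming objects are held in Java sets; the class name PartitionCheck and its method are hypothetical helpers introduced for illustration and do not come from the paper.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not from the paper): verifies the partitioning requirements.
final class PartitionCheck {
    private PartitionCheck() {}

    /** Returns true if 'groups' is a valid partition of 'objects' into k = groups.size() groups. */
    static <T> boolean isValidPartition(Set<T> objects, List<Set<T>> groups) {
        if (groups.isEmpty() || groups.size() > objects.size()) {
            return false; // need 1 <= k <= n
        }
        Set<T> seen = new HashSet<>();
        for (Set<T> group : groups) {
            if (group.isEmpty()) return false;            // each group holds at least one object
            for (T obj : group) {
                if (!objects.contains(obj)) return false; // object is not part of the database
                if (!seen.add(obj)) return false;         // object appears in more than one group
            }
        }
        return seen.size() == objects.size();             // every object is assigned to a group
    }
}
```

For example, splitting the objects {1, 2, 3} into the groups {1, 2} and {3} passes the check, while the groups {1, 2} and {2, 3} fail it because object 2 belongs to two groups.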
Content Space Partitioning:
The content space is partitioned into disjoint subspaces, each of which is managed by a number of brokers. Each top cluster then handles only a subset of the entire space and searches a small number of candidate subscriptions. The whole content space is divided into non-overlapping zones based on the number of brokers. After that, the brokers in different cliques that are responsible for similar zones are connected by a multicast tree.
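
To make the idea of disjoint subspaces concrete, here is a minimal sketch that assumes a uniform grid: every dimension is cut into the same number of equal-width segments, and a subspace is identified by one segment index per dimension. The class GridPartition, its fields and the equal-width assumption are illustrative choices, not the paper's exact HPartition procedure.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: a uniform grid over a k-dimensional content space.
final class GridPartition {
    private final double[] min;  // lower bound of each dimension
    private final double[] max;  // upper bound of each dimension
    private final int segments;  // segments per dimension (assumed equal for all dimensions)

    GridPartition(double[] min, double[] max, int segments) {
        this.min = min.clone();
        this.max = max.clone();
        this.segments = segments;
    }

    /** Index of the segment that contains 'value' on dimension 'dim'. */
    int segmentOf(int dim, double value) {
        double width = (max[dim] - min[dim]) / segments;
        int idx = (int) Math.floor((value - min[dim]) / width);
        return Math.max(0, Math.min(segments - 1, idx)); // clamp values on or outside the bounds
    }

    /** An event is a point, so it falls into exactly one subspace. */
    int[] subspaceOfEvent(double[] event) {
        int[] cell = new int[event.length];
        for (int d = 0; d < event.length; d++) {
            cell[d] = segmentOf(d, event[d]);
        }
        return cell;
    }

    /** A subscription is one range per dimension, so it may overlap several subspaces. */
    List<int[]> subspacesOfSubscription(double[] low, double[] high) {
        List<int[]> cells = new ArrayList<>();
        collect(low, high, 0, new int[low.length], cells);
        return cells;
    }

    // Enumerates every combination of overlapped segments, one dimension at a time.
    private void collect(double[] low, double[] high, int d, int[] cur, List<int[]> out) {
        if (d == low.length) {
            out.add(cur.clone());
            return;
        }
        for (int s = segmentOf(d, low[d]); s <= segmentOf(d, high[d]); s++) {
            cur[d] = s;
            collect(low, high, d + 1, cur, out);
        }
    }
}
```

Because an event maps to a single subspace while a subscription is stored in every subspace it overlaps, a broker responsible for a subspace only needs to search the subscriptions stored there.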

Event Matching:
Data replication schemes are employed to ensure reliable event matching. For instance, such a scheme may advertise subscriptions to the whole network. When receiving an event, each broker decides, according to its routing table, which broker to forward the event to. These approaches are inadequate for achieving scalable event matching.
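
As a point of reference for what matching means on a single broker, the sketch below assumes that a subscription is a conjunction of inclusive range predicates, one per dimension, and that an event is a point in the same space. The types RangeSubscription and BrokerMatcher are hypothetical and the linear scan is deliberately naive; it is not the matching algorithm of SREM.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration; not taken from the paper's implementation.
final class RangeSubscription {
    final String subscriberId;
    final double[] low;   // inclusive lower bound per dimension
    final double[] high;  // inclusive upper bound per dimension

    RangeSubscription(String subscriberId, double[] low, double[] high) {
        this.subscriberId = subscriberId;
        this.low = low.clone();
        this.high = high.clone();
    }

    /** True if every attribute of the event lies inside this subscription's range. */
    boolean matches(double[] event) {
        if (event.length != low.length) return false;
        for (int d = 0; d < event.length; d++) {
            if (event[d] < low[d] || event[d] > high[d]) return false;
        }
        return true;
    }
}

final class BrokerMatcher {
    private final List<RangeSubscription> stored = new ArrayList<>();

    void subscribe(RangeSubscription s) {
        stored.add(s);
    }

    /** Linear scan over the subscriptions stored on this broker, in insertion order. */
    List<String> match(double[] event) {
        List<String> interested = new ArrayList<>();
        for (RangeSubscription s : stored) {
            if (s.matches(event)) interested.add(s.subscriberId);
        }
        return interested;
    }
}
```

The cost of this scan grows with the number of stored subscriptions, which is why limiting each broker to the subscriptions of its own subspace matters for throughput.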
Routing Method:
The routing process usually directs forwarding on the basis of routing tables, which maintain a record of the routes to various network destinations. Constructing routing tables, which are held in the router's memory, is therefore very important for efficient routing. Most routing algorithms use only one network path at a time, whereas multipath routing techniques enable the use of multiple alternative paths. Prefix routing in SkipCloud is mainly used to route subscriptions and events efficiently to the top clusters. Note that the cluster identifiers at level i+1 are generated by appending one b-ary digit to the identifiers of the corresponding clusters at level i. This relation between cluster identifiers is the foundation of routing to target clusters. Briefly, when receiving a routing request to a specific cluster, a broker examines its neighbor lists of all levels and chooses as the next hop the neighbor that shares the longest common prefix with the target ClusterID. The routing operation repeats until a broker cannot find a neighbor whose identifier is closer to the target than its own.
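
The next-hop rule described above can be sketched as follows. The sketch assumes that ClusterIDs are strings of b-ary digits and that the neighbor lists of all levels have already been merged into one list; the class PrefixRouting and its method names are hypothetical, and neighbor-list maintenance is left out.

```java
import java.util.List;

// Sketch of the next-hop choice only; neighbor-list maintenance is omitted.
final class PrefixRouting {
    private PrefixRouting() {}

    /** Number of leading characters (b-ary digits) that 'a' and 'b' share. */
    static int commonPrefixLength(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    /**
     * Returns the neighbor ClusterID that shares the longest common prefix with the target,
     * or null if no neighbor is closer than the current broker (routing stops here).
     */
    static String nextHop(String selfId, List<String> neighborIds, String targetId) {
        int best = commonPrefixLength(selfId, targetId);
        String next = null;
        for (String id : neighborIds) {
            int len = commonPrefixLength(id, targetId);
            if (len > best) { // strictly closer than anything seen so far
                best = len;
                next = id;
            }
        }
        return next;
    }
}
```

A broker with ClusterID "00" and neighbors "01", "10" and "12" that receives a request for cluster "12" would forward it to "12". Requiring a strictly longer prefix at every hop means a request can never return to a broker it has already visited.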

  • System       : Pentium IV 2.4 GHz
  • Hard Disk    : 40 GB
  • Floppy Drive : 1.44 MB
  • Monitor      : 15 VGA Colour
  • Mouse        : Logitech
  • RAM          : 512 MB


  • Operating system : Windows XP/7
  • Coding Language  : Java/J2EE
  • IDE              : NetBeans 7.4
  • Database         : MySQL

Xingkong Ma, Student Member, IEEE, Yijie Wang, Member, IEEE, and Xiaoqiang Pei, "A Scalable and Reliable Matching Service for Content-Based Publish/Subscribe Systems," IEEE Transactions on Cloud Computing, vol. 3, no. 1, January-March 2015.