Load Rebalancing for Distributed File Systems in Clouds
ABSTRACT:
Distributed file systems are key building blocks for
cloud computing applications based on the MapReduce programming paradigm. In
such file systems, nodes simultaneously serve computing and storage functions;
a file is partitioned into a number of chunks allocated in distinct nodes so
that MapReduce tasks can be performed in parallel over the nodes. However, in a
cloud computing environment, failure is the norm, and nodes may be upgraded,
replaced, and added to the system. Files can also be dynamically created,
deleted, and appended. This results in load imbalance in a distributed file
system; that is, the file chunks are not distributed as uniformly as possible
among the nodes. Emerging distributed file systems in production systems
strongly depend on a central node for chunk reallocation. This dependence is
clearly inadequate in a large-scale, failure-prone environment because the central
load balancer is put under a considerable workload that scales linearly with
the system size, and may thus become the performance bottleneck and the single
point of failure. In this paper, a fully distributed load rebalancing algorithm
is presented to cope with the load imbalance problem. Our algorithm is compared
against a centralized approach in a production system and a competing distributed
solution presented in the literature. The simulation results indicate that our
proposal is comparable with the existing centralized approach and considerably
outperforms the prior distributed algorithm in terms of load imbalance factor,
movement cost, and algorithmic overhead. The performance of our proposal
implemented in the Hadoop distributed file system is further investigated in a
cluster environment.
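To make the "load imbalance factor" mentioned above concrete, one simple measure (an illustrative assumption here, not necessarily the paper's exact definition) is the ratio of the heaviest node's chunk count to the average chunk count per node; a perfectly balanced system yields 1.0:

```java
import java.util.Arrays;

public class ImbalanceFactor {
    // One simple imbalance measure (an assumption, not the paper's exact
    // metric): the maximum node load divided by the mean node load.
    static double imbalanceFactor(int[] chunksPerNode) {
        double mean = Arrays.stream(chunksPerNode).average().orElse(0);
        int max = Arrays.stream(chunksPerNode).max().orElse(0);
        return mean == 0 ? 0 : max / mean;
    }

    public static void main(String[] args) {
        // Perfectly balanced: every node holds 10 chunks -> 1.0
        System.out.println(imbalanceFactor(new int[]{10, 10, 10, 10}));
        // Skewed: one node holds most chunks -> 2.8
        System.out.println(imbalanceFactor(new int[]{28, 4, 4, 4}));
    }
}
```

Rebalancing aims to drive this ratio toward 1.0 while moving as few chunks as possible.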
ARCHITECTURE:
OBJECTIVE:
In this paper, we are interested in
studying the load rebalancing problem in distributed file systems
specialized for large-scale, dynamic and data-intensive clouds.
EXISTING SYSTEM:
State-of-the-art distributed file systems (e.g., Google
GFS and Hadoop HDFS) in clouds rely on central nodes to manage the metadata
information of the file systems and to balance the loads of storage nodes based
on that metadata. The centralized approach simplifies the design and
implementation of a distributed file system. However, recent experience
concludes that when the number of storage nodes, the number of files and the
number of accesses to files increase linearly, the central nodes (e.g., the master
in Google GFS) become a performance bottleneck, as they are unable to
accommodate a large number of file accesses due to clients and MapReduce
applications.
DISADVANTAGES OF EXISTING SYSTEM:
Most existing solutions are designed without considering both movement cost
and node heterogeneity, and may introduce significant maintenance network
traffic to the DHTs.
PROPOSED SYSTEM:
· In this paper, we are interested in studying the load rebalancing problem
in distributed file systems specialized for large-scale, dynamic and
data-intensive clouds. (The terms “rebalance” and “balance” are
interchangeable in this paper.) Such a large-scale cloud has hundreds or
thousands of nodes (and may reach tens of thousands in the future).
· Our objective is to allocate the chunks of files as uniformly as possible
among the nodes such that no node manages an excessive number of chunks.
Additionally, we aim to reduce the network traffic (or movement cost) caused
by rebalancing the loads of nodes as much as possible, to maximize the
network bandwidth available to normal applications. Moreover, as failure is
the norm, nodes are newly added to sustain the overall system performance,
resulting in the heterogeneity of nodes. Exploiting capable nodes to improve
the system performance is thus demanded.
· Our proposal not only takes advantage of physical network locality in the
reallocation of file chunks to reduce the movement cost but also exploits
capable nodes to improve the overall system performance.
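The uniform-allocation objective above can be sketched as computing each node's ideal share of the total chunks; to reflect node heterogeneity, the share can be weighted by a per-node capacity. The proportional weighting below is an illustrative assumption, not the paper's exact formula:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class IdealLoad {
    // Ideal number of chunks per node, weighted by node capacity.
    // Proportional weighting is an illustrative assumption: a node with
    // twice the capacity is given twice the ideal share.
    static Map<String, Double> idealLoads(Map<String, Double> capacity,
                                          long totalChunks) {
        double totalCapacity =
            capacity.values().stream().mapToDouble(Double::doubleValue).sum();
        Map<String, Double> ideal = new LinkedHashMap<>();
        capacity.forEach((node, c) ->
            ideal.put(node, totalChunks * c / totalCapacity));
        return ideal;
    }

    public static void main(String[] args) {
        Map<String, Double> cap = new LinkedHashMap<>();
        cap.put("nodeA", 1.0);  // baseline node
        cap.put("nodeB", 2.0);  // twice as capable (e.g., newer hardware)
        cap.put("nodeC", 1.0);
        // With 400 chunks, nodeB should ideally hold twice as many
        // chunks as each of the other nodes.
        System.out.println(idealLoads(cap, 400));
    }
}
```

A node holding noticeably more than its ideal share is "heavy" and a candidate to shed chunks; one holding noticeably less is "light".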
ADVANTAGES OF PROPOSED SYSTEM:
ü This eliminates the dependence on central nodes.
ü Our proposed algorithm operates in a distributed manner, in which nodes
perform their load-balancing tasks independently without synchronization or
global knowledge regarding the system.
ü The algorithm reduces the algorithmic overhead introduced to the DHTs as
much as possible.
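The independent, synchronization-free operation described above can be sketched as a single node's local decision step: the node compares its own load against the loads of a few sampled peers and, if it is light, opts to migrate and take over chunks from the heaviest sampled node. The sampling scheme and the light-node threshold below are illustrative assumptions, not the published protocol:

```java
import java.util.List;

public class RebalanceStep {
    // One node's independent decision, using only a small sample of other
    // nodes' loads (no global knowledge). If the node is "light" (its load
    // is well below the sampled average) it would rejoin the DHT next to
    // the heaviest sampled node and take over part of that node's chunks.
    // The threshold ratio and sampling are illustrative assumptions.
    static String decide(int myLoad, List<Integer> sampledLoads,
                         double lightRatio) {
        double avg = sampledLoads.stream()
                                 .mapToInt(Integer::intValue)
                                 .average().orElse(0);
        if (myLoad < lightRatio * avg) {
            int heaviest = sampledLoads.stream()
                                       .mapToInt(Integer::intValue)
                                       .max().orElse(0);
            return "MIGRATE: shed chunks from node with load " + heaviest;
        }
        return "STAY";
    }

    public static void main(String[] args) {
        // A light node (load 2) among sampled loads {30, 25, 20}: migrates.
        System.out.println(decide(2, List.of(30, 25, 20), 0.5));
        // A node near the sampled average (load 24): stays put.
        System.out.println(decide(24, List.of(30, 25, 20), 0.5));
    }
}
```

Because each node decides from its own sample, no central coordinator or global load table is needed, which is what removes the single point of failure.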
ALGORITHM USED:
ü Load Rebalancing Algorithm
SYSTEM CONFIGURATION:-
HARDWARE CONFIGURATION:-
ü Processor - Pentium IV
ü Speed - 1.1 GHz
ü RAM - 256 MB (min)
ü Hard Disk - 20 GB
ü Key Board - Standard Windows Keyboard
ü Mouse - Two or Three Button Mouse
ü Monitor - SVGA
SOFTWARE CONFIGURATION:-
ü Operating System : Windows XP
ü Programming Language : JAVA
ü Java Version : JDK 1.6 & above
REFERENCE:
Hung-Chang Hsiao, Hsueh-Yi Chung, Haiying Shen, and Yu-Chang Chao, “Load
Rebalancing for Distributed File Systems in Clouds,” IEEE Transactions on
Parallel and Distributed Systems, vol. 24, no. 5, May 2013.