Measuring the Sky: On Computing Data Cubes via Skylining the Measures
ABSTRACT:
The data cube is a key element in supporting fast OLAP. Traditionally, an aggregate
function is used to compute the values in data cubes. In this paper, we extend
the notion of data cubes with a new perspective. Instead of using an aggregate
function, we propose to build data cubes using the skyline operation as the
“aggregate function.” Data cubes built in this way are called “group-by skyline
cubes” and can support a variety of analytical tasks. Nevertheless, there are
several challenges in implementing group-by skyline cubes in data warehouses:
1) the skyline operation is computationally intensive, 2) the skyline operation
is holistic, and 3) a group-by skyline cube contains both grouping and skyline
dimensions, rendering it infeasible to pre-compute all cuboids in advance. This
paper gives details on how to store, materialize, and query such cubes.
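To make the idea of using the skyline as the "aggregate function" concrete, the following is a minimal sketch of a group-by skyline over a handful of rows. The column names (`city`, `price`, `distance`), the helper names, and the convention that smaller measure values are better are all assumptions made for illustration; the naive window-based skyline used here is not the construction algorithm of the paper.

```python
from collections import defaultdict

def dominates(a, b):
    """a dominates b: no worse in every measure, strictly better in at least one
    (assumption: smaller values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    """Naive window-based skyline: keep only points not dominated by any other."""
    window = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue  # p is dominated, discard it
        # p survives: evict window points that p dominates
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window

def group_by_skyline(rows, group_key, measures):
    """Partition rows by the grouping dimensions, then return one skyline per group
    instead of one numeric aggregate per group."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[g] for g in group_key)
        groups[key].append(tuple(row[m] for m in measures))
    return {key: skyline(pts) for key, pts in groups.items()}

# Hypothetical data: hotels described by price and distance to the beach.
rows = [
    {"city": "A", "price": 100, "distance": 5},
    {"city": "A", "price": 80, "distance": 7},
    {"city": "A", "price": 120, "distance": 6},
    {"city": "B", "price": 90, "distance": 2},
    {"city": "B", "price": 95, "distance": 3},
]
result = group_by_skyline(rows, ["city"], ["price", "distance"])
```

Each cell of the result holds a set of tuples rather than a single number, which is why the usual aggregate-cube techniques do not carry over directly.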
EXISTING SYSTEM:
In the data-warehousing environment, OLAP tools have been extensively used for a wide
range of decision support applications such as sales analysis, customer analysis,
marketing, and services planning. These OLAP tools are built upon a
multidimensional data model, in which data tuples are partitioned into different
cells based on the values of their dimension attributes.
· A moving kNN query continuously reports the k nearest neighbors of a moving query point.
· Location-based service (LBS) providers that offer remote kNN querying services often return to mobile users a safe region along with the query results.
· Each cuboid in a data cube stores the group-by aggregate result with respect to a particular set of attributes called dimensions.
· In a data cube with non-holistic aggregate functions, cuboids can be organized as a lattice.
· The pre-computation of all cuboids is impractical because the number of cuboids is exponential in the number of dimensions.
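The lattice and the exponential growth mentioned above can be illustrated with a short sketch that enumerates every cuboid (every subset of the dimensions) and the cuboids directly above it in the lattice. The dimension names and function names are hypothetical.

```python
from itertools import combinations

def cuboids(dimensions):
    """Enumerate all cuboids: one per subset of the dimension set (2**d in total)."""
    dims = list(dimensions)
    return [frozenset(c)
            for k in range(len(dims) + 1)
            for c in combinations(dims, k)]

def parents(cuboid, dimensions):
    """Cuboids one level finer in the lattice: add exactly one more dimension."""
    return [cuboid | {d} for d in dimensions if d not in cuboid]

# Hypothetical dimensions of a small sales cube.
dims = ["product", "region", "year"]
lattice = cuboids(dims)
finer_than_product = parents(frozenset({"product"}), dims)
```

With 3 dimensions the lattice has 8 cuboids; with 10 dimensions it already has 1024, which is why full pre-computation quickly becomes impractical.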
DISADVANTAGES OF EXISTING SYSTEM:
In traditional data cubes, an aggregate function takes as input a set of measure
values and returns a single numeric value.
PROPOSED SYSTEM:
Distances are looked up in a main-memory data structure that tracks computed
distances while inserting objects or performing similarity queries in the
metric space model.
It aims to amortize the number of distance computations spent on
querying/updating the database, similar to how disk page buffering in
traditional DBMSs aims to amortize the I/O cost.
Group-by skyline queries in data warehouses are supported by proposing the
concept of the group-by skyline cube.
The B+ tree structure is based on a hash table, making it efficient to
retrieve stored distances for further use.
In this paper, we extend the notion of the data cube with a new perspective. Specifically,
we study the issues of building data cubes that exploit the skyline operator as
the post-operation instead of the traditional aggregate functions. We name this
type of data cube the group-by skyline cube.
To the best of our knowledge, the building of a group-by skyline cube, or the
implementation of the skyline operation as a post-operation in data warehouses,
has not been addressed previously in the research literature or in commercial
products. This paper studies this issue in detail. Our contributions can be
summarized as follows:
1. The concept of the group-by skyline cube is presented. This includes a
discussion of what a “group-by skyline cuboid” is, the relationships between
different group-by skyline cuboids, and how these cuboids constitute a group-by
skyline cube.
2. The technical details of supporting the group-by skyline cube are presented.
Specifically, we propose to materialize a group-by skyline cube as an extended
group-by skyline cube (ES-cube). In an ES-cube, skyline results across cuboids
are derivable from each other. We further develop construction and query
processing algorithms for the ES-cube.
ADVANTAGES OF PROPOSED SYSTEM:
· The skyline operator has been well recognized as a very important decision-support operator.
· Experimental results show that the proposed techniques significantly reduce the query costs in terms of both CPU time and I/O time.
MODULES:
Query Level Computation
Data Materialization
Tree Constructing Cost
Group By Sky Line Data
Cost Estimation
MODULES DESCRIPTION:
Query Level Computation
When implementing query processing, the tree structure is traversed such that
non-overlapping users are excluded from further processing.
In addition to basic and parent filtering, the M*-tree can use cost estimate
graph filtering, so that some distance computations needed by basic filtering
after an unsuccessful parent filtering are saved.
The Sky Line Computation algorithm is a bit more difficult, since the
query radius rQ is not known at the beginning of the data search with
parent-based pruning.
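The interplay of parent filtering and basic filtering can be sketched for a single tree node in a range query. This is a generic M-tree style illustration based on the triangle inequality, not the M*-tree graph filtering itself; the entry layout `(object, distance_to_parent, covering_radius)` and the function name are assumptions.

```python
import math

def range_filter(query, r_q, parent, entries, dist=math.dist):
    """Filter the entries of one node for a range query with radius r_q.

    entries: list of (obj, d_parent_obj, r_obj), where d_parent_obj is the
    precomputed distance from the parent routing object to obj, and r_obj is
    the covering radius (0 for leaf objects).
    Returns the surviving objects and the number of distance computations.
    """
    d_qp = dist(query, parent)  # in a real tree this is already known from the level above
    computed = 1
    survivors = []
    for obj, d_po, r_o in entries:
        # Parent filtering: the triangle inequality gives |d(Q,P) - d(P,O)| <= d(Q,O),
        # so the entry can be pruned without computing d(Q,O).
        if abs(d_qp - d_po) > r_q + r_o:
            continue
        # Basic filtering: needs the actual distance d(Q,O).
        computed += 1
        if dist(query, obj) <= r_q + r_o:
            survivors.append(obj)
    return survivors, computed

# Hypothetical 2-D example with Euclidean distance.
parent = (0.0, 0.0)
entries = [((1.0, 0.0), 1.0, 0.0), ((9.0, 0.0), 9.0, 0.0), ((0.0, 9.0), 9.0, 0.0)]
survivors, computed = range_filter((10.0, 0.0), 1.0, parent, entries)
```

In this example the first entry is discarded by parent filtering alone, while the third passes parent filtering but fails basic filtering, which is the case where an extra distance computation is spent.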
Data Materialization
In group-by skyline query processing, it is often desirable to
pre-compute/materialize the extended skyline cuboids in the ES-cube.
The goal is to materialize a subset of ES-cuboids that brings the maximum
query processing improvement.
The selection of cuboids for materialization is based on a linear cost model
for the cost of evaluating a query using a cuboid.
With the default parameter setting from the experimental section, we measure
the wall clock time of answering 100 random valid group-by skyline queries from
a set of materialized ES-cuboids.
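One way to picture selecting a subset of cuboids under a linear cost model is the classic greedy benefit heuristic sketched below. It is a simplified stand-in, not the selection algorithm of the paper: each cuboid is identified by a single set of dimensions (the split into grouping and skyline dimensions is ignored), the cuboid sizes are made up, the budget is a count of cuboids, and the cost of a query is assumed to equal the size of the smallest materialized cuboid that covers it.

```python
def query_cost(query, materialized, size):
    """Linear cost model (assumption): cost = size of the smallest
    materialized cuboid whose dimensions cover the query's dimensions."""
    return min(size[c] for c in materialized if query <= c)

def total_cost(materialized, size):
    """Total cost of answering one query per cuboid in the lattice."""
    return sum(query_cost(q, materialized, size) for q in size)

def greedy_select(size, base, budget):
    """Greedily add up to `budget` cuboids, each time picking the one that
    reduces the total query cost the most. The base cuboid is always kept."""
    materialized = {base}
    candidates = set(size) - materialized
    for _ in range(budget):
        current = total_cost(materialized, size)
        best, best_gain = None, 0
        for c in sorted(candidates, key=sorted):  # deterministic tie-breaking
            gain = current - total_cost(materialized | {c}, size)
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:
            break  # no remaining cuboid improves the cost
        materialized.add(best)
        candidates.remove(best)
    return materialized

# Hypothetical cuboid sizes over dimensions a, b, c.
size = {
    frozenset("abc"): 100, frozenset("ab"): 50, frozenset("ac"): 90,
    frozenset("bc"): 80, frozenset("a"): 20, frozenset("b"): 30,
    frozenset("c"): 40, frozenset(): 1,
}
chosen = greedy_select(size, frozenset("abc"), budget=2)
```

The greedy choice is a heuristic; it is shown only to make the notion of "maximum query processing improvement" under a budget tangible.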
Tree Constructing Cost.
In the M*-tree, the navigation to the target leaf makes use of parent-based
pruning, so we achieve faster navigation.
After insertion into the leaf itself, the leaf's graph must be updated, which
takes m distance computations for the M*-tree instead of no computation for the
M-tree.
The expensive splitting of a node does not require any additional distance
computation, since all pairwise distances have to be computed to partition the
node, regardless of using the M-tree or the M*-tree.
Group By Sky Line Data.
Computation can be shared among group-by skyline cuboids, especially across
the set of cuboids separated by the distinct value.
We materialize group-by skyline cuboids using an extended definition of skyline.
We show that group-by skyline cubes materialized in this way can enable sharing
across various cuboids.
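Why an extended definition of skyline enables sharing can be seen in a small sketch. It assumes the extended skyline keeps every point that is not strictly better-ed in all dimensions by some other point (strict dominance), which is one common definition in the subspace skyline literature; the text above does not spell out its exact definition, so treat this as an illustration only. Smaller values are assumed to be better.

```python
def dominates(a, b, dims):
    """a dominates b on dims: no worse everywhere, strictly better somewhere."""
    return all(a[i] <= b[i] for i in dims) and any(a[i] < b[i] for i in dims)

def strictly_dominates(a, b, dims):
    """a is strictly better than b on every dimension in dims."""
    return all(a[i] < b[i] for i in dims)

def skyline(points, dims):
    return [p for p in points if not any(dominates(q, p, dims) for q in points)]

def extended_skyline(points, dims):
    """Assumed definition: drop a point only if some point strictly dominates it."""
    return [p for p in points if not any(strictly_dominates(q, p, dims) for q in points)]

# Hypothetical 2-D measures.
points = [(1, 5), (1, 6), (2, 2), (3, 3), (4, 1)]
full = (0, 1)
ext = extended_skyline(points, full)

# The skyline of any subspace can be derived from the extended skyline alone.
derived = {sub: skyline(ext, sub) for sub in [(0,), (1,), (0, 1)]}
```

Here the point `(1, 6)` is not in the ordinary skyline on both dimensions, yet it belongs to the skyline on the first dimension alone. Storing only the ordinary skyline would lose it; the extended skyline keeps it, so coarser results remain derivable.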
Part of the execution time is spent on the I/O cost, and skyline computation
accounts for 55.7% of the time.
The remaining 17.4% of the time goes to the grouping operation and other
overhead.
Cost Estimation.
A budget-based partial materialization approach selects the group-by skyline
cuboids to be materialized such that they yield the highest improvement in query
cost.
Cost estimation considers the skyline cardinality for a uniformly distributed
dataset, and the asymptotic cost of several skyline algorithms in the average
case and the worst case.
The log sampling technique is used for estimating the skyline cardinality and
the skyline computation cost of the BNL and SFS algorithms with respect to an
arbitrary data distribution.
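For the uniformly distributed case, the expected skyline cardinality has a well-known closed form in terms of generalized harmonic numbers, assuming n points with d independent dimensions and no duplicate values. The sketch below computes it together with its usual asymptotic approximation; the function names are hypothetical, and the log sampling technique for arbitrary distributions is not shown.

```python
import math

def expected_skyline_size(n, d):
    """Expected skyline size of n points with d independent dimensions.

    Uses the generalized harmonic recurrence
        H(0, n) = 1,   H(k, n) = sum_{i=1..n} H(k-1, i) / i,
    with the expected size equal to H(d-1, n).
    """
    h = [1.0] * (n + 1)  # h[i] = H(0, i)
    for _ in range(d - 1):
        nxt = [0.0] * (n + 1)
        for i in range(1, n + 1):
            nxt[i] = nxt[i - 1] + h[i] / i
        h = nxt
    return h[n]

def asymptotic_skyline_size(n, d):
    """Common approximation: (ln n)^(d-1) / (d-1)!"""
    return math.log(n) ** (d - 1) / math.factorial(d - 1)

# Hypothetical setting: 1000 tuples, 3 skyline dimensions.
exact = expected_skyline_size(1000, 3)
approx = asymptotic_skyline_size(1000, 3)
```

Such an estimate can feed a cost model, since the cost of skyline algorithms depends heavily on how many points end up in the result.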
HARDWARE REQUIREMENTS
• SYSTEM : Pentium IV 2.4 GHz
• HARD DISK : 40 GB
• FLOPPY DRIVE : 1.44 MB
• MONITOR : 15 VGA colour
• MOUSE : Logitech
• RAM : 256 MB
• KEYBOARD : 110 keys enhanced
SOFTWARE REQUIREMENTS
• Operating system : Windows XP Professional
• Front End : Microsoft Visual Studio .NET 2008
• Coding Language : C# .NET
• Database : SQL Server 2005
REFERENCE:
Man Lung Yiu, Eric Lo, and Duncan Yung, “Measuring the Sky: On Computing Data Cubes
via Skylining the Measures”, IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING,
VOL. 24, NO. 3, MARCH 2012.