A Framework for Mining Signatures from Event Sequences and Its Applications in Healthcare Data
ABSTRACT:
This paper proposes a novel temporal knowledge representation and learning framework for large-scale temporal signature mining of longitudinal heterogeneous event data. The framework enables the representation, extraction, and mining of high-order latent event structure and relationships within single and multiple event sequences. The proposed knowledge representation maps heterogeneous event sequences to a geometric image by encoding events as a structured spatial-temporal shape process. We present a doubly constrained convolutional sparse coding framework that learns interpretable and shift-invariant latent temporal event signatures. We show how to cope with sparsity in both the data and the latent factor model by imposing a double sparsity constraint on the β-divergence, which yields an overcomplete sparse latent factor model. A novel stochastic optimization scheme performs large-scale incremental learning of group-specific temporal event signatures. We validate the framework on synthetic data and on an electronic health record dataset.
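The abstract names the β-divergence as the fitting cost. As a hedged illustration, here is a minimal Java sketch of the standard element-wise β-divergence family (β = 0 gives Itakura-Saito, β = 1 the generalized Kullback-Leibler divergence, β = 2 half the squared Euclidean distance). The class and method names are hypothetical, and the paper's double sparsity terms are not included.

```java
/**
 * Element-wise beta-divergence D_beta(X | Y), summed over two matrices.
 * Hypothetical sketch: the paper's exact cost (with its sparsity terms)
 * is not reproduced here.
 */
final class BetaDivergence {

    /** Scalar d_beta(x | y) for y > 0; x >= 0 (x > 0 is needed when beta <= 0). */
    static double scalar(double x, double y, double beta) {
        if (beta == 1.0) {
            // Generalized Kullback-Leibler divergence; 0 * log 0 is taken as 0.
            return (x > 0 ? x * Math.log(x / y) : 0.0) - x + y;
        }
        if (beta == 0.0) {
            // Itakura-Saito divergence.
            return x / y - Math.log(x / y) - 1.0;
        }
        // General case; beta = 2 reduces to (x - y)^2 / 2.
        return (Math.pow(x, beta) + (beta - 1.0) * Math.pow(y, beta)
                - beta * x * Math.pow(y, beta - 1.0)) / (beta * (beta - 1.0));
    }

    /** Sum of scalar divergences over all entries of two equally sized matrices. */
    static double matrix(double[][] x, double[][] y, double beta) {
        double total = 0.0;
        for (int i = 0; i < x.length; i++) {
            for (int j = 0; j < x[i].length; j++) {
                total += scalar(x[i][j], y[i][j], beta);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        double[][] x = {{1.0, 0.0}, {2.0, 3.0}};
        double[][] y = {{1.0, 0.5}, {2.0, 1.0}};
        System.out.println(matrix(x, y, 2.0)); // (0.5^2 + 2^2) / 2 = 2.125
    }
}
```

Choosing β trades off sensitivity to large versus small entries: β = 2 is dominated by large values, while β = 1 and β = 0 weight relative errors more, which is one reason the β-divergence family is common for count-like event data.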
EXISTING SYSTEM:
Finding latent temporal signatures is important in many domains because such signatures encode temporal concepts such as event trends, episodes, cycles, and abnormalities. For example, in the medical domain, latent event signatures facilitate decision support for patient diagnosis, prognosis, and management. In the surveillance domain, temporal event signatures aid in detecting suspicious events at specific locations. Of particular interest is the temporal information hidden in event data, which may be used to perform intelligent reasoning and inference about the latent relationships between event entities over time. An event entity can be a person, an object, or a location in time. For instance, in the medical domain a patient is an event entity, and the patient's visits to the doctor's office are events.
DISADVANTAGES OF EXISTING SYSTEM:
Temporal event signature mining for knowledge discovery is a difficult problem. Several problems need to be addressed:
1. The Event Knowledge Representation (EKR) should represent multiple event entities in a time-invariant way, because two event entities can be considered similar if they contain the same temporal signatures at different time intervals or locations.
2. The EKR should be flexible enough to jointly represent different types of event structure, such as single multivariate events and event intervals, to allow a rich representation of complex event relationships.
3. The EKR should be scalable to support analysis and inference on large-scale databases.
4. The EKR should be sparse so that humans can interpret the learned signatures.
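Requirement 1 can be made concrete with a small, hypothetical similarity score that ignores where in time a pattern occurs: slide one event matrix along the time axis and keep the best normalized overlap. This only illustrates the notion of time invariance; the paper obtains shift invariance through convolutional sparse coding, not through this score, and all names below are assumptions.

```java
/**
 * Hypothetical shift-invariant similarity between two event matrices
 * (rows = event types, columns = time bins): the best cosine similarity
 * over all relative time shifts. Not part of the paper.
 */
final class ShiftInvariantSimilarity {

    /** Inner product of A with B shifted right by s time bins (s may be negative). */
    static double shiftedDot(double[][] a, double[][] b, int s) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {          // assumes A and B have the same rows
            for (int j = 0; j < a[i].length; j++) {
                int k = j - s;                         // column of B aligned with column j of A
                if (k >= 0 && k < b[i].length) {
                    sum += a[i][j] * b[i][k];
                }
            }
        }
        return sum;
    }

    static double sumOfSquares(double[][] m) {
        double sq = 0.0;
        for (double[] row : m) {
            for (double v : row) {
                sq += v * v;
            }
        }
        return sq;
    }

    /** Best cosine similarity over all time shifts; 1.0 means identical up to a shift. */
    static double similarity(double[][] a, double[][] b) {
        double denom = Math.sqrt(sumOfSquares(a) * sumOfSquares(b));
        if (denom == 0.0) {
            return 0.0;
        }
        double best = 0.0;
        for (int s = -(b[0].length - 1); s <= a[0].length - 1; s++) {
            best = Math.max(best, shiftedDot(a, b, s) / denom);
        }
        return best;
    }

    public static void main(String[] args) {
        // The same two-step pattern (type 0, then type 1) at t = 0..1 in A and t = 2..3 in B.
        double[][] a = {{1, 0, 0, 0}, {0, 1, 0, 0}};
        double[][] b = {{0, 0, 1, 0}, {0, 0, 0, 1}};
        System.out.println(similarity(a, b)); // 1.0
    }
}
```

A plain element-wise comparison of these two matrices would find no overlap at all, which is exactly the failure mode requirement 1 rules out.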
PROPOSED SYSTEM:
This paper proposes a novel Temporal Event Matrix Representation (TEMR) and learning framework to perform temporal signature mining for large-scale longitudinal and heterogeneous event data. The TEMR framework represents the event data as a spatial-temporal matrix in which one dimension corresponds to the event type and the other to time: if event i happened at time j with value k, then the (i, j)th element of the matrix is k. This is a flexible and intuitive way to encode the temporal knowledge contained in event sequences. To improve the scalability of the approach, we further developed an online updating technique. Finally, the effectiveness of the proposed algorithm is validated on a real-world healthcare dataset.
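The TEMR mapping described above can be sketched in a few lines of Java. Assumptions not stated in the text: event types and times are already integer indices, an interval event fills every time bin from its start to its end, and a later event overwrites an earlier one in the same cell. TemrMatrix, Event, and build are hypothetical names.

```java
/**
 * Minimal sketch of a Temporal Event Matrix Representation (TEMR):
 * rows index event types, columns index time bins, and cell (i, j) holds
 * the value k of event type i at time j. Names are hypothetical.
 */
final class TemrMatrix {

    /** One observed event; a point event has start == end, an interval spans several bins. */
    static final class Event {
        final int type;     // row index i (event type)
        final int start;    // first time bin j
        final int end;      // last time bin (inclusive)
        final double value; // observed value k

        Event(int type, int start, int end, double value) {
            this.type = type;
            this.start = start;
            this.end = end;
            this.value = value;
        }
    }

    /**
     * Builds the numTypes x numBins matrix. Cells with no event stay 0.
     * If two events hit the same cell, the later one overwrites (an assumption).
     */
    static double[][] build(Event[] events, int numTypes, int numBins) {
        double[][] m = new double[numTypes][numBins]; // Java initializes cells to 0.0
        for (Event e : events) {
            for (int j = e.start; j <= e.end; j++) {
                m[e.type][j] = e.value;
            }
        }
        return m;
    }

    public static void main(String[] args) {
        Event[] events = {
            new Event(0, 1, 1, 1.0), // point event: type 0 (e.g. a diagnosis code) at t = 1
            new Event(1, 2, 4, 0.5), // interval event: type 1 (e.g. a medication) over t = 2..4
            new Event(2, 0, 0, 7.2)  // continuous-valued event: type 2 (e.g. a lab result) at t = 0
        };
        for (double[] row : build(events, 3, 5)) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

Because point events, intervals, and continuous values all land in the same matrix, a single factorization can work on all of them jointly, which is the flexibility the paragraph claims for TEMR.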
ADVANTAGES OF PROPOSED SYSTEM:
· First, on the knowledge representation level, TEMR provides a visual, matrix-based representation of complex event data composed of different types of events as well as event intervals, and it supports the joint representation of both continuous and discrete valued data.
· Second, on the algorithmic level, we propose a doubly sparse convolutional matrix approximation formulation for detecting the latent signatures contained in the data. Moreover, we derive a multiplicative update procedure to solve the problem and prove its convergence theoretically. We further propose a novel stochastic optimization scheme for large-scale longitudinal event signature mining of multiple event entities in a group. We demonstrate that appropriate normalization constraints on the sparse latent factor model allow for automatic rank determination.
· Third, on the experimental level, we have validated our approach on both synthetic data and a real-world Electronic Health Records (EHR) dataset containing the longitudinal medical records of over 20,000 patients over a one-year period. We report the detected signatures, the convergence behavior of the algorithm, and the final matrix reconstruction errors.
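To make the convolutional approximation and the reconstruction error concrete, here is a minimal Java sketch assuming the standard convolutive NMF model, X ≈ Σ_t F_t · shift_t(G), where each F_t (n × r) is the lag-t slice of r signatures of length L, G (r × T) holds their activations, and shift_t moves columns right by t with zero fill. The layout, names, and the squared Frobenius error are assumptions; the paper's exact formulation and sparsity constraints are not reproduced.

```java
/**
 * Sketch of a convolutive (shift-invariant) reconstruction in the style of
 * convolutive NMF: X ~ sum over lags t of F[t] * shift_t(G).
 * F[t] is n x r, G is r x numBins. Names and layout are assumptions.
 */
final class ConvolutiveReconstruction {

    /** Returns the n x numBins reconstruction. */
    static double[][] reconstruct(double[][][] f, double[][] g) {
        int lags = f.length, n = f[0].length, r = g.length, numBins = g[0].length;
        double[][] xHat = new double[n][numBins];
        for (int t = 0; t < lags; t++) {
            for (int i = 0; i < n; i++) {
                for (int k = 0; k < r; k++) {
                    double w = f[t][i][k];
                    if (w == 0.0) {
                        continue; // sparse signatures: zero entries contribute nothing
                    }
                    // shift_t(G): column j of the shifted matrix is column j - t of G (zero-filled)
                    for (int j = t; j < numBins; j++) {
                        xHat[i][j] += w * g[k][j - t];
                    }
                }
            }
        }
        return xHat;
    }

    /** Squared Frobenius norm of X - xHat, one possible reconstruction error. */
    static double squaredError(double[][] x, double[][] xHat) {
        double err = 0.0;
        for (int i = 0; i < x.length; i++) {
            for (int j = 0; j < x[i].length; j++) {
                double d = x[i][j] - xHat[i][j];
                err += d * d;
            }
        }
        return err;
    }

    public static void main(String[] args) {
        // One signature (r = 1) of length 2 over 2 event types: type 0 at lag 0, type 1 at lag 1.
        double[][][] f = {{{1.0}, {0.0}}, {{0.0}, {1.0}}};
        // The signature is activated once, at t = 1, on a 4-bin timeline.
        double[][] g = {{0.0, 1.0, 0.0, 0.0}};
        double[][] x = {{0, 1, 0, 0}, {0, 0, 1, 0}};
        System.out.println(squaredError(x, reconstruct(f, g))); // 0.0
    }
}
```

The signature is stored once and placed wherever G activates it, so the same temporal pattern occurring at different times costs only extra activations rather than extra signatures.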
ALGORITHMS USED:
· Algorithm 1: OSC-NMF (Individual)
· Algorithm 2: OSC-NMF (Group)
Algorithm 1: OSC-NMF (Individual)
Require: X; F; G; r; T; ; λ
Ensure: F ≥ 0; G ≥ 0
1: Initialize F, G
2: for i = 1 to T do
3:   Update F
4:   Update G
5:   if (converged) then
6:     break
7:   end if
8: end for
9: return R_o = {F; G}
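Algorithm 1 leaves "Update F" and "Update G" abstract. The following is a simplified, hypothetical Java sketch of the same loop structure, using Lee-Seung style multiplicative updates for ordinary (non-convolutional) NMF with a Euclidean cost and an L1 penalty λ on G. It is not OSC-NMF itself: the convolution, the β-divergence, and the normalization constraints are omitted. Here r, maxIter (playing the role of T), and lambda mirror the Require inputs, and tol and seed are added assumptions.

```java
import java.util.Random;

/**
 * Simplified sketch of the Algorithm 1 loop (not OSC-NMF itself):
 * X ~ F G with F, G >= 0, cost 0.5 * ||X - F G||_F^2 + lambda * sum(G),
 * solved by alternating multiplicative updates.
 */
final class SparseNmfSketch {

    static final double EPS = 1e-12; // guards against division by zero

    static double[][] mul(double[][] a, double[][] b) {
        int n = a.length, m = b[0].length, p = b.length;
        double[][] c = new double[n][m];
        for (int i = 0; i < n; i++)
            for (int k = 0; k < p; k++)
                for (int j = 0; j < m; j++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    static double[][] transpose(double[][] a) {
        double[][] t = new double[a[0].length][a.length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[0].length; j++)
                t[j][i] = a[i][j];
        return t;
    }

    /** 0.5 * ||X - F G||_F^2 + lambda * sum(G). */
    static double objective(double[][] x, double[][] f, double[][] g, double lambda) {
        double[][] fg = mul(f, g);
        double obj = 0.0;
        for (int i = 0; i < x.length; i++)
            for (int j = 0; j < x[0].length; j++) {
                double d = x[i][j] - fg[i][j];
                obj += 0.5 * d * d;
            }
        for (double[] row : g) for (double v : row) obj += lambda * v;
        return obj;
    }

    /** Returns {F, G} with X (n x m) ~ F (n x r) G (r x m). */
    static double[][][] factorize(double[][] x, int r, int maxIter, double lambda, double tol, long seed) {
        Random rnd = new Random(seed);
        int n = x.length, m = x[0].length;
        double[][] f = new double[n][r], g = new double[r][m];
        // Step 1: strictly positive random initialization keeps every iterate nonnegative
        for (double[] row : f) for (int k = 0; k < r; k++) row[k] = rnd.nextDouble() + 0.1;
        for (double[] row : g) for (int j = 0; j < m; j++) row[j] = rnd.nextDouble() + 0.1;
        double prev = objective(x, f, g, lambda);
        for (int it = 0; it < maxIter; it++) {
            // Update F: F <- F .* (X G^T) ./ (F G G^T)
            double[][] gt = transpose(g);
            double[][] num = mul(x, gt), den = mul(mul(f, g), gt);
            for (int i = 0; i < n; i++)
                for (int k = 0; k < r; k++)
                    f[i][k] *= num[i][k] / (den[i][k] + EPS);
            // Update G: G <- G .* (F^T X) ./ (F^T F G + lambda)
            double[][] ft = transpose(f);
            num = mul(ft, x);
            den = mul(mul(ft, f), g);
            for (int k = 0; k < r; k++)
                for (int j = 0; j < m; j++)
                    g[k][j] *= num[k][j] / (den[k][j] + lambda + EPS);
            // Convergence test: stop when the relative decrease of the objective is tiny
            double cur = objective(x, f, g, lambda);
            if (Math.abs(prev - cur) <= tol * Math.max(prev, EPS)) break;
            prev = cur;
        }
        return new double[][][] {f, g};
    }

    public static void main(String[] args) {
        double[][] x = {{1, 2, 0}, {2, 4, 0}, {0, 0, 3}};
        double[][][] fg = factorize(x, 2, 500, 0.0, 1e-9, 42L);
        System.out.println(objective(x, fg[0], fg[1], 0.0));
    }
}
```

An L1 penalty on G alone can be offset by scaling G down and F up, so practical sparse NMF variants usually also normalize the columns of F; this is one role of the normalization constraints mentioned in the second advantage above.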
SYSTEM CONFIGURATION:
HARDWARE CONFIGURATION:
· Processor : Pentium IV
· Speed : 1.1 GHz
· RAM : 256 MB (min)
· Hard Disk : 20 GB
· Keyboard : Standard Windows Keyboard
· Mouse : Two or Three Button Mouse
· Monitor : SVGA
SOFTWARE CONFIGURATION:
· Operating System : Windows XP
· Programming Language : Java
· Java Version : JDK 1.6 or above
REFERENCE:
Fei Wang, Noah Lee, Jianying Hu, Jimeng Sun, Shahram Ebadollahi, and Andrew F. Laine, "A Framework for Mining Signatures from Event Sequences and Its Applications in Healthcare Data," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 2, February 2013.