A Discriminative Model
of Motion and Cross Ratio for View-Invariant Action Recognition
ABSTRACT:
Action recognition is important for many applications, such as video surveillance and human-computer interaction, and view-invariant action recognition remains a popular yet difficult problem in this field. In this paper, a new discriminative model is proposed for video-based view-invariant action recognition. In this model, motion patterns and view invariants are fused to combine invariance with distinctiveness. We address a series of issues, including interest point detection in image sequences, motion feature extraction and description, and view-invariant computation. First, motion detection is used to extract motion information from videos, which is much more efficient than traditional methods based on background modeling and tracking. Second, for feature representation, we extract a variety of statistical information from motion, together with a view-invariant feature based on the cross ratio. Last, in action modeling, we apply a discriminative probabilistic model, the hidden conditional random field (HCRF), to model motion patterns and view invariants, which lets us fuse motion statistics and the projective invariance of the cross ratio in one framework. Experimental results demonstrate that our method improves the ability to distinguish different categories of actions with high robustness to view change in real circumstances.
EXISTING SYSTEM:
Human action recognition in video sequences is one of the most important and challenging problems in computer vision; it aims to build a mapping between dynamic image information and semantic understanding. While the analysis of human action tries to discover the underlying patterns of human action in image data, it is also very useful in many real applications such as intelligent video surveillance, content-based image retrieval, and event detection. Human action recognition involves a series of problems, including image data acquisition, robust feature extraction and representation, training classifiers with high discriminative capability, and other practical problems that may arise when running a real system.
DISADVANTAGES OF EXISTING SYSTEM:
In previous work on view-invariant action recognition, it is difficult to obtain accurate trajectories of moving objects because of noise caused by self-occlusion.
Appearance-based methods such as the scale-invariant feature transform (SIFT) are not well suited to motion analysis, since the appearance cues they rely on, such as color, gray level, and texture, are not stable across neighboring frames of dynamic scenes.
Existing background modeling methods, such as the Gaussian mixture model, lack the effectiveness required for accurate human behavior analysis. For example, some traditional methods do not work well in the presence of shadows, lighting changes, and, particularly, view changes in real scenes.
PROPOSED SYSTEM:
A compact framework is proposed for view-invariant action recognition from the perspective of motion information in image sequences, in which motion patterns and view invariants are encapsulated in a single model, yielding a complementary fusion of these two aspects of human actions.
In this paper, we describe our method in three phases. In the first stage, we introduce a motion detection method to detect interest points in the space and time domains, which facilitates both optical flow extraction in a local area of interest and the computation of a view invariant, the cross ratio, from the detected interest points. In the second stage, the oriented optical flow feature is represented by an oriented histogram projection that captures statistical motion information. In the third stage, the optical flow and view-invariant features are fused together in a discriminative model.
ADVANTAGES OF PROPOSED SYSTEM:
The proposed action recognition framework integrates its components well and achieves the expected performance on several challenging databases.
MODULES:
1. Optical flow in local area
2. View Invariants extraction
3. Optical flow feature expression
4. Discriminative model
5. Action Recognition
MODULE DESCRIPTION:
Optical flow in local area
Since appearance-based features such as Harris corners, the histogram of oriented gradients (HOG), SIFT, Gabor filters, and shape depend heavily on the stability of image processing, they fail to accurately recognize different kinds of actions because of the non-rigid nature of the human body and other factors in real applications. Therefore, in our method, after detecting interest points in videos, we extract motion features from the neighborhood around each interest point and build a representation of the statistical properties of that local image area.
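The local statistical representation described above can be sketched as a magnitude-weighted histogram of flow orientations over a patch around an interest point. This is an illustrative stand-in, not the paper's exact descriptor; the function name and bin count are assumptions.

```python
import numpy as np

def oriented_flow_histogram(u, v, n_bins=8):
    """Magnitude-weighted histogram of optical-flow orientations over a
    local patch. u, v: horizontal/vertical flow components of the patch.
    Returns an L1-normalized n_bins-dimensional descriptor."""
    u = np.ravel(np.asarray(u, dtype=float))
    v = np.ravel(np.asarray(v, dtype=float))
    angles = np.arctan2(v, u)                    # orientation in (-pi, pi]
    mags = np.hypot(u, v)                        # flow magnitude per pixel
    # Map each orientation to one of n_bins equal angular bins.
    bins = ((angles + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.zeros(n_bins)
    np.add.at(hist, bins, mags)                  # accumulate magnitudes
    total = hist.sum()
    return hist / total if total > 0 else hist
```

A patch whose flow all points one way concentrates its mass in a single bin, so the descriptor summarizes the dominant local motion while staying robust to the exact pixel layout.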
View Invariants extraction
Geometric invariants capture the invariant information of a geometric configuration under a certain class of transformations, and group theory gives a theoretical foundation for constructing them [34]. Since geometric invariants can be measured directly from images without knowing the orientation and position of the camera, they have been widely used in object recognition to tackle the projective distortion caused by viewpoint variations.
In view-invariant action recognition, traditional model-based methods evaluate the fit between image points and predefined 3-D models. However, it is difficult to detect qualified image points that satisfy the specific geometric configuration required to obtain the desired invariants.
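The cross ratio mentioned in the title is the classic projective invariant of four collinear points: it keeps the same value under any projective transformation of the line, which is what makes it useful across camera viewpoints. A minimal sketch (the function name and the particular ordering convention are assumptions):

```python
import numpy as np

def cross_ratio(p1, p2, p3, p4):
    """Cross ratio of four collinear 2-D points, a projective invariant:
        CR = (d13 * d24) / (d23 * d14)
    where d_ij is the signed distance between points i and j along the
    common line."""
    pts = np.asarray([p1, p2, p3, p4], dtype=float)
    direction = pts[3] - pts[0]
    direction = direction / np.linalg.norm(direction)
    # Project every point onto the line direction -> 1-D coordinates.
    t = pts @ direction
    d13, d24 = t[2] - t[0], t[3] - t[1]
    d23, d14 = t[2] - t[1], t[3] - t[0]
    return (d13 * d24) / (d23 * d14)
```

For evenly spaced points the cross ratio is 4/3, and it stays 4/3 after any projective remapping of the line, e.g. t -> (2t + 1) / (t + 3); this invariance is what survives a change of viewpoint.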
Optical flow feature expression
Optical flow takes the form of a 2-D vector representing image pixel velocity in the x- and y-directions; the start and end points of the optical flow vector correspond to the displacement of image pixels. There are two main kinds of methods to extract optical flow from images. The first is the feature-based method, which computes matching scores between feature points in neighboring frames and takes the displacements of the matched points as the start and end points of the optical flow vectors. However, due to the instability of image edges and the large displacements of a moving human body, the resulting optical flow can hardly exhibit the real movement of the body. The second is the gradient-based method, widely used in computer vision, which assumes that the gray level in a local area of the image is relatively stable between adjacent frames. More importantly, by computing image gradients and optimizing a cost function iteratively, gradient-based methods can produce a dense optical flow field.
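The gradient-based idea can be made concrete with a single-window Lucas-Kanade estimate: under brightness constancy, each pixel contributes one equation Ix*u + Iy*v = -It, and a small window of them is solved by least squares. This is a generic textbook sketch, not the paper's specific flow algorithm.

```python
import numpy as np

def lucas_kanade_point(frame1, frame2, y, x, win=2):
    """Gradient-based optical flow at one pixel: solve the least-squares
    system [Ix Iy][u v]^T = -It over a (2*win+1)^2 window around (y, x),
    following the classic Lucas-Kanade formulation."""
    f1 = np.asarray(frame1, dtype=float)
    f2 = np.asarray(frame2, dtype=float)
    Iy, Ix = np.gradient(f1)         # spatial gradients (rows = y, cols = x)
    It = f2 - f1                     # temporal gradient between frames
    sl = (slice(y - win, y + win + 1), slice(x - win, x + win + 1))
    A = np.stack([Ix[sl].ravel(), Iy[sl].ravel()], axis=1)
    b = -It[sl].ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v
```

On a horizontal intensity ramp shifted right by one pixel, the estimate recovers u = 1; v is undetermined by the data (the aperture problem), and the least-squares solver returns the minimum-norm value 0.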
Discriminative model
In this module, we extract key points that are informative for recognition despite the presence of noise. Since many points are detected, a non-maxima suppression method is used to select the relatively stable interest points as a representation of the current frame, which gives much better performance, particularly for periodic motion patterns.
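The non-maxima suppression step above can be sketched as a greedy selection: take points in decreasing score order and drop any point that falls too close to one already kept. The suppression radius and function name here are illustrative assumptions.

```python
def nonmax_suppress(points, scores, radius):
    """Greedy non-maxima suppression: keep the highest-scoring points,
    discarding any point within `radius` of an already-kept one.
    points: list of (x, y); scores: matching list of detector responses.
    Returns the indices of the kept points, strongest first."""
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    kept = []
    r2 = radius * radius
    for i in order:
        xi, yi = points[i]
        # Keep this point only if no stronger kept point is within radius.
        if all((xi - points[j][0]) ** 2 + (yi - points[j][1]) ** 2 > r2
               for j in kept):
            kept.append(i)
    return kept
```

The result is a sparse, stable set of interest points per frame, since weaker detections clustered around a strong response are suppressed.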
Action Recognition
Up to now, we have obtained the motion feature description and the view invariants of the interest points. The remaining problem is how to model the temporal information in the sequential data. Unlike object classification in static images, action recognition must take into account the temporal dependence and pace of an action.
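The HCRF used in the paper scores an action class by summing over all hidden-state assignments along the frame chain. A toy stand-in for that marginalization is the forward algorithm in log space, shown below; the per-frame and transition scores here are placeholders, not the paper's learned potentials.

```python
import numpy as np

def forward_log_score(log_emit, log_trans, log_init):
    """Sum over all hidden-state paths of a chain model (forward algorithm
    in log space). log_emit: (T, S) per-frame scores for S hidden states;
    log_trans: (S, S) transition scores; log_init: (S,) initial scores.
    Returns the log of the total score marginalized over hidden paths."""
    alpha = log_init + log_emit[0]
    for t in range(1, len(log_emit)):
        # alpha[s'] = emit[t, s'] + logsum_s(alpha[s] + trans[s, s'])
        alpha = log_emit[t] + np.logaddexp.reduce(
            alpha[:, None] + log_trans, axis=0)
    return np.logaddexp.reduce(alpha)
```

For classification, one such marginal score is computed per action class with that class's parameters, and the highest-scoring class is chosen; with uniform probabilities the marginal over all paths sums to exactly 1 (log score 0), a quick sanity check.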
HARDWARE REQUIREMENTS:
• System : Pentium IV 2.4 GHz
• Hard disk : 40 GB
• Floppy drive : 1.44 MB
• Monitor : 15" VGA colour
• Mouse : Logitech
• RAM : 256 MB
• Keyboard : 110 keys enhanced
SOFTWARE REQUIREMENTS:
• Operating system : Windows XP Professional
• Front end : Microsoft Visual Studio .NET 2008
• Coding language : C#.NET
REFERENCE:
Kaiqi Huang, Yeying Zhang, and Tieniu Tan, "A Discriminative Model of Motion and Cross Ratio for View-Invariant Action Recognition," IEEE Transactions on Image Processing, vol. 21, no. 4, April 2012.