A Discriminative Model of Motion and Cross Ratio for View-Invariant Action Recognition

ABSTRACT:
Action recognition is important for many applications, such as video surveillance and human-computer interaction, and view-invariant action recognition is an active and difficult topic within this field. In this paper, a new discriminative model is proposed for video-based view-invariant action recognition. The model fuses motion patterns and view invariants to combine invariance with distinctiveness. We address a series of issues, including interest point detection in image sequences, motion feature extraction and description, and view-invariant calculation. First, motion detection is used to extract motion information from videos, which is much more efficient than traditional methods based on background modeling and tracking. Second, for feature representation, we extract a variety of statistical information from motion, together with view-invariant features based on the cross ratio. Last, for action modeling, we apply a discriminative probabilistic model, the hidden conditional random field, to model motion patterns and view invariants, which lets us fuse the statistics of motion and the projective invariance of the cross ratio in one framework. Experimental results demonstrate that our method improves the ability to distinguish different categories of actions and is highly robust to view changes in real circumstances.



EXISTING SYSTEM:
Human action recognition in video sequences is one of the important and challenging problems in computer vision. It aims to build the mapping between dynamic image information and semantic understanding. While the analysis of human action tries to discover the underlying patterns of human action in image data, it is also very useful in many real applications such as intelligent video surveillance, content-based image retrieval, and event detection. Human action recognition involves a series of problems such as image data acquisition, robust feature extraction and representation, training a classifier with high discriminative capability, and other application problems that may arise when a practical system is running.

DISADVANTAGES OF EXISTING SYSTEM:
In previous work on view-invariant action recognition, it is difficult to obtain very accurate trajectories of moving objects because of noise caused by self-occlusions.

Appearance-based methods such as the scale-invariant feature transform (SIFT) are not well suited to motion analysis, since appearance-based features such as color, gray level, and texture are not stable among neighboring frames of dynamic scenes.

Existing methods of background modeling, such as the Gaussian mixture model, are not effective enough for accurate human behavior analysis. For example, some traditional methods do not work well in the presence of shadows, lighting changes, and, particularly, view changes in real scenes.

PROPOSED SYSTEM:
A compact framework is proposed for view-invariant action recognition from the perspective of motion information in image sequences. The framework encapsulates motion patterns and view invariants in a single model, which results in a complementary fusion of these two aspects of human actions.

In this paper, we describe our method in three stages. In the first stage, we introduce a motion detection method to detect interest points in the space and time domain, which facilitates both optical flow extraction in the expected local area and the computation of the view invariant, the cross ratio, from the detected interest points. In the second stage, the oriented optical flow feature is represented by oriented histogram projection to capture statistical motion information. In the third stage, the optical flow and view-invariant features are fused together in a discriminative model.

ADVANTAGES OF PROPOSED SYSTEM:
The proposed action recognition framework integrates motion and view-invariant features well and achieves the expected performance on some challenging databases.





MODULES:
1. Optical flow in local area
2. View Invariants extraction
3. Optical flow feature expression
4. Discriminative model
5. Action Recognition

MODULE DESCRIPTION:
Optical flow in local area
Appearance-based features such as Harris, histogram of oriented gradients (HOG), SIFT, Gabor, and shape depend heavily on the stability of image processing, so they fail to accurately recognize different kinds of actions because of the nonrigid nature of the human body and other effects in real applications. Therefore, in our method, after detection of interest points in videos, we extract motion features from the neighboring area around the interest points and build a representation of the statistical properties of the local area of the image.
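
As a minimal sketch of building local motion statistics around an interest point, the following Python code accumulates a magnitude-weighted orientation histogram of optical flow vectors in a square window. The function name, the nested-list flow layout, the window size, and the number of bins are all assumptions made for illustration; the descriptor used in the paper may differ in its details.

```python
import math

def local_flow_histogram(flow, point, half_size=8, n_bins=8):
    """Magnitude-weighted orientation histogram of flow in a square window.

    flow  : nested list, flow[row][col] = (u, v) pixel velocity (assumed layout)
    point : (row, col) of an interest point (hypothetical detector output)
    """
    height, width = len(flow), len(flow[0])
    row, col = point
    hist = [0.0] * n_bins
    # Clip the window to the image so border points still yield a patch.
    for r in range(max(row - half_size, 0), min(row + half_size + 1, height)):
        for c in range(max(col - half_size, 0), min(col + half_size + 1, width)):
            u, v = flow[r][c]
            angle = math.atan2(v, u) % (2 * math.pi)
            # min() guards against the angle rounding up to exactly 2*pi.
            b = min(int(angle / (2 * math.pi) * n_bins), n_bins - 1)
            hist[b] += math.hypot(u, v)
    total = sum(hist)
    # Normalise so that the histogram does not depend on the window area.
    return [h / total for h in hist] if total > 0 else hist

# Usage: a uniform flow field pointing along (u, v) = (1, 2).
flow = [[(1.0, 2.0)] * 20 for _ in range(20)]
hist = local_flow_histogram(flow, (10, 10))
```

Normalising the histogram keeps the feature comparable between interest points whose windows are clipped at the image border and those whose windows are not.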

View Invariants extraction
Geometric invariants capture invariant information of a geometric configuration under a certain class of transformations. Group theory gives us a theoretical foundation for constructing invariants [34]. Since invariants can be measured directly from images without knowing the orientation and position of the camera, they have been widely used in object recognition to tackle the problem of projective distortion caused by viewpoint variations.
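
To illustrate what invariance under projective transformation means, the following Python sketch computes the cross ratio of four collinear points before and after an arbitrary homography. The point ordering used here, (p1, p2; p3, p4), is one of several equivalent conventions, and the points and the matrix H are made-up values; how the paper selects points from the detected interest points is not shown.

```python
import math

def cross_ratio(p1, p2, p3, p4):
    """Cross ratio (p1, p2; p3, p4) of four collinear 2-D points."""
    dx, dy = p4[0] - p1[0], p4[1] - p1[1]
    length = math.hypot(dx, dy)
    # Signed 1-D coordinate of each point along the line through p1 and p4.
    t = [((p[0] - p1[0]) * dx + (p[1] - p1[1]) * dy) / length
         for p in (p1, p2, p3, p4)]
    return ((t[2] - t[0]) * (t[3] - t[1])) / ((t[2] - t[1]) * (t[3] - t[0]))

def apply_homography(H, point):
    """Map a 2-D point through a 3x3 projective transformation."""
    x, y = point
    xh, yh, w = (row[0] * x + row[1] * y + row[2] for row in H)
    return (xh / w, yh / w)

# Four points on the line y = 2x + 1 (illustrative values).
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (4.0, 9.0)]
# An arbitrary homography standing in for a change of viewpoint.
H = [[1.2, 0.1, 5.0], [-0.3, 0.9, 2.0], [0.001, 0.002, 1.0]]

before = cross_ratio(*points)
after = cross_ratio(*(apply_homography(H, p) for p in points))
# Both values equal 1.5 up to floating-point rounding.
```

Distances and ratios of distances change under the homography, but the cross ratio does not, which is why it can be measured from the image without knowing the camera pose.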

In view-invariant action recognition, traditional model-based methods evaluate the fitness between image points and predefined 3-D models. However, it is difficult to detect qualified image points that satisfy the specific geometric configuration required to get the desired invariants.

Optical flow feature expression
Optical flow takes the form of a 2-D vector representing image pixel velocity in the x- and y-directions. The beginning and ending points of the optical flow vector correspond to the displacement of image pixels. There are mainly two kinds of methods for extracting optical flow from images. The first is the feature-based method, which calculates the matching score of the feature points between neighboring frames and takes the positions of the matched points as the start and endpoints of the optical flow vector. However, due to the instability of image edges and the large displacement of the moving human body, the calculated optical flow can hardly reflect the real movement of the human body. The second kind is gradient-based methods, which are widely used in computer vision tasks. Gradient-based methods assume that the gray level in a local area of the images is relatively stable between adjacent frames. More importantly, by calculating the image gradient and optimizing the cost function iteratively, they can produce a dense optical flow field.
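
The gray-level constancy assumption behind gradient-based methods leads to the constraint Ix*u + Iy*v + It = 0 at each pixel. As a simplified stand-in for the iterative dense method described above, the following Python sketch solves this constraint in the least-squares sense over a single window (a Lucas-Kanade-style estimate, not the paper's own algorithm). The function names, the window size, and the synthetic test pattern are assumptions made for illustration.

```python
import math

def lucas_kanade_window(frame1, frame2, center, half_size=15):
    """Estimate one (u, v) flow vector for the square window around `center`.

    frame1, frame2 : nested lists of gray levels, frame[row][col]
    """
    row0, col0 = center
    sxx = sxy = syy = sxt = syt = 0.0
    for r in range(row0 - half_size, row0 + half_size + 1):
        for c in range(col0 - half_size, col0 + half_size + 1):
            ix = (frame1[r][c + 1] - frame1[r][c - 1]) / 2.0  # d/dx, central difference
            iy = (frame1[r + 1][c] - frame1[r - 1][c]) / 2.0  # d/dy, central difference
            it = frame2[r][c] - frame1[r][c]                  # d/dt, frame difference
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    # Solve the 2x2 normal equations of ix*u + iy*v = -it.
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("window has too little texture (aperture problem)")
    u = (-sxt * syy + syt * sxy) / det
    v = (-syt * sxx + sxt * sxy) / det
    return u, v

def synthetic_frame(shift_x=0.0, shift_y=0.0, size=64):
    """Smooth test pattern translated by (shift_x, shift_y) pixels."""
    return [[math.sin(0.3 * (c - shift_x)) + math.sin(0.2 * (r - shift_y))
             for c in range(size)] for r in range(size)]

# Usage: the second frame is the first one moved by (0.3, -0.2) pixels.
u, v = lucas_kanade_window(synthetic_frame(), synthetic_frame(0.3, -0.2), (32, 32))
```

Because the constraint is a first-order approximation, the estimate is only reliable for small displacements. This limitation is one reason practical gradient-based methods refine the flow iteratively.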


Discriminative model
In this module, we extract many key points that are informative for recognition despite the presence of noise. Since many points are detected, a nonmaxima suppression method is used to select the relatively stable points of interest as a representation of the current frame, which gives much better performance, particularly for periodic motion patterns.
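
The nonmaxima suppression step can be sketched as a greedy selection that keeps the strongest point and discards weaker points close to it. The sketch below assumes that each detected point comes with a detector response score and that a fixed spatial radius defines the neighborhood; both are illustrative assumptions, and the paper's exact selection rule may differ.

```python
import math

def non_maxima_suppression(points, scores, radius):
    """Greedy NMS: return indices of points that are local score maxima.

    points : list of (row, col) interest point locations
    scores : detector response per point (hypothetical input)
    radius : points closer than this to a stronger kept point are dropped
    """
    # Visit points from the strongest response to the weakest.
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(math.dist(points[i], points[k]) > radius for k in kept):
            kept.append(i)
    return kept

# Usage: two tight clusters and one isolated point.
points = [(10, 10), (11, 10), (30, 30), (31, 31), (50, 5)]
scores = [0.9, 0.8, 0.4, 0.7, 0.5]
kept = non_maxima_suppression(points, scores, radius=3)
# kept == [0, 3, 4]: one representative per cluster plus the isolated point.
```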

Action Recognition
Up to now, we have obtained the motion feature description and view invariants of interest points. The remaining problem is how to model the temporal information from sequential data. Unlike object classification in static images, action recognition should take into account the temporal dependence and pace of an action.
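
The abstract names the hidden conditional random field (HCRF) as the model for this temporal information. As a minimal sketch under stated assumptions, the following Python code evaluates the class posterior P(y | x) of a linear-chain HCRF by summing over all hidden state sequences with the forward algorithm. The parameter names and the form of the potential (observation, class-state, and class-specific transition terms) follow a commonly used HCRF parameterisation and are assumptions, not the paper's exact model. Training of the parameters is not shown.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def hcrf_class_posterior(x, theta_obs, theta_cls, theta_trans):
    """P(y | x) for a linear-chain HCRF, summing out the hidden states.

    x           : T feature vectors of length D (one per frame)
    theta_obs   : K x D hidden-state/observation weights
    theta_cls   : C x K class/hidden-state compatibility scores
    theta_trans : C x K x K class-specific transition scores
    """
    n_states = len(theta_obs)

    def node(y, t, h):
        # Score of hidden state h at frame t under class y.
        return sum(w * f for w, f in zip(theta_obs[h], x[t])) + theta_cls[y][h]

    log_z = []
    for y in range(len(theta_cls)):
        alpha = [node(y, 0, h) for h in range(n_states)]  # forward log-messages
        for t in range(1, len(x)):
            alpha = [node(y, t, j) + logsumexp([alpha[i] + theta_trans[y][i][j]
                                                for i in range(n_states)])
                     for j in range(n_states)]
        log_z.append(logsumexp(alpha))  # log of the summed path scores for class y
    total = logsumexp(log_z)
    return [math.exp(z - total) for z in log_z]

# Usage: 3 frames, 2-D features, 2 hidden states, 2 action classes.
x = [[0.5, -1.0], [1.0, 0.0], [0.2, 0.3]]
zeros_obs = [[0.0, 0.0], [0.0, 0.0]]
zeros_trans = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
uniform = hcrf_class_posterior(x, zeros_obs, [[0.0, 0.0], [0.0, 0.0]], zeros_trans)
biased = hcrf_class_posterior(x, zeros_obs, [[1.0, 1.0], [0.0, 0.0]], zeros_trans)
```

The hidden states let the model capture intermediate phases of an action without requiring frame-level labels, while the class label is predicted for the sequence as a whole.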

HARDWARE REQUIREMENTS:

                     SYSTEM       : Pentium IV 2.4 GHz
                     HARD DISK    : 40 GB
                     FLOPPY DRIVE : 1.44 MB
                     MONITOR      : 15 VGA colour
                     MOUSE        : Logitech
                     RAM          : 256 MB
                     KEYBOARD     : 110 keys enhanced

SOFTWARE REQUIREMENTS:
                     Operating system : Windows XP Professional
                     Front End        : Microsoft Visual Studio .NET 2008
                     Coding Language  : C#.NET 2008

REFERENCE:
Kaiqi Huang, Yeying Zhang, and Tieniu Tan, “A Discriminative Model of Motion and Cross Ratio for View-Invariant Action Recognition”, IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 21, NO. 4, APRIL 2012.