Explore MINets: Exploring Multimedia Information Networks
ABSTRACT:

The abundance of multimedia data on the Web presents both challenges (how to annotate, search, and mine it?) and opportunities (crawling the Web to create large, structured, networked multimedia knowledge bases that support effective inference). The proposed research aims to build a unified, structured, networked database representation called Multimedia Information Networks (MINets). In MINets, concept nodes and multimedia data nodes are connected by ontological and cross-domain links that cover both content and context knowledge. Characterizing a dense graph of relations among all concepts provides a strong graph-theoretic framework for studying information fusion and inference strategies. Information fusion then follows naturally from having a complete linkage representation among all kinds of nodes in the network. In particular, we aim to construct and use MINets for recognition and inference tasks based on the complementary cross-domain links between multimodal contents. Moreover, the resulting MINets representation enables cross-domain correspondence for effective information transfer and inference, which would be much more difficult with traditional single-domain ontology structures, and can therefore unlock much richer knowledge resources for many fields.


SYSTEM ARCHITECTURE:




PROBLEM DEFINITION:

The goal of this approach is to annotate images with manually defined concepts, using visual and contextual features to learn a latent space. Specifically, by feeding the latent vectors into existing classification models, the approach can be applied to multimedia annotation, one of the most important problems in multimedia retrieval. Furthermore, we show a more sophisticated algorithm that directly incorporates the discriminant information in the training examples for multimedia annotation, without using the mapping as a preliminary step. It jointly explores the context and content information based on a latent structure in the semantic concept space.

EXISTING SYSTEM:

The general approach of learning a latent semantic space has been extensively studied in the field of information retrieval. Popular techniques include Latent Semantic Indexing (LSI), Probabilistic Latent Semantic Indexing (PLSI), and Latent Dirichlet Allocation (LDA). These algorithms have also been applied to the multimedia domain for problems such as indexing and retrieval. For example, latent feature vectors have been learned by LSI for natural scene images, and the learned features can be used effectively with general-purpose SVM classifiers.

DISADVANTAGES OF EXISTING SYSTEM:
Some preliminary results have shown the effectiveness of these algorithms; however, all of these methods suffer from sparse context links, a problem that we address by using content links.


PROPOSED SYSTEM:
In this paper, we show that a compact latent space can be discovered to summarize the semantic structure in MINets, and that it can be applied seamlessly in state-of-the-art multimedia information retrieval systems. Specifically, the algorithm maps each multimedia object (MO) into a latent feature vector that encodes both context and content information. Based on these latent feature vectors, MOs can be effectively classified, indexed, and retrieved in a vector space by many mature, off-the-shelf, vector-based multimedia retrieval methods, such as clustering and re-ranking.
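
As a minimal sketch of how latent feature vectors could be handed to an off-the-shelf vector-based method, the example below annotates an MO with the concept whose class centroid is nearest in the latent space. The two-dimensional vectors, the concept names, and the helper functions are hypothetical and chosen only for illustration; the nearest-centroid classifier is a stand-in for whatever classifier is actually used.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def euclidean(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def train_centroids(labelled):
    """labelled maps a concept name to the latent vectors of its training MOs."""
    return {concept: centroid(vectors) for concept, vectors in labelled.items()}

def annotate(latent_vector, centroids):
    """Return the concept whose centroid is closest to the given latent vector."""
    return min(centroids, key=lambda concept: euclidean(latent_vector, centroids[concept]))

# Hypothetical latent vectors for already annotated MOs.
labelled = {
    "sea": [[0.9, 0.1], [0.8, 0.2]],
    "sky": [[0.1, 0.9], [0.2, 0.7]],
}
centroids = train_centroids(labelled)
predicted = annotate([0.7, 0.3], centroids)
print(predicted)
```

Any other vector-space method (an SVM, k-means clustering, or a re-ranking step) could consume the same latent vectors, since the mapping is independent of the method applied afterwards.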

ADVANTAGES OF PROPOSED SYSTEM:

Our approach is a “general-purpose technique” that can be leveraged to improve the effectiveness of a wide variety of techniques.

MODULES:

✓ Indexing and Training Module
✓ Annotation Construction Module
✓ Mapping of the Multimedia Objects
✓ Exploring Context Links
✓ Exploring Content Links
✓ Ambiguity Module

MODULES DESCRIPTION:

Indexing and Training Module

The first module of our project indexes and trains on the whole set of images. Indexing is done using an implementation of the Document Builder Interface. A simple approach is to use the Document Builder Factory, which creates Document Builder instances for all available features as well as popular combinations of features (e.g., all JPEG features or all available features). In a content-based image retrieval (CBIR) system, target images are sorted by feature similarity with respect to the query. In this module, we index the images and train on them accordingly.
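
The index-then-rank idea behind CBIR can be sketched as follows. This is an illustration only: it does not use the Document Builder classes mentioned above, and the four-bin intensity histogram, the L1 distance, and the file names are assumptions made for the example.

```python
def histogram(pixels, bins=4, max_value=256):
    """Normalised intensity histogram of a non-empty flat list of pixel values."""
    counts = [0] * bins
    for value in pixels:
        counts[value * bins // max_value] += 1
    return [count / len(pixels) for count in counts]

def l1_distance(h1, h2):
    """Sum of absolute differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def build_index(images):
    """Indexing step: map each image id to its feature vector."""
    return {image_id: histogram(pixels) for image_id, pixels in images.items()}

def search(index, query_pixels, top_k=2):
    """Retrieval step: sort indexed images by feature distance to the query."""
    query_features = histogram(query_pixels)
    ranked = sorted(index, key=lambda image_id: l1_distance(query_features, index[image_id]))
    return ranked[:top_k]

# Hypothetical images given as short lists of grey-level values (0 to 255).
images = {
    "dark.png": [10, 20, 30, 40],
    "bright.png": [200, 220, 240, 250],
    "mixed.png": [10, 100, 150, 250],
}
index = build_index(images)
results = search(index, [5, 15, 25, 60])
print(results)
```

Features are computed once at indexing time, so a query only needs its own feature vector and one distance computation per indexed image.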

Annotation Construction Module

Context links are virtual links created as a result of user feedback (e.g., tags), and may be represented as linkages between the MOs and contextual objects such as tags. In real-world context links, the number of user tags attached to an MO is usually quite small. In extreme cases, only a few tags, or none at all, may be attached to an object, which leads to sparse context links. In such cases, it is hard to derive meaningful latent features for MOs, because determining the correlation structure in the latent space requires a sufficient number of contextual objects to occur together. A reasonable solution to this problem is to exploit the content links between MOs. In this paper, we show how content links can effectively complement sparse context links by incorporating acoustic and/or visual information to discover the underlying latent semantic space.
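
To make the sparsity problem concrete, the sketch below represents context links as a binary MO-by-tag incidence table and measures how densely it is filled. The image ids, tags, and helper names are hypothetical.

```python
def build_context_links(tagged_objects):
    """Return the tag vocabulary and a binary incidence row per MO."""
    vocabulary = sorted({tag for tags in tagged_objects.values() for tag in tags})
    matrix = {
        mo: [1 if tag in tags else 0 for tag in vocabulary]
        for mo, tags in tagged_objects.items()
    }
    return vocabulary, matrix

def density(matrix):
    """Fraction of MO/tag pairs that are actually linked."""
    cells = sum(len(row) for row in matrix.values())
    links = sum(sum(row) for row in matrix.values())
    return links / cells if cells else 0.0

# Hypothetical user tags; img3 has no tags at all.
tagged_objects = {
    "img1": {"sea", "beach"},
    "img2": {"sky"},
    "img3": set(),
    "img4": {"sea"},
}
vocabulary, matrix = build_context_links(tagged_objects)
untagged = [mo for mo, row in matrix.items() if sum(row) == 0]
print(vocabulary, density(matrix), untagged)
```

An MO such as `img3` has an all-zero row, so a method that relies on tag co-occurrence alone has nothing from which to derive its latent features.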

Mapping of the multimedia objects

In this paper, content links represent the content similarities between MOs; that is, visually and/or acoustically similar objects are assumed to have strong content links between them. Content links contain important knowledge complementary to that embedded in context links. However, to the best of our knowledge, the existing latent space methods (LSI, PLSI, and LDA) cannot seamlessly incorporate content and context links in a unified framework. Some attempts have been made to jointly model content and context information to learn the latent space. They quantize the MOs into visual words, which are then treated like contextual objects and linked to the MOs. However, such approaches greatly increase the number of parameters in the latent space model and make it more prone to quantization-induced noise and to overfitting due to the sparse context links. In contrast, we show that content and context links can be seamlessly modeled to learn the underlying latent space. The content information does not have to be quantized into discrete elements such as visual words. Instead, the content link structure is leveraged directly, together with the context links, to discover latent features. We therefore propose a mapping of MINets to the latent space that supports an emerging paradigm of multimedia retrieval, one that unifies the information in context and content links. In other words, the goal of this approach is to annotate images with manually defined concepts, using visual and contextual features to learn a latent space.
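
The following sketch illustrates, under simplifying assumptions, how content links can carry information to an MO whose context links are sparse. It is not the latent space algorithm of the referenced paper: it only builds nearest-neighbour content links from continuous feature vectors (no quantization into visual words) and performs one smoothing step that mixes each MO's tag scores with those of its content neighbours. The Gaussian similarity, the parameters `k` and `alpha`, and all data values are hypothetical.

```python
import math

def content_similarity(f1, f2, bandwidth=1.0):
    """Gaussian similarity between two content feature vectors."""
    squared = sum((a - b) ** 2 for a, b in zip(f1, f2))
    return math.exp(-squared / (2.0 * bandwidth ** 2))

def build_content_links(features, k=1):
    """Link each MO to its k most similar MOs; returns mo -> {neighbour: weight}."""
    links = {}
    for mo, feat in features.items():
        scored = [(content_similarity(feat, other_feat), other)
                  for other, other_feat in features.items() if other != mo]
        scored.sort(reverse=True)
        links[mo] = {other: weight for weight, other in scored[:k]}
    return links

def propagate_tags(tag_scores, links, alpha=0.5):
    """One smoothing step: blend own tag scores with the neighbours' weighted average."""
    smoothed = {}
    for mo, own in tag_scores.items():
        neighbours = links.get(mo, {})
        total = sum(neighbours.values())
        tags = set(own)
        for other in neighbours:
            tags.update(tag_scores[other])
        mixed = {}
        for tag in tags:
            from_links = sum(weight * tag_scores[other].get(tag, 0.0)
                             for other, weight in neighbours.items())
            if total > 0:
                from_links /= total
            mixed[tag] = (1.0 - alpha) * own.get(tag, 0.0) + alpha * from_links
        smoothed[mo] = mixed
    return smoothed

# Hypothetical content features; img2 has no tags but looks like img1.
features = {"img1": [0.10, 0.90], "img2": [0.12, 0.88], "img3": [0.90, 0.10]}
tag_scores = {"img1": {"sea": 1.0}, "img2": {}, "img3": {"sunset": 1.0}}
links = build_content_links(features, k=1)
smoothed = propagate_tags(tag_scores, links)
print(smoothed["img2"])
```

Here the untagged `img2` receives a non-zero score for "sea" purely through its content link to `img1`, which is the kind of complementary information the text describes.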

Exploring Context Links

With the development of Web 2.0 infrastructures, rich context links are often attached to MOs on media-rich web sites such as Flickr, YouTube, and Facebook. In contrast to pure content information, these links provide extra semantic information for retrieving and indexing MOs in the Web environment. As a simple example, images of “sea” and “sky” have similar color features and are therefore difficult to distinguish by similarity in the content feature space. However, by leveraging the user tags in their context links and mapping them into a new latent space with LSI, PLSI, or LDA, the images can be distinguished by the semantics of their contextual objects. Context-based Multimedia Retrieval (CxMR) approaches have been widely used in many practical multimedia search engines such as Google Images, which use context links such as surrounding text and user tags. Although the information in context links is useful in many cases, it is often sparse and noisy. In some cases this leads to questionable performance, when the context contains a large amount of information that is irrelevant to the mining process. This is often evident in Google Images results where the returned images do not match the search query at all.
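
The “sea” versus “sky” example can be illustrated with a small comparison of content similarity and context similarity. The mean-color vectors and tag sets below are invented for the example, and cosine and Jaccard similarity are assumed here as simple stand-ins for the content and context measures.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def jaccard(tags_a, tags_b):
    """Overlap between two tag sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = tags_a | tags_b
    return len(tags_a & tags_b) / len(union) if union else 0.0

# Hypothetical mean RGB color features: both images are predominantly blue.
sea_color = [0.10, 0.30, 0.90]
sky_color = [0.15, 0.35, 0.85]

# Hypothetical user tags taken from the context links.
sea_tags = {"sea", "water", "wave"}
sky_tags = {"sky", "cloud"}

content_sim = cosine(sea_color, sky_color)
context_sim = jaccard(sea_tags, sky_tags)
print(round(content_sim, 3), context_sim)
```

The color features are nearly identical while the tag sets do not overlap at all, so the context links separate the two images where the content features cannot.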


Exploring Content Links

The Content-based Multimedia Retrieval (CMR) approach attempts to model high-level concepts from low-level features extracted from the MOs. In a typical multimedia retrieval system such as QBIC or Virage, the query is formulated from example MOs and/or text-based keywords. The relevant MOs are then retrieved based on their content features. The advantage of CMR is that retrieval is automatic: once the concepts are modeled, no human labels are required to maintain the system.

Ambiguity Module

In this module, we present our contribution: improving the accuracy of the proposed system by resolving ambiguity. The relevant notion here is middle (mid-level) vision, the stage in visual processing that combines all the basic features in the scene into distinct, recognizable object groups. This stage comes after early vision (determining the basic features of an image) and before high-level vision (understanding the scene). When perceiving and recognizing images, mid-level vision comes into use when we need to classify the object we are seeing. High-level vision is used when the classified object must then be recognized as a specific member of its group. For example, through mid-level vision we perceive a face, and through high-level vision we recognize the face of a familiar person. Mid-level and high-level vision are crucial for understanding a reality that is filled with ambiguous perceptual inputs. In this module, we therefore resolve such ambiguity to improve the accuracy and efficiency of the system.

HARDWARE REQUIREMENTS:

         System       : Pentium IV 2.4 GHz
         Hard Disk    : 40 GB
         Floppy Drive : 1.44 MB
         Monitor      : 15-inch VGA Colour
         Mouse        : Logitech
         RAM          : 512 MB

SOFTWARE REQUIREMENTS:

         Operating System : Windows XP
         Coding Language  : ASP.NET, C#.NET
         Database         : SQL Server 2005

REFERENCE:
Guo-Jun Qi, Charu Aggarwal, Qi Tian, Heng Ji, and Thomas S. Huang, “Exploring Context and Content Links in Social Media: A Latent Space Method,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 5, May 2012.