Clustering Sentence-Level Text Using a Novel Fuzzy Relational Clustering Algorithm
ABSTRACT:
In comparison with hard clustering
methods, in which a pattern belongs to a single cluster, fuzzy clustering
algorithms allow patterns to belong to all clusters with differing degrees of
membership. This is important in domains such as sentence clustering, since a
sentence is likely to be related to more than one theme or topic present within
a document or set of documents. However, because most sentence similarity
measures do not represent sentences in a common metric space, conventional
fuzzy clustering approaches based on prototypes or mixtures of Gaussians are
generally not applicable to sentence clustering. This paper presents a novel
fuzzy clustering algorithm that operates on relational input data; i.e., data
in the form of a square matrix of pairwise similarities between data objects.
The algorithm uses a graph representation of the data and operates in an
Expectation-Maximization framework in which the graph centrality of each object
is interpreted as a likelihood. Results of applying the algorithm
to sentence clustering tasks demonstrate that the algorithm is capable of
identifying overlapping clusters of semantically related sentences, and that it
is therefore of potential use in a variety of text mining tasks. We also
include results of applying the algorithm to benchmark data sets in several other
domains.
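The EM idea described in the abstract can be illustrated with a short Python/NumPy sketch. It is a minimal illustration under stated assumptions, not the paper's implementation. PageRank, computed by power iteration, is used here as the centrality measure. Each cluster's graph is assumed to weight the edge between objects i and j by their pairwise similarity multiplied by both objects' current memberships in that cluster. In the E-step, the membership of object i in cluster m is taken proportional to the mixing coefficient pi_m times i's centrality in cluster m's graph. In the M-step, pi_m is set to the mean membership. The function names `weighted_pagerank` and `fuzzy_relational_em`, the damping factor `d = 0.85`, the fixed iteration counts, and the random initialization are all hypothetical choices made for illustration.

```python
import numpy as np


def weighted_pagerank(W, d=0.85, iters=100, tol=1e-10):
    """Power-iteration PageRank on a weighted graph given as an n x n matrix W."""
    n = W.shape[0]
    col_sums = W.sum(axis=0)
    safe = np.where(col_sums > 0, col_sums, 1.0)
    # Column-normalise so each node spreads its score in proportion to edge weight;
    # a node with no outgoing weight spreads its score uniformly.
    P = np.where(col_sums > 0, W / safe, 1.0 / n)
    pr = np.full(n, 1.0 / n)
    for _ in range(iters):
        new = (1 - d) / n + d * (P @ pr)
        converged = np.abs(new - pr).sum() < tol
        pr = new
        if converged:
            break
    return pr  # non-negative scores summing to 1


def fuzzy_relational_em(S, k, iters=50, d=0.85, seed=0):
    """Sketch: EM over a similarity matrix S, using PageRank centrality as the likelihood."""
    S = np.array(S, dtype=float)
    np.fill_diagonal(S, 0.0)  # ignore self-similarity
    n = S.shape[0]
    rng = np.random.default_rng(seed)
    U = rng.random((n, k))
    U /= U.sum(axis=1, keepdims=True)  # fuzzy memberships; each row sums to 1
    pi = U.mean(axis=0)  # mixing coefficients
    for _ in range(iters):
        L = np.empty((n, k))
        for m in range(k):
            # Assumption: cluster m's graph scales each edge by both endpoints' memberships.
            Wm = S * np.outer(U[:, m], U[:, m])
            L[:, m] = weighted_pagerank(Wm, d=d)
        # E-step: membership is proportional to mixing coefficient times centrality.
        R = L * pi
        U = R / R.sum(axis=1, keepdims=True)
        # M-step: mixing coefficients are the mean memberships.
        pi = U.mean(axis=0)
    return U, pi


# Toy relational input: two groups of three "sentences" each,
# with high within-group and low between-group similarity.
S = np.full((6, 6), 0.1)
S[:3, :3] = 0.9
S[3:, 3:] = 0.9
U, pi = fuzzy_relational_em(S, k=2)
print(np.round(U, 2))
print(np.round(pi, 2))
```

The only input the sketch consumes is the square similarity matrix `S`, so it never needs sentences to lie in a common metric space. Each object receives a graded membership in every cluster rather than a single hard label.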