Text Categorization using Distributional Features and Semantic Equivalence

International Journal of Computer Applications
© 2011 by IJCA Journal
Number 7 - Article 5
Year of Publication: 2011
Authors:
Tirupathaiah Kommi
Srikanth Jatla
DOI: 10.5120/3653-5105

Abstract

In text mining, text categorization, the task of assigning predefined categories to text, is widely used. Previous researchers relied on the bag-of-words approach, which assigns each word a value based on how frequently it occurs in a document. The drawback of this approach is that it considers no feature of a word other than its count. This paper examines additional values that can be assigned to a word, known as distributional features. The approach is novel, and the distributional features include the position of a word's first occurrence and the compactness of its appearances. Our experimental results show that distributional features and semantic equivalence improve text categorization. The results also show that distributional features are especially useful when the writing style is casual and the document is long. Semantic equivalence is used to extend the equivalence rough set approach.
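To make the two distributional features concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a document is a list of tokens split into a fixed number of contiguous parts, takes the index of the first part containing a word as its first-occurrence position, and uses the number of parts containing the word as one possible measure of compactness. The function names, the num_parts parameter, and both measures are illustrative assumptions; the abstract does not define them.

```python
from collections import Counter


def split_into_parts(tokens, num_parts):
    """Split a token list into num_parts nearly equal contiguous parts."""
    size, extra = divmod(len(tokens), num_parts)
    parts, start = [], 0
    for i in range(num_parts):
        # The first `extra` parts each take one additional token.
        end = start + size + (1 if i < extra else 0)
        parts.append(tokens[start:end])
        start = end
    return parts


def word_features(tokens, num_parts=4):
    """Return, per word, its count and two assumed distributional features."""
    parts = split_into_parts(tokens, num_parts)
    features = {}
    for word, count in Counter(tokens).items():
        # Indices of the parts in which the word appears at least once.
        hits = [i for i, part in enumerate(parts) if word in part]
        features[word] = {
            "count": count,              # bag-of-words value
            "first_part": hits[0],       # position of first occurrence
            "parts_covered": len(hits),  # fewer parts = more compact
        }
    return features


if __name__ == "__main__":
    doc = "a b a c d e f b".split()
    for word, feats in sorted(word_features(doc).items()):
        print(word, feats)
```

In this toy document the words "a" and "b" have the same count, so a bag-of-words representation cannot tell them apart, but their distributional features could be weighted differently: both first appear in part 0, while a distance-based compactness measure (not shown) would separate "a", confined to the opening, from "b", which reappears at the end.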

