International Journal of Advanced Computer Research (IJACR)

ISSN (Print):2249-7277    ISSN (Online):2277-7970
Volume-6 Issue-26 September-2016
DOI:10.19101/IJACR.2016.625012
Paper Title : Subspace clustering for high dimensional datasets
Author Name : G.N.V.G. Sirisha and M. Shashi
Abstract :

Clustering high-dimensional data is challenging because such data typically contain many irrelevant and redundant attributes. Conventional clustering algorithms first identify a global set of relevant attributes using attribute selection and feature extraction techniques, and then use all of these globally relevant attributes in the similarity calculation during clustering. As a result, they fail to identify true clusters that are present only in a subset of the attributes. Subspace clustering, which detects clusters that exist in subsets of dimensions, has therefore become an active area of research in recent years. Several types of subspace clustering algorithms have been proposed in the literature. This paper discusses these types, with the main emphasis on 2D subspace clustering. The availability of new and very large datasets, such as spatiotemporal, temporal, spatial, and genomic data, has necessitated the development of 3D subspace clustering. This paper presents an overview of subspace clustering for researchers interested in the area.
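
The idea that a cluster may exist only in a subset of dimensions can be illustrated with a minimal sketch. It is not the method of any algorithm discussed in the paper: it assumes a simple equal-width grid, a hypothetical density threshold `min_fraction`, and synthetic data invented for the illustration, and it enumerates every subspace by brute force rather than pruning the search. The helper names `dense_units` and `dense_subspaces` are illustrative only.

```python
import random
from collections import Counter
from itertools import combinations


def dense_units(points, dims, n_bins, min_fraction):
    """Return the grid cells of subspace `dims` that hold at least
    `min_fraction` of all points (values are assumed to lie in [0, 1])."""
    counts = Counter(
        tuple(min(int(p[d] * n_bins), n_bins - 1) for d in dims)
        for p in points
    )
    threshold = min_fraction * len(points)
    return {cell: n for cell, n in counts.items() if n >= threshold}


def dense_subspaces(points, n_bins=10, min_fraction=0.2):
    """Check every non-empty subset of attributes (brute force) and keep
    the subspaces that contain at least one dense grid cell."""
    n_dims = len(points[0])
    result = {}
    for k in range(1, n_dims + 1):
        for dims in combinations(range(n_dims), k):
            units = dense_units(points, dims, n_bins, min_fraction)
            if units:
                result[dims] = units
    return result


# Synthetic data (an assumption for this sketch): 200 points that are tight
# in attributes 0 and 1 but spread uniformly over attribute 2, plus 200
# points of uniform noise in all three attributes.
rng = random.Random(0)
cluster = [
    (rng.uniform(0.41, 0.49), rng.uniform(0.41, 0.49), rng.random())
    for _ in range(200)
]
noise = [(rng.random(), rng.random(), rng.random()) for _ in range(200)]

found = dense_subspaces(cluster + noise)
for dims, units in found.items():
    print("subspace", dims, "dense cells:", units)
```

In this setup the cluster is dense in the subspace formed by attributes 0 and 1, while the irrelevant attribute 2 spreads the same points over many cells of the full space, so a density test on all attributes together does not flag it.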

Keywords : Subspace clustering, Curse of dimensionality, Density divergence, 3D subspace clustering.
Cite this article : G.N.V.G. Sirisha and M. Shashi, "Subspace clustering for high dimensional datasets", International Journal of Advanced Computer Research (IJACR), Volume-6, Issue-26, September-2016, pp. 177-184. DOI:10.19101/IJACR.2016.625012