Main Article Content

Abstract

This study examined graph clustering as a combinatorial optimization problem and evaluated the computational complexity of related algorithms. The main objective of this research was to analyze the relationship between multi-cluster objectives and modularity-based goals and to identify the limitations of modularity in graph clustering. This qualitative review study employed mathematical analyses and tests with various algorithms. The findings showed that graph clustering, due to its exponential search space, is an NP-hard problem, and modularity, despite its widespread use, suffers from a resolution limit and cannot accurately detect small-scale clusters. The results also highlighted the superiority of multi-cluster objectives in identifying more subtle structures. By confirming previous findings and offering new insights, this research provided a deeper understanding of optimization and graph clustering and suggested pathways to enhance existing algorithms.

Keywords

Clustering Optimization Graph Network Community detection

Article Details

How to Cite
Ahmadi, E. A., & Danish, B. (1403). Graph Clustering. Journal of Natural Sciences – Kabul University, 7(4), 343–370. https://doi.org/10.62810/jns.v7i4.84

References

  1. Abbe, E. (2018). Community detection and stochastic block models: Recent developments. Journal of Machine Learning Research, 18(177), 1-86. Retrieved from http://jmlr.org/papers/v18/16-480.html
  2. Ahn, K. J., Cormode, G., Guha, S., McGregor, A., & Wirth, A. (2015). Correlation clustering in data streams. In Proceedings of the 32nd International Conference on International Conference on Machine Learning, 37, 2237-2246. Retrieved from http://dl.acm.org/citation.cfm?id=3045118.3045356.
  3. Anava, Y., Avigdor-Elgrabli, N., & Gamzu, I. (2015). Improved theoretical and practical guarantees for chromatic correlation clustering. In Proceedings of the 24th International Conference on World Wide Web, WWW, 15, 55-65. https://doi.org/10.1145/2736277.2741629
  4. Andersen, R., Chung, F., & Lang, K. (2006). Local graph partitioning using PageRank Vectors. InProcedddings of the 47th Annual IEEE Symposium on Foundations of Computer Science. Retrieved from http://www.math.ucsd.edu/~fan/wp/localpartition.pdf.
  5. Arora, S., Rao, S., & Vaziani, U. (2009). Expander flows, geometric embeddings and graph partitioning. Journal of the ACM (JACM), 56(2). Retrieved from https://dl.acm.org/doi/abs/10.1145/1502793.1502794
  6. Asteris, M., Kryillidis, A., Papailiopoulos, D., & Dimakis, A. G. (2016). Bipartite correlation clustering: Maximizing agreements. In Proceeding of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS), 51. Retrieved from http://proceedings.mlr.press/v51/asteris16.html
  7. Bader, D. A., Meyerhenke, H., Sanders, P., & Wagner, D. (2013). Graph partitioning and graph clustering (Vol. 588). American Mathematical Socity.
  8. Bagon, S., & Galun, M. (2011). Large scale correlation clustering optimization. arXiv preprint. Retrieved from https://arxiv.org/abs/1112.2903
  9. Bansal, N., Blum, A., & Chawla, S. (2014). Correlation clustering. Machine Learining, 56, 89-113. Retrieved from https://link.springer.com/article/10.1023/b:mach.0000033116.57574.95
  10. Beier, T., Hamprecht, F. A., & Kappes, J. H. (2015). Fusion moves for correlation clustering. In Proceeding of the IEEE Conference on Computer Vision and Pattern Recognition, 3507-3516. Retrieved from http://openaccess.thecvf.com/content_cvpr_2015/html/Beier_Fusion_Moves_for_2015_CVPR_paper.html
  11. Beier, T., Kroeger, T., Kappes, J. H., Kthe, U., & Hamprecht, F. A. (2014). Cut, glue, amp; cut: A fast, approximate solver for multicut partitioning. In 2014 IEEE Conference on Computer Vision and Pattern Recongnition, 73-80. https://doi.org/10.1109/CVPR.2014.17
  12. Ben-Dor, A., Shamir, R., & Yakhini, Z. (1999). Clustering gene expression patterns. Journal of computational biology, 281-297. https://doi.org/https://doi.org/10.1089/106652799318274
  13. Bhaskara, A., Daruki, S., & Venkatasubramanian, S. (2018). Sublinear algorithms for MAXCUT and correlation clustering. In Proceeding International Conference on Automata, Logic and Programming. Retrieved from https://arxiv.org/abs/1802.06992
  14. Bhattacharya, A., & De, R. K. (2008). Divisive correlation clustering algorithm (DCCA) for grouping of genes: detecting varying patterns in expression profiles. Bioinformatics, 1359-1366. https://doi.org/10.1093/bioinformatics/btn133.
  15. Bocker, S., & Baumbach, J. (2013). Cluster editing. Springer Berlin Heidelberg, 33-44. Retrieved from https://link.springer.com/chapter/10.1007/978-3-642-39053-1_5
  16. Bolla, M. (2011). Penalized versions of the newman-girvan modularity and their relation to normalized cuts and k-means clustering. phys. https://doi.org/10.1103/PhysRevE.84.
  17. Bonchi, F., Gionis, A., Gullo, F., Tsourakakis, C. E., & Ukkonen, A. (2015). Chromatic correlation clustering. ACM Trans, 1-34. https://doi.org/10.1145/2728170.
  18. Bonomo, F., Duran, G., Napoli, A., & Valencia-Pabon, M. (2015b). Complexity of the cluster deletion problem on subclasses of chordal graphs. Theoretical Computer Science, 59-69. Retrieved from https://www.sciencedirect.com/science/article/pii/S0304397515005800
  19. Boonomo, F., Duran, G., Napoli, A., & Valencia-Pabon, M. (2015a). A one-to-one correspondence between potential solution of the cluster deletion problem and the minimum sum coloring problem, and its application to p4-sparse graphs. Information Processing Letters, 600-603. Retrieved from https://lipn.fr/~valenciapabon/papers/cluster-del-chordal-v7.pdf
  20. Buskirk, G., Fan, C., & Raichel, B. (2018). Metric violation distance: Hardness and approximation. In Proceeding of the 29th Annual ACM-SIAm Symposium on Discrete Algorithms, 196-209. Retrieved from https://epubs.siam.org/doi/abs/10.1137/1.9781611975031.14
  21. Charikar, M., Gupta, N., & Schwartz, R. (2017). Local guarantees in graph cuts and clustering. In International Conference on Integer Programming and Combinatorial Optimization, 136-147. Retrieved from https://link.springer.com/chapter/10.1007/978-3-319-59250-3_12
  22. Chawla, S., Makarychev, K., Schramm, T., & Yaroslavtsev, G. (2015). Near optimal lp rounding algorithm for correlationclustering on complete and complete k-partite graphs. In Proceedings of the 47th Annual ACM on symposium on Theory of COmputing, 219-228. Retrieved from https://dl.acm.org/doi/abs/10.1145/2746539.2746604
  23. Chierichetti, F., Dalvi, N., & Kumar, R. (2014). Correlation clustering in mapreduce. In Proceeding of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, 641-650. Retrieved from https://dl.acm.org/doi/abs/10.1145/2623330.2623743
  24. Christian, S. (2013). High Quality Graph Partitioning. PhD thesis, Karlsruhe Institute of Technology. Retrieved from https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=bb7389d0d8d4f8935e23b763c032eab20316b026
  25. Chung, F. (1997). Spectral Graph Theory. American Mathematical Society.
  26. Coleman, T., Saunderson, J., & Wirth, A. (2008). A local-seach 2-approximation for 2 correlation-clustering. Springer Berlin Heidelberg, 308-319. Retrieved from https://link.springer.com/chapter/10.1007/978-3-540-87744-8_26
  27. Damaschke, P. (2009). Bounded-Degree Techniques Accelerate Some Parameterized Graph Algorithms. Springer Berlin Heidelberg, 98-109. https://doi.org/10.1007/978-3-642-11269-0 8.
  28. Dau, P. L., Puleo, G., & Milenkovic, O. (2017). Motif clustering and overlapping clustering for social network analysis. In IEEE INFOCOM 2017 - IEEE Conference on Computer Communications, 1-9. https://doi.org/10.1109/INFOCOM.2017.8056956.
  29. Delvenne, J., Yaliraki, S., & Barahona, M. (2010). Stability of graph communities across thime scales. Proceedings of the National Academy of Sciences, 12755-12760. Retrieved from https://www.pnas.org/doi/abs/10.1073/pnas.0903215107
  30. Demain, E., Dinh, T., LI, X., & Thai, M. (2015). Network clustering via maximizing modularity: Approximation algorithms and theoretical limits. In Proceeding of the 2015 IEEE International Xonference on Data Mining, 101-110. https://doi.org/10.1016/j.tcs.2006.05.008.
  31. Fan, C. (2009). A local graph partitioning algorithm using heat kernel pagerank. Algorithms and Models for the Web-Graph, 62-75. https://doi.org/10.1007/PL00012580.
  32. Fortuato, S., & Hric, D. (2016). Community dtection in networks: A user guide. Physics Reports. https://doi.org/10.1016/j.physrep.2009.11.002.
  33. Fukunaga, T. (2018). Lp-based pivoting algorithm for higher-order correlation clustering. Springer Ingernational Publishing. Retrieved from https://link.springer.com/article/10.1007/s10878-018-0354-y
  34. Gao, Y., Hare, D., & Nastos, J. (2013). The cluster deletion problem for cographs. Discrete Mathematics, 2763-2771. Retrieved from https://www.mdpi.com/2504-2289/7/2/70
  35. Gleich, D. F., Veldt, N., & Wirth, A. (2018). Correlation Clustering Generalized. In 29th International Symposium on Algorithms and Computatiion, 44. https://doi.org/10.4230/LIPIcs.ISAAC.2018.44.
  36. Hou, J. P., Emad, A., Puleo, G. J., Ma, J., & Milenkovic, O. (2016). A new correlation clustering method for cancer mutation analysis. Bioinformatics, 3717-3728. https://doi.org/10.1093/bioinformatics/btw546.
  37. Kim, S., Nowozin, S., Kohli, P., & Yoo, C. (2011). Higher-order correlation clustering for image segmentation. Advances in Neural Information Processing Systems, 24, 1530-1538. Retrieved from http://papers.nips.cc/paper/4406-higher-order-correlation-clustering-for-image-segmentation.pdf.
  38. Kloumann, I. M., Ugander, J., & Kleinberg, J. (2017). Block models and personalized pagerank. Proceedings of the National Academy of Sciences, 33-38. https://doi.org/10.1073/pnas.1611275114.
  39. Lange, J.-H., Karrenbauer, A., & Andres, B. (2018). Partial optimality and fast lower bounds for weighted correlation clustering. Proceedings of the 35th International Conference on Machine Learning, 2898-2907. Retrieved from proceedings.mlr.press/v80/lange18a.html.
  40. Lu, L. F. (2002). Connected components in random graphs with given expected degree sequences. Annuals of Combinatorics, 125-145. https://doi.org/10.1090/cbms/092.
  41. Neewman, M. (2016). Equivalence between modularity optimization and maximum likelihood methods for community detection. physics, 94-100. https://doi.org/10.1103/PhysRevE.94.052315.
  42. Newman, M. E. (2013). Community detection and graph partitioning. Europhysics Letters, 103-105. Retrieved from stacks.iop.org/0295-5075/103/i=2/a=28003.
  43. Pan, X., Papiliopoulos, D., Oymak, S., Recht, B., Ramchandran, K., & Jordan, M. (2015). Parallel correlation clustering on big graphs. Advances in Neural Information Processing systems, 82-90. Retrieved from papers.nips.cc/paper/5814-parallel-correlation-clustering-on-big-graphs.pdf.
  44. Puleo, G. J., & Milenkovic, O. (2018). Correlation clustering and biclustering with locally bounded errors. IEEE Transactions on Information Theory, 4105-4119. https://doi.org/10.1109/TIT.2018.2819696.
  45. Puleo, G., & Milenkovic, O. (2015). Correlation clustering with constrained cluster sizes and extended Weights bounds. SIAM Journal on Optimization, 1857-1872. https://doi.org/10.1137/140994198.
  46. Schulz, C., Bayer, S. K., Hess, C., Steiger, C., Teichmann, M., Jacob, J., . . . Hayrapetyan, S. (2016). Graph partitioning and graph clustering in theory and practice. Lecture notes at Institute for Theoretical Informatics Karlsruhe Institute of Technology. Retrieved from https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=98ec106b2757cfcaa27673765b58f9a14a580e6e
  47. Sharma, P., & Singh, M. (2016). Multi chromatics balls with relaxed criterion to detect larger communities in social networks. Smart Trends in Information Technology and Computer Communication, 196-203. Retrieved from https://link.springer.com/chapter/10.1007/978-981-10-3433-6_24
  48. Swoboda, P., & Andres, B. (2017). A message passing algorithm for the minimum cost multicut problem. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, 4990-4999. https://doi.org/10.1109/CVPR.2017.530.
  49. Veldt, N., Gleich, D., & Wirth, A. (2018). A correlation clustering framework for community detection. In Proceeding of the 2018 WWW Conference, 439-448. https://doi.org/10.1145/3178876.3186110.
  50. Veldt, N., Wirth, A. I., & Gleich, D. F. (2017). Correlation clustrering with low-rank matrices. In Proceedings og the 26th International Conference on World Wide Web, 1025-1034. https://doi.org/10.1145/3038912.3052586.
  51. Wang, D., Fountoulakis, K., Henzinger, M., Mahoney, M. W., & Rao, S. (2017). Capacity releasing diffusion for speed and locality. Proceedings of the 34th International Conference on Machine Learning, 3598-3607. Retrieved from proceedings.mlr.press/v70/wang17b.html.
  52. Wang, Y., Huang, H., Feng, C., & Liu, Z. (2015). Community detection based on minimum-cut partitioning. Web-Age Information Management, 57-69. Retrieved from https://link.springer.com/chapter/10.1007/978-3-319-21042-1_5
  53. Wirth, A., Veldt, N., Gleich, D., & Saunderson, J. (2018a). A projection method for metric-constrained optimization. arXiv preprint arXiv. Retrieved from https://arxiv.org/abs/1806.01678
  54. Yang, J., & Leskovec, J. (2015). Defining and evaluationg network communities based on ground-truth. Knowledge and Information Systems, 181-213. https://doi.org/10.1007/s10115-013-0693-z.
  55. Yu, L., & Ding, C. (2010). Network community discovery: Solving modularity clustering via normalized cut. In Proceeding of the Eighth Workshop on Mining and Learning with Graphs, 34-36. https://doi.org/10.1145/1830252.1830257.
  56. Zhou, H., Li, J., Zhang, F., & Cui, Y. (2017). A graph clustering method for commuinty detection in complex network. Physica A: Statistical Mechanics and its Applications, 551-562. https://doi.org/10.1016/j.physa.2016.11.015.