AccScience Publishing / IJOSI / Volume 8 / Issue 1 / DOI: 10.6977/IJoSI.202403_8(1).0002
ARTICLE

An improved self-training model to detect fake news categories using multi-class classification of unlabeled data: fake news classification with unlabeled data

Oumaima Stitini1*, Soulaimane Kaloun2, Omar Bencharef2, Sara Qassimi2
1 Computer and system engineering laboratory, ENS, Cadi Ayyad University, Marrakesh, 40000, Morocco
2 Computer and system engineering laboratory, FSTG, Cadi Ayyad University, Marrakesh, 40000, Morocco
Submitted: 5 May 2023 | Revised: 18 December 2023 | Accepted: 23 December 2023 | Published: 22 February 2024
© 2024 by the Author(s). Licensee AccScience Publishing, USA. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
Abstract

News content classification has recently drawn significant attention in both academic and industrial settings. Some studies have focused on distinguishing fake from real news using labeled data and have achieved some success in detection. Digital misinformation, or fake news, spreads through online social communities via shares, re-shares, and re-posts. Social media platforms and blogs have become widely used daily sources of information because of their low cost and ease of access, yet they have faced several challenges in combating the distribution of fake news. This widespread use of social media for news consumption has led to the dissemination of fake news, a severe problem that adversely affects individuals and society. Consequently, identifying and addressing misinformation has become a critical task. Fake news detection is an emerging research area that has garnered considerable interest, but it also presents specific challenges, mainly because of the limited resources available. In this paper, we focus on identifying and classifying different forms of fake news, specifically exploring how unlabeled data can be used for multi-class classification. The proposed approach, based on multi-class classification using unlabeled data, categorizes fake news into four forms: satire or fake satirical information, manufacturing, manipulation, and propaganda. The experimental evaluation demonstrates the efficiency of the proposed system.
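To make the idea of self-training on unlabeled data concrete, the following is a minimal sketch of a generic self-training loop for four-class text classification. It is not the authors' model: the abstract does not specify the base classifier, the features, or the confidence rule, so the bag-of-words naive Bayes classifier, the 0.6 confidence threshold, the function names, and the toy texts below are all assumptions chosen for illustration.

```python
from collections import Counter
import math

# The four fake-news forms named in the abstract.
CLASSES = ["satire", "manufacturing", "manipulation", "propaganda"]


def featurize(text):
    """Bag-of-words counts for a lower-cased, whitespace-split text."""
    return Counter(text.lower().split())


def train(labeled):
    """Fit a multinomial naive Bayes model from (text, label) pairs."""
    word_counts = {c: Counter() for c in CLASSES}
    doc_counts = Counter()
    for text, label in labeled:
        word_counts[label].update(featurize(text))
        doc_counts[label] += 1
    vocab = {w for c in CLASSES for w in word_counts[c]}
    return word_counts, doc_counts, vocab


def predict_proba(model, text):
    """Return class -> posterior probability, with add-one smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for c in CLASSES:
        log_p = math.log((doc_counts[c] + 1) / (total_docs + len(CLASSES)))
        denom = sum(word_counts[c].values()) + len(vocab)
        for word, n in featurize(text).items():
            if word in vocab:  # words never seen in training are ignored
                log_p += n * math.log((word_counts[c][word] + 1) / denom)
        log_scores[c] = log_p
    top = max(log_scores.values())
    exp = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}


def self_train(labeled, unlabeled, threshold=0.6, max_rounds=10):
    """Repeatedly pseudo-label confident unlabeled texts and retrain.

    The threshold and round limit are illustrative assumptions.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)
        confident, remaining = [], []
        for text in pool:
            proba = predict_proba(model, text)
            label = max(proba, key=proba.get)
            if proba[label] >= threshold:
                confident.append((text, label))
            else:
                remaining.append(text)
        if not confident:  # nothing new to learn from, so stop
            break
        labeled.extend(confident)
        pool = remaining
    return train(labeled), labeled, pool


# Toy data invented for this sketch (not taken from the paper's datasets).
seed = [
    ("funny parody joke headline", "satire"),
    ("invented fabricated story source", "manufacturing"),
    ("edited doctored photo quote", "manipulation"),
    ("party slogan agenda campaign", "propaganda"),
]
unlabeled_pool = [
    "parody joke about a funny mayor",
    "doctored photo of the edited quote",
    "weather report",
]

model, final_labeled, leftover = self_train(seed, unlabeled_pool)
print(final_labeled[len(seed):])  # pseudo-labeled examples added by self-training
print(leftover)                   # texts that never reached the confidence threshold
```

Texts whose top-class probability stays below the threshold are left unlabeled rather than forced into a class, which limits the propagation of early labeling mistakes at the cost of using less of the unlabeled pool.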

Keywords
Multi-class classification
Unlabeled data
Semi-supervised learning
Self-training
Recommender system
Fake news
Imbalanced learning
International Journal of Systematic Innovation, Electronic ISSN: 2077-8767 Print ISSN: 2077-7973, Published by AccScience Publishing