An improved self-training model to detect fake news categories using multi-class classification of unlabeled data: fake news classification with unlabeled data

Oumaima Stitini¹*, Soulaimane Kaloun², Omar Bencharef², Sara Qassimi²


¹ Computer and system engineering laboratory, ENS, Cadi Ayyad University, Marrakesh, 40000, Morocco

² Computer and system engineering laboratory, FSTG, Cadi Ayyad University, Marrakesh, 40000, Morocco

IJoSI 2024, 8(1), 11–26; https://doi.org/10.6977/IJoSI.202403_8(1).0002

Submitted: 5 May 2023 | Revised: 18 December 2023 | Accepted: 23 December 2023 | Published: 22 February 2024

© 2024 by the Author(s). Licensee AccScience Publishing, USA. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)

Abstract

In recent years, the classification of news content has attracted significant attention in both academia and industry. Several studies have distinguished fake from real news using labeled data and have achieved some success. Digital misinformation spreads through online social communities via shares, re-shares, and re-posts, and social media platforms face several challenges in curbing its distribution. Social media platforms and blogs have become widely used daily sources of information because they are cheap and easy to access. However, this widespread use of social media for news consumption has also enabled the dissemination of fake news, a severe problem that harms both individuals and society. Identifying and countering misinformation has therefore become a critical task. Fake news detection is an emerging research area that has attracted considerable interest, but it poses specific challenges, mainly because of the limited resources available. In this paper, we focus on identifying and classifying different forms of fake news, specifically exploring how unlabeled data can be used for multi-class classification. The proposed approach assigns fake news to one of four categories: satire (fake satirical information), manufacturing, manipulation, and propaganda. It is built on an improved self-training model, a semi-supervised approach that applies multi-class classification to unlabeled data. The experimental evaluation demonstrates the efficiency of the proposed system.
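The abstract does not detail the self-training procedure itself. As a rough illustration only, the sketch below shows a generic self-training loop for the four categories: a base classifier is trained on the labeled texts, unlabeled texts whose top-class probability reaches a confidence threshold receive pseudo-labels and join the training set, and the process repeats until no remaining text qualifies. The multinomial Naive Bayes base learner, the 0.7 threshold, the `self_train` helper, and all toy texts are hypothetical choices made for this example; they are not the authors' improved model.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy texts for the four categories named in the abstract.
LABELED = [
    ("comedian jokes parody president funny", "satire"),
    ("invented quote fabricated story fake source", "manufacturing"),
    ("edited photo cropped video misleading caption", "manipulation"),
    ("state campaign slogan nation enemy rally", "propaganda"),
]
UNLABELED = [
    "funny parody jokes about minister",
    "fabricated quote from fake source",
    "misleading caption on cropped photo",
    "rally slogan against the enemy nation",
    "breaking news today",  # shares no words with the training data
]


def tokenize(text):
    return text.lower().split()


class MultinomialNB:
    """Multinomial Naive Bayes with Laplace smoothing (an assumed stand-in base learner)."""

    def fit(self, docs, labels):
        self.vocab = {w for d in docs for w in tokenize(d)}
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for d, y in zip(docs, labels):
            self.word_counts[y].update(tokenize(d))
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict_proba(self, doc):
        n, v = sum(self.class_counts.values()), len(self.vocab)
        log_scores = {}
        for c, count in self.class_counts.items():
            score = math.log(count / n)  # log prior of class c
            for w in tokenize(doc):
                if w in self.vocab:  # words unseen during training are skipped
                    score += math.log((self.word_counts[c][w] + 1) / (self.total_words[c] + v))
            log_scores[c] = score
        top = max(log_scores.values())  # subtract the max for numerical stability
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}

    def predict(self, doc):
        proba = self.predict_proba(doc)
        return max(proba, key=proba.get)


def self_train(labeled, unlabeled, threshold=0.7, max_iter=10):
    """Generic self-training: retrain, pseudo-label confident predictions, repeat."""
    docs = [d for d, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    for _ in range(max_iter):
        model = MultinomialNB().fit(docs, labels)
        confident, rest = [], []
        for d in pool:
            proba = model.predict_proba(d)
            best = max(proba, key=proba.get)
            if proba[best] >= threshold:
                confident.append((d, best))
            else:
                rest.append(d)
        if not confident:  # nothing new passes the threshold: stop early
            break
        for d, y in confident:  # pseudo-labeled texts join the training set
            docs.append(d)
            labels.append(y)
        pool = rest
    return MultinomialNB().fit(docs, labels), pool


if __name__ == "__main__":
    model, still_unlabeled = self_train(LABELED, UNLABELED)
    print(model.predict("parody jokes"))  # satire
    print(still_unlabeled)  # ['breaking news today']
```

In this toy run, four of the five unlabeled texts are pseudo-labeled in the first round, while the text with no known vocabulary stays in the pool. Lowering `threshold` pseudo-labels more text per round but risks propagating early mistakes; with imbalanced categories, selecting confident examples per class rather than globally is one common variant.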

Keywords

Multi-class classification

Unlabeled data

Semi-supervised learning

Self-training

Recommender system

Fake news

Imbalanced learning



International Journal of Systematic Innovation (Electronic ISSN: 2077-8767; Print ISSN: 2077-7973), published by AccScience Publishing
