An improved self-training model to detect fake news categories using multi-class classification of unlabeled data: fake news classification with unlabeled data

In recent years, classifying news content has received significant attention in both academia and industry. Some studies have focused on distinguishing fake from real news using labeled data, with some success. Digital misinformation, or fake news, spreads through online social communities via shares, re-shares, and re-posts, and social media platforms face several challenges in combating its distribution. Social media platforms and blogs have become widely used daily sources of information because of their low cost and ease of access. However, this widespread use of social media for news consumption has also led to the dissemination of fake news, a severe problem that adversely affects individuals and society. Consequently, identifying and addressing misinformation has become a critical task. Fake news detection is an emerging research area that has attracted considerable interest, but it also presents specific challenges, mainly because of the limited resources available. In this paper, we focus on identifying and classifying different forms of fake news, specifically exploring how unlabeled data can be used for multi-class classification. The proposed approach categorizes fake news into four forms: satire (fake satirical information), manufacturing, manipulation, and propaganda. Our method is a self-training approach to multi-class classification with unlabeled data. The experimental evaluation demonstrates the efficiency of the proposed system.
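To make the idea of self-training on unlabeled data concrete, the following is a minimal, generic sketch and not the model evaluated in this paper, whose details the abstract does not give. It assumes a simple nearest-centroid classifier over bag-of-words counts, a hypothetical confidence threshold of 0.6, and invented toy texts; only the four category names come from the abstract. In each round, the classifier is fitted on the labeled set, unlabeled texts predicted with high confidence are pseudo-labeled and added to that set, and the loop stops when no confident predictions remain.

```python
from collections import Counter
import math

# Category names from the abstract; everything else here is an illustrative assumption.
LABELS = ("satire", "manufacturing", "manipulation", "propaganda")

def vectorize(text):
    """Bag-of-words term counts for a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def fit_centroids(labeled):
    """Sum the term counts of every text in a class into one centroid per class."""
    centroids = {}
    for text, label in labeled:
        centroids.setdefault(label, Counter()).update(vectorize(text))
    return centroids

def predict(centroids, text):
    """Return (label, confidence); confidence is the best similarity's share of the total."""
    vec = vectorize(text)
    sims = {label: cosine(vec, c) for label, c in centroids.items()}
    total = sum(sims.values())
    if total == 0:
        return None, 0.0  # no vocabulary overlap with any class
    best = max(sims, key=sims.get)
    return best, sims[best] / total

def self_train(labeled, unlabeled, threshold=0.6, max_rounds=5):
    """Grow the labeled set with confident pseudo-labels until none are left."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        centroids = fit_centroids(labeled)
        confident, remaining = [], []
        for text in pool:
            label, conf = predict(centroids, text)
            if label is not None and conf >= threshold:
                confident.append((text, label))
            else:
                remaining.append(text)
        if not confident:
            break
        labeled.extend(confident)
        pool = remaining
    return fit_centroids(labeled), pool

# Toy seed set (one invented example per category) and an unlabeled pool.
seed = [
    ("joke parody comedy headline", "satire"),
    ("invented fabricated story source", "manufacturing"),
    ("edited doctored photo quote", "manipulation"),
    ("political agenda slogan campaign", "propaganda"),
]
unlabeled = [
    "parody comedy sketch",
    "doctored photo viral",
    "campaign slogan rally",
    "weather report today",
]

model, leftover = self_train(seed, unlabeled)
print(predict(model, "sketch")[0])  # word learned only through a pseudo-labeled text
print(leftover)                     # texts that never reached the confidence threshold
```

In this toy run the word "sketch" never appears in the seed set, so a classifier fitted on the seeds alone cannot place it; after self-training it is associated with the satire class through a pseudo-labeled text. A real system would replace the centroid classifier and the fixed threshold with whatever base learner and selection rule suit the data.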