AccScience Publishing / IJOSI / Volume 7 / Issue 8 / DOI: 10.6977/IJoSI.202309_7(8).0003
ARTICLE

Investigating feature extraction techniques for imbalanced time-series data

Harshita Chaurasiya*1 Dr. Anand Kumar Pandey2
1 Department of CSA, ITM University, Gwalior, MP, INDIA
2 Department of CSA, ITM University, Gwalior, MP, INDIA
Submitted: 28 December 2022 | Revised: 17 October 2023 | Accepted: 1 December 2023 | Published: 14 December 2023
© 2023 by the Author(s). Licensee AccScience Publishing, USA. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
Abstract

High class imbalance is present in many applications, such as fraud detection and cancer diagnosis, so effective classification of imbalanced time-series data is an essential topic of study. Severely imbalanced data presents a challenge because most learners become biased toward the majority class and, in extreme cases, overlook the minority class completely. Over the past two decades, class imbalance has been studied in depth using fundamental methodologies. Despite recent breakthroughs in addressing data imbalance with feature extraction and the growing popularity of that approach, there is relatively little empirical work on feature extraction for class-imbalanced time-series data. Following record-breaking performance in various complex domains, researchers are now investigating feature extraction approaches for problems with high degrees of class imbalance. To better understand the effectiveness of feature extraction when applied to class-imbalanced data, this survey examines available research on class imbalance, feature extraction, and fundamental approaches such as SMOTE, resampling, and others. It explores the specifics of each study's implementation and experimental results and provides insight into each study's advantages and limitations. We found that research in this field is relatively limited. Several classic approaches to class imbalance, such as data sampling and SMOTE, work with feature extraction, but more sophisticated methods that make use of minority-class feature learning have potential applications. The survey continues with a discussion that identifies numerous gaps in research on class-imbalanced time-series data to guide future studies.
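
To make the combination of feature extraction and SMOTE concrete, the following is a minimal sketch and is not taken from any of the surveyed studies. It assumes univariate series of at least two points, a small hand-picked set of summary features, and a simplified SMOTE-style interpolation applied in feature space. The names `extract_features`, `smote_like`, and `balance`, and the toy Gaussian data, are hypothetical and chosen only for illustration.

```python
import random
import statistics


def extract_features(series):
    """Map one univariate series to a small fixed-length feature vector."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    return [
        statistics.fmean(series),
        statistics.pstdev(series),
        min(series),
        max(series),
        statistics.fmean([abs(d) for d in diffs]),  # mean absolute change
    ]


def smote_like(minority, n_new, k=3, rng=None):
    """Create n_new synthetic vectors by interpolating between a minority
    vector and one of its k nearest minority neighbours (simplified SMOTE)."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append([a + gap * (b - a) for a, b in zip(base, mate)])
    return synthetic


def balance(features, labels, minority_label=1):
    """Oversample the minority class until both classes have equal size."""
    minority = [f for f, y in zip(features, labels) if y == minority_label]
    n_major = len(labels) - len(minority)
    extra = smote_like(minority, n_major - len(minority))
    return features + extra, labels + [minority_label] * len(extra)


# Toy data (assumption): 20 low-variance majority series, 4 high-variance minority series.
rng = random.Random(42)
majority = [[rng.gauss(0, 1) for _ in range(50)] for _ in range(20)]
minority = [[rng.gauss(0, 3) for _ in range(50)] for _ in range(4)]

X = [extract_features(s) for s in majority + minority]
y = [0] * 20 + [1] * 4
X_bal, y_bal = balance(X, y)
print(y_bal.count(0), y_bal.count(1))  # prints: 20 20
```

Oversampling the extracted feature vectors rather than the raw series keeps the interpolation in a low-dimensional space where nearest neighbours are more meaningful; whether this helps on a given dataset is an empirical question of the kind the survey raises.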

Keywords
Big Data
Time Series
Machine Learning
Feature Extraction
International Journal of Systematic Innovation, Electronic ISSN: 2077-8767 Print ISSN: 2077-7973, Published by AccScience Publishing