Investigating feature extraction techniques for imbalanced time-series data

High class imbalance is common in many applications, such as fraud detection and cancer diagnosis, so effective classification of imbalanced time-series data is an important topic of study. Severely imbalanced data is challenging because most learners become biased toward the majority class and, in extreme cases, ignore the minority class completely. Over the past two decades, class imbalance has been studied in depth using fundamental methodologies. Despite recent breakthroughs in addressing data imbalance with feature extraction and its growing popularity, there is relatively little empirical work on feature extraction for class-imbalanced time-series data. Following record-breaking performance in various complex domains, researchers are now investigating feature extraction approaches for problems with high degrees of class imbalance. To better understand the effectiveness of feature extraction when applied to class-imbalanced data, this survey examines the available research on class imbalance, feature extraction, and fundamental approaches such as SMOTE and resampling. It explores the specifics of each study's implementation and experimental results, and provides insight into each study's advantages and limitations. We found that research in this field is relatively limited. Several classic approaches to class imbalance, such as data sampling and SMOTE, work with feature extraction, but more sophisticated methods that exploit feature learning for the minority class show promise. The survey concludes with a discussion that identifies several gaps in research on class-imbalanced time-series data to guide future studies.
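As a rough illustration of how a sampling method such as SMOTE can be combined with feature extraction, the sketch below first maps each time series to a small feature vector and then interpolates new minority-class points in that feature space. It is a minimal sketch under stated assumptions, not the method of any surveyed study: the function names `extract_features` and `smote_like_oversample`, the choice of features (mean, standard deviation, linear trend slope), and the synthetic data are all hypothetical, and the interpolation is a simplified SMOTE-style procedure rather than a reference implementation.

```python
import numpy as np

def extract_features(series):
    """Map each series (one per row) to [mean, std, linear-trend slope].
    The feature set is an illustrative assumption, not taken from the survey."""
    series = np.asarray(series, dtype=float)
    t = np.arange(series.shape[1])
    # np.polyfit fits each column of y, so the series are passed transposed;
    # row 0 of the result holds the degree-1 (slope) coefficients.
    slope = np.polyfit(t, series.T, deg=1)[0]
    return np.column_stack([series.mean(axis=1), series.std(axis=1), slope])

def smote_like_oversample(X_min, n_new, k=3, rng=None):
    """Create n_new synthetic minority points by interpolating between a
    minority point and one of its k nearest minority neighbours.
    Simplified SMOTE-style sketch; needs at least two minority samples."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances; the diagonal is masked so that a point
    # is never selected as its own neighbour.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[chosen] - X_min[base])

# Hypothetical toy data: 95 majority series (noise) and 5 minority series
# (noise plus an upward trend), each of length 50.
data_rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
majority = data_rng.normal(0.0, 1.0, size=(95, 50))
minority = data_rng.normal(0.0, 1.0, size=(5, 50)) + 3.0 * t

F_maj = extract_features(majority)
F_min = extract_features(minority)

# Oversample the minority class in feature space until both classes match.
F_syn = smote_like_oversample(F_min, n_new=len(F_maj) - len(F_min), rng=0)
F_min_balanced = np.vstack([F_min, F_syn])
```

Oversampling after feature extraction keeps the interpolation in a low-dimensional space, where straight-line combinations of two minority samples are more likely to remain plausible than combinations of raw time series. Whether that holds for a given dataset is an empirical question.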