Measuring the accuracy of time series reduction methods based on modified dynamic time warping distance calculations

Representing sensor data as time series is central to many related tasks, such as comparison, reduction, clustering, and classification. The time series representation methods available in most programming languages and integrated development environments support dimensionality reduction, data preprocessing, and feature extraction, alongside several normalization techniques. This study evaluated 14 dimensionality reduction methods from the TSrepr R package on eight time series of varying lengths, each a collection of sensor data. The similarity between each reduced time series and its original was measured using a modified version of dynamic time warping (DTW) with time alignment measurement. The resulting distances were then passed through a Gaussian kernel function to normalize the distance between variously aligned series. Perceptually important points (PIP) and piecewise linear approximation (PLA) were the best-performing methods for time series reduction, with a minimum deviation (error term) of 5–12%. The results also indicate that PIP differs significantly from seasonal decomposition, whereas no significant differences were found between PIP and the other methods (PLA, FEACLIPTREND, and FEACLIP). In addition, the study presents an interactive web-based application in which time series are stored in CSV files and the distance between them is calculated using the chosen reduction method.
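
The pipeline described above (reduce a series, measure its DTW distance to the original, normalize with a Gaussian kernel) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the study's implementation: it uses classic DTW rather than the modified variant with time alignment measurement, a simple vertical-distance PIP selection rather than the TSrepr routines, and the function names and the `sigma` parameter are assumptions introduced here.

```python
import math


def dtw_distance(a, b):
    """Classic DTW accumulated cost with absolute-difference local cost.

    Stand-in only: the study uses a modified DTW with time alignment
    measurement, which is not reproduced here.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def gaussian_kernel(distance, sigma=1.0):
    """Map a distance in [0, inf) to a similarity in (0, 1]; sigma is assumed."""
    return math.exp(-(distance ** 2) / (2.0 * sigma ** 2))


def pip_reduce(series, k):
    """Return sorted indices of k perceptually important points.

    Hypothetical helper: starts from the two endpoints and repeatedly adds
    the point with the largest vertical distance to the line joining its
    neighbouring kept points.
    """
    n = len(series)
    k = max(k, 2)
    if k >= n:
        return list(range(n))
    keep = [0, n - 1]
    while len(keep) < k:
        keep.sort()
        best_idx, best_dist = None, -1.0
        for left, right in zip(keep, keep[1:]):
            for i in range(left + 1, right):
                slope = (series[right] - series[left]) / (right - left)
                interp = series[left] + slope * (i - left)
                dist = abs(series[i] - interp)
                if dist > best_dist:
                    best_idx, best_dist = i, dist
        keep.append(best_idx)
    return sorted(keep)


# Usage: reduce a toy series to 3 points and score it against the original.
original = [0.0, 1.0, 0.0, 5.0, 0.0, 1.0, 0.0]
kept = pip_reduce(original, 3)
reduced = [original[i] for i in kept]
similarity = gaussian_kernel(dtw_distance(original, reduced), sigma=5.0)
print(kept, reduced, similarity)
```

A similarity near 1 means the reduced series stays close to the original under warping; the choice of `sigma` controls how quickly similarity decays with distance and would need tuning for real sensor data.
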
- Ali, M., Alqahtani, A., Jones, M.W., & Xie, X. (2019). Clustering and Classification for Time Series Data in Visual Analytics: A Survey. *IEEE Access*, 7, 181314–181338. https://doi.org/10.1109/ACCESS.2019.2958551
- Anand, A., Gawande, R., Jadhav, P., Shahapurkar, R., Devi, A., & Kumar, N. (2020). Intelligent Vehicle Speed Controlling and Pothole Detection System. *E3S Web of Conferences*, 170, 02010. https://doi.org/10.1051/e3sconf/202017002010
- Ashraf, M., Anowar, F., Setu, J.H., Chowdhury, A.I., Ahmed, E., Islam, A., & Al-Mamun, A. (2023). A Survey on Dimensionality Reduction Techniques for Time-Series Data. *IEEE Access*, 11, 42909–42923. https://doi.org/10.1109/ACCESS.2023.3269693
- Bairagi, V. (2018). EEG Signal Analysis for Early Diagnosis of Alzheimer Disease Using Spectral and Wavelet Based Features. *International Journal of Information Technology*, 10(3), 403–412. https://doi.org/10.1007/s41870-018-0165-5
- Biemann, D.C., & Masseglia, F. (n.d.). Time Series Clustering in the Field of Agronomy Cluster Analyse Agronomischer Zeitreihen. *Master-Thesis*, p70.
- Camerra, A., Palpanas, T., Shieh, J., & Keogh, E. (2010). iSAX 2.0: Indexing and Mining One Billion Time Series. In: *2010 IEEE International Conference on Data Mining*, p58–67. https://doi.org/10.1109/ICDM.2010.124
- De Oliveira Marques, E.S., Alves, K.S.T.R., Pekaslan, D., & De Aguiar, E.P. (2022). Kernel Evolving Participatory Fuzzy Modeling for Time Series Forecasting: New Perspectives Based on Distance Measures. In: *2022 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE)*, p1–8. https://doi.org/10.1109/FUZZ-IEEE55066.2022.9882602
- Eddelbuettel, D., & François, R. (2011). Rcpp: Seamless R and C++ Integration. *Journal of Statistical Software*, 40(8), 1–18. https://doi.org/10.18637/jss.v040.i08
- Giorgino, T. (2009). Computing and Visualizing Dynamic Time Warping Alignments in R: The dtw Package. *Journal of Statistical Software*, 31(7), 1–24. https://doi.org/10.18637/jss.v031.i07
- He, Z., Zhang, C., & Cheng, Y. (2023). Similarity Measurement and Classification of Temporal Data Based on Double Mean Representation. *Algorithms*, 16(7), 347. https://doi.org/10.3390/a16070347
- (n.d.). Available from: https://acmbulletin.fiit.stuba.sk/vol10num2/vol10num2.pdf
- Hussein, D., Nelson, L., & Bhat, G. (2024). Sensor-Aware Classifiers for Energy-Efficient Time Series Applications on IoT Devices. *arXiv*. https://doi.org/10.48550/arXiv.2407.08715
- Ines Silva, M., & Henriques, R. (2020). Exploring Time-Series Motifs through DTW-SOM. In: *2020 International Joint Conference on Neural Networks (IJCNN)*, p1–8. https://doi.org/10.1109/IJCNN48605.2020.9207614
- Jiménez, P., Nogal, M., Caulfield, B., & Pilla, F. (2016). Perceptually Important Points of Mobility Patterns to Characterise Bike Sharing Systems: The Dublin Case. *Journal of Transport Geography*, 54, 228–239. https://doi.org/10.1016/j.jtrangeo.2016.06.010
- Juliusdottir, T. (2023). topr: An R Package for Viewing and Annotating Genetic Association Results. https://doi.org/10.21203/rs.3.rs-2499681/v1
- Laurinec, P. (2018). TSrepr R Package: Time Series Representations. *Journal of Open Source Software*, 3(23), 577. https://doi.org/10.21105/joss.00577
- Laurinec, P., & Lucka, M. (2016). Comparison of Representations of Time Series for Clustering Smart Meter Data. In: *Proceedings of the World Congress on Engineering and Computer Science (WCECS 2016)*, p6.
- Lin, J., Keogh, E., Lonardi, S., & Chiu, B. (2003). A Symbolic Representation of Time Series, with Implications for Streaming Algorithms. In: *Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD ’03)*, p2. https://doi.org/10.1145/882082.882086
- Matsila, H., & Bokoro, P. (2018). Load Forecasting Using Statistical Time Series Model in a Medium Voltage Distribution Network. In: *IECON 2018 - 44th Annual Conference of the IEEE Industrial Electronics Society*, p4974–4979. https://doi.org/10.1109/IECON.2018.8592891
- Meng, J., Huo, X., He, C., & Zhu, C. (2024). Dimension Reduction of Multi-Source Time Series Sensor Data for Industrial Process. In: *2024 IEEE 33rd International Symposium on Industrial Electronics (ISIE)*, p1–6. https://doi.org/10.1109/ISIE54533.2024.10595725
- Montero, P., & Vilar, J.A. (2014). TSclust: An R Package for Time Series Clustering. *Journal of Statistical Software*, 62(1), 1–43. https://doi.org/10.18637/jss.v062.i01
- Ngabesong, R., & McLauchlan, L. (2019). Implementing “R” Programming for Time Series Analysis and Forecasting of Electricity Demand for Texas, USA. In: *2019 IEEE Green Technologies Conference (GreenTech)*, p1–4. https://doi.org/10.1109/GreenTech.2019.8767131
- Salvador, S., & Chan, P. (n.d.). FastDTW: Toward Accurate Dynamic Time Warping in Linear Time and Space.
- Sharma, S.K., Phan, H., & Lee, J. (2020). An Application Study on Road Surface Monitoring Using DTW Based Image Processing and Ultrasonic Sensors. *Applied Sciences*, 10(13), 4490. https://doi.org/10.3390/app10134490
- Singh, V., & Meena, N. (2009). Engine Fault Diagnosis Using DTW, MFCC, and FFT. In: U. S. Tiwary, T. J. Siddiqui, M. Radhakrishna, & M. D. Tiwari (Eds.), *Proceedings of the First International Conference on Intelligent Human Computer Interaction*. Springer, India, p83–94. https://doi.org/10.1007/978-81-8489-203-1_6
- Tanwar, H., & Kakkar, M. (2017). Performance Comparison and Future Estimation of Time Series Data Using Predictive Data Mining Techniques. In: *2017 International Conference on Data Management, Analytics and Innovation (ICDMAI)*, p9–12. https://doi.org/10.1109/ICDMAI.2017.8073477
- Tonle, F., Tonnang, H., Ndadji, M., Tchendji, M., Nzeukou, A., Senagi, K., & Niassy, S. (2024). Advancing Multivariate Time Series Similarity Assessment: An Integrated Computational Approach (Version 1). *arXiv*. https://doi.org/10.48550/ARXIV.2403.11044
- Wang, X., Ding, H., Trajcevski, G., Scheuermann, P., & Keogh, E. (2010). Experimental Comparison of Representation Methods and Distance Measures for Time Series Data. arXiv:1012.2789 [cs]. https://doi.org/10.48550/arXiv.1012.2789
- Wang, Y., Xu, Y., Yang, J., Chen, Z., Wu, M., Li, X., & Xie, L. (2023). SEnsor Alignment for Multivariate Time-Series Unsupervised Domain Adaptation. *Proceedings of the AAAI Conference on Artificial Intelligence*, 37(8), 10253–10261. https://doi.org/10.1609/aaai.v37i8.26221