Hybrid prediction model by integrating machine learning techniques with MLOps

Recent advances in machine learning (ML) have sparked widespread interest in integrating DevOps capabilities into software and services in the information technology sector, and this goal has compelled organizations to revise their development processes. We propose an ML operations (MLOps) model based on a meta-ensembling algorithm for a gradient boosting regressor, with real estate price prediction as a case study. The training and test data are loaded with shape (1460, 80), that is, 1,460 records described by 80 predictive variables, with the sale price as the target variable. The forecasting model is developed using an artificial neural network and LASSO, a regularized linear regression model, with the Heroku platform used for model deployment. The methodology covers data pre-processing and feature engineering, followed by feature selection, model building, evaluation, and the creation and calling of application programming interfaces (APIs) for deployment as IaaS, across the research, development, and production phases. The model is built in Jupyter notebooks from the Anaconda distribution using various Python libraries, with Docker used to ensure reproducibility and robustness. To ensure good business value, the performance of the proposed model is evaluated with the area under the ROC curve (AUC-ROC), a classification metric, alongside the regression error metrics mean squared error (MSE), root mean squared error (RMSE), and R-squared. Our work serves as a useful reference for building and deploying ML pipeline platforms in practice.
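The modeling and evaluation steps described in the abstract can be sketched with scikit-learn as a stacked (meta-ensembled) regressor. This is a minimal sketch under stated assumptions, not the authors' implementation. Synthetic data from `make_regression` stands in for the (1460, 80) housing table. LASSO and a gradient boosting regressor serve as base learners, and a ridge regression (`RidgeCV`) is assumed as the meta-learner. All hyperparameters are illustrative. The neural network component, the Heroku deployment, and AUC-ROC (which needs a binary target) are outside its scope.

```python
# Minimal stacking (meta-ensembling) sketch for a sale-price regressor.
# Assumptions (not from the paper): synthetic data replaces the housing table,
# RidgeCV is the meta-learner, and hyperparameters are illustrative only.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_stacked_model(random_state=0):
    # Base learner 1: LASSO (L1-regularized linear regression). Features are
    # standardized first because the L1 penalty depends on feature scale.
    lasso = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=random_state))
    # Base learner 2: gradient boosting regressor.
    gbr = GradientBoostingRegressor(
        n_estimators=200, learning_rate=0.05, max_depth=3, random_state=random_state
    )
    # Meta-learner: ridge regression fitted on out-of-fold base predictions (cv=5).
    return StackingRegressor(
        estimators=[("lasso", lasso), ("gbr", gbr)],
        final_estimator=RidgeCV(),
        cv=5,
    )


def evaluate(y_true, y_pred):
    # Regression error metrics named in the abstract: MSE, RMSE, and R-squared.
    mse = mean_squared_error(y_true, y_pred)
    return {"mse": float(mse), "rmse": float(np.sqrt(mse)), "r2": float(r2_score(y_true, y_pred))}


# Synthetic stand-in mirroring the dataset shape: 1460 rows x 80 features.
X, y = make_regression(
    n_samples=1460, n_features=80, n_informative=20, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = build_stacked_model().fit(X_train, y_train)
metrics = evaluate(y_test, model.predict(X_test))
print(metrics)
```

Because `StackingRegressor` trains the meta-learner on cross-validated (out-of-fold) predictions, the meta-learner sees base-model outputs on data those models were not fitted on. This limits leakage when the ensemble learns how to weight LASSO against gradient boosting.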