Analysis and Prediction of Relative Humidity Level using Generalized Linear Model
Authors
Adi Nugroho, Aditya Pramada Wicaksono, Achmad Choiruddin
DOI: 10.31289/jite.v7i1.9896
Published: 2023-07-28
Issue: Vol. 7 No. 1 (2023): Issues July 2023
Keywords: generalized-linear-model, humidity, multicollinearity, regularization
Abstract
Humidity is a critical climate parameter that affects sectors including agriculture, health, and energy, so its influencing factors need to be well understood. This study investigates how climatic variables (temperature, rainfall, sunshine duration, wind speed, and wind direction) influence humidity levels in DKI Jakarta from 2019 to 2022. The objective is to develop a time-independent predictive model for humidity based on historical climate data. The methodology comprises data pre-processing to impute missing values and replace outliers, followed by exploratory data analysis to examine the distribution of each variable and the relationships between variables. A multiple regression model was fitted first, and regularization was then applied via a generalized linear model to improve prediction accuracy. Results indicate that temperature, rainfall, sunshine duration, and wind direction significantly affect humidity levels in the investigated period. High correlation between the predictors caused multicollinearity and overfitting in the initial model. Regularization, with the model trained on 75% of the historical dataset, mitigated these issues and improved accuracy: the Elastic-Net Regression Model achieved a lower Mean Squared Error (MSE) of 12.2, compared with 12.5 for the initial Multiple Regression Model. These findings have potential implications for weather forecasting and climate change studies.
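The workflow summarized in the abstract (imputation, outlier replacement, a 75% training split, multiple regression, then elastic-net regularization compared by MSE) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the data are synthetic rather than the BMKG observations, missing values are filled with column medians, outliers are winsorized at the 5th and 95th percentiles, and the penalty settings `lam` and `alpha` are arbitrary. The resulting MSE values are therefore not expected to match the figures reported above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-in data (assumption: NOT the BMKG observations).
temp = rng.normal(28.0, 1.5, n)                        # temperature
sun = 5.0 + 0.6 * (temp - 28.0) + rng.normal(0, 1, n)  # sunshine, correlated with temp
rain = rng.gamma(2.0, 4.0, n)                          # rainfall
wind_speed = rng.normal(2.0, 0.5, n)
wind_dir = rng.uniform(0.0, 360.0, n)
X = np.column_stack([temp, sun, rain, wind_speed, wind_dir])
y = 80 - 2.0 * (temp - 28) - 0.8 * (sun - 5) + 0.15 * rain + rng.normal(0, 3, n)

# Inject some missing rainfall values and a few temperature outliers.
X[rng.choice(n, 20, replace=False), 2] = np.nan
X[rng.choice(n, 5, replace=False), 0] += 15.0

def impute_median(X):
    """Replace NaNs in each column with that column's median (assumed method)."""
    X = X.copy()
    med = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = med[cols]
    return X

def winsorize(X, lower=5, upper=95):
    """Clip each column to its [lower, upper] percentile range (assumed cutoffs)."""
    lo, hi = np.percentile(X, [lower, upper], axis=0)
    return np.clip(X, lo, hi)

def soft_threshold(z, g):
    return np.sign(z) * max(abs(z) - g, 0.0)

def elastic_net(X, y, lam=0.1, alpha=0.5, n_iter=200):
    """Coordinate descent for the glmnet-style objective
    (1/2n)*||y - b0 - X b||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2).
    Assumes the columns of X are centered, so the intercept is mean(y)."""
    n_obs, p = X.shape
    b0, b = y.mean(), np.zeros(p)
    r = y - b0                                # current residual
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]               # remove feature j's contribution
            z = X[:, j] @ r / n_obs
            denom = X[:, j] @ X[:, j] / n_obs + lam * (1 - alpha)
            b[j] = soft_threshold(z, lam * alpha) / denom
            r -= X[:, j] * b[j]               # add the updated contribution back
    return b0, b

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

# Pre-process, then split 75% / 25% into training and test sets.
X_clean = winsorize(impute_median(X))
perm = rng.permutation(n)
cut = int(0.75 * n)
Xtr, Xte = X_clean[perm[:cut]], X_clean[perm[cut:]]
ytr, yte = y[perm[:cut]], y[perm[cut:]]

# Standardize using training-set statistics only.
mu, sd = Xtr.mean(axis=0), Xtr.std(axis=0)
Ztr, Zte = (Xtr - mu) / sd, (Xte - mu) / sd

# Baseline: ordinary multiple regression via least squares.
A = np.column_stack([np.ones(cut), Ztr])
beta = np.linalg.lstsq(A, ytr, rcond=None)[0]
mse_ols = mse(yte, beta[0] + Zte @ beta[1:])

# Regularized model: elastic net with arbitrary illustrative penalties.
b0, b = elastic_net(Ztr, ytr, lam=0.1, alpha=0.5)
mse_en = mse(yte, b0 + Zte @ b)

print(f"Test MSE, multiple regression: {mse_ols:.3f}")
print(f"Test MSE, elastic net:         {mse_en:.3f}")
```

In practice `lam` and `alpha` would be chosen by cross-validation on the training set rather than fixed by hand. Setting `lam=0` reduces the elastic-net fit to ordinary least squares, which gives a convenient sanity check on the solver.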
License
Copyright (c) 2023 Adi Nugroho, Aditya Pramada Wicaksono, Achmad Choiruddin (Author)
This work is licensed under a Creative Commons Attribution 4.0 International License.