Application of machine learning algorithms to PM2.5 concentration analysis in the state of São Paulo, Brazil

Angela Rosa Locateli Godoy; Ana Estela Antunes da Silva; Mirelle Candida Bueno; Simone Andréa Pozza; Guilherme Palermo Coelho

doi:10.5327/Z21769478782

Authors

Angela Rosa Locateli Godoy Universidade Estadual de Campinas (Unicamp) - Brazil https://orcid.org/0000-0003-2858-5189
Ana Estela Antunes da Silva Universidade Estadual de Campinas (Unicamp) - Brazil https://orcid.org/0000-0001-9886-3506
Mirelle Candida Bueno Universidade Estadual de Campinas (Unicamp) - Brazil https://orcid.org/0000-0003-2374-6123
Simone Andréa Pozza Universidade Estadual de Campinas (Unicamp) - Brazil https://orcid.org/0000-0001-7423-0982
Guilherme Palermo Coelho Universidade Estadual de Campinas (Unicamp) - Brazil https://orcid.org/0000-0002-4641-0684

Keywords:

Air pollutants; Particulate Matter; Clustering; Association Rules; Air quality; Respiratory diseases.

Abstract

Air quality monitoring data are useful in many areas of research and have varied applications, particularly in studies of the relationship between air pollution, respiratory problems, and other health hazards. The main atmospheric pollutants are ozone (O₃), sulfur dioxide (SO₂), carbon monoxide (CO), nitrogen dioxide (NO₂), and particulate matter (PM). PM is one of the main objects of study when the aim is to protect people from exposure to pollutants. This study contributes to the analysis of PM₂.₅ at 21 stations in the state of São Paulo monitored by the Environmental Company of São Paulo State (CETESB). It employs cluster analysis, a prominent data mining method for detecting patterns and discovering similarities, which is important for assessing air pollution, especially in a geographically vast area such as the state of São Paulo, which does not follow a single pattern. A second data mining technique, association rules, supports the analysis of the relationship between pollutants and meteorological variables, as it identifies relationships between elements that occur together in a wide variety of data. Our objectives include identifying stations with similar behavior and exploring the temporal variation of the pollutant in relation to the meteorological factors that dominate in periods of high concentration. The clustering algorithm automatically separates stations according to their monthly average PM₂.₅ concentrations between 2017 and 2019. The clusters of stations with the highest pollution levels essentially comprised urban centers with industrial and vehicular emissions, while those with the lowest levels were located further inland. A cyclical pattern in pollutant variation was also observed across the three years under study and for both clusters.
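As a rough illustration of grouping stations by their monthly PM₂.₅ averages, the sketch below runs a plain k-means on four made-up station profiles. The abstract does not say which clustering algorithm or how many clusters the study used, so the choice of k-means, k = 2, the station names, and all values are assumptions made only for this example.

```python
from math import dist

# Synthetic monthly PM2.5 means (µg/m³), Jan..Dec. These are toy values
# invented for illustration, NOT measurements from the study.
profiles = {
    "urban_a":  [18, 17, 19, 22, 26, 30, 33, 31, 27, 22, 19, 18],
    "urban_b":  [17, 16, 18, 21, 25, 29, 31, 30, 26, 21, 18, 17],
    "inland_a": [10,  9, 10, 12, 14, 17, 19, 18, 15, 12, 10, 10],
    "inland_b": [ 9,  9, 11, 12, 15, 16, 18, 17, 14, 11, 10,  9],
}


def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def kmeans(points, k, iterations=20):
    """Plain k-means on a dict name -> vector; returns name -> cluster index."""
    names = list(points)
    # Deterministic farthest-point initialisation (an assumption of this
    # sketch) so that the result is reproducible without a random seed.
    centroids = [points[names[0]]]
    while len(centroids) < k:
        farthest = max(
            names, key=lambda n: min(dist(points[n], c) for c in centroids)
        )
        centroids.append(points[farthest])

    labels = {}
    for _ in range(iterations):
        # Assignment step: each station joins its nearest centroid.
        new_labels = {
            n: min(range(k), key=lambda j: dist(points[n], centroids[j]))
            for n in names
        }
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [points[n] for n in names if labels[n] == j]
            if members:
                centroids[j] = mean_vector(members)
    return labels


labels = kmeans(profiles, k=2)
for station, cluster in labels.items():
    print(station, "-> cluster", cluster)
```

Each station is represented by a 12-value vector, so stations end up together when their whole seasonal profile is similar, not only their annual mean.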
For the months with the highest PM₂.₅ concentrations, association rule learning was applied to relate air temperature, relative humidity, and wind speed to PM₂.₅ and carbon monoxide (CO) concentrations. The results are useful for analyzing the temporal and spatial profiles of particulate matter pollution, since they identify the behavior of the meteorological factors that predominate in periods of higher concentration.
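To make the idea of association rules between meteorological variables and pollutant levels concrete, the sketch below computes support and confidence for rules whose consequent is a high PM₂.₅ level. It is a brute-force enumeration over small antecedents, not the mining algorithm used in the study (the abstract does not name one). The discretised records, the item labels, and the thresholds are all hypothetical.

```python
from itertools import combinations

# Synthetic discretised daily records, invented for illustration only.
# They are NOT data from the study.
days = [
    {"temp=low", "humidity=low", "wind=low", "CO=high", "PM2.5=high"},
    {"temp=low", "humidity=low", "wind=low", "CO=high", "PM2.5=high"},
    {"temp=low", "humidity=high", "wind=low", "CO=high", "PM2.5=high"},
    {"temp=high", "humidity=high", "wind=high", "CO=low", "PM2.5=low"},
    {"temp=high", "humidity=low", "wind=high", "CO=low", "PM2.5=low"},
    {"temp=low", "humidity=high", "wind=high", "CO=low", "PM2.5=low"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def rules_for(target, transactions, min_support=0.3, min_confidence=0.8,
              max_len=2):
    """Enumerate rules 'antecedent -> target' that pass both thresholds."""
    items = sorted(set().union(*transactions) - {target})
    found = []
    for size in range(1, max_len + 1):
        for antecedent in combinations(items, size):
            a = frozenset(antecedent)
            s_rule = support(a | {target}, transactions)
            s_antecedent = support(a, transactions)
            if s_antecedent == 0 or s_rule < min_support:
                continue
            # confidence = support(antecedent and target) / support(antecedent)
            confidence = s_rule / s_antecedent
            if confidence >= min_confidence:
                found.append((antecedent, s_rule, confidence))
    return found


rules = rules_for("PM2.5=high", days)
for antecedent, s, c in rules:
    print(f"{antecedent} -> PM2.5=high  support={s:.2f}  confidence={c:.2f}")
```

In this toy data, low wind speed always co-occurs with high PM₂.₅, so that rule passes both thresholds, while low temperature alone does not reach the confidence threshold because it also appears on a low-concentration day.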
