MULTIVARIATE STATISTICAL ANALYSIS AND USE OF GEOGRAPHIC INFORMATION SYSTEMS IN RAW WATER QUALITY ASSESSMENT

This study aimed to apply a methodology for evaluating raw water quality and its relationship with land uses and occupations through multivariate statistical analysis and a Geographic Information System (GIS). Hydrogen potential (pH), water temperature, dissolved oxygen, biochemical oxygen demand, chemical oxygen demand, total nitrogen, total phosphorus, and E. coli were monitored from August 2012 until March 2013. The geoprocessing tool enabled the delimitation of the contribution area of each sampling site, as well as the individual identification of land use in each area. Principal Component Analysis yielded three components, interpreted as domestic sewage, domestic sewage/agriculture, and industrial discharge. Significant correlations were identified between the variable urban area and pH (ρ = 0.446; p = 0.049), dissolved oxygen (ρ = -0.625; p = 0.003), total nitrogen (ρ = 0.649; p = 0.002), and E. coli (ρ = 0.932; p < 0.001). The methodology made it possible to identify the contribution of land use factors to water quality.


INTRODUCTION
Environmental conditions, especially within hydrographic basins, can be assessed from records of events on the Earth's surface, using geotechnology tools capable of providing information on the local geography; together with management processes of urban occupation, these tools bring a new meaning to urban planning. Thus, it is possible to systematically analyze human-environment interaction processes, even if a landscape partition is used as the analysis unit. Hence, geotechnologies related to remote sensing and geographic information systems (GIS) are increasingly interconnected, considering their applications in different fields of knowledge (FLORENZANO, 2005). Robust data sets, based on technological contributions, allow inferences on environmental issues, as well as the production of good quality cartographic material, such as maps of land use, occupation and land parceling, deforestation, agricultural activities, silting and pollution of water bodies, and soil erosion losses, which can be the decision-making basis in environmental management processes (PORTO & HARTWIG, 2016).
Adding statistics to this scenario is of great value for analyzing data, relating them and, if possible, grouping them, allowing a more cohesive and precise interpretation of factors. A comprehensive water quality monitoring program encompasses the spatial and temporal assessment of many relevant variables, generating a data matrix that is complex and, in most cases, difficult to interpret. Thus, the application of different multivariate statistical techniques, such as Principal Component Analysis (PCA), can facilitate the interpretation of complex data and enable the identification of possible factors that are unfavorably influencing water systems (SINGH; MALIK; SINHA, 2005; PAZ; FREITAS; NICOLA, 2006; DINIZ; SOARES; CABRAL, 2012; FIGUEIREDO FILHO; SILVA JÚNIOR; ROCHA, 2012; SHEELA et al., 2012; RUŽDJAK & RUŽDJAK, 2015). Likewise, GIS have been increasingly used in integrated environmental assessments (LIU & DE SMEDT, 2005; CARDOSO et al., 2015; ELLIOTT et al., 2016). The use of visualization and mapping tools within the GIS platform enables the extraction of georeferenced information from the crossing and analysis of various thematic maps, which provide information on various components of the environment, such as soil type, geology, geomorphology, land use, vegetal cover or declivity. Using these tools allows the relationships between quantitative and qualitative variables of the environment to be known, which may help in the identification of risk areas and the elaboration of zoning plans (FLORENZANO, 2005; CARDOSO et al., 2015; ELLIOTT et al., 2016).
Water quality has become an environmental issue of primary concern, mainly due to the vulnerability and increasing pollution of water resources, caused by factors such as rapid and disorderly urban and industrial growth, which end up compromising the restoration capacity of water bodies. The negative impacts caused by pollution promote an imbalance of natural flows and cycles and cause a series of significant environmental impacts. In this perspective, studies focused on the analysis of spatial and temporal transformations occurring in the elements and attributes of the environmental system may reveal many anthropogenic mechanisms capable of causing negative impacts, and can be used in the planning and conservation of these areas (CHEN et al., 2016; TRENTIN, 2009).
In general, studies that establish water quality profiles related to anthropogenic factors evaluate these relationships using only a qualitative approach: they use GIS data without establishing a statistical relationship with water quality data. However, some studies do combine statistics with other study factors using quantitative methods. Farhan et al. (2017), for instance, used GIS and PCA to assess basins in an integrated manner, determining their prioritization. Alvarado et al. (2016) used multi-criteria decision analysis (MCDA), with an integrated discussion of factors, as a decision tool to facilitate the prioritization of consumer wells that would need more protection against the risk of contamination. Also connecting different assessments, Rahman et al. (2016) conducted a study that aimed to determine and evaluate spatial and temporal changes in groundwater using GIS, linear regression, the Mann-Kendall trend test, and Sen's slope estimator. Regarding water quality, Bhutiani et al. (2015) evaluated the environmental impact of sociocultural practices on the water quality of the Ganga River, in India. In that study, the physical-chemical parameters that contributed to the temporal variation and pollution in the river were identified, and PCA and cluster analysis (CA) were used to identify anthropogenic factors (industrial, urban, sewage, agriculture, land use, and mining activities) and natural factors (soil erosion, inclement weather).
Hence, the development and application of methodologies capable of integrating data from different areas of knowledge are of great value. This study aimed to apply a methodology to evaluate raw water quality and its relationship with land uses and occupations (urban use) through multivariate statistical analysis (PCA) and GIS. The micro basin of Santa Bárbara stream (MSBS) was taken as a case study, due to its importance for southern Brazil, in the state of Rio Grande do Sul, especially in the municipality of Pelotas. Santa Bárbara Dam is the main source of drinking water in the municipality and is currently characterized as an area of urban-industrial expansion.

Multivariate statistical analyses
PCA was used for the integrated analysis. The Kolmogorov-Smirnov normality test was applied to determine whether the water quality data were normally distributed. As the data set was non-normally distributed according to this test (p < 0.05), the Kruskal-Wallis non-parametric test (p < 0.05) was used to determine the differences in parameter concentrations between sampling sites (SS), followed by the Student-Newman-Keuls post-hoc test. The Kolmogorov-Smirnov test was performed using the IBM SPSS Statistics 24 software, while the Kruskal-Wallis and Student-Newman-Keuls tests used the BioEstat 5.0 software.
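To make the between-site comparison concrete, the following is a minimal pure-Python sketch of the tie-corrected Kruskal-Wallis H statistic. It is not the BioEstat procedure used in the study; the function names and the dissolved oxygen values are hypothetical and serve only as illustration. With four sampling sites the statistic has 3 degrees of freedom, so H is compared with the chi-square critical value of 7.815 at the 5% level.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Correct H for ties among the pooled observations.
    tie_counts = {}
    for value in pooled:
        tie_counts[value] = tie_counts.get(value, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


# Hypothetical dissolved oxygen values (mg/L), eight monthly samples per site.
do_by_site = [
    [7.1, 6.8, 7.4, 6.9, 7.0, 6.5, 7.2, 6.7],  # SS1
    [6.2, 6.0, 6.6, 5.9, 6.3, 5.8, 6.4, 6.1],  # SS2
    [2.1, 1.8, 3.0, 2.5, 1.9, 2.2, 2.8, 2.0],  # SS3
    [3.1, 2.9, 3.6, 3.3, 2.7, 3.0, 3.5, 3.2],  # SS4
]
CHI2_CRITICAL_DF3 = 7.815  # chi-square critical value, df = 3, alpha = 0.05
h_stat = kruskal_wallis_h(do_by_site)
print(f"H = {h_stat:.2f}; reject H0 at the 5% level: {h_stat > CHI2_CRITICAL_DF3}")
```

A significant H only indicates that at least one site differs; a post-hoc procedure such as the one cited above is still needed to locate the differences.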
Finally, Spearman's correlation analysis was used to identify significant correlations (p < 0.05) between land use (urbanization) and water quality parameters.
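As an illustration of the correlation step, Spearman's coefficient can be sketched as the Pearson correlation of the ranked data. The code below is a minimal pure-Python version under that definition, not the software routine used in the study; the function names and the urban area and E. coli values are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sd_x * sd_y)


# Hypothetical paired observations: urban share of the contribution area (%)
# and an E. coli count for the same sample. Values are illustrative only.
urban_area_pct = [2.0, 2.0, 5.0, 5.0, 18.0, 18.0, 31.0, 31.0]
e_coli_count = [120, 90, 400, 650, 2100, 1800, 2400, 2300]
print(f"rho = {spearman_rho(urban_area_pct, e_coli_count):.3f}")
```

Because the coefficient depends only on ranks, it is insensitive to the skewed, non-normal distributions reported for the water quality data.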

Geoprocessing techniques
Land use classes can be mapped through the photographic interpretation of satellite images. Photo interpretation is a technique that consists of extracting qualitative information from a photograph by visual interpretation. Image classification methods can be divided initially into two types: automatic and manual. Automatic classifications are based on the extrapolation of calibrated samples using GIS software. In this study, land use and land cover were classified in the same manner as in previous studies (ZHANG et al., 2014).

CASE STUDY: MSBS

Study area description
Pelotas is a Brazilian city located in the south of Rio Grande do Sul State, with a total area of around 1,610 km², which is home to the third largest population of the state, estimated at 342,873 inhabitants (IBGE, 2016). The municipal drinking water is supplied by the Serviço Autônomo de Saneamento de Pelotas (SANEP, acronym in Portuguese), which built Santa Bárbara Dam in 1968 by damming the Santa Bárbara stream; its 352 ha flooded area has an estimated volume of 10 billion liters of water. The dam provides raw water to the Santa Bárbara Water Treatment Station (WTS), whose total capacity is 40 million liters of water per day, supplying eight districts, which corresponds to 80% of the city's urban area (SANEP, 2016).
Pelotas has 67% of its houses served by sewage collection networks and two Sewage Treatment Stations (STS), which together treat 40% of the sewage collected from the urban area. The urban drainage system is composed of pump houses and collector and conductor channels, and Santa Bárbara Stream is one of the main drains, into which the effluents from the industrial district are discharged (SANEP, 2016). The MSBS is located in the southern portion of the municipality of Pelotas, Rio Grande do Sul, Brazil, at the intersection of the BR 471 and BR 116 highways (Figure 1).
The sampling sites (SS) 1, 2, 3, and 4 are located in four areas of the MSBS. SS2, SS3 and SS4 are in the Santa Bárbara stream.

SS1 (29º38'28.54"S and 51º06'38.9"W) is the least urbanized site, located in Passo do Cunha Stream, which is one of the MSBS tributaries. Its spring is north of the dam and receives predominantly agricultural effluents, originating from dairy farming, fruit growing, poultry farming, and afforestation. This site has a depth of 40 cm and margins with arboreal vegetation, without solid urban or industrial residues. The water is clean and clear, with no unpleasant odors. The sediment has a light red color, also without unpleasant odor.

SS2 (29º38'55.22"S and 51º10'13.41"W) is the closest site to the source of Santa Bárbara stream, with a depth of 20 cm and without riparian vegetation. No solid urban or industrial waste was identified there. Its water is muddy and light in color, with no unpleasant odors, and the sediment is light-colored, also without unpleasant odors. This site presents agricultural and pastoral activity, distributed among small farms.

SS3 (29º39'31.8"S and 51º6'31.52"W) is in an urbanized area and has an average depth of 150 cm. Its waterway is bordered on the right by an avenue that connects the BR 116 road to the center of Pelotas; upstream on the right are the industrial district and a stabilization pond and, on the left, an irregular solid waste disposal site, already deactivated. Near the sampling site there are about 20 low-income irregular households, and in this area the channel receives untreated domestic and industrial effluents from activities such as rice processing, mechanical maintenance of vehicles, small candy industries, and agricultural trade, among others. As there is no minimum flow, this point receives only pollutants and presents a very silted channel. The water color varies from dark green to black, and the water is completely muddy, fetid, and viscous.

SS4 (29º42'22.62"S and 51º05'17.93"W) is located near the river mouth and is characterized by a significant amount of effluent discharge and degraded riparian vegetation. This site is near the margins of BR 392, which connects Pelotas to the port of Rio Grande. There is heavy truck traffic at this site, and it also receives contributions from rice planting (SIMON; TRENTIN; CUNHA, 2010).

Water sample collection
Water samples were collected monthly (August 2012 to March 2013), in periods without precipitation, to avoid the influence of this variable in the data analysis. The collection period includes all seasons of the year, periods of highest and lowest rainfall in the basin and all temperature ranges observed over a year, according to Figure 2 (INMET, 2017).

Elaboration of the land use map of the micro basin of Santa Bárbara stream
Land use classification was elaborated from the photo interpretation of a satellite image of the MSBS locality extracted from Google Earth Pro, with a spatial resolution of 6 m (image dated March 11th, 2016; Datum WGS 84). In the ArcGIS software, the image was georeferenced from known points and then clipped to the basin boundary using the clip tool. The layers for digitizing the land use patches were created manually, with the aim of reducing spectral confusion between some classes and granular

RESULTS AND DISCUSSION
Descriptive statistics
Table 1 shows the results of statistical tests, as well as the descriptive statistics regarding water quality parameters monitored at the four points of water collection in the MSBS.
According to Table 1, significant differences were identified between the sampling sites in pH, DO, TN and E. coli. This indicates that, among the parameters monitored in the study period, these are the ones that contribute most to the differences in water quality between sampling points. In SS3 and SS4, the E. coli parameter exceeded the maximum detection limit of the method. Figure 3 shows box plots of the analyzed parameters, in which differences and similarities between sampling sites can be seen more clearly.
The DO, BOD and TP were compared with Resolution 357/2005 of the Conselho Nacional do Meio Ambiente (CONAMA, acronym in Portuguese) (BRASIL, 2005). DO in SS3 and SS4 presented mean and median values of less than 4 mg L⁻¹ and, therefore, can be classified as class IV (the worst class established by CONAMA Resolution 357/2005), with water uses restricted to landscape harmony and navigation. For TP, the average values found in the four sampling sites exceeded by more than 40 times the maximum value established for class III, which is 0.15 mg L⁻¹ for lotic environments, so the four points studied can be classified as class IV.
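The size of the TP exceedance can be made explicit with a short calculation. The sketch below uses only the class III limit quoted above (0.15 mg/L); the function name and the sample mean are hypothetical and are not values from Table 1.

```python
# Class III limit for total phosphorus in lotic environments, as quoted in the text.
TP_CLASS_III_LIMIT_MG_L = 0.15


def exceedance_factor(concentration_mg_l, limit_mg_l):
    """Return how many times a concentration exceeds a regulatory limit."""
    if limit_mg_l <= 0:
        raise ValueError("limit must be positive")
    return concentration_mg_l / limit_mg_l


# Concentration at which TP equals exactly 40 times the class III limit.
tp_at_40x = 40 * TP_CLASS_III_LIMIT_MG_L

# Hypothetical site mean (mg/L), for illustration only.
hypothetical_mean_tp = 6.5
factor = exceedance_factor(hypothetical_mean_tp, TP_CLASS_III_LIMIT_MG_L)
print(f"TP at 40x the limit: {tp_at_40x:.1f} mg/L")
print(f"Hypothetical mean exceeds the limit {factor:.1f} times")
```

In other words, an exceedance of more than 40 times implies mean TP concentrations above 6 mg/L at every sampling site.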

The contrast between DO and TN concentrations in SS3 is quite visible, which may be associated with the contribution of high organic loads at this site, possibly from domestic sewage and/or nitrogen fertilizer leaching, which contribute to the depletion of DO (ALVES et al., 2018). Another fact that corroborates this hypothesis is the high concentration of E. coli, which increases significantly in SS2 and reaches even higher values in SS3 and SS4.

Soil classes and their characteristics
Water courses: These include water courses, lakes, and reservoirs.
Wetlands: Areas where the water table lies at the earth's surface, or above it, for most of the time.
Cultivation areas: Areas used to produce food and fiber.
Native woods: Areas easily located in aerial photographs due to their texture, coloration and irregularities in the canopy composition.
Forestry: Areas destined to the cultivation of trees that are exotic to the region and have economic value.
Pastures: Areas where the potential natural vegetation consists predominantly of grasses, graminoid plants, other grasses, pastures, or shrubs.
Quarry: Areas of shallow soil, often featuring open pit mines, quarries and gravel mines.
Transition areas: Those that do not fit into the characteristics of other land use classes.
Urban areas: Areas of intensive use, with most of the land covered by structures.

Principal component analysis
The PCA was applied to the data set in order to identify the main factors responsible for variations in MSBS water quality. The KMO test value was 0.53, which is higher than the acceptable critical threshold (HAIR et al., 2009). Bartlett's sphericity test was statistically significant (p < 0.01). In both cases, the tests suggest that the data are adequate for the statistical treatment, so the PCA allowed the identification of three principal components (PC), which explain 71.0% of the total data variance [...] organic loads, such as domestic sewage. The main processes that affect the oxygen concentration in water can be represented by physical (temperature) and biological parameters (oxygen consumption by living organisms): oxygen solubility decreases with increasing temperature, which leads to oxygen depletion at high temperatures. The presence of microorganisms in the water also reduces oxygen concentrations, because they consume oxygen while decomposing the biodegradable organic matter in an aerobic process (JHA; OJHA; BHATIA, 2007; VEGA et al., 1998). The microbiological action on organic loads, through nitrification processes, leads to the depletion of DO (VON SPERLING, 2005; RUŽDJAK & RUŽDJAK, 2015; ZHONG et al., 2018).
PC2 explains 20.8% of the total data variance and presents high positive factor loads for WT and TP. The parameter TP is suggestive of pollution by anthropogenic sources. Phosphorus can enter water bodies through domestic and industrial effluents, fertilizers and leachate from animal farms, in addition to the dissolution of soil compounds, although on a much smaller scale (LIBÂNIO, 2008). Thus, if inputs of phosphorus-containing wastes remain constant during hot and dry periods, when water volumes tend to be lower, phosphorus concentrations in the water body increase, mainly due to the reduction in water volume. This explains the relation between TP and WT in PC2.
PC3 explains 15.8% of the total data variance and presents a high positive factor load for COD and a high negative factor load for BOD. The increase in COD in water bodies is mainly due to industrial waste, while the increase in BOD is related to domestic sewage emissions (VON SPERLING, 2005). The negative relation between BOD and COD in PC3 is possibly due to increased concentrations of toxic substances (from industrial wastes, represented by the COD), which inhibit bacterial action on the decomposition of the organic load.
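The quantities discussed in this section (share of variance explained by each component and Bartlett's sphericity statistic) both follow from the eigenvalues of the correlation matrix. The following is a minimal pure-Python sketch using Jacobi rotations, offered as an illustration rather than the SPSS procedure used in the study; the function names and the three-variable data set are hypothetical. Bartlett's statistic is computed as -(n - 1 - (2p + 5)/6) ln|R|, with p(p - 1)/2 degrees of freedom, using the fact that ln|R| equals the sum of the logarithms of the eigenvalues.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length variables."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    norms = [math.sqrt(sum(v * v for v in col)) for col in centered]
    p = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(p)]
        for i in range(p)
    ]


def jacobi_eigenvalues(matrix, max_sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    p = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if abs(a[i][j]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (i, j) off-diagonal entry.
                theta = 0.5 * math.atan2(2 * a[i][j], a[i][i] - a[j][j])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(p):  # rotate columns i and j
                    aki, akj = a[k][i], a[k][j]
                    a[k][i] = c * aki + s * akj
                    a[k][j] = -s * aki + c * akj
                for k in range(p):  # rotate rows i and j
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k] = c * aik + s * ajk
                    a[j][k] = -s * aik + c * ajk
    return sorted((a[i][i] for i in range(p)), reverse=True)


def explained_variance(columns):
    """Share of total variance carried by each principal component."""
    eigenvalues = jacobi_eigenvalues(correlation_matrix(columns))
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


def bartlett_statistic(eigenvalues, n_samples):
    """Bartlett's sphericity chi-square statistic and degrees of freedom."""
    p = len(eigenvalues)
    log_det = sum(math.log(ev) for ev in eigenvalues)  # ln|R|
    chi2 = -(n_samples - 1 - (2 * p + 5) / 6) * log_det
    return chi2, p * (p - 1) // 2


# Hypothetical measurements (six samples of DO, TN and WT), illustrative only.
data = [
    [7.0, 6.5, 3.1, 2.4, 6.8, 2.9],       # DO, mg/L
    [1.2, 1.5, 9.8, 11.0, 1.1, 10.2],     # TN, mg/L
    [14.0, 18.0, 22.0, 25.0, 16.0, 24.0],  # WT, degrees Celsius
]
shares = explained_variance(data)
print("Explained variance:", [round(s, 3) for s in shares])
print("Bartlett (chi2, df):", bartlett_statistic(jacobi_eigenvalues(correlation_matrix(data)), 6))
```

Working on the correlation matrix rather than the covariance matrix standardizes the variables, which matters here because the parameters are measured in different units.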

Land use in the micro basin of Santa Bárbara stream
The cartographic material enabled the maps in Figure 4 to be drawn, which present the land uses of the contribution area of each sampling site monitored in the MSBS.
An important aspect to be considered regarding land use within the MSBS is the advance of the cultivated area, which grew considerably, from 10 km² in 1953 to 32 km² in 2016. Although growth was almost linear over five decades (approximately 4.5 km² per decade), the expansion of cultivation areas was interrupted between 2006 and 2016, when growth practically stabilized. Cultivation areas stand out north of Santa Bárbara Dam and are associated with smallholdings, such as farmhouses, which produce food for subsistence and as a complement to family income (SIMON; TRENTIN; CUNHA, 2010).
Among the types of land use, none grew as much in the last decade as forestry, which occupied an area of approximately 22 km² in 2006 and increased to 35.7 km² in 2016 (SIMON; TRENTIN; CUNHA, 2010). Pasture areas, in turn, dropped significantly, from 24 km² in 1953 to 1.8 km² in 2016. This land use is intrinsically linked to cattle raising, and the crisis in this sector, worsened by the closure of slaughterhouses, contributed to the decline of grazing activity. According to the Economics and Statistics Foundation (FUNDAÇÃO DE ECONOMIA E ESTATÍSTICA, 1981), the agricultural census of 1950 indicated a herd of 152,577 animals in Pelotas, while in 2006 the Brazilian Institute of Geography and Statistics (IBGE, 2016) counted 73,233 animals; that is, the herd of Pelotas was reduced by almost half.
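The changes quoted in the two paragraphs above can be checked with simple arithmetic. The sketch below uses only the figures given in the text; the helper names are hypothetical, and the five-decade span for cultivated area is taken from the text as stated.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def change_per_decade(old, new, decades):
    """Average absolute change per decade over the given span."""
    return (new - old) / decades


# Figures quoted in the text (areas in km2, herd in animals).
cultivated_rate = change_per_decade(10.0, 32.0, 5)   # 1953 to 2016, "five decades"
forestry_change = percent_change(22.0, 35.7)         # 2006 to 2016
pasture_change = percent_change(24.0, 1.8)           # 1953 to 2016
herd_change = percent_change(152577, 73233)          # 1950 census to 2006

print(f"Cultivated area: {cultivated_rate:.1f} km2 per decade")
print(f"Forestry: {forestry_change:+.1f}%")
print(f"Pasture: {pasture_change:+.1f}%")
print(f"Herd: {herd_change:+.1f}%")
```

These figures give about 4.4 km² per decade for cultivated area, an increase of roughly 62% for forestry, a decrease of 92.5% for pasture, and a herd reduction of about 52%, consistent with the approximate values stated in the text.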
After 1965, there was an increase in the area of water bodies, with the impoundment of Santa Bárbara stream in 1968, which originated Santa Bárbara Dam, as a highlight. In contrast, from 1953 to 2016 there was a marked decrease in wetlands, leaving only a third of the original area. The pressure on wetlands occurred due to drainage and landfills for the expansion of urban areas and to the capture of water to supply the population, negatively impacting soil quality and contributing to the degradation of water resources (SACCO et al., 2015).
As shown in Figure 5, the urban share of the contribution area increases progressively from SS1 to SS4. Figure 5 also shows the percentage of each land use in each contribution area, where the growth of urbanization can be observed. Table 3 shows the correlations identified between land use (urbanization) and the water quality parameters monitored in the study area.
As seen in Table 3, pH, DO, TN and E. coli showed significant correlations with the variable urban areas, that is, pH and TN presented moderate positive correlations, DO presented moderate negative correlation, whereas E. coli had a strong positive correlation. These results demonstrate the significant relation between land use (specifically urbanization) and MSBS water quality. Therefore, the greater the urbanization, the greater the degradation of MSBS water quality.
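The interpretation above can be reproduced from the coefficients reported in the abstract. The sketch below labels each correlation by intensity and significance; the cut-offs of 0.4 and 0.7 are an assumption that assigns the gaps in the published ranges (0.1-0.3 weak, 0.4-0.6 moderate, 0.7-0.9 strong) to the lower class, and the function names are hypothetical.

```python
def correlation_strength(rho):
    """Intensity label for a correlation coefficient (assumed cut-offs)."""
    magnitude = abs(rho)
    if magnitude >= 0.7:
        return "strong"
    if magnitude >= 0.4:
        return "moderate"
    if magnitude >= 0.1:
        return "weak"
    return "negligible"


def significance_marker(p_value):
    """'**' for the 1% level, '*' for the 5% level, '' otherwise."""
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""


# (rho, p) for urban area versus each parameter, as reported in the abstract.
# E. coli was reported as p < 0.001; 0.001 is used here as an upper bound.
reported = {
    "pH": (0.446, 0.049),
    "DO": (-0.625, 0.003),
    "TN": (0.649, 0.002),
    "E. coli": (0.932, 0.001),
}
for name, (rho, p) in reported.items():
    direction = "positive" if rho > 0 else "negative"
    label = correlation_strength(rho)
    print(f"{name}: {rho:+.3f}{significance_marker(p)} ({label} {direction})")
```

Under these assumptions the labels agree with the text: moderate positive for pH and TN, moderate negative for DO, and strong positive for E. coli.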

CONCLUSIONS
This study proposed a methodology that integrates multivariate statistical tools and GIS to evaluate raw water quality and its relation to land uses and occupations (urbanization) in a micro basin, and applied it to the MSBS, in Southern Brazil, as a case study. Significant statistical differences were identified for pH, DO, TN and E. coli in SS3 and SS4 compared to SS1 and SS2. They demonstrate that SS3 and SS4 present significantly higher levels of contamination, which were attributed to human activities, due to the urbanization process. The PCA resulted in three PC, which together account for 71.0% of the total data variance and are associated with anthropogenic contributions of domestic sewage (PC1), domestic sewage/agriculture (PC2), and industrial discharge (PC3).
The land use maps elaborated through GIS enabled the identification of the main factors that might be contributing to the water quality degradation of the MSBS, among which was urbanization, which occupies gradually larger areas from SS1 to SS4. Spearman's correlation analysis allowed the identification of statistically significant correlations between urban areas and pH, DO, TN and E. coli, which also stood out in the PCA.
In this study, a quantitative approach was applied to establish associations between water quality and land use. This methodology can be extrapolated to any basin, as well as to other water pollution parameters that can be associated with factors other than land use, such as population density, income, and areas with or without sanitary sewage, that is, socioeconomic factors that can contribute to the contamination of water resources.
Through the established methodology, it was possible to identify the contribution of anthropogenic activities, specifically urbanization, to water quality degradation. The results of this study show that visualization and mapping tools within the GIS platform can serve as an important means of obtaining spatial information useful for the development of environmental preservation strategies. Regarding the MSBS, the treatment of domestic sewage must be a top priority for maintaining water quality in order to ensure safe supply to the population.

Table 3 (columns: Land use; Water quality parameter; Statistics). Notes: *Correlation is significant at the 5% level (bilateral); **correlation is significant at the 1% level (bilateral); relation between correlation coefficient and correlation intensity (positive and negative): 0.1-0.3 (weak), 0.4-0.6 (moderate), and 0.7-0.9 (strong) (FIELD, 2009); WT: water temperature; DO: dissolved oxygen; BOD: biochemical oxygen demand; COD: chemical oxygen demand; TN: total nitrogen; TP: total phosphorus.