ESTIMATION OF CALCIUM CARBONATE IN ANTHROPOGENIC SOILS ON FLYSCH DEPOSITS FROM DALMATIA (CROATIA) USING VIS-NIR SPECTROSCOPY

This study aimed to evaluate the ability of Vis-NIR spectroscopy to predict the CaCO3 content of soil and to determine the contribution of the spectral ranges and wavelengths to the prediction. A total of 180 topsoil samples (0-25 cm) of anthropogenic soils derived from Flysch deposits in Dalmatia (Croatia) were analysed for CaCO3 and scanned in the laboratory with an ASD FieldSpec spectroradiometer (350-2500 nm). Partial least squares regression (PLSR) with leave-one-out cross-validation was used to calibrate the Vis-NIR spectra against the laboratory-measured CaCO3. The CaCO3 content of the investigated soils varies within a very wide range, from 186.0 to 894.7 g kg⁻¹, with a high mean value of 547.2 g kg⁻¹ and a normal, nearly symmetrical frequency distribution. The prediction parameters, namely the coefficient of determination (R²), the ratio of performance to deviation (RPD) and the range error ratio (RER), were 0.86, 2.42 and 11.4, respectively, indicating that the PLSR model predicted the soil CaCO3 content with moderately successful accuracy. The prediction error, measured as the root mean square error of prediction (RMSEP), was 57.9 g kg⁻¹. These results suggest that Vis-NIR spectroscopy combined with PLSR is acceptable as a rapid method for quality control (screening) of the CaCO3 content in the investigated soils.

8.6, respectively. The Vis-NIR technique was not as precise as conventional chemical analyses but provided an opportunity to analyse a large number of samples in a short time. CaCO3 significantly influences the reflectance characteristics of soil and is spectrally active in the NIR region (700-2500 nm). Its strongest diagnostic vibrational absorptions are at 2300-2350 nm, and three weaker bands occur near 2120-2160 nm, 1970-2000 nm and 1850-1870 nm (Clark, 1999). Soil spectra show complex absorption patterns described by a large number of highly collinear predictor variables, so the analysis of diffuse reflectance spectra requires multivariate calibration (Martens and Naes, 1989). The most common calibration method for CaCO3 (and other soil) spectra is partial least squares regression (PLSR; Wold et al. 2001). Calcium carbonate is the most common carbonate in soil and is particularly abundant under semi-arid and dry subhumid conditions (Khayamim et al. 2015). It has a marked influence on soil chemical properties such as pH and cation and anion retention. Calcium carbonate surfaces interact specifically with phosphate anions, and CaCO3 also controls the Ca concentration in the soil solution and on the soil exchange complex (Braschi et al. 2003). A high carbonate content increases pH and favours the formation of HCO3⁻ ions, which disturb the availability of some plant nutrients and can cause chlorosis, e.g. iron chlorosis (Ksouri et al. 2005). The presence of free calcium carbonate in calcareous soils ensures a very high soil buffering capacity (Bache, 1984). Carbonates interact with soil organic matter (SOM) in aggregate formation and stabilization and can thus also contribute to SOM stabilization (Virto et al. 2011). Soils derived from Flysch deposits have a high CaCO3 content that varies over a wide range (Miloš and Maleš, 1998).
Rapid, nondestructive, inexpensive and accurate determination of the carbonate content of these soils could be very useful for planning agricultural production. The aim of this work was to estimate the ability of Vis-NIR diffuse reflectance spectroscopy combined with PLSR to predict the CaCO3 content in the surface horizon of anthropogenic terraced soils derived from Flysch deposits, and to determine the contribution of the spectral ranges and wavelengths to the prediction.

MATERIAL AND METHODS
Study area and soil data
The study area is situated in the central part of the Adriatic coast of Croatia, in the wider region of the city of Split, centred around 43°32′ N, 16°29′ E. The region has a Mediterranean climate with hot summers and mild, moderately rainy winters (Köppen Csa). In Split, the mean annual air temperature for the period 1981-2010 was 15.9 °C and the mean annual precipitation was 1052 mm. Geologically, the area is built of Eocene Flysch marls, sandstones and siltstones with lenses of calcirudites and calcarenites (Marinčić et al. 1971; Marinčić et al. 1976). These sediments have a high and widely varying carbonate content: according to Miščević and Vlastelica (2014) and Vlastelica (2015), CaCO3 ranges from 42% to 79% and from 32% to 89%, respectively. The impermeable geological base and sloping terrain (slopes mostly 10-30%) make the area vulnerable to erosion, so terracing is the basic soil protection measure. The investigated soils are rich in carbonates and have an alkaline reaction, very low to medium humus content and silty loam texture (Miloš and Maleš, 1998). According to the World Reference Base for Soil Resources (IUSS Working Group WRB, 2014), they were classified as Terric Anthrosols (Calcaric, Siltic/Loamic, Escalic). Current agricultural production is characterized by small, mixed and fragmented parcels of olive groves, vineyards and Mediterranean orchards, alongside abandoned terraces. For the PLSR predictions we used laboratory CaCO3 measurements and spectra of 180 topsoil samples selected from the soil spectral library of Dalmatia, Croatia, described by Miloš (2013). The CaCO3 content was determined with a Scheibler calcimeter (JDPZ, 1966).

Spectra measurements, data pre-processing and selection of the optimal PLSR model
The spectra of air-dried, sieved (2 mm) soil samples were measured in the laboratory with a portable TerraSpec 4 Hi-Res Mineral Spectrometer over the 350-2500 nm range, with output recorded at 1 nm intervals. White-reference correction with a Spectralon® panel (Analytical Spectral Devices, Boulder, CO, USA) of 100% reflectance was made before the first scan and after every ten samples. The PLSR model was optimized by spectral pre-processing that included (i) reduction of the spectral resolution to 5 nm over the whole 350-2500 nm region using the Savitzky-Golay smoothing algorithm and (ii) a first-derivative transformation with a second-order polynomial fit (Savitzky and Golay, 1964). To eliminate noise at the edges of each spectrum, the spectral range was further reduced to 400-2490 nm. PLS regression (Martens and Naes, 1989; Wold et al. 2001) was used to calibrate the spectra against the laboratory-measured CaCO3 content, and the optimum number of factors in the PLSR model was selected by leave-one-out cross-validation (Efron and Tibshirani, 1994).
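The pre-processing and model-selection steps above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the software used in the study: the 11-point Savitzky-Golay window, the reduction to 5 nm by simple subsampling of the 1 nm derivative spectra, and the NIPALS PLS1 algorithm are choices made here for illustration, and the function names `savgol_derivative`, `preprocess`, `pls1_fit`, `pls1_predict` and `loo_rmse` are hypothetical.

```python
import numpy as np


def savgol_derivative(spectra, window=11, polyorder=2, step_nm=1.0):
    """Savitzky-Golay first derivative of each row (spectrum) of `spectra`.

    A polynomial of degree `polyorder` is least-squares fitted in a moving
    window of `window` points and its slope at the window centre is kept.
    Points closer than window // 2 to either edge are left as NaN.
    """
    half = window // 2
    offsets = np.arange(-half, half + 1)
    # Design matrix with columns 1, x, x^2, ...; row 1 of its pseudo-inverse
    # holds the convolution weights of the fitted linear (slope) term.
    design = np.vander(offsets, polyorder + 1, increasing=True)
    weights = np.linalg.pinv(design)[1] / step_nm
    out = np.full(spectra.shape, np.nan)
    for j in range(half, spectra.shape[1] - half):
        out[:, j] = spectra[:, j - half:j + half + 1] @ weights
    return out


def preprocess(wavelengths, spectra, every=5, lo=400, hi=2490):
    """Derivative on the 1 nm grid, trim noisy edges, keep every 5th point."""
    deriv = savgol_derivative(spectra, step_nm=1.0)
    keep = (wavelengths >= lo) & (wavelengths <= hi)
    return wavelengths[keep][::every], deriv[:, keep][:, ::every]


def pls1_fit(X, y, n_components):
    """NIPALS PLS1 for a single response; returns (coef, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)          # unit weight vector
        t = Xr @ w                      # latent scores
        tt = t @ t
        p = Xr.T @ t / tt               # X loadings
        c = (yr @ t) / tt               # y loading
        Xr = Xr - np.outer(t, p)        # deflate X and y
        yr = yr - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # B = W (P'W)^-1 q
    return coef, x_mean, y_mean


def pls1_predict(model, X):
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean


def loo_rmse(X, y, max_components):
    """Leave-one-out cross-validated RMSE for 1..max_components factors."""
    n = len(y)
    result = []
    for a in range(1, max_components + 1):
        sq_err = 0.0
        for i in range(n):
            train = np.arange(n) != i
            model = pls1_fit(X[train], y[train], a)
            sq_err += (pls1_predict(model, X[i:i + 1])[0] - y[i]) ** 2
        result.append(np.sqrt(sq_err / n))
    return np.array(result)
```

With `X` from `preprocess` and `y` holding the laboratory CaCO3 values, `loo_rmse(X, y, 15)` would return one cross-validated RMSE per candidate number of factors (the upper limit of 15 is arbitrary); the number of factors giving the smallest value, or the smallest number close to it, is a typical choice for the optimum.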

Model Performance Evaluation
The performance of the PLSR models was evaluated with four parameters: the root mean square error of prediction (RMSEP), the ratio of performance to deviation (RPD), the range error ratio (RER) and the coefficient of determination (R²). RMSEP is the average prediction error of the validation samples around the regression line, defined as the square root of the mean squared difference between the predicted and measured values of the validation samples (Equation 1):

$$\mathrm{RMSEP} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2} \qquad (1)$$

where $y_i$ and $\hat{y}_i$ are the measured and predicted values of sample $i$, respectively, and $N$ is the number of samples. The RPD is defined by Williams (1987) as the ratio of the standard deviation of the reference data to the standard error of prediction (SEP) (Equation 2):

$$\mathrm{RPD} = \frac{\mathrm{SD_v}}{\mathrm{SEP}} \qquad (2)$$

where $\mathrm{SD_v}$ is the standard deviation of the validation dataset. The SEP is the standard deviation of the differences between the predicted and reference values in the validation set, i.e. the RMSEP corrected for bias (Equation 3), where the bias is the mean difference between predicted and measured values (Equation 4):

$$\mathrm{SEP} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(\hat{y}_i - y_i - \mathrm{bias}\right)^2} \qquad (3)$$

$$\mathrm{bias} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right) \qquad (4)$$

The range error ratio (RER, Equation 5) is the ratio of the range of the reference data (the largest minus the smallest observed value) to the SEP (Starr et al. 1981):

$$\mathrm{RER} = \frac{\mathrm{Max} - \mathrm{Min}}{\mathrm{SEP}} \qquad (5)$$

where Max and Min are the maximum and minimum values in the reference dataset.
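Equations 1-5 translate directly into code. The following is a minimal sketch in standard-library Python, assuming that the bias is defined as predicted minus measured, that SEP and SD_v use N - 1 in the denominator, and that R² is computed as 1 - SS_res/SS_tot (some studies instead report the squared Pearson correlation); `prediction_metrics` is a hypothetical helper name.

```python
import math
from statistics import mean, stdev


def prediction_metrics(measured, predicted):
    """RMSEP, bias, SEP, RPD, RER and R^2 for a validation set (Eqs. 1-5).

    Assumed conventions: bias = mean(predicted - measured); SEP and SD_v
    use N - 1; R^2 = 1 - SS_res / SS_tot.
    """
    n = len(measured)
    resid = [p - m for m, p in zip(measured, predicted)]
    rmsep = math.sqrt(sum(r * r for r in resid) / n)                  # Eq. 1
    bias = sum(resid) / n                                              # Eq. 4
    sep = math.sqrt(sum((r - bias) ** 2 for r in resid) / (n - 1))    # Eq. 3
    rpd = stdev(measured) / sep                                        # Eq. 2
    rer = (max(measured) - min(measured)) / sep                        # Eq. 5
    y_bar = mean(measured)
    ss_tot = sum((m - y_bar) ** 2 for m in measured)
    r2 = 1 - sum(r * r for r in resid) / ss_tot
    return {"RMSEP": rmsep, "bias": bias, "SEP": sep,
            "RPD": rpd, "RER": rer, "R2": r2}
```

With these conventions RMSEP² = ((N - 1)·SEP² + N·bias²)/N, so RMSEP and SEP nearly coincide when N is large and the bias is small.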
Prediction success was classified according to the thresholds of Malley et al. (2004), which are tabulated in Table 1.

RESULTS AND DISCUSSION
Soil and Spectral Properties
Table 2 shows the descriptive statistics of the CaCO3 content determined by the conventional laboratory method (reference dataset) and of its calibrated and cross-validated PLSR predictions for the 180 soil samples. The CaCO3 content of the whole dataset varies within a very wide range, from 186.0 to 894.7 g kg⁻¹. The high mean value (547.2 g kg⁻¹) shows that the analysed soils are rich in carbonates. The skewness of the CaCO3 reference dataset is 0.09, and its frequency distribution is normal and nearly symmetrical (Table 2; Figure 1).
The mean raw and first-derivative spectra (Figure 2a and b) show strong water and OH⁻ absorption in the NIR near 1400 and 1900 nm (Ben-Dor and Banin, 1995; Clark, 1999). Figure 2b shows the characteristic carbonate band, with a calcite absorption peak at 2335 nm resulting from vibrational combinations and overtones of the CO3²⁻ ion. According to Clark (1999), carbonates have a strong diagnostic vibrational absorption band at 2300-2350 nm and three weaker bands near 2120-2160 nm, 1970-2000 nm and 1850-1870 nm. Figure 2b also shows other prominent absorption peaks between 2200 and 2300 nm and around 2440 nm. These are due to metal-OH combinations and vibrational stretching of H-O-H and OH⁻ in secondary clay minerals (Clark 1999; Viscarra Rossel et al. 2006b), and indicate the presence and combined effect of secondary minerals such as smectite, illite and vermiculite (Viscarra Rossel et al. 2006b). In the visible range, the mean first-derivative spectrum (Figure 2b) shows an absorption peak around 465 nm and a weak concave shape at around 565-665 nm. These features indicate chromophoric constituents, mainly Fe oxides, and the darkening effect of organic constituents (Ben-Dor et al. 1999).

Performance of calibration and validation models
Table 3 shows the calibration and cross-validation results of the PLSR models for the CaCO3 content. The prediction error, measured as RMSEP, was 57.9 g kg⁻¹ (Table 3). Twice the RMSEP (115.8 g kg⁻¹) can be taken as an approximate 95% confidence limit for individual predictions; around the mean of 547.2 g kg⁻¹, this corresponds to a range of 431.4 to 663.0 g kg⁻¹. The most commonly used parameters for evaluating prediction accuracy (R², RPD and RER; Table 3) indicate moderately successful prediction of CaCO3 according to the thresholds of Malley et al. (2004).
The prediction parameters (R², RPD and RER), in combination with the relatively high prediction error measured as RMSEP (Table 3), suggest that the model is suitable as a quality control (screening) method for the CaCO3 content of the investigated soils.
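The ±2·RMSEP interval discussed above is simple arithmetic; the sketch below reproduces the reported limits. It assumes, as the rule of thumb itself does, approximately normal and unbiased prediction errors, and the helper name `rmsep_limits` is hypothetical.

```python
def rmsep_limits(center, rmsep, k=2.0):
    """Return center ± k * RMSEP (k = 2 approximates a 95 % interval)."""
    half_width = k * rmsep
    return center - half_width, center + half_width


low, high = rmsep_limits(547.2, 57.9)      # mean CaCO3 and RMSEP, g/kg
print(f"{low:.1f} to {high:.1f} g/kg")     # prints "431.4 to 663.0 g/kg"
```

Using the normal quantile k = 1.96 instead of 2 narrows the limits only slightly, to about 433.7 to 660.7 g kg⁻¹.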
Our results show better prediction accuracy for CaCO3 than the study of Volkan Bilgili et al. (2010), who reported R² 0.71, RPD 1.84 and RER 11.02 for a considerably narrower range of carbonates (25.7-98.7 g kg⁻¹ CaCO3) and a lower mean value of 55.1 g kg⁻¹ CaCO3. Leone et al. (2012) also achieved a lower PLSR prediction accuracy than ours (R² 0.79 and RPD 2.07), with carbonates ranging from 0.0 to 636.0 g kg⁻¹ CaCO3 and a mean value of 70.9 g kg⁻¹ CaCO3. Gomez et al. (2013) obtained R² 0.71 and RPD 1.89 for a carbonate range of 0.5-375 g kg⁻¹ CaCO3 and a mean value of 65 g kg⁻¹ CaCO3, which is also less accurate than our model. Some researchers reported even lower validation parameters, e.g. Summers et al. (2011) R² 0.69 and RPD 2.1, and Khayamim et al. (2015) R² 0.58. However, some authors obtained better validation parameters for CaCO3 prediction models than ours. For example, Canasveras et al. (2012) achieved R² 0.93 and RPD 3.5 with carbonate contents from 20 to 969 g kg⁻¹ CaCO3 and a mean value of 559 g kg⁻¹ CaCO3. Carmon and Ben-Dor (2017) reported an R² of 0.94 for a range of 0.0 to 74.27% CaCO3, while Gras et al. (2014) obtained an even higher R² of 0.99 and RPD of 8.6 for a dataset with carbonate contents of 0.0-84.9 g/100 g of soil and a mean value of 16.1 g/100 g of soil. The relatively large differences in the accuracy of CaCO3 estimation among studies are probably related mainly to the nature of soil as a very complex mixture of mineral and organic compounds, to the parent material and to the calibration methods.
Importance of the spectral ranges and wavelengths
Figure 3 illustrates the importance of each wavelength to the CaCO3 prediction model, measured by its regression coefficient. The highest regression coefficients were found in the NIR range between 2325 and 2365 nm, with a peak at 2340 nm. This agrees with the previously established spectral activity of calcite in the NIR range (700-2500 nm), whose strongest diagnostic vibrational absorptions lie at 2300-2350 nm (Clark 1999).
Wavelengths retained as significant (p < 0.01) are marked in black. Figure 3 also shows high regression coefficients (contributions to the model) for wavelengths between 2430 and 2470 nm and between 2215 and 2285 nm. These high coefficients can be related to the presence of secondary clay minerals (Clark 1999; Viscarra Rossel et al. 2006b). In the visible range, the most significant wavelengths were between 455 and 475 nm, which can be related to the presence of Fe oxides and organic constituents (Ben-Dor et al. 1999).
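Reading bands off a regression-coefficient plot such as Figure 3 can be partly automated. The sketch below is a minimal illustration under an assumed criterion: it groups consecutive wavelengths whose absolute coefficient reaches a user-chosen threshold and reports each band's start, end and peak wavelength. The threshold is hypothetical (the study marks wavelengths by a significance test at p < 0.01, whose exact form is not given here), the function names are hypothetical, and the coefficients in the example are synthetic, not the values behind Figure 3.

```python
def _band_summary(wavelengths, coefs, lo, hi):
    """(start, end, peak) wavelengths of the band spanning indices lo..hi."""
    peak = max(range(lo, hi + 1), key=lambda k: abs(coefs[k]))
    return wavelengths[lo], wavelengths[hi], wavelengths[peak]


def coefficient_bands(wavelengths, coefs, threshold):
    """Group consecutive wavelengths whose |regression coefficient| >= threshold.

    Returns a list of (start_nm, end_nm, peak_nm) tuples, where peak_nm is the
    wavelength with the largest absolute coefficient inside the band.
    """
    bands = []
    start = None
    for i, c in enumerate(coefs):
        if abs(c) >= threshold:
            if start is None:
                start = i               # a new band begins
        elif start is not None:
            bands.append(_band_summary(wavelengths, coefs, start, i - 1))
            start = None                # the band has ended
    if start is not None:               # band running up to the last wavelength
        bands.append(_band_summary(wavelengths, coefs, start, len(coefs) - 1))
    return bands


wl = list(range(2300, 2400, 5))                                  # 5 nm grid
coefs = [0.0] * len(wl)
coefs[5:14] = [0.2, 0.4, 0.6, 1.0, 0.7, 0.5, 0.3, 0.2, 0.2]      # synthetic
print(coefficient_bands(wl, coefs, threshold=0.15))  # [(2325, 2365, 2340)]
```

Using the absolute value treats strongly negative coefficients as equally important, which matches reading contribution from the magnitude of the coefficient.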

CONCLUSION
This study showed that:
- the CaCO3 content in anthropogenic soils derived from Flysch deposits varied within a very wide range (186.0 to 894.7 g kg⁻¹), with a mean value of 547.2 g kg⁻¹ and a normal, nearly symmetrical frequency distribution;
- the PLSR model for quantitative prediction of the CaCO3 content in the investigated soils (R² 0.86, RPD 2.42, RER 11.9) was moderately successful;
- the model is suitable for quality control (screening) of the CaCO3 content in terraced soils derived from Flysch deposits;
- the largest contributions to the CaCO3 prediction model come from wavelengths indicating the spectral activity of calcite and clay minerals.