COMPARISON OF THE EFFICIENCY OF CLASSICAL AND FUZZY REGRESSION MODELS FOR CROP YIELD FORECASTING WITH CLIMATOLOGICAL ASPECT

This paper presents the application of fuzzy concepts in the field of crop yield forecasting. Classical regression models and fuzzy regression models with symmetric and non-symmetric triangular fuzzy number coefficients were used to forecast wheat and oil seeds yield in Zanjan, West Azarbaijan and East Azarbaijan Provinces (1984/2013). The predominant climatological parameters were determined using the maximum correlation coefficient between climatological parameters and crop yield. The sensitivity analysis indicated that the predominant climatological parameters differ among Provinces. The RRMSE criterion decreased by 37.76% in symmetric fuzzy regression compared to classical regression and by 15.6% in non-symmetric fuzzy regression compared to symmetric fuzzy regression. Based on the error criteria, fuzzy regression performs better than classical regression. There were no major differences between the performance of symmetric and non-symmetric fuzzy regression.


INTRODUCTION
Knowledge of crop yield has become one of the most important challenges in recent years, since accurate crop yield forecasting is essential for the planning and policy making of agricultural organizations. Climate variability is one of the most significant factors influencing annual crop production, even in high-yield and high-technology agricultural areas. Therefore, more and more attention has been paid to the risks associated with climate change, which will increase uncertainty with respect to food production (Kang et al., 2009). Because crop production depends on climate, an efficient method for representing this dependency is necessary. Several models have been widely applied to describe the dependency of crop production on climate factors. The first models used for large-scale yield simulation were statistical. Average yields from large areas for many years were regressed on time to reveal a general trend in crop yields (Basso et al., 2013). Furthermore, the most investigated statistical crop-yield-weather models are multivariate regression models. An agrometeorological crop yield forecasting approach using multiple regression was introduced by Gommes (2001); it is used by the FAO and a number of developing countries and aims to provide a good compromise between input requirements and ease of validation. However, considering the inherent and irreparable disadvantages of multiple regression models, such as variable interdependence or multi-collinearity and stringent linearity and normality assumptions, a more scientific methodology to incorporate weather data into crop yield models is still under exploration and remains of great importance to the government, private sector insurers and reinsurers (de Leona and Jalaob, 2013). Some regression-type models that have been used for crop yield forecasting are reviewed below.
De Leona and Jalaob applied multiple linear regression to predict corn yield in Quezon Province (de Leona and Jalaob, 2013). They presented new research possibilities for the application of modern classification methodologies to the problem of yield prediction. Four climatic variables (temperature, solar radiation, rainfall and humidity), data about weather disturbances and 14 agronomic variables related to corn production were gathered. The full attribute set gave better performance in corn yield prediction. Their results indicated that corn yield is greatly affected by planting practices, particularly by the application of the right amount of fertilizer (de Leona and Jalaob, 2013). An artificial neural network (ANN) approach was used to model wheat production. From an extensive data collection involving 40 farms in Canterbury, New Zealand, the average wheat production was estimated at 9.9 t/ha. The final ANN model was capable of predicting wheat production under different conditions and farming systems using direct and indirect technical factors, and could predict wheat production based on farm conditions, machinery condition and farm inputs in Canterbury with an error margin of ±9% (±0.89 t/ha) (Safa et al., 2015). Kumar used the adaptive neuro fuzzy inference system (ANFIS) technique, based on a time series of 27 years, to forecast rice yield in India (Kumar, 2011). The graphical comparison between observed and predicted values and the qualitative performance assessment of the model indicate that ANFIS can be used effectively for crop yield forecasting (Kumar, 2011). Kumar and Kumar provided a number of modified techniques for time-series-based forecasting of crop yield in any year (Kumar and Kumar, 2012). The study can contribute to the inventory management of wheat yield and the management of storage space.
They used the data of previous years and proposed a new method based on the fuzzy time series forecasting technique. The results were remarkably close to the actual annual production. The time series method works almost perfectly if there is no sudden rise or fall in production (Kumar and Kumar, 2012).
The present study was carried out to develop regression models to forecast wheat and oil seeds yield in several Provinces located in the north-west of Iran. In this regard, classical and fuzzy regression models were compared for crop yield forecasting. The lack of application of fuzzy regression to crop yield forecasting in previous research is the reason for using the fuzzy regression model in this study. The data were divided into a calibration period (1984/2003) and a validation period (2004/2013). Effective meteorological parameters were selected based on the maximum correlation coefficient between climatological parameters and crop yield. An optimization method was used to determine the coefficients of the symmetric and non-symmetric membership functions.

MATERIALS AND METHODS
Several regression-based methods have been used for crop yield forecasting; classical and fuzzy regression models were used in this research.
Regression analysis is one of the most widely used methods in yield forecasting and various regression models and techniques have been developed. This technique predicts the response variable, i.e. yield, in terms of explanatory variables such as weather, soil properties, input, and technology (de Leona and Jalaob, 2013).
Classical Regression: Regression analysis is the art and science of fitting straight lines to data patterns. In a linear regression model, the intended variable is predicted from other variables using a linear equation, which can be expressed as Equation 1.
$$Y = A_0 + A_1 x_1 + A_2 x_2 + \dots + A_n x_n \qquad (1)$$
Where $Y$ is the dependent variable, $x_1, x_2, \dots, x_n$ are the independent variables and $A_0, A_1, \dots, A_n$ are the coefficients of the equation.
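As an illustration of how the coefficients of such a linear equation can be estimated, the following minimal sketch fits them by ordinary least squares through the normal equations. The function names and the data are hypothetical and are not taken from the study; the source does not state which fitting procedure or software was used.

```python
def fit_linear_regression(rows, targets):
    """Least-squares fit of Y = A0 + A1*x1 + ... + An*xn via the normal equations."""
    # Design matrix with a leading column of ones for the intercept A0.
    design = [[1.0, *row] for row in rows]
    size = len(design[0])
    # Augmented system [X^T X | X^T y].
    system = [
        [sum(r[i] * r[j] for r in design) for j in range(size)]
        + [sum(r[i] * y for r, y in zip(design, targets))]
        for i in range(size)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(system[r][col]))
        if abs(system[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; the fit is not unique")
        system[col], system[pivot] = system[pivot], system[col]
        for row in range(size):
            if row != col:
                factor = system[row][col] / system[col][col]
                system[row] = [a - factor * b for a, b in zip(system[row], system[col])]
    return [system[i][size] / system[i][i] for i in range(size)]


def predict(coefficients, row):
    """Evaluate the fitted equation for one set of predictor values."""
    return coefficients[0] + sum(a * x for a, x in zip(coefficients[1:], row))


# Hypothetical data generated exactly from Y = 1 + 2*x1 + 3*x2.
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]
targets = [1.0 + 2.0 * x1 + 3.0 * x2 for x1, x2 in rows]
coefficients = fit_linear_regression(rows, targets)
```

With strongly correlated predictors the normal equations become ill-conditioned, which is the multi-collinearity problem noted in the Introduction; the sketch only detects the exactly singular case.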
Fuzzy Regression: In conventional regression analysis, the deviations between observed and estimated values are assumed to be due to random errors. However, in most cases, these deviations are due to the indefiniteness of the structure of a system or to imprecise observations. Thus, uncertainty in this type of regression model becomes fuzziness and not randomness. Fuzzy linear regression is a fuzzy type of classical regression analysis in which some elements of the model are represented by fuzzy numbers.
Fuzzy linear regression was originally introduced by Asai et al. (1982). They formulated a linear regression model with fuzzy response data, crisp predictor data and fuzzy parameters as a mathematical programming problem. Diamond proposed the approach of fuzzy least squares to determine fuzzy parameters by defining a metric between two fuzzy numbers (Diamond, 1988). However, most of the articles on fuzzy regression analysis use linear programming to estimate the parameters. In this regard, each additional observation results in several additional constraints, and the linear programming problem becomes unwieldy very quickly, especially if the triangular fuzzy numbers involved are not symmetric (Arabpour and Tata, 2008; Ghosh). Modeling fuzzy linear systems has been addressed in fuzzy linear regression analysis. The following model shows the dependence of the output variable on the input variables.
$$\tilde{Y} = f(x, \tilde{A}) = \tilde{A}_0 + \tilde{A}_1 x_1 + \dots + \tilde{A}_n x_n$$
Ã is a set of fuzzy numbers.
Then the regression analysis problem is defined as: given a set of crisp data points $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$, we find a set of fuzzy parameters $\tilde{A}_i$. Where $a_i^L$ is the lower limit, $a_i^U$ is the upper limit and $a_i^C$ is the center point of the fuzzy coefficient $\tilde{A}_i$. The symmetry of the fuzzy coefficient $\tilde{A}_i$ enables us to establish two relations, Equations 5 and 6.
The objective of the fuzzy regression method with non-fuzzy data is to determine the parameters $\tilde{A}_i$ so that the fuzzy output set $\tilde{y}_j$ is associated with a membership value greater than $h$.
Where the value of $h$ is chosen for the purpose of generating the best-fitting model.
In this regard, the goal is to find the fuzzy coefficients that minimize the above-mentioned spread of the fuzzy output over all the data sets. The cost function, $Z$, to be minimized can be written as Equation 8. Asai and co-workers formulated a linear programming problem (LPP) to determine the fuzzy number coefficients $\tilde{A}_i$ of the fuzzy linear model (Asai et al., 1982). Therefore, the minimization of the objective function in the LPP is equivalent to the minimization of the total fuzziness of the linear model $f(x, \tilde{A})$.
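For orientation, the linear programming problem for symmetric triangular coefficients with crisp data is commonly written in the following form. This is a sketch of the widely used formulation and may differ in detail from the numbered equations of this paper. The symbols $c_i$ (spread of $\tilde{A}_i$), $x_{ij}$ (value of the $i$-th variable in the $j$-th observation) and $m$ (number of observations) are notation introduced here for illustration.

```latex
\begin{aligned}
\min_{a_i^{C},\, c_i}\quad & Z = \sum_{j=1}^{m} \sum_{i=0}^{n} c_i \,\lvert x_{ij} \rvert \\
\text{subject to}\quad
& \sum_{i=0}^{n} a_i^{C} x_{ij} + (1-h) \sum_{i=0}^{n} c_i \lvert x_{ij} \rvert \ge y_j, \\
& \sum_{i=0}^{n} a_i^{C} x_{ij} - (1-h) \sum_{i=0}^{n} c_i \lvert x_{ij} \rvert \le y_j, \\
& c_i \ge 0, \qquad x_{0j} = 1, \qquad j = 1, \dots, m .
\end{aligned}
```

In this form each observation contributes two constraints, which is why the problem grows quickly as data are added.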
If the triangular membership function is not symmetric, at least three parameters are needed (Figure 2).
Where $y$ is a dependent parameter, $x$ is an independent parameter, $h$ is the confidence level parameter, $a_i^P$ is the peak point and $s_i^L$ is the left-side spread from the peak point $a_i^P$ (Yen et al., 1999).
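A triangular fuzzy number described by a peak point and a left-side spread can be sketched as below. The code assumes that the skew factor is the ratio of the right-side spread to the left-side spread, so that a skew factor of 1 gives the symmetric case; this definition and the function names are assumptions made for illustration, not details confirmed by the text.

```python
def triangular_membership(value, peak, left_spread, skew=1.0):
    """Membership grade of `value` in a triangular fuzzy number.

    Assumption: right spread = skew * left spread (skew = 1 is symmetric).
    """
    if left_spread <= 0 or skew <= 0:
        raise ValueError("spread and skew factor must be positive")
    right_spread = skew * left_spread
    if value <= peak:
        grade = 1.0 - (peak - value) / left_spread
    else:
        grade = 1.0 - (value - peak) / right_spread
    return max(0.0, grade)


def h_level_interval(peak, left_spread, h, skew=1.0):
    """Interval of values whose membership grade is at least h (0 <= h <= 1)."""
    if not 0.0 <= h <= 1.0:
        raise ValueError("h must lie in [0, 1]")
    return (peak - (1.0 - h) * left_spread,
            peak + (1.0 - h) * skew * left_spread)
```

For example, `h_level_interval(2.0, 1.0, 0.5, skew=2.0)` covers the values from 1.5 to 3.0, which is wider on the right because of the skew.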
The sensitivity analysis must be conducted on two parameters of the symmetric and non-symmetric membership functions of fuzzy regression: the confidence level parameter and the skew factor.
The main objective of the research is the comparison of classical and fuzzy regression performance in the field of crop yield forecasting. In this regard, two criteria were used, whose mathematical forms are given in Equations 10 and 11. The minimum values of the criteria correspond to the best model performance.
Where $O_i$ are the observed data, $S_i$ are the simulated data, RRMSE is the relative root mean square error and MRE is the mean relative error.
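The two criteria can be computed as in the sketch below. It assumes the common definitions in which RRMSE is the root mean square error divided by the mean of the observations and MRE is the mean of the absolute errors relative to each observation; the exact forms used in this paper may differ, for example in how the error is normalized.

```python
import math


def rrmse(observed, simulated):
    """Relative root mean square error: RMSE divided by the mean observation."""
    n = len(observed)
    rmse = math.sqrt(sum((s - o) ** 2 for o, s in zip(observed, simulated)) / n)
    return rmse / (sum(observed) / n)


def mre(observed, simulated):
    """Mean relative error: average of |S_i - O_i| / |O_i|."""
    return sum(abs(s - o) / abs(o) for o, s in zip(observed, simulated)) / len(observed)


# Hypothetical yields, not data from the study.
observed = [2.0, 4.0]
simulated = [1.0, 5.0]
```

Both criteria are zero for a perfect simulation and grow with the size of the errors, so lower values indicate a better model.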
Case Study: In line with the objective of the study, which is crop yield forecasting based on climatological parameters, the forecasting of wheat and oil seeds yield in Zanjan, East Azarbaijan and West Azarbaijan Provinces was investigated. The analysis of crop fluctuations due to the impact of climate change is one of the major issues in these Provinces. The climate of the Provinces, based on the De Martonne classification for the 1984/2013 period, is semi-arid (De Martonne climate index of East Azarbaijan = 10.65, West Azarbaijan = 14.1, Zanjan = 11.83). Figure 3 shows the location of the Provinces in Iran.
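The De Martonne index is the annual precipitation (mm) divided by the mean annual temperature (°C) plus 10. The sketch below applies it and checks the reported index values against the semi-arid class, taken here as the commonly cited range from 10 to 20 (an assumption, since the text does not list its class limits). The precipitation and temperature data behind the reported values are not given, so only the index values themselves come from the source.

```python
def de_martonne_index(annual_precip_mm, mean_temp_c):
    """De Martonne aridity index I = P / (T + 10)."""
    return annual_precip_mm / (mean_temp_c + 10.0)


def is_semi_arid(index):
    """Commonly cited semi-arid range for the index (assumed: 10 <= I < 20)."""
    return 10.0 <= index < 20.0


# Index values as reported in the text for the 1984/2013 period.
reported_index = {
    "East Azarbaijan": 10.65,
    "West Azarbaijan": 14.1,
    "Zanjan": 11.83,
}
semi_arid = {name: is_semi_arid(value) for name, value in reported_index.items()}
```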

RESULTS AND DISCUSSION
Determination of the effective climatological parameters: Climatological parameters have a significant impact on crop yield variations. In this research, the climatological parameters investigated for wheat and oil seeds yield forecasting are: air temperature, maximum and minimum temperature, wind speed, air pressure, vapor pressure, relative humidity, maximum and minimum relative humidity, precipitation, sunshine hours, number of cloudy days and dew point temperature. The results of the correlation between climatological parameters and crop yield are presented in Table 1.
Based on the maximum correlation coefficient between crop yield and climatological parameters, four parameters were selected for each crop and Province. In East Azarbaijan Province, the maximum correlation coefficients of wheat yield are with wind speed, sunshine hours, minimum temperature and air temperature, while those of oil seeds yield are with wind speed, minimum temperature, air temperature and maximum temperature. In Zanjan Province, the maximum correlation coefficients of wheat yield are with air pressure, maximum temperature, maximum relative humidity and air temperature, while those of oil seeds yield are with the number of cloudy days, maximum temperature, air pressure and air temperature. In West Azarbaijan Province, the maximum correlation coefficients of wheat yield are with minimum temperature, air temperature, air pressure and wind speed, while those of oil seeds yield are with minimum temperature, wind speed, air temperature and maximum temperature. Since air temperature is selected for every crop and Province, it has the most consistent role among the parameters. In the research of Sabziparvar and co-workers on the impact of indices and climatological parameters on the wheat yield of Hamadan Province, the multivariate correlation was significant in 90% of scenarios, with a range of 0.67-0.97 (Sabziparvar et al., 2012). Relative humidity, minimum temperature and maximum temperature had the maximum impact on wheat yield in a study conducted in India (Parekh and Suryanarayana, 2012).
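The selection step can be sketched as a ranking of candidate parameters by their Pearson correlation with yield. Ranking by the absolute value of the coefficient is an assumption, since the text only refers to the maximum correlation coefficient; the parameter names and series below are hypothetical and are not values from Table 1.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def top_parameters(parameters, yields, count=4):
    """Return the `count` parameters with the largest |r| and all scores."""
    scores = {name: pearson(values, yields) for name, values in parameters.items()}
    ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
    return ranked[:count], scores


# Hypothetical annual series.
yields = [1.0, 2.0, 3.0, 4.0, 5.0]
parameters = {
    "air temperature": [2.0, 4.0, 6.0, 8.0, 10.0],
    "wind speed": [5.0, 4.0, 3.0, 2.0, 1.0],
    "precipitation": [1.0, 3.0, 2.0, 5.0, 4.0],
}
selected, scores = top_parameters(parameters, yields, count=2)
```

In the study itself four parameters were retained per crop and Province, which corresponds to the default `count=4`.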
Crop yield forecasting using regression models: Modeling the crop time series is the next step after the selection of the effective climatological parameters. The modeling in this research is based on classical and fuzzy regression. Classical regression was conducted with four climatological parameters for each Province and crop. Symmetric and non-symmetric membership functions were used for fuzzy regression modeling, for which the confidence level parameter must be determined. The confidence level parameter is determined by evaluating the model performance for different values of the parameter and selecting the value that gives the minimum error or maximum efficiency.
The center of area method was used to convert the output variable from the fuzzy state to a deterministic one. The results of the optimization, which give the fuzzy regression coefficients, were investigated for different values of the confidence level parameter. The variation of fuzzy regression performance with the confidence level parameter is low, but the variation of the coefficient of the first climatological parameter for wheat yield in East Azarbaijan and of the fourth climatological parameter for oil seeds in West Azarbaijan is illustrated in Figure 4.
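For a triangular fuzzy output, the center of area reduces to the mean of the three vertices of the triangle. The sketch below assumes that the fuzzy output is triangular and is described by a peak, a left spread and a right spread; the function name is hypothetical.

```python
def center_of_area(peak, left_spread, right_spread):
    """Centroid of a triangular fuzzy number: mean of its three vertices."""
    lower = peak - left_spread
    upper = peak + right_spread
    return (lower + peak + upper) / 3.0
```

For a symmetric output the result is the peak itself, for example `center_of_area(3.0, 1.0, 1.0)` gives 3.0, whereas a longer right spread pulls the crisp forecast above the peak.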
The fuzziness of each variable is closely related to the spread parameter of the membership function. According to Figure 4, the sensitivity analysis indicated that changing the value of the confidence level parameter does not change the center of each $\tilde{A}_i$ but influences the values of the spread. In fact, the variation of the spread of the fuzzy number coefficients is controlled by the confidence level parameter. An increase in the spread of the fuzzy number coefficients with no change in the center as the confidence level parameter increases was also reported by Yen et al. (1999). In the case of non-symmetric membership function modeling, the sensitivity analysis has two steps: the first step concerns the skew factors, and the second step concerns the confidence level parameter, using the skew factors selected in the first step. The results for the oil seeds skew factors in East Azarbaijan with a confidence level parameter of 0.5 are presented in Table 2. According to the sensitivity analysis in Table 2, $k_0 = 1.9$, $k_1 = 2.3$, $k_2 = 2.6$ and $k_3 = k_4 = 1.9$ give the minimum error and can be selected as the skew factors. Yen et al. (1999) found that, for non-symmetric membership functions, the spread $s_0^L$ decreases and the center $a_0^P$ increases as the skew factor increases. The same behavior was found in this research, as illustrated in Figure 5.
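The behavior of the symmetric model can be illustrated with a short sketch. It assumes the standard linear programming formulation, in which the spreads enter the constraints only through the product of (1 - h) and the spread. Under that assumption the optimal centers do not depend on h, and each optimal spread equals its value at h = 0 divided by (1 - h). The function name and numbers are hypothetical.

```python
def spread_at_confidence(spread_at_zero, h):
    """Optimal spread at confidence level h, given the optimal spread at h = 0.

    Assumes the standard symmetric LP, where c(h) = c(0) / (1 - h).
    """
    if not 0.0 <= h < 1.0:
        raise ValueError("h must lie in [0, 1)")
    return spread_at_zero / (1.0 - h)


# Hypothetical spread of one coefficient at h = 0.
levels = [0.0, 0.25, 0.5, 0.75]
spreads = [spread_at_confidence(1.0, h) for h in levels]
```

The spreads grow monotonically with h and become unbounded as h approaches 1, while the centers stay fixed.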
Yen et al. (1999) indicated that the variation of the skew factors influences only $a_0^P$ and not the other coefficients. This was investigated for wheat yield in Zanjan Province, with the results given in Table 3.
The peak point of the constant term changes with the skew factors, but skew factor variations do not change the peak points of the other coefficients.
Comparison of regression models: After the fuzzy coefficients were determined, wheat and oil seeds yield were estimated using the two types of modeling, namely classical and fuzzy regression. The results of wheat modeling for East Azarbaijan are illustrated in Figure 6.
It is clear that the differences between the observed yield and the classical regression yield are high. Using symmetric and non-symmetric fuzzy regression, the differences between observed and modeled data are reduced. Two criteria were used to compare the modeling methods, and the results are presented in Table 4. The RRMSE criterion decreased by 37.76% in symmetric fuzzy regression in comparison to classical regression and by 15.6% in non-symmetric fuzzy regression in comparison to symmetric fuzzy regression; the MRE criterion decreased by 28.14% in symmetric fuzzy regression in relation to classical regression and by 13.91% in non-symmetric fuzzy regression in relation to symmetric fuzzy regression. The decrease in the error criteria from the classical to the fuzzy regression model is clear; therefore, the forecasting results are improved by using the more efficient regression models. In this regard, studies such as Safa et al. (2015) and Kumar (2011) used other models, such as ANN, and reported improved forecasting results.
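The relative decreases quoted above can be reproduced from criterion values with a calculation of the following kind. The sketch also chains the two reported RRMSE steps to estimate the overall decrease from classical to non-symmetric fuzzy regression. That chaining is valid only if both percentages refer to the same aggregated criterion values, which is an assumption and is not stated in the text.

```python
def percent_decrease(reference, improved):
    """Relative decrease of an error criterion, in percent of the reference."""
    return 100.0 * (reference - improved) / reference


def chained_decrease(*step_percentages):
    """Overall percent decrease after successive relative decreases."""
    remaining = 1.0
    for step in step_percentages:
        remaining *= 1.0 - step / 100.0
    return 100.0 * (1.0 - remaining)


# Reported RRMSE steps: classical -> symmetric, symmetric -> non-symmetric.
overall_rrmse_decrease = chained_decrease(37.76, 15.6)
```

Under that assumption the overall RRMSE decrease from classical to non-symmetric fuzzy regression would be about 47.5%.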
For wheat, the MRE criterion decreased by 68.866% in symmetric fuzzy regression in relation to classical regression and by 7.69% in non-symmetric fuzzy regression in relation to symmetric fuzzy regression. For oil seeds, the MRE criterion decreased by 37.86% in symmetric fuzzy regression in relation to classical regression and by 16.19% in non-symmetric fuzzy regression in relation to symmetric fuzzy regression. The average RRMSE of wheat over all regression models is 0.48, 0.22 and 0.23 for Zanjan, East Azarbaijan and West Azarbaijan Provinces, respectively. The average RRMSE of oil seeds over all regression models is 0.3, 0.36 and 0.38 for Zanjan, East Azarbaijan and West Azarbaijan Provinces, respectively.
Except for oil seeds in Zanjan, the classical regression model has the highest error. In the comparison between symmetric and non-symmetric fuzzy regression, non-symmetric fuzzy regression has the minimum error in most cases. For oil seeds in Zanjan, two data points are missing in the validation period, and these missing data may explain why the criteria do not follow the same trend as in the other Provinces and crops.

CONCLUSIONS
Crop yield estimation using an efficient method is one of the major issues in agricultural policy. In this study, regression modeling was performed using fuzzy concepts to address this issue. Increasing the confidence level parameter increased the spread parameter of the fuzzy regression but had no impact on the center parameter. The left-side spread decreases and the peak point increases as the skew factor increases. Based on the criteria used, fuzzy regression improved crop yield forecasting. The difference between the performance of symmetric and non-symmetric fuzzy regression is small, with non-symmetric fuzzy regression giving the minimum error in most cases. The decrease in error for wheat is smaller than that for oil seeds. The average error of all regression models is similar in East and West Azarbaijan and is lower there than in Zanjan. This study suggests that future work compare fuzzy regression with artificial neural network models or other regression-based models that have reported improved results.