Modelling the unsaturated hydraulic conductivity of a sandy loam soil using Gaussian process regression

Unsaturated soil hydraulic conductivity is a main parameter in agricultural and environmental studies, necessary for predicting and managing water and solute transport in soils. This parameter is difficult to measure in agricultural fields; thus, a simple and practical estimation method would be preferable, and quantitative methods (analytical and numerical) to predict the field parameters should be developed. Field experiments were conducted to collect water quality data to model the unsaturated hydraulic conductivity of a sandy loam soil. A mini disk infiltrometer (MDI) was used to measure soil infiltration rate. The electrical conductivity and sodium adsorption ratio of the irrigation water, the suction rate (pressure head), soil bulk density, and soil moisture content acted as inputs, with unsaturated soil hydraulic conductivity as output. The performance of Gaussian process regression (GPR) was analysed, with multiple linear regression (LR) and multi-layer perceptron (MLP) models used for comparison. Three performance criteria were compared: correlation coefficient (r), root mean square error (RMSE), and mean absolute error (MAE). The simulations employed the Waikato environment for knowledge analysis (WEKA) open source tool. The results indicate that the GPR with Pearson VII function-based universal kernel (PUK kernel), cache size 250007, Omega 1.0 and Sigma 1.0 performs better than other kernels when evaluating test split data, with a correlation coefficient of 0.9646. The RMSEs for GPR (PUK kernel), MLP, and LR were 1.16 × 10⁻⁴, 1.87 × 10⁻⁴, and 2.22 × 10⁻⁴ cm·s⁻¹, respectively. Predictive data mining algorithms (DMA) enable an estimate of unknown values based on patterns in a database. Therefore, the present methodology can be put to use in predictive tools to manage water and solute transport in soils, as the GPR model provides much greater accuracy than the LR and MLP models in predicting the unsaturated hydraulic conductivity of a sandy loam soil.


INTRODUCTION
Water management is vital to improve the efficiency and sustainability of agricultural systems, as water is scarce in semi-arid regions such as Saudi Arabia. Soil hydraulic conductivity is a main parameter in agricultural and environmental studies (Gonçalves et al., 2007). Unsaturated soil hydraulic conductivity controls water movement (Fatehnia et al., 2014), and measuring it is a challenging task, requiring costly, time-consuming, and skilled experimentation (Wosten and Van Genuchten, 1988; Malaya and Sreedeep, 2013). Various techniques have been developed to measure unsaturated hydraulic conductivity in the laboratory and in the field (Klute and Dirksen, 1989). Unfortunately, laboratory studies using repacked soil may have limited use in predicting the effects of water characteristics on soil hydraulic properties (Menneer et al., 2001). Additionally, the number of measurements of unsaturated hydraulic conductivity required to adequately characterize an area can be prohibitive. Thus, it is better to have means to estimate, in a simple and practical manner, the unsaturated hydraulic conductivity (Mbonimpa et al., 2004). The unsaturated hydraulic conductivity of soil could be estimated based on soil texture, the hydraulic conductivity of the soil, soil water properties, the amounts of gypsum and lime present, and the actual and apparent distributions of particle size (Zhuang et al., 2001). Moosavi and Sepaskhah (2012a) developed pedotransfer functions for prediction of unsaturated hydraulic conductivity. The most influential physical soil characteristics in prediction of soil hydraulic conductivity using pedotransfer functions were the soil particle fractions, bulk density, total soil porosity, and initial and near-saturated volumetric soil water content. The unsaturated hydraulic conductivity measurements were made at soil water tensions of 0.2, 0.15, 0.1, 0.06, 0.03, and 0 m. The study results indicated that the pedotransfer function predictions of unsaturated soil hydraulic conductivity at all of the soil tensions were accurate enough for most applications, except for the predictions at a tension of 0.1 m and, to some extent, at a tension of 0.03 m, which were less accurate than the others. Neshat and Farhad (2012) carried out an experiment, using calculations to estimate the unsaturated hydraulic conductivity of a soil, to derive a relationship between the soil's unsaturated hydraulic conductivity and its physical properties. Amer et al. (2009) proposed an equation to predict unsaturated hydraulic conductivity based on water viscosity, acceleration due to gravity, water density, ratio of total volume of pores, and the radius of equivalent cylindrical pore size. To predict the unsaturated hydraulic conductivity of soil, Moosavi and Sepaskhah (2012b) used an artificial neural network model with input parameters of sand, silt, clay, bulk density, soil organic matter, and initial and saturated volumetric water content. The study results showed that an artificial neural network model could accurately estimate the unsaturated hydraulic conductivity, and silt, clay, sand, bulk density, and soil organic matter were the most influential input variables.
Water quality has substantial effects on soil hydraulic conductivity and infiltration (Crescimanno et al., 1995; Springer et al., 1999). Xiao et al. (1992) studied the effect of irrigation water quality on the unsaturated hydraulic conductivity of undisturbed soil in the field. Results showed that, within the operating soil suction range of disc permeameters (0-1.6 kPa), the higher the electrical conductivity of irrigation water, the higher the soil unsaturated hydraulic conductivity. Unsaturated hydraulic conductivity doubled when the electrical conductivity of irrigation water increased from 0.1 to 6.0 dS·m⁻¹. Also, a high irrigation water sodium adsorption ratio (SARw) has an inverse effect on soil unsaturated hydraulic conductivity. Soil unsaturated hydraulic conductivity decreased with increasing SARw, especially when higher soil suction is present. Moosavi and Sepaskhah (2012c) reported that irrigating with low-quality water may change soil hydraulic properties due to excessive electrical conductivity and water sodium adsorption ratio. Field experiments were performed with applied soil water tensions of 0-0.2 m to study water quality effects on hydraulic properties of a sandy clay loam soil. The mean unsaturated hydraulic conductivity varied as quadratic or power equations with changes in water electrical conductivity and SARw, and application of water with a higher electrical conductivity and increased sodium adsorption ratio led to lower hydraulic conductivity values as the applied tension was increased. The findings indicated that in these types of soils the use of saline waters with an electrical conductivity < 10 dS·m⁻¹ can improve soil hydraulic properties.
With in-situ infiltration measurements via a mini disk infiltrometer, Schacht and Marschner (2015) studied the impact of treated wastewater versus fresh water on hydraulic conductivity under agricultural irrigation. The study reported that the mean hydraulic conductivity values decreased at all treated wastewater sites by 42.9-50.8%, compared with fresh water irrigation sites. Singh et al. (2017) also indicated that the water quality has an effect on the soil infiltration rate, which can be predicted by random forest regression based on cumulative time, the type of impurities in the water, the concentration of impurities in the water, and soil moisture content.
Soil moisture content and soil bulk density have significant effects on soil unsaturated hydraulic conductivity. Bhatnagar et al. (1979) determined unsaturated hydraulic conductivity in the laboratory for some red and black soils, following water movement into a horizontal column of homogeneous soil with uniform packing. A highly significant positive relationship was found between moisture content and hydraulic conductivity values in all soils studied. It was also concluded that the unsaturated hydraulic conductivity decreases rapidly with a decrease in moisture content; this decrease depends on the soil constituents and properties, and differences between soil types were clear. However, the effect of compaction on unsaturated hydraulic conductivity was not consistent. At the same water content value, unsaturated hydraulic conductivity was sometimes higher or lower in the compacted soil samples, compared with uncompacted soil (Andrade, 1971). In another study, the unsaturated hydraulic conductivity decreased with increasing bulk density (Dec et al., 2008).
Unsaturated flow should be estimated precisely, as its evaluation has important implications for transient infiltration processes due to the high nonlinearity of soil water characteristics. However, the methods available to obtain soil hydraulic parameters can be difficult and time-consuming to implement in practice (Angulo-Jaramillo et al., 2000). Thus, researchers have been developing analytical and numerical methods to calculate parameters that are difficult to measure in the field (Mollerup et al., 2008). Predictive data mining algorithms enable the estimation of unknown values based on patterns discovered from a database (MahaLakshmi, 2012). The main aim of the data mining process is to retrieve the data from a dataset and transform it into a more meaningful form with the help of algorithms (Jamil, 2016). Elbisy (2006) applied artificial neural network models (feed-forward back propagation, and radial basis function, RBF) to predict the field-saturated soil hydraulic conductivity of sandy soil based on basic saline and alkaline soil data. The results indicated that the back propagation neural network is more accurate than the RBF neural network. Moreover, the support vector machine methodology was successfully applied to develop pedo-transfer functions (PTFs) that used different input predictors to estimate soil hydraulic parameters (Twarakavi et al., 2009). Elbisy (2015) explored the use of data mining algorithms (support vector machine) to predict the field-saturated soil hydraulic conductivity of sandy soil, based on basic soil properties of saline and alkaline soil datasets. Data inputs were hydraulic conductivity, clay/silt ratio, liquid limit, hydrocarbonate anions, chloride ions, and calcium carbonate content. The influence of three kernel functions (linear, radial basis, and sigmoid) on the performance of the support vector machine (SVM) model was investigated using field data. The radial basis model performed satisfactorily, with a modelling efficiency of 0.972 and a correlation coefficient of 0.976. The excellent performance of the SVM with the radial basis function (RBF) kernel demonstrated its potential as a useful tool for the indirect estimation, with maximum obtainable prediction accuracy, of soil hydraulic conductivity of sandy soil. Sihag et al. (2017) predicted the unsaturated hydraulic conductivity of soil using adaptive neuro fuzzy inference system (ANFIS), multi-linear regression (LR), and artificial neural network (ANN). Laboratory experiments were carried out on 46 samples of sand, rice husk ash and fly ash mixture. The results suggest that the Gaussian membership function performs better than the triangular and generalized bell-shaped membership functions in ANFIS. LR is better than ANN and Gaussian membership function-based ANFIS for unsaturated hydraulic conductivity. Sihag (2018) developed fuzzy logic and ANN-based models for estimating the unsaturated hydraulic conductivity of soil. A mini disk infiltrometer is useful for determining infiltration characteristics. The mini disk infiltrometer (Decagon Devices, Inc.), at a suction rate (pressure head) varying from 1 to 6 cm, was used to determine the unsaturated hydraulic conductivity of sandy soil. All the measurements were made under predetermined initial conditions for different proportions of rice husk ash and fly ash mixed with sand. For modelling, a randomly selected 70% of the data was used for training and the remaining 30% for testing. The prediction with the ANN approach works well, with a correlation coefficient value of 0.8662 (RMSE, 4.5607 cm·h⁻¹).
The increasing availability of large quantities of management data in agricultural activities enables data-driven approaches, which are gaining attention. There are various ways data-driven techniques can be applied, and each incorporates different assumptions about the nature of the underlying processes. Gaussian process regression (GPR) is a probabilistic and non-parametric model (Azman and Kocijan, 2007) and hence can model complex systems whilst handling uncertainty in a principled manner (Richardson et al., 2017). GPR has good nonlinear mapping ability. It can reflect the inherent nonlinearity, avoid the deficiency of traditional methods in nonlinearity, and improve the accuracy and reliability of predictive results, thus making it an effective method to improve predictive accuracy (Dingwen, 2012). GPR has been successfully adopted for solving different problems. It was employed for predicting soil electrical resistivity based on soil thermal resistivity, percentage sum of the gravel and sand size fractions, and degree of saturation. The developed GPR was compared with an artificial neural network. The results showed that GPR is an efficient tool for predicting soil electrical resistivity (Samui, 2014). Moreover, GPR has been used for predicting stream water temperature. The proposed approach was compared with traditional modelling schemes on measurements obtained from the Drava River, Croatia. The presented methodology can be used as a basis for predictive tools for water resource managers (Grbić et al., 2013). In addition, in the study of Holman et al. (2014), GPR was employed for estimating reference crop evapotranspiration from alternative meteorological data sources, and results showed that GPR models provide much greater accuracy than baseline least-square regression models. Sihag et al. (2018) applied the artificial neural network (ANN) approach to estimate the infiltration rate of the soil. The performance of the ANN was compared with other types of artificial intelligence approaches (GPR, gene expression programming (GEP), and generalized regression neural network (GRNN)). The GPR, GRNN, and GEP models provided good estimation performance, but the ANN model performed better than these types of artificial intelligence approaches (correlation coefficient of up to 0.9816). Vand et al. (2018) applied diverse infiltration models using support vector machine, GPR, and multiple linear approaches to predict the infiltration rates of some Iranian fields. The study concluded that the Pearson VII kernel function performed well in comparison to the radial basis kernel function, in both support vector machine and GPR, in predicting the infiltration rate of soil.
Hence, in this study, field experiments using different water qualities were conducted to collect data that represent the unsaturated hydraulic conductivity of sandy loam soil. This field data was used for modelling the unsaturated hydraulic conductivity of the soil based on water and soil properties (i.e., electrical conductivity and the sodium adsorption ratio of the irrigation water, soil moisture content, soil bulk density, and suction rate). In particular, this study aimed to analyse the performance of Gaussian process regression (GPR) in predicting unsaturated hydraulic conductivity. A multiple linear regression (LR) and a multi-layer perceptron (MLP) model were also used as baselines for comparison with the GPR model.

Soil and water sample characteristics
Experiments were conducted in a field located in Huraimla Governorate, Riyadh, Saudi Arabia (coordinates: 25.11° N, 46.12° E, captured using a Garmin GPS 60 with positional accuracy < 15 m). Three soil samples were taken from the top 20 cm of the soil. Soil samples were analysed in the laboratory of the Soil Department, College of Food and Agriculture Sciences, King Saud University, Riyadh, Saudi Arabia. The experimental field was classified as sandy loam soil, with sand content of 67%, silt content of 28% and clay content of 5%, organic matter of 1.95%, soil electrical conductivity of 2.65 dS·m⁻¹, and soil pH of 8.9. The soil water content (%, dry basis (db)) during field experiments was determined by drying samples in an electric oven for 24 h at 105 °C. Soil bulk density was calculated based on dried soil mass and volume of the core sample.
Table 1 shows the chemical characteristics of water samples, electrical conductivity (ECw), sodium adsorption ratio (SARw), and pH of irrigation water used in the field experiments to study the interaction effect of irrigation water and sandy loam soil. The unsaturated hydraulic conductivity was measured using a mini disk infiltrometer (MDI, Decagon Devices Inc., Pullman, Washington, USA). It consists of two chambers (water reservoir and bubble chamber), connected via a Mariotte tube to provide a constant water pressure head of −0.5 to −7 cm (equivalent to −0.05 to −0.7 kPa). The bottom of the MDI contains a porous sintered steel disk. The water-filled tube is placed on the soil surface, resulting in water infiltrating into the soil, with the volume of water and speed of infiltration dependent on the sorptivity and hydraulic conductivity of the soil. Pressure heads (suction rates) of −1, −2, −3, −4, −5, and −6 cm were chosen for this study. At all test sites, the infiltration tests were conducted without any modification of the soil surface or addition of water; similar soil water content and soil bulk density were observed in all undisturbed spots, and no rainfall occurred during the test period. The MDI measurements (Fig. 1) were taken 7 times for each water quality, and the average value was used.
The respective measuring spots were typically several metres apart. During the measurement, the volume of the water in the reservoir chamber was documented at regular intervals. Infiltration was computed using Eq. 2, from the cumulative infiltration records versus time, following the recommendations of Zhang (1997), Carsel and Parrish (1988), and Decagon Devices Inc. (2012).

$$I = C_1 t + C_2 \sqrt{t} \tag{2}$$

where I is the cumulative infiltration (cm), t is the time (s), and C₁ (cm·s⁻¹) and C₂ (cm·s^(−1/2)) are parameters. C₁ is related to hydraulic conductivity and C₂ is related to soil sorptivity. The hydraulic conductivity (Ki) of the soil is then computed from Eq. 3.

$$K_i = \frac{C_1}{A} \tag{3}$$

where C₁ is the slope of the curve of the cumulative infiltration versus the square root of time and A is a value relating the van Genuchten parameters for a given soil type to the suction rate and radius of the infiltrometer disk. The values of A can be calculated by Eq. 4 and Eq. 5 (Carsel and Parrish, 1988).

$$A = \frac{11.65\,(n^{0.1} - 1)\,\exp[2.92\,(n - 1.9)\,\alpha h_o]}{(\alpha r_0)^{0.91}}, \quad n \ge 1.9 \tag{4}$$

$$A = \frac{11.65\,(n^{0.1} - 1)\,\exp[7.5\,(n - 1.9)\,\alpha h_o]}{(\alpha r_0)^{0.91}}, \quad n < 1.9 \tag{5}$$

where n and α are the van Genuchten parameters for the soil, r₀ is the disk radius and hₒ is the suction at the disk surface. The van Genuchten parameters for the 12 texture classes were obtained from Carsel and Parrish (1988). Sporadically occurring negative values for hydraulic conductivity indicate unsteadiness of the particular measurement and were ignored in the further calculation (Schacht and Marschner, 2015).
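The calculation chain described above (fit the cumulative infiltration curve, compute A, divide) can be sketched in Python as follows. This is only an illustrative sketch: the infiltration readings are synthetic, and the van Genuchten parameters for sandy loam (α = 0.075 cm⁻¹, n = 1.89), the disk radius of 2.25 cm, and the function names are assumptions made for the example, not values reported in this study.

```python
import numpy as np


def zhang_A(n, alpha, r0, h0):
    """A for the infiltrometer geometry; alpha in 1/cm, r0 and h0 in cm (h0 < 0)."""
    # The exponent coefficient switches at n = 1.9.
    coeff = 2.92 if n >= 1.9 else 7.5
    numerator = 11.65 * (n ** 0.1 - 1.0) * np.exp(coeff * (n - 1.9) * alpha * h0)
    return numerator / (alpha * r0) ** 0.91


def fit_infiltration(t, I):
    """Least-squares fit of I = C1*t + C2*sqrt(t); returns (C1, C2)."""
    t = np.asarray(t, dtype=float)
    I = np.asarray(I, dtype=float)
    design = np.column_stack([t, np.sqrt(t)])
    (c1, c2), *_ = np.linalg.lstsq(design, I, rcond=None)
    return c1, c2


# Assumed (hypothetical) settings for the example.
ALPHA, N_VG = 0.075, 1.89   # sandy loam van Genuchten parameters (assumed)
R0 = 2.25                   # disk radius in cm (assumed)
H0 = -2.0                   # suction at the disk surface in cm

# Synthetic readings: time in s, cumulative infiltration in cm.
t = np.arange(0.0, 301.0, 30.0)
I = 2.0e-3 * t + 0.05 * np.sqrt(t)

c1, c2 = fit_infiltration(t, I)
A = zhang_A(N_VG, ALPHA, R0, H0)
K = c1 / A                  # hydraulic conductivity in cm/s
print(f"C1 = {c1:.4e} cm/s, A = {A:.3f}, K = {K:.4e} cm/s")
```

With real data, a negative fitted C₁ would correspond to the unsteady measurements that the text says were discarded.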

Datasets
The collected dataset contains a total of 48 field measurement instances having 4 attributes. The data were randomized, and the Waikato environment for knowledge analysis (WEKA) tool was used to obtain a percentage of the data for building the model (85%, 41 points), and the rest (15%, 7 points) were used for testing. The input variables in this study are SARw, ECw, soil moisture content, soil bulk density, and suction rate. Descriptive statistics for input and output variables are shown in Table 2 for the entire dataset.
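As a rough illustration of how an 85% percentage split of 48 instances yields 41 training and 7 testing points, the following Python sketch shuffles the indices and cuts them at the rounded training size. The function name and seed are hypothetical, and WEKA's own randomization will select different instances.

```python
import numpy as np


def percentage_split(n_samples, train_fraction=0.85, seed=1):
    """Shuffle indices and split them into training and testing sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(n_samples * train_fraction))  # 48 * 0.85 = 40.8 -> 41
    return order[:n_train], order[n_train:]


train_idx, test_idx = percentage_split(48)
print(len(train_idx), len(test_idx))
```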

Predictive data mining techniques examined in this research
The predictive data mining techniques examined in this research were Gaussian process regression, linear regression, and the multi-layer perceptron neural network, and simulations were done using the WEKA open-source tool (Garner, 1995). The WEKA machine learning workbench provides an environment for automatic classification, regression, clustering, and common data mining problems in bioinformatics research. It has a user-friendly graphical interface to compare the various algorithm results (Frank et al., 2004). In the training phase, a model is constructed from the training instances selected by WEKA, and in the testing phase, the model is used to assign a label to an unlabelled test instance. Linear regression analyses the relationship between several input variables and the output variable by fitting a linear function to the data in the best manner possible. With a good fit, a linear regression model can be used to predict future values of the output variable. WEKA performs standard least-squares linear regression and implements ridge regression (Witten and Frank, 2005). Ridge regression is used to solve problems that are not well-posed, that is, problems for which solution algorithms have weak stability (Wormstrand, 2011). In WEKA, a fixed small ridge parameter of 1.00 × 10⁻⁸ was used, and no attribute selection criterion was designated to perform linear regression.
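A minimal Python sketch of least-squares regression with a small ridge term is given below, using the closed-form normal equations. It is not WEKA's implementation (which also applies its own attribute filtering); the data are synthetic, the function names are hypothetical, and leaving the intercept unpenalized is a choice made for this sketch only.

```python
import numpy as np


def fit_ridge(X, y, ridge=1.0e-8):
    """Solve (Xb'Xb + ridge*I) w = Xb'y, where Xb has a leading column of ones."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xb = np.column_stack([np.ones(len(X)), X])
    penalty = ridge * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0  # intercept left unpenalized (assumption of this sketch)
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)


def predict_linear(coef, X):
    """Apply the fitted intercept (coef[0]) and slopes (coef[1:])."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]


# Synthetic data: 41 training rows, 5 inputs, exact linear response.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 1.0, size=(41, 5))
true_coef = np.array([0.5, 1.0, -2.0, 0.3, 0.0, 4.0])
y_demo = true_coef[0] + X_demo @ true_coef[1:]

coef = fit_ridge(X_demo, y_demo)
print(np.round(coef, 4))
```

With a ridge value this small, the fit is practically indistinguishable from ordinary least squares; the term mainly guards against a singular system when inputs are collinear.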

Multi-layer perceptron (MLP) model
The MLP is an optimum feed-forward artificial neural network (ANN), trained with the back-propagation algorithm, that consists of neurons with substantially weighted interconnections where signals always travel in the direction of the output layer. These neurons map sets of input data onto a set of proper outputs through hidden layers (Turkan et al., 2016). The input signals are sent by the input layer to the hidden layer without executing any operations. Then, the hidden and output layers multiply the input signals by a set of weights, and either linearly or non-linearly transform the results into output values. Each connection between units in successive layers has an associated weight (Turkan et al., 2016), and these weights are optimized to achieve reasonable prediction accuracy (Elish, 2014; Lek and Park, 2008). A typical MLP with one hidden layer can be described mathematically as follows (Turkan et al., 2016). Equation 6 defines the sum of the products of the inputs (Xᵢ) and weight vectors (aᵢⱼ), plus a bias term of the hidden layer (a₀ⱼ). In Eq. 7, the outputs of the hidden layer (Zⱼ) are obtained by transforming this sum, defined in Eq. 6, using the activation function g.

$$sum_j = \sum_{i} X_i a_{ij} + a_{0j} \tag{6}$$

$$Z_j = g(sum_j) \tag{7}$$

The most widely used activation function is the sigmoid function (Karlik and Olgac, 2011), defined in Eq. 8 for the input x. The hidden and output layers are based on this sigmoid function.

$$g(x) = \frac{1}{1 + e^{-x}} \tag{8}$$

Eq. 9 defines the sum of the products of the hidden layer's outputs (Zⱼ) and weight vectors (bⱼₖ), plus the bias term of the output layer (bₖ₀).

$$sum_k = \sum_{j} Z_j b_{jk} + b_{k0} \tag{9}$$

In Eq. 10, the outputs of the output layer (Yₖ) are obtained by transforming the sum calculated in Eq. 9, using the sigmoid function g defined in Eq. 8.

$$Y_k = g(sum_k) \tag{10}$$

Figure 2 shows the MLP created in the WEKA tool and applied as an artificial neural network (ANN) based on the multilayer perceptron (MLP) algorithm in this study. The same dataset was used as in the linear regression (LR) run. A neural net with 3 nodes in the hidden layer was created by WEKA, as shown in Fig. 2. The neural net was trained for 500 epochs, with a learning rate of 0.3 and a momentum of 0.2 (the WEKA defaults). The number of epochs determines how long the neural net will run, while the learning rate and momentum indicate how the weights are adjusted (Wormstrand, 2011). The error per epoch was 8.2743 × 10⁻³ cm·s⁻¹.
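The forward pass of a one-hidden-layer MLP, as described by Eq. 6 to Eq. 10, can be sketched in Python as below. The weights are random placeholders rather than trained values, and the array names simply mirror the symbols used in the text. Note that WEKA's documentation states that, for a numeric class, the output nodes become unthresholded linear units; the sketch instead follows the equations as written in the text.

```python
import numpy as np


def sigmoid(x):
    """Sigmoid activation g(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))


def mlp_forward(X, a, a0, b, b0):
    """One-hidden-layer forward pass.

    X: (n_samples, n_inputs); a: (n_inputs, n_hidden); a0: (n_hidden,)
    b: (n_hidden, n_outputs); b0: (n_outputs,)
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    sum_j = X @ a + a0      # weighted input sum plus hidden bias
    Z = sigmoid(sum_j)      # hidden-layer outputs
    sum_k = Z @ b + b0      # weighted hidden sum plus output bias
    return sigmoid(sum_k)   # output-layer values


# Placeholder network: 5 inputs, 3 hidden nodes, 1 output (untrained weights).
rng = np.random.default_rng(0)
a, a0 = rng.normal(size=(5, 3)), rng.normal(size=3)
b, b0 = rng.normal(size=(3, 1)), rng.normal(size=1)

x_demo = np.array([0.2, 0.4, 0.6, 0.8, 1.0])  # hypothetical scaled inputs
y_demo = mlp_forward(x_demo, a, a0, b, b0)
print(y_demo)
```

Because a sigmoid output is bounded between 0 and 1, a network of this form would need the target variable rescaled to that range before training.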

Gaussian process
A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution (Rasmussen, 2003). A Gaussian process is completely specified by its mean function and covariance function (Rasmussen and Williams, 2006). The details of GPR were obtained from Rasmussen (2003). Based on Samui and Jagan (2013) and Saini and Chandramouli (2013), the following noisy dataset can be considered (Eq. 11):

$$D = \{(x_i, y_i)\}_{i=1}^{N} \tag{11}$$

where x is input, y is output and N is the number of data points. In this study, ECw, SARw, SR, MC and BD are used as input variables for the GPR. The output of the GPR is the unsaturated hydraulic conductivity. So, x can be written as in Eq. 12:

$$x = [EC_w,\ SAR_w,\ SR,\ MC,\ BD] \tag{12}$$

It is assumed that the above data are generated from Eq. 13:

$$y_i = f(x_i) + \varepsilon_i \tag{13}$$

where ε is the Gaussian noise term, which follows a Gaussian distribution with zero mean and variance σ².

The joint distribution of Y is given by Eq. 14:

$$Y \sim N\left(0,\ K(x, x) + \sigma^2 I\right) \tag{14}$$

where K(x, x) is the kernel function and I is the identity matrix. For a test input x*, GPR defines a Gaussian predictive distribution over the output y* with mean determined by Eq. 15 and variance by Eq. 16.
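For reference, the predictive mean and variance of a zero-mean Gaussian process with kernel K and noise variance σ² take the following standard textbook form (Rasmussen and Williams, 2006). This is offered as the likely content of Eq. 15 and Eq. 16 rather than as a verbatim reproduction; in particular, whether the noise variance σ² is added to the predictive variance depends on whether the noisy output y* or the underlying function value is being predicted.

```latex
\mu_* = K(x_*, x)\,\bigl[K(x, x) + \sigma^2 I\bigr]^{-1} y
\qquad (15)

\sigma_*^2 = K(x_*, x_*) - K(x_*, x)\,\bigl[K(x, x) + \sigma^2 I\bigr]^{-1} K(x, x_*)
\qquad (16)
```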
To develop the GPR model, a suitable covariance function is required. In this study, the 4 kernel functions available in WEKA are used: the normalized polynomial kernel, the polynomial kernel, the RBF kernel, and the Pearson VII kernel.
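To make the role of the covariance function concrete, the sketch below implements the Pearson VII universal kernel (with the Omega and Sigma parameters named in this study) and uses it in a plain NumPy GPR prediction. It is a simplified illustration, not WEKA's GaussianProcesses class: WEKA's input normalization and its scaling of the noise level are not replicated, the target is merely centred on its mean, and the data and function names are hypothetical.

```python
import numpy as np


def puk_kernel(A, B, omega=1.0, sigma=1.0):
    """Pearson VII universal kernel between the rows of A and the rows of B."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    dist = np.sqrt(np.maximum(sq, 0.0))  # clip tiny negative rounding errors
    scaled = 2.0 * dist * np.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + scaled ** 2) ** omega


def gpr_predict(X_train, y_train, X_test, noise_var=1.0, **kernel_args):
    """Predictive mean and (noise-free) variance of a GP with the PUK kernel."""
    y_train = np.asarray(y_train, dtype=float)
    y_mean = y_train.mean()  # centre the target (choice of this sketch)
    K = puk_kernel(X_train, X_train, **kernel_args) + noise_var * np.eye(len(y_train))
    K_s = puk_kernel(X_test, X_train, **kernel_args)
    K_ss = puk_kernel(X_test, X_test, **kernel_args)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    mean = K_s @ alpha + y_mean
    v = np.linalg.solve(L, K_s.T)
    var = np.diag(K_ss) - np.sum(v ** 2, axis=0)
    return mean, var


# Synthetic demonstration: 10 points, 5 inputs, almost noise-free fit.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 1.0, size=(10, 5))
y_demo = rng.normal(size=10)
mean_demo, var_demo = gpr_predict(X_demo, y_demo, X_demo, noise_var=1e-8)
print(np.max(np.abs(mean_demo - y_demo)))
```

Swapping `puk_kernel` for another positive-definite kernel function is the only change needed to compare covariance choices in this sketch.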

Criteria for evaluating the accuracy of the selected predictive models
Experimentally, this study evaluated and compared the prediction accuracy of the selected predictive models based on three performance measurements frequently used in previous studies: correlation coefficient (r), root mean square error (RMSE), and mean absolute error (MAE). These performance measurements are formulated as shown in Table 3, with optimal values. Yᵢ is the observed unsaturated soil hydraulic conductivity, Ŷᵢ is the predicted unsaturated soil hydraulic conductivity, Yu is the mean of the observed unsaturated soil hydraulic conductivity, Yo is the mean of the predicted unsaturated soil hydraulic conductivity, and Nt is the number of data points in the testing dataset.
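The three criteria can be computed as in the Python sketch below, which uses the standard definitions of the Pearson correlation coefficient, RMSE and MAE; these are assumed to match the formulations in Table 3. The observed and predicted values are made-up numbers for illustration only.

```python
import numpy as np


def correlation_coefficient(y_obs, y_pred):
    """Pearson correlation coefficient r (optimal value 1)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    d_obs = y_obs - y_obs.mean()
    d_pred = y_pred - y_pred.mean()
    return float(np.sum(d_obs * d_pred) / np.sqrt(np.sum(d_obs ** 2) * np.sum(d_pred ** 2)))


def rmse(y_obs, y_pred):
    """Root mean square error (optimal value 0)."""
    diff = np.asarray(y_obs, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def mae(y_obs, y_pred):
    """Mean absolute error (optimal value 0)."""
    diff = np.asarray(y_obs, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))


# Hypothetical observed and predicted values.
y_obs = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

r_val = correlation_coefficient(y_obs, y_pred)
rmse_val = rmse(y_obs, y_pred)
mae_val = mae(y_obs, y_pred)
print(r_val, rmse_val, mae_val)
```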
The WEKA linear regression model result for unsaturated soil hydraulic conductivity is calculated by Eq. 17.
It can be seen from Eq. 17 that unsaturated hydraulic conductivity increases with increasing ECw and decreases with increasing SARw; these findings are in agreement with those obtained by Moosavi and Sepaskhah (2012c), who indicated that use of saline waters with an ECw of < 10 dS·m⁻¹ can improve soil hydraulic properties in sandy clay loam soils and that irrigation waters with SARw < 20 (meq·L⁻¹)^(1/2) may not adversely affect hydraulic attributes when the water is first applied, although higher SARw may negatively affect them. Andrade (1971) reported a very large decrease in soil hydraulic conductivity as water content decreased, and that the effect of compaction on unsaturated hydraulic conductivity (KU) was not consistent: at the same value of water content, unsaturated hydraulic conductivity was sometimes higher in the compacted samples. However, the positive correlation between KU and BD in this study can be attributed to the KU measurements being taken on undisturbed soil with different soil moisture contents. Also, in this study, the unsaturated soil hydraulic conductivity decreased with increased suction rate (SR), and this finding was in agreement with those obtained by Moosavi and Sepaskhah (2012c), Simunek et al. (1999) and Matula et al. (2015).

DISCUSSION
According to the water quality analysis, HCO₃ may not cause irrigation problems, as its concentration was within the range of recommended guidelines for irrigation water quality, of 0-10 meq·L⁻¹ (Ayers and Westcot, 1994; Shahinasi and Kashuta, 2008). Also, chloride content was within tolerance for irrigation water, under the recommended limit of 30 meq·L⁻¹. Although the sulfate concentrations in the study area vary considerably, only 6 water samples fell within the acceptable limits of 0-20 meq·L⁻¹ for irrigation water. W7 and W8 exceed sulfate concentration limits, with values of 22 meq·L⁻¹ and 27.04 meq·L⁻¹, respectively (Table 1). The pH values were within the permissible limit for irrigated agriculture water, 6.5-8.4 (Ayers and Westcot, 1994). Hence, the investigated water presented no restrictions for irrigation use.
The two most common water quality factors which influence the movement of water into soil (infiltration) are salinity and the sodium content relative to the Ca and Mg content. High salinity water will increase infiltration. Low salinity water, or water with high Na to Ca and Mg ratio, will decrease infiltration. Both factors can operate concurrently. The infiltration rate generally increases with increasing salinity and decreases with either decreasing salinity or increasing Na content relative to Ca and Mg. Therefore, the two factors, salinity and SAR, provide information on the ultimate effect of the water quality on the water infiltration rate (Nata et al., 2009). On almost all soils, the range of water SAR that can be used for irrigation, with a low risk of the emergence of harmful levels of exchangeable Na, is 0-10 (Ayers and Westcot, 1994).
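The sodium content relative to Ca and Mg is conventionally summarized by the sodium adsorption ratio, SAR = Na / √((Ca + Mg)/2), with all concentrations in meq·L⁻¹, which is why SAR carries units of (meq·L⁻¹)^(1/2). A small Python sketch of this standard formula follows; the concentrations in the example are hypothetical and are not taken from Table 1.

```python
import math


def sodium_adsorption_ratio(na, ca, mg):
    """SAR from Na, Ca and Mg concentrations, all in meq/L."""
    if ca + mg <= 0:
        raise ValueError("Ca + Mg must be positive")
    return na / math.sqrt((ca + mg) / 2.0)


# Hypothetical water sample: Na = 10, Ca = 6, Mg = 2 meq/L.
sar_demo = sodium_adsorption_ratio(na=10.0, ca=6.0, mg=2.0)
print(sar_demo)  # 10 / sqrt(4) = 5.0
```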
To study the impact of SARw on unsaturated soil hydraulic conductivity of a sandy loam soil, a pressure head of −4 cm was employed as a mean value. Figure 3 shows the relationship between SARw and unsaturated soil hydraulic conductivity of sandy loam soil at a suction rate of −4 cm. It is clear that unsaturated soil hydraulic conductivity decreased linearly, with high correlation (R² = 0.8999), with an increase of SARw, and this finding agrees with data presented by Xiao et al. (1992). Figure 4 illustrates the relationship between suction rate and unsaturated soil hydraulic conductivity of sandy loam soil at SARw of 2.46 (meq·L⁻¹)^(1/2) (ECw was 4.72 dS·m⁻¹, average MC and BD were 12.12% db and 1.63 g·cm⁻³, respectively). A polynomial relationship was found, with R² of 0.9698; the unsaturated soil hydraulic conductivity decreased with increase of suction rate (Fig. 4) and this finding agrees with data presented by Moosavi and Sepaskhah (2012c), Simunek et al. (1999) and Matula et al. (2015).

Prediction model performance
The objective of a learning algorithm is to develop a model with good generalization, so that the resulting model is suitable for practical use (Munir and Winarko, 2015). Table 4 shows the WEKA information and kernel used in the GPR model. Also, Fig. 5 shows the time spent building each of the selected predictive models. The GPR-Pearson VII kernel function model with a cache size of 250007, Omega of 1.0, and Sigma of 1.0 took the least time to build compared with other kernels.
The measured performance of the prediction models in terms of r, RMSE, and MAE, for all testing data, is presented in Table 5, which shows that all the listed models had good prediction performance. The RMSE statistics indicate only the model's ability to predict away from the mean. The MAE is the most natural and unambiguous measure of the average error. Figure 6 illustrates the relationship between the predicted and actual unsaturated soil hydraulic conductivity for all predictive models for 7 testing data points. The figure shows fair relationships between predicted and actual values. Apparently, the GPR-Pearson VII kernel function gives the best representation of actual experimental data, with the highest correlation coefficient (r) of 0.9646 (Table 5). This approach provides great prediction capacity and does not require knowledge of the input parameters, but its prediction capability is limited by the information content of the data.

Figure 4 The relationship between suction rate and unsaturated soil hydraulic conductivity of sandy loam soil at sodium adsorption ratio (SARw) of 2.46 (meq·L⁻¹)^(1/2) (ECw was 4.72 dS·m⁻¹ and average MC and BD were 12.12% db and 1.63 g·cm⁻³, respectively)

CONCLUSIONS
This research was mainly conducted to evaluate the potential for using data mining techniques for predicting the unsaturated hydraulic conductivity of a sandy loam soil based on water and soil properties. In particular, data mining algorithms of Gaussian processes, artificial neural network based on multilayer perceptron (MLP), and linear regression were generated and individually tested. The analytical results suggest that all of the tested models can provide good prediction accuracy, with correlation coefficients (r) ranging from 0.9162 to 0.9646. The Gaussian process regression model with Pearson VII kernel function showed the best prediction accuracy as an individual data mining model. With the demonstrated potential of using data mining models to predict the unsaturated hydraulic conductivity of a sandy loam soil, future research can adopt this approach to study other variables in the field of managing water and solute transport in soils that cannot be easily measured.

Figure 1 Mini disk infiltrometer used for field infiltration measurements

Figure 2 The MLP created in the WEKA tool for prediction of unsaturated soil hydraulic conductivity of sandy loam soil

Figure 3 The relationship between water sodium adsorption ratio and unsaturated soil hydraulic conductivity of sandy loam soil at a suction rate of −4 cm

Scheme: weka.classifiers.functions.Gaussianprocesses -L 1.0 -N 0 -K "weka.classifiers.functions.supportVector.NormalizedPolyKernel -C 250007 -E 2.0"

Figure 5 The time spent building the selected predictive models

Figure 6 The relationship between the predicted and actual unsaturated soil hydraulic conductivity for the 3 predictive models, for 7 testing data points

TABLE 3 Criteria for evaluating the accuracy of the selected predictive models
https://doi.org/10.4314/wsa.v45i1.14 Available on website http://www.wrc.org.za ISSN 1816-7950 (Online) = Water SA Vol. 45 No. 1 January 2019 Published under a Creative Commons Attribution Licence