INFLUENCE OF SAMPLE DENSITY, MODEL SELECTION, DEPTH, SPATIAL RESOLUTION, AND LAND USE ON PREDICTION ACCURACY OF SOIL PROPERTIES IN INDIANA, USA
Digital soil mapping (DSM) combines field and laboratory data with environmental factors to predict soil properties. The accuracy of these predictions depends on factors such as model selection, data quality and quantity, and landscape characteristics. In this study, we investigated how sample density, the choice of environmental covariates (ECs), and the spatial resolution of those ECs affect the predictive accuracy of four models: Cubist (CB), Random Forest (RF), Regression Kriging (RK), and Ordinary Kriging (OK). The five ECs were slope, topographic position index, topographic wetness index, multiresolution valley bottom flatness, and multiresolution ridge top flatness. The analysis was conducted at three sites in Indiana: the Purdue Agronomy Center for Research and Education (ACRE), Davis Purdue Agriculture Center (DPAC), and Southeast Purdue Agricultural Center (SEPAC). Each site had its own soil sampling design, management practices, and topographic conditions. The primary focus was to predict the spatial distribution of soil organic matter (SOM), cation exchange capacity (CEC), and clay content at different depths (0-10 cm, 0-15 cm, and 10-30 cm), using the five ECs at four spatial resolutions (1-1.5 m, 5 m, 10 m, and 30 m).
Prediction accuracy was assessed with several evaluation metrics: R², root mean square error (RMSE), mean square error (MSE), concordance coefficient (ρc), and bias. Accuracy was significantly influenced by site, sample density, model type, soil property, and their interactions. Site was the largest source of variation in the predicted spatial distribution of SOM, CEC, and clay across the landscape, followed by sample density and model type.
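The metrics named above can be illustrated with a minimal sketch. It is not the thesis's own code, and it rests on assumptions: R² is taken as the coefficient of determination (1 - SSE/SST) rather than a squared correlation, bias is taken as the mean of predicted minus observed, and the concordance coefficient is taken to be Lin's concordance correlation coefficient. The function name `accuracy_metrics` and the sample values are hypothetical.

```python
import math


def accuracy_metrics(observed, predicted):
    """Return R2, RMSE, MSE, Lin's concordance coefficient, and bias.

    Assumed definitions (not confirmed by the source):
    R2 = 1 - SSE/SST, bias = mean(predicted - observed).
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n

    # Prediction errors, predicted minus observed.
    errors = [p - o for o, p in zip(observed, predicted)]
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    bias = sum(errors) / n

    # Population (divide-by-n) variances and covariance.
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    if var_o == 0:
        raise ValueError("observed values have zero variance")

    r2 = 1.0 - mse / var_o
    # Lin's concordance: penalises both scatter and systematic shift.
    ccc = 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)

    return {"r2": r2, "rmse": rmse, "mse": mse, "ccc": ccc, "bias": bias}


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    obs = [2.1, 3.4, 2.8, 4.0, 3.1]
    pred = [2.4, 3.1, 2.9, 3.6, 3.3]
    for name, value in accuracy_metrics(obs, pred).items():
        print(f"{name}: {value:.3f}")
```

Under these definitions a prediction that is perfectly correlated with the observations but shifted by a constant has a non-zero bias and a concordance coefficient below 1, which is why the concordance coefficient and bias are reported alongside R².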
The RF model consistently outperformed the other models, while OK performed poorly across all sites and properties because it only interpolates between sample points and does not incorporate landscape characteristics (ECs). Increasing sample density improved predictions up to a threshold (e.g., 66 samples at ACRE for both SOM and CEC; 58 samples for SOM and 68 samples for CEC at SEPAC), beyond which the improvements were marginal. Spatial resolution also mattered: finer EC resolutions (1-1.5 m and 5 m) yielded more accurate predictions than the coarser 10 m and 30 m data, especially for SOM and clay content. Comparing predictions from the two depths (0-10 cm vs 10-30 cm), data from the deeper layer (10-30 cm) gave more accurate predictions for SOM and clay, while data from the shallower layer (0-10 cm) gave more accurate predictions for CEC.
In summary, this research underscores the significance of informed decisions regarding sample density, model selection, and spatial resolution in digital soil mapping. It emphasizes that the choice of predictive model is critical, with RF consistently delivering superior performance. These findings have important implications for land management and sustainable land use practices, particularly in heterogeneous landscapes and areas with varying management intensities.
Degree Type
- Doctor of Philosophy
Department
- Agronomy
Campus location
- West Lafayette