ECOFI: a database of sugar and energy cane field trials

Agroecological studies on sugarcane dealing with genotype-by-environment-by-management interactions commonly generate complex datasets. To facilitate the use of these datasets, a relational database named ECOFI was designed from an analysis of the content and structure of datasets from multidisciplinary experiments on sugarcane and energy cane. The database described in this paper includes data from 58 trials carried out in 11 countries from 1986 to 2016, including 24 trials on Reunion Island and 15 in Guadeloupe. Counting each plot and crop cycle within the trials, it includes 725 crop cycles in total, covering 60 different cultivars. The datasets contain crop observations (e.g. dry mass), soil data (water content), weather data (all essential meteorological parameters) and management data (sowing, cultivars and harvest). The datasets also contain metadata that qualify the observations. This dataset provides an adequate experimental basis for calibrating or validating crop model simulations under genotype × environment interactions.


BACKGROUND AND ORIGINAL PURPOSE:
This article presents the result of a collaborative effort by a multidisciplinary team of computer scientists, biostatisticians, agronomists, weed experts and crop modelers who were confronted with data and metadata management issues in agroecological research on sugarcane. Relational databases are widely used in agriculture to manage field experimental data as well as pest and weed management information (Bechini and Stöckle 2007). The ECOFI database (experimental database on ecophysiological and environmental observations) was developed (Auzoux et al., 2017) to ensure secure and persistent data storage, to standardize the annotation of heterogeneous data, to improve analysis, and to facilitate access to data for sugarcane modelling in Reunion and in other sugar-producing countries in Africa and Central America. The database was initially built to provide simulation data for the sugarcane growth model MOSICAS (Martiné et al., 2007). The datasets are split into several text files, which are freely available under doi:10.18167/DVN1/1GCL8F in the CIRAD Dataverse, an online repository for shared research data.

STRUCTURE OF THE ECOFI DATASETS:
Unlike a traditional database, which is modeled around a specific problem, the ECOFI database can manage all the experimental data collected in agroecological studies that account for many factors varying in time and space. The database consists of 23 linked tables (Figure 1). Each table is described in Table 1 to facilitate data access.
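To illustrate how linked tables work, the sketch below uses Python's built-in sqlite3 module to create a small, hypothetical subset of such a schema (Site, Trial, Plot, Plotcycle) with foreign keys, then follows the links from each crop cycle back to its site. The table names come from the text, but the column names, the use of SQLite and the demo rows are assumptions for illustration only, not the actual ECOFI schema or data.

```python
import sqlite3

# Hypothetical subset of the ECOFI schema: table names come from the text,
# column names are illustrative assumptions only.
SCHEMA = """
CREATE TABLE Site      (site_id INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE Trial     (trial_id INTEGER PRIMARY KEY,
                        site_id INTEGER NOT NULL REFERENCES Site(site_id));
CREATE TABLE Plot      (plot_id INTEGER PRIMARY KEY,
                        trial_id INTEGER NOT NULL REFERENCES Trial(trial_id),
                        row_spacing_m REAL);
CREATE TABLE Plotcycle (plotcycle_id INTEGER PRIMARY KEY,
                        plot_id INTEGER NOT NULL REFERENCES Plot(plot_id),
                        cycle_type TEXT);  -- e.g. plant or ratoon
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding one made-up site, trial, plot and cycle."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces links only when enabled
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Site VALUES (1, 'Site A', 'Country A')")
    conn.execute("INSERT INTO Trial VALUES (10, 1)")
    conn.execute("INSERT INTO Plot VALUES (100, 10, 1.5)")
    conn.execute("INSERT INTO Plotcycle VALUES (1000, 100, 'plant')")
    return conn


def cycles_per_site(conn: sqlite3.Connection):
    """Count crop cycles per site by following the foreign keys upward."""
    return conn.execute("""
        SELECT s.name, COUNT(pc.plotcycle_id)
        FROM Plotcycle pc
        JOIN Plot p  ON pc.plot_id = p.plot_id
        JOIN Trial t ON p.trial_id = t.trial_id
        JOIN Site s  ON t.site_id = s.site_id
        GROUP BY s.name
    """).fetchall()
```

With foreign keys enabled, inserting a plot that points to a non-existent trial fails with sqlite3.IntegrityError, which is the kind of consistency a linked schema provides.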

WEATHER AND SOIL DATA:
Weather and soil data are given for each site. Weather data are provided at a daily time step or averaged over 10-day periods. Available meteorological variables are: minimum, mean and maximum temperature; minimum, mean and maximum relative humidity; total and maximum wind speed; global radiation; potential evapotranspiration (Penman-Monteith); and rainfall. In some cases, meteorological variables are missing because they were not measured (in particular relative humidity). The database makes it possible to associate two sets of rainfall data with a given site (one in "Wdat" and the other in "Raindat"). Variable units are described in "Tables". Soil data are described in "Genesoil" and "Soillayer" (which defines multiple soil layers) and are limited to the variables used to assess the soil water balance: soil water capacity, field capacity, wilting point, saturation and bulk density.
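Field capacity and wilting point per layer are the usual inputs of a bucket-type soil water balance. As a minimal sketch, assuming volumetric water contents (m³/m³) and layer thicknesses in mm, the plant-available water of a profile is the sum over layers of (field capacity − wilting point) × thickness. The SoilLayer fields are hypothetical rather than actual "Soillayer" columns, and this generic relation is not necessarily the one MOSICAS uses to derive soil water capacity. A helper converts gravimetric contents to volumetric ones with bulk density, assuming a water density of 1 g/cm³.

```python
from dataclasses import dataclass


@dataclass
class SoilLayer:
    # Hypothetical per-layer record; field names are assumptions, not ECOFI columns.
    thickness_mm: float
    field_capacity: float  # volumetric water content, m3/m3
    wilting_point: float   # volumetric water content, m3/m3


def available_water_mm(layers: list[SoilLayer]) -> float:
    """Plant-available water (mm): sum of (FC - WP) * thickness over all layers."""
    total = 0.0
    for layer in layers:
        if layer.wilting_point > layer.field_capacity:
            raise ValueError("wilting point must not exceed field capacity")
        total += (layer.field_capacity - layer.wilting_point) * layer.thickness_mm
    return total


def gravimetric_to_volumetric(w: float, bulk_density_g_cm3: float) -> float:
    """theta_v = w * rho_b / rho_w, taking rho_w = 1 g/cm3."""
    return w * bulk_density_g_cm3
```

For example, a 300 mm layer with FC = 0.35 and WP = 0.20 over a 700 mm layer with FC = 0.30 and WP = 0.18 holds about 45 + 84 = 129 mm of available water.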

MANAGEMENT DATA:
Management information for each trial is detailed in "Plot" (planted area of each plot, number of buds and row spacing), "Plotcycle" (crop cycle: plant or ratoon crop, irrigation management) and "Varcycle" (planted variety, start and end dates of the cycle). For irrigated crops, irrigation management is detailed in "Irrdata" and "Irrigation", which give the daily amount of water applied on specific dates as well as the irrigation efficiency (accounting for water losses). An irricode of "NC" means that the crop was irrigated but that the irrigation amount and frequency are not available. Two types of irrigation were used across the trials: sprinkler and drip. In total, 60 different varieties were used (41 sugarcane and 19 energy cane). However, the number of crop cycles per variety is uneven, with a maximum of 95 crop cycles for the R570 variety.
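A sketch of how these irrigation records could be turned into the water actually supplied to the crop, assuming efficiency is stored as a fraction between 0 and 1 and that net water = applied amount × efficiency. The IrrigationEvent fields and any irricode other than "NC" (such as "DRIP" in the test) are hypothetical; only "NC" comes from the text, and it is mapped to None because the amounts are unknown, not zero.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class IrrigationEvent:
    # Hypothetical record mirroring the "Irrigation" description; field names are assumptions.
    day: date
    amount_mm: float   # gross water applied on that date
    efficiency: float  # fraction of the applied water reaching the crop (0-1)


def effective_irrigation_mm(irricode: str, events: list[IrrigationEvent]) -> Optional[float]:
    """Net water supplied over a crop cycle, or None when amounts were not recorded."""
    if irricode == "NC":
        # Irrigated, but irrigation amount and frequency are not available.
        return None
    total = 0.0
    for ev in events:
        if not 0.0 <= ev.efficiency <= 1.0:
            raise ValueError(f"efficiency out of range on {ev.day}")
        total += ev.amount_mm * ev.efficiency
    return total
```

Returning None rather than 0 for "NC" keeps an irrigated cycle with missing amounts from being mistaken for a rainfed one in later analyses.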

PLANT AND SOIL OBSERVATION:
The ECOFI database provides measurements from the different trials at several levels: plot (e.g. soil water content and aboveground dry matter), plant (e.g. aboveground fresh matter per plant), stem (e.g. stem sugar content), leaf (e.g. fresh matter of leaves) and leaf blade, or "limb" (e.g. width of leaf blades). All measured plant, soil and weather variables are described in the "Variables" table with their unit, type and description.
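Because every observed variable must be defined in "Variables", a loader can check incoming observations against that table and attach the declared units. In this minimal sketch the VariableDef fields and the variable codes used in the test ("ADM", "BLW") are hypothetical; only the level names are taken from the text.

```python
from dataclasses import dataclass

# Measurement levels named in the text; "limb" is the leaf blade.
LEVELS = {"plot", "plant", "stem", "leaf", "limb"}


@dataclass(frozen=True)
class VariableDef:
    # Hypothetical mirror of one "Variables" row; field names are assumptions.
    code: str
    unit: str
    vtype: str
    level: str

    def __post_init__(self):
        if self.level not in LEVELS:
            raise ValueError(f"unknown measurement level: {self.level!r}")


def attach_units(observations, variables):
    """Pair each (code, value) observation with the unit declared in Variables.

    Raises KeyError if an observation uses a code that has not been defined.
    """
    registry = {v.code: v for v in variables}
    result = []
    for code, value in observations:
        if code not in registry:
            raise KeyError(f"variable {code!r} is not defined in Variables")
        result.append((code, value, registry[code].unit))
    return result
```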

ADDING NEW DATA:
Users who wish to add observations must first create the new trial in "Trial", with its location in the "Site" table. The elementary plots within the trial are described in "Plot", possibly with different planting densities and row spacings. Finally, "Plotcycle" and "Varcycle" describe the cane cycle and the variety planted. Observations can then be added to "Obsplant" and "Obssoil", after any new variable definitions have been entered in the metadata table "Variables". Observed variables are standardized as far as possible using terms from controlled vocabularies to make the data interoperable. If available, soil and climate data can also be associated with the new dataset.
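The order of these steps follows from the links between tables: a row can only reference rows that already exist. Assuming observations are attached to a crop cycle (Plotcycle) and to a variable definition, a valid loading order can be derived with a topological sort, here using Python's standard graphlib module (Python 3.9+). The dependency map is a hypothetical reading of the text, not the exact ECOFI foreign-key graph.

```python
from graphlib import TopologicalSorter

# Parent tables that must already hold a row before a child row can reference it.
# Hypothetical reading of the text: exact ECOFI links may differ.
DEPENDS_ON = {
    "Site": set(),
    "Trial": {"Site"},
    "Plot": {"Trial"},
    "Plotcycle": {"Plot"},
    "Varcycle": {"Plotcycle"},
    "Variables": set(),
    "Obsplant": {"Plotcycle", "Variables"},
    "Obssoil": {"Plotcycle", "Variables"},
}


def insertion_order(deps=DEPENDS_ON) -> list[str]:
    """Return an order in which to load tables so that every reference resolves."""
    return list(TopologicalSorter(deps).static_order())
```

If the map accidentally contained a cycle, static_order() would raise graphlib.CycleError, which makes this a cheap consistency check on the declared links.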

DATA APPLICATION:
The ECOFI database supports various uses in meta-analysis and cropping system modelling: data exploration, data management and model calibration. A strong point of the database for data mining is the large number of varieties it includes. Figure 2 shows an example: differences in dry matter growth rate between varieties, obtained from the database. ECOFI could be used to further explore the effects of interactions between climate and variety, or between variety and management (irrigation, row spacing, etc.). The database has already been used in projects on sugarcane (e.g. AgMIP) and energy cane (e.g. SYPECAR).

Figure 2.
Growth rates in aerial dry matter (t ha⁻¹ d⁻¹) by cane variety. Growth rate was calculated from the database as the aerial dry matter divided by the number of days since the beginning of the crop cycle. Data are presented as a beanplot showing the median (black line), the density (gray background) and individual observations (small black lines). The horizontal dashed line indicates the overall median. The number of plot cycles with aerial dry matter data is indicated for each variety.
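The growth rate in Figure 2 is a simple ratio: rate = aerial dry matter / days since the start of the crop cycle. A minimal sketch of that calculation and of the per-variety median, assuming dry matter in t/ha and one record per observation date; the record layout and the variety labels in the test are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def growth_rate_t_ha_d(dry_matter_t_ha: float, cycle_start: date, obs_date: date) -> float:
    """Aerial dry matter divided by the number of days since the start of the cycle."""
    days = (obs_date - cycle_start).days
    if days <= 0:
        raise ValueError("observation must be after the start of the cycle")
    return dry_matter_t_ha / days


def median_rate_by_variety(records):
    """records: iterable of (variety, dry_matter_t_ha, cycle_start, obs_date) tuples."""
    rates = defaultdict(list)
    for variety, dm, start, obs in records:
        rates[variety].append(growth_rate_t_ha_d(dm, start, obs))
    return {variety: median(values) for variety, values in rates.items()}
```

For instance, 30 t/ha measured on 11 April 2015 for a cycle started on 1 January 2015 (100 days) gives 0.3 t ha⁻¹ d⁻¹.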
Beyond data exploration, the main utility of the ECOFI database lies in its link with crop models. Many operations are required to extract experimental datasets and create input files for cropping system models, and this data manipulation is tedious and difficult to automate. Through simple queries, ECOFI allows the automatic creation of input files for crop models. ECOFI directly provides simulation inputs for the sugarcane growth model MOSICAS (Martiné et al., 2007). It is particularly useful for calibrating new varieties and has been used to calibrate energy cane (Martiné et al., 2016) and sugarcane varieties. More recently, the database has also been used to simulate sugarcane crop growth with the STICS model (Christina et al., 2018; Chaput et al., 2019).
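As an illustration of query-based input generation, the sketch below selects daily weather for one site and period from a "Wdat"-like table and writes it as tab-separated text. SQLite is used only for illustration, the column names (site_id, day, tmin, tmax, rain, rg) and the output layout are assumptions, and the actual MOSICAS and STICS input formats are different and are not reproduced here.

```python
import csv
import io
import sqlite3


def export_weather(conn: sqlite3.Connection, site_id: int, start: str, end: str) -> str:
    """Return daily weather for one site and period as tab-separated text.

    Table and column names are hypothetical; dates are ISO strings (YYYY-MM-DD),
    so BETWEEN compares them correctly as text.
    """
    rows = conn.execute(
        "SELECT day, tmin, tmax, rain, rg FROM Wdat "
        "WHERE site_id = ? AND day BETWEEN ? AND ? ORDER BY day",
        (site_id, start, end),
    ).fetchall()
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["day", "tmin", "tmax", "rain", "rg"])  # header line
    writer.writerows(rows)
    return buf.getvalue()
```

Parameterized queries keep the site and date filters out of the SQL string, so the same function can be called in a loop over all trials to regenerate input files.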

ACKNOWLEDGMENTS
All partners and main contributors to the different trials in the database are listed in the "Project" table. We would like to acknowledge the project partners (in particular the South African Sugarcane Research Institute, the Mauritius Sugarcane Industry Research Institute and the eRcane institute) as well as the PhD students who made it possible to collect and share experimental data (in particular C. Suguitani, M. Aabad, D. Sabatier, M. Gouy).