#This file contains functions related to managing raw_str_map files
#' Merges two raw_str_map files and returns either a list or a file
#'
#' All entries in the new.raw_str_map file replace the matching entries in the source.raw_str_map file
#'
#' Both files must be relative to the current data.dir/raw_str_maps
#'
#' @param source.raw_str_map.file the filename of the source raw_str_map. It must be relative to the raw_str_maps directory of the current data.dir
#' @param new.raw_str_map.file the filename of the mask raw_str_map. Its entries replace any matching entries of the source file. It must be relative to the raw_str_maps directory of the current data.dir
#' @param return.file If set to T, the full path of a temporary file that contains the merge is returned. Otherwise a list with the contents of the merge is returned
#'
#' @return FALSE if a problem occurs. If return.file=T, the full path of a temporary file that contains the merged result in json. If return.file=F, a list with the contents of the merge
#' Gets a fadn.raw.csv (csv file from DG-AGRI) and converts it to fadn.raw.rds
#'
#' It saves two files:
#' - One that contains the data in wide format, i.e. in a tabular format that is identical to the csv data. This is uncompressed data.
#' - One that holds the same information as compressed data. It is a list that contains the $data.char and $data.num data.tables in long format. 0 values are removed and the col.id is the only index on both data.tables
#'
#' @param file.path the full path of the csv file (the filename must be included)
#' @param sepS the separator of the csv file (by default ",")
#' @param fadn.year the year the csv file refers to (e.g. 2001)
#' @param fadn.country the three-letter country code the csv file refers to (e.g. "ELL")
#' @param keep.csv if TRUE, copy the csv file to the CSV directory; else do not copy
#' @param col.id the name of the column used as the index in the compressed data (by default "ID")
#'
#' @return Saves the fadn.raw.rds file and returns TRUE if everything goes well
#' @import data.table
#'
#' @export
#' @examples
convert.to.fadn.raw.rds <- function(file.path = "",
                                    sepS = ",",
                                    fadn.year = NA,
                                    fadn.country = NA,
                                    keep.csv = FALSE,
                                    col.id = "ID") {
library(data.table)
#if file exist
if(!file.exists(file.path)) {
cat(paste0("File ",file.path," does not exist. Exiting ...\n"))
return(FALSE)
}
# check for fadnUtils.data.dir
if(is.null(get.data.dir())) {
cat("You have first to set the fadnUtils.data.dir using set.data.dir function. Exiting ....\n")
#' Converts an fadn.raw.rds file to fadn.str.rds file using a raw_str_map.json file
#'
#' The raw_str_map.json specification is as follows:
#'
#' {
#'   "id": { "COLUMN in every list member in RDS": "COLUMN IN CSV", ....},
#'   "info": { "COLUMN in info RDS": "COLUMN IN CSV", ....},
#'   "livestock": {},
#'   "crops": {
#'     "CROP NAME 1": {"description": "description of crop name", "columns": {"VARIABLE NAME": "COLUMN IN CSV", ....} },
#'     "CROP NAME 2": {"description": "description of crop name", "columns": {"VARIABLE NAME": "COLUMN IN CSV", ....} },
#'     ....
#'   }
#' }
#'
#'
#' The structure of the str.dir:
#' - A data.dir can hold more than one extraction.
#' - Each extraction has a short name (20 or less characters, whitespace is not allowed)
#' - Each extraction is stored in the data.dir/rds/<extraction_name>
#' - That folder contains the following files:
#' + raw_str_map.json: the raw_str_map
#' + fadn.str.<4-digit YEAR>.<3-letter COUNTRY>.rds: the extracted data
#'
#' Notes:
#' 1) The computed RDS file contains a list structure with the following keys: info, costs, livestock-animals and crops
#' All are data.tables. For all of them, the first columns are those that are contained in the "id" object
#' "info" and "costs" are in table format, i.e. each farm is one row and data is on columns, as defined in the
#' related raw_str_map.json file.
#' "crops" and "livestock-animals" are in long data format (https://tidyr.tidyverse.org/), where one farm lies across many rows, and each
#' row is a farm-crop-variableName-value combination
#'
#' 2) In $id, $info and $costs, "COLUMN IN CSV" can have two forms
#' i) a single column name in the fadn.raw csv file or a combination, e.g. "K120SA+K120FC+K120FU+K120CV-K120BV"
#' ii) the form of an object {"source": "the column in the csv", "description": "a description of what this column is about"}
#'
#' 3) We attach certain attributes that are useful for identifying information:
#' i) In $info and $costs, the attribute "column description" provides the formula and the description of each column
#' ii) In $crops and $livestock-animals, the attributes "$crops.descriptions" and "$livestock.descriptions" provide the description of each CROP contained there
#' iii) In $crops and $livestock-animals, the attribute "$column.formulas" provides the formulas used to derive the VALUE
#'
#'
#'
#'
#' @param fadn.country string with the country to extract the str data
#' @param fadn.year the year to extract the structured data
#' @param raw_str_map.file the full path to the raw_str_map file.
#' @param str.short_name the short name of the str data. No spaces and text up to 20 characters
#' @param DEBUG if TRUE, prints more details on the conversion process
#'
#' @return Saves the fadn.str.rds file and returns TRUE if everything goes well
#' Load all rds.raw.FADN data for selected years and countries (rbinds them)
#'
#' It adds two columns, load.YEAR and load.COUNTRY, to each row. These can be used to group the data by year and country
#'
#' @param countries a character vector with all the 3-letter codes of the selected countries, e.g. c("ELL", "ESP").
#' If "all" is included, all available countries are loaded
#' @param years a numeric vector with the years selected. If "all" is included, all available years are loaded
#' @param col.filter a character vector with the columns to load. If NULL, all columns are loaded. E.g. col.filter=c('ILOTH_VET_V', 'ILVOTH_V','id')
#' @param row.filter a string giving an expression that will be evaluated in order to select rows. If NULL, all rows are returned. E.g. row.filter='TF8==1'
Developed by Dimitrios Kremmydas (JRC) and Xinxin Yang (THÜNEN)
The fadnUtils package facilitates the efficient handling of FADN data within the R language framework. Furthermore, the package is targeted for use within the JRC D.4 context. This means that there is a specific temporal pattern of how a user interacts with the package (see Figure plot).

More specifically, after a request for FADN data from DG-AGRI, this data is delivered to JRC D.4 in csv format.
# Installation
You can install the development version from GitLab with:
We request FADN data from DG-AGRI, which is delivered to us in csv format. To work efficiently in R, we convert the csv data to an R-friendly format. This step is driven by a human-readable json file, called `raw_str_map.file`. The `inst/examples` folder holds use cases with fadnUtils examples and json files. Both of the following files are necessary:
1. FADN data in csv format: the data for loading
2. A json file for extracting the variables
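A json entry can map a new variable either to a single csv column or to a combination of columns such as `"K120SA+K120FC+K120FU+K120CV-K120BV"`. The following is a minimal sketch of how such a formula string could be evaluated against raw data in base R. It is not the package's implementation, and the data, the `info.map` list and the output column names are hypothetical.

```r
# Hypothetical raw csv data: one row per farm, columns named as in the csv file
raw <- data.frame(ID = c(1, 2),
                  K120SA = c(100, 50),
                  K120FC = c(10, 0),
                  K120BV = c(20, 5))

# Hypothetical map: new column name -> formula over csv columns
info.map <- list(sales = "K120SA",
                 net = "K120SA+K120FC-K120BV")

# Evaluate each formula string with the raw columns in scope
info <- data.frame(ID = raw$ID)
for (nm in names(info.map)) {
  info[[nm]] <- eval(parse(text = info.map[[nm]]), envir = raw)
}
print(info)
```

Evaluating the string inside the data frame keeps the json file free of R code beyond plain column arithmetic, which is one reason a formula-as-string design can stay human-readable.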
### Folder Structure
A working directory is specified arbitrarily by the user. This structure helps data management and maintenance. The directory looks like this:
```base
CurrentProjectDirectory/
+-- csv
+-- fadnUtils.metadata.json
+-- rds
\-- spool
\-- readme.txt
```
* csv: the CSV files are stored here
* fadnUtils.metadata.json: contains the mapping from the fadn.raw.rds to the fadn.str.rds data
* rds: the r-data files are placed here
* spool: related files are kept here
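To make the layout concrete, here is a minimal base R sketch that builds the same folders in a temporary location. It only illustrates the structure: the package's own setup functions may do more, and the temporary path and the empty placeholder metadata file are assumptions for this example.

```r
# Build the layout in a temporary directory (hypothetical project location)
data.dir <- file.path(tempdir(), "CurrentProjectDirectory")
for (d in c("csv", "rds", "spool")) {
  dir.create(file.path(data.dir, d), recursive = TRUE, showWarnings = FALSE)
}

# Empty placeholder for the metadata file; its real contents are not shown here
file.create(file.path(data.dir, "fadnUtils.metadata.json"))

print(list.files(data.dir))
```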
## 2. Import CSV FADN data
First, we will import the data into an R-friendly format using the fadnUtils package.
### Convert the csv data into raw r-data
The raw data will be added to a `rds` directory. We use a convenient function from this package to convert the csv file into raw r-data.
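The conversion stores the data twice: once in wide format and once as compressed long-format data with 0 values removed. The sketch below shows that idea for numeric columns using base R only. It is an illustration under assumptions, not the package's code (which works with data.tables), and the farm ids and column names are made up.

```r
# Hypothetical wide raw data: one row per farm, identified by ID
wide <- data.frame(ID = c("f1", "f2"),
                   A = c(5, 0),
                   B = c(0, 7),
                   stringsAsFactors = FALSE)

value.cols <- setdiff(names(wide), "ID")

# Stack the value columns into (ID, variable, value) rows
long <- data.frame(ID = rep(wide$ID, times = length(value.cols)),
                   variable = rep(value.cols, each = nrow(wide)),
                   value = unlist(wide[value.cols], use.names = FALSE),
                   stringsAsFactors = FALSE)

# Drop zero values, as the compressed format does
long <- long[long$value != 0, ]
print(long)
```

Because most cells in such data are often 0, dropping them in the long table can shrink the stored file considerably, while the wide table can still be rebuilt by treating missing combinations as 0.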
# write a new json file without unmatched variables
rewrite_json = T,
# save the new json in extraction_dir
extraction_dir = new.extraction.dir)
```
#### Convert the raw data into structured r-data using the checked json file
Finally, we can convert raw r-data to structured r-data using an external json file. For more details on conversion in the fadnUtils package, see `USE_CASE.R`.
```{r}
# Convert without passing a raw_str_map file
convert.to.fadn.str.rds(fadn.country = "BEL",
                        fadn.year = 2009,
                        str.name = new.str.name # extraction_dir
                        )

# Convert with an external json file and force its use
convert.to.fadn.str.rds(fadn.country = "BEL",
                        fadn.year = 2009,
                        raw_str_map.file = "D:/public/yang/MIND_STEP/new_sample/test01/raw_str_map.json", # an external json file
                        str.name = new.str.name, # extraction_dir
                        force_external_raw_str_map = TRUE,
                        DEBUG = FALSE
                        )
```
#### Files Structure in `rds` folder
After conversion, we can see the `rds` folder:
* `fadn.raw.2009.BEL.rds`: raw r-data for country "BEL" and year "2009"
* `test`: extraction_dir that holds the structured r-data and the json files used for extraction
    * `fadn.str.2009.BEL.rds`: structured r-data for country "BEL" and year "2009"
    * `raw_str_map.json`: default json file
    * `rewrite_2014_after_copy.json`: modified json file after checking the variables
```base
rds
+-- fadn.raw.2009.BEL.compressed.rds
+-- fadn.raw.2009.BEL.rds
+-- fadn.raw.2010.BEL.compressed.rds
+-- fadn.raw.2010.BEL.rds
+-- fadn.raw.2011.BEL.compressed.rds
+-- fadn.raw.2011.BEL.rds
+-- fadn.raw.2012.BEL.compressed.rds
+-- fadn.raw.2012.BEL.rds
\-- test
+-- fadn.str.2009.BEL.rds
+-- raw_str_map.json
\-- rewrite_2014_after_copy.json
```
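The file names encode the year and the country, so the available raw data can be read off the folder listing. The sketch below parses a few of the names from the listing with a regular expression. It is an illustration only, not a function of the package, and the `available` table is a hypothetical helper.

```r
# File names copied from the rds folder listing
files <- c("fadn.raw.2009.BEL.compressed.rds", "fadn.raw.2009.BEL.rds",
           "fadn.raw.2010.BEL.compressed.rds", "fadn.raw.2010.BEL.rds")

# Match only the uncompressed raw files: fadn.raw.<YEAR>.<COUNTRY>.rds
pattern <- "^fadn\\.raw\\.([0-9]{4})\\.([A-Z]{3})\\.rds$"
raw.files <- files[grepl(pattern, files)]

# Extract year and country from the two capture groups
available <- data.frame(year = as.integer(sub(pattern, "\\1", raw.files)),
                        country = sub(pattern, "\\2", raw.files),
                        stringsAsFactors = FALSE)
print(available)
```

With a real data.dir, `files` could come from `list.files()` on the `rds` folder instead of a hard-coded vector.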
## 3. Load raw r-data and structured r-data
To start any analysis with `fadnUtils`, we first need to load r-data. We can only load data for countries and years that have already been imported into a data.dir folder.
### Load raw r-data for the country `BEL` and year `2009`