#This file contains functions related to managing raw_str_map files
#' Merges two raw_str_map files and returns either a list or a file
#'
#' All entries in the new.raw_str_map file replace those in the source.raw_str_map file.
#'
#' Both file names must be relative to the raw_str_maps directory of the current data.dir.
#'
#' @param source.raw_str_map.file the filename of the source raw_str_map. It must be relative to the raw_str_maps directory of the current data.dir
#' @param new.raw_str_map.file the filename of the mask raw_str_map. Its entries replace the corresponding entries of the source file. It must be relative to the raw_str_maps directory of the current data.dir
#' @param return.file If TRUE, the full path of a temporary file that contains the merge is returned. Otherwise, a list with the contents of the merge is returned
#'
#' @return FALSE if a problem occurs. If return.file=TRUE, the full path of a temporary json file that contains the merged result. If return.file=FALSE, a list with the contents of the merge
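#' @examples
#' # A minimal sketch of the replace-on-merge rule, assuming both maps have
#' # already been read into nested lists. utils::modifyList() is used here for
#' # illustration only; it is not necessarily what this function calls.
#' # The keys and column names below are hypothetical.
#' source.map <- list(id = list(ID = "ID"), info = list(TF8 = "TF8", SIZE = "A"))
#' new.map <- list(info = list(SIZE = "B"))
#' merged <- utils::modifyList(source.map, new.map)
#' merged$info$SIZE # entry replaced by the one in new.map
#' merged$info$TF8  # entry kept from source.map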
#' Gets a fadn.raw.csv (csv file from DG-AGRI) and converts it to fadn.raw.rds
#'
#' It saves two files:
#' - One that contains the data in wide format, i.e. in a tabular format identical to the csv data. This is uncompressed data.
#' - One that holds the same information in compressed form. It is a list that contains the $data.char and $data.num data.tables in long format. 0 values are removed, and col.id is the only index on both data.tables.
#'
#' @param file.path the full path of the csv file (the filename must be included)
#' @param sepS the separator of the csv file (by default ",")
#' @param fadn.year the year the csv file refers to (e.g. 2001)
#' @param fadn.country the three-letter country code the csv file refers to (e.g. "ELL")
#' @param keep.csv if TRUE, copy the csv file to the CSV directory; otherwise do not copy it
#' @param col.id the name of the index column of the data (by default "ID")
#'
#' @return Saves the fadn.raw.rds file and returns TRUE if everything goes well
#' @import data.table
#'
#' @export
#' @examples
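#' # A minimal usage sketch. The paths, year and country below are hypothetical,
#' # and set.data.dir() is assumed to take the data.dir path as its argument.
#' \dontrun{
#' set.data.dir("path/to/data.dir")
#' ok <- convert.to.fadn.raw.rds(
#'   file.path = "path/to/ELL2001.csv",
#'   sepS = ",",
#'   fadn.year = 2001,
#'   fadn.country = "ELL",
#'   keep.csv = FALSE
#' )
#' if (!isTRUE(ok)) message("Conversion failed")
#' }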
convert.to.fadn.raw.rds <- function(file.path = "",
                                    sepS = ",",
                                    fadn.year = NA,
                                    fadn.country = NA,
                                    keep.csv = FALSE,
                                    col.id = "ID") {
  library(data.table)
  # check that the csv file exists
  if (!file.exists(file.path)) {
    cat(paste0("File ", file.path, " does not exist. Exiting ...\n"))
    return(FALSE)
  }
  # check that fadnUtils.data.dir is set
  if (is.null(get.data.dir())) {
    cat("You must first set the fadnUtils.data.dir using the set.data.dir function. Exiting ....\n")
#' Converts a fadn.raw.rds file to a fadn.str.rds file using a raw_str_map.json file
#'
#' The raw_str_map.json specification is as follows:
#'
#' {
#'   "id": { "COLUMN in every list member in RDS": "COLUMN IN CSV", ....},
#'   "info": { "COLUMN in info RDS": "COLUMN IN CSV", ....},
#'   "livestock": {},
#'   "crops": {
#'     "CROP NAME 1": {"description": "description of crop name", "columns": {"VARIABLE NAME": "COLUMN IN CSV", ....} },
#'     "CROP NAME 2": {"description": "description of crop name", "columns": {"VARIABLE NAME": "COLUMN IN CSV", ....} },
#'     ....
#'   }
#' }
#'
#'
#' The structure of the str.dir:
#' - A data.dir can hold more than one extraction.
#' - Each extraction has a short name (20 characters or fewer, whitespace is not allowed)
#' - Each extraction is stored in data.dir/rds/<extraction_name>
#' - That folder contains the following files:
#'   + raw_str_map.json: the raw_str_map
#'   + fadn.str.<4-digit YEAR>.<3-letter COUNTRY>.rds: the extracted data
#'
#' Notes:
#' 1) The computed RDS file contains a list structure with the following keys: info, costs, livestock-animals and crops.
#'    All are data.tables. In all of them, the first columns are those contained in the "id" object.
#'    "info" and "costs" are in table format, i.e. each farm is one row and the data are in columns, as defined in the
#'    related raw_str_map.json file.
#'    "crops" and "livestock-animals" are in long data format (https://tidyr.tidyverse.org/), where one farm spans many rows, and each
#'    row is a farm-crop-variableName-value combination.
#'
#' 2) In $id, $info and $costs, "COLUMN IN CSV" can have two forms:
#'    i) a single column name in the fadn.raw csv file, or a combination of columns, e.g. "K120SA+K120FC+K120FU+K120CV-K120BV"
#'    ii) an object of the form {"source": "the column in the csv", "description": "a description of what this column is about"}
#'
#' 3) We attach certain attributes that are useful for identifying information:
#'    i) In $info and $costs, the attribute "column description" provides the formula and the description of each column
#'    ii) In $crops and $livestock-animals, the attributes "$crops.descriptions" and "$livestock.descriptions" provide the description of each CROP contained there
#'    iii) In $crops and $livestock-animals, the attribute "$column.formulas" provides the formulas used to derive the VALUE
#'
#'
#'
#'
#' @param fadn.country string with the country for which to extract the structured data
#' @param fadn.year the year for which to extract the structured data
#' @param raw_str_map.file the full path to the raw_str_map file.
#' @param str.short_name the short name of the structured data: up to 20 characters, no spaces
#' @param DEBUG if TRUE, prints more details on the conversion process
#'
#' @return Saves the fadn.str.rds file and returns TRUE if everything goes well
#' Loads all fadn.raw.rds data for the selected years and countries (rbinds them)
#'
#' It adds two columns to each row: load.YEAR and load.COUNTRY. These can be used to group the data by year and country.
#'
#' @param countries a character vector with the 3-letter codes of the selected countries, e.g. c("ELL", "ESP").
#'   If "all" is included, all available countries are loaded
#' @param years a numeric vector with the selected years. If "all" is included, all available years are loaded
#' @param col.filter a character vector with the columns to load. If NULL, all columns are loaded. E.g. col.filter=c('ILOTH_VET_V', 'ILVOTH_V','id')
#' @param row.filter a string giving an expression that is evaluated in order to select rows. If NULL, all rows are returned. E.g. row.filter='TF8==1'
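#' @examples
#' # A minimal sketch of how col.filter and row.filter could be applied, shown
#' # on a plain data.frame with base R. This is an illustration on made-up data,
#' # not the loader's actual implementation.
#' raw <- data.frame(id = 1:4, TF8 = c(1, 2, 1, 3), ILVOTH_V = c(10, 0, 5, 7))
#' col.filter <- c("id", "ILVOTH_V")
#' row.filter <- "TF8==1"
#' # evaluate the filter string with the columns of raw in scope
#' keep <- eval(parse(text = row.filter), envir = raw)
#' raw[keep, col.filter, drop = FALSE]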
#' The check.column function checks whether the variables in a json file exist in
#' the fadn.raw.rds or fadn.raw.csv file (csv file from FADN-AGRI),
#' and returns a list of the variables that are not in the raw data file. A new json file without the unmatched variables can then be saved in the extraction_dir.
#' A txt file (my_logfile.txt) that stores the output messages is created in a specific directory (spool.dir).
#'
#' @param importfilepath The path of a fadn.raw.rds or fadn.raw.csv file.
#' @param jsonfile The path of a json file.
#' @param rewrite_json Logical. If TRUE (default), a new json file without the unmatched variables is saved. Its name is the original file name prefixed with "rewrite_". For example, if the original json file is A.json, the new json file is saved as rewrite_A.json.
#'   Otherwise, the json file is not rewritten.
#' @param extraction_dir The folder in which the data are extracted.
#'
#'
#'
#' @return A list of multiple objects: the objects of the json file that have unmatched variables.
#' This function checks the node of the chosen object/category of the json file and finds the variables
#' that are in the json file but not in the fadn.raw data file.
#' It returns two lists: the unmatched variables/column names and the modified json.
#' If an unmatched variable exists, it is deleted from the json list.
#'
#' @param var An object or category of the raw json.
#' @param rds All variables/column names in the fadn.raw.rds file.
#'
#' @details A json file has 6 parent objects/categories: "id", "info", "costs", "crops", "subsides", "livestock". This function checks all objects inside the parent object.
#'
#'
#'
#' @author Xinxin Yang
#'
#' @return A list of multiple objects. This list combines the unmatched variables and the modified json for the chosen object/category.
nested_var <- function(var, rds)
{
  res <- NULL
  newjson <- NULL
  col_names <- names(var)
  cat("Number of the total objects: ", length(col_names), "\n")
warning("Either explicitly provide a fadnUtils.data.dir to the function or first set the fadnUtils.data.dir using the set.data.dir function. Exiting ....")