---
title: "fadnUtils"
author: ""
output:
  #rmarkdown::html_vignette
  word_document
vignette: >
  %\VignetteIndexEntry{fadnUtils}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Introduction

The fadnUtils package facilitates the efficient handling of FADN data within the R language framework. The package is designed for use within the JRC D.4 context, which implies a specific sequence of steps in how a user interacts with it (see Figure \@ref(fig:foo)). More specifically, after a request for FADN data from DG-AGRI, the data is delivered to JRC D.4 in csv format.

```{r foo, echo=FALSE, fig.cap="Overview of how the user interacts with the package.", fig.align = 'center', out.width = '100%'}
knitr::include_graphics("pic/workflow.png")
```

# Installation

`fadnUtils` and the related R packages can be installed and loaded as follows.

```{r results='hide', message=FALSE, warning=FALSE}
requiredPackages = c('fadnUtils','data.table', 'devtools','jsonlite', 'ggplot2')
for(p in requiredPackages){
  # install the package only if it cannot be loaded
  if(!require(p,character.only = TRUE)) install.packages(p)
  library(p,character.only = TRUE)
}
```

# Usage in Brief

After loading the packages, you have a functional installation of fadnUtils on your computer. The rest of this vignette walks through the typical workflow:

1. Create a working directory
    - a user-defined data directory
1. Import CSV FADN data
    - convert the csv data into raw r-data
    - convert raw r-data into str r-data
1. Load r-data and structured r-data
1. Perform analysis

## 1. Create a working directory

First, the user sets a working directory. Make sure that relative paths stay within `CurrentProjectDirectory`.

```{r}
# using a local directory
CurrentProjectDirectory = "D:/public/yang/MIND_STEP/New_test_fadnUtils"
create.data.dir(folder.path = CurrentProjectDirectory)
set.data.dir(CurrentProjectDirectory)
get.data.dir()
```

### Required files

We request FADN data from DG-AGRI, which is delivered to us in csv format. To work efficiently in R, we convert the csv data to an R-friendly format. This step is done with the help of a human-readable json file, called `raw_str_map.file`. Both files are necessary:

1. FADN data in csv format: the data to load
2. A json file for extracting the variables

### Folder Structure

The working directory can be placed anywhere the user chooses. Its fixed structure helps with data management and maintenance. The directory looks like this:

```base
CurrentProjectDirectory/
+-- csv
+-- fadnUtils.metadata.json
+-- rds
\-- spool
    \-- readme.txt
```

* csv: CSV files are stored here
* fadnUtils.metadata.json: contains the mapping from the fadn.raw.rds to the fadn.str.rds data
* rds: r-data files are stored here
* spool: related files are kept here

## 2. Import CSV FADN data

First, we import the data into an R-friendly format using the fadnUtils package.

### Convert the csv data into raw r-data

The raw data will be added to the `rds` directory. We use a convenience function from this package to convert the csv file into raw r-data.

```{r}
convert.to.fadn.raw.rds(
  file.path = "D:/public/yang/MIND_STEP/Fake_Data/BEL2009.csv",
  sepS = ",",
  fadn.country = "BEL",
  fadn.year = 2009,
  #keep.csv = T # copy csv file in csv.dir
  col.id = "id"
)
```

At any time, we can check which csv files (countries and years) have been loaded into the current data dir.

```{r}
show.data.dir.contents()
```

### Convert raw r-data into structured r-data

Then, we convert the raw data into structured data. Broadly, the conversion takes three steps:

1. setting up a `structured` directory for the structured data,
2. checking that all variables in the `raw_str_map.file` can be converted,
3. converting the raw data and saving the result in the `structured` directory.

#### Set a `structured` directory for saving the structured data

We create a folder named `test` to hold the structured data.

```{r}
rds.dir = paste0(get.data.dir(),"/rds/")
# set a name for the folder in rds.dir that will hold the structured r-data
new.str.name = "test"
# create the extraction_dir
new.extraction.dir = paste0(rds.dir, new.str.name)
dir.create(new.extraction.dir)
```

#### Check the variables in the `raw_str_map.file`

Before conversion, it is recommended to call `check.column()` to ensure that all variables in the `raw_str_map.file` can be converted.

```{r results='hide', message=FALSE, warning=FALSE}
list_vars = check.column(
  # a rds file or a csv file
  importfilepath = paste0(rds.dir, "fadn.raw.2009.BEL.rds"),
  # a json file
  jsonfile = "D:/public/yang/MIND_STEP/2014_after_copy.json",
  # write a new json file without unmatched variables
  rewrite_json = T,
  # save the new json in extraction_dir
  extraction_dir = new.extraction.dir)
```

#### Convert the raw data into structured r-data using the checked json file

Finally, we can convert the raw r-data to structured r-data using an external json file. For more details on conversion in the fadnUtils package, see `USE_CASE.R`.

```{r}
convert.to.fadn.str.rds(fadn.country = "BEL",
                        fadn.year = 2009,
                        # an external json file
                        raw_str_map.file = "D:/public/yang/MIND_STEP/new_sample/test01/raw_str_map.json",
                        str.name = new.str.name, # extraction_dir
                        force_external_raw_str_map = T,
                        DEBUG = F
)
```

#### Files Structure in `rds` folder

After conversion, the `rds` folder contains:

* `fadn.raw.2009.BEL.rds`: raw r-data for country "BEL" and year "2009"
* `test`: extraction_dir holding the structured r-data and the json file used for extraction
    * `fadn.str.2009.BEL.rds`: structured r-data for country "BEL" and year "2009"
    * `raw_str_map.json`: default json file
    * `rewrite_2014_after_copy.json`: modified json file after checking the variables

```base
rds
+-- fadn.raw.2009.BEL.compressed.rds
+-- fadn.raw.2009.BEL.rds
+-- fadn.raw.2010.BEL.compressed.rds
+-- fadn.raw.2010.BEL.rds
+-- fadn.raw.2011.BEL.compressed.rds
+-- fadn.raw.2011.BEL.rds
+-- fadn.raw.2012.BEL.compressed.rds
+-- fadn.raw.2012.BEL.rds
\-- test
    +-- fadn.str.2009.BEL.rds
    +-- raw_str_map.json
    \-- rewrite_2014_after_copy.json
```

## 3. Load raw r-data and structured r-data

To start any analysis with `fadnUtils`, we first need to load r-data. We can only load data for countries and years that have already been imported into the data.dir folder.

### Load raw r-data for the country `BEL` and year `2009`

```{r}
my.data.2009.raw = load.fadn.raw.rds(
  countries = "BEL",
  years = 2009
)
```

### Load structured data for the country `BEL` and year `2009`

We can load structured data for country `BEL` and year `2009`.

```{r}
my.data.2009.str = load.fadn.str.rds(
  countries = "BEL",
  years = 2009,
  extraction_dir = "test" # Location of the str r-data
)
```

### Load structured data from all available countries and years

The following is an example of loading structured data for all available countries and years.

```{r}
my.str.data = load.fadn.str.rds(
  extraction_dir = "test")
```

## 4. Perform analysis

Here are some examples of analyses on the loaded data.

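The first example in this section works with ids that are common to the loaded data. As a rough, self-contained illustration of that idea, the sketch below uses invented farm ids and base R only. It assumes that "common" means "present in every year", which is an assumption; the actual behaviour of `collect.common.id()` may differ.

```{r}
# Toy farm ids per year; all values are invented for illustration only.
toy.ids = data.frame(
  YEAR = c(2009, 2009, 2009, 2010, 2010, 2011, 2011),
  ID   = c(1, 2, 3, 1, 3, 1, 3)
)

# One vector of ids per year
ids.by.year = split(toy.ids$ID, toy.ids$YEAR)

# Keep only the ids that occur in every year (assumed meaning of "common id")
common.ids = Reduce(intersect, ids.by.year)
print(common.ids)
```

Farm 2 is dropped because it appears only in 2009, while farms 1 and 3 appear in all three years.
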
### Collect the common ids

We can collect the common ids from the loaded r-data using the `collect.common.id()` function of `fadnUtils`.

```{r, message=FALSE}
# Collect the common ids from the loaded structured r-data
collected.common.id_str = collect.common.id(my.str.data)
```

### Plot

To build a basic plot, we use the `ggplot()` function from the plotting package `ggplot2`.

```{r}
crops.data = my.str.data$crops # catering for easier access at next steps

# this contains the number of crops for each farm-country-year
# Be careful, we have to filter to count only the LEVL variable
crops.data.Ncrops = crops.data[VARIABLE=="LEVL",.N,by=list(COUNTRY,YEAR,ID)]

# This displays the quantiles of the number of crops
crops.data.Ncrops[,as.list(quantile(N)),by=list(YEAR,COUNTRY)][order(COUNTRY)]

ggplot(crops.data.Ncrops,aes(y=N,x=1)) +
  geom_boxplot() +
  facet_grid(YEAR~COUNTRY) +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()
  )+
  ylab("Number of Crops")
```

### Some other examples

```{r}
# sample and represented number of farms
my.str.data$info[,list(Nobs_sample=.N,Nobs_represented=sum(WEIGHT)), by=.(COUNTRY,YEAR)]

# only for full sample (common id over years in selected data)
my.str.data$info[id %in% collected.common.id_str[[1]],
                 list(Nobs_sample=.N, Nobs_represented=sum(WEIGHT)),
                 by=.(COUNTRY,YEAR)]
```

**Note:** Please read `USE_CASE.R` for more details on using fadnUtils.
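
To try the `data.table` grouping idiom used in the examples above without access to FADN data, a minimal sketch on a made-up table may help. The table `toy.info` and all its values are invented; only the column names `COUNTRY`, `YEAR` and `WEIGHT` follow the examples above.

```{r}
library(data.table)

# Toy stand-in for an info table; all values are invented for illustration.
toy.info = data.table(
  COUNTRY = c("BEL", "BEL", "BEL", "BEL"),
  YEAR    = c(2009, 2009, 2010, 2010),
  WEIGHT  = c(10, 20, 15, 5)
)

# Per country-year: number of sampled farms (.N) and
# number of represented farms (sum of the weights)
toy.summary = toy.info[, list(Nobs_sample = .N, Nobs_represented = sum(WEIGHT)),
                       by = .(COUNTRY, YEAR)]
print(toy.summary)
```

Each row of the result corresponds to one country-year group: `.N` counts the rows in the group, and `sum(WEIGHT)` adds up the weights of those rows.
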