<!-- README.md is generated from README.Rmd. Please edit that file -->
# fadnUtils
Developed by Dimitrios Kremmydas (JRC) and Xinxin Yang (THÜNEN)

The fadnUtils package facilitates the efficient handling of FADN (Farm Accountancy Data Network) data in R. The package is targeted at use within the JRC D.4 context, which means that users interact with the package following a specific sequence of steps (see the workflow figure below).

![plot](inst/examples/pic/workflow.png)

Specifically, FADN data requested from DG-AGRI is delivered to JRC D.4 in CSV format.
# Installation
You can install the development version from the Thünen or IIASA GitLab with:
``` r
# Thuenen gitlab
devtools::install_git("https://git-dmz.thuenen.de/mindstep/fadnutilspackages", force = TRUE)
# IIASA gitlab
devtools::install_git("https://gitlab.iiasa.ac.at/mind-step/fadnutilspackage")
```
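Then install and load the related R packages:
``` r
requiredPackages = c('fadnUtils', 'data.table', 'devtools', 'jsonlite', 'ggplot2')
for (p in requiredPackages) {
  if (!require(p, character.only = TRUE)) {
    # fadnUtils is not on CRAN; it has to be installed from GitLab (see above)
    if (p == "fadnUtils") stop("Please install fadnUtils from GitLab first.")
    install.packages(p)
  }
  library(p, character.only = TRUE)
}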
```
```
## Loading required package: fadnUtils
## fadnUtils is loaded.
## Loading required package: data.table
## Loading required package: devtools
## Loading required package: usethis
## Loading required package: jsonlite
## Loading required package: ggplot2
```
# Usage in Brief
After loading the packages, fadnUtils is ready to use. The following sections describe how to use it, in four steps:

1. Create a working directory
   - a user-defined data directory
2. Import CSV FADN data
   - convert the CSV data into raw r-data
   - convert the raw r-data into structured r-data
3. Load raw r-data and structured r-data
4. Perform analysis
## 1. Create a working directory
First, set a working directory. Make sure that all relative paths stay within `CurrentProjectDirectory`.
``` r
# using a local directory
CurrentProjectDirectory = "D:/public/yang/MIND_STEP/New_test_fadnUtils"
create.data.dir(folder.path = CurrentProjectDirectory)
```
```
## This is already a data.dir structure. Doing nothing.
```
``` r
set.data.dir(CurrentProjectDirectory)
get.data.dir()
```
```
## [1] "D:/public/yang/MIND_STEP/New_test_fadnUtils"
```
### Required files
FADN data is requested from DG-AGRI and delivered in CSV format. To work efficiently with R, the CSV data has to be converted into an R-friendly format; this conversion is driven by a human-readable JSON file, called `raw_str_map.file`. Both files are necessary:

1. FADN data in CSV format: the data to be loaded
2. A JSON file for extracting the variables

The `inst/examples` folder contains use cases with fadnUtils examples and JSON files.
### Folder Structure
The working directory can be any folder chosen by the user. Its fixed internal structure helps data management and maintenance. The directory looks like this:
``` base
CurrentProjectDirectory/
+-- csv
+-- fadnUtils.metadata.json
+-- rds
\-- spool
\-- readme.txt
```
- csv: CSV files are stored here
- fadnUtils.metadata.json: contains the mapping from the fadn.raw.rds to the fadn.str.rds data
- rds: r-data (rds files) are stored here
- spool: related files are kept here
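For illustration only, the layout above can be reproduced with base R. The helper `make_data_dir_sketch()` below is hypothetical and not part of fadnUtils, and the placeholder content of the metadata file is an assumption; in practice, use `create.data.dir()`.

```r
# Hypothetical helper: mimics the folder layout shown above with base R.
# It is NOT the fadnUtils implementation; use create.data.dir() in practice.
make_data_dir_sketch <- function(path) {
  for (d in c("csv", "rds", "spool")) {
    dir.create(file.path(path, d), recursive = TRUE, showWarnings = FALSE)
  }
  # assumption: an empty JSON object as placeholder metadata
  meta <- file.path(path, "fadnUtils.metadata.json")
  if (!file.exists(meta)) writeLines("{}", meta)
  # empty readme in the spool folder, as in the tree above
  readme <- file.path(path, "spool", "readme.txt")
  if (!file.exists(readme)) file.create(readme)
  invisible(path)
}

demo.dir <- file.path(tempdir(), "fadn_demo")
make_data_dir_sketch(demo.dir)
list.files(demo.dir)
```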
## 2. Import CSV FADN data
First, we will import the data into an R-friendly format using the fadnUtils package.
### Convert the csv data into raw r-data
The raw data is saved in the `rds` directory. We use the function `convert.to.fadn.raw.rds()` to convert a CSV file into raw r-data.
``` r
fadn.data.dir <- "D:/public/data/fadn/lieferung_20210414/csv/"
# load data for country BEL and year 2009
convert.to.fadn.raw.rds(
  file.path = paste0(fadn.data.dir, "BEL2009.csv"),
  sepS = ",",
  fadn.country = "BEL",
  fadn.year = 2009
  # , keep.csv = TRUE  # uncomment to also copy the csv file into the csv dir
)
```
```
## File D:/public/data/fadn/lieferung_20210414/csv/BEL2009.csv does not exist. Exiting ...
## [1] FALSE
```
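If several CSV files have to be imported, the call above can be repeated in a loop. The following is a sketch that assumes the files follow the `<COUNTRY><YEAR>.csv` naming of the example (e.g. `BEL2009.csv`); the helper `parse.csv.name()` is hypothetical and not part of fadnUtils. The conversion only runs if the folder exists and fadnUtils is installed.

```r
# Hypothetical helper: derive country and year from csv names such as
# "BEL2009.csv" (assumed pattern: 3-letter country code + 4-digit year).
parse.csv.name <- function(file.name) {
  m <- regmatches(file.name, regexec("^([A-Z]{3})([0-9]{4})\\.csv$", file.name))
  ok <- lengths(m) == 3  # full match + two captured groups
  data.frame(
    file    = file.name[ok],
    country = vapply(m[ok], `[`, character(1), 2),
    year    = as.integer(vapply(m[ok], `[`, character(1), 3)),
    stringsAsFactors = FALSE
  )
}

fadn.data.dir <- "D:/public/data/fadn/lieferung_20210414/csv/"
if (dir.exists(fadn.data.dir) && requireNamespace("fadnUtils", quietly = TRUE)) {
  to.convert <- parse.csv.name(list.files(fadn.data.dir, pattern = "\\.csv$"))
  for (i in seq_len(nrow(to.convert))) {
    fadnUtils::convert.to.fadn.raw.rds(
      file.path = paste0(fadn.data.dir, to.convert$file[i]),
      sepS = ",",
      fadn.country = to.convert$country[i],
      fadn.year = to.convert$year[i]
    )
  }
}
```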
At any time, we can check which CSV files (countries, years) have been loaded into the current data dir.
``` r
show.data.dir.contents()
```
### Convert raw r-data into structured r-data
Next, we convert the raw data into structured data. Broadly, this involves three steps:
1. setting up a `structured` directory for the structured data,
2. checking with the `raw_str_map.file` that all variables can be converted,
3. converting the raw data into structured data and saving it in the `structured` directory.
#### Set a `structured` directory for saving the structured data
Here we create a `test` folder for the structured data.
``` r
rds.dir = paste0(get.data.dir(), "/rds/")
# set a name for the folder that will hold the structured r-data in rds.dir
new.str.name = "test"
# create the extraction_dir (no warning if it already exists)
new.extraction.dir = paste0(rds.dir, new.str.name)
dir.create(new.extraction.dir, showWarnings = FALSE)
```
#### Check the variables in the `raw_str_map.file`
Before the conversion, it is recommended to run `check.column()` to ensure that all variables in the `raw_str_map.file` can be converted.
``` r
list_vars = check.column(
  # a rds file or a csv file
  importfilepath = paste0(rds.dir, "fadn.raw.2009.BEL.rds"),
  # a json file
  jsonfile = "D:/public/yang/MIND_STEP/2014_after_copy.json",
  # write a new json file without the unmatched variables
  rewrite_json = TRUE,
  # save the new json file in extraction_dir
  extraction_dir = new.extraction.dir)
```
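Conceptually, the check compares the variables referenced in the JSON mapping with the columns of the raw data and drops those that cannot be found (the new JSON written with `rewrite_json` contains no unmatched variables). The sketch below illustrates this idea with base R; it is an assumption about the logic, not the code of `check.column()`, and both the mapping layout and the variable names are hypothetical.

```r
# Conceptual sketch only (NOT the implementation of check.column()):
# keep, for each section of a mapping, the variables that exist as columns
# of the raw data, and report the ones that do not.
# The mapping layout (a named list of variable names) is hypothetical.
check.mapping.sketch <- function(mapping, raw.columns) {
  unmatched <- lapply(mapping, function(vars) setdiff(vars, raw.columns))
  kept      <- lapply(mapping, function(vars) intersect(vars, raw.columns))
  list(unmatched = unmatched[lengths(unmatched) > 0], mapping = kept)
}

mapping <- list(
  info  = c("ID", "COUNTRY", "YEAR", "WEIGHT"),
  crops = c("CROP_A_LEVL", "CROP_B_LEVL")  # hypothetical variable names
)
raw.columns <- c("ID", "COUNTRY", "YEAR", "WEIGHT", "CROP_A_LEVL")
res <- check.mapping.sketch(mapping, raw.columns)
res$unmatched  # CROP_B_LEVL is not in the raw data
```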
#### Convert the raw data into structured r-data using the checked json file
Finally, we convert the raw r-data into structured r-data using an external JSON file. For more details on the conversion with fadnUtils, see `USE_CASE.R`.
```
## [1] "Doing id ..."
## [1] "Doing info ..."
## [1] "Doing crops ..."
## ..........................................
##
## D:/public/yang/MIND_STEP/new_sample/test01/raw_str_map.json copied to D:/public/yang/MIND_STEP/New_test_fadnUtils/rds//test/raw_str_map.json
## [1] "Doing id ..."
## [1] "Doing info ..."
## [1] "Doing crops ..."
## ..........................................
```
#### File structure in the `rds` folder
After the conversion, the `rds` folder contains:
- `fadn.raw.2009.BEL.rds`: raw r-data for the country "BEL" and the year "2009"
- `test`: extraction\_dir for saving the structured r-data and the extraction json file
  - `fadn.str.2009.BEL.rds`: structured r-data for the country "BEL" and the year "2009"
  - `raw_str_map.json`: default json file
  - `rewrite_2014_after_copy.json`: modified json file after checking the variables
``` base
rds
+-- fadn.raw.2009.BEL.compressed.rds
+-- fadn.raw.2009.BEL.rds
+-- fadn.raw.2010.BEL.compressed.rds
+-- fadn.raw.2010.BEL.rds
+-- fadn.raw.2011.BEL.compressed.rds
+-- fadn.raw.2011.BEL.rds
+-- fadn.raw.2012.BEL.compressed.rds
+-- fadn.raw.2012.BEL.rds
\-- test
+-- fadn.str.2009.BEL.rds
+-- raw_str_map.json
\-- rewrite_2014_after_copy.json
```
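The raw file names encode the year and the country. As a hypothetical illustration (the helper `list.raw.rds()` is not part of fadnUtils; `show.data.dir.contents()` is the supported way to inspect a data dir), they can be parsed with base R, assuming the `fadn.raw.<YEAR>.<COUNTRY>.rds` pattern shown above:

```r
# Hypothetical helper: list country/year pairs from raw rds file names that
# follow the pattern shown above (fadn.raw.<YEAR>.<COUNTRY>.rds).
# Compressed copies (*.compressed.rds) do not match and are skipped.
list.raw.rds <- function(files) {
  pattern <- "^fadn\\.raw\\.([0-9]{4})\\.([A-Z]{3})\\.rds$"
  files <- files[grepl(pattern, files)]
  data.frame(
    country = sub(pattern, "\\2", files),
    year    = as.integer(sub(pattern, "\\1", files)),
    stringsAsFactors = FALSE
  )
}

rds.files <- c("fadn.raw.2009.BEL.compressed.rds", "fadn.raw.2009.BEL.rds",
               "fadn.raw.2010.BEL.compressed.rds", "fadn.raw.2010.BEL.rds")
list.raw.rds(rds.files)
# in a real data dir: list.raw.rds(list.files(file.path(get.data.dir(), "rds")))
```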
## 3. Load raw r-data and structured r-data
In order to start any analysis with `fadnUtils`, we first need to load r-data. Only data for countries and years that have already been imported into a data.dir folder can be loaded.
### Load raw r-data for the country `BEL` and year `2009`
``` r
my.data.2009.raw = load.fadn.raw.rds(
countries = "BEL",
years = 2009
)
```
### Load structured data for the country `BEL` and year `2009`
We load the structured data for the country `BEL` and year `2009` from the `test` extraction directory.
``` r
my.data.2009.str = load.fadn.str.rds(
countries = "BEL",
years = 2009,
extraction_dir = "test" # Location of the str r-data
)
```
### Load structured data for all available countries and years
When `countries` and `years` are omitted, the structured data for all available countries and years is loaded:
``` r
my.str.data = load.fadn.str.rds( extraction_dir = "test")
```
## 4. Perform analysis
Here are some examples of analyses on the loaded data.
### Collecting the common IDs
We can collect the common IDs, i.e. the farms present in all selected years, from the loaded r-data using the `collect.common.id()` function of `fadnUtils`.
``` r
# Collect the common IDs from the loaded structured r-data
collected.common.id_str = collect.common.id(my.str.data)
```
```
## Tranforming list to data table....
## [1] 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018
## 15 year(s) is/are selected.
```
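The idea behind the common IDs can be sketched in base R as follows. This is an assumption about the logic, not the code of `collect.common.id()` (whose result is indexed with `[[1]]` further below, whereas this sketch returns a plain vector), and the toy table is hypothetical.

```r
# Conceptual sketch, not the implementation of collect.common.id():
# a farm ID is "common" if it appears in every selected year.
common.ids.sketch <- function(info) {
  ids.by.year <- split(info$ID, info$YEAR)  # one vector of IDs per year
  sort(Reduce(intersect, ids.by.year))      # IDs present in all years
}

# toy info table with hypothetical IDs: farm 2 is missing in 2005
info <- data.frame(
  ID   = c(1, 2, 3, 1, 3, 1, 2, 3),
  YEAR = c(2004, 2004, 2004, 2005, 2005, 2006, 2006, 2006)
)
common.ids.sketch(info)  # 1 3
```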
### Plotting
To build a basic plot, we use the `ggplot()` function from the plotting package `ggplot2`. The example below counts the crops of each farm and shows their distribution by year and country.
``` r
crops.data = my.str.data$crops # for easier access in the next steps
# number of crops for each farm-country-year
# Be careful: we have to filter so that only the LEVL variable is counted
crops.data.Ncrops = crops.data[VARIABLE=="LEVL",.N,by=list(COUNTRY,YEAR,ID)]
# display the quantiles of the number of crops
crops.data.Ncrops[,as.list(quantile(N)),by=list(YEAR,COUNTRY)][order(COUNTRY)]
ggplot(crops.data.Ncrops,aes(y=N,x=1)) +
  geom_boxplot() +
  facet_grid(YEAR~COUNTRY) +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()
  ) +
  ylab("Number of Crops")
```
![Boxplot of the number of crops per farm by year and country](README_files/figure-markdown_github/unnamed-chunk-13-1.png)
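The filter on `VARIABLE == "LEVL"` matters because the crops table is presumably in long format, with several variables per crop. A minimal sketch on a hypothetical toy table (the `CROP` and `VALUE` columns, the `PROD` code and all values are assumptions for illustration) shows the difference:

```r
library(data.table)

# Toy long-format crops table; COUNTRY, YEAR, ID and VARIABLE follow the
# example above, the other columns, codes and values are hypothetical.
crops.toy <- data.table(
  COUNTRY  = "BEL",
  YEAR     = 2009,
  ID       = c(1, 1, 1, 1, 2, 2),
  CROP     = c("A", "A", "B", "B", "A", "A"),
  VARIABLE = c("LEVL", "PROD", "LEVL", "PROD", "LEVL", "PROD"),
  VALUE    = c(10, 50, 5, 20, 8, 40)
)

# Without the filter, each crop is counted once per variable (N = 4 and 2)
crops.toy[, .N, by = list(COUNTRY, YEAR, ID)]
# Counting only the LEVL rows gives one row per crop (N = 2 and 1)
crops.toy[VARIABLE == "LEVL", .N, by = list(COUNTRY, YEAR, ID)]
```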
### Some other examples
``` r
# sample and represented number of farms by country and year
my.str.data$info[,list(Nobs_sample=.N,Nobs_represented=sum(WEIGHT)),
by=.(COUNTRY,YEAR)]
```
```
## COUNTRY YEAR Nobs_sample Nobs_represented
## 1: NED 2004 1397 60644
## 2: NED 2005 1446 60598
## 3: NED 2006 1491 60644
## 4: BEL 2007 1168 33315
## 5: BGR 2007 1871 146769
## ---
## 329: SUO 2018 722 34114
## 330: SVE 2018 1010 28884
## 331: SVK 2018 559 4144
## 332: SVN 2018 890 44392
## 333: UKI 2018 2848 100916
```
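`Nobs_represented` is the sum of the farm weights, so each sampled farm stands for as many farms as its `WEIGHT` indicates. The same aggregation on a small toy table (hypothetical values) makes the two counts easy to check by hand:

```r
library(data.table)

# Toy info table: each sampled farm carries a WEIGHT (hypothetical values)
info.toy <- data.table(
  COUNTRY = c("BEL", "BEL", "BEL", "NED", "NED"),
  YEAR    = 2009,
  ID      = 1:5,
  WEIGHT  = c(20, 30.5, 10, 40, 45)
)

# BEL: 3 sampled farms representing 60.5 farms; NED: 2 representing 85
info.toy[, list(Nobs_sample = .N, Nobs_represented = sum(WEIGHT)),
         by = .(COUNTRY, YEAR)]
```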
``` r
# only farms of the full sample (IDs common to all selected years)
my.str.data$info[ID %in% collected.common.id_str[[1]],
list(Nobs_sample=.N,
Nobs_represented=sum(WEIGHT)),
by=.(COUNTRY,YEAR)]
```
```
## COUNTRY YEAR Nobs_sample Nobs_represented
## 1: NED 2004 446 20358.73
## 2: NED 2005 446 20209.66
## 3: NED 2006 446 19606.76
## 4: NED 2007 446 17748.39
## 5: NED 2008 446 17196.91
## 6: NED 2009 446 16564.05
## 7: NED 2010 446 17407.43
## 8: NED 2011 446 17928.86
## 9: NED 2012 446 16539.63
## 10: NED 2013 446 17078.27
## 11: NED 2014 446 17901.31
## 12: NED 2015 446 16973.80
## 13: NED 2016 446 16961.13
## 14: NED 2017 446 19275.99
## 15: NED 2018 446 17685.72
```
**Note:** Please read `inst/examples/FADN_USE_CASE.R` and `use_case.docx` for more details on using fadnUtils.