<!-- README.md is generated from README.Rmd. Please edit that file -->
Developed by Dimitrios Kremmydas (JRC) and Xinxin Yang (THÜNEN).
The fadnUtils package facilitates efficient handling of FADN data in R. The package is targeted for use within the JRC D.4 context, which implies a specific temporal pattern in how a user interacts with it (see Figure plot).
More specifically, after FADN data is requested from DG-AGRI, the data is delivered to JRC D.4 in csv format.
You can install the development version from Thuenen or IIASA Gitlab with:
``` r
# Thuenen gitlab
devtools::install_git("https://git-dmz.thuenen.de/mindstep/fadnutilspackages", force = TRUE)
# IIASA gitlab
devtools::install_git("https://gitlab.iiasa.ac.at/mind-step/fadnutilspackage")
```
Then load the package and install the related R packages.
``` r
library(fadnUtils)
```
    ## fadnUtils is loaded.
``` r
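# packages used alongside fadnUtils
requiredPackages = c('data.table', 'devtools', 'jsonlite', 'ggplot2')
for (p in requiredPackages) {
  # install the package if it is not yet available, then attach it
  if (!require(p, character.only = TRUE)) install.packages(p)
  library(p, character.only = TRUE)
}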
```
    ## Loading required package: data.table
    ## Loading required package: devtools
    ## Loading required package: usethis
    ## Loading required package: jsonlite
    ## Loading required package: ggplot2
After loading the packages, you have a functional installation of fadnUtils on your computer. The following sections describe how to use the package:
1. Create a working directory
   - a user-defined data directory
2. Import CSV FADN data
   - convert the csv data into raw r-data
   - convert raw r-data into structured r-data
3. Load raw r-data and structured r-data
4. Perform analysis
5. Translate between various NUTS versions (FADN Region, NUTS1, NUTS2, NUTS3)
## 1. Create a working directory
The working directory is the path under which R saves your files. It is set by the user. Make sure any relative path stays within `CurrentProjectDirectory`.
``` r
# using a local directory
CurrentProjectDirectory = "D:/public/yang/MIND_STEP/New_test_fadnUtils"
create.data.dir(folder.path = CurrentProjectDirectory)
```
    ## This is already a data.dir structure. Doing nothing.
``` r
set.data.dir(CurrentProjectDirectory)
get.data.dir()
```
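One way to read "stays within `CurrentProjectDirectory`" is that a relative path should be neither absolute nor climb out of the project with `..`. The helper below is a minimal sketch of such a check in base R. The name `stays_inside()` is hypothetical and is not part of fadnUtils.

``` r
# Hypothetical helper (not part of fadnUtils): TRUE if a relative path
# cannot leave the directory it is resolved against.
stays_inside <- function(rel_path) {
  # split on forward or backward slashes
  parts <- strsplit(rel_path, "[/\\\\]+")[[1]]
  # absolute paths start with a slash, a backslash or a drive letter
  is_absolute <- grepl("^(/|\\\\|[A-Za-z]:)", rel_path)
  !is_absolute && !any(parts == "..")
}

ok_path  <- stays_inside("rds/test")      # relative, no ".." component
bad_path <- stays_inside("../elsewhere")  # climbs out of the project
abs_path <- stays_inside("D:/public")     # absolute path
```

The check is deliberately conservative: it rejects every path containing `..`, even one that would eventually resolve back inside the project.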
## 2. Import CSV FADN data
We request FADN data from DG-AGRI, which is delivered to us in csv format. To work efficiently in R, we convert the csv data to an R-friendly format. This step relies on a human-readable file, called `raw_str_map.file`. Both of the files listed below are necessary. `inst/examples` is the folder for use cases; it contains fadnUtils package examples and json files.
1. FADN data in csv format: the data for loading
2. A json file for extracting the variables
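
The idea of turning delivered csv data into an R-friendly format can be sketched with base R alone. This is only an illustration under stated assumptions: the csv content and the column name `VAR1` are made up, and the fadnUtils conversion functions do more than this (for example, they place the result in the data directory).

``` r
# Write a tiny made-up csv file to a temporary location
csv_file <- tempfile(fileext = ".csv")
writeLines(c("ID,COUNTRY,YEAR,VAR1",
             "1,BEL,2009,10.5",
             "2,BEL,2009,20.0"), csv_file)

# Read the csv once ...
raw <- read.csv(csv_file, stringsAsFactors = FALSE)

# ... store it as an rds file, which keeps column types and loads quickly
rds_file <- tempfile(fileext = ".rds")
saveRDS(raw, rds_file)

# Later sessions read the rds file instead of parsing the csv again
restored <- readRDS(rds_file)
same <- identical(raw, restored)
```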
### Folder Structure
A working directory is specified arbitrarily by the user. This structure helps data management and maintenance. The directory looks like this:
```
CurrentProjectDirectory/
+-- csv
+-- fadnUtils.metadata.json
+-- rds
\-- spool
    \-- readme.txt
```
- csv: stores the CSV files
- fadnUtils.metadata.json: contains the mapping from the fadn.raw.rds data to the fadn.str.rds data
- rds: stores the r-data
- spool: keeps related files
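
To make the layout concrete, the sketch below builds an empty directory tree with the same names in a temporary folder using base R. It only mirrors the structure shown above. It is not how `create.data.dir()` is implemented, and the file contents are placeholders.

``` r
# Temporary stand-in for CurrentProjectDirectory
root <- file.path(tempdir(), "fadn_layout_demo")

# The three sub-directories of the data directory
for (d in c("csv", "rds", "spool")) {
  dir.create(file.path(root, d), recursive = TRUE, showWarnings = FALSE)
}

# Placeholder files; the real content is written by fadnUtils
writeLines("{}", file.path(root, "fadnUtils.metadata.json"))
writeLines("placeholder", file.path(root, "spool", "readme.txt"))

# List the top level of the created structure
top_level <- sort(list.files(root))
```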
First, we will import the data into an R-friendly format using the fadnUtils package.
### Convert the csv data into raw r-data
The raw data will be added to the `rds` directory. We use a convenience function from this package to convert the csv file into raw r-data.
``` r
fadn.data.dir <- "D:/public/data/fadn/lieferung_20210414/csv/"
fadn.year = 2009
#keep.csv = T # copy csv file in csv.dir
```
    ## File D:/public/data/fadn/lieferung_20210414/csv/BEL2009.csv does not exist. Exiting ...
    ## [1] FALSE
At any time, we can check which csv files (countries, years) are loaded in the current data dir.
``` r
show.data.dir.contents()
```
### Convert raw r-data into structured r-data
Then we convert the raw data into structured data. Broadly, there are three steps:
1. setting up a `structured` directory for the structured data,
2. checking with the `raw_str_map.file` that all variables can be converted,
3. converting the raw data and saving the structured data in the `structured` directory.
#### Set a `structured` directory for saving the structured data
``` r
rds.dir = paste0(get.data.dir(),"/rds/")
# set a name for saving the structured r-data in rds.dir
new.str.name = "test"
# set an extraction_dir
dir.create(paste0(rds.dir, new.str.name), showWarnings = FALSE)
new.extraction.dir = paste0(rds.dir, new.str.name)
```
#### Check the variables in the `raw_str_map.file`
Before conversion, it is recommended to use the `check.column()` function to ensure that all variables in the `raw_str_map.file` can be converted.
``` r
# call reconstructed from the text above; further arguments may be needed
check.column(
  # a json file
  jsonfile = "D:/public/yang/MIND_STEP/2014_after_copy.json",
  # write a new json file without unmatched variables
  rewrite_json = TRUE,
  # save the new json in extraction_dir
  extraction_dir = new.extraction.dir)
```
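Conceptually, such a check compares the variables named in the mapping with the columns that actually exist in the raw data. The following base R sketch shows that comparison on made-up names. It is an assumption about the idea only, not the implementation or the output format of `check.column()`.

``` r
# Made-up raw data with two variables
raw <- data.frame(ID = 1:2, VAR_A = c(1, 2), VAR_B = c(3, 4))

# Made-up list of variables that a mapping file asks for
mapped_vars <- c("VAR_A", "VAR_B", "VAR_C")

# Variables requested by the mapping but missing from the data
unmatched <- setdiff(mapped_vars, names(raw))

# Variables that can be converted; a rewritten mapping would keep only these
kept <- intersect(mapped_vars, names(raw))
```

Here `VAR_C` would be reported as unmatched, which corresponds to the case where a rewritten json file without unmatched variables is useful.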
#### Convert the raw data into structured r-data using the checked json file
Finally, we can convert raw r-data to structured r-data, optionally using an external json file. For more details on conversion in the fadnUtils package, see `USE_CASE.R`.
``` r
# conversion without an external json file
convert.to.fadn.str.rds(fadn.country = "BEL",
                        fadn.year = 2009,
                        str.name = new.str.name # extraction_dir
)
# conversion with an external json file
convert.to.fadn.str.rds(fadn.country = "BEL",
                        fadn.year = 2009,
                        raw_str_map.file = "D:/public/yang/MIND_STEP/new_sample/test01/raw_str_map.json", # an external json file
                        str.name = new.str.name, # extraction_dir
                        force_external_raw_str_map = TRUE,
                        DEBUG = FALSE
)
```
The `rds` directory then contains the following:
- `fadn.raw.2009.BEL.rds`: raw r-data for country "BEL" and year "2009"
- `test`: extraction\_dir for saving the structured r-data and the extracted json file
- `fadn.str.2009.BEL.rds`: structured r-data for country "BEL" and year "2009"
- `raw_str_map.json`: default json file
- `rewrite_2014_after_copy.json`: modified json file after checking the variables
```
rds
+-- fadn.raw.2009.BEL.compressed.rds
+-- fadn.raw.2009.BEL.rds
+-- fadn.raw.2010.BEL.compressed.rds
+-- fadn.raw.2010.BEL.rds
+-- fadn.raw.2011.BEL.compressed.rds
+-- fadn.raw.2011.BEL.rds
+-- fadn.raw.2012.BEL.compressed.rds
+-- fadn.raw.2012.BEL.rds
\-- test
    +-- fadn.str.2009.BEL.rds
    +-- raw_str_map.json
    \-- rewrite_2014_after_copy.json
```
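The file names in the tree above encode the data type, year and country. As a small illustration, the base R sketch below extracts these parts from a vector of such names. The regular expression is an assumption derived from the names shown here, and the package itself may identify available data differently.

``` r
# File names as they appear in the rds directory listing
files <- c("fadn.raw.2009.BEL.rds",
           "fadn.raw.2010.BEL.rds",
           "fadn.raw.2009.BEL.compressed.rds")

# Assumed naming pattern: fadn.<raw|str>.<year>.<country>.rds
pattern <- "^fadn\\.(raw|str)\\.([0-9]{4})\\.([A-Z]{3})\\.rds$"

# Keep only names that match the pattern (the compressed copy is skipped)
hits <- files[grepl(pattern, files)]

# Split each name into its parts
contents <- data.frame(
  type    = sub(pattern, "\\1", hits),
  year    = as.integer(sub(pattern, "\\2", hits)),
  country = sub(pattern, "\\3", hits),
  stringsAsFactors = FALSE
)
```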
## 3. Load raw r-data and structured r-data
Before starting any analysis with `fadnUtils`, we first need to load r-data. We can only load data for countries and years that have already been imported into a data.dir folder.
### Load raw r-data for the country `BEL` and year `2009`
``` r
my.data.2009.raw = load.fadn.raw.rds(
  countries = "BEL",
  years = 2009
)
```
### Load structured data for the country `BEL` and year `2009`
``` r
my.data.2009.str = load.fadn.str.rds(
  countries = "BEL",
  years = 2009,
  extraction_dir = "test" # Location of the str r-data
)
```
### Load structured data from all available countries and years.
The following example loads structured data for all available countries and years.
``` r
my.str.data = load.fadn.str.rds(extraction_dir = "test")
```
## 4. Perform analysis
Here are some examples of analysing the loaded data.
### Collect the common id
We can collect the common id from the loaded r-data using the `collect.common.id()` function of `fadnUtils`.
``` r
# Collect the common id from the loaded structured r-data
collected.common.id_str = collect.common.id(my.str.data)
```
    ## Tranforming list to data table....
    ## [1] 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018
    ## 15 year(s) is/are selected.
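A "common id" can be understood as a farm ID that is present in every selected year. The sketch below shows this idea in base R on made-up data. It is not the implementation of `collect.common.id()`, whose return value is a list rather than the plain vector produced here.

``` r
# Made-up farm observations: farm 3 drops out after 2007, farm 4 enters in 2009
info <- data.frame(
  ID   = c(1, 2, 3, 1, 2, 1, 2, 4),
  YEAR = c(2007, 2007, 2007, 2008, 2008, 2009, 2009, 2009)
)

# One vector of IDs per year
ids_by_year <- split(info$ID, info$YEAR)

# IDs that occur in all years: successive intersection over the years
common_ids <- Reduce(intersect, ids_by_year)
```

Only farms 1 and 2 appear in all three years, so they form the balanced panel in this toy example.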
To build a basic plot, we use the `ggplot()` function from the plotting package `ggplot2`.
``` r
crops.data = my.str.data$crops # catering for easier access at next steps
# this contains the number of crops for each farm-country-year
# Be careful: we have to filter to count only the LEVL variable
crops.data.Ncrops = crops.data[VARIABLE=="LEVL",.N,by=list(COUNTRY,YEAR,ID)]
# This displays the quantiles of the number of crops
crops.data.Ncrops[,as.list(quantile(N)),by=list(YEAR,COUNTRY)][order(COUNTRY)]
# plot only 2007, 2008, 2009 (data.table subsetting, so no extra package is needed)
ggplot(crops.data.Ncrops[YEAR %in% c(2007,2008,2009)], aes(y=N,x=1)) +
  geom_boxplot() +
  facet_grid(YEAR~COUNTRY) +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        strip.text = element_text(size = 8, angle = 90))
```
``` r
# sample and represented number of farms
my.str.data$info[,list(Nobs_sample=.N,Nobs_represented=sum(WEIGHT)),
                 by=.(COUNTRY,YEAR)]
```
    ##      COUNTRY YEAR Nobs_sample Nobs_represented
    ##   1:     NED 2004        1397            60644
    ##   2:     NED 2005        1446            60598
    ##   3:     NED 2006        1491            60644
    ##   4:     BEL 2007        1168            33315
    ##   5:     BGR 2007        1871           146769
    ##  ---
    ## 329:     SUO 2018         722            34114
    ## 330:     SVE 2018        1010            28884
    ## 331:     SVK 2018         559             4144
    ## 332:     SVN 2018         890            44392
    ## 333:     UKI 2018        2848           100916
``` r
# only for the full sample (common id over the years in the selected data)
my.str.data$info[ID %in% collected.common.id_str[[1]],
                 list(Nobs_sample=.N,
                      Nobs_represented=sum(WEIGHT)),
                 by=.(COUNTRY,YEAR)]
```
    ##     COUNTRY YEAR Nobs_sample Nobs_represented
    ##  1:     NED 2004         446         20358.73
    ##  2:     NED 2005         446         20209.66
    ##  3:     NED 2006         446         19606.76
    ##  4:     NED 2007         446         17748.39
    ##  5:     NED 2008         446         17196.91
    ##  6:     NED 2009         446         16564.05
    ##  7:     NED 2010         446         17407.43
    ##  8:     NED 2011         446         17928.86
    ##  9:     NED 2012         446         16539.63
    ## 10:     NED 2013         446         17078.27
    ## 11:     NED 2014         446         17901.31
    ## 12:     NED 2015         446         16973.80
    ## 13:     NED 2016         446         16961.13
    ## 14:     NED 2017         446         19275.99
    ## 15:     NED 2018         446         17685.72
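The two columns in these tables differ in a simple way: `Nobs_sample` counts the farms in the sample, while `Nobs_represented` sums their weights. The base R sketch below reproduces the calculation on made-up data with placeholder country codes. The README itself uses `data.table` syntax for the same computation.

``` r
# Made-up sample: three farms with their weights
info <- data.frame(
  COUNTRY = c("AAA", "AAA", "BBB"),
  YEAR    = c(2009, 2009, 2009),
  WEIGHT  = c(10.5, 20, 7)
)

# Grouping key per country and year, e.g. "AAA.2009"
key <- interaction(info$COUNTRY, info$YEAR, drop = TRUE)

# Number of sampled farms per group
nobs_sample <- tapply(info$WEIGHT, key, length)

# Number of farms represented per group: the sum of the weights
nobs_represented <- tapply(info$WEIGHT, key, sum)
```

In this toy data, two sampled farms in `AAA` represent 30.5 farms in the population.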
## 5. Translate the Nomenclature of Territorial Units for Statistics (NUTS) version
The NUTS classification changes every 3-4 years. Changes between NUTS versions include recoding, merging and splitting of regions, as well as boundary shifts. This package provides functions for plotting NUTS classifications and for converting between different NUTS versions.
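
Recoding and merging can be handled with a lookup table from old to new codes. The base R sketch below illustrates this on made-up region codes, which are not real NUTS codes. It is only a sketch of the general technique and does not reflect the translation tables or the logic used by this package.

``` r
# Hypothetical lookup table; the codes are made up and are not real NUTS codes
lookup <- data.frame(
  old = c("XX11", "XX12", "XX13"),
  new = c("XX21", "XX20", "XX20"),  # XX11 is recoded, XX12 and XX13 are merged
  stringsAsFactors = FALSE
)

# Made-up farm records carrying old region codes
farms <- data.frame(
  ID    = 1:4,
  NUTS2 = c("XX11", "XX12", "XX13", "XX99"),
  stringsAsFactors = FALSE
)

# Translate via the lookup table; codes without an entry stay unchanged
farms$NUTS2_new <- lookup$new[match(farms$NUTS2, lookup$old)]
unchanged <- is.na(farms$NUTS2_new)
farms$NUTS2_new[unchanged] <- farms$NUTS2[unchanged]
```

A split region cannot be resolved this way, because one old code would map to several new codes. Such cases need extra information, such as a finer regional level.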
1. Plots various regional levels (FADN Region, NUTS1, NUTS2, NUTS3).
``` r
nuts.heatmap.group(my.str.data$info, "NUTS2", countries = "DEU", onepage = FALSE)
```
2. Converts data between the different NUTS versions in both directions.
This package contains various NUTS tables. `NUTS.convert.all()` converts FADN data between different NUTS versions.
``` r
dt_nuts <- data(package = "fadnUtils")
# names of data sets
dt_nuts$results[, "Item"]
```
    ## [1] "nuts1.trans" "nuts2.trans" "nuts3.trans" "region.trans"
``` r
# convert NUTS1 and NUTS2 to NUTS version 2016 for Germany.
NUTS.convert.all(data = my.str.data$info, countries = "DEU", NUTS.Year = 2016)
# convert NUTS1 and NUTS2 to NUTS version 2013 for Germany.
NUTS.convert.all(data = my.str.data$info, countries = "DEU", NUTS.Year = 2013)
# convert NUTS1 and NUTS2 to NUTS version 2013 for Germany, Poland.
NUTS.convert.all(data = my.str.data$info, countries = c("DEU", "POL"), NUTS.Year = 2013)
# convert NUTS1 and NUTS2 to NUTS version 2016 for all available FADN countries.
NUTS.convert.all(data = my.str.data$info, countries = "all", NUTS.Year = 2016)
```
An example of converting old NUTS1 and NUTS2 codes to the latest NUTS version (NUTS 2016) can be found in `inst/examples/nuts_use_case.R`.
**Note:** Please read `inst/examples/FADN_USE_CASE.R` and `use_case.docx` for more details on using fadnUtils.
# References
1. History of NUTS: <https://ec.europa.eu/eurostat/en/web/nuts/history>
2. NUTS Converter web tool: <https://urban.jrc.ec.europa.eu/nutsconverter/#/>