---
output: rmarkdown::github_document
---

```{r setup, include=FALSE}
getwd()
knitr::opts_knit$set(root.dir = 'D:/public/yang/fadnutils/new-Version/')
```
<!-- README.md is generated from README.Rmd. Please edit that file -->

# fadnUtils
Developed by Dimitrios Kremmydas (JRC) and Xinxin Yang (THÜNEN).

The fadnUtils package facilitates efficient handling of FADN data in R. The package is designed for use within the JRC D.4 context, which means that users interact with it following a specific temporal pattern (see the workflow figure below).
![plot](inst/examples/pic/workflow.png)

More specifically, after a request for FADN data from DG-AGRI, this data is delivered to JRC D.4 in csv format. 

# Installation 
You can install the development version from the Thuenen or the IIASA GitLab with:
```{r , eval =FALSE}
# Thuenen gitlab
devtools::install_git("https://git-dmz.thuenen.de/mindstep/fadnutilspackages", force = TRUE)
# IIASA gitlab
devtools::install_git("https://gitlab.iiasa.ac.at/mind-step/fadnutilspackage")
```

Then load fadnUtils and install the related R packages:

```{r, results='hide'}
library(fadnUtils)
requiredPackages = c('data.table', 'devtools','jsonlite', 'ggplot2')
for(p in requiredPackages){
  if(!require(p,character.only = TRUE)) install.packages(p)
  library(p,character.only = TRUE)
}
```
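
If you only want to see which of these packages are missing, without installing anything, a minimal sketch using base R's `requireNamespace()` could look like this. The variable names are illustrative only and are not part of fadnUtils.

```r
# Packages used in this README
required_packages <- c("data.table", "devtools", "jsonlite", "ggplot2")

# requireNamespace() returns TRUE/FALSE and does not attach the package
is_installed <- vapply(required_packages, requireNamespace,
                       logical(1), quietly = TRUE)

# Names of the packages that still need to be installed (may be empty)
missing_packages <- required_packages[!is_installed]
print(missing_packages)
```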

# Usage in Brief
After loading the packages, you have a working installation of fadnUtils on your computer. The typical workflow consists of the following steps:

1. Create a working directory
    - a user-defined data directory
1. Import CSV FADN data
    - convert the csv data into raw r-data
    - convert raw r-data into structured r-data
1. Load raw r-data and structured r-data
1. Perform analysis
1. Translate between various NUTS versions (FADN Region, NUTS1, NUTS2, NUTS3)

## 1. Create a working directory
The working directory is the location where files saved from R are stored. It is set by the user. Make sure that any relative paths stay within `CurrentProjectDirectory`.
```{r}
# using a local directory
CurrentProjectDirectory = "D:/public/yang/MIND_STEP/New_test_fadnUtils"
create.data.dir(folder.path = CurrentProjectDirectory)
set.data.dir(CurrentProjectDirectory)
get.data.dir()
```
### Required files
We request FADN data from DG-AGRI, which are delivered to us in csv format. To work efficiently in R, we convert the csv data to an R-friendly format. This step relies on a human-readable json file, called `raw_str_map.file`. The `inst/examples` folder contains fadnUtils use cases and example json files. Both of the following files are necessary:

1. FADN data in csv format: the data for loading
2. A json file for extracting the variables

### Folder Structure 
The user is free to choose the location of the working directory. Its internal structure helps with data management and maintenance. The directory looks like this:

```base
CurrentProjectDirectory/
+-- csv
+-- fadnUtils.metadata.json
+-- rds
\-- spool
    \-- readme.txt
```
* csv: CSV files are stored here
* fadnUtils.metadata.json: contains the mapping from the fadn.raw.rds to the fadn.str.rds data
* rds: r-data files are stored here
* spool: related files are kept here
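
To check that a data directory has the layout described above, a small base R helper can be used. The sketch below is only an illustration: `check_data_dir_layout()` is a hypothetical helper, not a fadnUtils function, and the demo directory is created in a temporary location so that the example is self-contained.

```r
# Build a throwaway directory that mimics the layout shown above
project_dir <- file.path(tempdir(), "fadn_layout_demo") # hypothetical location
dir.create(file.path(project_dir, "csv"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project_dir, "rds"), showWarnings = FALSE)
dir.create(file.path(project_dir, "spool"), showWarnings = FALSE)
invisible(file.create(file.path(project_dir, "fadnUtils.metadata.json")))
invisible(file.create(file.path(project_dir, "spool", "readme.txt")))

# Hypothetical helper (not part of fadnUtils): returns a named logical vector
# telling which expected folders and files are present
check_data_dir_layout <- function(path) {
  expected_dirs <- c("csv", "rds", "spool")
  expected_files <- "fadnUtils.metadata.json"
  c(
    setNames(dir.exists(file.path(path, expected_dirs)), expected_dirs),
    setNames(file.exists(file.path(path, expected_files)), expected_files)
  )
}

layout_ok <- check_data_dir_layout(project_dir)
print(layout_ok)
```

With a real project, `project_dir` would be replaced by the value returned by `get.data.dir()`.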

## 2. Import CSV FADN data
First, we will import the data into an R-friendly format using the fadnUtils package.

### Convert the csv data into raw r-data
The raw data will be added to the `rds` directory. We use a convenience function from this package to convert the csv file into raw r-data.

```{r}
fadn.data.dir <- "D:/public/data/fadn/lieferung_20210414/csv/"

# load data for country BEL and year 2009
convert.to.fadn.raw.rds(
  file.path = paste0(fadn.data.dir, "BEL2009.csv"),
  sepS = ",",
  fadn.country = "BEL",
  fadn.year = 2009
  # keep.csv = T # copy csv file in csv.dir
)
```
At any time, we can check which csv files (countries and years) have been loaded into the current data directory.
```{r, eval=FALSE}
show.data.dir.contents()
```

### Convert raw r-data into structured r-data
Next, we convert the raw data into structured data. Broadly, there are 3 steps:

1. setting up a `structured` directory for the structured data,
2. checking that all variables in the `raw_str_map.file` can be converted,
3. converting the raw data into structured data and saving it in the `structured` directory.

#### Set a `structured` directory for saving the structured data
We create a `test` folder to hold the structured data.

```{r, warning=FALSE}
rds.dir = paste0(get.data.dir(), "/rds/")
# set the name of the folder in rds.dir that will hold the structured r-data
new.str.name = "test"
# create the extraction_dir
new.extraction.dir = paste0(rds.dir, new.str.name)
dir.create(new.extraction.dir, showWarnings = FALSE)
```

#### Check the variables in the `raw_str_map.file`
Before conversion, it is recommended to use the `check.column()` function to ensure that all variables in the `raw_str_map.file` can be converted.
```{r results='hide', message=FALSE, warning=FALSE}
list_vars = check.column(
  # a rds file or a csv file
  importfilepath = paste0(rds.dir, "fadn.raw.2009.BEL.rds"),
  # a json file
  jsonfile = "D:/public/yang/MIND_STEP/2014_after_copy.json",
  # write a new json file without unmatched variables
  rewrite_json = TRUE,
  # save the new json in extraction_dir
  extraction_dir = new.extraction.dir
)
```
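
The idea behind such a check can be illustrated with plain set operations. The sketch below is not the implementation of `check.column()`. It only assumes that the check compares the variable names requested by the mapping file with the column names available in the raw data. All variable names are hypothetical.

```r
# Column names of a toy raw table (hypothetical variable codes)
raw_columns <- c("ID", "COUNTRY", "YEAR", "VAR_A", "VAR_B")

# Variables requested by a mapping file (hypothetical; in practice these
# would be read from the json file)
mapped_variables <- c("VAR_A", "VAR_B", "VAR_C")

# Variables that can be converted, and variables without a matching column
matched <- intersect(mapped_variables, raw_columns)
unmatched <- setdiff(mapped_variables, raw_columns)

print(matched)
print(unmatched)
```

A rewritten mapping file would then keep only the entries in `matched`.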


#### Convert the raw data into structured r-data using the checked json file
Finally, we can convert the raw r-data into structured r-data, optionally using an external json file. For more details on conversion in the fadnUtils package, see `USE_CASE.R`.
```{r, results='hide'}
# conversion without an external json file
convert.to.fadn.str.rds(
  fadn.country = "BEL",
  fadn.year = 2009,
  str.name = new.str.name # extraction_dir
)

# conversion with an external json file
convert.to.fadn.str.rds(
  fadn.country = "BEL",
  fadn.year = 2009,
  raw_str_map.file = "D:/public/yang/MIND_STEP/new_sample/test01/raw_str_map.json", # an external json file
  str.name = new.str.name, # extraction_dir
  force_external_raw_str_map = TRUE,
  DEBUG = FALSE
)
```
#### File structure in the `rds` folder
After conversion, the `rds` folder contains:

* `fadn.raw.2009.BEL.rds`: raw r-data for country "BEL" and year "2009"
* `test`: extraction_dir holding the structured r-data and the json files used for extraction
* `fadn.str.2009.BEL.rds`: structured r-data for country "BEL" and year "2009"
* `raw_str_map.json`: default json file
* `rewrite_2014_after_copy.json`: modified json file after checking the variables

```base
rds
+-- fadn.raw.2009.BEL.compressed.rds
+-- fadn.raw.2009.BEL.rds
+-- fadn.raw.2010.BEL.compressed.rds
+-- fadn.raw.2010.BEL.rds
+-- fadn.raw.2011.BEL.compressed.rds
+-- fadn.raw.2011.BEL.rds
+-- fadn.raw.2012.BEL.compressed.rds
+-- fadn.raw.2012.BEL.rds
\-- test
     +-- fadn.str.2009.BEL.rds
     +-- raw_str_map.json
     \-- rewrite_2014_after_copy.json
```
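
The file names above encode the year and the country, so the imported combinations can be recovered from the names alone. The following base R sketch assumes the naming pattern `fadn.raw.<year>.<country>.rds` with a four-digit year and a three-letter country code, as seen in the listing. It is an illustration, not a fadnUtils function.

```r
# File names as they appear in the rds folder (taken from the listing above)
raw_files <- c("fadn.raw.2009.BEL.rds",
               "fadn.raw.2009.BEL.compressed.rds",
               "fadn.raw.2010.BEL.rds")

# Assumed naming pattern: fadn.raw.<4-digit year>.<3-letter country>.rds
pattern <- "^fadn\\.raw\\.([0-9]{4})\\.([A-Z]{3})\\.rds$"

# Keep only the uncompressed raw files and extract year and country
is_raw <- grepl(pattern, raw_files)
loaded <- data.frame(
  year = as.integer(sub(pattern, "\\1", raw_files[is_raw])),
  country = sub(pattern, "\\2", raw_files[is_raw]),
  stringsAsFactors = FALSE
)
print(loaded)
```

In a real project, `raw_files` could be obtained with `list.files(file.path(get.data.dir(), "rds"))`.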

## 3. Load raw r-data and structured r-data
To start any analysis with `fadnUtils`, we first need to load r-data. We can only load data for countries and years that have already been imported into the data directory.

### Load raw r-data for the country `BEL` and year `2009`
```{r results='hide', message=FALSE, warning=FALSE}
my.data.2009.raw = load.fadn.raw.rds(
  countries = "BEL",
  years = 2009
)
```
### Load structured data for the country `BEL` and year `2009`
We can load structured data for the country `BEL` and the year `2009`.
```{r results='hide', message=FALSE, warning=FALSE}
my.data.2009.str = load.fadn.str.rds(
  countries = "BEL",
  years = 2009,
  extraction_dir = "test" # Location of the str r-data
)
```
### Load structured data from all available countries and years
The following is an example of loading structured data for all available countries and years.

```{r results='hide', message=FALSE, warning=FALSE}
my.str.data = load.fadn.str.rds( extraction_dir = "test")
```

## 4. Perform analysis
Here are some examples of data analysis.

### Collect the common id
We can collect the common id from the loaded r-data using the `collect.common.id()` function of `fadnUtils`.

```{r, message=FALSE}
# Collect the common id from the loaded structured r-data
collected.common.id_str = collect.common.id(my.str.data)
```
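
Conceptually, a common id is a farm id that appears in every year of the selected data. The base R sketch below illustrates this idea on toy data. It is not the implementation of `collect.common.id()`, and the toy ids and years are made up.

```r
# Toy farm-year observations (made-up ids)
farm_info <- data.frame(
  ID = c(1, 2, 3, 1, 2, 1, 3),
  YEAR = c(2009, 2009, 2009, 2010, 2010, 2011, 2011)
)

# One vector of ids per year
ids_by_year <- split(farm_info$ID, farm_info$YEAR)

# Ids that are present in all years
common_ids <- Reduce(intersect, ids_by_year)
print(common_ids)
```

Only farm 1 is observed in 2009, 2010 and 2011, so it is the only common id in this toy example.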
### Plotting
To build a basic plot, we use the `ggplot()` function from the `ggplot2` plotting package.

```{r results='hide', message=FALSE, warning=FALSE}
crops.data = my.str.data$crops # for easier access in the next steps

# this contains the number of crops for each farm-country-year
#   Be careful: we have to filter to count only the LEVL variable
crops.data.Ncrops = crops.data[VARIABLE == "LEVL", .N, by = list(COUNTRY, YEAR, ID)]

# This displays the quantiles of the number of crops
crops.data.Ncrops[, as.list(quantile(N)), by = list(YEAR, COUNTRY)][order(COUNTRY)]

# plot only 2007, 2008, 2009
# (data.table subsetting is used here, so dplyr does not need to be loaded)
ggplot(crops.data.Ncrops[YEAR %in% c(2007, 2008, 2009)], aes(y = N, x = 1)) +
  geom_boxplot() +
  facet_grid(YEAR ~ COUNTRY) +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        strip.text = element_text(size = 8, angle = 90)
  ) +
  ylab("Number of Crops")
```

### Some other examples
```{r}
# sampled and represented number of farms
my.str.data$info[,list(Nobs_sample=.N,Nobs_represented=sum(WEIGHT)),
                 by=.(COUNTRY,YEAR)]

# only for full sample (common id over years in selected data)
my.str.data$info[ID %in% collected.common.id_str[[1]],
                 list(Nobs_sample=.N,
                      Nobs_represented=sum(WEIGHT)),
                 by=.(COUNTRY,YEAR)]
```
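
To make the difference between the sampled and the represented number of farms concrete, here is a self-contained toy example using `data.table`. The table `toy_info` and its values are made up. Only the column names `COUNTRY`, `YEAR`, `ID` and `WEIGHT` follow the code above.

```r
library(data.table)

# Toy version of the info table (made-up values)
toy_info <- data.table(
  COUNTRY = c("BEL", "BEL", "BEL", "BEL"),
  YEAR = c(2009, 2009, 2010, 2010),
  ID = c(1, 2, 1, 3),
  WEIGHT = c(10, 25.5, 12, 20)
)

# .N counts the sampled farms; the sum of weights gives the number of
# farms they represent in the population
farm_counts <- toy_info[, list(Nobs_sample = .N,
                               Nobs_represented = sum(WEIGHT)),
                        by = .(COUNTRY, YEAR)]
print(farm_counts)
```

In 2009 the two sampled farms carry weights of 10 and 25.5, so together they represent 35.5 farms.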
## 5. Translate the Nomenclature of Territorial Units for Statistics (NUTS) version
The NUTS classification changes every 3-4 years. Changes between NUTS versions include recoding, merging and splitting of regions, as well as boundary shifts. This package provides functions for plotting the NUTS classification and for converting between different NUTS versions.

1. Plots various regional levels (FADN Region, NUTS1, NUTS2, NUTS3).

```{r, eval=FALSE}
# nuts2 for Germany
nuts.heatmap.group(my.str.data$info, "NUTS2", countries = "DEU", onepage = FALSE)
```
![](README_files/figure-markdown_github/DEU.png)
2. Converts data between the different NUTS versions.

This package provides various NUTS tables for converting to the latest NUTS version.
```{r}
# get a list of the datasets in an R package
dt_nuts <- data(package = "fadnUtils")
```
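
The object returned by `data(package = ...)` stores the dataset names in the `Item` column of its `results` matrix. The sketch below uses the built-in `datasets` package so that it runs without fadnUtils installed. The same access pattern should apply to `dt_nuts`.

```r
# List the datasets shipped with a package (here: base R's "datasets")
available <- data(package = "datasets")

# The names of the datasets are in the "Item" column of the results matrix
dataset_names <- available$results[, "Item"]
head(dataset_names)
```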
For an example of converting old NUTS1 and NUTS2 codes to the latest NUTS version (NUTS 2016), see `inst/examples/nuts_use_case.R`.
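
The basic idea of a code conversion can be sketched with a simple lookup table. This is an illustration only: the region codes are made up and are not real NUTS codes, and the table layout is an assumption that does not necessarily match the NUTS tables shipped with fadnUtils.

```r
# Hypothetical correspondence table (made-up codes, not real NUTS codes);
# XX11 and XX12 are merged into XX21
nuts_lookup <- data.frame(
  nuts2_old = c("XX11", "XX12", "XX13"),
  nuts2_new = c("XX21", "XX21", "XX23"),
  stringsAsFactors = FALSE
)

# Toy farm table with old region codes; XX99 has no entry in the lookup
farms <- data.frame(
  ID = 1:4,
  NUTS2 = c("XX11", "XX12", "XX13", "XX99"),
  stringsAsFactors = FALSE
)

# Translate old codes to new codes; codes without a match become NA
farms$NUTS2_latest <- nuts_lookup$nuts2_new[match(farms$NUTS2, nuts_lookup$nuts2_old)]

# Rows that could not be translated and need manual attention
unmapped <- farms[is.na(farms$NUTS2_latest), ]
print(farms)
```

A one-to-one lookup like this covers recoding and merging. Split regions and boundary shifts need additional information and are not handled by this sketch.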

**Note:** Please read `inst/examples/FADN_USE_CASE.R` and `use_case.docx` for more details on using fadnUtils.

1. History of NUTS: <https://ec.europa.eu/eurostat/en/web/nuts/history>

2. NUTS Converter web tool: <https://urban.jrc.ec.europa.eu/nutsconverter/#/>