<!-- README.md is generated from README.Rmd. Please edit that file -->
# fadnUtils
Developed by Dimitrios Kremmydas (JRC) and Xinxin Yang (THÜNEN)

The fadnUtils package facilitates efficient handling of FADN data in R. The package is targeted for use within the JRC D.4 context, which implies a specific temporal pattern in how a user interacts with it (see the workflow figure).

![plot](inst/examples/pic/workflow.png)

More specifically, after FADN data is requested from DG-AGRI, it is delivered to JRC D.4 in csv format.

# Installation
You can install the development version from the Thuenen or IIASA GitLab with:

``` r
# Thuenen gitlab
devtools::install_git("https://git-dmz.thuenen.de/mindstep/fadnutilspackages", force = TRUE)
# IIASA gitlab
devtools::install_git("https://gitlab.iiasa.ac.at/mind-step/fadnutilspackage")
```

Then install and load the related R packages.

``` r
requiredPackages = c('fadnUtils','data.table', 'devtools','jsonlite', 'ggplot2')
for(p in requiredPackages){
  # install a package only if it cannot be loaded yet
  if(!require(p,character.only = TRUE)) install.packages(p)
  library(p,character.only = TRUE)
}
```

    ## Loading required package: fadnUtils

    ## fadnUtils is loaded.

    ## Loading required package: data.table

    ## Loading required package: devtools

    ## Loading required package: usethis

    ## Loading required package: jsonlite

    ## Loading required package: ggplot2

# Usage in Brief

After loading the packages, you have a functional fadnUtils installation on your computer. The rest of this section walks through a typical workflow:

1.  Create a working directory
    -   a user-defined data directory
2.  Import CSV FADN data
    -   convert the csv data into raw r-data
    -   convert raw r-data into str r-data
3.  Load raw r-data and structured r-data
4.  Perform analysis
5.  Translate between various NUTS versions (FADN Region, NUTS1, NUTS2, NUTS3)

## 1. Create a working directory

The working directory is the path where any files you save from R are stored. The user sets this directory. Make sure that relative paths stay within `CurrentProjectDirectory`.

``` r
# using a local directory
CurrentProjectDirectory = "D:/public/yang/MIND_STEP/New_test_fadnUtils"
create.data.dir(folder.path = CurrentProjectDirectory)
```

    ## This is already a data.dir structure. Doing nothing.

``` r
set.data.dir(CurrentProjectDirectory)
get.data.dir()
```

    ## [1] "D:/public/yang/MIND_STEP/New_test_fadnUtils"

### Required files

We request FADN data from DG-AGRI, which is delivered to us in csv format. To work efficiently in R, we convert the csv data to an R-friendly format. This step is done with the help of a human-readable file, called `raw_str_map.file`. Both files are necessary. The `inst/examples` folder holds use cases that contain fadnUtils package examples and json files.

1.  FADN data in csv format: the data for loading
2.  A json file for extracting the variables

### Folder Structure

The working directory can be located anywhere the user chooses. Its structure helps with data management and maintenance. The directory looks like this:

``` base
CurrentProjectDirectory/
+-- csv
+-- fadnUtils.metadata.json
+-- rds
\-- spool
    \-- readme.txt
```

-   csv: CSV files are stored here
-   fadnUtils.metadata.json: contains the mapping from the fadn.raw.rds to the fadn.str.rds data
-   rds: r-data files are placed here
-   spool: related files are kept here
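
To get a feel for this layout without touching real data, the sketch below recreates the same folders in a temporary directory using base R only. It illustrates the structure and is not how `create.data.dir()` is implemented; `demo.dir` is a made-up name.

```r
# Illustration only: mimic the data.dir layout in a temporary folder.
# `demo.dir` is a hypothetical name, not something fadnUtils creates.
demo.dir <- file.path(tempdir(), "fadnUtils_layout_demo")

# create the three sub-folders shown in the tree
for (sub in c("csv", "rds", "spool")) {
  dir.create(file.path(demo.dir, sub), recursive = TRUE, showWarnings = FALSE)
}

# create empty placeholder files for the metadata and the spool readme
invisible(file.create(file.path(demo.dir, "fadnUtils.metadata.json")))
invisible(file.create(file.path(demo.dir, "spool", "readme.txt")))

# list everything below demo.dir, including the folders themselves
layout <- list.files(demo.dir, recursive = TRUE, include.dirs = TRUE)
print(layout)
```

In a real project, `create.data.dir()` and `set.data.dir()` should be used instead, as shown in the previous section.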

## 2. Import CSV FADN data

First, we will import the data into an R-friendly format using the fadnUtils package.

### Convert the csv data into raw r-data

The raw data is saved to the `rds` directory. The `convert.to.fadn.raw.rds()` function converts a csv file into raw r-data.

``` r
fadn.data.dir <- "D:/public/data/fadn/lieferung_20210414/csv/"
# load data for country BEL and year 2009
convert.to.fadn.raw.rds(
      file.path =  paste0(fadn.data.dir, "BEL2009.csv"),
      sepS = ",",
      fadn.country = "BEL",
      fadn.year = 2009
      #keep.csv = T # copy csv file in csv.dir
      )
```

    ## File D:/public/data/fadn/lieferung_20210414/csv/BEL2009.csv does not exist. Exiting ...

    ## [1] FALSE
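
Conceptually, this import step reads a csv file once and stores it in R's binary `.rds` format so that later loads are fast. The following base R sketch shows that round trip on a toy file. It is only an illustration under assumed column names and values (`VAR_A` and the file names are invented), not the internals of `convert.to.fadn.raw.rds()`.

```r
# Toy csv standing in for a FADN delivery (hypothetical columns and values)
csv.file <- file.path(tempdir(), "TOY2009.csv")
writeLines(c("ID,COUNTRY,YEAR,VAR_A",
             "1,BEL,2009,2.5",
             "2,BEL,2009,1.0"),
           csv.file)

# read the csv into a data.frame
raw <- read.csv(csv.file, sep = ",", stringsAsFactors = FALSE)

# store it as an rds file and read it back
rds.file <- file.path(tempdir(), "toy.raw.2009.BEL.rds")
saveRDS(raw, rds.file)
reloaded <- readRDS(rds.file)

# the reloaded object should match the original data.frame
print(all.equal(raw, reloaded))
```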

At any time, we can check which csv files (countries, years) have been loaded into the current data dir.

``` r
show.data.dir.contents()
```

### Convert raw r-data into structured r-data

Then, we convert the raw data into structured data. Broadly, there are three steps:

1.  setting up a `structured` directory for the structured data,
2.  checking against the `raw_str_map.file` that all variables can be converted,
3.  converting the raw data into structured data saved in the `structured` directory.

#### Set a `structured` directory for saving the structured data

We create a `test` folder to hold the structured data.

``` r
rds.dir = paste0(get.data.dir(),"/rds/")
# set a name for the folder that stores the structured r-data in rds.dir
new.str.name = "test"
# create the extraction_dir
dir.create(paste0(rds.dir, new.str.name), showWarnings = FALSE)
new.extraction.dir = paste0(rds.dir, new.str.name)
```

#### Check the variables in the `raw_str_map.file`

Before conversion, it is recommended to use the `check.column()` function to ensure that all variables in the `raw_str_map.file` can be converted.

``` r
list_vars = check.column(
              # a rds file or a csv file
              importfilepath = paste0(rds.dir, "fadn.raw.2009.BEL.rds"),
              # a json file
              jsonfile = "D:/public/yang/MIND_STEP/2014_after_copy.json",
              # write a new json file without unmatched variables
              rewrite_json = TRUE,
              # save the new json in extraction_dir
              extraction_dir = new.extraction.dir)
```
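
The idea behind such a check can be pictured with plain set operations: compare the variable names requested by the mapping with the columns that exist in the raw data. The sketch below uses invented names (`raw.columns`, `mapped.vars`, `VAR_A`, `VAR_B`) and is only an assumption about the general principle, not the actual logic of `check.column()`.

```r
# Hypothetical column names of a raw data set
raw.columns <- c("ID", "COUNTRY", "YEAR", "VAR_A")

# Hypothetical variables requested by a json mapping
mapped.vars <- c("ID", "VAR_A", "VAR_B")

# variables that can be converted because the column exists
matched <- intersect(mapped.vars, raw.columns)

# variables that would have to be dropped from the mapping
unmatched <- setdiff(mapped.vars, raw.columns)

print(matched)
print(unmatched)
```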

#### Convert the raw data into structured r-data using the checked json file

Finally, we can convert raw r-data to structured r-data, optionally using an external json file. For more details on converting with the fadnUtils package, see `USE_CASE.R`.

``` r
convert.to.fadn.str.rds(fadn.country = "BEL",
                        fadn.year = 2009,
                        str.name = new.str.name # extraction_dir
)

convert.to.fadn.str.rds(fadn.country = "BEL",
                        fadn.year = 2009,
                        raw_str_map.file = "D:/public/yang/MIND_STEP/new_sample/test01/raw_str_map.json", # an external json file
                        str.name = new.str.name, # extraction_dir
                        force_external_raw_str_map = TRUE,
                        DEBUG = FALSE
                        )
```

#### Files Structure in `rds` folder

After conversion, the `rds` folder contains:

-   `fadn.raw.2009.BEL.rds`: raw r-data for country "BEL" and year "2009"
-   `test`: extraction\_dir for saving the structured r-data and the extracted json file
-   `fadn.str.2009.BEL.rds`: structured r-data for country "BEL" and year "2009"
-   `raw_str_map.json`: default json file
-   `rewrite_2014_after_copy.json`: modified json file after checking the variables

``` base
rds
+-- fadn.raw.2009.BEL.compressed.rds
+-- fadn.raw.2009.BEL.rds
+-- fadn.raw.2010.BEL.compressed.rds
+-- fadn.raw.2010.BEL.rds
+-- fadn.raw.2011.BEL.compressed.rds
+-- fadn.raw.2011.BEL.rds
+-- fadn.raw.2012.BEL.compressed.rds
+-- fadn.raw.2012.BEL.rds
\-- test
     +-- fadn.str.2009.BEL.rds
     +-- raw_str_map.json
     \-- rewrite_2014_after_copy.json
```
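
The file names in this listing encode the year and the country. As a rough sketch, they can be parsed with a regular expression in base R to see which raw data sets are available. The pattern below is inferred from the names shown above and is an assumption; in practice `show.data.dir.contents()` is the supported way to inspect a data dir.

```r
# File names as they appear in the rds folder listing
files <- c("fadn.raw.2009.BEL.rds",
           "fadn.raw.2009.BEL.compressed.rds",
           "fadn.raw.2010.BEL.rds")

# Assumed naming pattern: fadn.raw.<year>.<country>.rds
pattern <- "^fadn\\.raw\\.([0-9]{4})\\.([A-Z]{3})\\.rds$"

# regexec returns the full match plus the two capture groups
hits <- regmatches(files, regexec(pattern, files))

# keep only names that matched (full match + 2 groups = 3 elements)
hits <- hits[lengths(hits) == 3]

available <- data.frame(
  year    = as.integer(sapply(hits, `[`, 2)),
  country = sapply(hits, `[`, 3)
)
print(available)
```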

## 3. Load raw r-data and structured r-data

To start any analysis with `fadnUtils`, we first need to load r-data. We can only load data for countries and years that have already been imported into a data.dir folder.

### Load raw r-data for the country `BEL` and year `2009`

``` r
my.data.2009.raw = load.fadn.raw.rds(
  countries = "BEL",
  years = 2009
)
```

### Load structured data for the country `BEL` and year `2009`

We can load structured data for country `BEL` and year `2009`.

``` r
my.data.2009.str = load.fadn.str.rds(
  countries = "BEL",
  years = 2009,
  extraction_dir = "test" # Location of the str r-data
)
```

### Load structured data from all available countries and years.

The following example loads structured data for all available countries and years.

``` r
my.str.data = load.fadn.str.rds( extraction_dir = "test")
```

## 4. Perform analysis

Here are some examples of data analysis.

### Collecting the common id

We can collect the common id from the loaded r-data using the `collect.common.id()` function in `fadnUtils`.

``` r
# Collect the common id from the loaded structured r-data
collected.common.id_str = collect.common.id(my.str.data)
```

    ## Tranforming  list  to data table....
    ##  [1] 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018
    ## 15  year(s) is/are selected.
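
Assuming "common id" means the farm ids that appear in every selected year, the underlying idea can be sketched in base R by intersecting the id sets of all years. The toy `info` table and its values are invented for illustration; this is not the implementation of `collect.common.id()`.

```r
# Toy farm-year table (hypothetical ids and years)
info <- data.frame(
  ID   = c(1, 2, 3, 1, 2, 1, 2, 4),
  YEAR = c(2007, 2007, 2007, 2008, 2008, 2009, 2009, 2009)
)

# one vector of ids per year
ids.by.year <- split(info$ID, info$YEAR)

# ids present in all years: successive intersection over the list
common.ids <- Reduce(intersect, ids.by.year)
print(common.ids)
```

Farms 3 and 4 drop out here because each is observed in only one year.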

### Plotting

To build a basic plot, we use the `ggplot()` function from the `ggplot2` plotting package.

``` r
crops.data = my.str.data$crops # for easier access in the next steps

# this contains the number of crops for each farm-country-year
#   Be careful: we have to filter to count only the LEVL variable
crops.data.Ncrops = crops.data[VARIABLE=="LEVL",.N,by=list(COUNTRY,YEAR,ID)]

# This displays the quantiles of the number of crops
crops.data.Ncrops[,as.list(quantile(N)),by=list(YEAR,COUNTRY)][order(COUNTRY)]

# plot only 2007, 2008, 2009
# (data.table subsetting is used here, so dplyr does not need to be loaded)
ggplot(crops.data.Ncrops[YEAR %in% c(2007,2008,2009)],aes(y=N,x=1)) +
  geom_boxplot() +
  facet_grid(YEAR~COUNTRY) +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        strip.text = element_text(size = 8, angle = 90)
  )+
  ylab("Number of Crops")
```

![](README_files/figure-markdown_github/unnamed-chunk-13-1.png)

### Some other examples

``` r
# sample and represented number of farms
my.str.data$info[,list(Nobs_sample=.N,Nobs_represented=sum(WEIGHT)),
                 by=.(COUNTRY,YEAR)]
```

    ##      COUNTRY YEAR Nobs_sample Nobs_represented
    ##   1:     NED 2004        1397            60644
    ##   2:     NED 2005        1446            60598
    ##   3:     NED 2006        1491            60644
    ##   4:     BEL 2007        1168            33315
    ##   5:     BGR 2007        1871           146769
    ##  ---                                          
    ## 329:     SUO 2018         722            34114
    ## 330:     SVE 2018        1010            28884
    ## 331:     SVK 2018         559             4144
    ## 332:     SVN 2018         890            44392
    ## 333:     UKI 2018        2848           100916
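
In this table, `Nobs_sample` counts the farms observed in the sample, while `Nobs_represented` sums their `WEIGHT` values. The same two quantities can be sketched in base R on a toy table; the values below are invented and only illustrate the calculation.

```r
# Toy info table (hypothetical weights)
info <- data.frame(
  COUNTRY = c("BEL", "BEL", "NED"),
  YEAR    = c(2009, 2009, 2009),
  WEIGHT  = c(10.5, 20, 7)
)

# number of sampled farms per country
n.sample <- tapply(info$WEIGHT, info$COUNTRY, length)

# number of represented farms per country: the sum of the weights
n.represented <- tapply(info$WEIGHT, info$COUNTRY, sum)

print(n.sample)
print(n.represented)
```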

``` r
# only for full sample (common id over years in selected data)
my.str.data$info[ID %in% collected.common.id_str[[1]],
                 list(Nobs_sample=.N,
                      Nobs_represented=sum(WEIGHT)),
                 by=.(COUNTRY,YEAR)]
```

    ##     COUNTRY YEAR Nobs_sample Nobs_represented
    ##  1:     NED 2004         446         20358.73
    ##  2:     NED 2005         446         20209.66
    ##  3:     NED 2006         446         19606.76
    ##  4:     NED 2007         446         17748.39
    ##  5:     NED 2008         446         17196.91
    ##  6:     NED 2009         446         16564.05
    ##  7:     NED 2010         446         17407.43
    ##  8:     NED 2011         446         17928.86
    ##  9:     NED 2012         446         16539.63
    ## 10:     NED 2013         446         17078.27
    ## 11:     NED 2014         446         17901.31
    ## 12:     NED 2015         446         16973.80
    ## 13:     NED 2016         446         16961.13
    ## 14:     NED 2017         446         19275.99
    ## 15:     NED 2018         446         17685.72

## 5. Translate the Nomenclature of Territorial Units for Statistics (NUTS) version

The NUTS classification changes every 3-4 years. Changes between NUTS versions include recoding, merging and splitting of regions, as well as boundary shifts. This package provides functions for plotting NUTS classifications and for converting between different NUTS versions.

1.  Plots various regional levels (FADN Region, NUTS1, NUTS2, NUTS3).

``` r
# nuts2 for Germany
nuts.heatmap.group(my.str.data$info, "NUTS2", countries = "DEU", onepage = FALSE)
```

![](README_files/figure-markdown_github/DEU.png)

2.  Converts data between the different NUTS versions. This package provides various NUTS tables for converting to the latest NUTS version.

``` r
# get a list of the datasets in an R package
dt_nuts <- data(package = "fadnUtils")
```
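
For the simplest kind of change, a pure recoding, the conversion amounts to a lookup from old codes to new codes. The base R sketch below shows this with placeholder codes (`XX11`, `XX21`, and so on are invented and do not come from the package's NUTS tables). Merged or split regions and boundary shifts need more than a one-to-one lookup and are not covered here.

```r
# Hypothetical recoding table: names are old codes, values are new codes
lookup <- c(XX11 = "XX21", XX12 = "XX22")

# Region codes as they might appear in an older data set
old.codes <- c("XX11", "XX12", "XX13")

# recode where the lookup has an entry, keep the code otherwise
new.codes <- ifelse(old.codes %in% names(lookup),
                    lookup[old.codes],
                    old.codes)
new.codes <- unname(new.codes)
print(new.codes)
```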
**Note:** Please read `inst/examples/FADN_USE_CASE.R` and `use_case.docx` for more details on using fadnUtils.

# References

1.  History of NUTS: <https://ec.europa.eu/eurostat/en/web/nuts/history>

2.  NUTS Converter web tool: <https://urban.jrc.ec.europa.eu/nutsconverter/#/>