---
output: rmarkdown::github_document
---

```{r setup, include=FALSE}
knitr::opts_knit$set(root.dir = 'C:/Users/yang_x/Desktop/new-Version/')
```
<!-- README.md is generated from README.Rmd. Please edit that file -->

# fadnUtils
Developed by Dimitrios Kremmydas (JRC) and Xinxin Yang (THÜNEN).

The fadnUtils package facilitates efficient handling of FADN data in R. The package is targeted for use within the JRC D.4 context, which means that a user interacts with it in a specific temporal pattern (see the workflow figure below).
![plot](inst/examples/pic/workflow.png)

More specifically, after a request for FADN data from DG-AGRI, this data is delivered to JRC D.4 in csv format. 

# Installation 
You can install the development version from the Thuenen or IIASA GitLab with:
```{r , eval =FALSE}
# Thuenen gitlab
devtools::install_git("https://git-dmz.thuenen.de/mindstep/fadnutilspackages", force = TRUE)
# IIASA gitlab
devtools::install_git("https://gitlab.iiasa.ac.at/mind-step/fadnutilspackage")
```

Then install and load the related R packages:

```{r, results='hide'}
requiredPackages = c('fadnUtils','data.table', 'devtools','jsonlite', 'ggplot2')
for(p in requiredPackages){
  if(!require(p,character.only = TRUE)) install.packages(p)
  library(p,character.only = TRUE)
}
```

# Usage in Brief
Once the packages are loaded, fadnUtils is ready to use on your computer. The remainder of this README walks through a typical session:

1. Create a working directory
    - a user-defined data directory
1. Import CSV FADN data
    - convert the csv data into raw r-data
    - convert raw r-data into structured (str) r-data
1. Load raw r-data and structured r-data
1. Perform analysis

## 1. Create a working directory
First, the user sets a working directory. Make sure that any relative paths stay within `CurrentProjectDirectory`.
```{r}
# using a local directory
CurrentProjectDirectory = "D:/public/yang/MIND_STEP/New_test_fadnUtils"
create.data.dir(folder.path = CurrentProjectDirectory)
set.data.dir(CurrentProjectDirectory)
get.data.dir()
```
### Required files
We request FADN data from DG-AGRI, and it is delivered to us in csv format. To work efficiently in R, we convert the csv data to an R-friendly format. This step relies on a human-readable json file, called `raw_str_map.file`. Both files are necessary. The `inst/examples` folder contains fadnUtils use cases and example json files.

1. FADN data in csv format: the data for loading
2. A json file for extracting the variables

### Folder Structure 
The user can place the working directory anywhere. Its internal structure helps with data management and maintenance. The directory looks like this:

```base
CurrentProjectDirectory/
+-- csv
+-- fadnUtils.metadata.json
+-- rds
\-- spool
    \-- readme.txt
```
* csv: CSV files are stored here
* fadnUtils.metadata.json: contains the mapping from the fadn.raw.rds to the fadn.str.rds data
* rds: r-data (rds files) are stored here
* spool: other related files are kept here
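
The layout above can be verified before importing any data. The following is a minimal base-R sketch and not part of fadnUtils: the helper name `check.data.dir.layout()` is hypothetical, and the expected entries are simply the ones listed in the tree above.

```r
# Hypothetical helper (not a fadnUtils function): report which of the
# expected entries exist inside a data directory.
check.data.dir.layout <- function(data.dir) {
  expected <- c("csv", "rds", "spool", "fadnUtils.metadata.json")
  present <- file.exists(file.path(data.dir, expected))
  names(present) <- expected
  present
}

# Demonstration on a throw-away directory that mimics the layout
demo.dir <- file.path(tempdir(), "fadn_layout_demo")
for (d in c("csv", "rds", "spool")) {
  dir.create(file.path(demo.dir, d), recursive = TRUE, showWarnings = FALSE)
}
invisible(file.create(file.path(demo.dir, "fadnUtils.metadata.json")))

layout.ok <- check.data.dir.layout(demo.dir)
print(layout.ok)
```

In a real session the same helper could be pointed at `get.data.dir()` instead of the temporary directory.
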

## 2. Import CSV FADN data
First, we will import the data into an R-friendly format using the fadnUtils package.

### Convert the csv data into raw r-data
The raw data will be added to the `rds` directory. The `convert.to.fadn.raw.rds()` function converts the csv file into raw r-data.

```{r}
fadn.data.dir <- "D:/public/data/fadn/lieferung_20210414/csv/"
# load data for country BEL and year 2009
convert.to.fadn.raw.rds(
      file.path =  paste0(fadn.data.dir, "BEL2009.csv"),
      sepS = ",",
      fadn.country = "BEL",
      fadn.year = 2009
      #keep.csv = T # copy csv file in csv.dir
      )
```
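
To import several years in one go, the same call can be wrapped in a loop. This is a sketch under assumptions: it presumes the csv files follow the `<COUNTRY><YEAR>.csv` naming seen above (`BEL2009.csv`), that `convert.to.fadn.raw.rds()` is exported by fadnUtils, and that a data dir has already been set. The year range is only an illustration. Missing files are skipped, and nothing is converted if fadnUtils is not installed.

```r
fadn.data.dir <- "D:/public/data/fadn/lieferung_20210414/csv/"
country <- "BEL"
years <- 2009:2012  # illustrative range, adjust to the delivered data

# Build one csv path per year, e.g. ".../csv/BEL2009.csv"
csv.files <- file.path(sub("/+$", "", fadn.data.dir),
                       paste0(country, years, ".csv"))
names(csv.files) <- years

for (y in years) {
  f <- csv.files[[as.character(y)]]
  if (!file.exists(f)) {
    message("Skipping missing file: ", f)
    next
  }
  # Only call the package function when fadnUtils is available
  if (requireNamespace("fadnUtils", quietly = TRUE)) {
    fadnUtils::convert.to.fadn.raw.rds(
      file.path = f,
      sepS = ",",
      fadn.country = country,
      fadn.year = y
    )
  }
}
```
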
At any time, we can check which csv files (countries and years) have been loaded into the current data dir.
```{r, eval=FALSE}
show.data.dir.contents()
```

### Convert raw r-data into structured r-data
Next, we convert the raw data into structured data. Broadly, there are three steps:

1. setting up a `structured` directory for the structured data,
2. checking with the `raw_str_map.file` that all variables can be converted,
3. converting the raw data and saving the result in the `structured` directory.

#### Set a `structured` directory for saving the structured data
We create a `test` folder to hold the structured data.

```{r, warning=FALSE}
rds.dir = paste0(get.data.dir(),"/rds/")
# set a name for the folder that stores the structured r-data in rds.dir
new.str.name = "test"
# create the extraction_dir
new.extraction.dir = paste0(rds.dir, new.str.name)
dir.create(new.extraction.dir)
```

#### Check the variables in the `raw_str_map.file`
Before conversion, it is recommended to run `check.column()` to ensure that all variables in the `raw_str_map.file` can be converted.
```{r results='hide', message=FALSE, warning=FALSE}
list_vars = check.column(
              # an rds file or a csv file
              importfilepath = paste0(rds.dir, "fadn.raw.2009.BEL.rds"),
              # a json file
              jsonfile = "D:/public/yang/MIND_STEP/2014_after_copy.json",
              # write a new json file without unmatched variables
              rewrite_json = TRUE,
              # save the new json in extraction_dir
              extraction_dir = new.extraction.dir)
```
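
Conceptually, such a check compares the variable names requested in the json map with the columns present in the raw data. The sketch below illustrates that idea in base R with made-up column names. It is not the implementation of `check.column()`, and the real json map has its own structure.

```r
# Toy inputs (hypothetical names, for illustration only)
raw.columns <- c("ID", "COUNTRY", "YEAR", "VAR_A", "VAR_B")
mapped.variables <- c("ID", "VAR_A", "VAR_C")

# Variables that the map asks for but the raw data does not contain
unmatched <- setdiff(mapped.variables, raw.columns)
# Variables that can be converted
matched <- intersect(mapped.variables, raw.columns)

print(unmatched)
print(matched)
```

Dropping the unmatched names from the map before conversion is the kind of clean-up that `rewrite_json = TRUE` is described as doing above.
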


#### Convert the raw data into structured r-data using the checked json file
Finally, we can convert the raw r-data to structured r-data, optionally using an external json file. For more details on conversion in fadnUtils, see `USE_CASE.R`.
```{r, echo=FALSE}
convert.to.fadn.str.rds(fadn.country = "BEL",
                        fadn.year = 2009,
                        str.name = new.str.name # extraction_dir
)

convert.to.fadn.str.rds(fadn.country = "BEL",
                        fadn.year = 2009,
                        raw_str_map.file = "D:/public/yang/MIND_STEP/new_sample/test01/raw_str_map.json", # an external json file
                        str.name = new.str.name, # extraction_dir
                        force_external_raw_str_map = TRUE,
                        DEBUG = FALSE
                        )
```
#### File structure in the `rds` folder
After conversion, the `rds` folder contains:

* `fadn.raw.2009.BEL.rds`: raw r-data for country "BEL" and year "2009"
* `test`: extraction_dir that holds the structured r-data and the json files
* `fadn.str.2009.BEL.rds`: structured r-data for country "BEL" and year "2009"
* `raw_str_map.json`: default json file
* `rewrite_2014_after_copy.json`: modified json file written after checking the variables

```base
rds
+-- fadn.raw.2009.BEL.compressed.rds
+-- fadn.raw.2009.BEL.rds
+-- fadn.raw.2010.BEL.compressed.rds
+-- fadn.raw.2010.BEL.rds
+-- fadn.raw.2011.BEL.compressed.rds
+-- fadn.raw.2011.BEL.rds
+-- fadn.raw.2012.BEL.compressed.rds
+-- fadn.raw.2012.BEL.rds
\-- test
     +-- fadn.str.2009.BEL.rds
     +-- raw_str_map.json
     \-- rewrite_2014_after_copy.json
```
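
The file names in the tree encode year and country as `fadn.raw.<YEAR>.<COUNTRY>.rds`. As a rough illustration, the base-R sketch below parses such names to list which country-year pairs are available. The helper `parse.raw.rds.names()` is hypothetical and not part of fadnUtils, and the pattern is inferred only from the names shown above.

```r
# Hypothetical helper: extract year and country from raw r-data file names.
# Names that do not match (e.g. "*.compressed.rds" or folders) are ignored.
parse.raw.rds.names <- function(file.names) {
  pattern <- "^fadn\\.raw\\.([0-9]{4})\\.([A-Z]+)\\.rds$"
  hits <- file.names[grepl(pattern, file.names)]
  data.frame(
    file = hits,
    year = as.integer(sub(pattern, "\\1", hits)),
    country = sub(pattern, "\\2", hits),
    stringsAsFactors = FALSE
  )
}

# File names taken from the tree above
example.files <- c(
  "fadn.raw.2009.BEL.compressed.rds", "fadn.raw.2009.BEL.rds",
  "fadn.raw.2010.BEL.compressed.rds", "fadn.raw.2010.BEL.rds",
  "test"
)
available <- parse.raw.rds.names(example.files)
print(available)
```

In practice the input would come from something like `list.files(rds.dir)`.
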

## 3. Load raw r-data and structured r-data
Before starting any analysis with `fadnUtils`, we first need to load r-data. We can only load data for countries and years that have already been imported into a data.dir folder.

### Load raw r-data for the country `BEL` and year `2009`
```{r results='hide', message=FALSE, warning=FALSE}
my.data.2009.raw = load.fadn.raw.rds(
  countries = "BEL",
  years = 2009
)
```
### Load structured data for the country `BEL` and year `2009`
We can load structured data for country `BEL` and year `2009`.
```{r results='hide', message=FALSE, warning=FALSE}
my.data.2009.str = load.fadn.str.rds(
  countries = "BEL",
  years = 2009,
  extraction_dir = "test" # Location of the str r-data
)
```
### Load structured data from all available countries and years
The following is an example of loading structured data for all available countries and years.

```{r results='hide', message=FALSE, warning=FALSE}
my.str.data = load.fadn.str.rds(extraction_dir = "test")
```

## 4. Perform analysis
Here are some examples of analyzing the loaded data.

### Collecting the common id
We can collect the common id from the loaded r-data using the `collect.common.id()` function from `fadnUtils`.

```{r, message=FALSE}
# Collect the common id from the loaded structured r-data
collected.common.id_str = collect.common.id(my.str.data)
```
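
As a rough illustration of the idea, and assuming that "common id" means the farm IDs observed in every loaded year, the intersection can be sketched in base R on toy data. This is not the implementation of `collect.common.id()`, and the values below are made up.

```r
# Toy data: farm IDs observed in each year (made-up values)
info.toy <- data.frame(
  ID   = c(1, 2, 3, 1, 2, 4, 1, 2, 3),
  YEAR = c(2009, 2009, 2009, 2010, 2010, 2010, 2011, 2011, 2011)
)

# IDs per year, then the intersection across all years
ids.by.year <- split(info.toy$ID, info.toy$YEAR)
common.ids <- Reduce(intersect, ids.by.year)
print(common.ids)
```
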
### Plotting
To build a basic plot, we use the `ggplot()` function from the `ggplot2` plotting package.

```{r results='hide', message=FALSE, warning=FALSE}
crops.data = my.str.data$crops # shorter name for easier access in the next steps

# this contains the number of crops for each farm-country-year
#   Be careful, we have to filter to count only the LEVL variable
crops.data.Ncrops = crops.data[VARIABLE=="LEVL",.N,by=list(COUNTRY,YEAR,ID)]

# This displays the quantiles of the number of crops
crops.data.Ncrops[,as.list(quantile(N)),by=list(YEAR,COUNTRY)][order(COUNTRY)]

ggplot(crops.data.Ncrops,aes(y=N,x=1)) +
  geom_boxplot() +
  facet_grid(YEAR~COUNTRY) +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()
  )+
  ylab("Number of Crops")
```
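
The counting step can be reproduced on a small toy table without `data.table`. The sketch below is illustrative only: the rows are made up, and `OTHER` is a placeholder for any variable that is not `LEVL`. It filters to `LEVL`, counts rows per farm-country-year, and prints the quantiles of that count.

```r
# Toy crops table (made-up values)
crops.toy <- data.frame(
  COUNTRY  = "BEL",
  YEAR     = 2009,
  ID       = c(1, 1, 1, 2, 2, 3),
  VARIABLE = c("LEVL", "LEVL", "OTHER", "LEVL", "OTHER", "LEVL"),
  stringsAsFactors = FALSE
)

# Keep only LEVL rows, then count rows per farm-country-year
levl <- crops.toy[crops.toy$VARIABLE == "LEVL", ]
n.crops <- aggregate(
  list(N = levl$ID),
  by = list(COUNTRY = levl$COUNTRY, YEAR = levl$YEAR, ID = levl$ID),
  FUN = length
)
print(n.crops)

# Quantiles of the number of crops per farm
print(quantile(n.crops$N))
```

Without the `LEVL` filter, each crop would be counted once per variable, which is what the comment in the chunk above warns about.
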

### Some other examples
```{r}
# sample and represented number of farms
my.str.data$info[,list(Nobs_sample=.N,Nobs_represented=sum(WEIGHT)),
                 by=.(COUNTRY,YEAR)]

# only for full sample (common id over years in selected data)
my.str.data$info[ID %in% collected.common.id_str[[1]],
                 list(Nobs_sample=.N,
                      Nobs_represented=sum(WEIGHT)),
                 by=.(COUNTRY,YEAR)]
```
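
To make the two columns concrete: `Nobs_sample` counts the sampled farms, while `Nobs_represented` sums their weights. The base-R sketch below reproduces this on made-up rows, assuming that `WEIGHT` is the number of farms each sampled farm stands for.

```r
# Toy info table (made-up values)
info.toy <- data.frame(
  COUNTRY = c("BEL", "BEL", "BEL"),
  YEAR    = c(2009, 2009, 2010),
  ID      = c(1, 2, 1),
  WEIGHT  = c(10, 25, 12)
)

# Number of sampled farms and sum of weights per country-year
n.sample <- aggregate(ID ~ COUNTRY + YEAR, data = info.toy, FUN = length)
n.repr   <- aggregate(WEIGHT ~ COUNTRY + YEAR, data = info.toy, FUN = sum)

summary.toy <- merge(n.sample, n.repr, by = c("COUNTRY", "YEAR"))
names(summary.toy) <- c("COUNTRY", "YEAR", "Nobs_sample", "Nobs_represented")
print(summary.toy)
```
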
**Note:** Please read `inst/examples/FADN_USE_CASE.R` and `use_case.docx` for more details on using fadnUtils.