# fadnUtils
The `fadnUtils` package facilitates the efficient handling of FADN (Farm Accountancy Data Network) data in R. The package is targeted at use within JRC D.4, which implies a specific sequence of steps in which a user interacts with the package (see Figure \@ref(fig:foo)).
More specifically, FADN data are requested from DG-AGRI and delivered to JRC D.4 in csv format.
```{r foo, echo=FALSE, fig.cap="Overview of how the user interacts with the package.",fig.align = 'center', out.width = '100%'}
```
# Installation
Install `fadnUtils` and the related R packages (if missing) and load them:
```{r results='hide', message=FALSE, warning=FALSE}
requiredPackages = c('fadnUtils', 'data.table', 'devtools', 'jsonlite', 'ggplot2')
# install each package that is not yet available, then attach it
for (p in requiredPackages) {
  if (!require(p, character.only = TRUE)) install.packages(p)
  library(p, character.only = TRUE)
}
```
# Usage in Brief
After loading the packages, `fadnUtils` is ready to use. A typical workflow consists of four steps:

1. Create a working directory
    - a user-defined data directory
1. Import CSV FADN data
    - convert the csv data into raw r-data
    - convert the raw r-data into structured r-data
1. Load raw r-data and structured r-data
1. Perform analysis
## 1. Create a working directory
First, the user creates a data directory and sets it as the current working directory of `fadnUtils`. Make sure that all relative paths used afterwards stay within `CurrentProjectDirectory`.
```{r}
# using a local directory
CurrentProjectDirectory = "D:/public/yang/MIND_STEP/New_test_fadnUtils"
# create the data directory
create.data.dir(folder.path = CurrentProjectDirectory)
# set it as the current data directory and check the setting
set.data.dir(CurrentProjectDirectory)
get.data.dir()
```
### Required files
FADN data are requested from DG-AGRI and delivered in csv format. To work efficiently in R, the csv data are first converted into raw r-data and then, with the help of a human-readable json mapping file referred to as `raw_str_map.file`, into structured r-data. Both files are required:

1. the FADN data in csv format, i.e. the data to be loaded;
2. a json file (`raw_str_map.file`) that defines which variables are extracted.
### Folder Structure
The user chooses the location of the working directory freely; its internal structure is fixed, which helps with data management and maintenance. The directory looks like this:
```base
CurrentProjectDirectory/
+-- csv
+-- fadnUtils.metadata.json
+-- rds
\-- spool
\-- readme.txt
```
* `csv`: the csv files are stored here
* `fadnUtils.metadata.json`: contains the mapping from the `fadn.raw.rds` to the `fadn.str.rds` data
* `rds`: the r-data are stored here
* `spool`: related files are kept here
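As a quick sanity check on this layout, the sketch below tests whether a directory contains the folders and the metadata file listed above. It uses base R only; the helper `check_data_dir_layout()` is hypothetical and not part of `fadnUtils`, and it assumes exactly the layout shown in the tree (it does not check `readme.txt`).

```r
# Hypothetical helper, not part of fadnUtils: TRUE if data.dir has the
# csv/rds/spool folders and the fadnUtils.metadata.json file
check_data_dir_layout = function(data.dir) {
  expected = c("csv", "rds", "spool")
  has.dirs = dir.exists(file.path(data.dir, expected))
  has.meta = file.exists(file.path(data.dir, "fadnUtils.metadata.json"))
  if (!all(has.dirs)) {
    message("Missing folders: ", paste(expected[!has.dirs], collapse = ", "))
  }
  if (!has.meta) message("Missing file: fadnUtils.metadata.json")
  all(has.dirs) && has.meta
}

# Usage on a throw-away directory that mimics the layout
demo.dir = file.path(tempdir(), "fadn_layout_demo")
for (d in c("csv", "rds", "spool")) {
  dir.create(file.path(demo.dir, d), recursive = TRUE, showWarnings = FALSE)
}
invisible(file.create(file.path(demo.dir, "fadnUtils.metadata.json")))
check_data_dir_layout(demo.dir)   # TRUE
```

On a real project, the same call would be made with the path of the data directory, e.g. `CurrentProjectDirectory`.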
## 2. Import CSV FADN data
First, we will import the data into an R-friendly format using the fadnUtils package.
### Convert the csv data into raw r-data
The function `convert.to.fadn.raw.rds()` converts a csv file into raw r-data and stores the result in the `rds` directory.
```{r}
convert.to.fadn.raw.rds(
  # csv file as delivered by DG-AGRI
  file.path = "D:/public/data/fadn/lieferung_20210414/csv/DEU2009.csv",
  sepS = ",",            # field separator of the csv file
  fadn.country = "DEU",
  fadn.year = 2009
  # , keep.csv = TRUE    # also copy the csv file into the csv folder
)
```
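Judging from the `rds` folder listing later in this document, raw files appear to be named `fadn.raw.<year>.<country>.rds`. Assuming that pattern, a small hypothetical helper (not a `fadnUtils` function) can build the expected file name, for example to confirm that a conversion produced its output.

```r
# Hypothetical helper: builds the raw rds file name, assuming the
# fadn.raw.<year>.<country>.rds pattern seen in the rds folder listing
raw_rds_name = function(country, year) {
  sprintf("fadn.raw.%d.%s.rds", as.integer(year), country)
}

raw_rds_name("DEU", 2009)   # "fadn.raw.2009.DEU.rds"

# With the data directory set as in step 1, the check would be:
# file.exists(file.path(get.data.dir(), "rds", raw_rds_name("DEU", 2009)))
```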
At any time, we can check which data (countries and years) have been loaded into the current data directory:
```{r}
show.data.dir.contents()
```
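`show.data.dir.contents()` is the package's way to inspect the data directory. Purely as an illustration, and again assuming the `fadn.raw.<year>.<country>.rds` naming pattern seen in the `rds` listing, the country/year pairs can also be recovered from file names with base R. The helper `list_raw_rds()` is hypothetical; `.compressed.rds` copies are deliberately skipped.

```r
# Hypothetical helper: extracts COUNTRY/YEAR pairs from raw rds file names,
# assuming the pattern fadn.raw.<year>.<country>.rds
list_raw_rds = function(files) {
  pattern = "^fadn\\.raw\\.([0-9]{4})\\.([A-Z]{3})\\.rds$"
  files = basename(files)
  hits = files[grepl(pattern, files)]
  data.frame(
    COUNTRY = sub(pattern, "\\2", hits),
    YEAR = as.integer(sub(pattern, "\\1", hits)),
    stringsAsFactors = FALSE
  )
}

# Toy file names; on real data use list.files(file.path(get.data.dir(), "rds"))
files = c("fadn.raw.2009.BEL.rds", "fadn.raw.2009.BEL.compressed.rds",
          "fadn.raw.2010.BEL.rds", "test")
list_raw_rds(files)   # two rows: BEL 2009 and BEL 2010
```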
### Convert raw r-data into structured r-data
Next, we convert the raw data into structured data. This involves three steps:

1. create a directory for the structured data (the `structured` directory);
2. check that all variables in the `raw_str_map.file` can be converted;
3. convert the raw data into structured data and save it in the `structured` directory.
#### Set a `structured` directory for saving the structured data
Here we create a folder named `test` inside the `rds` directory to hold the structured data.
```{r}
rds.dir = paste0(get.data.dir(), "/rds/")
# name of the sub-folder of rds.dir in which the structured r-data are saved
new.str.name = "test"
# path of the extraction directory; create it (no warning if it already exists)
new.extraction.dir = paste0(rds.dir, new.str.name)
dir.create(new.extraction.dir, showWarnings = FALSE)
```
#### Check the variables in the `raw_str_map.file`
Before the conversion, it is recommended to run `check.column()` to ensure that all variables in the `raw_str_map.file` can be converted.
```{r results='hide', message=FALSE, warning=FALSE}
# list_vars = check.column(
# # a rds file or a csv file
# importfilepath = paste0(rds.dir, "fadn.raw.2009.BEL.rds"),
# # a json file
# jsonfile = "D:/public/yang/MIND_STEP/2014_after_copy.json",
# # write a new json file without unmatched variables
# rewrite_json = T,
# # save the new json in extraction_dir
# extraction_dir = new.extraction.dir)
```
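The idea behind this check can be illustrated with base R set operations: split the variables named in the mapping into those that exist as columns in the data and those that do not. This is a conceptual sketch, not how `check.column()` is implemented; the helper `split_mapped_vars()` and the toy variable names are hypothetical, and the real mapping is read from a json file rather than given as a character vector.

```r
# Hypothetical helper: splits the variables named in a mapping into those
# found among the data columns and those that cannot be converted
split_mapped_vars = function(data.cols, mapped.vars) {
  list(
    matched   = intersect(mapped.vars, data.cols),
    unmatched = setdiff(mapped.vars, data.cols)
  )
}

# Toy example: columns of a raw file vs. variables named in a mapping
data.cols   = c("ID", "COUNTRY", "YEAR", "VAR1", "VAR2")
mapped.vars = c("VAR1", "VAR2", "VAR3")
res = split_mapped_vars(data.cols, mapped.vars)
res$unmatched   # "VAR3"
```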
#### Convert the raw data into structured r-data using the checked json file
Finally, we convert the raw r-data into structured r-data, either with the default mapping or with an external json file (both variants are shown below). For more details on the conversion in `fadnUtils`, see `USE_CASE.R`.
```{r}
# convert.to.fadn.str.rds(fadn.country = "BEL",
# fadn.year = 2009,
# str.name = new.str.name # extraction_dir
# )
# convert.to.fadn.str.rds(fadn.country = "BEL",
# fadn.year = 2009,
# raw_str_map.file = "D:/public/yang/MIND_STEP/new_sample/test01/raw_str_map.json", # a external json file
# str.name = new.str.name, # extraction_dir
# force_external_raw_str_map = T,
# DEBUG = F
# )
```
#### File structure of the `rds` folder
After the conversion, the `rds` folder contains:

* `fadn.raw.2009.BEL.rds`: raw r-data for country "BEL" and year 2009
* `test`: the extraction directory, holding the structured r-data and the json mapping files:
    * `fadn.str.2009.BEL.rds`: structured r-data for country "BEL" and year 2009
    * `raw_str_map.json`: the default json mapping file
    * `rewrite_2014_after_copy.json`: the json file rewritten by `check.column()` after checking the variables
```base
rds
+-- fadn.raw.2009.BEL.compressed.rds
+-- fadn.raw.2009.BEL.rds
+-- fadn.raw.2010.BEL.compressed.rds
+-- fadn.raw.2010.BEL.rds
+-- fadn.raw.2011.BEL.compressed.rds
+-- fadn.raw.2011.BEL.rds
+-- fadn.raw.2012.BEL.compressed.rds
+-- fadn.raw.2012.BEL.rds
\-- test
+-- fadn.str.2009.BEL.rds
+-- raw_str_map.json
\-- rewrite_2014_after_copy.json
```
## 3. Load raw r-data and structured r-data
In order to start any analysis with `fadnUtils`, we first need to load r-data. Only data for countries and years that have already been imported into the data directory can be loaded.
### Load raw r-data for the country `BEL` and year `2009`
```{r results='hide', message=FALSE, warning=FALSE}
my.data.2009.raw = load.fadn.raw.rds(
countries = "BEL",
years = 2009
)
```
### Load structured data for the country `BEL` and year `2009`
We can load the structured data for country `BEL` and year `2009` from the `test` extraction directory:
```{r results='hide', message=FALSE, warning=FALSE}
my.data.2009.str = load.fadn.str.rds(
countries = "BEL",
years = 2009,
extraction_dir = "test" # Location of the str r-data
)
```
### Load structured data for all available countries and years
The following example loads the structured data for all available countries and years:
```{r results='hide', message=FALSE, warning=FALSE}
my.str.data = load.fadn.str.rds(extraction_dir = "test") # all countries and years in the "test" extraction directory
```
## 4. Perform analysis
Here are some examples of analyses on the loaded data.
### Collect the common IDs
The `collect.common.id()` function of `fadnUtils` collects the IDs of the farms that are common to the loaded r-data, i.e. present in all selected years.
```{r, message=FALSE}
# Collect the common IDs from the loaded structured r-data
collected.common.id_str = collect.common.id(my.str.data)
```
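Conceptually, a common ID is a farm ID that appears in every selected year. The sketch below computes this for a toy table with base R and `data.table`; it illustrates the idea only and is not the implementation of `collect.common.id()`, whose return structure this example does not reproduce. The `ID` and `YEAR` column names follow the `info` table used later in this document.

```r
library(data.table)

# Toy info table: farms 1 and 3 appear in both years, farm 2 only in 2009
info = data.table(ID   = c(1, 2, 3, 1, 3),
                  YEAR = c(2009, 2009, 2009, 2010, 2010))

# IDs present in every year = intersection of the per-year ID sets
common.ids = Reduce(intersect, split(info$ID, info$YEAR))
common.ids   # 1 3
```

With several countries, the same intersection would presumably be taken within each country.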
### Plotting
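To build a basic plot, we use the `ggplot()` function from the plotting package `ggplot2`. The example below counts the crops of each farm and shows their distribution by country and year as box plots.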
```{r results='hide', message=FALSE, warning=FALSE}
crops.data = my.str.data$crops # shortcut for easier access in the next steps
# number of crops for each farm-country-year;
# be careful: we have to filter to count only the LEVL variable
crops.data.Ncrops = crops.data[VARIABLE=="LEVL",.N,by=list(COUNTRY,YEAR,ID)]
# quantiles of the number of crops per country and year
crops.data.Ncrops[,as.list(quantile(N)),by=list(YEAR,COUNTRY)][order(COUNTRY)]
ggplot(crops.data.Ncrops,aes(y=N,x=1)) +
geom_boxplot() +
facet_grid(YEAR~COUNTRY) +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank()
)+
ylab("Number of Crops")
```
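The chunk above notes that only the `LEVL` variable should be counted; presumably each crop of a farm occupies one row per variable in the long format. Assuming that layout, the toy table below shows the effect of the filter on the counts. The `CROP` column and the `OTHER` variable are made up for illustration and are not taken from the real `crops` table.

```r
library(data.table)

# Toy crops table in long format; CROP and "OTHER" are hypothetical
crops.toy = data.table(
  COUNTRY  = "BEL",
  YEAR     = 2009,
  ID       = c(1, 1, 1, 1, 2, 2),
  CROP     = c("A", "A", "B", "B", "A", "A"),
  VARIABLE = c("LEVL", "OTHER", "LEVL", "OTHER", "LEVL", "OTHER")
)

# Without the filter, every crop is counted once per variable: N = 4 and 2
crops.toy[, .N, by = list(COUNTRY, YEAR, ID)]
# With the filter, every crop is counted once: N = 2 and 1
crops.toy[VARIABLE == "LEVL", .N, by = list(COUNTRY, YEAR, ID)]
```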
### Some other examples
```{r}
# number of sampled farms and number of farms they represent, per country and year
my.str.data$info[,list(Nobs_sample=.N,Nobs_represented=sum(WEIGHT)),
by=.(COUNTRY,YEAR)]
# the same, only for the full sample (common IDs over the years in the selected data)
my.str.data$info[ID %in% collected.common.id_str[[1]],
list(Nobs_sample=.N,
Nobs_represented=sum(WEIGHT)),
by=.(COUNTRY,YEAR)]
```
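To see what the two columns mean without access to FADN data, the same aggregation can be run on a toy `info` table. `Nobs_sample` counts the sampled farms, while `Nobs_represented` sums `WEIGHT`, the number of farms in the population that each sampled farm represents. All values below are made up.

```r
library(data.table)

# Toy info table with hypothetical weights
info.toy = data.table(
  COUNTRY = "BEL",
  YEAR    = c(2009, 2009, 2010),
  ID      = c(1, 2, 1),
  WEIGHT  = c(10.5, 20, 12)
)

# Sample size and represented population per country and year
info.toy[, list(Nobs_sample = .N, Nobs_represented = sum(WEIGHT)),
         by = .(COUNTRY, YEAR)]
# 2009: Nobs_sample 2, Nobs_represented 30.5
# 2010: Nobs_sample 1, Nobs_represented 12
```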
**Note:** See `inst/examples/FADN_USE_CASE.R` for more details on using `fadnUtils`.