<!-- README.md is generated from README.Rmd. Please edit that file -->

# FarmDynR

Developed by Hugo Scherer and Marc Müller at Wageningen Economic
Research.

<!-- badges: start -->
<!-- badges: end -->

The goal of FarmDynR is to give the user the ability to aggregate FADN
data to create representative farms for any grouping available, generate
descriptive statistics, and to run FarmDyn from R.

Additionally, it includes useful functions to analyze the results of
FarmDyn simulations.

## Requirements

This package requires that you have GAMS and FarmDyn installed.
Additionally, the package imports from:

- gdxrrw (remember to load GAMS with `igdx()` with the path to your
  version of GAMS)

- readr

- dplyr

- tidyr

- readxl

- data.table

It also suggests:

- stringr

## Installation

You can install the development version of FarmDynR like so:

``` r
# install.packages("remotes")
remotes::install_git("https://gitlab.iiasa.ac.at/mind-step/FarmDynR")
```

## Workflow

The workflow for this package is as follows:

1.  Load the individual farm FADN data into R (with desired countries
    and years)

2.  Run `fadn2fd()` with the FADN data, the desired farm branch, the
    mapping, and the option to save GDX files based on the mapping.
    This prepares the FADN data, calculates yields, and removes
    outliers.

    - The mapping is a list of vectors, where each vector contains the
      names of the columns in the FADN data that correspond to the
      desired grouping. For example, to group by NUTS0 and organic
      status, by NUTS0 alone, and by NUTS2, the mapping would be
      `list(c("NUTS0", "misc%OrganicCode"), "NUTS0", "NUTS2")`. The
      first vector is the grouping you want to use, and the following
      vectors are the groupings you want to aggregate to. Different
      subset conditions currently apply depending on the selected farm
      branch, dairy or arable. For dairy, only farms with farm
      typology (TF14) 45 and more than 5 cows are selected, and farms
      in the top and bottom 10% of milk yields are removed. For arable,
      only farms with farm typology 15, 16, or 20 whose land is covered
      with 80% FarmDyn crops are selected.

    - From this data, the variables used in FarmDyn are selected and/or
      computed. At the same time, yields are calculated per region. If
      a region does not have any yield, it is imputed with the average
      of the higher-level region, the country, or Europe. Once this is
      done, the data is mapped using the provided list of mappings, so
      that different `p_farmdata` files can be generated. If `save_gdx`
      is set to `TRUE`, the results are written as GDX files containing
      a parameter `p_farmdata`, with file names that depend on the
      provided mappings.

    - Once the gdx files are made, they have to be placed in the “dat”
      folder in FarmDyn
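
The dairy subset conditions and the yield fallback described in step 2 can be sketched in base R as follows. This is only an illustration on toy data, not the code `fadn2fd()` runs: the column names (`farm_id`, `cows`, `milk_yield`, `yield`) and all values are hypothetical, and the quantile-based cut is one possible reading of "top and bottom 10%".

``` r
# Toy farm-level data; all column names and values are hypothetical
farms <- data.frame(
  farm_id    = 1:10,
  NUTS0      = "NL",
  NUTS2      = rep(c("NL11", "NL12"), each = 5),
  TF14       = c(rep(45, 9), 15),
  cows       = c(3, 20, 25, 30, 35, 40, 45, 50, 55, 60),
  milk_yield = c(5000, 4000, 6000, 6500, 7000, 7500, 8000, 8500, 12000, 7000)
)

# Dairy subset: farm typology 45 and more than 5 cows
dairy <- farms[farms$TF14 == 45 & farms$cows > 5, ]

# Remove farms in the top and bottom 10% of milk yields
bounds <- quantile(dairy$milk_yield, probs = c(0.1, 0.9))
dairy <- dairy[dairy$milk_yield > bounds[[1]] &
                 dairy$milk_yield < bounds[[2]], ]

# Regional yields; NL13 has no observed yield (assumed toy values)
regional <- data.frame(
  NUTS2 = c("NL11", "NL12", "NL13"),
  NUTS0 = "NL",
  yield = c(7000, 8000, NA)
)

# Fall back to the country average where the regional yield is missing
country_mean <- tapply(regional$yield, regional$NUTS0, mean, na.rm = TRUE)
missing <- is.na(regional$yield)
regional$yield[missing] <- country_mean[regional$NUTS0[missing]]
```

In this sketch the fallback stops at the country level; a European average would be a further step of the same kind.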

<!-- -->

3.  Write a batch file with `writeBatch()` and run FarmDyn with
    `runFarmDynfromBatch()` using the created batch file. Optionally,
    create descriptive statistics for reporting with `fd_desc()` and
    the farm branch.
    - `writeBatch()` copies the segment towards the end of `runInc.gms`
      that contains the settings to run FarmDyn from batch files. It
      replaces the scenario description with the mapping, sets the
      farmData file to “farmData\_{ mapping }.gdx” (the file created by
      `fadn2fd()`), and includes all the farmIds from this GDX file.
    - `writeBatch()` accepts only one mapping; unlike the aggregation,
      it does not accept a list of mappings. A single mapping made of
      several categories or characteristics is accepted, however (for
      example `c("NUTS0", "misc%OrganicCode")`). The mapping has to
      match one of those created with `fadn2fd()`.
    - Once the batch file is created, `runFarmDynfromBatch()` runs
      FarmDyn given the locations of the batch file, the ini file, and
      the XML file.
    - The descriptive statistics are calculated using the weighted mean
      with the (summed up) SYS02 weights
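
As a rough illustration of the weighted descriptive statistics mentioned above, the base R sketch below computes a weighted mean per group and sums the weights of each group. It is not the implementation of `fd_desc()`; the `cows` column and all values are made up, and only the `SYS02` weight name is taken from the text.

``` r
# Toy data: SYS02 is the farm weight, cows is a hypothetical variable
farms <- data.frame(
  NUTS0 = c("NL", "NL", "DE", "DE"),
  SYS02 = c(10, 30, 5, 15),
  cows  = c(50, 90, 40, 80)
)

# One row per group: summed weights and the weighted mean of the variable
groups <- split(farms, farms$NUTS0)
desc <- do.call(rbind, lapply(groups, function(g) {
  data.frame(
    NUTS0 = g$NUTS0[1],
    SYS02 = sum(g$SYS02),
    cows  = weighted.mean(g$cows, g$SYS02)
  )
}))
```

Here the rows of `desc` are named after the groups, so `desc["NL", "cows"]` returns the weighted mean for the Dutch farms.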

Once the simulations have been run, the next step is to analyze the
results. FarmDynR offers several ways to do this. You can load results,
or parameters, scalars, and marginals from dump files; in each case a
column `sims` is automatically set to the name of your simulation run.
`scen_analysis()` loads the results of the different scenarios
(`p_res`) found in `res_{scenario name}_until_2010.gdx` and merges them
together. `load_dumps()` does the same with dump files matching the
pattern `dump_{scenario name}_{farmId}.gdx`. Other objects can also be
read from dump files when needed, such as variables (`vars_dump()`),
scalars (`load_dump_scalar()`), and marginals (`load_dump_marg()`).
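
To make the file naming and the `sims` column concrete, here is a minimal base R sketch. It builds the expected result file names from the pattern above and stacks per-scenario tables with a `sims` column. The tables are hypothetical stand-ins for `p_res` (no GDX file is read), and the `profit` column is invented for illustration.

``` r
# Result file names following the res_{scenario name}_until_2010.gdx pattern
res_fold  <- file.path("FarmDyn", "sample", "results")
scen_name <- c("reference", "scenario1")
res_files <- file.path(res_fold, sprintf("res_%s_until_2010.gdx", scen_name))

# Hypothetical per-scenario tables standing in for the loaded p_res
results <- list(
  reference = data.frame(farmId = c("f1", "f2"), profit = c(100, 200)),
  scenario1 = data.frame(farmId = c("f1", "f2"), profit = c(110, 180))
)

# Stack the tables and label each row with its scenario name
merged <- do.call(rbind, lapply(names(results), function(s) {
  data.frame(sims = s, results[[s]])
}))
```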

Taking advantage of the `scen_diff()` and `abs_diff()` functions, the
relative and absolute differences compared to the reference scenario can
be easily calculated. Technically, these latter functions can be used
for any data, not just FarmDyn scenario results.
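
For intuition, absolute and relative differences against a reference scenario can be computed by hand as in the sketch below. The exact definitions and output columns of `abs_diff()` and `scen_diff()` may differ (for example, relative differences could be reported in percent); the data and the column names `profit_abs_diff` and `profit_rel_diff` are hypothetical.

``` r
# Toy scenario results with one value per scenario
scen <- data.frame(
  sims   = c("reference", "scenario1", "scenario2"),
  profit = c(100, 110, 80)
)

# Value in the reference scenario
ref <- scen$profit[scen$sims == "reference"]

# Absolute difference and relative difference (as a share of the reference)
scen$profit_abs_diff <- scen$profit - ref
scen$profit_rel_diff <- (scen$profit - ref) / ref
```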

## Example

``` r
library(FarmDynR)

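# Read in FADN data, for example by binding all files into one data.table
# (FADNFiles is a character vector with the paths to your FADN files)
FADN <- data.table::rbindlist(
  lapply(FADNFiles, data.table::fread),
  fill = TRUE
)

# Create mapping
mapping <- list(c("NUTS0", "misc%OrganicCode"), "NUTS0", "NUTS2")

# Create FarmDyn data
fd_data <- fadn2fd(FADN, "Dairy", mapping, save_gdx = FALSE)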

# Create descriptive statistics
fd_desc(fd_data, farmbranch = "dairy", csv = FALSE, dir = NULL)

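# Write batch file (writeBatch() takes a single mapping, not the list);
# this assumes fd_data can be indexed by the mapping name
batch_mapping <- "NUTS2"
writeBatch("path/to/FarmDyn", batch_mapping, farmIds = unique(fd_data[[batch_mapping]]$farmIds))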

# Run FarmDyn
runFarmDynfromBatch("path/to/batch/file",
  IniFile = "path/to/Ini.ini", XMLFile = "path/to/xml.xml"
)

# Analyze the scenarios
scenarios <- scen_analysis(
  res_fold = file.path("FarmDyn", "sample", "results"),
  scen_name = c("reference", "scenario1", "scenario2")
)

# Calculate differences from reference
selected_cols <- c("colname", "othercolnames")
other_cols <- c("colname1", "colname2")

scenarios <- abs_diff(scenarios, vars_to_diff = selected_cols)
scenarios <- scen_diff(scenarios, vars_to_diff = other_cols)
```