
Univariate analysis for discrete risk factors in an insurance portfolio. The following summary statistics are calculated:

  • frequency (i.e. number of claims / exposure)

  • average severity (i.e. severity / number of claims)

  • risk premium (i.e. severity / exposure)

  • loss ratio (i.e. severity / premium)

  • average premium (i.e. premium / exposure)

If an input argument is not specified, the summary statistics that depend on it are omitted from the output.
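The ratios above can be sketched in base R as follows. This is a minimal illustration, not the insurancerating implementation: the helper `univariate_sketch()` and the toy `portfolio` data are hypothetical, column names are passed as strings, and a ratio is only added when both of its inputs are supplied. That mirrors the rule that statistics depending on an unspecified argument are left out.

```r
# Hypothetical helper (not part of insurancerating): base-R sketch of the
# summary statistics listed above. Column names are passed as strings.
univariate_sketch <- function(df, x, severity = NULL, nclaims = NULL,
                              exposure = NULL, premium = NULL) {
  cols <- c(severity, nclaims, exposure, premium)
  # Sum every supplied column within each level of the risk factor x
  out <- aggregate(df[cols], by = df[x], FUN = sum)
  # Add a ratio only when both its numerator and denominator were supplied
  add_ratio <- function(out, name, num, den) {
    if (!is.null(num) && !is.null(den)) out[[name]] <- out[[num]] / out[[den]]
    out
  }
  out <- add_ratio(out, "frequency", nclaims, exposure)
  out <- add_ratio(out, "average_severity", severity, nclaims)
  out <- add_ratio(out, "risk_premium", severity, exposure)
  out <- add_ratio(out, "loss_ratio", severity, premium)
  out <- add_ratio(out, "average_premium", premium, exposure)
  out
}

# Hypothetical toy portfolio with two levels of the risk factor `area`
portfolio <- data.frame(
  area     = c("A", "A", "B", "B"),
  amount   = c(1000, 0, 500, 1500),
  nclaims  = c(1, 0, 1, 2),
  exposure = c(1, 0.5, 1, 1),
  premium  = c(400, 200, 300, 500)
)
res <- univariate_sketch(portfolio, "area", severity = "amount",
                         nclaims = "nclaims", exposure = "exposure",
                         premium = "premium")

# Without exposure and premium, only average_severity can be computed
res2 <- univariate_sketch(portfolio, "area", severity = "amount",
                          nclaims = "nclaims")
```

Note that `aggregate()` sorts the groups by level, so the row order may differ from the package output shown in the examples.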

Usage

univariate(
  df,
  x,
  severity = NULL,
  nclaims = NULL,
  exposure = NULL,
  premium = NULL,
  by = NULL
)

Arguments

df

data.frame with the insurance portfolio

x

column in df with the risk factor, or use vec_ext() to pass the column name stored in an external character vector (see examples)

severity

column in df with severity (default is NULL)

nclaims

column in df with number of claims (default is NULL)

exposure

column in df with exposure (default is NULL)

premium

column in df with premium (default is NULL)

by

column(s) in df to group by in addition to x; use list() to group by more than one column (default is NULL)
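Grouping by extra columns produces one row per combination of levels. The base-R sketch below illustrates this with `aggregate()` on a hypothetical toy portfolio; it is not how the package computes the result. It also shows why `average_severity` can be `NaN`, as in the last example: a combination without claims gives 0 / 0, which R evaluates to `NaN`.

```r
# Hypothetical toy portfolio (not the MTPL data set)
portfolio <- data.frame(
  zip      = c(1, 1, 1, 2),
  bm       = c(5, 5, 3, 3),
  amount   = c(2000, 1000, 0, 500),
  nclaims  = c(1, 1, 0, 1),
  exposure = c(1, 1, 0.5, 1)
)

# Sum the portfolio columns per combination of the risk factor and `by` columns
grouped <- aggregate(portfolio[c("amount", "nclaims", "exposure")],
                     by = portfolio[c("zip", "bm")], FUN = sum)

grouped$frequency <- grouped$nclaims / grouped$exposure
# A combination without claims gives 0 / 0, which R evaluates to NaN
grouped$average_severity <- grouped$amount / grouped$nclaims
grouped$risk_premium <- grouped$amount / grouped$exposure
```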

Value

A data.frame

Author

Martin Haringa

Examples

# Summarize by `area`
univariate(MTPL2, x = area, severity = amount, nclaims = nclaims,
           exposure = exposure, premium = premium)
#> # A tibble: 4 × 10
#>    area  amount nclaims exposure premium frequency average_severity risk_premium
#>   <int>   <int>   <int>    <dbl>   <int>     <dbl>            <dbl>        <dbl>
#> 1     2 4063270      98    819.    51896    0.120            41462.        4964.
#> 2     3 7945311     113    765.    49337    0.148            70312.       10386.
#> 3     1 6896187     146   1066.    65753    0.137            47234.        6471.
#> 4     0    6922       1     13.3     902    0.0751            6922          520.
#> # ℹ 2 more variables: loss_ratio <dbl>, average_premium <dbl>

# Summarize by `area`, with column name in external vector
xt <- "area"
univariate(MTPL2, x = vec_ext(xt), severity = amount, nclaims = nclaims,
           exposure = exposure, premium = premium)
#> # A tibble: 4 × 10
#>    area  amount nclaims exposure premium frequency average_severity risk_premium
#>   <int>   <int>   <int>    <dbl>   <int>     <dbl>            <dbl>        <dbl>
#> 1     2 4063270      98    819.    51896    0.120            41462.        4964.
#> 2     3 7945311     113    765.    49337    0.148            70312.       10386.
#> 3     1 6896187     146   1066.    65753    0.137            47234.        6471.
#> 4     0    6922       1     13.3     902    0.0751            6922          520.
#> # ℹ 2 more variables: loss_ratio <dbl>, average_premium <dbl>

# Summarize by `zip` and `bm`
univariate(MTPL, x = zip, severity = amount, nclaims = nclaims,
           exposure = exposure, by = bm)
#> # A tibble: 84 × 8
#>    zip      bm   amount nclaims exposure frequency average_severity risk_premium
#>    <fct> <int>    <dbl>   <int>    <dbl>     <dbl>            <dbl>        <dbl>
#>  1 1         5  4938135      82     550.     0.149           60221.        8983.
#>  2 1         3  3623485      86     614.     0.140           42134.        5902.
#>  3 2         8  1739654      38     249.     0.152           45780.        6981.
#>  4 1        10  2077041      73     451.     0.162           28453.        4601.
#>  5 3         1 20064123     381    2841.     0.134           52662.        7062.
#>  6 3         6  3814492      82     539.     0.152           46518.        7081.
#>  7 3         2 11182348     179    1282.     0.140           62471.        8726.
#>  8 2         1 25368747     356    2944.     0.121           71261.        8617.
#>  9 1         2 17512277     287    1835.     0.156           61018.        9542.
#> 10 2         9   574527      25     237.     0.106           22981.        2428.
#> # ℹ 74 more rows

# Summarize by `zip`, `bm` and `power`
univariate(MTPL, x = zip, severity = amount, nclaims = nclaims,
           exposure = exposure, by = list(bm, power))
#> # A tibble: 3,290 × 9
#>    zip      bm power  amount nclaims exposure frequency average_severity
#>    <fct> <int> <int>   <dbl>   <int>    <dbl>     <dbl>            <dbl>
#>  1 1         5   106       0       0      1      0                  NaN 
#>  2 1         3    74    2687       1     14.1    0.0707            2687 
#>  3 2         8    65       0       0      5      0                  NaN 
#>  4 1        10    64       0       0      7      0                  NaN 
#>  5 3         1    29   37784       3     21.9    0.137            12595.
#>  6 3         6    66  114021       2     27.6    0.0726           57010.
#>  7 3         2    43 1382215      11     61.3    0.180           125656.
#>  8 3         2    55  764498      27    146.     0.185            28315.
#>  9 3         1   100    3405       1     14.2    0.0703            3405 
#> 10 3         2    66  929945      15     97.1    0.154            61996.
#> # ℹ 3,280 more rows
#> # ℹ 1 more variable: risk_premium <dbl>