
Pricing workflow building blocks
Source: vignettes/pricing-workflow-building-blocks.Rmd
insurancerating provides building blocks for common
actuarial pricing tasks in GLM-based tariff analysis. The package does
not prescribe a single pricing method. Instead, it supports practical
steps that often appear in insurance pricing work: portfolio analysis,
model interpretation, tariff refinement and model validation.
This vignette gives a compact overview of those building blocks and how they can be combined.
1. Start with portfolio experience
A pricing analysis often starts by checking how the observed portfolio behaves by risk factor. This is useful before modelling, but also later when reviewing whether fitted relativities are plausible.
factor_analysis() summarises exposure, claim frequency,
average severity, risk premium and related metrics by one or more risk
factors.
fa <- factor_analysis(
  MTPL,
  risk_factors = "zip",
  claim_count = "nclaims",
  claim_amount = "amount",
  exposure = "exposure"
)
head(fa)
#>   zip    amount nclaims   exposure frequency average_severity risk_premium
#> 1   1 116178669    1593 11080.6274 0.1437644         72930.74    10484.846
#> 2   2  59751985    1008  7782.6301 0.1295192         59277.76     7677.608
#> 3   3  58988962    1038  7587.5644 0.1368028         56829.44     7774.427
#> 4   0    821510      29   206.8438 0.1402024         28327.93     3971.644
The output helps answer practical questions such as:
- where exposure is concentrated
- whether observed differences are credible or noisy
- whether a segment is driven by a small number of claims
- which risk factors may need closer modelling or refinement
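The metrics in the output above are ratios of grouped totals. The base R sketch below recomputes them on a small made-up portfolio. The data frame `toy` and its values are assumptions for illustration, and the three definitions used (claims per unit of exposure, amount per claim, amount per unit of exposure) are the ones the printed output above is consistent with, not a description of the internals of factor_analysis().

```r
# Toy portfolio; every value here is made up for illustration.
toy <- data.frame(
  zip      = c("A", "A", "B", "B", "B"),
  nclaims  = c(1, 0, 2, 1, 0),
  amount   = c(1000, 0, 5000, 1500, 0),
  exposure = c(1, 0.5, 1, 1, 0.5)
)

# Sum claim counts, claim amounts and exposure per level of the risk factor.
by_zip <- aggregate(cbind(nclaims, amount, exposure) ~ zip, data = toy, FUN = sum)

# Derive the experience metrics from the grouped totals.
by_zip$frequency        <- by_zip$nclaims / by_zip$exposure
by_zip$average_severity <- by_zip$amount / by_zip$nclaims
by_zip$risk_premium     <- by_zip$amount / by_zip$exposure

by_zip
```

Because risk premium equals frequency times average severity, a segment with few claims can show an extreme risk premium purely through a single large claim, which is why the claim count is worth reading alongside the ratios.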
For numeric variables with long or skewed tails,
outlier_histogram() can help inspect extreme observations
before fitting severity models or constructing tariff segments.
outlier_histogram(
  MTPL2,
  x = "premium",
  upper = 100,
  density = FALSE
)
2. Assess large losses
Large claims can dominate severity and pure premium analysis. In capped severity workflows, it is often useful to assess a cap first, decompose the historical claim amounts, and then decide how the excess burden should be allocated. The choice of threshold is a trade-off. A low threshold moves more of the claim amount into the excess layer, which improves the stability of the capped experience but may understate structural differences between segments. A high threshold keeps more of the claim amount in the segment-level experience, which increases pricing responsiveness but introduces volatility.
thresholds <- assess_excess_threshold(
  claims,
  claim_amount = "claim_amount",
  thresholds = c(50000, 100000, 150000),
  exposure = "earned_exposure",
  group = "sector"
)
autoplot(thresholds, y = "premium_impact")
After choosing a threshold, calculate_excess_loss()
creates a deterministic historical decomposition. It does not bootstrap
or allocate anything.
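A capped/excess decomposition is commonly defined by splitting each claim at the threshold. The base R sketch below shows that convention on made-up claim amounts. The column names `capped_part` and `excess_part` are hypothetical, and whether calculate_excess_loss() uses exactly this convention should be checked in its documentation.

```r
# Made-up claim amounts and a chosen threshold.
claim_amount <- c(20000, 80000, 150000, 400000)
threshold <- 100000

# Split each claim at the threshold: the part up to the cap and the part above it.
capped_part <- pmin(claim_amount, threshold)
excess_part <- pmax(claim_amount - threshold, 0)

decomposition <- data.frame(claim_amount, capped_part, excess_part)
decomposition
```

The split is deterministic: the two parts always add back to the original claim amount, so no loss is created or removed at this step.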
excess <- calculate_excess_loss(
  claims,
  claim_amount = "claim_amount",
  threshold = 100000
)
The allocation step is where pooling and uncertainty are handled. Portfolio pooling is stable but ignores group experience. Group pooling is responsive but can be volatile. Partial pooling balances portfolio stability, group responsiveness and the credibility of observed excess experience.
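To make the three pooling choices concrete, the sketch below blends a group rate with the portfolio rate using a simple credibility weight of the form exposure / (exposure + k). This is a generic illustration under stated assumptions, not the algorithm used by the package: the data, the constant `k` and the weight formula are all hypothetical.

```r
# Made-up excess experience per group.
grp <- data.frame(
  sector          = c("A", "B", "C"),
  excess_amount   = c(0, 90000, 30000),
  earned_exposure = c(100, 300, 600)
)

k <- 300  # hypothetical credibility constant, not a package default

# Portfolio pooling: one rate for everyone. Group pooling: each group's own rate.
portfolio_rate <- sum(grp$excess_amount) / sum(grp$earned_exposure)
group_rate     <- grp$excess_amount / grp$earned_exposure

# Partial pooling: credibility-weighted blend of the two rates.
z       <- grp$earned_exposure / (grp$earned_exposure + k)
blended <- z * group_rate + (1 - z) * portfolio_rate

# Rescale so the allocated total matches the observed excess total.
scale_factor <- sum(grp$excess_amount) / sum(blended * grp$earned_exposure)
grp$credibility <- z
grp$loading     <- blended * scale_factor
grp
```

Groups with little exposure receive a low weight and sit close to the portfolio rate, while large groups move towards their own experience. The final rescaling plays the same role as a total-preserving option: the loadings multiplied by exposure add up to the observed excess amount.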
allocation <- allocate_excess_loss(
  excess,
  excess_amount = "excess_claim_amount",
  weight = "earned_exposure",
  group = "sector",
  pooling = "partial",
  preserve_total = TRUE
)
summary(allocation, compare_to_empirical = TRUE)
autoplot(allocation, y = "allocated_loading")
autoplot(allocation, y = "credibility")
In the allocation output, allocated_excess_loss is the
absolute monetary burden assigned to a row.
allocated_loading is the corresponding loading per unit of
the chosen weight, such as earned exposure. This distinction matters
when the output is added back to pricing data.
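The difference between the two quantities can be shown with simple arithmetic. The sketch below assumes, following the description above, that the monetary amount on a row equals the loading multiplied by the row's weight. The rows and values are made up.

```r
# Made-up rows that share the same loading per unit of earned exposure.
rows <- data.frame(
  policy            = c("p1", "p2", "p3"),
  earned_exposure   = c(1, 0.5, 0.25),
  allocated_loading = c(120, 120, 120)
)

# Monetary burden per row: loading per unit of weight times the row's weight.
rows$allocated_excess_loss <- rows$allocated_loading * rows$earned_exposure
rows
```

Rows with the same loading carry different monetary amounts when their exposure differs. A premium expressed per unit of exposure should therefore be combined with the loading, while a premium expressed as a row total should be combined with the monetary amount.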
The allocated loading can then be added to the pricing data.
excess$base_premium <- excess$technical_premium
priced <- add_excess_loading(
  excess,
  allocation,
  base_premium = "base_premium"
)
This excess loading is part of the technical risk premium. It is not intended as a commercial margin.
3. Translate continuous factors into tariff segments
Many tariffs use grouped versions of continuous variables such as
age, vehicle age or insured value. risk_factor_gam() can be
used to inspect the fitted shape of a continuous risk factor.
derive_tariff_segments() can then derive candidate segment
boundaries from that pattern.
age_gam <- risk_factor_gam(
  data = MTPL,
  claim_count = "nclaims",
  risk_factor = "age_policyholder",
  exposure = "exposure"
)
age_segments <- derive_tariff_segments(age_gam)
age_segments
#> Tariff segment boundaries:
#> [1] 18 25 32 39 51 58 65 84 95
The derived segments can be added back to the portfolio with
add_tariff_segments().
portfolio <- MTPL |>
  add_tariff_segments(age_segments, name = "age_policyholder_segment")
head(portfolio[, c("age_policyholder", "age_policyholder_segment")])
#> # A tibble: 6 × 2
#>   age_policyholder age_policyholder_segment
#>              <int> <fct>
#> 1               70 (65,84]
#> 2               40 (39,51]
#> 3               78 (65,84]
#> 4               49 (39,51]
#> 5               59 (58,65]
#> 6               71 (65,84]
These functions are intended to support actuarial judgement, not replace it. Candidate segment boundaries should still be reviewed for credibility, stability and practical usability.
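When reviewing candidate boundaries by hand, it can help to see how a set of break points maps ages to segments. The base R sketch below uses cut() with the boundaries printed above and a few example ages. It is an illustration of the interval convention seen in the output, not a description of how add_tariff_segments() is implemented.

```r
# Boundaries as printed in the derive_tariff_segments() output above.
breaks <- c(18, 25, 32, 39, 51, 58, 65, 84, 95)

# Example ages: the six shown above plus the lowest boundary.
age <- c(70, 40, 78, 49, 59, 71, 18)

# Right-closed intervals; include.lowest = TRUE keeps age 18 in the first segment.
segment <- cut(age, breaks = breaks, include.lowest = TRUE)

data.frame(age, segment)
table(segment)
```

Tabulating the segments against exposure or claim counts in the same way gives a quick check on whether any candidate segment is too thin to support its own relativity.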
4. Fit and interpret a GLM
GLMs are widely used in insurance pricing because they provide an
interpretable multiplicative structure. After fitting a model,
rating_table() expresses the coefficients in tariff-table
form.
portfolio$zip <- as.factor(portfolio$zip)
freq_model <- glm(
  nclaims ~ zip + age_policyholder_segment + offset(log(exposure)),
  family = poisson(),
  data = portfolio
)
rt <- rating_table(
  freq_model,
  model_data = portfolio,
  exposure = "exposure"
)
head(rt$df)
#>                risk_factor       level est_freq_model exposure
#> 1              (Intercept) (Intercept)      0.2743790       NA
#> 2                      zip           0      1.0000000      207
#> 3                      zip           1      0.9944341    11081
#> 4                      zip           2      0.8960053     7783
#> 5                      zip           3      0.9493475     7588
#> 6 age_policyholder_segment     [18,25]      1.0000000     1331
Observed portfolio experience from factor_analysis() can
be attached to the rating table with
add_observed_experience(). This makes the comparison
between model relativities and observed experience explicit.
zip_experience <- factor_analysis(
  portfolio,
  risk_factors = "zip",
  claim_count = "nclaims",
  exposure = "exposure"
)
rt |>
  add_observed_experience(zip_experience, metric = "frequency") |>
  autoplot(risk_factors = "zip")
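The multiplicative structure behind a tariff table can be checked directly in base R. The sketch below fits a Poisson GLM with a log link on simulated data, exponentiates the coefficients, and confirms that each fitted claim count equals exposure times the base frequency times the level's relativity. The data frame `toy`, the factor `region` and the simulated relativities are assumptions for illustration only.

```r
set.seed(1)

# Simulated portfolio with one categorical risk factor (made-up values).
n <- 500
toy <- data.frame(
  region   = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  exposure = runif(n, 0.5, 1)
)
true_rel <- c(a = 1, b = 1.3, c = 0.8)
toy$nclaims <- rpois(n, 0.15 * unname(true_rel[as.character(toy$region)]) * toy$exposure)

fit <- glm(nclaims ~ region + offset(log(exposure)), family = poisson(), data = toy)

# With a log link, exponentiated coefficients act multiplicatively.
rel        <- exp(coef(fit))
base_freq  <- rel[["(Intercept)"]]                    # frequency of the reference level
region_rel <- c(a = 1, b = rel[["regionb"]], c = rel[["regionc"]])

# Rebuild the fitted claim counts from the tariff-style components.
manual <- toy$exposure * base_freq * unname(region_rel[as.character(toy$region)])
all.equal(manual, unname(fitted(fit)))
```

The reference level has relativity 1 by construction, which matches the rows with an estimate of 1.0000000 in the rating table output above.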
5. Refine tariff effects when needed
Raw model output may be statistically valid but still unsuitable for direct tariff use. Sparse levels, noisy estimates or non-monotonic adjacent effects can make a tariff hard to explain or maintain.
The refinement workflow makes these adjustments explicit:
refined_model <- prepare_refinement(freq_model) |>
  add_smoothing(
    model_variable = "age_policyholder_segment",
    source_variable = "age_policyholder",
    weights = "exposure"
  ) |>
  add_restriction(restrictions) |>
  refit()
Common refinement tasks include:
- smoothing adjacent tariff levels
- fixing selected coefficients to actuarial or commercial assumptions
- applying sublevel relativities within a broader GLM factor level
- refitting the model while preserving the intended tariff structure
These tools are most useful when the statistical model already captures the main risk structure and the remaining work is tariff refinement.
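One common way to hold selected relativities fixed while re-estimating the rest of a GLM is to move the fixed relativities into the offset. The base R sketch below illustrates that general technique on simulated data. It is not a statement about how add_restriction() and refit() work internally, and the data, the factor names and the fixed values in `fixed_region` are hypothetical.

```r
set.seed(2)

# Simulated portfolio with two risk factors (made-up values).
n <- 400
toy <- data.frame(
  region   = factor(sample(c("a", "b"), n, replace = TRUE)),
  age_band = factor(sample(c("old", "young"), n, replace = TRUE)),
  exposure = runif(n, 0.5, 1)
)
toy$nclaims <- rpois(n, 0.2 * toy$exposure)

# Relativities imposed by actuarial or commercial judgement (hypothetical values).
fixed_region <- c(a = 1, b = 1.2)
toy$region_rel <- unname(fixed_region[as.character(toy$region)])

# The fixed factor enters through the offset, so it is not re-estimated;
# the remaining terms are fitted conditional on it.
restricted <- glm(
  nclaims ~ age_band + offset(log(exposure) + log(region_rel)),
  family = poisson(),
  data = toy
)

exp(coef(restricted))
```

Only the intercept and the age band effect are estimated here. The region relativities stay exactly at the imposed values, and the other coefficients adjust to compensate.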
6. Validate model behaviour
Pricing models should be checked before their output is used in a
tariff. insurancerating contains helpers for several common
checks:
- check_overdispersion() for Poisson frequency models
- check_residuals() for simulation-based residual diagnostics using DHARMa
- bootstrap_performance() for predictive stability with metrics such as RMSE
- rating_grid() to inspect observed rating-grid combinations
For example:
check_overdispersion(freq_model)
#> Dispersion ratio = 1.185
#> Pearson's Chi-squared = 35522.367
#> p-value = < 0.001
#> Overdispersion detected.
check_residuals(freq_model) |>
  autoplot()
Validation does not make a tariff decision by itself. It gives evidence about model fit, stability and areas that may need further review.
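A dispersion ratio for a Poisson model is commonly defined as the Pearson chi-squared statistic divided by the residual degrees of freedom, with values well above 1 pointing to overdispersion. The base R sketch below computes it on simulated, deliberately overdispersed counts. The data are made up, and the sketch shows the common definition rather than the exact implementation of check_overdispersion().

```r
set.seed(3)

# Simulated claim counts from a negative binomial, which is overdispersed
# relative to a Poisson with the same mean (made-up parameters).
n <- 2000
exposure <- runif(n, 0.5, 1)
nclaims  <- rnbinom(n, size = 0.5, mu = 1 * exposure)

fit <- glm(nclaims ~ 1 + offset(log(exposure)), family = poisson())

# Pearson chi-squared statistic and the ratio to residual degrees of freedom.
pearson_chisq    <- sum(residuals(fit, type = "pearson")^2)
dispersion_ratio <- pearson_chisq / df.residual(fit)

# Upper-tail p-value against the chi-squared reference distribution.
p_value <- pchisq(pearson_chisq, df = df.residual(fit), lower.tail = FALSE)

c(dispersion_ratio = dispersion_ratio, p_value = p_value)
```

A ratio only slightly above 1 can still be flagged as significant in a large portfolio, so the size of the ratio is usually more informative than the p-value when deciding whether a quasi-Poisson or negative binomial model is worth considering.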
Typical workflow
One possible workflow is:
- Inspect the portfolio with factor_analysis() and outlier_histogram().
- Assess large-loss thresholds with assess_excess_threshold() where capped severity or excess-loss loadings are relevant.
- Decompose and allocate excess loss with calculate_excess_loss() and allocate_excess_loss().
- Analyse continuous risk factors with risk_factor_gam().
- Create candidate tariff segments with derive_tariff_segments().
- Fit GLMs for frequency, severity or pure premium.
- Interpret coefficients with rating_table().
- Compare fitted relativities with observed experience using add_observed_experience().
- Apply refinement where needed with prepare_refinement(), add_smoothing(), add_restriction() or add_relativities().
- Validate the resulting model with the model-performance helpers.
The exact order and choice of functions depend on the portfolio, product, data quality and pricing objective.