| Type: | Package |
| Title: | Actuarial Tools for Insurance Pricing Models |
| Version: | 0.8.0 |
| Author: | Martin Haringa [aut, cre] |
| Maintainer: | Martin Haringa <mtharinga@gmail.com> |
| BugReports: | https://github.com/MHaringa/insurancerating/issues |
| Description: | Provides actuarial tools and building blocks for analysing, modelling, refining, and validating insurance rating models. Designed to support common GLM-based pricing tasks and the translation of statistical model output into practical tariff structures. The package supports the construction of insurance tariff classes using a data-driven approach, based on the methodology of Antonio and Valdez (2012) <doi:10.1007/s10182-011-0152-7>. |
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
| URL: | https://mharinga.github.io/insurancerating/, https://github.com/MHaringa/insurancerating |
| Encoding: | UTF-8 |
| LazyData: | true |
| Imports: | ciTools, data.table, DHARMa, dplyr, evtree, fitdistrplus, ggplot2, lifecycle, lubridate, mgcv, patchwork, rlang, scales, scam, stringr |
| Depends: | R (≥ 4.1.0) |
| Suggests: | classInt, ggbeeswarm, ggrepel, spelling, knitr, rmarkdown, testthat |
| Language: | en-US |
| VignetteBuilder: | knitr |
| Config/roxygen2/version: | 8.0.0 |
| NeedsCompilation: | no |
| Packaged: | 2026-06-02 12:19:36 UTC; martin |
| Repository: | CRAN |
| Date/Publication: | 2026-06-02 12:40:02 UTC |
Motor Third Party Liability (MTPL) portfolio
Description
A dataset containing the characteristics of 30,000 policyholders in a Dutch Motor Third Party Liability (MTPL) insurance portfolio. Includes information on policyholder characteristics, vehicle attributes, and claims.
Usage
MTPL
Format
A data frame containing 30,000 rows and 7 variables:
- age_policyholder
Age of the policyholder (in years).
- nclaims
Number of claims.
- exposure
Exposure, expressed in years. For example, if a vehicle is insured from July 1, the exposure equals 0.5 for that year.
- amount
Claim severity (in euros).
- power
Engine power of the vehicle (in kilowatts).
- bm
Bonus-malus level (0–22). Higher levels indicate worse claim history.
- zip
Region indicator (0–3).
Author(s)
Martin Haringa
Motor Third Party Liability (MTPL) portfolio (3,000 policyholders)
Description
A dataset containing the characteristics of 3,000 policyholders in a Dutch Motor Third Party Liability (MTPL) insurance portfolio. Includes information on region, claims, exposure, and premium.
Usage
MTPL2
Format
A data frame containing 3,000 rows and 6 variables:
- customer_id
Unique customer identifier.
- area
Region where the customer lives (0–3).
- nclaims
Number of claims.
- amount
Claim severity (in euros).
- exposure
Exposure, expressed in years.
- premium
Earned premium.
Author(s)
Martin Haringa
Find active portfolio rows for event dates
Description
Matches event dates, such as claim dates or portfolio snapshot dates, to the rows that were active in the portfolio on those dates.
Usage
active_rows_by_date(
portfolio,
dates,
period_start,
period_end,
date,
by = NULL,
nomatch = NULL,
mult = "all"
)
Arguments
portfolio |
A |
dates |
A |
period_start |
Character string. Name of the portfolio column with period start dates. |
period_end |
Character string. Name of the portfolio column with period end dates. |
date |
Character string. Name of the date column in |
by |
Character vector with additional columns used to match |
nomatch |
When a row (with interval say, |
mult |
When multiple rows in y match to the row in x, |
Details
This is useful when claim records or other dated events need the rating factors, premium, exposure, or policy attributes that were active at the event date. The function performs an interval join between event dates and portfolio coverage periods, optionally within matching identifiers such as a policy number.
Value
An object with the same class as portfolio.
Author(s)
Martin Haringa
Examples
library(lubridate)
portfolio <- data.frame(
begin1 = ymd(c("2014-01-01", "2014-01-01")),
end = ymd(c("2014-03-14", "2014-05-10")),
termination = ymd(c("2014-03-14", "2014-05-10")),
exposure = c(0.2025, 0.3583),
premium = c(125, 150),
car_type = c("BMW", "TESLA"))
## Find active rows on different dates
dates0 <- data.frame(active_date = seq(ymd("2014-01-01"), ymd("2014-05-01"),
by = "months"))
active_rows_by_date(
portfolio,
dates0,
period_start = "begin1",
period_end = "end",
date = "active_date"
)
## With extra identifiers (merge claim date with time interval in portfolio)
claim_dates <- data.frame(claim_date = ymd("2014-01-01"),
car_type = c("BMW", "VOLVO"))
### Only rows are returned that can be matched
active_rows_by_date(
portfolio,
claim_dates,
period_start = "begin1",
period_end = "end",
date = "claim_date",
by = "car_type"
)
### When row cannot be matched, NA is returned for that row
active_rows_by_date(
portfolio,
claim_dates,
period_start = "begin1",
period_end = "end",
date = "claim_date",
by = "car_type",
nomatch = NA
)
Add observed portfolio experience to a rating table
Description
Attach the output of factor_analysis() to a rating_table() object so it
can be shown in autoplot.rating_table(). This is useful when you want to
compare fitted GLM relativities with the observed portfolio pattern for the
same rating factor.
The observed metric is scaled before plotting. With scale = "reference"
the metric is divided by the observed value of the model reference level. If
a clear reference level cannot be found, the metric is scaled to its mean.
With scale = "mean", the metric is always scaled to its mean.
Usage
add_observed_experience(
object,
experience,
metric = "risk_premium",
label = "Observed experience",
color = NULL,
scale = c("reference", "mean")
)
Arguments
object |
A |
experience |
A |
metric |
Character; metric from |
label |
Character; legend label for the observed experience line. |
color |
Optional line color. If |
scale |
Character; scaling applied before plotting. One of
|
Value
A rating_table object with observed portfolio experience attached.
Author(s)
Martin Haringa
Examples
df <- MTPL2
df$area <- as.factor(df$area)
model <- glm(
nclaims ~ area + offset(log(exposure)),
family = poisson(),
data = df
)
observed <- factor_analysis(
df,
risk_factors = "area",
claim_count = "nclaims",
exposure = "exposure"
)
rating_table(model, model_data = df, exposure = "exposure") |>
add_observed_experience(observed, metric = "frequency") |>
autoplot(risk_factors = "area")
Add Model Predictions to a Data Frame
Description
Adds predictions (and optionally confidence intervals) from one or more
glm models to a data frame.
Usage
add_prediction(
data,
...,
predictions = NULL,
prefix = "pred",
confidence = FALSE,
interval_names = c("lower", "upper"),
alpha = 0.1,
var = NULL,
conf_int = NULL
)
Arguments
data |
A |
... |
One or more fitted model objects of class |
predictions |
Optional character vector giving names for the new
prediction columns. Must have the same length as the number of models
supplied. If |
prefix |
Character. Prefix used for automatically generated prediction
column names. Default is |
confidence |
Logical. If |
interval_names |
Character vector of length two. Names appended to the
prediction column name for lower and upper confidence interval bounds.
Default is |
alpha |
Numeric between 0 and 1. Controls the miscoverage level for
interval estimates. Default is |
var |
Deprecated. Use |
conf_int |
Deprecated. Use |
Value
A data.frame containing the original data along with additional columns for
model predictions (and confidence intervals if requested).
Author(s)
Martin Haringa
Examples
mod1 <- glm(nclaims ~ age_policyholder,
data = MTPL,
offset = log(exposure),
family = poisson())
# Add predicted values
mtpl_pred <- add_prediction(MTPL, mod1)
# Add predicted values with confidence bounds
mtpl_pred_ci <- add_prediction(MTPL, mod1, confidence = TRUE)
Add expert-based relativities to a refinement workflow
Description
Splits an existing model variable into more detailed tariff segments using supplied relativities. This is useful when the GLM is fitted on a coarser rating factor for credibility or stability, but the final tariff needs a more detailed split that is based on portfolio exposure, expert judgement or externally agreed relativities.
Usage
add_relativities(
model,
model_variable,
split_variable,
relativities,
exposure,
normalize = TRUE
)
Arguments
model |
Object of class |
model_variable |
Character string. Existing variable in the GLM. Levels of this variable can be split into more detailed tariff segments. |
split_variable |
Character string. More granular portfolio variable that
defines the detailed groups inside |
relativities |
Named list of data frames, usually created with
|
exposure |
Character string. Exposure column used for weighting and, when requested, normalisation. |
normalize |
Logical. If |
Details
model_variable is the variable already used in the GLM. split_variable is
the more detailed variable in the portfolio data that will be used to split
one or more levels of model_variable. The relativities argument should be
a named list describing those splits, usually built with relativities() and
split_level().
The step is stored on the rating_refinement object and is applied when
refit() is called. When normalize = TRUE, the supplied relativities are
normalised using exposure so that the refined split keeps the original level
effect on average. This helps prevent an expert split from unintentionally
changing the total premium level for the original model group.
When to use
add_relativities() is intended for refinement within an already reasonably
homogeneous GLM segment. It redistributes an existing coefficient across
sublevels using exposure-weighted relativities, while preserving the overall
level of the original coefficient. This is useful for mild heterogeneity,
commercial refinement, monotonic tariff differentiation, or expert-based
segmentation within a stable risk group where the original GLM coefficient is
broadly representative.
Limitations
The method is not a substitute for creating a separate risk segment when the original GLM coefficient is itself distorted. For example, suppose a broad industry segment contains many relatively stable businesses, but a few chemical companies drive most of the losses while representing little exposure. The fitted industry coefficient may then be dominated by the chemical companies' experience. Applying exposure-weighted relativities inside that segment may barely reduce the coefficient for the large exposure group, because the original coefficient is already pulled upward by the outlier subgroup.
In that situation it is often better to create a separate GLM factor level,
derive a separate tariff segment, or apply explicit segmentation or
acceptation rules, instead of relying only on add_relativities().
Value
Object of class rating_refinement.
Author(s)
Martin Haringa
Examples
portfolio <- data.frame(
claims = c(1, 2, 1, 3, 2, 4),
exposure = rep(1, 6),
construction = factor(c("residential", "commercial", "residential",
"commercial", "residential", "commercial")),
construction_detail = factor(c("flat", "shop", "house",
"office", "flat", "shop"))
)
model <- glm(
claims ~ construction + offset(log(exposure)),
family = poisson(),
data = portfolio
)
relativities <- relativities(
split_level(
"residential",
new_levels = c("flat", "house"),
relativities = c(0.95, 1.05)
)
)
refined <- prepare_refinement(model, data = portfolio) |>
add_relativities(
model_variable = "construction",
split_variable = "construction_detail",
relativities = relativities,
exposure = "exposure"
)
Add coefficient restrictions to a refinement workflow
Description
Fixes selected model levels to user-supplied relativities in a refinement workflow. This is useful when the fitted GLM coefficients need to be adjusted before the final tariff is refitted, for example to apply expert judgement, enforce a business rule, remove an implausible local effect, or make a tariff structure easier to explain.
Usage
add_restriction(model, restrictions)
Arguments
model |
Object of class |
restrictions |
Data frame with exactly two columns. The first column must have the same name as the model variable to restrict and contains the levels to adjust. The second column contains the replacement relativities. Levels that are not supplied are filled with the currently fitted GLM relativities. |
Details
add_restriction() stores a restriction step on a rating_refinement
object. It does not refit the GLM immediately. The restrictions are applied
when refit() is called.
The restrictions data frame identifies the model variable to restrict by
its first column. The second column contains the relativities that should be
used for those levels in the refined model. New code should use this function
after prepare_refinement(); the deprecated restrict_coef() wrapper is
only kept for backwards compatibility.
The restriction table may contain all levels of the model variable, or only the levels that need a manual adjustment. If only a subset is supplied, the missing levels are automatically filled with their current fitted GLM relativities. This makes it possible to fix one level explicitly while keeping the other levels at their already estimated values.
Value
Object of class rating_refinement.
Author(s)
Martin Haringa
Examples
portfolio <- data.frame(
claims = c(1, 2, 1, 3, 2, 4),
exposure = rep(1, 6),
postal_area = factor(c("A", "B", "C", "A", "B", "C"))
)
model <- glm(
claims ~ postal_area + offset(log(exposure)),
family = poisson(),
data = portfolio
)
restrictions <- data.frame(
postal_area = "C",
relativity = 1.10
)
refined <- prepare_refinement(model, data = portfolio) |>
add_restriction(restrictions)
Add smoothing to a refinement workflow
Description
Replaces a grouped or binned model effect by a smoother tariff curve in a refinement workflow. This is commonly used for numeric rating factors such as age, vehicle age, insured value or bonus-malus years, where a raw GLM factor can be too jagged for a stable and explainable tariff.
Usage
add_smoothing(
model,
model_variable = NULL,
source_variable = NULL,
degree = NULL,
breaks = NULL,
smoothing = "spline",
k = NULL,
weights = NULL,
tariff_class = NULL,
rating_variable = NULL,
x_cut = NULL,
x_org = NULL
)
Arguments
model |
Object of class |
model_variable |
Character string. Existing grouped or binned variable in the GLM. This is the model term that will be replaced by a smoothed tariff factor. |
source_variable |
Character string. Original numeric portfolio variable
underlying |
degree |
Optional single whole number. Polynomial degree, used by polynomial smoothing methods. |
breaks |
Numeric vector with the tariff segment boundaries to use after smoothing. Values must be finite and strictly increasing. |
smoothing |
Character string with the smoothing method, for example
|
k |
Optional single positive whole number. Number of basis functions for smoothing methods that use a basis dimension. |
weights |
Optional character string. Weights column, usually exposure. |
tariff_class, rating_variable |
Deprecated. Use |
x_cut, x_org |
Deprecated. Use |
Details
add_smoothing() stores a smoothing step on a rating_refinement object.
The original GLM contains model_variable, usually a factor or grouped
tariff segment. The smoother is fitted against source_variable, the original
numeric portfolio variable behind those groups. The smoothed result is then
converted back to tariff segments using breaks and applied when refit() is
called.
This makes the intended API explicit: first prepare the model with
prepare_refinement(), then add a smoothing step, optionally adjust it with
edit_smoothing(), and finally call refit(). The deprecated
smooth_coef() wrapper is only kept for backwards compatibility.
Value
Object of class rating_refinement.
Author(s)
Martin Haringa
Examples
## Not run:
library(dplyr)
age_policyholder_frequency <- risk_factor_gam(
data = MTPL,
claim_count = "nclaims",
risk_factor = "age_policyholder",
exposure = "exposure"
)
age_segments_freq <- derive_tariff_segments(age_policyholder_frequency)
dat <- MTPL |>
add_tariff_segments(age_segments_freq, name = "age_policyholder_freq_cat") |>
mutate(across(where(is.character), as.factor)) |>
mutate(across(where(is.factor), ~ set_reference_level(., exposure)))
freq <- glm(
nclaims ~ bm + age_policyholder_freq_cat,
offset = log(exposure),
family = poisson(),
data = dat
)
sev <- glm(
amount ~ zip,
weights = nclaims,
family = Gamma(link = "log"),
data = dat |> filter(amount > 0)
)
premium_df <- dat |>
add_prediction(freq, sev) |>
mutate(premium = pred_nclaims_freq * pred_amount_sev)
burn_unrestricted <- glm(
premium ~ zip + bm + age_policyholder_freq_cat,
weights = exposure,
family = Gamma(link = "log"),
data = premium_df
)
ref <- prepare_refinement(burn_unrestricted) |>
add_smoothing(
model_variable = "age_policyholder_freq_cat",
source_variable = "age_policyholder",
breaks = seq(18, 95, 5),
weights = "exposure"
)
## End(Not run)
Add derived tariff segments to portfolio data
Description
Adds the tariff segments derived by derive_tariff_segments() as a new factor
column to a portfolio data set. This is the recommended way to attach derived
tariff segments to the same portfolio rows that were used to fit the risk
factor GAM.
Usage
add_tariff_segments(data, segments, name = NULL, overwrite = FALSE)
Arguments
data |
A data frame to which the tariff segments should be added. |
segments |
Object of class |
name |
Character string. Name of the new output column. If |
overwrite |
Logical. If |
Value
A data frame with the derived tariff segment column added.
Author(s)
Martin Haringa
Examples
## Not run:
age_segments <- risk_factor_gam(
MTPL,
risk_factor = "age_policyholder",
claim_count = "nclaims",
exposure = "exposure"
) |>
derive_tariff_segments()
MTPL |>
add_tariff_segments(age_segments, name = "age_policyholder_segment")
## End(Not run)
Allocate excess loss to a pricing portfolio
Description
Large claims can distort risk-factor relativities and create unstable
premiums. allocate_excess_loss() redistributes historical excess losses
across a portfolio in a controlled and transparent way.
The function is typically used after calculate_excess_loss(). The base
premium can be modelled on capped claim amounts, while the excess part of
large claims is allocated back to the portfolio as an additional loading.
Usage
allocate_excess_loss(
data,
excess_amount,
allocation_weight,
risk_factor = NULL,
allocation_subset = NULL,
allocation = c("portfolio", "risk_factor", "partial"),
credibility = NULL,
credibility_basis = c("claims", "excess_claims", "allocation_weight"),
credibility_threshold = 50,
credibility_scale = 1,
method = c("observed", "bootstrap"),
n_bootstrap = 1000,
bootstrap_seed = NULL,
severity_noise = c("none", "lognormal", "normal"),
severity_noise_sd = 0.25,
preserve_total_excess = TRUE
)
Arguments
data |
A data.frame, typically the output of |
excess_amount |
Character string. Column containing the excess claim amount to allocate. |
allocation_weight |
Character string. Column used as allocation weight, typically exposure, premium, insured value or another earned unit. |
risk_factor |
Optional character string. Risk-factor column used for
|
allocation_subset |
Optional character string. Logical column indicating
which rows participate in the allocation. If |
allocation |
Character string. One of |
credibility |
Optional numeric scalar between 0 and 1. Used directly
when |
credibility_basis |
Character string. Experience basis used when
|
credibility_threshold |
Positive numeric scalar. Amount of experience required to reach 50 percent credibility. |
credibility_scale |
Positive numeric scalar. Multiplies the derived or supplied credibility before it is capped between 0 and 1. |
method |
Character string. Either |
n_bootstrap |
Positive whole number. Number of bootstrap samples. |
bootstrap_seed |
Optional integer seed for reproducible bootstrap allocation. |
severity_noise |
Character string. One of |
severity_noise_sd |
Non-negative numeric scalar controlling severity variation in bootstrap samples. |
preserve_total_excess |
Logical. If |
Details
Allocation methods
The allocation argument determines how the excess burden is shared.
-
"portfolio": excess losses are pooled across the entire portfolio and redistributed using the specified allocation weight. The excess burden is shared by all included risks regardless of their risk-factor level.This provides the most stable excess loading and is often appropriate when excess losses are infrequent, highly volatile or considered a portfolio- wide risk rather than a risk-factor-specific characteristic.
-
"risk_factor": excess losses are allocated separately for each risk-factor level. The excess burden observed within a group is spread across all risks in that group and is not shared with other groups.This produces the strongest link between excess loadings and observed group experience, but can lead to volatile results when excess losses are rare.
-
"partial": excess losses are allocated using a credibility-weighted combination of portfolio and risk-factor experience. Risk-factor levels with more credible experience receive excess loadings that more closely reflect their own observed excess-loss burden, while less credible groups are pooled more strongly towards the portfolio average.This approach typically provides a good balance between pricing stability and risk differentiation and is therefore often the preferred choice in practical rating applications.
Credibility weighting
For allocation = "partial", excess losses are allocated using a
credibility-weighted blend of portfolio and risk-factor experience.
The allocated loading is calculated as:
loading_g = Z_g \cdot loading_g^{risk\ factor} +
(1 - Z_g) \cdot loading^{portfolio}
where Z_g represents the credibility assigned to the risk-factor-level
experience.
If credibility is supplied, the same credibility is applied to all
risk-factor levels.
If credibility = NULL, credibility is determined separately for each
risk-factor level based on the selected credibility_basis:
Z_g = \frac{n_g}{n_g + credibility\_threshold}
where n_g is determined by credibility_basis:
-
"claims": total number of claims in the risk-factor level. -
"excess_claims": number of claims with positive excess loss. -
"allocation_weight": total allocation weight in the risk-factor level.
credibility_threshold represents the amount of experience required to
reach 50 percent credibility.
Example:
A sector with 20 claims and credibility_threshold = 50 receives:
Z = \frac{20}{20 + 50} = 0.29
Therefore 29% of the excess loading is based on the sector's own excess-loss experience and 71% is based on the portfolio-wide excess loading.
For example, with credibility_threshold = 50, a group with:
10 claims receives 17% credibility;
50 claims receives 50% credibility;
100 claims receives 67% credibility;
200 claims receives 80% credibility.
The final credibility is scaled using credibility_scale and then capped
between 0 and 1:
Z_g = min(1, max(0, Z_g \cdot credibility\_scale))
Higher values of credibility_threshold or lower values of
credibility_scale pool more strongly towards the portfolio loading.
Bootstrap allocation
With method = "observed", the function allocates the historically observed
excess loss.
With method = "bootstrap", the function repeatedly resamples observed
positive excess claim amounts. This provides a pragmatic estimate of
excess-loss volatility and the resulting uncertainty in excess loadings.
The approach is intended as a practical pricing approximation rather than a formal extreme value model.
The bootstrap affects both the total excess burden and the distribution of
excess loss across risk-factor levels. Use bootstrap_seed to make bootstrap
results reproducible.
Severity noise
severity_noise can only be used with method = "bootstrap".
If severity_noise = "none", bootstrap samples reuse the observed excess
claim amounts.
If severity_noise = "lognormal", sampled excess claims are multiplied by
lognormal noise. This is usually the most natural option for large claims,
because claim amounts remain positive and variation is multiplicative.
If severity_noise = "normal", additive normal noise is applied. This may
be useful for experimentation, but is generally less natural for large
positive claim amounts.
severity_noise_sd controls the amount of additional severity variation. As
a rough guide:
-
0.10provides limited variation; -
0.25provides moderate variation; -
0.50provides substantial variation.
Preserving the total excess loss
If preserve_total_excess = TRUE, the final allocation is rescaled so that
the sum of allocated excess loss equals the total excess loss being
allocated.
This ensures that credibility blending, bootstrap sampling or other allocation choices do not unintentionally increase or decrease the total excess burden.
Typical pricing workflow
A common workflow is:
Use
calculate_excess_loss()to separate capped and excess losses.Model the base premium using capped claim amounts.
Allocate the excess-loss burden using
allocate_excess_loss().Add the resulting excess loading back to the technical premium using
apply_excess_loading().
This approach prevents a small number of large claims from distorting risk-factor relativities while still ensuring that the excess-loss burden is reflected in the final premium.
Value
An object of class "excess_loss_allocation".
Author(s)
Martin Haringa
Examples
claims <- data.frame(
sector = rep(c("Industry", "Retail"), each = 4),
claim_amount = c(
1000, 120000, 30000, 8000,
2000, 150000, 40000, 6000
),
earned_exposure = rep(1, 8)
)
decomposed <- calculate_excess_loss(
claims,
claim_amount = "claim_amount",
threshold = 100000
)
# Pool all excess losses across the portfolio
portfolio_allocation <- allocate_excess_loss(
decomposed,
excess_amount = "excess_claim_amount",
allocation_weight = "earned_exposure",
allocation = "portfolio"
)
# Allocate excess losses separately by sector
sector_allocation <- allocate_excess_loss(
decomposed,
excess_amount = "excess_claim_amount",
allocation_weight = "earned_exposure",
risk_factor = "sector",
allocation = "risk_factor"
)
# Blend sector and portfolio experience using credibility
partial_allocation <- allocate_excess_loss(
decomposed,
excess_amount = "excess_claim_amount",
allocation_weight = "earned_exposure",
risk_factor = "sector",
allocation = "partial",
credibility_basis = "claims",
credibility_threshold = 50
)
summary(partial_allocation)
Apply excess loading to a pricing portfolio
Description
Apply an allocated excess-loss loading to a portfolio data set.
Usage
apply_excess_loading(
data,
allocation,
base_premium = "base_premium",
allocated_excess_loss = NULL,
allocated_loading = NULL,
weight = NULL,
output = c("premium", "rate")
)
Arguments
data |
A data.frame containing the base premium or base rate. |
allocation |
An object returned by |
base_premium |
Character string. Column containing the base premium amount or base rate before the excess loading is added. |
allocated_excess_loss |
Optional character string. Column in
|
allocated_loading |
Optional character string. Column in
|
weight |
Optional character string. Weight column used to convert between
premium amounts and rates when |
output |
Character string. Use |
Details
apply_excess_loading() is the final step in the excess-loss pricing workflow.
It does not cap claims, estimate excess losses or allocate the excess burden.
Instead, it takes the output of allocate_excess_loss() and adds the
allocated excess component back to the base premium or base rate.
The function is typically used after the base premium has been modelled on capped claim amounts. The excess loading then ensures that the cost of claims above the selected threshold is still reflected in the final technical premium.
Premium output
With output = "premium", the function adds the allocated excess loss in
monetary terms to the base premium:
loaded\_premium =
base\_premium + allocated\_excess\_loss
allocated_excess_loss is the row-level monetary amount of excess loss
allocated to each risk.
Rate output
With output = "rate", the function adds the allocated excess loading per
unit of weight to the base rate:
loaded\_rate =
base\_rate + allocated\_loading
Use this option when the base value represents a rate per exposure, premium unit, insured value or other allocation weight.
If the input column supplied through base_premium contains premium amounts
rather than rates, the function first converts the base premium to a rate:
base\_rate =
\frac{base\_premium}{weight}
Interpretation of allocation columns
allocated_excess_loss represents the monetary excess-loss burden allocated
to a row.
allocated_loading represents the excess loading per unit of allocation
weight.
In other words:
allocated\_excess\_loss =
allocated\_loading \cdot weight
This distinction is important when moving between premium amounts and rates.
Typical pricing workflow
A common workflow is:
Use
calculate_excess_loss()to separate capped and excess losses.Model the base premium using capped claim amounts.
Allocate the excess-loss burden using
allocate_excess_loss().Use
apply_excess_loading()to add the allocated excess component back to the base premium or base rate.
This produces a final technical premium that reflects both the modelled capped loss cost and the separately allocated excess-loss burden.
Value
A data.frame. With output = "premium", the result contains
base_premium, allocated_excess_loss, allocated_loading,
excess_loading and loaded_premium. With output = "rate", the result
contains base_rate, allocated_loading and loaded_rate.
Author(s)
Martin Haringa
Examples
claims <- data.frame(
sector = rep(c("Industry", "Retail"), each = 4),
claim_amount = c(
1000, 120000, 30000, 8000,
2000, 150000, 40000, 6000
),
earned_exposure = rep(1, 8)
)
decomposed <- calculate_excess_loss(
claims,
claim_amount = "claim_amount",
threshold = 100000
)
decomposed$base_premium <- 500
allocation <- allocate_excess_loss(
decomposed,
excess_amount = "excess_claim_amount",
allocation_weight = "earned_exposure"
)
apply_excess_loading(
decomposed,
allocation,
base_premium = "base_premium"
)
apply_excess_loading(
decomposed,
allocation,
base_premium = "base_premium",
weight = "earned_exposure",
output = "rate"
)
Assess possible excess-loss thresholds
Description
Compare candidate thresholds for capped severity and large-loss pricing work.
assess_excess_threshold() is a diagnostic helper. It does not choose a
threshold automatically. It shows how many claims and how much historical
claim cost sit above candidate thresholds, and how much pure premium would
remain after capping claims at each threshold.
Use this before calculate_excess_loss() to understand the effect of the
threshold on the portfolio. The output is useful for tariff notes, pricing
reviews and governance discussions around capped severity models.
Usage
assess_excess_threshold(
data,
claim_amount,
thresholds,
exposure = NULL,
group = NULL
)
Arguments
data |
A |
claim_amount |
Character string. Claim amount column. |
thresholds |
Numeric vector of candidate thresholds. |
exposure |
Optional character string. Exposure column. If supplied, pure premium before and after capping is calculated. |
group |
Optional character string. Grouping column used to assess thresholds by segment. |
Value
A data.frame with class "excess_threshold_assessment".
Author(s)
Martin Haringa
Examples
claims <- data.frame(
sector = rep(c("Industry", "Retail"), each = 5),
claim_amount = c(1000, 25000, 120000, 50000, 175000,
2000, 40000, 90000, 150000, 300000),
earned_exposure = rep(1, 10)
)
thresholds <- assess_excess_threshold(
data = claims,
claim_amount = "claim_amount",
thresholds = c(25000, 50000, 100000, 150000),
exposure = "earned_exposure",
group = "sector"
)
autoplot(thresholds, y = "premium_impact")
Autoplot for bootstrap_performance objects
Description
autoplot() method for objects created by bootstrap_performance().
Produces a histogram and density plot of the bootstrapped RMSE values,
with the RMSE of the original fitted model shown as a dashed vertical line.
Optionally, 95% quantile bounds are shown as dotted vertical lines.
Usage
## S3 method for class 'bootstrap_performance'
autoplot(object, fill = "#E6E6E6", color = NA, ...)
Arguments
object |
An object of class |
fill |
Fill color of the histogram bars. Default = |
color |
Border color of the histogram bars. Default = |
... |
Additional arguments passed to |
Value
A ggplot2::ggplot object.
Author(s)
Martin Haringa
Examples
## Not run:
mod1 <- glm(nclaims ~ age_policyholder, data = MTPL,
offset = log(exposure), family = poisson())
x <- bootstrap_performance(mod1, MTPL, n_resamples = 100,
show_progress = FALSE)
autoplot(x)
## End(Not run)
Autoplot for check_residuals objects
Description
autoplot() method for objects created by check_residuals().
Produces a simulation-based uniform QQ-plot of the residuals, with the
Kolmogorov-Smirnov p-value shown in the subtitle.
Optionally prints a message about whether deviations are detected.
Usage
## S3 method for class 'check_residuals'
autoplot(object, show_message = TRUE, max_points = 1000, ...)
Arguments
object |
An object of class |
show_message |
Logical. If TRUE (default), prints a short message based on the p-value from the KS test. |
max_points |
Maximum number of QQ-plot points to display. If the
residual check contains more points, an evenly spaced subset is shown. Use
|
... |
Additional arguments passed to |
Value
A ggplot2::ggplot object.
Author(s)
Martin Haringa
Plot an excess-loss allocation
Description
Visualise the allocated excess loading, allocated excess loss or credibility by allocation group.
Usage
## S3 method for class 'excess_loss_allocation'
autoplot(
object,
y = c("allocated_loading", "allocated_excess_loss", "credibility"),
top_n = NULL,
show_labels = FALSE,
...
)
Arguments
object |
An object returned by |
y |
Character. Measure to plot on the y-axis. |
top_n |
Optional positive whole number. If supplied, only the largest
|
show_labels |
Logical. If |
... |
Unused. |
Value
A ggplot object.
Author(s)
Martin Haringa
Plot an excess threshold assessment
Description
Visualise one diagnostic from an object returned by
assess_excess_threshold(). The plot helps compare how candidate thresholds
affect excess loss, excess claim counts or pure-premium impact.
Usage
## S3 method for class 'excess_threshold_assessment'
autoplot(
object,
y = c("premium_impact", "excess_loss", "n_excess_claims", "excess_loss_ratio"),
...
)
Arguments
object |
An object returned by |
y |
Character. Measure to plot on the y-axis. |
... |
Unused. |
Value
A ggplot object.
Author(s)
Martin Haringa
Automatically create a ggplot for objects obtained from factor analysis
Description
Takes an object produced by factor_analysis() or univariate()
(deprecated NSE interface) and plots the available statistics.
Usage
## S3 method for class 'factor_analysis'
autoplot(
object,
metrics = NULL,
ncol = 1,
show_exposure = TRUE,
show_exposure_labels = TRUE,
sort_by_exposure = FALSE,
level_order = NULL,
decimal_mark = ",",
line_color = NULL,
bar_fill = NULL,
label_width = 50,
flip_bars = FALSE,
show_total = FALSE,
total_color = NULL,
total_name = NULL,
rotate_angle = NULL,
custom_theme = NULL,
remove_underscores = FALSE,
compact_x_axis = TRUE,
show_plots = NULL,
background = NULL,
labels = NULL,
sort = NULL,
sort_manual = NULL,
dec.mark = NULL,
color = NULL,
color_bg = NULL,
coord_flip = NULL,
remove_x_elements = NULL,
...
)
Arguments
object |
A |
metrics |
Numeric or character vector specifying which metrics to plot (default is all available metrics). The numeric positions are:
Character values can be |
ncol |
Number of columns in output (default = 1). |
show_exposure |
Show exposure as background bars behind line plots (default = TRUE). |
show_exposure_labels |
Show labels with the exposure bars (default = TRUE). |
sort_by_exposure |
Sort risk factor levels into descending order by exposure (default = FALSE). |
level_order |
Custom order for risk factor levels; character vector (default = NULL). |
decimal_mark |
Decimal mark; defaults to |
line_color |
Optional override for line/point color. If NULL (default), colors are taken from the internal palette. If specified, the chosen color is applied to all line-based plots. |
bar_fill |
Optional override for background bar color. If NULL (default), the background color is taken from the internal palette. If specified, the chosen color is applied to all background bars. |
label_width |
Width of labels on the x-axis (default = 10). |
flip_bars |
Logical. If |
show_total |
Show line for total if |
total_color |
Color for total line (default = |
total_name |
Legend name for total line (default = NULL). |
rotate_angle |
Numeric value for angle of labels on the x-axis (degrees). |
custom_theme |
List with customized theme options. |
remove_underscores |
Logical; remove underscores from labels (default = FALSE). |
compact_x_axis |
Logical. When
This prevents duplicated x-axes in vertically stacked patchwork plots.
Defaults to |
show_plots |
Deprecated. Use |
background |
Deprecated alias for |
labels |
Deprecated alias for |
sort |
Deprecated alias for |
sort_manual |
Deprecated alias for |
dec.mark |
Deprecated alias for |
color |
Deprecated alias for |
color_bg |
Deprecated alias for |
coord_flip |
Deprecated alias for |
remove_x_elements |
Deprecated alias for |
... |
Other plotting parameters. |
Value
A ggplot2 object.
Author(s)
Marc Haine, Martin Haringa
Examples
## --- New usage (SE, recommended) ---
x <- factor_analysis(MTPL2,
x = "area",
severity = "amount",
nclaims = "nclaims",
exposure = "exposure")
autoplot(x)
## --- Deprecated usage (NSE) ---
x_old <- univariate(MTPL2, x = area, severity = amount,
nclaims = nclaims, exposure = exposure)
autoplot(x_old)
Plot a model refinement step
Description
Takes a rating_refinement object and plots one refinement step before
refit() is called. This is useful for checking whether manual tariff
restrictions, smoothing or expert-based relativities behave as intended
before they are used in a refined pricing model.
For objects produced by add_relativities(), original levels that are split
into new levels are removed from the connected original line and from the
x-axis. Instead, the original level is shown as a horizontal blue segment
spanning all child categories, with the original level label centred above
the segment.
Usage
## S3 method for class 'rating_refinement'
autoplot(
object,
variable = NULL,
step = NULL,
remove_underscores = FALSE,
rotate_angle = NULL,
custom_theme = NULL,
...
)
Arguments
object |
Object of class |
variable |
Optional character string specifying the risk factor to plot.
If |
step |
Optional integer specifying which refinement step to plot.
This is mainly relevant when multiple refinement steps have been applied
(e.g. multiple calls to
This makes it possible to inspect intermediate refinement stages before
calling |
remove_underscores |
Logical; if |
rotate_angle |
Optional numeric value for the angle of x-axis labels. |
custom_theme |
Optional list passed to |
... |
Additional plotting arguments passed to ggplot2 geoms. |
Value
A ggplot2 object.
Author(s)
Martin Haringa
Plot risk factor effects from rating_table() results
Description
Create a ggplot visualisation of a rating_table object produced by
rating_table(). Estimates are plotted per risk factor, with optional
exposure bars. Observed portfolio experience can be added first with
add_observed_experience().
When observed experience is attached, it is plotted as an additional line.
The scaling is controlled by add_observed_experience().
Usage
## S3 method for class 'rating_table'
autoplot(
object,
risk_factors = NULL,
ncol = 1,
show_exposure_labels = TRUE,
decimal_mark = ",",
y_label = "Relativity",
bar_fill = NULL,
model_color = NULL,
use_linetype = FALSE,
rotate_angle = NULL,
custom_theme = NULL,
remove_underscores = FALSE,
labels = NULL,
dec.mark = NULL,
ylab = NULL,
fill = NULL,
color = NULL,
linetype = NULL,
...
)
Arguments
object |
A |
risk_factors |
Character vector specifying which risk factors to plot. Defaults to all risk factors. |
ncol |
Number of columns in the patchwork layout. Default is 1. |
show_exposure_labels |
Logical; if |
decimal_mark |
Character; decimal separator, either |
y_label |
Character; label for the y-axis. Default is |
bar_fill |
Fill color for the exposure bars. If |
model_color |
Optional override for model line colors. If |
use_linetype |
Logical; if |
rotate_angle |
Numeric value for angle of labels on the x-axis (degrees). |
custom_theme |
List with customised theme options. |
remove_underscores |
Logical; remove underscores from labels. |
labels |
Deprecated alias for |
dec.mark |
Deprecated alias for |
ylab |
Deprecated alias for |
fill |
Deprecated alias for |
color |
Deprecated alias for |
linetype |
Deprecated alias for |
... |
Additional arguments passed to ggplot2 layers. |
Value
A ggplot/patchwork object.
Autoplot for GAM objects from risk_factor_gam()
Description
Generates a ggplot2 visualization of a fitted GAM created with
risk_factor_gam(). The plot shows the fitted curve, and optionally confidence
intervals and observed data points.
Usage
## S3 method for class 'riskfactor_gam'
autoplot(
object,
confidence = FALSE,
color_gam = "steelblue",
show_observations = FALSE,
x_stepsize = NULL,
size_points = 1,
color_points = "black",
rotate_labels = FALSE,
remove_outliers = NULL,
conf_int = NULL,
...
)
Arguments
object |
An object of class |
confidence |
Logical. If |
color_gam |
Color for the fitted GAM line, specified by name (e.g.,
|
show_observations |
Logical. If |
x_stepsize |
Numeric. Step size for tick marks on the x-axis. If
|
size_points |
Numeric. Point size for observed data. Default is |
color_points |
Color for the observed data points. Default is |
rotate_labels |
Logical. If |
remove_outliers |
Numeric. If specified, observations greater than this threshold are omitted from the plot. |
conf_int |
Deprecated. Use |
... |
Additional arguments passed to underlying |
Value
A ggplot object representing the fitted GAM.
Author(s)
Martin Haringa
Examples
## Not run:
library(ggplot2)
fit <- risk_factor_gam(MTPL,
risk_factor = "age_policyholder",
claim_count = "nclaims",
exposure = "exposure")
autoplot(fit, show_observations = TRUE)
## End(Not run)
Autoplot for tariff segment objects
Description
autoplot() method for objects created by derive_tariff_segments().
Produces a ggplot2::ggplot() of the fitted GAM together with the derived
tariff segment boundaries. Optionally, confidence intervals and observed data
points
can be added.
Usage
## S3 method for class 'tariff_segments'
autoplot(
object,
confidence = FALSE,
color_gam = "steelblue",
show_observations = FALSE,
color_splits = "grey50",
size_points = 1,
color_points = "black",
rotate_labels = FALSE,
remove_outliers = NULL,
conf_int = NULL,
...
)
Arguments
object |
An object of class |
confidence |
Logical, whether to plot 95% confidence intervals.
Default = |
color_gam |
Color of the fitted GAM line. Default = |
show_observations |
Logical, whether to add observed data points for each
level of the risk factor. Default = |
color_splits |
Color of the vertical split lines. Default = |
size_points |
Numeric, size of points if |
color_points |
Color of observed points. Default = |
rotate_labels |
Logical, whether to rotate x-axis labels by 45 degrees.
Default = |
remove_outliers |
Numeric, exclude observations above this value from
the plot (helps with extreme outliers). Default = |
conf_int |
Deprecated. Use |
... |
Additional arguments passed to |
Value
A ggplot2::ggplot object.
Author(s)
Martin Haringa
Plot a fitted truncated severity distribution
Description
Creates a plot of the empirical cumulative distribution function (ECDF) of the observed truncated claim amounts together with the fitted truncated CDF.
Usage
## S3 method for class 'truncated_severity'
autoplot(
object,
ecdf_geom = c("point", "step"),
x_label = NULL,
y_label = NULL,
y_limits = c(0, 1),
x_limits = NULL,
show_title = TRUE,
digits = 2,
truncation_digits = 2,
geom_ecdf = NULL,
xlab = NULL,
ylab = NULL,
ylim = NULL,
xlim = NULL,
print_title = NULL,
print_dig = NULL,
print_trunc = NULL,
...
)
Arguments
object |
An object produced by |
ecdf_geom |
Character string indicating how to display the empirical
CDF. Must be one of |
x_label |
Title of the x axis. Defaults to |
y_label |
Title of the y axis. Defaults to |
y_limits |
Numeric vector of length 2 specifying y-axis limits. |
x_limits |
Optional numeric vector of length 2 specifying x-axis limits. |
show_title |
Logical. If |
digits |
Integer. Number of digits for parameter estimates in the subtitle. |
truncation_digits |
Integer. Number of digits used for truncation bounds. |
geom_ecdf, xlab, ylab, ylim, xlim, print_title, print_dig, print_trunc |
Deprecated argument names kept for backward compatibility. |
... |
Currently unused. |
Details
The plot compares the empirical distribution of the observed, truncated claim severities with the fitted distribution conditional on the same truncation interval. This is a visual check of whether the selected severity distribution is plausible for the part of the portfolio that is actually observed.
Value
A ggplot2 object.
Author(s)
Martin Haringa
Deprecated alias for set_reference_level()
Description
biggest_reference() is deprecated as of version 0.9.0. Use
set_reference_level() instead.
Usage
biggest_reference(x, weight)
Arguments
x |
A factor. |
weight |
A numeric vector of the same length as |
Value
Bootstrapped model performance
Description
Generate repeated train/evaluation samples to compute model performance. Currently, the supported metric is root mean squared error (RMSE).
Usage
bootstrap_performance(
model,
data,
n_resamples = 50,
sample_fraction = 1,
metric = "rmse",
sampling = c("bootstrap", "split"),
show_progress = TRUE,
rmse_model = NULL,
n = NULL,
frac = NULL
)
Arguments
model |
A fitted model object. |
data |
Data used to fit the model object. |
n_resamples |
Integer. Number of resampling replicates. Default = 50. |
sample_fraction |
Fraction of the data used in the training sample. Must
be in |
metric |
Character. Performance metric to compute. Currently only
|
sampling |
Character. Sampling scheme. |
show_progress |
Logical. Show progress bar during bootstrap iterations. Default = TRUE. |
rmse_model |
Optional numeric RMSE of the fitted (original) model. If NULL (default), it is computed automatically. |
n, frac |
Deprecated argument names. Use |
Details
To test the predictive stability of a fitted model it can be helpful to assess the variation in a performance metric. The variation is calculated by refitting the model on repeated samples and storing the resulting metric values.
If
sample_fraction = 1, the metric is evaluated on the sampled training data.If
sample_fraction < 1, the metric is evaluated on rows that were not used for training.
Character columns and factor columns are converted to factors with levels taken from the full input data before resampling. For factor variables used in the model, the training sample is augmented when needed so every observed level is represented at least once. This prevents prediction failures when a level is present in the evaluation data but absent from a particular training sample.
Value
An object of class "bootstrap_performance", which is a list with components:
- rmse_bs
Numeric vector with
n_resamplesbootstrap RMSE values.- rmse_mod
Root mean squared error for the original fitted model.
- metric
Metric name.
- sampling
Sampling scheme.
Author(s)
Martin Haringa
Examples
## Not run:
mod1 <- glm(nclaims ~ age_policyholder, data = MTPL,
offset = log(exposure), family = poisson())
# Use all records
x <- bootstrap_performance(mod1, MTPL, n_resamples = 80,
show_progress = FALSE)
print(x)
autoplot(x)
# Use 80% of records and evaluate on the remaining records
x_frac <- bootstrap_performance(mod1, MTPL, n_resamples = 50,
sample_fraction = .8, sampling = "split",
show_progress = FALSE)
autoplot(x_frac)
## End(Not run)
Deprecated alias for bootstrap_performance()
Description
bootstrap_rmse() is deprecated in favour of bootstrap_performance().
Objects returned by bootstrap_rmse() keep class "bootstrap_rmse" for
backward compatibility and also inherit from "bootstrap_performance".
Usage
bootstrap_rmse(
model,
data,
n = 50,
frac = 1,
metric = "rmse",
sampling = c("bootstrap", "split"),
show_progress = TRUE,
rmse_model = NULL
)
Arguments
model |
A fitted model object. |
data |
Data used to fit the model object. |
n |
Deprecated. Use |
frac |
Deprecated. Use |
metric |
Character. Performance metric to compute. Currently only
|
sampling |
Character. Sampling scheme. |
show_progress |
Logical. Show progress bar during bootstrap iterations. Default = TRUE. |
rmse_model |
Optional numeric RMSE of the fitted (original) model. If NULL (default), it is computed automatically. |
Value
Decompose claim amounts into capped and excess parts
Description
Large claims can distort risk-factor relativities and make pricing models
unstable. calculate_excess_loss() separates each claim into a capped part
and an excess part above a selected threshold.
Usage
calculate_excess_loss(data, claim_amount, threshold)
Arguments
data |
A data.frame with claim-level observations. |
claim_amount |
Character string. Claim amount column. |
threshold |
Positive numeric scalar. Claims above this value contribute to the excess component. Claims below the threshold remain fully included in the capped claim amount. |
Details
The capped claim amount can be used to model the base premium, while the excess component can be analysed, pooled or allocated separately. This allows the impact of large individual claims to be controlled without ignoring the associated cost.
The function is deliberately deterministic. It does not perform smoothing, credibility weighting, allocation or simulation. It simply decomposes each observed claim into:
claim\_amount =
capped\_claim\_amount +
excess\_claim\_amount
where:
excess\_claim\_amount =
max(claim\_amount - threshold, 0)
and:
capped\_claim\_amount =
min(claim\_amount, threshold)
The resulting excess component can subsequently be allocated using
allocate_excess_loss() and added back to the technical premium using
apply_excess_loading().
Typical pricing workflow
A common workflow is:
Select an excess threshold.
Split claims into capped and excess components.
Model frequency and severity using capped claim amounts.
Allocate the excess-loss burden separately.
Add the resulting excess loading back to the technical premium.
This approach reduces the influence of a small number of large claims on risk-factor relativities while ensuring that the total cost of excess losses remains reflected in the final premium.
Value
A data.frame with the original data and the columns
claim_amount, capped_claim_amount, excess_claim_amount and
is_excess_claim.
Author(s)
Martin Haringa
Examples
claims <- data.frame(
claim_amount = c(1000, 120000, 30000)
)
calculate_excess_loss(
claims,
claim_amount = "claim_amount",
threshold = 100000
)
Check overdispersion of a Poisson claim frequency model
Description
Tests whether a fitted Poisson GLM shows overdispersion using Pearson's chi-squared statistic.
Usage
check_overdispersion(object)
Arguments
object |
A fitted model of class |
Details
In Poisson claim frequency models, the variance is assumed to be equal to the mean. A dispersion ratio above 1 indicates that the observed variation is larger than expected under that assumption. In pricing work this can be a useful diagnostic signal for omitted heterogeneity, clustering, outliers, or model misspecification. It does not automatically mean that the model is unusable.
A dispersion ratio close to 1 is broadly consistent with the Poisson variance assumption.
A dispersion ratio above 1 suggests overdispersion.
A p-value below 0.05 indicates statistically significant overdispersion.
Value
An object of class "overdispersion_check" and "overdispersion",
which is a list with elements:
- pearson_chisq
Pearson's chi-squared statistic.
- dispersion_ratio
Dispersion ratio, calculated as Pearson's chi-squared statistic divided by residual degrees of freedom.
- residual_df
Residual degrees of freedom.
- p_value
P-value from the chi-squared test.
For backwards compatibility the object also contains the aliases chisq,
ratio, rdf, and p.
Author(s)
Martin Haringa
References
Bolker B. et al. (2017). GLMM FAQ
See also: performance::check_overdispersion().
Examples
x <- glm(nclaims ~ area, offset = log(exposure),
family = poisson(), data = MTPL2)
check_overdispersion(x)
Check simulation-based model residuals
Description
Checks whether a fitted model shows systematic residual deviations from the
distribution implied by the model. The function uses simulation-based
residuals from DHARMa::simulateResiduals(), which are especially useful for
GLMs where classical residual plots can be hard to interpret.
Usage
check_residuals(object, n_simulations = 30)
Arguments
object |
A fitted |
n_simulations |
Number of simulations used to generate residuals. Must be a positive whole number. Default is 30. |
Details
In insurance pricing, residual checks are used to assess whether a model is behaving consistently across the portfolio. For example, a Poisson frequency model may fit the average claim count well but still show structure in the residuals because of omitted rating factors, unmodelled heterogeneity, clustering, outliers, or an unsuitable distributional assumption.
DHARMa simulates new responses from the fitted model and compares the
observed response with those simulations. The resulting scaled residuals are
approximately uniformly distributed on [0, 1] when the model is correctly
specified. This gives a common diagnostic scale for GLMs and related models,
where raw residuals are otherwise difficult to compare across different
fitted values, exposures, or expected claim amounts.
check_residuals() returns the scaled residuals, QQ-plot data, and a
Kolmogorov-Smirnov p-value for a simple uniformity check. The p-value should
be read as a diagnostic signal, not as a pricing decision rule. A low p-value
indicates that the residual distribution differs from what the fitted model
implies and that the model specification may need review.
Value
An object of class "residual_check" and "check_residuals",
which is a list with:
- qq_data
Data frame with theoretical quantiles (
x) and observed scaled residuals (y).- scaled_residuals
Numeric vector of DHARMa scaled residuals.
- p_value
P-value from a Kolmogorov-Smirnov test against
uniform(0, 1).
For backwards compatibility the object also contains the aliases df and
p.val.
Author(s)
Martin Haringa
References
Dunn, K. P., & Smyth, G. K. (1996). Randomized quantile residuals. JCGS, 5, 1–10.
Gelman, A., & Hill, J. (2006). Data analysis using regression and multilevel/hierarchical models. Cambridge University Press.
Hartig, F. (2020). DHARMa: Residual Diagnostics for Hierarchical (Multi-Level / Mixed) Regression Models. R package version 0.3.0. https://CRAN.R-project.org/package=DHARMa
Examples
## Not run:
m1 <- glm(nclaims ~ area, offset = log(exposure),
family = poisson(), data = MTPL2)
cr <- check_residuals(m1, n_simulations = 50)
autoplot(cr)
## End(Not run)
Deprecated alias for rating_grid()
Description
construct_model_points() is deprecated in favour of rating_grid().
Usage
construct_model_points(
x,
group_by = NULL,
exposure = NULL,
exposure_by = NULL,
aggregate_cols = NULL,
drop_na = FALSE,
group_vars = NULL,
agg_cols = NULL
)
Arguments
x |
A |
group_by |
Optional character vector with the variables that define the
rating-grid points. If |
exposure |
Optional character; name of the exposure column to aggregate. |
exposure_by |
Optional character; name of a column used to split exposure or counts, for example a year variable. |
aggregate_cols |
Optional character vector with additional numeric
columns to aggregate using |
drop_na |
Logical; if |
group_vars, agg_cols |
Deprecated argument names. Use |
Value
See rating_grid().
Deprecated alias for derive_tariff_segments()
Description
construct_tariff_classes() is deprecated as of version 0.9.0. Use
derive_tariff_segments() instead.
Usage
construct_tariff_classes(
object,
complexity = 0,
max_iterations = 10000,
population_size = 200,
seed = 1,
alpha = NULL,
niterations = NULL,
ntrees = NULL
)
Arguments
object |
An object of class |
complexity |
Numeric. Controls the complexity penalty used when deriving segments. Higher values generally yield fewer tariff segments. Default = 0. |
max_iterations |
Integer. Maximum number of search iterations used by the underlying grouping algorithm. Default = 10000. |
population_size |
Integer. Number of candidate trees used by the underlying grouping algorithm. Default = 200. |
seed |
Integer, seed for the random number generator (for reproducibility). |
alpha |
Deprecated. Use |
niterations |
Deprecated. Use |
ntrees |
Deprecated. Use |
Value
Default extrapolation break size based on existing tariff breaks
Description
Uses the median width of existing break intervals as a robust scale-aware default for extrapolation discretisation.
Usage
default_extrapolation_break_size(borders_model, factor = 1)
Arguments
borders_model |
A data.frame with columns |
factor |
Numeric scalar > 0. Multiplier applied to the median break width. |
Value
Numeric scalar (> 0).
Derive insurance tariff segments
Description
Derives data-driven tariff segments for a continuous risk factor from a fitted
"riskfactor_gam" object produced by risk_factor_gam(). The segments help
translate a smooth GAM response pattern into practical categorical rating
factors for a GLM tariff.
Usage
derive_tariff_segments(
object,
complexity = 0,
max_iterations = 10000,
population_size = 200,
seed = 1,
alpha = NULL,
niterations = NULL,
ntrees = NULL
)
Arguments
object |
An object of class |
complexity |
Numeric. Controls the complexity penalty used when deriving segments. Higher values generally yield fewer tariff segments. Default = 0. |
max_iterations |
Integer. Maximum number of search iterations used by the underlying grouping algorithm. Default = 10000. |
population_size |
Integer. Number of candidate trees used by the underlying grouping algorithm. Default = 200. |
seed |
Integer, seed for the random number generator (for reproducibility). |
alpha |
Deprecated. Use |
niterations |
Deprecated. Use |
ntrees |
Deprecated. Use |
Details
Evolutionary trees (via evtree::evtree()) are used as a technique to bin the
fitted GAM object into candidate tariff segments.
This method is based on the work by Henckaerts et al. (2018).
See Grubinger et al. (2014) for details on the parameters controlling the
evtree fit.
Value
A list of class "tariff_segments" with components:
- gam_prediction
Data frame with the fitted GAM curve.
- risk_factor
Name of the continuous risk factor.
- model_type
Model type:
"frequency","severity", or"pure_premium".- classification_data
Data frame used to derive the segments.
- risk_factor_values
Observed risk factor values in portfolio row order.
- segment_boundaries
Numeric vector with segment boundaries.
- assigned_segments
Factor with the tariff segment assigned to each observed risk factor value.
For backward compatibility, the old components prediction, x, model,
data, x_obs, splits, class_boundaries, assigned_groups, and
tariff_classes are also returned.
Author(s)
Martin Haringa
References
Antonio, K. and Valdez, E. A. (2012). Statistical concepts of a priori and a posteriori risk classification in insurance. Advances in Statistical Analysis, 96(2), 187–224. doi:10.1007/s10182-011-0152-7
Grubinger, T., Zeileis, A., and Pfeiffer, K.-P. (2014). evtree: Evolutionary learning of globally optimal classification and regression trees in R. Journal of Statistical Software, 61(1), 1–29. doi:10.18637/jss.v061.i01
Henckaerts, R., Antonio, K., Clijsters, M., & Verbelen, R. (2018). A data driven binning strategy for the construction of insurance tariff classes. Scandinavian Actuarial Journal, 2018(8), 681–705. doi:10.1080/03461238.2018.1429300
Wood, S.N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. JRSS B, 73(1), 3–36. doi:10.1111/j.1467-9868.2010.00749.x
Examples
## Not run:
library(dplyr)
# Recommended new usage (SE)
age_segments <- risk_factor_gam(MTPL,
risk_factor = "age_policyholder",
claim_count = "nclaims",
exposure = "exposure") |>
derive_tariff_segments()
MTPL |>
add_tariff_segments(age_segments, name = "age_policyholder_segment")
## End(Not run)
Edit an existing smoothing step in a refinement workflow
Description
Manually adjusts a smoothing step that was previously added with
add_smoothing(). This is intended for actuarial review of a smoothed tariff
curve, for example to flatten an unstable segment, align the end points of an
interval, or add extra control points where expert judgement should guide the
curve.
The adjusted smoothing is applied when refit() is called.
Usage
edit_smoothing(
model,
model_variable = NULL,
step = NULL,
from,
to,
from_value = NULL,
to_value = NULL,
control_positions = NULL,
control_values = NULL,
allow_extrapolation = FALSE,
extrapolation_step = NULL
)
Arguments
model |
Object of class |
model_variable |
Character string. The |
step |
Optional numeric index of the smoothing step to edit. |
from, to |
Numeric values giving the start and end of the source-variable interval to modify. |
from_value, to_value |
Optional numeric values used to override the
smoothed curve value at |
control_positions, control_values |
Optional numeric vectors of equal length. These define additional points that the edited smoothing curve should pass through. |
allow_extrapolation |
Logical. Whether edits may extend beyond the observed source-variable range. |
extrapolation_step |
Optional positive numeric scalar used to set the spacing of extra break points when extrapolation is allowed. |
Details
Use model_variable or step to identify the smoothing step to edit. The
interval from from to to defines the part of the source variable range
that should be changed. from_value and to_value can be used to force the
curve values at the interval boundaries. control_positions and
control_values add additional points that the edited curve should follow
inside the interval.
Value
Object of class rating_refinement.
Author(s)
Martin Haringa
Examples
set.seed(42)
driver_age <- rep(seq(20, 59), each = 4)
exposure <- rep(1, length(driver_age))
age_band <- cut(
driver_age,
breaks = c(18, 30, 40, 50, 60),
include.lowest = TRUE
)
expected_claims <- exp(
-1.7 + 0.018 * (driver_age - 20) + 0.0006 * (driver_age - 40)^2
)
portfolio <- data.frame(
claims = rpois(length(driver_age), exposure * expected_claims),
exposure = exposure,
driver_age = driver_age,
age_band = age_band
)
model <- glm(
claims ~ age_band + offset(log(exposure)),
family = poisson(),
data = portfolio
)
refined <- prepare_refinement(model, data = portfolio) |>
add_smoothing(
model_variable = "age_band",
source_variable = "driver_age",
breaks = c(18, 30, 40, 50, 60),
degree = 2,
weights = "exposure"
) |>
edit_smoothing(
model_variable = "age_band",
from = 30,
to = 50,
from_value = 1.00,
to_value = 1.10,
control_positions = c(40),
control_values = c(1.05)
)
refined_model <- refit(refined)
Extract model data
Description
extract_model_data() retrieves the modelling data and metadata from fitted
models. It works for objects of class "glm", as well as objects produced by
refitting procedures ("refitsmooth" or "refitrestricted").
model_data() is kept as a deprecated compatibility wrapper.
Usage
extract_model_data(x)
Arguments
x |
An object of class |
Details
For GLM objects, the function returns the model data and attaches attributes with the response, rating factors, terms object, and any weights or offsets.
For refit objects, the function removes auxiliary columns used during smoothing or restriction and attaches attributes with rating factors, merged smooths, restrictions, and offsets.
Value
A data.frame of class "model_data" with additional attributes:
-
response— response variable in the model; -
rf— names of risk factors in the model; -
offweights— weight and offset variables if present; -
terms— model terms object for plain GLMs; -
mgd_rst,mgd_smt— merged restrictions/smooths for refit objects; -
new_nm,old_nm— new and old column names for refit objects.
Author(s)
Martin Haringa
Examples
## Not run:
library(insurancerating)
pmodel <- glm(
breaks ~ wool + tension,
data = warpbreaks,
family = poisson(link = "log")
)
extract_model_data(pmodel)
## End(Not run)
Factor analysis for discrete risk factors
Description
Performs a factor analysis for discrete risk factors in an insurance portfolio. The following summary statistics are calculated:
frequency = number of claims / exposure
average severity = severity / number of claims
risk premium = severity / exposure
loss ratio = severity / premium
average premium = premium / exposure
Usage
factor_analysis(
data = NULL,
risk_factors = NULL,
claim_amount = NULL,
claim_count = NULL,
exposure = NULL,
premium = NULL,
group_by = NULL,
df = NULL,
x = NULL,
severity = NULL,
nclaims = NULL,
by = NULL
)
Arguments
data |
A |
risk_factors |
Character vector: column(s) in |
claim_amount |
Character, column in |
claim_count |
Character, column in |
exposure |
Character, column in |
premium |
Character, column in |
group_by |
Character vector of column(s) in |
df, x, severity, nclaims, by |
Deprecated argument names. Use |
Details
The function computes summary statistics for discrete risk factors.
-
Frequency: number of claims / exposure
-
Average severity: severity / number of claims
-
Risk premium: severity / exposure
-
Loss ratio: severity / premium
-
Average premium: premium / exposure
If one or more input arguments are not specified, the related statistics are omitted from the results.
Migration from univariate()
The function univariate() is deprecated as of version 0.8.0 and replaced by
factor_analysis(). In addition to the name change, the interface has also
changed:
-
univariate()used non-standard evaluation (NSE), so column names could be passed unquoted (e.g.x = area). -
factor_analysis()uses standard evaluation (SE), so column names must be passed as character strings (e.g.x = "area").
This makes the function easier to use in programmatic workflows.
univariate() is still available for backward compatibility but will emit a
deprecation warning and will be removed in a future release.
Value
An object of class "factor_analysis" and "univariate" with
summary statistics.
Author(s)
Martin Haringa
Examples
## --- New usage (SE) ---
factor_analysis(MTPL2,
risk_factors = "area",
claim_amount = "amount",
claim_count = "nclaims",
exposure = "exposure",
premium = "premium")
## --- Deprecated usage (NSE) ---
univariate(MTPL2,
x = area,
severity = amount,
nclaims = nclaims,
exposure = exposure,
premium = premium)
Deprecated alias for fisher_classify()
Description
fisher() is deprecated as of version 0.8.0.
Usage
fisher(x, n = 7, diglab = 2)
Arguments
x |
Numeric vector to classify. |
n |
Number of classes. |
diglab |
Deprecated. Use |
Value
See fisher_classify().
Fisher's natural breaks classification
Description
fisher_classify() is deprecated as of version 0.8.0 because Fisher-Jenks
classification is not directly linked to the insurance rating workflow.
Classifies a continuous numeric vector into intervals using Fisher-Jenks natural breaks. Useful for choropleth mapping or other applications where grouped ranges are required.
Usage
fisher_classify(x, n = 7, dig.lab = NULL, diglab = NULL)
Arguments
x |
A numeric vector to be classified. |
n |
Integer. Number of classes to generate (default = 7). |
dig.lab |
Integer. Number of significant digits to use for interval labels (default = 2). |
diglab |
Deprecated. Use |
Details
The "fisher" style uses the algorithm proposed by Fisher (1958), commonly
referred to as the Fisher-Jenks algorithm. This function is a thin wrapper
around classInt::classIntervals().
The argument diglab is deprecated and will be removed in a future version.
Value
A factor indicating the interval to which each element of x
belongs.
Author(s)
Martin Haringa
References
Bivand, R. (2018). classInt: Choose Univariate Class Intervals. R package version 0.2-3. https://CRAN.R-project.org/package=classInt
Fisher, W. D. (1958). On grouping for maximum homogeneity. Journal of the American Statistical Association, 53, pp. 789–798. doi:10.1080/01621459.1958.10501479
Examples
set.seed(1)
x <- rnorm(100)
fisher_classify(x, n = 5)
Deprecated NSE wrapper for risk_factor_gam()
Description
fit_gam() is deprecated as of version 0.8.0.
Please use risk_factor_gam() instead.
In addition, note that column arguments must now be passed as strings (standard evaluation).
Usage
fit_gam(
data,
nclaims,
x,
exposure,
amount = NULL,
pure_premium = NULL,
model = "frequency",
round_x = NULL
)
Arguments
data |
A data.frame containing the insurance portfolio. |
nclaims |
Deprecated NSE argument for claim counts. |
x |
Deprecated NSE argument for the continuous risk factor. |
exposure |
Character, name of column in |
amount |
Deprecated NSE argument for claim amounts. |
pure_premium |
(Optional) Character, column name in |
model |
Character string specifying the model type. One of
|
round_x |
Deprecated. Use |
Value
See risk_factor_gam().
Deprecated alias for fit_truncated_severity()
Description
fit_truncated_dist() is deprecated as of version 0.9.0. Use
fit_truncated_severity() instead.
Usage
fit_truncated_dist(
losses = NULL,
distribution = c("gamma", "lognormal"),
lower_truncation = NULL,
upper_truncation = NULL,
start_values = NULL,
print_initial = TRUE,
n_variants = 1,
n_shape_grid = 8,
n_scale_grid = 8,
show_progress = FALSE,
show_summary = TRUE,
y = NULL,
dist = NULL,
left = NULL,
right = NULL,
start = NULL,
trace = NULL,
report = NULL
)
Arguments
losses |
Numeric vector with observed claim severities. |
distribution |
Severity distribution to fit: |
lower_truncation |
Numeric lower truncation point. Claims at or below
this value are assumed not to be present in |
upper_truncation |
Numeric upper truncation point. Claims at or above
this value are assumed not to be present in |
start_values |
Optional named list of starting values. If |
print_initial |
Deprecated logical retained for backward compatibility. |
n_variants |
Controls how many local variations around base starts are used. |
n_shape_grid |
Number of grid points for gamma shape. |
n_scale_grid |
Number of grid points for gamma scale. |
show_progress |
Logical. If |
show_summary |
Logical. If |
y, dist, left, right, start, trace, report |
Deprecated argument names kept for backward compatibility. |
Value
Fit severity distributions to truncated claim data
Description
Estimate an underlying claim severity distribution when the observed claims
are truncated.
Usage
fit_truncated_severity(
losses = NULL,
distribution = c("gamma", "lognormal"),
lower_truncation = NULL,
upper_truncation = NULL,
start_values = NULL,
print_initial = TRUE,
n_variants = 1,
n_shape_grid = 8,
n_scale_grid = 8,
show_progress = FALSE,
show_summary = TRUE,
y = NULL,
dist = NULL,
left = NULL,
right = NULL,
start = NULL,
trace = NULL,
report = NULL
)
Arguments
losses |
Numeric vector with observed claim severities. |
distribution |
Severity distribution to fit: |
lower_truncation |
Numeric lower truncation point. Claims at or below
this value are assumed not to be present in |
upper_truncation |
Numeric upper truncation point. Claims at or above
this value are assumed not to be present in |
start_values |
Optional named list of starting values. If |
print_initial |
Deprecated logical retained for backward compatibility. |
n_variants |
Controls how many local variations around base starts are used. |
n_shape_grid |
Number of grid points for gamma shape. |
n_scale_grid |
Number of grid points for gamma scale. |
show_progress |
Logical. If |
show_summary |
Logical. If |
y, dist, left, right, start, trace, report |
Deprecated argument names kept for backward compatibility. |
Details
In insurance pricing, severity models are often fitted on claim amounts that are not observed over the full range of possible losses. Small claims may be absent because of a deductible, reporting threshold, or data extraction rule. Very large claims may be capped, excluded, or modelled separately as large losses. A standard gamma or lognormal fit on the remaining observed claims treats that truncated sample as if it were complete, which can bias the estimated severity distribution.
fit_truncated_severity() fits the distribution conditional on the claim being
observed within the truncation interval. This means the fitted likelihood
uses the density divided by the probability mass between lower_truncation
and upper_truncation. The function is intended for truncation, where
claims outside the interval are absent from the data. This differs from
censoring, where claims outside a limit are still observed but their exact
amount is not known.
Observed losses must lie strictly inside the truncation interval. Values outside the interval indicate that the bounds do not describe the data and therefore produce an error.
Value
An object of class c("truncated_severity", "truncated_dist", "fitdist"). The object
contains the fitted distribution parameters from fitdistrplus::fitdist()
and additional attributes:
- truncated_vec
The observed losses used for fitting.
- lower_truncation, upper_truncation
The truncation bounds.
- fit_attempts
Metadata for each attempted start combination.
- n_attempts, n_success, n_failed
Fit attempt counts.
- best_attempt_index
Index of the selected start combination.
Examples
## Not run:
observed <- MTPL2$amount[MTPL2$amount > 500 & MTPL2$amount < 10000]
fit <- fit_truncated_severity(
losses = observed,
distribution = "gamma",
lower_truncation = 500,
upper_truncation = 10000
)
autoplot(fit)
## End(Not run)
Deprecated alias for outlier_histogram()
Description
histbin() is deprecated as of version 0.8.0.
Please use outlier_histogram() instead.
In addition, note that x must now be passed as string
(standard evaluation).
Usage
histbin(
data,
x,
left = NULL,
right = NULL,
line = FALSE,
bins = 30,
fill = "#E6E6E6",
color = "white",
fill_outliers = "#F28E2B"
)
Arguments
data |
A data.frame containing the portfolio variable to inspect. |
x |
Character; numeric column in |
left, right |
Deprecated aliases for |
line |
Deprecated alias for |
bins |
Integer. Number of bins used for the displayed range. Default = 30. |
fill, color, fill_outliers |
Deprecated aliases for |
Value
See outlier_histogram().
Convert p-values into significance stars
Description
Convert p-values into significance stars
Usage
make_stars(pval)
Arguments
pval |
Numeric vector of p-values. |
Value
Character vector of the same length as pval,
containing "***", "**", "*", ".", or "".
Reduce portfolio periods by merging adjacent date ranges
Description
Merges overlapping or nearly adjacent policy periods within portfolio groups.
Usage
merge_date_ranges(
df,
...,
period_start = NULL,
period_end = NULL,
group_by = NULL,
aggregate_cols = NULL,
aggregate_fun = "sum",
merge_gap_days = 5,
begin = NULL,
end = NULL,
agg_cols = NULL,
agg = NULL,
min.gapwidth = NULL
)
Arguments
df |
A |
period_start |
Character string. Name of the column with period start dates. |
period_end |
Character string. Name of the column with period end dates. |
group_by |
Character vector with columns that identify the portfolio entity or rating segment within which date ranges should be merged. |
aggregate_cols |
Character vector with numeric columns to aggregate over merged ranges, for example premium or exposure. |
aggregate_fun |
Aggregation function or function name. Defaults to
|
merge_gap_days |
Non-negative whole number. Ranges with a gap smaller than this number of days are merged. Defaults to 5. |
begin, end, ..., agg_cols, agg, min.gapwidth |
Deprecated NSE argument names kept for backward compatibility. |
Details
Insurance portfolio extracts often contain multiple rows for the same policy or risk because of renewals, endorsements, product changes, or short administrative gaps. Before calculating portfolio in/outflow, active exposure windows, or policy counts, it can be useful to reduce those rows to stable coverage intervals.
merge_date_ranges() merges date ranges within each group_by combination.
Ranges with a gap smaller than merge_gap_days are treated as one continuous
interval. If aggregate_cols is supplied, those columns are aggregated over
the merged interval.
Value
A data.table of class "reduce", with attributes:
-
begin— name of the period-start column -
end— name of the period-end column -
cols— grouping columns
Author(s)
Martin Haringa
Examples
portfolio <- data.frame(
policy_nr = rep("12345", 11),
productgroup= rep("fire", 11),
product = rep("contents", 11),
begin_dat = as.Date(c(16709,16740,16801,17410,17440,17805,17897,
17956,17987,18017,18262), origin="1970-01-01"),
end_dat = as.Date(c(16739,16800,16831,17439,17531,17896,17955,
17986,18016,18261,18292), origin="1970-01-01"),
premium = c(89,58,83,73,69,94,91,97,57,65,55)
)
# Merge periods
pt1 <- merge_date_ranges(
portfolio,
period_start = "begin_dat",
period_end = "end_dat",
group_by = c("policy_nr", "productgroup", "product"),
merge_gap_days = 5
)
# Aggregate per period
summary(pt1, period = "days", policy_nr, productgroup, product)
# Merge periods and sum premium per period
pt2 <- merge_date_ranges(
portfolio,
period_start = "begin_dat",
period_end = "end_dat",
group_by = c("policy_nr", "productgroup", "product"),
aggregate_cols = "premium",
merge_gap_days = 5
)
# Create summary with aggregation per week
summary(pt2, period = "weeks", policy_nr, productgroup, product)
Deprecated alias for extract_model_data()
Description
model_data() is deprecated in favour of extract_model_data().
Usage
model_data(x)
Arguments
x |
An object of class |
Value
See extract_model_data().
Performance of fitted GLMs
Description
Computes model performance indices for one or more fitted GLMs.
Usage
model_performance(...)
Arguments
... |
One or more objects of class |
Details
The following indices are reported:
- AIC
Akaike's Information Criterion.
- BIC
Bayesian Information Criterion.
- RMSE
Root mean squared error, computed from observed and predicted values.
This function is adapted from performance::model_performance().
Value
A data frame of class "model_performance", with columns:
- Model
Name of the model object as passed to the function.
- AIC
AIC value.
- BIC
BIC value.
- RMSE
Root mean squared error.
Author(s)
Martin Haringa
Examples
m1 <- glm(nclaims ~ area, offset = log(exposure), family = poisson(),
data = MTPL2)
m2 <- glm(nclaims ~ area + premium, offset = log(exposure), family = poisson(),
data = MTPL2)
model_performance(m1, m2)
Portfolio histogram with tail bins
Description
Visualize the distribution of a numeric portfolio variable while keeping extreme tails readable.
Insurance portfolios often contain skewed variables such as claim amounts,
premium, exposure, insured sums, deductibles, or fitted premiums. A few very
large policies or claim events can stretch a regular histogram so much that
the body of the portfolio becomes hard to inspect. outlier_histogram()
keeps the main range visible and groups values below lower or above upper
into dedicated tail bins.
The plot is useful for actuarial portfolio checks, data quality review, and model preparation: it helps show where most risks are concentrated while still making the presence of extreme observations explicit.
Usage
outlier_histogram(
data,
x,
lower = NULL,
upper = NULL,
density = FALSE,
bins = 30,
bar_fill = "#E6E6E6",
bar_color = "white",
tail_fill = "#F28E2B",
tail_color = "white",
density_color = "#2C7FB8",
left = NULL,
right = NULL,
line = NULL,
fill = NULL,
color = NULL,
fill_outliers = NULL
)
Arguments
data |
A data.frame containing the portfolio variable to inspect. |
x |
Character; numeric column in |
lower |
Optional numeric lower threshold. Values below this threshold are grouped into one left-tail bin. |
upper |
Optional numeric upper threshold. Values above this threshold are grouped into one right-tail bin. |
density |
Logical. If |
bins |
Integer. Number of bins used for the displayed range. Default = 30. |
bar_fill |
Fill color for regular histogram bars. |
bar_color |
Border color for regular histogram bars. |
tail_fill |
Fill color for tail bins. |
tail_color |
Border color for tail bins. |
density_color |
Color for the optional density line. |
left, right |
Deprecated aliases for |
line |
Deprecated alias for |
fill, color, fill_outliers |
Deprecated aliases for |
Details
This function is intended as an exploratory portfolio diagnostic. It does not
remove or winsorize observations in data; it only groups tail values in the
visual display. The labels on the tail bins show the original range captured
by each tail bin.
The method for handling outlier bins is based on https://edwinth.github.io/blog/outlier-bin/.
Value
A ggplot2::ggplot object.
Author(s)
Martin Haringa
Examples
# Inspect the full premium distribution
outlier_histogram(MTPL2, "premium")
# Keep the portfolio body readable while showing both tails
outlier_histogram(MTPL2, "premium", lower = 30, upper = 120, bins = 30)
Deprecated alias for split_periods_to_months()
Description
period_to_months() is deprecated as of version 0.8.0. Use
split_periods_to_months() instead.
Usage
period_to_months(df, begin, end, ...)
Arguments
df |
A |
begin |
Deprecated NSE argument. Use |
end |
Deprecated NSE argument. Use |
... |
Deprecated NSE columns to prorate. Use |
Value
See split_periods_to_months().
Exploratory severity diagnostics by category
Description
Visualise individual claim amounts overall or per risk factor.
Average claim amounts can be misleading because a small number of large
losses may dominate the mean. plot_severity_distribution() shows the full
claim amount distribution, usually on a log scale, together with mean and
median claim amount markers. If risk_factor is supplied, the distribution
is shown per level of that risk factor. If risk_factor = NULL, the function
shows the overall claim amount distribution. This makes heavy tails, clusters
of small claims, spread differences, extreme losses and distributional shape
visible in a way that average severity alone cannot.
The function is intended for exploratory severity diagnostics in pricing
analysis, portfolio diagnostics, tariff notes, exploratory segmentation
analysis and severity model validation. It uses standard evaluation: pass
column names as character strings through claim_amount and risk_factor.
If threshold is supplied, claims above the threshold are highlighted in
"firebrick" and a dotted threshold line is added. Claims at or below the
threshold remain light grey. Direct labels for the mean, median and optional
threshold are added with ggrepel when show_labels = TRUE; ggrepel is a
suggested package and is not imported as a hard dependency.
Usage
plot_severity_distribution(
data,
claim_amount,
risk_factor = NULL,
top_n = 10,
min_claims = 20,
sort = c("median", "mean", "n_claims"),
threshold = NULL,
mean = TRUE,
median = TRUE,
distribution = c("none", "half_violin", "violin"),
point_method = c("quasirandom", "jitter", "none"),
orientation = c("horizontal", "vertical"),
log_scale = TRUE,
boxplot = FALSE,
boxplot_width = 0.06,
show_labels = TRUE,
all_claims_label = "All claims",
mean_label = "Mean",
median_label = "Median",
threshold_label = "Threshold",
x_label = NULL,
y_label = NULL,
point_alpha = 0.16,
point_size = 0.75,
point_width = 0.15
)
Arguments
data |
A |
claim_amount |
Character string. Name of the claim amount column. |
risk_factor |
Optional character string. Name of the risk factor used
to split the severity distribution. If |
top_n |
Positive whole number. Number of categories to keep after filtering and sorting. |
min_claims |
Positive whole number. Categories with fewer than this number of claim observations are removed. |
sort |
Character. Metric used to sort and select categories. One of
|
threshold |
Optional numeric scalar. If supplied, claims above this threshold are highlighted and a dotted threshold line is shown. |
mean |
Logical. If |
median |
Logical. If |
distribution |
Character. Distribution layer. One of |
point_method |
Character. Point placement method. One of
|
orientation |
Character. |
log_scale |
Logical. If |
boxplot |
Logical. If |
boxplot_width |
Numeric scalar. Width of the optional boxplot. Smaller values keep the boxplot as a subtle summary layer behind the individual claim points. |
show_labels |
Logical. If |
all_claims_label |
Character string used as the category label when
|
mean_label |
Character string used for the direct mean marker label.
Default is |
median_label |
Character string used for the direct median marker label.
Default is |
threshold_label |
Character string used for the optional threshold label. |
x_label |
Optional character string. X-axis label. If |
y_label |
Optional character string. Y-axis label. If |
point_alpha |
Numeric alpha for raw claim points. |
point_size |
Numeric point size for raw claim points. |
point_width |
Numeric spread for raw claim points. |
Value
A ggplot object. The plot can be extended with regular ggplot2
syntax, for example + ggplot2::labs(caption = "...") or
+ ggplot2::theme(...).
Author(s)
Martin Haringa
Examples
x <- plot_severity_distribution(
MTPL,
claim_amount = "amount",
risk_factor = "zip",
top_n = 4,
min_claims = 20,
point_method = "jitter",
show_labels = FALSE
)
print(x)
x_threshold <- plot_severity_distribution(
MTPL,
claim_amount = "amount",
risk_factor = NULL,
threshold = 10000,
min_claims = 20,
point_method = "jitter",
show_labels = FALSE
)
Prepare a model refinement workflow
Description
Start a refinement workflow for a fitted GLM. Refinement steps such as
smoothing, restrictions and expert-based relativities can be added
sequentially and are only applied once refit() is called.
Usage
prepare_refinement(model, data = NULL)
Arguments
model |
Object of class |
data |
Optional data.frame with the same rows and model variables as the
fitted GLM. If |
Value
Object of class rating_refinement.
Print an excess-loss allocation
Description
Compact print method for objects returned by allocate_excess_loss().
Usage
## S3 method for class 'excess_loss_allocation'
print(x, ...)
Arguments
x |
An object returned by |
... |
Unused. |
Value
Invisibly returns x.
Author(s)
Martin Haringa
Print an excess threshold assessment
Description
Compact print method for objects returned by assess_excess_threshold().
Usage
## S3 method for class 'excess_threshold_assessment'
print(x, ...)
Arguments
x |
An object returned by |
... |
Unused. |
Value
Invisibly returns x.
Author(s)
Martin Haringa
Deprecated alias for rating_table()
Description
rating_factors() is deprecated as of version 0.8.0. Use
rating_table() instead.
Usage
rating_factors(
...,
model_data = NULL,
exposure = TRUE,
exposure_name = NULL,
signif_stars = FALSE,
exponentiate = TRUE,
round_exposure = 0
)
Arguments
... |
glm object(s) produced by |
model_data |
Optional data.frame used to create the model(s). If |
exposure |
Logical or character. If |
exposure_name |
Deprecated. Use |
signif_stars |
Deprecated. Use |
exponentiate |
logical indicating whether or not to exponentiate the coefficient estimates. Defaults to TRUE. |
round_exposure |
number of digits for exposure (defaults to 0) |
Value
See rating_table().
Deprecated single-model rating table helper
Description
Legacy interface. Prefer rating_table() for fitted models in the new
workflow, but this function remains available.
Usage
rating_factors2(
model,
model_data = NULL,
exposure = TRUE,
exposure_name = NULL,
colname = "estimate",
exponentiate = TRUE,
round_exposure = 0
)
Arguments
model |
glm object produced by |
model_data |
Optional data.frame used to create glm object. If |
exposure |
Logical or character. If |
exposure_name |
Optional name for the exposure column in the output. |
colname |
name of coefficient column |
exponentiate |
logical indicating whether or not to exponentiate the coefficient estimates. Defaults to TRUE. |
round_exposure |
number of digits for exposure (defaults to 0) |
Value
A data frame with rating factor coefficients for one model.
Construct observed rating-grid points from model data or a data frame
Description
rating_grid() constructs rating-grid points by collapsing rows with
identical combinations of grouping variables to a single row.
The function returns only combinations that are actually observed in the input data. It does not create the full Cartesian product of all unique values. This keeps the output compact and suitable for model diagnostics, portfolio summaries, and prediction analysis.
When x is an object returned by extract_model_data(), the function uses
the extracted model metadata to determine the grouping variables if
group_by is not supplied. When x is a plain data.frame, it is
recommended to supply group_by explicitly.
Usage
rating_grid(
x,
group_by = NULL,
exposure = NULL,
exposure_by = NULL,
aggregate_cols = NULL,
drop_na = FALSE,
group_vars = NULL,
agg_cols = NULL
)
Arguments
x |
A |
group_by |
Optional character vector with the variables that define the
rating-grid points. If |
exposure |
Optional character; name of the exposure column to aggregate. |
exposure_by |
Optional character; name of a column used to split exposure or counts, for example a year variable. |
aggregate_cols |
Optional character vector with additional numeric
columns to aggregate using |
drop_na |
Logical; if |
group_vars, agg_cols |
Deprecated argument names. Use |
Details
The implementation uses base R only. Output is always a regular
data.frame, not a tibble or data.table.
If exposure_by is supplied, exposure or row counts are split across levels
of that variable and returned in wide format, for example
"exposure_2020" or "count_2020".
For objects returned by extract_model_data(), refinement mappings are joined
by their original factor column. They are not cross-joined onto every row.
Value
A data.frame with one row per observed rating-grid point.
Author(s)
Martin Haringa
Examples
## Not run:
rating_grid(mtcars, group_by = c("cyl", "vs"))
rating_grid(
mtcars,
group_by = c("cyl", "vs"),
exposure = "disp",
exposure_by = "gear",
aggregate_cols = "mpg"
)
pmodel <- glm(
breaks ~ wool + tension,
data = warpbreaks,
family = poisson(link = "log")
)
pmodel |>
extract_model_data() |>
rating_grid()
## End(Not run)
Build rating tables from fitted pricing models
Description
rating_table() extracts model coefficients in tariff-table form. It adds
the reference level for factor variables, can exponentiate GLM coefficients
into relativities, and can add exposure by risk-factor level when the model
data are available.
rating_table() is intended for fitted models:
plain
glmobjectsmodels obtained after
refit()models obtained after
refit_glm()
For pre-refit objects (rating_refinement, restricted, smooth) use
print(), summary() and autoplot() instead.
Usage
rating_table(
...,
model_data = NULL,
exposure = TRUE,
exposure_output = NULL,
exponentiate = TRUE,
significance = FALSE,
round_exposure = 0,
exposure_name = NULL,
signif_stars = NULL
)
Arguments
... |
glm object(s) produced by |
model_data |
Optional data.frame used to create the model(s). If |
exposure |
Logical or character. If |
exposure_output |
Optional name for the exposure column in the output.
If |
exponentiate |
logical indicating whether or not to exponentiate the coefficient estimates. Defaults to TRUE. |
significance |
Logical; if |
round_exposure |
number of digits for exposure (defaults to 0) |
exposure_name |
Deprecated. Use |
signif_stars |
Deprecated. Use |
Value
Object of class "rating_table" and legacy class "riskfactor".
Deprecated alias for merge_date_ranges()
Description
reduce() is deprecated as of version 0.8.0. Use
merge_date_ranges() instead.
Usage
reduce(df, begin, end, ..., agg_cols = NULL, agg = "sum", min.gapwidth = 5)
Arguments
df |
A |
begin |
Deprecated NSE argument. Use |
end |
Deprecated NSE argument. Use |
... |
Deprecated NSE grouping columns. Use |
agg_cols |
Deprecated NSE argument. Use |
agg |
Deprecated. Use |
min.gapwidth |
Deprecated. Use |
Value
See merge_date_ranges().
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
- ggplot2
Refit a prepared refinement workflow
Description
Applies the refinement steps stored in a rating_refinement object and
returns a refitted GLM. This is the final step in the refinement workflow
after prepare_refinement(), add_smoothing(), add_restriction() or
add_relativities() have been used to define the proposed tariff structure.
Usage
refit(object, intercept_only = FALSE, ...)
Arguments
object |
Object of class |
intercept_only |
Logical. If |
... |
Additional arguments passed to |
Details
Refinement steps are not applied to the fitted model immediately. They are
collected on the rating_refinement object so they can be inspected first,
for example with autoplot.rating_refinement(). refit() then applies the
steps in order, updates the model formula and data, and calls stats::glm()
with the original model family and any additional arguments passed through
....
With intercept_only = FALSE, the refined GLM is fitted with the remaining
free model terms that are still present after applying the refinement steps.
With intercept_only = TRUE, remaining original model effects are fixed as
offsets based on the existing fitted relativities. The refit then estimates
only the intercept. This can be useful when the relative tariff structure
should remain fixed and only the overall premium level should be recalibrated.
Value
A refitted object of class glm. The returned model also stores
attributes used by rating_table() and rating_grid() to recognise
refined rating factors, fixed relativities and smoothing metadata.
Author(s)
Martin Haringa
Examples
zip_df <- data.frame(
zip = c(0, 1, 2, 3),
zip_adj = c(0.8, 0.9, 1.0, 1.2)
)
model <- glm(
nclaims ~ zip + offset(log(exposure)),
family = poisson(),
data = MTPL
)
refined_model <- prepare_refinement(model) |>
add_restriction(zip_df) |>
refit(intercept_only = TRUE)
Deprecated refit wrapper
Description
refit_glm() is deprecated as of version 0.9.0. Use refit() instead.
Usage
refit_glm(x, intercept_only = FALSE, ...)
Arguments
x |
Object of class |
intercept_only |
Logical. |
... |
Other arguments. |
Value
Object of class glm.
Combine multiple level splits into relativities
Description
Helper function to combine multiple level split definitions into a single
named list suitable for use in add_relativities().
Usage
relativities(...)
Arguments
... |
One or more objects created by |
Value
A named list of data.frames suitable for the relativities
argument in add_relativities().
Examples
relativities(
split_level("construction",
c("residential", "commercial", "civil"),
c(1.00, 1.10, 1.25))
)
Deprecated restriction helper
Description
restrict_coef() is deprecated as of version 0.9.0. Use
add_restriction() instead.
prepare_refinement(model) |> add_restriction(...) |> refit()
Usage
restrict_coef(model, restrictions)
Arguments
model |
A fitted model object. |
restrictions |
data.frame with exactly two columns. |
Value
A legacy restricted object. New code should use
prepare_refinement(), add_restriction(), and refit().
See Also
add_restriction(), prepare_refinement(), refit()
Generate random samples from a truncated gamma distribution
Description
Generates random observations from a gamma distribution
truncated to the interval (lower, upper) using inverse transform
sampling.
Usage
rgammat(n, shape, scale, lower, upper)
Arguments
n |
Integer. Number of observations to generate. |
shape |
Numeric. Shape parameter of the gamma distribution. |
scale |
Numeric. Scale parameter of the gamma distribution. |
lower |
Numeric. Lower truncation bound. |
upper |
Numeric. Upper truncation bound. |
Details
Random values are generated by sampling from a uniform distribution on the
interval [F(lower), F(upper)], where F is the CDF of the gamma
distribution, and then applying the inverse CDF.
This approach ensures that the generated values follow the truncated distribution exactly.
The implementation is based on the inverse transform method as described in: https://www.r-bloggers.com/2020/08/generating-data-from-a-truncated-distribution/
Value
A numeric vector of length n containing random draws from the
truncated gamma distribution.
Author(s)
Martin Haringa
Fit a GAM for a continuous risk factor
Description
Fits a generalized additive model (GAM) to a continuous risk factor in one of three insurance pricing contexts: claim frequency, claim severity, or pure premium. The fitted curve helps assess non-linear rating effects before a continuous variable is grouped into tariff segments or used in a GLM workflow.
Usage
risk_factor_gam(
data,
risk_factor = NULL,
claim_count = NULL,
exposure = NULL,
claim_amount = NULL,
pure_premium = NULL,
model = "frequency",
round_risk_factor = NULL,
x = NULL,
nclaims = NULL,
amount = NULL,
round_x = NULL
)
Arguments
data |
A data.frame containing the insurance portfolio. |
risk_factor |
Character, name of column in |
claim_count |
Character, name of column in |
exposure |
Character, name of column in |
claim_amount |
(Optional) Character, column name in |
pure_premium |
(Optional) Character, column name in |
model |
Character string specifying the model type. One of
|
round_risk_factor |
(Optional) Numeric value to round the risk factor to
a multiple of |
x, nclaims, amount, round_x |
Deprecated argument names. Use |
Details
-
Frequency model: Fits a Poisson GAM to the number of claims. The log of the exposure is used as an offset so the expected number of claims is proportional to exposure.
-
Severity model: Fits a Gamma GAM with log link to the average claim size (total amount divided by number of claims). The number of claims is included as a weight.
-
Pure premium model: Fits a Gamma GAM with log link to the pure premium (risk premium). Implemented by aggregating exposure-weighted pure premiums. The deprecated model value
"burning"is still accepted for backward compatibility.
Migration from fit_gam()
The function fit_gam() is deprecated as of version 0.8.0 and replaced by
risk_factor_gam(). In addition to the name change, the interface has also
changed:
-
fit_gam()used non-standard evaluation (NSE), so column names could be passed unquoted (e.g.x = age_policyholder). -
risk_factor_gam()uses standard evaluation (SE), so column names must be passed as character strings (e.g.risk_factor = "age_policyholder").
This makes the function easier to use in programmatic workflows.
riskfactor_gam() and fit_gam() are still available for backward
compatibility but will emit deprecation warnings.
Value
A list of class "riskfactor_gam" with the following elements:
prediction |
A data frame with predicted values and confidence intervals. |
x |
Name of the continuous risk factor. |
model |
The model type: |
data |
Merged data frame with predictions and observed values. |
x_obs |
Observed values of the continuous risk factor. |
Author(s)
Martin Haringa
References
Antonio, K. and Valdez, E. A. (2012). Statistical concepts of a priori and a posteriori risk classification in insurance. Advances in Statistical Analysis, 96(2):187–224.
Henckaerts, R., Antonio, K., Clijsters, M. and Verbelen, R. (2018). A data driven binning strategy for the construction of insurance tariff classes. Scandinavian Actuarial Journal, 2018:8, 681–705.
Wood, S.N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. Journal of the Royal Statistical Society (B) 73(1):3–36.
Examples
## --- Recommended new usage (SE) ---
# Column names must be passed as strings
risk_factor_gam(MTPL,
risk_factor = "age_policyholder",
claim_count = "nclaims",
exposure = "exposure")
## --- Deprecated usage (NSE) ---
# This still works but will show a warning
fit_gam(MTPL,
nclaims = nclaims,
x = age_policyholder,
exposure = exposure)
Deprecated alias for risk_factor_gam()
Description
riskfactor_gam() is deprecated in favour of risk_factor_gam().
Usage
riskfactor_gam(
data,
nclaims = NULL,
x = NULL,
exposure = NULL,
amount = NULL,
pure_premium = NULL,
model = "frequency",
round_x = NULL,
risk_factor = NULL,
claim_count = NULL,
claim_amount = NULL,
round_risk_factor = NULL
)
Arguments
data |
A data.frame containing the insurance portfolio. |
nclaims |
Deprecated. Use |
x |
Deprecated. Use |
exposure |
Character, name of column in |
amount |
Deprecated. Use |
pure_premium |
(Optional) Character, column name in |
model |
Character string specifying the model type. One of
|
round_x |
Deprecated. Use |
risk_factor |
Character, name of column in |
claim_count |
Character, name of column in |
claim_amount |
(Optional) Character, column name in |
round_risk_factor |
(Optional) Numeric value to round the risk factor to
a multiple of |
Value
See risk_factor_gam().
Generate random samples from a truncated lognormal distribution
Description
Generates random observations from a lognormal distribution
truncated to the interval (lower, upper) using inverse transform
sampling.
Usage
rlnormt(n, meanlog, sdlog, lower, upper)
Arguments
n |
Integer. Number of observations to generate. |
meanlog |
Numeric. Mean of the underlying normal distribution. |
sdlog |
Numeric. Standard deviation of the underlying normal distribution. |
lower |
Numeric. Lower truncation bound. |
upper |
Numeric. Upper truncation bound. |
Details
Random values are generated by sampling from a uniform distribution on the
interval [F(lower), F(upper)], where F is the CDF of the
lognormal distribution, and then applying the inverse CDF.
This approach ensures that the generated values follow the truncated distribution exactly.
The implementation is based on the inverse transform method as described in: https://www.r-bloggers.com/2020/08/generating-data-from-a-truncated-distribution/
Value
A numeric vector of length n containing random draws from the
truncated lognormal distribution.
Author(s)
Martin Haringa
Root Mean Squared Error (RMSE)
Description
Computes the root mean squared error (RMSE) for a fitted model, defined as the square root of the mean of squared differences between predictions and observed values.
Usage
rmse(x, data = NULL)
Arguments
x |
A fitted model object (e.g. of class |
data |
A data frame containing the variables used in the model. Required
if not already stored in |
Details
The RMSE indicates the absolute fit of the model to the data. It can be interpreted as the standard deviation of the unexplained variance, and is expressed in the same units as the response variable. Lower values indicate better model fit.
Value
A numeric value: the root mean squared error.
Author(s)
Martin Haringa
Examples
x <- glm(nclaims ~ area, offset = log(exposure),
family = poisson(), data = MTPL2)
rmse(x, MTPL2)
Deprecated alias for active_rows_by_date()
Description
rows_per_date() is deprecated as of version 0.9.0. Use
active_rows_by_date() instead.
Usage
rows_per_date(
df,
dates,
df_begin,
df_end,
dates_date,
...,
nomatch = NULL,
mult = "all"
)
Arguments
df |
Deprecated. Use |
dates |
A |
df_begin |
Deprecated NSE argument. Use |
df_end |
Deprecated NSE argument. Use |
dates_date |
Deprecated NSE argument. Use |
... |
Deprecated NSE join columns. Use |
nomatch |
When a row (with interval say, |
mult |
When multiple rows in y match to the row in x, |
Value
Scale secondary axis for background plotting
Description
Internal helper to rescale a secondary variable (s_axis) so it aligns
with the scale of a first variable (f_axis). Adds two new columns
to the data frame: s_axis_scale and s_axis_print.
Usage
scale_second_axis(background, df, dfby, f_axis, s_axis, by)
Value
The input data frame with two additional columns:
-
s_axis_scale– scaled version ofs_axis -
s_axis_print– rounded version ofs_axis
Set the reference level of a factor
Description
Relevels a factor so that the selected category becomes the reference
(first) level. By default, the reference level is chosen as the level with
the largest total weight, for example the largest exposure in an insurance
portfolio. Use method = "manual" with reference_level when a specific
business category should be the reference level.
Usage
set_reference_level(
x,
weight = NULL,
method = "largest_weight",
reference_level = NULL
)
Arguments
x |
A factor (unordered). Character vectors should be converted to factor before use. |
weight |
A numeric vector of the same length as |
method |
Character. Method used to choose the reference level.
Supported methods are |
reference_level |
Character string with the level to use as reference
when |
Value
A factor of the same length as x, with the selected reference
level set as the first level.
Author(s)
Martin Haringa
References
Kaas, Rob & Goovaerts, Marc & Dhaene, Jan & Denuit, Michel. (2008). Modern Actuarial Risk Theory: Using R. doi:10.1007/978-3-540-70998-5
Examples
## Not run:
library(dplyr)
df <- chickwts |>
mutate(across(where(is.character), as.factor)) |>
mutate(across(where(is.factor), ~set_reference_level(., weight)))
set_reference_level(df$feed, method = "manual", reference_level = "casein")
## End(Not run)
Deprecated smoothing helper
Description
smooth_coef() is deprecated as of version 0.9.0. Use
add_smoothing() instead.
prepare_refinement(model) |> add_smoothing(...) |> refit()
Usage
smooth_coef(
model,
x_cut,
x_org,
degree = NULL,
breaks = NULL,
smoothing = "spline",
k = NULL,
weights = NULL
)
Arguments
model |
A fitted model object. |
x_cut |
Deprecated model variable used in the GLM. |
x_org |
Deprecated source variable used to fit the smoothing curve. |
degree |
Deprecated polynomial degree. |
breaks |
Deprecated smoothing break points. |
smoothing |
Deprecated smoothing type. |
k |
Deprecated spline basis dimension. |
weights |
Deprecated weights column. |
Value
A legacy smooth object. New code should use prepare_refinement(),
add_smoothing(), and refit().
See Also
add_smoothing(), prepare_refinement(), refit()
Define a level split with relativities
Description
Helper function to define how one level of a risk factor should be split
into sublevels with corresponding relativities. Intended for use inside
relativities() and add_relativities().
Usage
split_level(level, new_levels, relativities)
Arguments
level |
Character string. Existing level of the risk factor to split. |
new_levels |
Character vector. Names of the new sublevels. |
relativities |
Numeric vector. Relativities corresponding to each
sublevel. Must have the same length as |
Value
A named list of length 1, where the name is level and the
value is a data.frame with columns new_level and relativity.
Examples
split_level(
level = "construction",
new_levels = c("residential", "commercial", "civil"),
relativities = c(1.00, 1.10, 1.25)
)
Split policy periods into monthly rows
Description
Splits policy or exposure periods that cross calendar months into monthly rows. Numeric columns such as exposure or premium can be prorated over the resulting monthly rows.
This function uses standard evaluation (SE): column names must be passed
as character strings (e.g. period_start = "begin_date").
The older function period_to_months() used non-standard evaluation (NSE) and
is deprecated as of version 0.8.0.
Usage
split_periods_to_months(
df,
period_start = NULL,
period_end = NULL,
prorate_cols = NULL,
begin = NULL,
end = NULL,
cols = NULL
)
Arguments
df |
A |
period_start |
Character string. Name of the column with policy period start dates. |
period_end |
Character string. Name of the column with policy period end dates. |
prorate_cols |
Character vector with names of numeric columns to prorate over the monthly rows, for example exposure or premium. |
begin, end, cols |
Deprecated argument names kept for backward compatibility. |
Details
Rating and monitoring work often needs exposure, premium, claim counts, or policy counts by calendar month. Source portfolios, however, usually contain policy periods that start and end on arbitrary dates. This helper expands those periods into monthly rows before modelling, reporting, or joining to monthly portfolio summaries.
Prorated columns are distributed according to the part of the policy period that falls in each monthly row. Full months receive weight 1; partial months use a 30-day month convention. The total value of each prorated column is preserved per original row.
Value
A data.frame with the same columns as in df, plus an id column.
Author(s)
Martin Haringa
Examples
library(lubridate)
portfolio <- data.frame(
begin_date = ymd(c("2014-01-01", "2014-01-01")),
end_date = ymd(c("2014-03-14", "2014-05-10")),
exposure = c(0.2025, 0.3583),
premium = c(125, 150)
)
# New SE interface
split_periods_to_months(portfolio,
period_start = "begin_date",
period_end = "end_date",
prorate_cols = c("premium", "exposure")
)
# Old NSE interface (deprecated)
## Not run:
period_to_months(portfolio, begin_date, end_date, premium, exposure)
## End(Not run)
Construct a relativities mapping for level splitting
Description
Helper function to create a standardized data.frame defining relativities
for sublevels within a risk factor level. This function is intended to be
used as input for add_relativities().
Usage
split_relativities(new_levels, relativities)
Arguments
new_levels |
Character vector. Names of the new sublevels. |
relativities |
Numeric vector. Relativities corresponding to each
sublevel. Must have the same length as |
Details
This function provides a convenient and safe way to construct the required
input structure for add_relativities(). Each call defines how a single
level of a risk factor is split into multiple sublevels with corresponding
relativities.
Value
A data.frame with columns:
- new_level
Character. Name of the new sublevel.
- relativity
Numeric. Multiplicative factor relative to the base level.
Examples
split_relativities(
new_levels = c("residential", "commercial", "civil"),
relativities = c(1.00, 1.10, 1.25)
)
Summarise an excess-loss allocation
Description
Return the allocation audit table from an object produced by
allocate_excess_loss().
Usage
## S3 method for class 'excess_loss_allocation'
summary(object, compare_to_empirical = FALSE, ...)
Arguments
object |
An object returned by |
compare_to_empirical |
Logical. If |
... |
Unused. |
Value
A data.frame.
Author(s)
Martin Haringa
Summarise an excess threshold assessment
Description
Return the key threshold diagnostics from an
"excess_threshold_assessment" object.
Usage
## S3 method for class 'excess_threshold_assessment'
summary(object, ...)
Arguments
object |
An object returned by |
... |
Unused. |
Value
A data.frame.
Author(s)
Martin Haringa
Deprecated alias for factor_analysis()
Description
univariate() is deprecated as of version 0.8.0. Use
factor_analysis() instead.
Usage
univariate(
df,
x,
severity = NULL,
nclaims = NULL,
exposure = NULL,
premium = NULL,
by = NULL
)
Arguments
df |
A |
x |
Column name or expression with the risk factor. |
severity |
Column name or expression with claim amounts. |
nclaims |
Column name or expression with claim counts. |
exposure |
Column name or expression with exposures. |
premium |
Column name or expression with premiums. |
by |
Optional grouping column name or expression. |
Value
See factor_analysis().
Create new offset-term and new formula
Description
Create new offset-term and new formula
Usage
update_formula_add(offset_term, fm_no_offset, add_term)
Arguments
offset_term |
String obtained from get_offset() |
fm_no_offset |
Obtained from remove_offset_formula() |
add_term |
Name of restricted risk factor to add |
Deprecated alias for refit_glm()
Description
update_glm() is deprecated as of version 0.8.0. Use refit() for the new
refinement workflow.
Usage
update_glm(x, intercept_only = FALSE, ...)
Arguments
x |
Object of class |
intercept_only |
Logical. |
... |
Other arguments. |
Value
See refit_glm().