The expstudies package is meant to make analyzing life experience data easier. The package is best explained by example, so start by loading expstudies.
You will need to be able to manipulate data frames to work effectively with this package. I use dplyr from the tidyverse for this. We load dplyr along with magrittr for the “%>%” operator.
## Making exposures from records

Some synthetic data called “records” is included in the package. The data must have “key”, “start”, and “end” columns or the package will throw an error. The key column must also contain no duplicate values.
key | start | end | issue_age | gender |
---|---|---|---|---|
B10251C8 | 2010-04-10 | 2019-04-04 | 35 | M |
D68554D5 | 2005-01-01 | 2019-04-04 | 30 | F |
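
The requirements above can be checked before any package function is called. The following is a minimal sketch in base R; `check_records` and `records_example` are hypothetical names introduced here for illustration and are not part of expstudies, which performs its own validation.

```r
# Hypothetical helper, not part of expstudies: checks the documented
# requirements (key/start/end columns present, no duplicate keys).
check_records <- function(records) {
  required <- c("key", "start", "end")
  missing_cols <- setdiff(required, names(records))
  if (length(missing_cols) > 0) {
    stop("Missing required column(s): ", paste(missing_cols, collapse = ", "))
  }
  if (anyDuplicated(records$key) > 0) {
    stop("The key column contains duplicate values.")
  }
  invisible(TRUE)
}

# The two records shown above, rebuilt by hand for illustration.
records_example <- data.frame(
  key = c("B10251C8", "D68554D5"),
  start = as.Date(c("2010-04-10", "2005-01-01")),
  end = as.Date(c("2019-04-04", "2019-04-04")),
  issue_age = c(35, 30),
  gender = c("M", "F"),
  stringsAsFactors = FALSE
)

check_records(records_example)
```

Running the check on a data frame with a repeated key, or without a `start` column, stops with an error message naming the problem.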
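The addExposures function creates one row for each policy year between the start and end dates. We use 365.25 days as a full policy year, so a 365-day policy year has an exposure of 0.9993 and a 366-day policy year has an exposure of 1.002.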
key | duration | start_int | end_int | exposure |
---|---|---|---|---|
B10251C8 | 1 | 2010-04-10 | 2011-04-09 | 0.9993 |
B10251C8 | 2 | 2011-04-10 | 2012-04-09 | 1.002 |
B10251C8 | 3 | 2012-04-10 | 2013-04-09 | 0.9993 |
B10251C8 | 4 | 2013-04-10 | 2014-04-09 | 0.9993 |
B10251C8 | 5 | 2014-04-10 | 2015-04-09 | 0.9993 |
B10251C8 | 6 | 2015-04-10 | 2016-04-09 | 1.002 |
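
The exposure values in the table can be reproduced by hand. This is a small sketch that assumes exposure is the number of days in the interval, counting both endpoints, divided by 365.25. That assumption matches the numbers shown, but `interval_exposure` is a name invented here and is not the package's source code.

```r
# Exposure for one interval, counting both the start and end dates.
# Assumption: days in the interval divided by 365.25.
interval_exposure <- function(start_int, end_int) {
  as.numeric(end_int - start_int + 1) / 365.25
}

# Duration 1 of B10251C8: 365 days
dur_1 <- interval_exposure(as.Date("2010-04-10"), as.Date("2011-04-09"))

# Duration 2 of B10251C8: 366 days, because it contains 29 February 2012
dur_2 <- interval_exposure(as.Date("2011-04-10"), as.Date("2012-04-09"))

signif(c(dur_1, dur_2), 4)
```

The two values round to 0.9993 and 1.002, which is why exposures slightly above 1 appear in policy years that span a leap day.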
There is also an option for calculating monthly policy records in case we want to model skewness within policy years. This isn’t the default because a single record could result in hundreds of rows in the exposures data frame.
key | duration | policy_month | start_int | end_int | exposure |
---|---|---|---|---|---|
B10251C8 | 1 | 1 | 2010-04-10 | 2010-05-09 | 0.08214 |
B10251C8 | 1 | 2 | 2010-05-10 | 2010-06-09 | 0.08487 |
B10251C8 | 1 | 3 | 2010-06-10 | 2010-07-09 | 0.08214 |
B10251C8 | 1 | 4 | 2010-07-10 | 2010-08-09 | 0.08487 |
B10251C8 | 1 | 5 | 2010-08-10 | 2010-09-09 | 0.08487 |
B10251C8 | 1 | 6 | 2010-09-10 | 2010-10-09 | 0.08214 |
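
To see where the monthly figures come from, the first six policy months of B10251C8 can be rebuilt in base R. This is only an illustration under the same days / 365.25 assumption; it does not show how the package builds its intervals, and the variable names are made up for this sketch.

```r
# Month-anniversary dates starting at the policy start date.
# Note: seq() by "month" is only safe here because the day of month is 10;
# start days after the 28th can roll over into the following month.
anniversaries <- seq(as.Date("2010-04-10"), by = "month", length.out = 7)

start_int <- anniversaries[1:6]
end_int <- anniversaries[2:7] - 1  # day before the next anniversary

monthly_example <- data.frame(
  policy_month = 1:6,
  start_int = start_int,
  end_int = end_int,
  exposure = as.numeric(end_int - start_int + 1) / 365.25
)

signif(monthly_example$exposure, 4)
```

Months with 30 days give 0.08214 and months with 31 days give 0.08487, matching the table.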
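# Mortality/Lapse studies

Let’s set the exposure in the year of death to a full year and add a death indicator in the duration of death. For this example, the code treats the last duration of every record as the duration of death.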
exposures_mod <- exposures %>% group_by(key) %>% mutate(exposure_mod = if_else(duration == max(duration), 1, exposure), death_cnt = if_else(duration == max(duration), 1, 0)) %>% ungroup()
tail(exposures_mod, 4)
key | duration | start_int | end_int | exposure | exposure_mod | death_cnt |
---|---|---|---|---|---|---|
D68554D5 | 12 | 2016-01-01 | 2016-12-31 | 1.002 | 1.002 | 0 |
D68554D5 | 13 | 2017-01-01 | 2017-12-31 | 0.9993 | 0.9993 | 0 |
D68554D5 | 14 | 2018-01-01 | 2018-12-31 | 0.9993 | 0.9993 | 0 |
D68554D5 | 15 | 2019-01-01 | 2019-04-04 | 0.2574 | 1 | 1 |
Now we can aggregate by duration to calculate mortality rates.
duration | q |
---|---|
1 | 0 |
2 | 0 |
3 | 0 |
4 | 0 |
5 | 0 |
6 | 0 |
7 | 0 |
8 | 0 |
9 | 0.5002 |
10 | 0 |
11 | 0 |
12 | 0 |
13 | 0 |
14 | 0 |
15 | 1 |
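
The code that produced the rates above is not shown. One plausible way to get them, sketched here with dplyr, is to divide total deaths by total modified exposure within each duration. The data frame `exposures_slice` is a hand-built illustration, not package data: it holds duration 9 for both keys and the final duration of D68554D5, where 0.9993 is the 365-day exposure of that key's ninth policy year.

```r
library(dplyr)

# Hand-built slice of exposures_mod for illustration only.
exposures_slice <- tibble(
  key = c("B10251C8", "D68554D5", "D68554D5"),
  duration = c(9, 9, 15),
  exposure_mod = c(1, 0.9993, 1),
  death_cnt = c(1, 0, 1)
)

# Mortality rate per duration: deaths divided by modified exposure.
mortality_by_duration <- exposures_slice %>%
  group_by(duration) %>%
  summarise(q = sum(death_cnt) / sum(exposure_mod))

mortality_by_duration
```

For duration 9 this gives 1 / (1 + 0.9993), which rounds to the 0.5002 shown in the table: one death over roughly two policy years of exposure.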
## Adding additional information

We can add additional information by joining on our key.
exposures_mod <- exposures_mod %>% inner_join(select(records, key, issue_age, gender), by = "key")
head(exposures_mod)
key | duration | start_int | end_int | exposure | exposure_mod | death_cnt | issue_age | gender |
---|---|---|---|---|---|---|---|---|
B10251C8 | 1 | 2010-04-10 | 2011-04-09 | 0.9993 | 0.9993 | 0 | 35 | M |
B10251C8 | 2 | 2011-04-10 | 2012-04-09 | 1.002 | 1.002 | 0 | 35 | M |
B10251C8 | 3 | 2012-04-10 | 2013-04-09 | 0.9993 | 0.9993 | 0 | 35 | M |
B10251C8 | 4 | 2013-04-10 | 2014-04-09 | 0.9993 | 0.9993 | 0 | 35 | M |
B10251C8 | 5 | 2014-04-10 | 2015-04-09 | 0.9993 | 0.9993 | 0 | 35 | M |
B10251C8 | 6 | 2015-04-10 | 2016-04-09 | 1.002 | 1.002 | 0 | 35 | M |
Now we can calculate mortality by attained age, or by attained age and gender as in the table below.
attained_age | gender | q |
---|---|---|
41 | M | 0 |
42 | F | 0 |
42 | M | 0 |
43 | F | 0 |
43 | M | 1 |
44 | F | 1 |
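
The attained-age calculation is not shown in the text. The sketch below assumes `attained_age = issue_age + duration - 1`, which is consistent with the table (issue age 35 in duration 9 gives 43, issue age 30 in duration 15 gives 44) but is an inference rather than something the source states. The data frame `exposures_slice` is hand-built for illustration.

```r
library(dplyr)

# Hand-built rows with the joined issue_age and gender columns.
exposures_slice <- tibble(
  key = c("B10251C8", "D68554D5", "D68554D5"),
  duration = c(9, 14, 15),
  exposure_mod = c(1, 0.9993, 1),
  death_cnt = c(1, 0, 1),
  issue_age = c(35, 30, 30),
  gender = c("M", "F", "F")
)

# Assumed definition: age at the start of each policy year.
mortality_by_age_gender <- exposures_slice %>%
  mutate(attained_age = issue_age + duration - 1) %>%
  group_by(attained_age, gender) %>%
  summarise(q = sum(death_cnt) / sum(exposure_mod), .groups = "drop")

mortality_by_age_gender
```

Dropping `gender` from `group_by()` gives the rates by attained age alone.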
## Premium Pattern

We assume that the user has dated transactions with a key that corresponds to the key in the record file. Some simulated transactions come with the package.
key | trans_date | amt |
---|---|---|
B10251C8 | 2012-12-04 | 199 |
B10251C8 | 2013-12-28 | 197 |
B10251C8 | 2015-12-30 | 177 |
B10251C8 | 2019-05-07 | 192 |
B10251C8 | 2012-04-15 | 206 |
B10251C8 | 2019-04-02 | 220 |
The addStart function adds the start date of the appropriate exposure interval to the transactions.
start_int | key | trans_date | amt |
---|---|---|---|
2010-05-10 | B10251C8 | 2010-05-28 | 190 |
2010-06-10 | B10251C8 | 2010-07-04 | 189 |
2010-11-10 | B10251C8 | 2010-11-21 | 179 |
2011-04-10 | B10251C8 | 2011-05-08 | 210 |
2011-07-10 | B10251C8 | 2011-07-12 | 198 |
2012-01-10 | B10251C8 | 2012-01-14 | 194 |
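
Conceptually, each transaction is matched to the interval with the same key whose start and end dates bracket the transaction date. The base R sketch below illustrates that idea on a few hand-copied rows. It is not how addStart is implemented, and `intervals_example` and `trans_example` are names invented for this example.

```r
# First three monthly intervals of B10251C8, copied from the exposure table.
intervals_example <- data.frame(
  key = "B10251C8",
  start_int = as.Date(c("2010-04-10", "2010-05-10", "2010-06-10")),
  end_int = as.Date(c("2010-05-09", "2010-06-09", "2010-07-09")),
  stringsAsFactors = FALSE
)

# Two transactions copied from the table above.
trans_example <- data.frame(
  key = "B10251C8",
  trans_date = as.Date(c("2010-05-28", "2010-07-04")),
  amt = c(190, 189),
  stringsAsFactors = FALSE
)

# Pair every transaction with every interval of the same key, then keep
# only the pairs where the transaction date falls inside the interval.
candidates <- merge(trans_example, intervals_example, by = "key")
inside <- candidates$trans_date >= candidates$start_int &
  candidates$trans_date <= candidates$end_int
matched <- candidates[inside, c("start_int", "key", "trans_date", "amt")]
matched <- matched[order(matched$trans_date), ]

matched
```

Pairing every transaction with every interval grows quickly with the number of rows, so this is suitable only for illustration; in practice addStart does the matching for you.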
We can group and aggregate by key and start_int to get unique transaction rows corresponding to intervals in exposures_PM.
trans_to_join <- trans_with_interval %>% group_by(start_int, key) %>% summarise(premium = sum(amt))
head(trans_to_join)
start_int | key | premium |
---|---|---|
2005-06-01 | D68554D5 | 97 |
2005-10-01 | D68554D5 | 169 |
2005-12-01 | D68554D5 | 96 |
2006-01-01 | D68554D5 | 193 |
2006-02-01 | D68554D5 | 107 |
2006-03-01 | D68554D5 | 119 |
Then we can join this to the exposures using a left join without duplicating any exposures.
premium_study <- exposures_PM %>% left_join(trans_to_join, by = c("key", "start_int"))
head(premium_study, 10)
key | duration | policy_month | start_int | end_int | exposure | premium |
---|---|---|---|---|---|---|
B10251C8 | 1 | 1 | 2010-04-10 | 2010-05-09 | 0.08214 | NA |
B10251C8 | 1 | 2 | 2010-05-10 | 2010-06-09 | 0.08487 | 190 |
B10251C8 | 1 | 3 | 2010-06-10 | 2010-07-09 | 0.08214 | 189 |
B10251C8 | 1 | 4 | 2010-07-10 | 2010-08-09 | 0.08487 | NA |
B10251C8 | 1 | 5 | 2010-08-10 | 2010-09-09 | 0.08487 | NA |
B10251C8 | 1 | 6 | 2010-09-10 | 2010-10-09 | 0.08214 | NA |
B10251C8 | 1 | 7 | 2010-10-10 | 2010-11-09 | 0.08487 | NA |
B10251C8 | 1 | 8 | 2010-11-10 | 2010-12-09 | 0.08214 | 179 |
B10251C8 | 1 | 9 | 2010-12-10 | 2011-01-09 | 0.08487 | NA |
B10251C8 | 1 | 10 | 2011-01-10 | 2011-02-09 | 0.08487 | NA |
Use if_else to change the NA values produced by the join to zeros.
premium_study <- premium_study %>% mutate(premium = if_else(is.na(premium), 0, premium))
head(premium_study, 10)
key | duration | policy_month | start_int | end_int | exposure | premium |
---|---|---|---|---|---|---|
B10251C8 | 1 | 1 | 2010-04-10 | 2010-05-09 | 0.08214 | 0 |
B10251C8 | 1 | 2 | 2010-05-10 | 2010-06-09 | 0.08487 | 190 |
B10251C8 | 1 | 3 | 2010-06-10 | 2010-07-09 | 0.08214 | 189 |
B10251C8 | 1 | 4 | 2010-07-10 | 2010-08-09 | 0.08487 | 0 |
B10251C8 | 1 | 5 | 2010-08-10 | 2010-09-09 | 0.08487 | 0 |
B10251C8 | 1 | 6 | 2010-09-10 | 2010-10-09 | 0.08214 | 0 |
B10251C8 | 1 | 7 | 2010-10-10 | 2010-11-09 | 0.08487 | 0 |
B10251C8 | 1 | 8 | 2010-11-10 | 2010-12-09 | 0.08214 | 179 |
B10251C8 | 1 | 9 | 2010-12-10 | 2011-01-09 | 0.08487 | 0 |
B10251C8 | 1 | 10 | 2011-01-10 | 2011-02-09 | 0.08487 | 0 |
Now we are free to do any calculations we want. For a simple example we calculate the average premium in the first two policy months. Refer to the section on adding additional information for more creative policy splits.
policy_month | avg_premium |
---|---|
1 | 60.46 |
2 | 66.88 |
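
The code behind this table is not shown. A plausible version, sketched with dplyr, filters to the first two policy months and averages the premium within each. The data frame `premium_slice` is a hand-built illustration using the first four rows of B10251C8 from the table above, so its averages will not match the package-wide figures.

```r
library(dplyr)

# First four policy months of B10251C8, duration 1, after NA replacement.
premium_slice <- tibble(
  key = "B10251C8",
  duration = 1,
  policy_month = 1:4,
  premium = c(0, 190, 189, 0)
)

# Average premium per policy month, restricted to months 1 and 2.
avg_premium_by_month <- premium_slice %>%
  filter(policy_month <= 2) %>%
  group_by(policy_month) %>%
  summarise(avg_premium = mean(premium))

avg_premium_by_month
```

Replacing the NA values with zeros beforehand matters here: `mean()` returns NA if any premium in the group is NA, and months without a payment should count as zero in the average.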