Case study part 1

Introduction

The clmplus package provides practitioners with a fast and user-friendly implementation of the modeling framework derived in our paper: Pittarello G., Hiabu M., and Villegas A., Replicating and extending chain-ladder via an age-period-cohort structure on the claim development in a run-off triangle (pre-print, 2022).

We connect the well-known hazard models developed in life insurance to the claims development in non-life run-off triangles. The value of this approach goes beyond its methodological novelty: we hope to provide a user-friendly set of tools built on the point of contact between non-life insurance and life insurance in actuarial science.

This vignette is organized as follows:

  • We show the connection between the age-period representation and run-off triangles.

  • We replicate the chain-ladder model with an age model. As shown in the paper, the clmplus approach requires fewer parameters than the standard GLM approach.

  • We show an example where adding a cohort effect can improve the model fit.

In this tutorial, we use the AutoBIPaid run-off triangle from the ChainLadder package as an example.

One discipline, one language

Consider the data set we chose for this tutorial.

The run-off triangle representation is displayed below:

library(ChainLadder)

data("AutoBI")
dataset=AutoBI$AutoBIPaid 
dataset
#>      [,1] [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]
#> [1,] 1904 5398  7496  8882  9712 10071 10199 10256
#> [2,] 2235 6261  8691 10443 11346 11754 12031    NA
#> [3,] 2441 7348 10662 12655 13748 14235    NA    NA
#> [4,] 2503 8173 11810 14176 15383    NA    NA    NA
#> [5,] 2838 8712 12728 15278    NA    NA    NA    NA
#> [6,] 2405 7858 11771    NA    NA    NA    NA    NA
#> [7,] 2759 9182    NA    NA    NA    NA    NA    NA
#> [8,] 2801   NA    NA    NA    NA    NA    NA    NA

colnames(dataset)=c(0:(dim(dataset)[1]-1))
rownames(dataset)=c(0:(dim(dataset)[1]-1))

Practitioners in general insurance refer to the x axis of this representation as development years and to the y axis as accident years. The third dimension is given by the diagonals: the calendar years. There is a one-to-one correspondence between the age-period representation and run-off triangles. The terminology of life insurance actuaries maps onto run-off triangles as follows:

  • ages are development years.

  • cohorts are accident years.

  • periods are calendar years.

Indeed, the age-period representation of the run-off triangle is the following:

#>      [,1] [,2] [,3] [,4]  [,5]  [,6]  [,7]  [,8]
#> [1,] 1904 2235 2441 2503  2838  2405  2759  2801
#> [2,]   NA 5398 6261 7348  8173  8712  7858  9182
#> [3,]   NA   NA 7496 8691 10662 11810 12728 11771
#> [4,]   NA   NA   NA 8882 10443 12655 14176 15278
#> [5,]   NA   NA   NA   NA  9712 11346 13748 15383
#> [6,]   NA   NA   NA   NA    NA 10071 11754 14235
#> [7,]   NA   NA   NA   NA    NA    NA 10199 12031
#> [8,]   NA   NA   NA   NA    NA    NA    NA 10256

Observe that the y axis is now the development years (or age) component.

Calendar years (or periods) are displayed on the x axis.

Accident years (or cohorts) are on the diagonals.
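The mapping between the two representations can be sketched in base R. The helper `triangle_to_age_period` below is a hypothetical name introduced only for illustration (it is not a clmplus function): it moves the cell for accident year i and development year j to row j (age) and column i + j - 1 (calendar period).

```r
# Cumulative paid triangle, copied from the AutoBIPaid output above.
paid <- matrix(c(
  1904, 5398,  7496,  8882,  9712, 10071, 10199, 10256,
  2235, 6261,  8691, 10443, 11346, 11754, 12031,    NA,
  2441, 7348, 10662, 12655, 13748, 14235,    NA,    NA,
  2503, 8173, 11810, 14176, 15383,    NA,    NA,    NA,
  2838, 8712, 12728, 15278,    NA,    NA,    NA,    NA,
  2405, 7858, 11771,    NA,    NA,    NA,    NA,    NA,
  2759, 9182,    NA,    NA,    NA,    NA,    NA,    NA,
  2801,   NA,    NA,    NA,    NA,    NA,    NA,    NA
), nrow = 8, byrow = TRUE)

# Hypothetical helper (illustration only, not part of clmplus).
triangle_to_age_period <- function(tri) {
  n <- nrow(tri)
  ap <- matrix(NA_real_, nrow = n, ncol = n)
  for (i in seq_len(n)) {              # accident year (cohort)
    for (j in seq_len(n - i + 1)) {    # development year (age)
      ap[j, i + j - 1] <- tri[i, j]    # column = calendar year (period)
    }
  }
  ap
}

age_period <- triangle_to_age_period(paid)
age_period[1, ]  # first development year, read across calendar years
```

Each accident year of the triangle ends up on a diagonal of `age_period`, which is the layout printed above.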

Replicate the chain-ladder with the clmplus package

clmplus is an out-of-the-box set of tools to compute the claims reserve. We now show how to replicate the chain-ladder model. Observe that the run-off triangle first needs to be initialized as an AggregateDataPP object.

In the paper, we showed how to replicate the chain-ladder with an age model. The computation on the AggregateDataPP object is performed with the clmplus method by specifying a hazard model. The clmplus method estimates the model parameters.

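library(clmplus)
# Assumed initialization, not shown in the original listing (see ?AggregateDataPP).
rtt=AggregateDataPP(cumulative.payments.triangle = dataset, eta = 1/2)
a.model.fit=clmplus(AggregateDataPP =  rtt, 
             hazard.model = "a")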
#> StMoMo: The following ages have been zero weigthed: 1 
#> StMoMo: The following years have been zero weigthed: 1 
#> StMoMo: The following cohorts have been zero weigthed: -7 -6 -5 -4 -3 -2 -1 7 
#> StMoMo: Start fitting with gnm
#> StMoMo: Finish fitting with gnm

From the fitted model, we can extract the fitted development factors:


a.model.fit$fitted_development_factors
#>      [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]
#> [1,]   NA 3.098156 1.443611 1.195516 1.087378 1.036028 1.018557 1.005589
#> [2,]   NA 3.098156 1.443611 1.195516 1.087378 1.036028 1.018557       NA
#> [3,]   NA 3.098156 1.443611 1.195516 1.087378 1.036028       NA       NA
#> [4,]   NA 3.098156 1.443611 1.195516 1.087378       NA       NA       NA
#> [5,]   NA 3.098156 1.443611 1.195516       NA       NA       NA       NA
#> [6,]   NA 3.098156 1.443611       NA       NA       NA       NA       NA
#> [7,]   NA 3.098156       NA       NA       NA       NA       NA       NA
#> [8,]   NA       NA       NA       NA       NA       NA       NA       NA
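As a cross-check, the fitted development factors above coincide with the classical volume-weighted chain-ladder factors. The following base R sketch computes them directly from the triangle; the object names are ours and the data are copied from the output above.

```r
# Cumulative paid triangle, copied from the AutoBIPaid output above.
paid <- matrix(c(
  1904, 5398,  7496,  8882,  9712, 10071, 10199, 10256,
  2235, 6261,  8691, 10443, 11346, 11754, 12031,    NA,
  2441, 7348, 10662, 12655, 13748, 14235,    NA,    NA,
  2503, 8173, 11810, 14176, 15383,    NA,    NA,    NA,
  2838, 8712, 12728, 15278,    NA,    NA,    NA,    NA,
  2405, 7858, 11771,    NA,    NA,    NA,    NA,    NA,
  2759, 9182,    NA,    NA,    NA,    NA,    NA,    NA,
  2801,   NA,    NA,    NA,    NA,    NA,    NA,    NA
), nrow = 8, byrow = TRUE)

n <- ncol(paid)

# Volume-weighted factor for the step from development year j to j + 1:
# ratio of column sums over the accident years observed in both columns.
cl_factors <- sapply(seq_len(n - 1), function(j) {
  rows <- seq_len(n - j)
  sum(paid[rows, j + 1]) / sum(paid[rows, j])
})

round(cl_factors, 6)
```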

It is also possible to extract the fitted effects on the claims development.


a.model.fit$fitted_effects
#> $fitted_development_effect
#>           0           1           2           3           4           5 
#>          NA  0.02366899 -1.01313612 -1.72538123 -2.48027780 -3.34130515 
#>           6           7 
#> -3.99615988 -5.18978418 
#> 
#> $fitted_calendar_effect
#> NULL
#> 
#> $fitted_accident_effect
#> NULL

Predictions can be computed with the predict method.


a.model <- predict(a.model.fit)

From the output of the predict method, we can extract the predicted development factors, as well as the lower and the full triangle of predicted cumulative claims.


a.model$development_factors_predicted
#>      [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]
#> [1,]   NA       NA       NA       NA       NA       NA       NA       NA
#> [2,]   NA       NA       NA       NA       NA       NA       NA 1.005589
#> [3,]   NA       NA       NA       NA       NA       NA 1.018557 1.005589
#> [4,]   NA       NA       NA       NA       NA 1.036028 1.018557 1.005589
#> [5,]   NA       NA       NA       NA 1.087378 1.036028 1.018557 1.005589
#> [6,]   NA       NA       NA 1.195516 1.087378 1.036028 1.018557 1.005589
#> [7,]   NA       NA 1.443611 1.195516 1.087378 1.036028 1.018557 1.005589
#> [8,]   NA 3.098156 1.443611 1.195516 1.087378 1.036028 1.018557 1.005589

a.model$lower_triangle
#>      [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]
#> [1,]   NA       NA       NA       NA       NA       NA       NA       NA
#> [2,]   NA       NA       NA       NA       NA       NA       NA 12098.24
#> [3,]   NA       NA       NA       NA       NA       NA 14499.15 14580.19
#> [4,]   NA       NA       NA       NA       NA 15937.22 16232.97 16323.69
#> [5,]   NA       NA       NA       NA 16612.95 17211.49 17530.88 17628.86
#> [6,]   NA       NA       NA 14072.42 15302.04 15853.34 16147.53 16237.77
#> [7,]   NA       NA 13255.24 15846.86 17231.52 17852.34 18183.62 18285.24
#> [8,]   NA 8677.936 12527.57 14976.91 16285.56 16872.30 17185.39 17281.44

a.model$full_triangle
#>      0        1        2        3        4        5        6        7
#> 0 1904 5398.000  7496.00  8882.00  9712.00 10071.00 10199.00 10256.00
#> 1 2235 6261.000  8691.00 10443.00 11346.00 11754.00 12031.00 12098.24
#> 2 2441 7348.000 10662.00 12655.00 13748.00 14235.00 14499.15 14580.19
#> 3 2503 8173.000 11810.00 14176.00 15383.00 15937.22 16232.97 16323.69
#> 4 2838 8712.000 12728.00 15278.00 16612.95 17211.49 17530.88 17628.86
#> 5 2405 7858.000 11771.00 14072.42 15302.04 15853.34 16147.53 16237.77
#> 6 2759 9182.000 13255.24 15846.86 17231.52 17852.34 18183.62 18285.24
#> 7 2801 8677.936 12527.57 14976.91 16285.56 16872.30 17185.39 17281.44
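Because the age model replicates the chain-ladder, the full triangle can be reproduced by rolling each accident year forward with the volume-weighted factors. The base R sketch below illustrates the arithmetic only; it is not how clmplus computes its predictions internally.

```r
# Cumulative paid triangle, copied from the AutoBIPaid output above.
paid <- matrix(c(
  1904, 5398,  7496,  8882,  9712, 10071, 10199, 10256,
  2235, 6261,  8691, 10443, 11346, 11754, 12031,    NA,
  2441, 7348, 10662, 12655, 13748, 14235,    NA,    NA,
  2503, 8173, 11810, 14176, 15383,    NA,    NA,    NA,
  2838, 8712, 12728, 15278,    NA,    NA,    NA,    NA,
  2405, 7858, 11771,    NA,    NA,    NA,    NA,    NA,
  2759, 9182,    NA,    NA,    NA,    NA,    NA,    NA,
  2801,   NA,    NA,    NA,    NA,    NA,    NA,    NA
), nrow = 8, byrow = TRUE)

n <- ncol(paid)

# Volume-weighted chain-ladder factors.
cl_factors <- sapply(seq_len(n - 1), function(j) {
  rows <- seq_len(n - j)
  sum(paid[rows, j + 1]) / sum(paid[rows, j])
})

# Fill the lower triangle row by row, starting from the latest observed cell.
full <- paid
for (i in 2:n) {
  for (j in (n - i + 2):n) {
    full[i, j] <- full[i, j - 1] * cl_factors[j - 1]
  }
}

round(full[n, ], 2)  # projected development of the most recent accident year
```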

Predictions can also be computed for different forecasting horizons through the forecasting_horizon argument. Below, we predict one calendar period ahead.


a.model.2 <- predict(a.model.fit,
                     forecasting_horizon=1)
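To illustrate what a one-calendar-period horizon means in chain-ladder terms, the sketch below projects only the next diagonal: each accident year is moved forward by a single development step. The factors are copied (rounded to six decimals) from the output above, so results agree with the lower triangle only up to rounding. The structure of the object returned by predict for this horizon is not shown here, and the sketch makes no assumption about it.

```r
# Latest observed cumulative payments by accident year (0 to 7), from the triangle above.
latest <- c(10256, 12031, 14235, 15383, 15278, 11771, 9182, 2801)

# Development factors copied from the fitted output above (rounded).
cl_factors <- c(3.098156, 1.443611, 1.195516, 1.087378,
                1.036028, 1.018557, 1.005589)

n <- length(latest)

# Accident year i (1-based) has been observed up to development column n - i + 1,
# so its next development step uses cl_factors[n - i + 1].
next_diagonal <- sapply(2:n, function(i) latest[i] * cl_factors[n - i + 1])

# Payments expected during the next calendar period, by accident year 1 to 7.
next_period_payments <- next_diagonal - latest[2:n]

round(next_diagonal, 2)
```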

We show the consistency of our approach by comparing our estimates with those obtained with the Mack chain ladder method as implemented in the ChainLadder package.

mck.chl <- MackChainLadder(dataset)
ultimate.chl=mck.chl$FullTriangle[,dim(mck.chl$FullTriangle)[2]]
diagonal=rev(t2c(mck.chl$FullTriangle)[,dim(mck.chl$FullTriangle)[2]])

Estimates are gathered in a data.frame for easier comparison.

data.frame(ultimate.cost.mack=ultimate.chl,
           ultimate.cost.clmplus=a.model$ultimate_cost,
           reserve.mack=ultimate.chl-diagonal,
           reserve.clmplus=a.model$reserve
           )
#>   ultimate.cost.mack ultimate.cost.clmplus reserve.mack reserve.clmplus
#> 0           10256.00              10256.00      0.00000         0.00000
#> 1           12098.24              12098.24     67.23865        67.23865
#> 2           14580.19              14580.19    345.18727       345.18727
#> 3           16323.69              16323.69    940.68770       940.68770
#> 4           17628.86              17628.86   2350.85562      2350.85562
#> 5           16237.77              16237.77   4466.77443      4466.77443
#> 6           18285.24              18285.24   9103.24335      9103.24335
#> 7           17281.44              17281.44  14480.43832     14480.43832

cat('\n Total reserve:',
    sum(a.model$reserve))
#> 
#>  Total reserve: 31754.43
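The reserves in the table can be checked by hand: the ultimate cost of an accident year is its latest observed payment multiplied by the product of the remaining development factors, and the reserve is the difference between the two. The sketch below uses factors copied from the output above (rounded to six decimals), so it matches the table only up to rounding.

```r
# Latest observed cumulative payments by accident year (0 to 7), from the triangle above.
latest <- c(10256, 12031, 14235, 15383, 15278, 11771, 9182, 2801)

# Development factors copied from the fitted output above (rounded).
cl_factors <- c(3.098156, 1.443611, 1.195516, 1.087378,
                1.036028, 1.018557, 1.005589)

# Factor to ultimate: 1 for the fully developed accident year, then the
# cumulative product of the factors still to be applied, from the tail backwards.
to_ultimate <- c(1, cumprod(rev(cl_factors)))

ultimate <- latest * to_ultimate
reserve  <- ultimate - latest

round(reserve, 2)
round(sum(reserve), 2)
```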

Claims reserving with GLMs compared to hazard models

We fit the standard GLM with the apc package. As shown in the paper, the chain-ladder model can be replicated by fitting an age-cohort model.

library(apc)

ds.apc = apc.data.list(cum2incr(dataset),
                       data.format = "CL")

ac.model.apc = apc.fit.model(ds.apc,
                         model.family = "od.poisson.response",
                         model.design = "AC")

Inspect the model coefficients derived from the output:


ac.model.apc$coefficients.canonical[,'Estimate']
#>        level    age slope cohort slope     DD_age_3     DD_age_4     DD_age_5 
#>   7.41596168   0.74105900   0.16519698  -1.16411707  -0.02909890  -0.17467013 
#>     DD_age_6     DD_age_7     DD_age_8  DD_cohort_3  DD_cohort_4  DD_cohort_5 
#>  -0.17533888   0.17408744  -0.55360421   0.02140672  -0.07364998  -0.03603392 
#>  DD_cohort_6  DD_cohort_7  DD_cohort_8 
#>  -0.15911660   0.20095088  -0.17521544

ac.fcst.apc = apc.forecast.ac(ac.model.apc)

data.frame(reserve.mack=ultimate.chl-diagonal,
           reserve.apc=c(0,ac.fcst.apc$response.forecast.coh[,'forecast']),
           reserve.clmplus=a.model$reserve
           
           )
#>   reserve.mack reserve.apc reserve.clmplus
#> 0      0.00000     0.00000         0.00000
#> 1     67.23865    67.23865        67.23865
#> 2    345.18727   345.18727       345.18727
#> 3    940.68770   940.68770       940.68770
#> 4   2350.85562  2350.85562      2350.85562
#> 5   4466.77443  4466.77443      4466.77443
#> 6   9103.24335  9103.24335      9103.24335
#> 7  14480.43832 14480.43832     14480.43832

Our method is able to replicate the chain-ladder results with no need to add the cohort component.

a.model.fit$fitted_effects
#> $fitted_development_effect
#>           0           1           2           3           4           5 
#>          NA  0.02366899 -1.01313612 -1.72538123 -2.48027780 -3.34130515 
#>           6           7 
#> -3.99615988 -5.18978418 
#> 
#> $fitted_calendar_effect
#> NULL
#> 
#> $fitted_accident_effect
#> NULL
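The saving in parameters can be read off the two outputs above. For this 8 x 8 triangle, the age-cohort GLM reports a level, two slopes, six age double differences and six cohort double differences, while the age hazard model reports one development effect per development year after the first. A short sketch of the count, where n and the variable names are ours:

```r
n <- 8  # dimension of the run-off triangle

# Age-cohort GLM: level + (n - 1) age parameters + (n - 1) cohort parameters.
glm_ac_parameters <- 1 + 2 * (n - 1)

# Age hazard model: one development effect for each development year 1, ..., n - 1.
hazard_a_parameters <- n - 1

c(glm = glm_ac_parameters, hazard = hazard_a_parameters)
```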

Further inspection can be performed with the clmplus package, which provides graphical tools to inspect the fitted effects. Because we model the rate in continuous time, a line plot is a consistent choice.

plot(a.model)

The beneficial effect of adding the cohort component

From a statistical perspective, a model with fewer parameters is desirable. Nevertheless, our approach goes beyond that.

By adding the cohort effect we are able to improve our modeling.

We show these results by inspecting the residuals plots.

#make it triangular
plot(a.model.fit)

Clearly, the red and blue areas suggest some trends that the model was not able to capture. Consider now the age-cohort model and its residuals plot.

ac.model.fit <- clmplus(rtt, 
                    hazard.model="ac")
#> StMoMo: The following ages have been zero weigthed: 1 
#> StMoMo: The following years have been zero weigthed: 1 
#> StMoMo: The following cohorts have been zero weigthed: -7 -6 -5 -4 -3 -2 -1 7 
#> StMoMo: Start fitting with gnm
#> StMoMo: Finish fitting with gnm

ac.model <- predict(ac.model.fit,
                    gk.fc.model='a')
plot(ac.model.fit)

Without extrapolating a period component, we already improve the fit. Note, however, that the cohort component for cohort m is extrapolated.


plot(ac.model)

Extrapolation of a period component

Similarly, it is possible to add a period component and choose an age-period model or an age-period-cohort model.

ap.model.fit = clmplus(rtt,
                   hazard.model = "ap")
#> StMoMo: The following ages have been zero weigthed: 1 
#> StMoMo: The following years have been zero weigthed: 1 
#> StMoMo: The following cohorts have been zero weigthed: -7 -6 -5 -4 -3 -2 -1 7 
#> StMoMo: Start fitting with gnm
#> StMoMo: Finish fitting with gnm

ap.model<-predict(ap.model.fit, 
                   ckj.fc.model='a',
                   ckj.order = c(0,1,0))

apc.model.fit = clmplus(rtt,hazard.model = "apc")
#> StMoMo: The following ages have been zero weigthed: 1 
#> StMoMo: The following years have been zero weigthed: 1 
#> StMoMo: The following cohorts have been zero weigthed: -7 -6 -5 -4 -3 -2 -1 7 
#> StMoMo: Start fitting with gnm
#> StMoMo: Finish fitting with gnm

apc.model<-predict(apc.model.fit, 
                   gk.fc.model='a', 
                   ckj.fc.model='a',
                   gk.order = c(1,1,0),
                   ckj.order = c(0,1,0))
plot(ap.model.fit)

The age-period model does not suggest any serious improvement over the age-cohort model. It is worth noting once more that the age-cohort model does not require extrapolating a period component. Similarly, we plot the residuals of the age-period-cohort model below, which seems to lead to a small improvement.

plot(apc.model.fit)

The effects of the age-period-cohort model are plotted below.

plot(apc.model)

Conclusions

In this vignette we wanted to show the flexibility of our modeling approach with respect to the well-known chain-ladder model.

  • By modeling the hazard rate, we are able to replicate the chain-ladder results with fewer parameters. Indeed, we model the age component directly and add a cohort effect if needed.

  • Going from an age model to an age-cohort model may substantially improve the model fit.