--- title: "Case study part 1" author: "Gabriele Pittarello" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Case study part 1} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r tailor made functions, include=FALSE} t2c <- function(x){ " Function to transform an upper run-off triangle into a half-square. This function takes an upper run-off triangle as input. It returns a half square. " I= dim(x)[1] J= dim(x)[2] mx=matrix(NA,nrow=I,ncol=J) for(i in 1:(I)){ for(j in 1:(J)){ if(i+j<=J+1){ mx[j,(i+j-1)]=x[i,j] } } } return(mx) } c2t <- function(x){ " Function to transform a square into an upper run-off triangle. This function takes a half square as input. It returns an upper run-off triangle. " I= dim(x)[1] J= dim(x)[2] mx=matrix(NA,nrow=I,ncol=J) for(i in 1:(I)){ for(j in 1:(J)){ if(i+j<=J+1){ mx[i,j]=x[j,(i+j-1)] } } } return(mx) } ``` # Introduction The `clmplus` package provides practitioners with a fast and user friendly implementation of the modeling framework we derived in our paper *Pittarello G., Hiabu M., and Villegas A., Replicating and extending chain-ladder via an age-period-cohort structure on the claim development in a run-off triangle, (pre-print, 2022)*. We were able to connect the well-known hazard models developed in life insurance to non-life run-off triangles claims development. The flexibility of this approach goes beyond the methodological novelty: we hope to provide a user-friendly set of tools based on the point of contact between non-life insurance and life insurance in the actuarial science. This vignette is organized as follows: * We show the connection between the age-period representation and run-off triangles. * We replicate the chain-ladder model with an age-model. As shown in the paper, by using the `clmplus` approach the resulting model is saving some parameters with respect to the standard GLM approach. * We show an example where adding a cohort effect can lead to an improvement on the model fit. In this tutorial, we show an example on the `AutoBIPaid` run-off triangle from the `ChainLadder` package. ## One discipline, one language Consider the data set we chose for this tutorial. The run-off triangle representation is displayed below: ```{r sand box, message=FALSE, warning=FALSE} library(ChainLadder) data("AutoBI") dataset=AutoBI$AutoBIPaid dataset colnames(dataset)=c(0:(dim(dataset)[1]-1)) rownames(dataset)=c(0:(dim(dataset)[1]-1)) ``` Practitioners in general insurance refer to the x axis of this representation as **development years**. Similarly, the y axis is called **accident years**. The third dimension that matters is the diagonals: the **calendar years**. There is a one-to-one correspondence between the age-period representation and run-off triangles. In notional terms, life insurance actuaries use the following terminology: * **ages** are **development years**. * **cohorts** are **accident years**. * **periods** is **calendar years**. Indeed, the age-period representation of the run-off triangle is the following: ```{r life representation, echo=FALSE} t2c(dataset) ``` Observe that the y axis is now the **development years** (or **age**) component. **Calendar years** (or **periods**) are displayed on the x axis. **Accident years** (or **cohorts**) are on the diagonals. # Replicate the chain-ladder with the `clmplus` package `clmplus` is an out-of-the box set of tools to compute the claims reserve. We now show how to replicate the chain ladder model. Observe the run-off triangle data structure first needs to be initialized to a `AggregateDataPP` object. ```{r rtt data, include=FALSE} library(clmplus) rtt <- AggregateDataPP(cumulative.payments.triangle = dataset) ``` Starting from the results in the paper we showed how to replicate the chain ladder with an age model. The computation on the `AggregateDataPP` object is obtained with the method `clmplus` specifying an hazard model. The `clmplus` method, estimates the models parameters. ```{r amodel, message=FALSE, warning=FALSE} a.model.fit=clmplus(AggregateDataPP = rtt, hazard.model = "a") ``` Out of the fitted model, it is possible to extract the fitted development factors: ```{r amodeloutput1, message=FALSE} a.model.fit$fitted_development_factors ``` It is also possible to extract the fitted effects on the claims development. ```{r amodeloutput2, message=FALSE} a.model.fit$fitted_effects ``` Predictions can be computed with the `predict` method. ```{r amodelpredict, message=FALSE} a.model <- predict(a.model.fit) ``` Out of the predict method, we can extract the predicted development factors, the full and lower triangle of predicted cumulative claims. ```{r dfpredicted, message=FALSE} a.model$development_factors_predicted ``` ```{r ltpredicted, message=FALSE} a.model$lower_triangle ``` ```{r ftpredicted, message=FALSE} a.model$full_triangle ``` Interestingly we provide predictions for different forecasting horizons. Below predictions for one calendar period. This can be specified with the `forecasting_horizon` argument. ```{r predictionsoneyear, message=FALSE} a.model.2 <- predict(a.model.fit, forecasting_horizon=1) ``` We show the consistency of our approach by comparing our estimates with those obtained with the Mack chain ladder method as implemented in the `ChainLadder` package. ```{r mack, message=FALSE, warning=FALSE} mck.chl <- MackChainLadder(dataset) ultimate.chl=mck.chl$FullTriangle[,dim(mck.chl$FullTriangle)[2]] diagonal=rev(t2c(mck.chl$FullTriangle)[,dim(mck.chl$FullTriangle)[2]]) ``` Estimates are gathered in a `data.frame` to ease the understanding. ```{r clm replicated} data.frame(ultimate.cost.mack=ultimate.chl, ultimate.cost.clmplus=a.model$ultimate_cost, reserve.mack=ultimate.chl-diagonal, reserve.clmplus=a.model$reserve ) cat('\n Total reserve:', sum(a.model$reserve)) ``` ## Claims reserving with GLMs compared to hazard models We fit the standard GLM model with the `apc` package. As shown in the paper the chain-ladder model can be replicated by fitting an age-cohort model. ```{r apc clm} library(apc) ds.apc = apc.data.list(cum2incr(dataset), data.format = "CL") ac.model.apc = apc.fit.model(ds.apc, model.family = "od.poisson.response", model.design = "AC") ``` Inspect the model coefficients derived from the output: ```{r show comparison} ac.model.apc$coefficients.canonical[,'Estimate'] ac.fcst.apc = apc.forecast.ac(ac.model.apc) data.frame(reserve.mack=ultimate.chl-diagonal, reserve.apc=c(0,ac.fcst.apc$response.forecast.coh[,'forecast']), reserve.clmplus=a.model$reserve ) ``` Our method is able to replicate the chain-ladder results with no need to add the cohort component. ```{r fitted ax amodel} a.model.fit$fitted_effects ``` Further inspection can be performed with the `clmplus` package, which provides the graphical tools to inspect the fitted effects. Observe we model the rate in continuous time, the choice of a line plot is then consistent. ```{r plot effects ax, message=FALSE, warning=FALSE} plot(a.model) ``` # The benefitial effect of adding the cohort component It is straightforward to state that from the statistical perspective it is desirable to have a model with less parameters. Nevertheless, our approach goes far beyond that. By adding the cohort effect we are able to improve our modeling. We show these results by inspecting the residuals plots. ```{r amodel residuals} #make it triangular plot(a.model.fit) ``` Clearly, the red and blue areas suggest some trends that the model wasn't able to catch. Consider now the age-cohort model and its residuals plot. ```{r message=FALSE, warning=FALSE} ac.model.fit <- clmplus(rtt, hazard.model="ac") ac.model <- predict(ac.model.fit, gk.fc.model='a') plot(ac.model.fit) ``` With no need of extrapolating a period component, we were able to improve the fit already. Pay attention, the cohort component for the cohort `m` is extrapolated. ```{r message=FALSE, warning=FALSE} plot(ac.model) ``` ## Extrapolation of a period component Similarly, it is possible to add a period component and choose an age-period model or an age-period-cohort model. ```{r apapc models, message=FALSE, warning=FALSE} ap.model.fit = clmplus(rtt, hazard.model = "ap") ap.model<-predict(ap.model.fit, ckj.fc.model='a', ckj.order = c(0,1,0)) apc.model.fit = clmplus(rtt,hazard.model = "apc") apc.model<-predict(apc.model.fit, gk.fc.model='a', ckj.fc.model='a', gk.order = c(1,1,0), ckj.order = c(0,1,0)) ``` ```{r residuals apmodel} plot(ap.model.fit) ``` It can be seen that the age-period model does not suggest any serious improvement from the age-cohort model. It is worth noticing one more time that the age-cohort model does not require any extrapolation. In a similar fashion, we plot the age-period-cohort model below, which seems to lead us to a small improvement. ```{r residuals apcmodel} plot(apc.model.fit) ``` Below, the effects of the age-period-cohort model. ```{r apc effects, message=FALSE, warning=FALSE} plot(apc.model) ``` # Conclusions In this vignette we wanted to show the flexibility of our modeling approach with respect to the well-known chain-ladder model. * By modeling the hazard rate we are able to replicate the chain-ladder results with less parameters. Indeed, we model the age component directly and add a cohort effect if needed. * Going from an age model to an age-cohort model may lead to a serious improvement in the model results.