1 Introduction

Premium tariffs have long been used to differentiate general insurance premiums according to insureds' attribute information. Tariff theory has developed along with modern statistical theory. Generalized linear models (GLMs), introduced by Nelder and Wedderburn (1972), can model the expected claim frequency and severity of each risk profile and estimate them from past claims data. Many rating factors can be incorporated into GLMs, and the optimal model can be chosen via model selection methods such as cross validation. However, some rating factors, such as the insured's age, address, and occupation, often have too many categories to obtain reliable estimates for each individual category. In addition, complex tariffs with many categories may increase the operational risk of applying incorrect premiums. Hence, in practice, rating categories are often grouped into fewer categories with similar risk levels.

However, finding the optimal grouping of the rating categories based on a model selection criterion is difficult because of the immense computational workload required to process an almost infinite number of grouping combinations. In such cases, rating categories have been grouped based on simplicity, sales strategies, and actuarial judgment. Some studies have applied clustering methods to reduce the rating factors and categories (e.g. Pelessoni and Picech 1998; Guo 2003; Sanche and Lonergan 2006; Yao et al. 2016). Nonetheless, most of them separate the inference and clustering procedures, which does not provide solutions satisfying both the inference criterion and the clustering criterion simultaneously.

In recent years, sparse regularization techniques, originating from the least absolute shrinkage and selection operator (lasso) of Tibshirani (1996), have been developed to enable fast variable selection when processing big data. In particular, the fused lasso of Tibshirani et al. (2005) and its extensions are useful for automatically integrating the categories of factors by optimizing an objective function with L1 regularization terms on the differences between the regression coefficients of adjacent categories. Fujita et al. (2020) implemented the one-dimensional fused lasso for GLMs by combining the ordinary lasso with dummy variables that indicate whether the category of each observation lies before or after candidate change-points. Devriendt et al. (2021) proposed an efficient algorithm for sparse regression with multi-type regularization terms, including the lasso and fused lasso, with an application to insurance pricing analytics. Bleakley and Vert (2011) proposed the group fused lasso on a line to detect multiple change-points in multi-task learning, and Alaíz, Barbero, and Dorronsoro (2013) generalized that approach to the group fused lasso on general adjacency graphs with an efficient algorithm. Nomura (2017) used the group fused lasso to integrate rating categories consistently between expected claim frequency and expected claim severity, which are modeled separately using GLMs.

In this paper, we enhance the group fused lasso of Nomura (2017) by imposing ordinal constraints on the regression coefficients in the GLMs to meet practical requirements such as the bonus–malus system in automobile insurance. The optimization problem for parameter inference can be solved by the alternating direction method of multipliers (ADMM), which is modified from the one in Nomura (2017) to satisfy the ordinal constraints.
Interaction of variables with the ordinal constraints can also be incorporated into our model and solved by the modified ADMM.

The remainder of this article is organized as follows. The GLMs for claim frequency and claim severity are introduced in Section 2. The group fused lasso and the ADMM as its optimization algorithm, proposed in Nomura (2017), are presented in Sections 3 and 4, respectively. The ordinal constraints on the group fused lasso and the modified ADMM are proposed in Section 5. An application of the proposed methods to motorcycle insurance data is presented in Section 6. The conclusion is presented in Section 7.

2 Generalized Linear Models for Claim Frequency and Severity

This section introduces the generalized linear models for insurance pricing as the fundamental models used in this study. Consider p rating factors whose numbers of categories are $n_1,\dots,n_p$, respectively. There are T policies or groups of policies with the same factor categories. Let $x_{t1},\dots,x_{tp}$ denote the categories to which the tth policy or group of policies belongs. A generalized linear model (GLM) is an extended version of an ordinary linear regression model that can handle probability distributions in the exponential dispersion model and nonlinear link functions. In the exponential dispersion model, the probability mass functions or probability density functions of the observations $a_1,\dots,a_T$ have the common form

(1)$$f(a_t;\theta_t,d_t,\phi)=\exp\left\{\frac{a_t\theta_t-b(\theta_t)}{\phi/d_t}+c(a_t,d_t,\phi)\right\},\quad t=1,\dots,T,$$

where $\theta_t$ denotes the parameter related to the mean of the observation $a_t$ of the tth policy and $\phi$ is the dispersion parameter related to the variance of all the observations $a_1,\dots,a_T$ across the policies. Moreover, $d_t$ is the weight assigned to the tth policy and affects the variance of the observation $a_t$. The function $b(\theta_t)$ is assumed to be twice differentiable, and the function $c(a_t,d_t,\phi)$ is a normalization constant that makes the sum of probabilities equal to one irrespective of the value of $\theta_t$. The mean and variance of $a_t$ are expressed using the first-order derivative $b'$ and the second-order derivative $b''$ of the function b as follows:

(2)$$\mu_t=\mathrm{E}(a_t)=b'(\theta_t),\quad \mathrm{Var}(a_t)=\frac{\phi}{d_t}b''(\theta_t).$$

Ratemaking in general insurance often involves estimating the expected claim frequency and the expected claim severity separately, rather than estimating the expected total claim cost as the pure premium directly. Therefore, we next introduce the exponential dispersion models for claim frequency and claim severity, respectively.

The Poisson distribution with the following probability mass function is often applied to the number of claims per policy:

(3)$$f_1\left(z_t;\mu_t^{(1)},w_t\right)=\frac{\left(w_t\mu_t^{(1)}\right)^{z_t}}{z_t!}e^{-w_t\mu_t^{(1)}},\quad z_t=0,1,\dots,$$

where $z_t$ and $w_t$ are the number of claims and the exposure of the tth policy, respectively. Then, it holds that $\mathrm{E}(z_t)=\mathrm{Var}(z_t)=w_t\mu_t^{(1)}$, and hence $\mu_t^{(1)}$ represents the expected claim frequency per exposure. Although the Poisson distribution itself does not belong to the exponential dispersion model, the probability distribution of the claim frequency $z_t/w_t$ per exposure becomes the relative Poisson distribution, which belongs to the exponential dispersion model (1) with $\theta_t\left(\mu_t^{(1)}\right)=\log\mu_t^{(1)}$ and $\phi=1$.

Given the number of claims $z_t$ from the tth policy, the gamma distribution with the following probability density function can be fitted to the claim severity:

(4)$$f_2\left(y_t;\mu_t^{(2)},z_t,\phi\right)=\frac{1}{y_t\Gamma(z_t/\phi)}\left(\frac{y_t z_t}{\mu_t^{(2)}\phi}\right)^{z_t/\phi}\exp\left(-\frac{y_t z_t}{\mu_t^{(2)}\phi}\right),\quad y_t>0,$$

where $y_t$ is the mean severity of the claims from the tth policy. Then, we have $\mathrm{E}(y_t)=\mu_t^{(2)}$ and $\mathrm{Var}(y_t)=\phi\left(\mu_t^{(2)}\right)^2/z_t$. The gamma distribution belongs to the exponential dispersion model (1) with $d_t=z_t$ and $\theta_t\left(\mu_t^{(2)}\right)=-1/\mu_t^{(2)}$. Thus, the product of the expected claim frequency $\mu_t^{(1)}$ and the expected claim severity $\mu_t^{(2)}$ in (3) and (4) provides the expected total claim cost, i.e. the pure premium, of the tth policy.

Let $x_{ti}$, t = 1, …, T, i = 1, …, p, denote the category of the ith factor to which the tth policy belongs. In a generalized linear model, the mean parameter $\mu_t$ in (3) and (4) is formulated by

(5)$$g(\mu_t)=\beta_0+\beta_{1x_{t1}}+\dots+\beta_{px_{tp}},\quad t=1,\dots,T,$$

where $\beta_0$ is the intercept and $\beta_{ij}$ is the regression coefficient for the jth category of the ith factor. Note that each factor has one reference category whose regression coefficient is fixed at zero. The function g is a differentiable monotonic function called a link function. When the link function is the identity function g(y) ≡ y, the mean parameter $\mu_t$ is just the right side of Equation (5). In contrast, when the link function is the logarithmic function g(y) = log y, the mean parameter $\mu_t$ is formulated by

(6)$$\mu_t=\exp(\beta_0)\times\exp\left(\beta_{1x_{t1}}\right)\times\dots\times\exp\left(\beta_{px_{tp}}\right),\quad t=1,\dots,T.$$

In this case, $\exp(\beta_{ij})$ represents the relative risk of the jth category with respect to the reference category of the ith factor.
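To make the multiplicative structure of (6) concrete, the following minimal sketch computes $\mu_t$ as the base level times one relativity per rating factor; all numerical values here are assumed for illustration, not fitted estimates.

```python
import math

# Illustration of the multiplicative tariff (6) under the log link: the
# expected value is the base level exp(beta_0) times one relativity
# exp(beta_{ij}) per rating factor. All numbers below are assumed.
beta0 = math.log(0.01)                          # assumed base claim frequency
relativities = {"age": 1.6, "city_size": 1.3}   # assumed exp(beta_ij) values

mu = math.exp(beta0)
for r in relativities.values():
    mu *= r
print(round(mu, 4))  # 0.01 * 1.6 * 1.3 = 0.0208
```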
The parameters in GLMs are typically estimated via the maximum likelihood method. Let $\boldsymbol{\beta}=\left(\beta_0,\beta_{11},\dots,\beta_{1n_1},\dots,\beta_{p1},\dots,\beta_{pn_p}\right)$ denote the set of regression coefficients including the intercept. Then, the log-likelihood of the parameter set (β, ϕ) for the exponential dispersion model (1) is defined by

(7)$$\log L(\boldsymbol{\beta},\phi;a_1,\dots,a_T,d_1,\dots,d_T)=\sum_{t=1}^{T}\log f(a_t;\theta_t(\boldsymbol{\beta}),d_t,\phi),$$

where $\theta_t(\boldsymbol{\beta})=b'^{-1}(\mu_t)=b'^{-1}\circ g^{-1}\left(\beta_0+\beta_{1x_{t1}}+\dots+\beta_{px_{tp}}\right)$. The maximum likelihood estimate $(\hat{\boldsymbol{\beta}},\hat{\phi})$ of the parameter set (β, ϕ) is given by

(8)$$\begin{aligned}(\hat{\boldsymbol{\beta}},\hat{\phi})&=\mathop{\mathrm{arg\,min}}_{(\boldsymbol{\beta},\phi)}-\log L(\boldsymbol{\beta},\phi;a_1,\dots,a_T,d_1,\dots,d_T)\\ &=\mathop{\mathrm{arg\,min}}_{(\boldsymbol{\beta},\phi)}-\sum_{t=1}^{T}\log f(a_t;\theta_t(\boldsymbol{\beta}),d_t,\phi).\end{aligned}$$

In particular, the maximum likelihood estimate $\hat{\boldsymbol{\beta}}$ of the regression coefficients β can be obtained by the following simple formula, irrespective of the value of ϕ:

(9)$$\hat{\boldsymbol{\beta}}=\mathop{\mathrm{arg\,min}}_{\boldsymbol{\beta}}\sum_{t=1}^{T}d_t\left\{-a_t\theta_t(\boldsymbol{\beta})+b(\theta_t(\boldsymbol{\beta}))\right\}.$$

Therefore, we can first estimate the regression coefficients β by (9) and then estimate the dispersion parameter ϕ by optimizing the log-likelihood (7).

Regarding the Poisson distribution (3), the dispersion parameter ϕ is fixed at one, as mentioned above, and the regression coefficients β are estimated by

(10)$$\begin{aligned}\hat{\boldsymbol{\beta}}^{(1)}&=\mathop{\mathrm{arg\,min}}_{\boldsymbol{\beta}^{(1)}}\sum_{t=1}^{T}\left\{w_t\mu_t^{(1)}-z_t\log\mu_t^{(1)}\right\}\\ &=\mathop{\mathrm{arg\,min}}_{\boldsymbol{\beta}^{(1)}}\sum_{t=1}^{T}\left\{w_t g^{-1}\left(\beta_0^{(1)}+\beta_{1x_{t1}}^{(1)}+\dots+\beta_{px_{tp}}^{(1)}\right)-z_t\log g^{-1}\left(\beta_0^{(1)}+\beta_{1x_{t1}}^{(1)}+\dots+\beta_{px_{tp}}^{(1)}\right)\right\}.\end{aligned}$$

Regarding the gamma distribution (4), the estimate (9) of the regression coefficients becomes

(11)$$\begin{aligned}\hat{\boldsymbol{\beta}}^{(2)}&=\mathop{\mathrm{arg\,min}}_{\boldsymbol{\beta}^{(2)}}\sum_{t=1}^{T}\left\{\frac{z_t y_t}{\mu_t^{(2)}}+z_t\log\mu_t^{(2)}\right\}\\ &=\mathop{\mathrm{arg\,min}}_{\boldsymbol{\beta}^{(2)}}\sum_{t=1}^{T}\left\{\frac{z_t y_t}{g^{-1}\left(\beta_0^{(2)}+\beta_{1x_{t1}}^{(2)}+\dots+\beta_{px_{tp}}^{(2)}\right)}+z_t\log g^{-1}\left(\beta_0^{(2)}+\beta_{1x_{t1}}^{(2)}+\dots+\beta_{px_{tp}}^{(2)}\right)\right\}.\end{aligned}$$

These optimization problems can be solved quickly through standard optimization methods, typically gradient-based methods such as Newton's method. If we apply the logarithmic link function g(y) = log(y), the objective functions in (10) and (11) become strictly convex, and each has a unique local minimum that can be obtained by ordinary optimization methods.
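As an illustration of (10), the following sketch minimizes the Poisson objective with a log link using a generic gradient-based solver. The design matrix X (intercept column plus dummy-coded categories, reference category dropped), the exposures w, and the claim counts z are simulated placeholders, not the data of this paper.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch of solving (10) with the log link g = log, so g^{-1} = exp.
def poisson_objective(beta, X, w, z):
    eta = X @ beta                             # linear predictor of (5)
    return np.sum(w * np.exp(eta) - z * eta)   # objective of (10)

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.integers(0, 2, 500)])  # toy one-factor design
w = np.ones(500)                                              # unit exposures
z = rng.poisson(0.1 * np.exp(0.5 * X[:, 1]), 500)             # simulated claim counts

res = minimize(poisson_objective, np.zeros(2), args=(X, w, z), method="BFGS")
print(res.x)  # estimated (beta_0, beta_11)
```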
3 Automatic Segmentation of Rating Categories via the Group Fused Lasso

We have introduced the GLMs to estimate expected claim frequency and expected claim severity from claims data. However, when using rating factors with many categories, or interactions of such factors, the number of regression coefficients becomes so large that their estimates might have large errors. In such cases, categories with similar risk levels are often integrated into groups in practice. Thus, in this section, we introduce the group fused lasso proposed by Bleakley and Vert (2011) to facilitate automatic segmentation of the categories of rating factors.

For the sake of simplicity, let the first factor consist of a large number of categories V = {1, …, n1} which need to be integrated into fewer groups for ratemaking. We define a set of pairs of adjacent categories E = {e1, …, em} ⊆ V × V as candidates for pairs to be integrated. The pair (V, E) is often referred to as an undirected graph, where V is a set of vertices and E is a set of edges. The fused lasso on the graph (V, E) is a regularization technique that estimates the regression coefficients by solving the following optimization problem:

(12)$$\min_{(\boldsymbol{\beta},\phi)}\sum_{t=1}^{T}q(\boldsymbol{\beta},\phi;y_t,w_t)+\kappa\sum_{(u,v)\in E}|\beta_{1u}-\beta_{1v}|,$$

where the first term q(β, ϕ; yt, wt) is a loss function, which becomes the negative log-likelihood function in GLMs. The second term is the L1 regularization term on the differences between the pairs of coefficients (β1u, β1v) of adjacent categories (u, v) ∈ E, which encourages the coefficients β1u and β1v to take similar or even identical values. Categories whose regression coefficients are estimated at the same value are regarded as one group in the rating-class segmentation. The weight κ on the second term is called a regularization parameter and adjusts the impact of the regularization term.

Using the fused lasso (12), expected claim frequency and expected claim severity are estimated separately and hence would generally have different groupings. Since the pure premium is the product of expected claim frequency and expected claim severity, it is more desirable to determine the grouping of rating classes consistently between the two. Therefore, we introduce the group fused lasso to estimate expected claim frequency and expected claim severity simultaneously by solving the following optimization problem:

(13)$$\min_{(\boldsymbol{\beta},\phi)}-\sum_{t=1}^{T}\left\{\log f_1\left(z_t;\mu_t^{(1)}\left(\boldsymbol{\beta}^{(1)}\right),w_t\right)+\log f_2\left(y_t;\mu_t^{(2)}\left(\boldsymbol{\beta}^{(2)}\right),z_t,\phi\right)\right\}+\kappa\sum_{(u,v)\in E}\left\Vert\boldsymbol{\beta}_{1u}-\boldsymbol{\beta}_{1v}\right\Vert_2,$$

where f1 and f2 are the probability mass function (3) of the Poisson distribution and the probability density function (4) of the gamma distribution, respectively. The loss function in Equation (13) is the negative log-likelihood for the joint distribution of the number of claims zt and the claim severity yt. The regression coefficients β(1) and β(2) determine the expected claim frequency $\mu_t^{(1)}\left(\boldsymbol{\beta}^{(1)}\right)=g^{-1}\left(\beta_0^{(1)}+\beta_{1x_{t1}}^{(1)}+\dots+\beta_{px_{tp}}^{(1)}\right)$ and the expected claim severity $\mu_t^{(2)}\left(\boldsymbol{\beta}^{(2)}\right)=g^{-1}\left(\beta_0^{(2)}+\beta_{1x_{t1}}^{(2)}+\dots+\beta_{px_{tp}}^{(2)}\right)$, respectively, and are combined into β = (β(1), β(2)). We also concatenate the corresponding components of β(1) and β(2) for category u of the first factor into $\boldsymbol{\beta}_{1u}=\left(\beta_{1u}^{(1)},\beta_{1u}^{(2)}\right)$ and introduce the regularization term $\Vert\boldsymbol{\beta}_{1u}-\boldsymbol{\beta}_{1v}\Vert_2=\sqrt{\left(\beta_{1u}^{(1)}-\beta_{1v}^{(1)}\right)^2+\left(\beta_{1u}^{(2)}-\beta_{1v}^{(2)}\right)^2}$ to encourage the differences of the coefficient pairs $\boldsymbol{\beta}_{1u}$ and $\boldsymbol{\beta}_{1v}$ to become zero simultaneously. Thus, we can determine the segmentation of rating categories simultaneously for the expected claim frequency and the expected claim severity.
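For reference, the penalty term of (13) can be computed directly from an edge list. In this sketch, beta is an (n1, 2) array whose rows stack the frequency and severity coefficients of each category of the first factor; the 0-based edge encoding is an assumption of the example.

```python
import numpy as np

# Sketch of the group fused penalty in (13) over an edge list E.
def group_fused_penalty(beta, edges, kappa):
    return kappa * sum(np.linalg.norm(beta[u] - beta[v]) for u, v in edges)

beta = np.array([[0.0, 0.0], [0.0, 0.0], [0.3, 0.1]])  # categories 1 and 2 fused
print(group_fused_penalty(beta, [(0, 1), (1, 2)], kappa=1.0))  # only edge (1, 2) contributes
```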
The value of the regularization parameter κ is typically selected from discretized candidate values by cross validation. In N-fold cross validation, all the data (policies) are partitioned into N groups $\mathcal{T}_1,\dots,\mathcal{T}_N\subset\{1,\dots,T\}$. For k = 1, …, N, we obtain the estimates $\hat{\boldsymbol{\beta}}_{-k}$, $\hat{\phi}_{-k}$ by (13) from all the data except those in the kth group $\mathcal{T}_k$ and fit them to the kth group $\mathcal{T}_k$ to evaluate the validation error. As the validation error, although we could use the sum of negative log-likelihoods for the observed claim frequency zt and claim severity yt, we adopt the negative log-likelihood of the observed total claim cost st = ytzt, given by

(14)$$\text{Validation error}=\sum_{k=1}^{N}\sum_{t\in\mathcal{T}_k}-\log f_s\left(s_t;\hat{\boldsymbol{\beta}}_{-k},\hat{\phi}_{-k},w_t\right),$$

where

(15)$$f_s\left(s_t;\hat{\boldsymbol{\beta}}_{-k},\hat{\phi}_{-k},w_t\right)=\begin{cases}f_1\left(0;\mu_t^{(1)}\left(\hat{\boldsymbol{\beta}}_{-k}^{(1)}\right),w_t\right)&\text{if }s_t=0,\\[2mm] \displaystyle\sum_{z=1}^{\infty}f_1\left(z;\mu_t^{(1)}\left(\hat{\boldsymbol{\beta}}_{-k}^{(1)}\right),w_t\right)\frac{f_2\left(s_t/z;\mu_t^{(2)}\left(\hat{\boldsymbol{\beta}}_{-k}^{(2)}\right),z,\hat{\phi}_{-k}\right)}{z}&\text{otherwise}.\end{cases}$$

This probability distribution of the total claim cost st = ytzt is a compound Poisson distribution with the gamma distribution (4), known as the Tweedie distribution proposed by Tweedie (1984). By using the Tweedie distribution, we intend to evaluate the predictive performance for the total claim costs directly. Finally, the value that minimizes the validation error is selected for the regularization parameter κ.
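A minimal sketch of evaluating the density (15) follows, truncating the infinite sum over the claim count z at an assumed cutoff z_max taken large enough for the remaining terms to be negligible. The gamma parametrization matches (4): shape z/ϕ and mean μ(2), hence scale μ(2)ϕ/z, with the extra 1/z accounting for s = yz at fixed z.

```python
import numpy as np
from scipy.stats import gamma, poisson

# Sketch of the Tweedie validation density (15); z_max is an assumed cutoff.
def tweedie_density(s, mu1, mu2, w, phi, z_max=50):
    if s == 0:
        return poisson.pmf(0, w * mu1)          # first branch of (15)
    z = np.arange(1, z_max + 1)
    f2 = gamma.pdf(s / z, a=z / phi, scale=mu2 * phi / z)  # gamma density of (4)
    return np.sum(poisson.pmf(z, w * mu1) * f2 / z)        # truncated sum of (15)

print(tweedie_density(s=30000.0, mu1=0.01, mu2=20000.0, w=1.0, phi=1.2))
```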
4 Optimization Algorithm for the Group Fused Lasso

This section describes the algorithm to solve the optimization problem (13) introduced in the previous section. The optimization problem (8) without regularization terms can be solved quickly using gradient-based methods such as Newton's method and its variants. However, gradient-based methods are not applicable to the objective function in (13), whose gradient does not exist where one of the regularization terms is exactly zero. Several optimization algorithms have been proposed for the group fused lasso: the block coordinate descent method by Bleakley and Vert (2011), the alternating direction method of multipliers (ADMM) by Wahlberg et al. (2012), and the active set projected Newton method by Wytock, Sra, and Kolter (2014). The block coordinate descent method is only applicable to the group fused lasso on a chain graph, which can be reduced to an ordinary group lasso. The active set projected Newton method can obtain solutions quickly for the group fused lasso on a general graph, but it is difficult to apply to general loss functions other than the residual sum of squares. In contrast, the ADMM is highly versatile and can be applied to general convex loss functions and to the group fused lasso on general graphs. Thus, we introduce the ADMM to solve (13).

The ADMM is an optimization method that extends the Lagrange multiplier method, in which augmented Lagrangian terms are added to the objective function. Before introducing the augmented Lagrangian, we rewrite the optimization problem (13) into the following equivalent constrained optimization problem:

(16)$$\begin{gathered}\min_{(\boldsymbol{\beta},\phi,\boldsymbol{\xi})}-\sum_{t=1}^{T}\left\{\log f_1\left(z_t;\mu_t^{(1)}\left(\boldsymbol{\beta}^{(1)}\right),w_t\right)+\log f_2\left(y_t;\mu_t^{(2)}\left(\boldsymbol{\beta}^{(2)}\right),z_t,\phi\right)\right\}+\kappa\sum_{l=1}^{m}\Vert\boldsymbol{\xi}_l\Vert_2,\\ \text{s.t.}\quad \boldsymbol{\xi}_l=\boldsymbol{\beta}_{1e_{l1}}-\boldsymbol{\beta}_{1e_{l2}},\qquad l=1,\dots,m,\end{gathered}$$

where $\boldsymbol{\xi}_l=\left(\xi_l^{(1)},\xi_l^{(2)}\right)$ is a two-dimensional dummy variable, which has to coincide with the difference between $\boldsymbol{\beta}_{1e_{l1}}=\left(\beta_{1e_{l1}}^{(1)},\beta_{1e_{l1}}^{(2)}\right)$ and $\boldsymbol{\beta}_{1e_{l2}}=\left(\beta_{1e_{l2}}^{(1)},\beta_{1e_{l2}}^{(2)}\right)$, and el = (el1, el2) ∈ E ⊆ V × V is the lth edge in the undirected graph (V, E) on the categories of the first factor V = {1, …, n1}. We now introduce the augmented Lagrangian for solving the constrained optimization problem (16). Instead of removing the constraints in (16), the ordinary Lagrange multiplier method adds the inner products $-\langle\boldsymbol{\beta}_{1e_{l1}}-\boldsymbol{\beta}_{1e_{l2}}-\boldsymbol{\xi}_l,\boldsymbol{\lambda}_l\rangle$ of the constraint expressions $\boldsymbol{\beta}_{1e_{l1}}-\boldsymbol{\beta}_{1e_{l2}}-\boldsymbol{\xi}_l$, which are restricted to be zero, and the Lagrange multipliers $\boldsymbol{\lambda}_l=\left(\lambda_l^{(1)},\lambda_l^{(2)}\right)$, for l = 1, …, m.
In the augmented Lagrangian method, the squared L2 norms $\frac{\rho}{2}\Vert\boldsymbol{\beta}_{1e_{l1}}-\boldsymbol{\beta}_{1e_{l2}}-\boldsymbol{\xi}_l\Vert_2^2$ with a common weight ρ/2 are further added to the objective function. Here we use ρ/2 instead of ρ because ρ/2 is used in most of the literature on the ADMM, including Wahlberg et al. (2012) and Wytock, Sra, and Kolter (2014). By optimizing the new objective function without constraints, we can obtain the same optimal solution as that of the constrained optimization problem (16). In the ADMM, the original parameter set (β, ϕ), the dummy variables ξ = (ξ1, …, ξm), and the Lagrange multipliers λ = (λ1, …, λm) are alternately optimized as follows:

(17)$$\begin{aligned}\left(\boldsymbol{\beta}^{\text{new}},\phi^{\text{new}}\right)&=\mathop{\mathrm{arg\,min}}_{(\boldsymbol{\beta},\phi)}-\sum_{t=1}^{T}\left\{\log f_1\left(z_t;\mu_t^{(1)}\left(\boldsymbol{\beta}^{(1)}\right),w_t\right)+\log f_2\left(y_t;\mu_t^{(2)}\left(\boldsymbol{\beta}^{(2)}\right),z_t,\phi\right)\right\}\\ &\quad+\sum_{l=1}^{m}\left\{-\langle\boldsymbol{\beta}_{1e_{l1}}-\boldsymbol{\beta}_{1e_{l2}}-\boldsymbol{\xi}_l,\boldsymbol{\lambda}_l\rangle+\frac{\rho}{2}\Vert\boldsymbol{\beta}_{1e_{l1}}-\boldsymbol{\beta}_{1e_{l2}}-\boldsymbol{\xi}_l\Vert_2^2\right\},\end{aligned}$$

(18)$$\boldsymbol{\xi}_l^{\text{new}}=\mathop{\mathrm{arg\,min}}_{\boldsymbol{\xi}_l}\ \kappa\Vert\boldsymbol{\xi}_l\Vert_2-\langle\boldsymbol{\beta}_{1e_{l1}}^{\text{new}}-\boldsymbol{\beta}_{1e_{l2}}^{\text{new}}-\boldsymbol{\xi}_l,\boldsymbol{\lambda}_l\rangle+\frac{\rho}{2}\Vert\boldsymbol{\beta}_{1e_{l1}}^{\text{new}}-\boldsymbol{\beta}_{1e_{l2}}^{\text{new}}-\boldsymbol{\xi}_l\Vert_2^2,\quad l=1,\dots,m,$$

(19)$$\boldsymbol{\lambda}_l^{\text{new}}=\boldsymbol{\lambda}_l-\rho\left(\boldsymbol{\beta}_{1e_{l1}}^{\text{new}}-\boldsymbol{\beta}_{1e_{l2}}^{\text{new}}-\boldsymbol{\xi}_l^{\text{new}}\right),\quad l=1,\dots,m.$$

The regularization terms $\kappa\sum_{l=1}^{m}\Vert\boldsymbol{\xi}_l\Vert_2$, which involve none of (β, ϕ), are ignored in (17), while the loss functions, which involve none of ξ, are ignored in (18). The objective function in (17) is differentiable, and hence gradient-based methods can be used to obtain its optimal solution quickly.
The solution of the optimization problem in (18) can be obtained analytically as

(20)$$\boldsymbol{\xi}_l^{\text{new}}=\begin{cases}\left(1-\dfrac{\kappa}{\Vert\boldsymbol{\eta}_l\Vert_2}\right)\dfrac{\boldsymbol{\eta}_l}{\rho}&\text{if }\Vert\boldsymbol{\eta}_l\Vert_2>\kappa,\\[2mm] 0&\text{if }\Vert\boldsymbol{\eta}_l\Vert_2\le\kappa,\end{cases}\quad l=1,\dots,m,$$

where $\boldsymbol{\eta}_l=\rho\left(\boldsymbol{\beta}_{1e_{l1}}^{\text{new}}-\boldsymbol{\beta}_{1e_{l2}}^{\text{new}}\right)-\boldsymbol{\lambda}_l$. In (19), the constant ρ adjusts the step size of updating λ = (λ1, …, λm). By updating the values of the parameters to (βnew, ϕnew, ξnew, λnew) repeatedly, they converge to the optimal solution.
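Putting (18)–(20) together, one ADMM sweep over the dummy variables and multipliers can be sketched as follows. The (β, ϕ)-update (17) is delegated to a gradient-based solver as described above; the array shapes and 0-based edge encoding are assumptions of the example.

```python
import numpy as np

# Closed form (20): group soft-thresholding of eta.
def group_soft_threshold(eta, kappa, rho):
    norm = np.linalg.norm(eta)
    if norm <= kappa:
        return np.zeros_like(eta)
    return (1.0 - kappa / norm) * eta / rho

# One sweep of (18)-(19); beta is an (n1, 2) array, xi and lam are (m, 2) arrays.
def admm_sweep(beta, edges, xi, lam, kappa, rho):
    for l, (u, v) in enumerate(edges):
        eta = rho * (beta[u] - beta[v]) - lam[l]
        xi[l] = group_soft_threshold(eta, kappa, rho)        # update (18) via (20)
        lam[l] = lam[l] - rho * (beta[u] - beta[v] - xi[l])  # update (19)
    return xi, lam
```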
5 Group Fused Lasso under Ordinal Constraints

We have introduced the GLMs with the group fused lasso proposed in Nomura (2017) to estimate expected claim costs for automatically grouped rating classes. In practice, some ordinal constraints are often imposed on insurance premiums, such as monotonic constraints on bonus–malus classes in automobile insurance. To obtain estimates that satisfy such constraints, we propose the group fused lasso for the GLMs under monotonic constraints and a modification of the ADMM given in the previous section.

We inherit the notation of the previous sections and consider the following optimization problem for grouping expected claim frequency and expected claim severity simultaneously under ordinal constraints:

(21)$$\begin{gathered}\min_{(\boldsymbol{\beta},\phi)}-\sum_{t=1}^{T}\left\{\log f_1\left(z_t;\mu_t^{(1)}\left(\boldsymbol{\beta}^{(1)}\right),w_t\right)+\log f_2\left(y_t;\mu_t^{(2)}\left(\boldsymbol{\beta}^{(2)}\right),z_t,\phi\right)\right\}+\kappa\sum_{(u,v)\in E}\Vert\boldsymbol{\beta}_{1u}-\boldsymbol{\beta}_{1v}\Vert_2,\\ \text{s.t.}\quad \boldsymbol{\beta}_{1e_{l2}}-\boldsymbol{\beta}_{1e_{l1}}\succeq 0,\qquad l=1,\dots,m.\end{gathered}$$

The inequality sign ⪰ in the constraints represents the inequality applied to each element, i.e. x ⪰ y for x = (x1, …, xn) and y = (y1, …, yn) means xi ≥ yi (i = 1, …, n). The optimization problem (21) can be solved by an ADMM constructed in a similar manner to that of the previous section. First, we rewrite the optimization problem (21) into the following equivalent optimization problem:

(22)$$\begin{gathered}\min_{(\boldsymbol{\beta},\phi,\boldsymbol{\xi})}-\sum_{t=1}^{T}\left\{\log f_1\left(z_t;\mu_t^{(1)}\left(\boldsymbol{\beta}^{(1)}\right),w_t\right)+\log f_2\left(y_t;\mu_t^{(2)}\left(\boldsymbol{\beta}^{(2)}\right),z_t,\phi\right)\right\}+\kappa\sum_{l=1}^{m}\Vert\boldsymbol{\xi}_l\Vert_2,\\ \text{s.t.}\quad \boldsymbol{\xi}_l=\boldsymbol{\beta}_{1e_{l2}}-\boldsymbol{\beta}_{1e_{l1}}\succeq 0,\qquad l=1,\dots,m.\end{gathered}$$

Then, the update equations of the ADMM for solving (22) are obtained by adding the constraints ξl ⪰ 0 to those of the previous section (with the edge differences written in the order β1el2 − β1el1 to match the constraints in (22)):

(23)$$\begin{aligned}\left(\boldsymbol{\beta}^{\text{new}},\phi^{\text{new}}\right)&=\mathop{\mathrm{arg\,min}}_{(\boldsymbol{\beta},\phi)}-\sum_{t=1}^{T}\left\{\log f_1\left(z_t;\mu_t^{(1)}\left(\boldsymbol{\beta}^{(1)}\right),w_t\right)+\log f_2\left(y_t;\mu_t^{(2)}\left(\boldsymbol{\beta}^{(2)}\right),z_t,\phi\right)\right\}\\ &\quad+\sum_{l=1}^{m}\left\{-\langle\boldsymbol{\beta}_{1e_{l2}}-\boldsymbol{\beta}_{1e_{l1}}-\boldsymbol{\xi}_l,\boldsymbol{\lambda}_l\rangle+\frac{\rho}{2}\Vert\boldsymbol{\beta}_{1e_{l2}}-\boldsymbol{\beta}_{1e_{l1}}-\boldsymbol{\xi}_l\Vert_2^2\right\},\end{aligned}$$

(24)$$\boldsymbol{\xi}_l^{\text{new}}=\mathop{\mathrm{arg\,min}}_{\boldsymbol{\xi}_l\succeq 0}\ \kappa\Vert\boldsymbol{\xi}_l\Vert_2-\langle\boldsymbol{\beta}_{1e_{l2}}^{\text{new}}-\boldsymbol{\beta}_{1e_{l1}}^{\text{new}}-\boldsymbol{\xi}_l,\boldsymbol{\lambda}_l\rangle+\frac{\rho}{2}\Vert\boldsymbol{\beta}_{1e_{l2}}^{\text{new}}-\boldsymbol{\beta}_{1e_{l1}}^{\text{new}}-\boldsymbol{\xi}_l\Vert_2^2,\quad l=1,\dots,m,$$

(25)$$\boldsymbol{\lambda}_l^{\text{new}}=\boldsymbol{\lambda}_l-\rho\left(\boldsymbol{\beta}_{1e_{l2}}^{\text{new}}-\boldsymbol{\beta}_{1e_{l1}}^{\text{new}}-\boldsymbol{\xi}_l^{\text{new}}\right),\quad l=1,\dots,m.$$

The objective function in (23) has the same form as that in (17), and hence gradient-based methods can be used.
The solution of the optimization problem in (24) can be obtained analytically as

(26)$$\boldsymbol{\xi}_l^{\text{new}}=\begin{cases}\left(1-\dfrac{\kappa}{\Vert\boldsymbol{\eta}_l^{+}\Vert_2}\right)\dfrac{\boldsymbol{\eta}_l^{+}}{\rho}&\text{if }\Vert\boldsymbol{\eta}_l^{+}\Vert_2>\kappa,\\[2mm] 0&\text{if }\Vert\boldsymbol{\eta}_l^{+}\Vert_2\le\kappa,\end{cases}\quad l=1,\dots,m,$$

where $\boldsymbol{\eta}_l^{+}=\left(\max\{\eta_l^{(1)},0\},\max\{\eta_l^{(2)},0\}\right)$ denotes the element-wise positive part of $\boldsymbol{\eta}_l=\left(\eta_l^{(1)},\eta_l^{(2)}\right)=\rho\left(\boldsymbol{\beta}_{1e_{l2}}^{\text{new}}-\boldsymbol{\beta}_{1e_{l1}}^{\text{new}}\right)-\boldsymbol{\lambda}_l$. In (25), the constant ρ adjusts the step size of updating λ = (λ1, …, λm). By updating the values of the parameters to (βnew, ϕnew, ξnew, λnew) repeatedly, they converge to the optimal solution.
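The only change relative to the unconstrained update (20) is a projection before the group soft-thresholding; a sketch:

```python
import numpy as np

# Sketch of the constrained xi-update (24) via (26): take the element-wise
# positive part of eta before the group soft-thresholding of (20).
def group_soft_threshold_nonneg(eta, kappa, rho):
    eta_plus = np.maximum(eta, 0.0)          # element-wise positive part
    norm = np.linalg.norm(eta_plus)
    if norm <= kappa:
        return np.zeros_like(eta_plus)
    return (1.0 - kappa / norm) * eta_plus / rho

# For a decreasing constraint (xi <= 0), the analogous update uses the
# element-wise negative part np.minimum(eta, 0.0), as in (36) below.
```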
6 Application to Motorcycle Insurance Data

In this section, we apply the proposed method to the Swedish motorcycle insurance claims data in Ohlsson and Johansson (2010). The data contain the attribute information, exposure, number of claims, and total claim cost of each policy. We used the following variables in the dataset:

– The owner's age, between 0 and 99.
– The EV-rate class, classified by the so-called EV ratio (= engine output (kW) ÷ (vehicle weight (kg) + 75) × 100). Class 1: EV ratio up to 5; Class 2: 6–8; Class 3: 9–12; Class 4: 13–15; Class 5: 16–19; Class 6: 20–24; Class 7: 25 or more.
– The city-size class, classified by the scale and location of cities and towns. Class 1: central and semi-central parts of Sweden's three largest cities; Class 2: suburbs plus middle-sized cities; Class 3: lesser towns (except those in 5 or 7); Class 4: small towns and countryside (except 5–7); Class 5: northern towns; Class 6: northern countryside; Class 7: Gotland (Sweden's largest island).
– The bonus–malus class, taking values from 1 to 7. The class starts from 1 for a new driver, increases by 1 for each claim-free year, and decreases by 2 for each claim.
– The exposure, i.e. the number of policy years.
– The number of claims.
– The claim cost in Swedish kronor.

Table 1 shows the summary statistics aggregated by factor. Here, the claim frequency is calculated by dividing the number of claims by the exposure, and the claim severity is calculated by dividing the claim cost by the number of claims. As shown in Table 1, the claim frequency tends to be higher for younger owners, higher EV-rate classes (engine output), and larger cities. In contrast, the claim severity is relatively high for 20–59 year-old owners, the middle EV-rate class (class 3), and large cities. Note that the claim frequency and severity do not always decline as the bonus–malus class increases.

Table 1: Summary table of motorcycle insurance claims data aggregated by factor.

Factor        Class   Exposure   Number of claims   Claim frequency   Claim cost   Claim severity
Owner's age   0–19       1247          32               0.026            353,883        11,059
              20–39    17,141         399               0.023         10,855,509        27,207
              40–59    41,911         237               0.006          5,448,987        22,992
              60–99      4938          29               0.006            383,441        13,222
EV-rate       1           5190          46               0.009            993,062        21,588
              2           3990          57               0.014            883,137        15,494
              3         21,666         166               0.008          5,371,543        32,359
              4         11,740          98               0.008          2,191,578        22,363
              5         13,440         149               0.011          3,297,119        22,128
              6           8880         175               0.020          4,160,776        23,776
              7            331           6               0.018            144,605        24,101
City-size     1           6205         183               0.029          5,539,963        30,273
              2         10,103         167               0.017          4,811,166        28,809
              3         11,677         123               0.011          2,522,628        20,509
              4         32,628         196               0.006          3,774,629        19,258
              5           1582           9               0.006            104,739        11,638
              6           2800          18               0.006            288,045        16,003
              7            241           1               0.004                650           650
Bonus–malus   1         12,657         135               0.011          2,914,082        21,586
              2           7236          72               0.010          1,643,990        22,833
              3           5151          57               0.011          1,749,701        30,697
              4           4465          64               0.014          1,877,441        29,335
              5           3771          45               0.012          1,297,572        28,835
              6           4060          43               0.011          1,327,955        30,883
              7         27,896         281               0.010          6,231,079        22,175
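As a quick check on Table 1, the per-factor summaries can be reproduced from policy-level records. The DataFrame `policies` and its column names (exposure, n_claims, claim_cost, plus one column per rating factor) are assumptions about how the data are stored, not the actual file layout.

```python
import pandas as pd

# Sketch of reproducing one block of Table 1 by aggregating over a factor.
def summarize(policies: pd.DataFrame, factor: str) -> pd.DataFrame:
    g = policies.groupby(factor)[["exposure", "n_claims", "claim_cost"]].sum()
    g["claim_frequency"] = g["n_claims"] / g["exposure"]   # claims per policy year
    g["claim_severity"] = g["claim_cost"] / g["n_claims"]  # cost per claim
    return g

# e.g. summarize(policies, "bonus_malus") should recover the last block of Table 1.
```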
Throughout this section, we fit the Poisson GLM (3) and the gamma GLM (4) to the number of claims zt and the claim severity yt (= the claim cost ÷ the number of claims), respectively, with the log link g(μ) = log μ and p = 4 rating factors: the owner's age xt1, EV-rate class xt2, city-size class xt3, and bonus–malus class xt4 of the tth policy. The owner's age xt1 has n1 = 100 grades (0–99 years old), whereas the other factors xt2, xt3, xt4 have n2 = n3 = n4 = 7 grades each. In the following subsections, we apply the group fused lasso on several types of underlying graphs with monotonic constraints for the EV-rate and bonus–malus classes.

6.1 Group Fused Lasso for Single Factors with Monotonic Constraints

First, we incorporated the group fused lasso for each factor into the GLMs and considered the following optimization problem to estimate the parameters:

(27)$$\begin{gathered}\min_{(\boldsymbol{\beta},\phi)}-\sum_{t=1}^{T}\left\{\log f_1\left(z_t;\mu_t^{(1)}\left(\boldsymbol{\beta}^{(1)}\right),w_t\right)+\log f_2\left(y_t;\mu_t^{(2)}\left(\boldsymbol{\beta}^{(2)}\right),z_t,\phi\right)\right\}+\kappa\sum_{i=1}^{4}\sum_{k=1}^{n_i-1}\Vert\boldsymbol{\beta}_{i,k+1}-\boldsymbol{\beta}_{i,k}\Vert_2,\\ \text{s.t.}\quad \boldsymbol{\beta}_{2,k+1}-\boldsymbol{\beta}_{2,k}\succeq 0,\qquad k=1,\dots,n_2-1,\\ \boldsymbol{\beta}_{4,k+1}-\boldsymbol{\beta}_{4,k}\preceq 0,\qquad k=1,\dots,n_4-1,\end{gathered}$$

where $\boldsymbol{\beta}_{i,k}=\left(\beta_{i,k}^{(1)},\beta_{i,k}^{(2)}\right)$ collects the regression coefficients on the kth grade of the ith factor for expected claim frequency and expected claim severity, respectively. An adjacency graph (Vi, Ei) with vertex set Vi = {1, …, ni} and edge set Ei = {(k + 1, k) | k = 1, …, ni − 1} was applied in the group fused lasso for each factor i = 1, 2, 3, 4. Because the EV-rate classes are in ascending order of EV ratio, a higher claim cost is expected for a policy with a higher EV-rate class. Therefore, we imposed the monotonic constraint $\boldsymbol{\beta}_{2,1}\preceq\boldsymbol{\beta}_{2,2}\preceq\dots\preceq\boldsymbol{\beta}_{2,n_2}$ on the EV-rate classes. We also introduced the monotonic constraint $\boldsymbol{\beta}_{4,1}\succeq\boldsymbol{\beta}_{4,2}\succeq\dots\succeq\boldsymbol{\beta}_{4,n_4}$ on the bonus–malus classes, since the bonus–malus class rises with claim-free years and falls with claims. We illustrate the underlying graphs of the group fused lasso in Figure 1. Each pair of adjacent classes without constraints is connected by an undirected edge, whereas each pair of adjacent classes with an ordinal constraint is connected by a directed edge from the class with the smaller coefficient to the class with the larger coefficient.

Figure 1: Underlying adjacency graphs of the group fused lasso for single factors.

To solve the optimization problem (27), we first rewrite it into the following equivalent optimization problem:

(28)$$\begin{gathered}\min_{(\boldsymbol{\beta},\phi,\boldsymbol{\xi})}-\sum_{t=1}^{T}\left\{\log f_1\left(z_t;\mu_t^{(1)}\left(\boldsymbol{\beta}^{(1)}\right),w_t\right)+\log f_2\left(y_t;\mu_t^{(2)}\left(\boldsymbol{\beta}^{(2)}\right),z_t,\phi\right)\right\}+\kappa\sum_{i=1}^{4}\sum_{k=1}^{n_i-1}\Vert\boldsymbol{\xi}_{i,k}\Vert_2,\\ \text{s.t.}\quad \boldsymbol{\xi}_{i,k}=\boldsymbol{\beta}_{i,k+1}-\boldsymbol{\beta}_{i,k},\qquad i=1,3,\quad k=1,\dots,n_i-1,\\ \boldsymbol{\xi}_{2,k}=\boldsymbol{\beta}_{2,k+1}-\boldsymbol{\beta}_{2,k}\succeq 0,\qquad k=1,\dots,n_2-1,\\ \boldsymbol{\xi}_{4,k}=\boldsymbol{\beta}_{4,k+1}-\boldsymbol{\beta}_{4,k}\preceq 0,\qquad k=1,\dots,n_4-1.\end{gathered}$$

Then, the update equations to solve (28) are constructed in the same manner as in the previous sections and are given by

(29)$$\begin{aligned}\left(\boldsymbol{\beta}^{\text{new}},\phi^{\text{new}}\right)&=\mathop{\mathrm{arg\,min}}_{(\boldsymbol{\beta},\phi)}-\sum_{t=1}^{T}\left\{\log f_1\left(z_t;\mu_t^{(1)}\left(\boldsymbol{\beta}^{(1)}\right),w_t\right)+\log f_2\left(y_t;\mu_t^{(2)}\left(\boldsymbol{\beta}^{(2)}\right),z_t,\phi\right)\right\}\\ &\quad+\sum_{i=1}^{4}\sum_{k=1}^{n_i-1}\left\{-\langle\boldsymbol{\beta}_{i,k+1}-\boldsymbol{\beta}_{i,k}-\boldsymbol{\xi}_{i,k},\boldsymbol{\lambda}_{i,k}\rangle+\frac{\rho_i}{2}\Vert\boldsymbol{\beta}_{i,k+1}-\boldsymbol{\beta}_{i,k}-\boldsymbol{\xi}_{i,k}\Vert_2^2\right\},\end{aligned}$$

(30)$$\boldsymbol{\xi}_{i,k}^{\text{new}}=\mathop{\mathrm{arg\,min}}_{\boldsymbol{\xi}_{i,k}}\ \kappa\Vert\boldsymbol{\xi}_{i,k}\Vert_2-\langle\boldsymbol{\beta}_{i,k+1}^{\text{new}}-\boldsymbol{\beta}_{i,k}^{\text{new}}-\boldsymbol{\xi}_{i,k},\boldsymbol{\lambda}_{i,k}\rangle+\frac{\rho_i}{2}\Vert\boldsymbol{\beta}_{i,k+1}^{\text{new}}-\boldsymbol{\beta}_{i,k}^{\text{new}}-\boldsymbol{\xi}_{i,k}\Vert_2^2,\quad i=1,3,\quad k=1,\dots,n_i-1,$$

(31)$$\boldsymbol{\xi}_{2,k}^{\text{new}}=\mathop{\mathrm{arg\,min}}_{\boldsymbol{\xi}_{2,k}\succeq 0}\ \kappa\Vert\boldsymbol{\xi}_{2,k}\Vert_2-\langle\boldsymbol{\beta}_{2,k+1}^{\text{new}}-\boldsymbol{\beta}_{2,k}^{\text{new}}-\boldsymbol{\xi}_{2,k},\boldsymbol{\lambda}_{2,k}\rangle+\frac{\rho_2}{2}\Vert\boldsymbol{\beta}_{2,k+1}^{\text{new}}-\boldsymbol{\beta}_{2,k}^{\text{new}}-\boldsymbol{\xi}_{2,k}\Vert_2^2,\quad k=1,\dots,n_2-1,$$

(32)$$\boldsymbol{\xi}_{4,k}^{\text{new}}=\mathop{\mathrm{arg\,min}}_{\boldsymbol{\xi}_{4,k}\preceq 0}\ \kappa\Vert\boldsymbol{\xi}_{4,k}\Vert_2-\langle\boldsymbol{\beta}_{4,k+1}^{\text{new}}-\boldsymbol{\beta}_{4,k}^{\text{new}}-\boldsymbol{\xi}_{4,k},\boldsymbol{\lambda}_{4,k}\rangle+\frac{\rho_4}{2}\Vert\boldsymbol{\beta}_{4,k+1}^{\text{new}}-\boldsymbol{\beta}_{4,k}^{\text{new}}-\boldsymbol{\xi}_{4,k}\Vert_2^2,\quad k=1,\dots,n_4-1,$$

(33)$$\boldsymbol{\lambda}_{i,k}^{\text{new}}=\boldsymbol{\lambda}_{i,k}-\rho_i\left(\boldsymbol{\beta}_{i,k+1}^{\text{new}}-\boldsymbol{\beta}_{i,k}^{\text{new}}-\boldsymbol{\xi}_{i,k}^{\text{new}}\right),\quad i=1,2,3,4,\quad k=1,\dots,n_i-1,$$

where ρi is basically set to κ for i = 1, 2, 3, 4 but is partially adjusted as ρ3 = max{10κ, 10} when κ < 10 to accelerate convergence to the optimal solution. The optimization problems in Equations (30)–(32) are solved analytically as described in the previous sections. First, the analytical solution of (30) is given by

(34)$$\boldsymbol{\xi}_{i,k}^{\text{new}}=\begin{cases}\left(1-\dfrac{\kappa}{\Vert\boldsymbol{\eta}_{i,k}\Vert_2}\right)\dfrac{\boldsymbol{\eta}_{i,k}}{\rho_i}&\text{if }\Vert\boldsymbol{\eta}_{i,k}\Vert_2>\kappa,\\[2mm] 0&\text{if }\Vert\boldsymbol{\eta}_{i,k}\Vert_2\le\kappa,\end{cases}\quad i=1,3,\quad k=1,\dots,n_i-1,$$

where $\boldsymbol{\eta}_{i,k}=\rho_i\left(\boldsymbol{\beta}_{i,k+1}^{\text{new}}-\boldsymbol{\beta}_{i,k}^{\text{new}}\right)-\boldsymbol{\lambda}_{i,k}$ for i = 1, 3 and k = 1, …, ni − 1.
Second, the analytical solution of (31) is given by

(35)$$\boldsymbol{\xi}_{2,k}^{\text{new}}=\begin{cases}\left(1-\dfrac{\kappa}{\Vert\boldsymbol{\eta}_{2,k}^{+}\Vert_2}\right)\dfrac{\boldsymbol{\eta}_{2,k}^{+}}{\rho_2}&\text{if }\Vert\boldsymbol{\eta}_{2,k}^{+}\Vert_2>\kappa,\\[2mm] 0&\text{if }\Vert\boldsymbol{\eta}_{2,k}^{+}\Vert_2\le\kappa,\end{cases}\quad k=1,\dots,n_2-1,$$

where $\boldsymbol{\eta}_{2,k}^{+}=\left(\max\{\eta_{2,k}^{(1)},0\},\max\{\eta_{2,k}^{(2)},0\}\right)$ denotes the element-wise positive part of $\boldsymbol{\eta}_{2,k}=\left(\eta_{2,k}^{(1)},\eta_{2,k}^{(2)}\right)=\rho_2\left(\boldsymbol{\beta}_{2,k+1}^{\text{new}}-\boldsymbol{\beta}_{2,k}^{\text{new}}\right)-\boldsymbol{\lambda}_{2,k}$ for k = 1, …, n2 − 1. Third, the analytical solution of (32) is given by

(36)$$\boldsymbol{\xi}_{4,k}^{\text{new}}=\begin{cases}\left(1-\dfrac{\kappa}{\Vert\boldsymbol{\eta}_{4,k}^{-}\Vert_2}\right)\dfrac{\boldsymbol{\eta}_{4,k}^{-}}{\rho_4}&\text{if }\Vert\boldsymbol{\eta}_{4,k}^{-}\Vert_2>\kappa,\\[2mm] 0&\text{if }\Vert\boldsymbol{\eta}_{4,k}^{-}\Vert_2\le\kappa,\end{cases}\quad k=1,\dots,n_4-1,$$

where $\boldsymbol{\eta}_{4,k}^{-}=\left(\min\{\eta_{4,k}^{(1)},0\},\min\{\eta_{4,k}^{(2)},0\}\right)$ denotes the element-wise negative part of $\boldsymbol{\eta}_{4,k}=\left(\eta_{4,k}^{(1)},\eta_{4,k}^{(2)}\right)=\rho_4\left(\boldsymbol{\beta}_{4,k+1}^{\text{new}}-\boldsymbol{\beta}_{4,k}^{\text{new}}\right)-\boldsymbol{\lambda}_{4,k}$ for k = 1, …, n4 − 1.

We selected the value of the regularization parameter κ by five-fold cross validation with the validation error (14), the negative log-likelihood of the total claim costs in the validation data, from the 100 grid points $\kappa_j=\bar{\kappa}\,10^{-3j/99}$ for j = 99, 98, …, 0, where $\bar{\kappa}$ is the smallest value of κ at which all the regression coefficients are estimated to be zero.

The validation error for each candidate value of κ is shown in Figure 2 and takes its minimum value of 9462.0 at κ = 14.9, as indicated by the vertical dotted line there. Therefore, we adopted κ = 14.9 and estimated the parameters from all the data.
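The grid construction can be sketched in a few lines. The value of κ̄ below is an assumed placeholder, and the cross-validation call is a stand-in for the ADMM fitting and Tweedie validation-error machinery described in Sections 3–5.

```python
import numpy as np

# Sketch of the grid kappa_j = kappa_bar * 10**(-3j/99), j = 99, ..., 0,
# running from kappa_bar/1000 up to kappa_bar; kappa_bar is an assumed value.
kappa_bar = 1000.0
grid = np.array([kappa_bar * 10 ** (-3 * j / 99) for j in range(99, -1, -1)])

# Five-fold cross validation then picks the grid value minimizing (14);
# cv_error is a hypothetical stand-in for the fit-and-validate routine.
# best_kappa = min(grid, key=lambda k: cv_error(k, n_folds=5))
```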
From the estimates $\hat{\boldsymbol{\beta}}_0=\left(\hat{\beta}_0^{(1)},\hat{\beta}_0^{(2)}\right)$ of the intercepts, we obtain the expected claim frequency $\exp\left(\hat{\beta}_0^{(1)}\right)=0.0087$ (claims/year), the expected claim severity $\exp\left(\hat{\beta}_0^{(2)}\right)=21{,}021$ (kronor/claim), and the expected total claim cost (pure premium) $\exp\left(\hat{\beta}_0^{(1)}+\hat{\beta}_0^{(2)}\right)=183$ (kronor/year) for the policies belonging to the reference classes (owner's age = 30, EV-rate class = 3, city-size class = 4, and bonus–malus class = 5). Moreover, from the estimates $\hat{\boldsymbol{\beta}}_{i,k}=\left(\hat{\beta}_{i,k}^{(1)},\hat{\beta}_{i,k}^{(2)}\right)$ of the regression coefficients, we calculated the relative expected claim frequency $\exp\left(\hat{\beta}_{i,k}^{(1)}\right)$, relative expected claim severity $\exp\left(\hat{\beta}_{i,k}^{(2)}\right)$, and relative expected total claim cost $\exp\left(\hat{\beta}_{i,k}^{(1)}+\hat{\beta}_{i,k}^{(2)}\right)$ of the kth category with respect to the reference category of the ith factor, as shown in Tables 2–4 and Figure 3. Note that the regression coefficients on the bonus–malus classes were all estimated to be zero and are omitted from the tables.

Figure 2: Cross validation errors for candidate values of the regularization parameter κ.

Table 2: Estimates of relative expected claim frequency, relative expected claim severity, and relative expected total claim cost for 14 groups of owner's age.

Owner's age   Relative expected   Relative expected   Relative expected
              claim frequency     claim severity      total claim cost
0–24               2.090               0.779               1.627
25                 1.636               1.044               1.708
26                 1.609               1.067               1.716
27                 1.437               1.048               1.506
28                 1.260               1.072               1.350
29                 1.092               1.031               1.126
30                 1.000               1.000               1.000
31–33              0.705               0.942               0.664
34                 0.624               0.942               0.588
35                 0.504               0.954               0.481
36–39              0.465               0.942               0.439
40–42              0.403               0.911               0.367
43, 44             0.396               0.899               0.356
45–99              0.361               0.789               0.285

Table 3: Estimates of relative expected claim frequency, relative expected claim severity, and relative expected total claim cost for three groups of EV-rate classes.

EV-rate class   Relative expected   Relative expected   Relative expected
                claim frequency     claim severity      total claim cost
1–4                  1.000               1.000               1.000
5                    1.313               1.000               1.313
6, 7                 2.023               1.000               2.023

Table 4: Estimates of relative expected claim frequency, relative expected claim severity, and relative expected total claim cost for four groups of city-size classes.

City-size class   Relative expected   Relative expected   Relative expected
                  claim frequency     claim severity      total claim cost
1                      4.151               1.552               6.443
2                      2.539               1.493               3.791
3                      1.522               1.147               1.747
4–7                    1.000               1.000               1.000

Figure 3: Estimates of relative expected claim frequency, relative expected claim severity, and relative expected total claim cost for owner's age.

In Table 2 and Figure 3, the 100 categories of owner's age were integrated into 14 groups; two of them cover wide ranges of younger ages (0–24) and older ages (45–99), respectively, and eight of the groups around age 30 consist of single ages, which indicates that there are significant differences in insurance risk between those ages. The estimated expected claim frequency decreases monotonically with respect to the owner's age, with a difference of up to 5.8 times.
In contrast, the estimated expected claim severity is lowest in the youngest class (0–24) and highest in the late 20s, with a difference of only 1.4 times. Consequently, their product, the estimated expected total claim cost, has its peak at age 26, decreases monotonically after 26, and shows a six-fold difference at most.

The EV-rate classes above and below class 5 were integrated, respectively, resulting in three groups of EV-rate classes. There is a two-fold difference in the estimated expected claim frequency, but no difference in the estimated expected claim severity, between the first and last groups.

Regarding the city-size classes, classes 4–7 were integrated into one group, and the others remained as single classes. Both the estimated expected claim frequency and the estimated expected claim severity decrease monotonically with respect to the city-size classes, with differences of up to 4.2 and 1.6 times, respectively, which results in a difference of up to 6.4 times in the estimated expected total claim cost.

6.2 Group Fused Lasso for Interaction of Multiple Factors with Monotonic Constraints

In GLMs, interaction of multiple factors often improves predictive performance. The group fused lasso can also be applied to an interaction of multiple factors with monotonic constraints by considering a multi-dimensional lattice graph. Although we tried several combinations of factors for the interaction, we explain the specific design of the model with the interaction city-size classes × bonus–malus classes, which achieved the smallest validation error in five-fold cross validation among them:

(37)$$\begin{gathered}\min_{(\boldsymbol{\beta},\phi)}-\sum_{t=1}^{T}\left\{\log f_1\left(z_t;\mu_t^{(1)}\left(\boldsymbol{\beta}^{(1)}\right),w_t\right)+\log f_2\left(y_t;\mu_t^{(2)}\left(\boldsymbol{\beta}^{(2)}\right),z_t,\phi\right)\right\}\\ +\kappa_1\sum_{i=1}^{2}\sum_{k=1}^{n_i-1}\Vert\boldsymbol{\beta}_{i,k+1}-\boldsymbol{\beta}_{i,k}\Vert_2+\kappa_2\left(\sum_{j=1}^{n_3-1}\sum_{k=1}^{n_4}\Vert\boldsymbol{\beta}_{3:4,j+1,k}-\boldsymbol{\beta}_{3:4,j,k}\Vert_2+\sum_{j=1}^{n_3}\sum_{k=1}^{n_4-1}\Vert\boldsymbol{\beta}_{3:4,j,k+1}-\boldsymbol{\beta}_{3:4,j,k}\Vert_2\right),\\ \text{s.t.}\quad \boldsymbol{\beta}_{2,k+1}-\boldsymbol{\beta}_{2,k}\succeq 0,\qquad k=1,\dots,n_2-1,\\ \boldsymbol{\beta}_{3:4,j,k+1}-\boldsymbol{\beta}_{3:4,j,k}\preceq 0,\qquad j=1,\dots,n_3,\quad k=1,\dots,n_4-1,\end{gathered}$$

where $\boldsymbol{\beta}_{3:4,j,k}=\left(\beta_{3:4,j,k}^{(1)},\beta_{3:4,j,k}^{(2)}\right)$ collects the regression coefficients of the jth city-size class and the kth bonus–malus class for expected claim frequency and expected claim severity. Note that we have n3 × n4 regression coefficient vectors β3:4,j,k for all the combinations of the city-size class and the bonus–malus class, including one reference category.
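A sketch of constructing the edge sets of this lattice graph follows. Vertices are 0-based (city-size, bonus–malus) index pairs, which is an assumed encoding; city-size neighbors get plain fused-lasso edges, while bonus–malus neighbors carry the decreasing constraint.

```python
# Sketch of the lattice graph underlying the interaction in (37).
def lattice_edges(n3, n4):
    undirected, directed = [], []
    for j in range(n3):
        for k in range(n4):
            if j + 1 < n3:
                undirected.append(((j, k), (j + 1, k)))   # city-size direction
            if k + 1 < n4:
                directed.append(((j, k), (j, k + 1)))     # bonus-malus direction
    return undirected, directed

und, dirs = lattice_edges(7, 7)
print(len(und), len(dirs))  # 42 and 42 edges for the 7 x 7 lattice
```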
Figure 4 shows the underlying graphs of the group fused lasso for the interaction.

Figure 4: Underlying adjacent graphs of the group fused lasso for the interaction of city-size classes and bonus–malus classes.

Then, we rewrite (37) into the following equivalent optimization problem:

(38)
$$\begin{aligned} \underset{(\boldsymbol{\beta },\phi )}{\min}\; & -\sum _{t=1}^{T}\left\{\log f_{1}\left(z_{t};\mu _{t}^{(1)}(\boldsymbol{\beta }^{(1)}),w_{t}\right)+\log f_{2}\left(y_{t};\mu _{t}^{(2)}(\boldsymbol{\beta }^{(2)}),z_{t},\phi \right)\right\}\\ & +\kappa _{1}\sum _{i=1}^{2}\sum _{k=1}^{n_{i}-1}\Vert \boldsymbol{\xi }_{i,k}\Vert _{2}+\kappa _{2}\left(\sum _{j=1}^{n_{3}-1}\sum _{k=1}^{n_{4}}\Vert \boldsymbol{\xi }_{3,j,k}\Vert _{2}+\sum _{j=1}^{n_{3}}\sum _{k=1}^{n_{4}-1}\Vert \boldsymbol{\xi }_{4,j,k}\Vert _{2}\right)\\ \text{s.t.}\quad & \boldsymbol{\xi }_{1,k}=\boldsymbol{\beta }_{1,k+1}-\boldsymbol{\beta }_{1,k},\quad k=1,\dots ,n_{1}-1,\\ & \boldsymbol{\xi }_{2,k}=\boldsymbol{\beta }_{2,k+1}-\boldsymbol{\beta }_{2,k}\succeq 0,\quad k=1,\dots ,n_{2}-1,\\ & \boldsymbol{\xi }_{3,j,k}=\boldsymbol{\beta }_{3:4,j+1,k}-\boldsymbol{\beta }_{3:4,j,k},\quad j=1,\dots ,n_{3}-1,\; k=1,\dots ,n_{4},\\ & \boldsymbol{\xi }_{4,j,k}=\boldsymbol{\beta }_{3:4,j,k+1}-\boldsymbol{\beta }_{3:4,j,k}\preceq 0,\quad j=1,\dots ,n_{3},\; k=1,\dots ,n_{4}-1. \end{aligned}$$

Subsequently, the update equations to solve (38) are given by

(39)
$$\begin{aligned} (\boldsymbol{\beta }^{\text{new}},\phi ^{\text{new}})=\underset{(\boldsymbol{\beta },\phi )}{\arg\min}\; & -\sum _{t=1}^{T}\left\{\log f_{1}\left(z_{t};\mu _{t}^{(1)}(\boldsymbol{\beta }^{(1)}),w_{t}\right)+\log f_{2}\left(y_{t};\mu _{t}^{(2)}(\boldsymbol{\beta }^{(2)}),z_{t},\phi \right)\right\}\\ & +\sum _{i=1}^{2}\sum _{k=1}^{n_{i}-1}\left\{-\langle \boldsymbol{\beta }_{i,k+1}-\boldsymbol{\beta }_{i,k}-\boldsymbol{\xi }_{i,k},\boldsymbol{\lambda }_{i,k}\rangle +\frac{\rho _{i}}{2}\Vert \boldsymbol{\beta }_{i,k+1}-\boldsymbol{\beta }_{i,k}-\boldsymbol{\xi }_{i,k}\Vert _{2}^{2}\right\}\\ & +\sum _{j=1}^{n_{3}-1}\sum _{k=1}^{n_{4}}\left\{-\langle \boldsymbol{\beta }_{3:4,j+1,k}-\boldsymbol{\beta }_{3:4,j,k}-\boldsymbol{\xi }_{3,j,k},\boldsymbol{\lambda }_{3,j,k}\rangle +\frac{\rho _{3}}{2}\Vert \boldsymbol{\beta }_{3:4,j+1,k}-\boldsymbol{\beta }_{3:4,j,k}-\boldsymbol{\xi }_{3,j,k}\Vert _{2}^{2}\right\}\\ & +\sum _{j=1}^{n_{3}}\sum _{k=1}^{n_{4}-1}\left\{-\langle \boldsymbol{\beta }_{3:4,j,k+1}-\boldsymbol{\beta }_{3:4,j,k}-\boldsymbol{\xi }_{4,j,k},\boldsymbol{\lambda }_{4,j,k}\rangle +\frac{\rho _{4}}{2}\Vert \boldsymbol{\beta }_{3:4,j,k+1}-\boldsymbol{\beta }_{3:4,j,k}-\boldsymbol{\xi }_{4,j,k}\Vert _{2}^{2}\right\}, \end{aligned}$$

(40)
$$\boldsymbol{\xi }_{1,k}^{\text{new}}=\underset{\boldsymbol{\xi }_{1,k}}{\arg\min}\;\kappa _{1}\Vert \boldsymbol{\xi }_{1,k}\Vert _{2}-\langle \boldsymbol{\beta }_{1,k+1}^{\text{new}}-\boldsymbol{\beta }_{1,k}^{\text{new}}-\boldsymbol{\xi }_{1,k},\boldsymbol{\lambda }_{1,k}\rangle +\frac{\rho _{1}}{2}\Vert \boldsymbol{\beta }_{1,k+1}^{\text{new}}-\boldsymbol{\beta }_{1,k}^{\text{new}}-\boldsymbol{\xi }_{1,k}\Vert _{2}^{2},\quad k=1,\dots ,n_{1}-1,$$

(41)
$$\boldsymbol{\xi }_{2,k}^{\text{new}}=\underset{\boldsymbol{\xi }_{2,k}\succeq 0}{\arg\min}\;\kappa _{1}\Vert \boldsymbol{\xi }_{2,k}\Vert _{2}-\langle \boldsymbol{\beta }_{2,k+1}^{\text{new}}-\boldsymbol{\beta }_{2,k}^{\text{new}}-\boldsymbol{\xi }_{2,k},\boldsymbol{\lambda }_{2,k}\rangle +\frac{\rho _{2}}{2}\Vert \boldsymbol{\beta }_{2,k+1}^{\text{new}}-\boldsymbol{\beta }_{2,k}^{\text{new}}-\boldsymbol{\xi }_{2,k}\Vert _{2}^{2},\quad k=1,\dots ,n_{2}-1,$$

(42)
$$\boldsymbol{\xi }_{3,j,k}^{\text{new}}=\underset{\boldsymbol{\xi }_{3,j,k}}{\arg\min}\;\kappa _{2}\Vert \boldsymbol{\xi }_{3,j,k}\Vert _{2}-\langle \boldsymbol{\beta }_{3:4,j+1,k}^{\text{new}}-\boldsymbol{\beta }_{3:4,j,k}^{\text{new}}-\boldsymbol{\xi }_{3,j,k},\boldsymbol{\lambda }_{3,j,k}\rangle +\frac{\rho _{3}}{2}\Vert \boldsymbol{\beta }_{3:4,j+1,k}^{\text{new}}-\boldsymbol{\beta }_{3:4,j,k}^{\text{new}}-\boldsymbol{\xi }_{3,j,k}\Vert _{2}^{2},\quad j=1,\dots ,n_{3}-1,\; k=1,\dots ,n_{4},$$

(43)
$$\boldsymbol{\xi }_{4,j,k}^{\text{new}}=\underset{\boldsymbol{\xi }_{4,j,k}\preceq 0}{\arg\min}\;\kappa _{2}\Vert \boldsymbol{\xi }_{4,j,k}\Vert _{2}-\langle \boldsymbol{\beta }_{3:4,j,k+1}^{\text{new}}-\boldsymbol{\beta }_{3:4,j,k}^{\text{new}}-\boldsymbol{\xi }_{4,j,k},\boldsymbol{\lambda }_{4,j,k}\rangle +\frac{\rho _{4}}{2}\Vert \boldsymbol{\beta }_{3:4,j,k+1}^{\text{new}}-\boldsymbol{\beta }_{3:4,j,k}^{\text{new}}-\boldsymbol{\xi }_{4,j,k}\Vert _{2}^{2},\quad j=1,\dots ,n_{3},\; k=1,\dots ,n_{4}-1,$$

(44)
$$\boldsymbol{\lambda }_{i,k}^{\text{new}}=\boldsymbol{\lambda }_{i,k}-\rho _{i}\left(\boldsymbol{\beta }_{i,k+1}^{\text{new}}-\boldsymbol{\beta }_{i,k}^{\text{new}}-\boldsymbol{\xi }_{i,k}^{\text{new}}\right),\quad i=1,2,\quad k=1,\dots ,n_{i}-1,$$

(45)
$$\boldsymbol{\lambda }_{3,j,k}^{\text{new}}=\boldsymbol{\lambda }_{3,j,k}-\rho _{3}\left(\boldsymbol{\beta }_{3:4,j+1,k}^{\text{new}}-\boldsymbol{\beta }_{3:4,j,k}^{\text{new}}-\boldsymbol{\xi }_{3,j,k}^{\text{new}}\right),\quad j=1,\dots ,n_{3}-1,\quad k=1,\dots ,n_{4},$$

(46)
$$\boldsymbol{\lambda }_{4,j,k}^{\text{new}}=\boldsymbol{\lambda }_{4,j,k}-\rho _{4}\left(\boldsymbol{\beta }_{3:4,j,k+1}^{\text{new}}-\boldsymbol{\beta }_{3:4,j,k}^{\text{new}}-\boldsymbol{\xi }_{4,j,k}^{\text{new}}\right),\quad j=1,\dots ,n_{3},\quad k=1,\dots ,n_{4}-1,$$

where ρi is basically set to κ1 for i = 1, 2 and to κ2 for i = 3, 4, but is adjusted to ρ3 = ρ4 = max{10κ2, 10} when κ2 < 10 in order to accelerate convergence to the optimal solution. The optimal solutions of (40) and (41), which are the same as Equations (30) and (31), are given by (34) and (35), respectively. The analytical solution of (42) is given by

(47)
$$\boldsymbol{\xi }_{3,j,k}^{\text{new}}=\begin{cases}\left(1-\dfrac{\kappa _{2}}{\Vert \boldsymbol{\eta }_{3,j,k}\Vert _{2}}\right)\dfrac{\boldsymbol{\eta }_{3,j,k}}{\rho _{3}} & \text{if }\Vert \boldsymbol{\eta }_{3,j,k}\Vert _{2}>\kappa _{2},\\ 0 & \text{if }\Vert \boldsymbol{\eta }_{3,j,k}\Vert _{2}\le \kappa _{2},\end{cases}\quad j=1,\dots ,n_{3}-1,\; k=1,\dots ,n_{4},$$

where ${\boldsymbol{\eta }}_{3,j,k}={\rho }_{3}\left({\boldsymbol{\beta }}_{3:4,j+1,k}^{\text{new}}-{\boldsymbol{\beta }}_{3:4,j,k}^{\text{new}}\right)-{\boldsymbol{\lambda }}_{3,j,k}$ for j = 1, …, n3 − 1 and k = 1, …, n4. Next, the analytical solution of (43) is given by

(48)
$$\boldsymbol{\xi }_{4,j,k}^{\text{new}}=\begin{cases}\left(1-\dfrac{\kappa _{2}}{\Vert \boldsymbol{\eta }_{4,j,k}^{-}\Vert _{2}}\right)\dfrac{\boldsymbol{\eta }_{4,j,k}^{-}}{\rho _{4}} & \text{if }\Vert \boldsymbol{\eta }_{4,j,k}^{-}\Vert _{2}>\kappa _{2},\\ 0 & \text{if }\Vert \boldsymbol{\eta }_{4,j,k}^{-}\Vert _{2}\le \kappa _{2},\end{cases}\quad j=1,\dots ,n_{3},\; k=1,\dots ,n_{4}-1,$$

where ${\boldsymbol{\eta }}_{4,j,k}^{-}=\left(\mathrm{min}\left\{{\eta }_{4,j,k}^{\left(1\right)},0\right\},\mathrm{min}\left\{{\eta }_{4,j,k}^{\left(2\right)},0\right\}\right)$ denotes the element-wise negative part of ${\boldsymbol{\eta }}_{4,j,k}=\left({\eta }_{4,j,k}^{\left(1\right)},{\eta }_{4,j,k}^{\left(2\right)}\right)={\rho }_{4}\left({\boldsymbol{\beta }}_{3:4,j,k+1}^{\text{new}}-{\boldsymbol{\beta }}_{3:4,j,k}^{\text{new}}\right)-{\boldsymbol{\lambda }}_{4,j,k}$ for j = 1, …, n3 and k = 1, …, n4 − 1.
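The ξ updates (47) and (48) are block soft-thresholding operations, with (48) additionally taking the element-wise negative part of η before thresholding. Below is a minimal sketch, using hypothetical values for ρ4, κ2, the coefficients, and the multiplier on a single lattice edge.

```python
import numpy as np

def group_soft_threshold(eta, kappa, rho):
    """Unconstrained block update as in (47): shrink eta / rho toward zero."""
    norm = np.linalg.norm(eta)
    if norm <= kappa:
        return np.zeros_like(eta)
    return (1.0 - kappa / norm) * eta / rho

def group_soft_threshold_nonpositive(eta, kappa, rho):
    """Constrained block update as in (48): negative part of eta, then threshold."""
    return group_soft_threshold(np.minimum(eta, 0.0), kappa, rho)

# One xi_4 update with hypothetical values; rho_4 = 10.2 matches the adjustment
# rule max{10 * kappa_2, 10} for kappa_2 = 1.02.
rho4, kappa2 = 10.2, 1.02
beta_up, beta_lo = np.array([-0.10, -0.30]), np.array([0.05, 0.02])
lam = np.array([0.10, -0.05])
eta = rho4 * (beta_up - beta_lo) - lam
print(group_soft_threshold_nonpositive(eta, kappa2, rho4))
```

When the projected vector already lies inside the κ2 ball, the update returns exactly zero, which is what fuses the two adjacent categories into a single rating group.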
We set κ1 = 14.9, the value of κ selected in the previous analysis, for the single factors, and selected the value of κ2 by five-fold cross validation with the validation error (14) over 100 grid points ${\kappa }_{2j}={\bar{\kappa }}_{2}\,{10}^{-3j/99}$ for j = 99, 98, …, 0, where ${\bar{\kappa }}_{2}$ is the smallest value of κ2 at which all the relevant regression coefficients are estimated to be zero.

The validation error for each candidate value of κ2 is shown in Figure 5; it takes its minimum value of 9458.6, which is less than that in the previous analysis, at κ2 = 1.02, as indicated by the vertical dotted line. In the same way, we also tried other combinations of interaction: EV-rate classes × city-size classes, EV-rate classes × bonus–malus classes, and EV-rate classes × city-size classes × bonus–malus classes. The results are summarized in Table 5 and indicate that the model with the interaction city-size classes × bonus–malus classes has the smallest validation error of the four candidates.

Figure 5: Cross validation errors for candidate values of the regularization parameter κ2 in the interaction model.

Table 5: Comparison of interaction models.

Combination of interaction | Regularization parameter κ2 | Validation error
City-size × bonus–malus | 1.02 | 9458.6
EV-rate × city-size | 1.66 | 9470.1
EV-rate × bonus–malus | 1.17 | 9461.4
EV-rate × city-size × bonus–malus | 0.472 | 9474.4

Thus, we adopted the model with the interaction city-size classes × bonus–malus classes with κ1 = 14.9 and κ2 = 1.02 and estimated the parameters from all the data. The estimated expected claim frequency $\mathrm{exp}({\hat{\beta }}_{0}^{(1)})=0.0081$ (claims/year), estimated expected claim severity $\mathrm{exp}({\hat{\beta }}_{0}^{(2)})=20967$ (Krone/claim), and expected total claim cost (pure premium) $\mathrm{exp}({\hat{\beta }}_{0}^{(1)}+{\hat{\beta }}_{0}^{(2)})=170$ (Krone/year) for the policies belonging to the reference classes are slightly less than those in the previous analysis.
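As a sketch of the κ2 grid search described at the start of this subsection, the helper below constructs the log-spaced grid and returns the minimizer of a user-supplied cross-validation error. The stand-in error function is hypothetical and chosen only so that the demo minimizer lands near κ2 = 1.02; in practice it would refit the ADMM on each training fold.

```python
import numpy as np

def kappa2_grid(kappa2_bar, n=100):
    """Grid kappa_2j = kappa2_bar * 10**(-3j/(n-1)), j = n-1, ..., 0 (ascending)."""
    return [kappa2_bar * 10.0 ** (-3.0 * j / (n - 1)) for j in range(n - 1, -1, -1)]

def select_kappa2(kappa2_bar, cv_error):
    """Return the grid point minimizing the supplied cross-validation error."""
    grid = kappa2_grid(kappa2_bar)
    errors = [cv_error(k) for k in grid]
    best = int(np.argmin(errors))
    return grid[best], errors[best]

# Hypothetical smooth stand-in for the real five-fold cross-validation error.
stand_in = lambda k: 9458.6 + (np.log10(k) - np.log10(1.02)) ** 2
best_kappa2, best_err = select_kappa2(kappa2_bar=50.0, cv_error=stand_in)
print(best_kappa2, best_err)  # minimizer near kappa_2 = 1.02 under this stand-in
```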
The relative expected claim frequency, relative expected claim severity, and relative expected total claim cost obtained from the estimates are shown in Tables 6–10 and Figure 6.

Table 6: Estimates of relative expected claim frequency, relative expected claim severity, and relative expected total claim cost for 14 groups of owner's age by the interaction model.

Owner's age | Relative expected claim frequency | Relative expected claim severity | Relative expected total claim cost
0–24 | 2.127 | 0.730 | 1.553
25 | 1.664 | 1.012 | 1.683
26 | 1.594 | 1.076 | 1.714
27 | 1.473 | 1.073 | 1.580
28 | 1.289 | 1.098 | 1.416
29 | 1.127 | 1.054 | 1.188
30 | 1.000 | 1.000 | 1.000
31–33 | 0.719 | 0.937 | 0.673
34 | 0.639 | 0.938 | 0.599
35 | 0.516 | 0.953 | 0.491
36–39 | 0.479 | 0.941 | 0.450
40–42 | 0.420 | 0.913 | 0.384
43, 44 | 0.410 | 0.894 | 0.367
45–99 | 0.374 | 0.780 | 0.292

Table 7: Estimates of relative expected claim frequency, relative expected claim severity, and relative expected total claim cost for three groups of EV-rate classes by the interaction model.

EV-rate class | Relative expected claim frequency | Relative expected claim severity | Relative expected total claim cost
1–4 | 1.000 | 1.000 | 1.000
5 | 1.335 | 1.000 | 1.335
6, 7 | 2.078 | 1.016 | 2.110

Table 8: Estimates of relative expected claim frequency for the interaction of the city-size classes and the bonus–malus classes (rows: city-size class; columns: bonus–malus class).

City-size class | 1 | 2 | 3 | 4 | 5 | 6 | 7
1 | 5.741 | 4.657 | 4.578 | 4.578 | 3.917 | 3.917 | 3.917
2 | 2.618 | 2.618 | 2.618 | 2.618 | 2.618 | 2.618 | 2.618
3 | 1.589 | 1.589 | 1.589 | 1.589 | 1.589 | 1.589 | 1.589
4 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
5 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.980
6 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
7 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000

Table 9: Estimates of relative expected claim severity for the interaction of the city-size classes and the bonus–malus classes (rows: city-size class; columns: bonus–malus class).

City-size class | 1 | 2 | 3 | 4 | 5 | 6 | 7
1 | 1.747 | 1.747 | 1.717 | 1.717 | 1.557 | 1.344 | 1.344
2 | 1.556 | 1.556 | 1.556 | 1.556 | 1.556 | 1.556 | 1.556
3 | 1.410 | 1.410 | 1.410 | 1.205 | 1.205 | 1.205 | 0.789
4 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
5 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.744
6 | 1.008 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.650
7 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.650

Table 10: Estimates of relative expected total claim cost for the interaction of the city-size classes and the bonus–malus classes (rows: city-size class; columns: bonus–malus class).

City-size class | 1 | 2 | 3 | 4 | 5 | 6 | 7
1 | 10.027 | 8.133 | 7.860 | 7.860 | 6.100 | 5.264 | 5.264
2 | 4.072 | 4.072 | 4.072 | 4.072 | 4.072 | 4.072 | 4.072
3 | 2.241 | 2.241 | 2.241 | 1.915 | 1.915 | 1.915 | 1.254
4 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
5 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.729
6 | 1.008 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.650
7 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.650

Figure 6: Estimates of relative expected claim frequency, relative expected claim severity, and relative expected total claim cost for owner's age by the interaction model.

As shown in Tables 6 and 7 and Figure 6, we obtained the same integrated groups and almost the same estimates as in the previous analysis for the owner's age and the EV-rate classes. The estimates in Tables 8–10 indicate a strong interaction between the city-size classes and the bonus–malus classes. In the expected total claim cost, there is about a two-fold difference between the bonus–malus classes for city-size class 1, but no difference for city-size classes 2 and 4. In city-size classes 5–7, the expected total claim cost drops by around 30% when the bonus–malus class rises to 7 from the lower classes. Consequently, the 49 combinations of the city-size classes and the bonus–malus classes are integrated into 13 groups in the expected total claim cost, with a difference of up to 15.4 times.

7 Conclusion

This paper introduced ordinal constraints on risk factors into the group fused lasso for insurance pricing.
The group fused lasso encourages the grouping of regression coefficients on adjacent categories through optimization of an objective function that includes the group fused lasso penalty terms. The strength of the grouping is adjusted by the regularization parameter κ, which is tuned by minimizing the cross-validation error evaluating predictive performance on the validation datasets. The resulting grouping of rating categories therefore attains the best predictive performance among the groupings the group fused lasso can induce.

We added the monotonic/ordinal constraints practically required for some risk factors, such as bonus–malus classes, to the model in Nomura (2017) and proposed a modified ADMM algorithm to estimate the parameters under those constraints. If we used the model in Nomura (2017) without constraints, we might obtain regression coefficients violating some of the constraints, which could result in inconsistent pure premiums; for example, pure premiums for upper bonus–malus classes (excellent drivers) could be higher than those for lower classes.

We demonstrated our method in an analysis of motorcycle insurance data. In Section 6.1, we imposed the group fused lasso with monotonic constraints on the regression coefficients of some factors and obtained a moderate number of rating groups for each factor. The estimated differences in the expected total claim cost (pure premium) are up to 5.8 times among the owner's ages, 6.4 times among the city-size classes, and 2.0 times among the EV-rate classes, whereas there is no difference in the estimated total claim cost among the bonus–malus classes. In Section 6.2, we introduced the group fused lasso on multi-dimensional lattice graphs for the interaction of multiple factors. Specifically, we estimated the interaction between the bonus–malus and city-size classes, which revealed that, in contrast to the previous analysis, the expected claim frequency and severity vary across the bonus–malus classes, and that these differences, while not very large, depend on the city-size classes. This result indicates that premium discount rates for the bonus–malus classes should vary with the city-size classes.

Sparse regularization techniques are widely applied to insurance data, for example in mortality analysis (SriDaran et al. 2022) and loss reserving (McGuire, Taylor, and Miller 2021), and our method may be applicable to those fields as well. On the technical side, our method can be used with other distributions in the exponential dispersion family, such as the inverse Gaussian distribution. Furthermore, group lasso-type regularization is applicable not only to GLMs but also to deep neural networks (Scardapane et al. 2017), so our approach may also be used in such machine learning models.
Asia-Pacific Journal of Risk and Insurance – de Gruyter
Published: Jan 1, 2023
Keywords: tariff analysis; generalized linear model; sparse regularization; group fused lasso; alternating direction method of multipliers