
Functional regression

Type of regression analysis

Functional regression is a form of regression analysis in which the responses or covariates include functional data. Functional regression models can be classified into four types depending on whether the responses or covariates are functional or scalar: (i) scalar responses with functional covariates, (ii) functional responses with scalar covariates, (iii) functional responses with functional covariates, and (iv) scalar or functional responses with functional and scalar covariates. In addition, functional regression models can be linear, partially linear, or nonlinear. In particular, functional polynomial models, functional single and multiple index models, and functional additive models are three special cases of functional nonlinear models.

Functional linear models (FLMs)

Functional linear models (FLMs) are an extension of linear models (LMs). A linear model with scalar response $Y\in\mathbb{R}$ and scalar covariates $X\in\mathbb{R}^{p}$ can be written as

$$Y=\beta_{0}+\langle X,\beta\rangle+\varepsilon, \tag{1}$$

where $\langle\cdot,\cdot\rangle$ denotes the inner product in Euclidean space, $\beta_{0}\in\mathbb{R}$ and $\beta\in\mathbb{R}^{p}$ denote the regression coefficients, and $\varepsilon$ is a random error with mean zero and finite variance. FLMs can be divided into two types based on the responses.

Functional linear models with scalar responses

Functional linear models with scalar responses can be obtained by replacing the scalar covariates $X$ and the coefficient vector $\beta$ in model (1) by a centered functional covariate $X^{c}(\cdot)=X(\cdot)-\mathbb{E}(X(\cdot))$ and a coefficient function $\beta=\beta(\cdot)$ with domain $\mathcal{T}$, respectively, and replacing the inner product in Euclidean space by that in the Hilbert space $L^{2}$,

$$Y=\beta_{0}+\langle X^{c},\beta\rangle+\varepsilon=\beta_{0}+\int_{\mathcal{T}}X^{c}(t)\beta(t)\,dt+\varepsilon, \tag{2}$$

where $\langle\cdot,\cdot\rangle$ here denotes the inner product in $L^{2}$. One approach to estimating $\beta_{0}$ and $\beta(\cdot)$ is to expand the centered covariate $X^{c}(\cdot)$ and the coefficient function $\beta(\cdot)$ in the same functional basis, for example, a B-spline basis or the eigenbasis used in the Karhunen–Loève expansion. Suppose $\{\phi_{k}\}_{k=1}^{\infty}$ is an orthonormal basis of $L^{2}$. Expanding $X^{c}$ and $\beta$ in this basis, $X^{c}(\cdot)=\sum_{k=1}^{\infty}x_{k}\phi_{k}(\cdot)$, $\beta(\cdot)=\sum_{k=1}^{\infty}\beta_{k}\phi_{k}(\cdot)$, model (2) becomes $$Y=\beta_{0}+\sum_{k=1}^{\infty}\beta_{k}x_{k}+\varepsilon.$$ For implementation, regularization is needed and can be done through truncation, $L^{2}$ penalization or $L^{1}$ penalization. In addition, a reproducing kernel Hilbert space (RKHS) approach can also be used to estimate $\beta_{0}$ and $\beta(\cdot)$ in model (2).
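The truncation approach for model (2) can be illustrated with a minimal sketch. It assumes curves observed without error on a common, equally spaced grid, approximates integrals by Riemann sums, and uses the sample eigenbasis as the orthonormal basis. The function name `fit_flm_truncated`, the simulated data and the truncation level `K` are illustrative assumptions, not part of the original article.

```python
import numpy as np

def fit_flm_truncated(X, y, dt, K):
    """Estimate beta_0 and beta(t) in model (2) by truncating the basis expansion at K terms.

    X  : (n, m) array, curves sampled on an equally spaced grid with spacing dt
    y  : (n,) array of scalar responses
    """
    Xc = X - X.mean(axis=0)                          # centered covariate X^c
    cov_op = Xc.T @ Xc / X.shape[0] * dt             # discretized covariance operator
    evals, evecs = np.linalg.eigh(cov_op)            # eigenvalues in ascending order
    phi = evecs[:, ::-1][:, :K] / np.sqrt(dt)        # top-K eigenfunctions, sum(phi_k^2) * dt = 1
    scores = Xc @ phi * dt                           # x_k = <X^c, phi_k>
    coef, *_ = np.linalg.lstsq(scores, y - y.mean(), rcond=None)  # beta_k by least squares
    return y.mean(), phi @ coef                      # beta_0 and beta(t) = sum_k beta_k phi_k(t)

# Simulated example (all values below are assumptions chosen for illustration).
rng = np.random.default_rng(0)
n, m = 200, 101
t = np.linspace(0.0, 1.0, m)
dt = t[1] - t[0]
basis = np.sqrt(2.0) * np.sin(np.outer(np.arange(1, 4), np.pi * t))   # (3, m)
X = (rng.normal(size=(n, 3)) * np.sqrt([1.0, 0.5, 0.25])) @ basis     # (n, m)
beta_true = 2.0 * basis[0] - 1.0 * basis[1]
Xc = X - X.mean(axis=0)
y = 1.0 + Xc @ beta_true * dt + 0.05 * rng.normal(size=n)

beta0_hat, beta_hat = fit_flm_truncated(X, y, dt, K=3)
```

Because the scores of the centered covariate have sample mean zero, the intercept estimate reduces to the sample mean of the responses. In practice the truncation level would be chosen by a data-driven criterion such as cross-validation, which this sketch omits.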

Adding multiple functional and scalar covariates, model (2) can be extended to

$$Y=\sum_{k=1}^{q}Z_{k}\alpha_{k}+\sum_{j=1}^{p}\int_{\mathcal{T}_{j}}X_{j}^{c}(t)\beta_{j}(t)\,dt+\varepsilon, \tag{3}$$

where $Z_{1},\ldots,Z_{q}$ are scalar covariates with $Z_{1}=1$, $\alpha_{1},\ldots,\alpha_{q}$ are regression coefficients for $Z_{1},\ldots,Z_{q}$, respectively, $X_{j}^{c}$ is a centered functional covariate given by $X_{j}^{c}(\cdot)=X_{j}(\cdot)-\mathbb{E}(X_{j}(\cdot))$, $\beta_{j}$ is the regression coefficient function for $X_{j}^{c}(\cdot)$, and $\mathcal{T}_{j}$ is the domain of $X_{j}$ and $\beta_{j}$, for $j=1,\ldots,p$. However, due to the parametric component $\alpha$, the estimation methods for model (2) cannot be used in this case, and alternative estimation methods for model (3) are available.

Functional linear models with functional responses

For a functional response $Y(\cdot)$ with domain $\mathcal{T}$ and a functional covariate $X(\cdot)$ with domain $\mathcal{S}$, two FLMs regressing $Y(\cdot)$ on $X(\cdot)$ have been considered. One of these two models is of the form

$$Y(t)=\beta_{0}(t)+\int_{\mathcal{S}}\beta(s,t)X^{c}(s)\,ds+\varepsilon(t),\ \text{for}\ t\in\mathcal{T}, \tag{4}$$

where $X^{c}(\cdot)=X(\cdot)-\mathbb{E}(X(\cdot))$ is still the centered functional covariate, $\beta_{0}(\cdot)$ and $\beta(\cdot,\cdot)$ are coefficient functions, and $\varepsilon(\cdot)$ is usually assumed to be a random process with mean zero and finite variance. In this case, at any given time $t\in\mathcal{T}$, the value of $Y$, i.e., $Y(t)$, depends on the entire trajectory of $X$. Model (4), for any given time $t$, is an extension of multivariate linear regression with the inner product in Euclidean space replaced by that in $L^{2}$. An estimating equation motivated by multivariate linear regression is $$r_{XY}=R_{XX}\beta,\ \text{for}\ \beta\in L^{2}(\mathcal{S}\times\mathcal{T}),$$ where $r_{XY}(s,t)=\operatorname{cov}(X(s),Y(t))$, and $R_{XX}:L^{2}(\mathcal{S}\times\mathcal{T})\rightarrow L^{2}(\mathcal{S}\times\mathcal{T})$ is defined as $(R_{XX}\beta)(s,t)=\int_{\mathcal{S}}r_{XX}(s,w)\beta(w,t)\,dw$ with $r_{XX}(s,w)=\operatorname{cov}(X(s),X(w))$ for $s,w\in\mathcal{S}$. Regularization is needed and can be done through truncation, $L^{2}$ penalization or $L^{1}$ penalization. Various estimation methods for model (4) are available.
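One way to read the estimating equation $r_{XY}=R_{XX}\beta$ under truncation is sketched below: expanding in the leading $K$ eigenfunctions $\phi_{k}$ of $r_{XX}$ with eigenvalues $\lambda_{k}$ gives $\beta(s,t)\approx\sum_{k=1}^{K}\phi_{k}(s)\operatorname{cov}(x_{k},Y(t))/\lambda_{k}$, where $x_{k}$ are the scores of $X^{c}$. The sketch assumes densely observed, noise-free covariate curves on equally spaced grids and Riemann-sum integration; the simulated coefficient surface and all variable names are hypothetical.

```python
import numpy as np

# Simulated data (assumed setup for illustration only).
rng = np.random.default_rng(1)
n, m_s, m_t, K = 200, 101, 40, 3
s = np.linspace(0.0, 1.0, m_s)
ds = s[1] - s[0]
t = np.linspace(0.0, 1.0, m_t)
basis = np.sqrt(2.0) * np.sin(np.outer(np.arange(1, 4), np.pi * s))   # (3, m_s)
X = (rng.normal(size=(n, 3)) * np.sqrt([1.0, 0.5, 0.25])) @ basis     # (n, m_s)
g = np.vstack([np.cos(np.pi * t), t, t ** 2])                         # (3, m_t)
beta_true = basis.T @ g                                               # beta(s, t), shape (m_s, m_t)
beta0_true = np.sin(2.0 * np.pi * t)
Xc = X - X.mean(axis=0)
Y = beta0_true + Xc @ beta_true * ds + 0.05 * rng.normal(size=(n, m_t))

# Eigen-decomposition of the discretized covariance operator of X.
evals, evecs = np.linalg.eigh(Xc.T @ Xc / n * ds)
lam = evals[::-1][:K]                                  # leading eigenvalues lambda_k
phi = evecs[:, ::-1][:, :K] / np.sqrt(ds)              # eigenfunctions phi_k(s)
scores = Xc @ phi * ds                                 # x_k = <X^c, phi_k>

# Truncated solution: beta(s, t) = sum_k phi_k(s) * cov(x_k, Y(t)) / lambda_k.
cov_score_y = scores.T @ (Y - Y.mean(axis=0)) / n      # (K, m_t)
beta_hat = phi @ (cov_score_y / lam[:, None])          # (m_s, m_t)
beta0_hat = Y.mean(axis=0)                             # beta_0(t), since X^c has mean zero
```

Dividing by the eigenvalues is what makes the inverse problem ill-posed: small $\lambda_{k}$ amplify noise, which is why the truncation level acts as the regularization parameter here.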
When $X$ and $Y$ are concurrently observed, i.e., $\mathcal{S}=\mathcal{T}$, it is reasonable to consider a historical functional linear model, where the current value of $Y$ only depends on the history of $X$, i.e., $\beta(s,t)=0$ for $s>t$ in model (4). A simpler version of the historical functional linear model is the functional concurrent model (see below).
Adding multiple functional covariates, model (4) can be extended to

$$Y(t)=\beta_{0}(t)+\sum_{j=1}^{p}\int_{\mathcal{S}_{j}}\beta_{j}(s,t)X_{j}^{c}(s)\,ds+\varepsilon(t),\ \text{for}\ t\in\mathcal{T}, \tag{5}$$

where for $j=1,\ldots,p$, $X_{j}^{c}(\cdot)=X_{j}(\cdot)-\mathbb{E}(X_{j}(\cdot))$ is a centered functional covariate with domain $\mathcal{S}_{j}$, and $\beta_{j}(\cdot,\cdot)$ is the corresponding coefficient function with domain $\mathcal{S}_{j}\times\mathcal{T}$. In particular, taking $X_{j}(\cdot)$ as a constant function yields a special case of model (5), $$Y(t)=\sum_{j=1}^{p}X_{j}\beta_{j}(t)+\varepsilon(t),\ \text{for}\ t\in\mathcal{T},$$ which is an FLM with functional responses and scalar covariates.
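For the special case with scalar covariates, a simple illustration is to apply ordinary least squares separately at each grid point of $t$. This is only a minimal sketch: it assumes all response curves are observed on a common grid and it does not smooth the estimated coefficient functions across $t$, which practical methods usually do. The simulated coefficient functions are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 150, 50
t = np.linspace(0.0, 1.0, m)

# Scalar covariates X_1 = 1 (intercept) and X_2 (hypothetical continuous covariate).
X = np.column_stack([np.ones(n), rng.normal(size=n)])       # (n, p) with p = 2
beta_true = np.vstack([np.sin(2.0 * np.pi * t), t ** 2])    # beta_j(t) on the grid, (p, m)
Y = X @ beta_true + 0.1 * rng.normal(size=(n, m))           # response curves, (n, m)

# lstsq with a matrix right-hand side solves one regression per grid point at once.
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)            # (p, m)
```

Each column of `beta_hat` holds the least-squares coefficients at one time point, so row `j` traces out an unsmoothed estimate of $\beta_{j}(t)$.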

Functional concurrent models

Assuming that $\mathcal{S}=\mathcal{T}$, another model, known as the functional concurrent model, sometimes also referred to as the varying-coefficient model, is of the form

$$Y(t)=\alpha_{0}(t)+\alpha(t)X(t)+\varepsilon(t),\ \text{for}\ t\in\mathcal{T}, \tag{6}$$

where $\alpha_{0}$ and $\alpha$ are coefficient functions. Note that model (6) assumes the value of $Y$ at time $t$, i.e., $Y(t)$, only depends on that of $X$ at the same time, i.e., $X(t)$. Various estimation methods can be applied to model (6).
Adding multiple functional covariates, model (6) can also be extended to $$Y(t)=\alpha_{0}(t)+\sum_{j=1}^{p}\alpha_{j}(t)X_{j}(t)+\varepsilon(t),\ \text{for}\ t\in\mathcal{T},$$ where $X_{1},\ldots,X_{p}$ are multiple functional covariates with domain $\mathcal{T}$ and $\alpha_{0},\alpha_{1},\ldots,\alpha_{p}$ are the coefficient functions with the same domain.
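Because the concurrent model only links $Y(t)$ to the covariate values at the same $t$, one simple illustration is a separate least-squares fit at every grid point. The sketch below assumes two functional covariates observed on the same grid as the response, with no measurement error and no smoothing of the coefficient estimates across $t$. The data-generating choices are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 300, 50
t = np.linspace(0.0, 1.0, m)

# Two simulated functional covariates on the common grid (assumed shapes (n, m)).
X1 = rng.normal(size=(n, 1)) + rng.normal(size=(n, 1)) * np.cos(2.0 * np.pi * t)
X2 = rng.normal(size=(n, 1)) + rng.normal(size=(n, 1)) * np.sin(2.0 * np.pi * t)

alpha_true = np.vstack([np.sin(np.pi * t), 1.0 + t, -t ** 2])   # alpha_0, alpha_1, alpha_2
Y = alpha_true[0] + alpha_true[1] * X1 + alpha_true[2] * X2 + 0.1 * rng.normal(size=(n, m))

# Pointwise least squares: regress Y(t_j) on (1, X_1(t_j), X_2(t_j)) for each j.
alpha_hat = np.empty((3, m))
for j in range(m):
    design = np.column_stack([np.ones(n), X1[:, j], X2[:, j]])
    alpha_hat[:, j] = np.linalg.lstsq(design, Y[:, j], rcond=None)[0]
```

Unlike the function-on-scalar case, the design matrix changes with $t$ here, so each grid point needs its own regression.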

Functional nonlinear models

Functional polynomial models

Functional polynomial models are an extension of the FLMs with scalar responses, analogous to extending linear regression to polynomial regression. For a scalar response $Y$ and a functional covariate $X(\cdot)$ with domain $\mathcal{T}$, the simplest example of functional polynomial models is functional quadratic regression $$Y=\alpha+\int_{\mathcal{T}}\beta(t)X^{c}(t)\,dt+\int_{\mathcal{T}}\int_{\mathcal{T}}\gamma(s,t)X^{c}(s)X^{c}(t)\,ds\,dt+\varepsilon,$$ where $X^{c}(\cdot)=X(\cdot)-\mathbb{E}(X(\cdot))$ is the centered functional covariate, $\alpha$ is a scalar coefficient, $\beta(\cdot)$ and $\gamma(\cdot,\cdot)$ are coefficient functions with domains $\mathcal{T}$ and $\mathcal{T}\times\mathcal{T}$, respectively, and $\varepsilon$ is a random error with mean zero and finite variance. By analogy to FLMs with scalar responses, estimation of functional polynomial models can be obtained through expanding both the centered covariate $X^{c}$ and the coefficient functions $\beta$ and $\gamma$ in an orthonormal basis.
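After expanding in an orthonormal basis and truncating at $K$ terms, functional quadratic regression reduces to an ordinary quadratic regression on the scores $x_{1},\ldots,x_{K}$, with linear terms $x_{k}$ and product terms $x_{j}x_{k}$ for $j\le k$. The following sketch illustrates this under assumed conditions: curves on an equally spaced grid, the sample eigenbasis as the basis, Riemann-sum integration, and simulated coefficient functions that are purely hypothetical.

```python
import numpy as np
from itertools import combinations_with_replacement

rng = np.random.default_rng(2)
n, m, K = 300, 101, 2
t = np.linspace(0.0, 1.0, m)
dt = t[1] - t[0]
basis = np.sqrt(2.0) * np.sin(np.outer(np.arange(1, K + 1), np.pi * t))   # (K, m)
X = (rng.normal(size=(n, K)) * np.sqrt([1.0, 0.25])) @ basis              # (n, m)
Xc = X - X.mean(axis=0)

# Hypothetical true coefficient functions beta(t) and gamma(s, t).
beta_true = basis[0] - 0.5 * basis[1]
gamma_true = 0.8 * np.outer(basis[0], basis[0]) - 0.6 * np.outer(basis[0], basis[1])
signal = (0.5 + Xc @ beta_true * dt
          + np.einsum("is,st,it->i", Xc, gamma_true, Xc) * dt ** 2)
y = signal + 0.1 * rng.normal(size=n)

# Scores of X^c in the sample eigenbasis.
evals, evecs = np.linalg.eigh(Xc.T @ Xc / n * dt)
phi = evecs[:, ::-1][:, :K] / np.sqrt(dt)
scores = Xc @ phi * dt

# Design: intercept, linear terms x_k, and products x_j * x_k for j <= k.
pairs = list(combinations_with_replacement(range(K), 2))
design = np.column_stack(
    [np.ones(n), scores] + [scores[:, j] * scores[:, k] for j, k in pairs]
)
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
fitted = design @ coef
```

The number of quadratic terms grows as $K(K+1)/2$, so the truncation level has to stay small relative to the sample size.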

Functional single and multiple index models

A functional multiple index model is given by $$Y=g\left(\int_{\mathcal{T}}X^{c}(t)\beta_{1}(t)\,dt,\ldots,\int_{\mathcal{T}}X^{c}(t)\beta_{p}(t)\,dt\right)+\varepsilon.$$ Taking $p=1$ yields a functional single index model. However, for $p>1$, this model is problematic due to the curse of dimensionality. With $p>1$ and relatively small sample sizes, the estimator given by this model often has large variance. An alternative $p$-component functional multiple index model can be expressed as $$Y=g_{1}\left(\int_{\mathcal{T}}X^{c}(t)\beta_{1}(t)\,dt\right)+\cdots+g_{p}\left(\int_{\mathcal{T}}X^{c}(t)\beta_{p}(t)\,dt\right)+\varepsilon.$$ Estimation methods for functional single and multiple index models are available.

Functional additive models (FAMs)

Given an expansion of a functional covariate $X$ with domain $\mathcal{T}$ in an orthonormal basis $\{\phi_{k}\}_{k=1}^{\infty}$: $X(t)=\sum_{k=1}^{\infty}x_{k}\phi_{k}(t)$, a functional linear model with scalar responses shown in model (2) can be written as $$\mathbb{E}(Y|X)=\mathbb{E}(Y)+\sum_{k=1}^{\infty}\beta_{k}x_{k}.$$ One form of FAMs is obtained by replacing the linear function of $x_{k}$, i.e., $\beta_{k}x_{k}$, by a general smooth function $f_{k}$, $$\mathbb{E}(Y|X)=\mathbb{E}(Y)+\sum_{k=1}^{\infty}f_{k}(x_{k}),$$ where $f_{k}$ satisfies $\mathbb{E}(f_{k}(x_{k}))=0$ for $k\in\mathbb{N}$. Another form of FAMs consists of a sequence of time-additive models: $$\mathbb{E}(Y|X(t_{1}),\ldots,X(t_{p}))=\sum_{j=1}^{p}f_{j}(X(t_{j})),$$ where $\{t_{1},\ldots,t_{p}\}$ is a dense grid on $\mathcal{T}$ with increasing size $p\in\mathbb{N}$, and $f_{j}(x)=g(t_{j},x)$ with $g$ a smooth function, for $j=1,\ldots,p$.
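The first form of FAM can be illustrated with a rough sketch in which each $f_{k}$ is approximated by a cubic polynomial in the $k$-th score. The polynomial is a stand-in chosen for brevity; nonparametric smoothers would normally be used, and nothing here is meant as the method of any particular reference. The sketch assumes curves on an equally spaced grid, a truncation at $K=2$ components, and simulated component functions that are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, K = 400, 101, 2
t = np.linspace(0.0, 1.0, m)
dt = t[1] - t[0]
basis = np.sqrt(2.0) * np.sin(np.outer(np.arange(1, K + 1), np.pi * t))
xi = rng.normal(size=(n, K)) * np.sqrt([1.0, 0.25])      # true scores x_1, x_2
X = xi @ basis

# E(Y) = 1, f_1(x) = x^2 - 1 and f_2(x) = x^3, both with mean zero under the score distributions.
signal = 1.0 + (xi[:, 0] ** 2 - 1.0) + xi[:, 1] ** 3
y = signal + 0.1 * rng.normal(size=n)

# Estimated scores from the sample eigenbasis.
Xc = X - X.mean(axis=0)
evals, evecs = np.linalg.eigh(Xc.T @ Xc / n * dt)
phi = evecs[:, ::-1][:, :K] / np.sqrt(dt)
scores = Xc @ phi * dt

def cubic_terms(x):
    """Polynomial features standing in for a smoother of one score."""
    return np.column_stack([x, x ** 2, x ** 3])

# Center each block so that every fitted f_k has sample mean zero.
blocks = [cubic_terms(scores[:, k]) for k in range(K)]
blocks = [b - b.mean(axis=0) for b in blocks]
design = np.column_stack([np.ones(n)] + blocks)
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
f_hat = [blocks[k] @ coef[1 + 3 * k: 4 + 3 * k] for k in range(K)]   # fitted f_k(x_k)
fitted = coef[0] + sum(f_hat)

# Linear fit on the same scores, for comparison with the additive fit.
linear_design = np.column_stack([np.ones(n), scores])
linear_fit = linear_design @ np.linalg.lstsq(linear_design, y, rcond=None)[0]
mse_additive = np.mean((y - fitted) ** 2)
mse_linear = np.mean((y - linear_fit) ** 2)
```

With centered blocks, the intercept `coef[0]` equals the sample mean of `y`, matching the role of $\mathbb{E}(Y)$ in the model.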

Extensions

A direct extension of FLMs with scalar responses shown in model (2) is to add a link function to create a generalized functional linear model (GFLM), by analogy to extending linear regression to generalized linear models (GLMs). The three components of a GFLM are:

  1. Linear predictor $\eta=\beta_{0}+\int_{\mathcal{T}}X^{c}(t)\beta(t)\,dt$;
  2. Variance function $\operatorname{Var}(Y|X)=V(\mu)$, where $\mu=\mathbb{E}(Y|X)$ is the conditional mean;
  3. Link function $g$ connecting the conditional mean and the linear predictor through $\mu=g(\eta)$.
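As an illustration of the three components, consider a binary response with $g(\eta)=1/(1+e^{-\eta})$ and $V(\mu)=\mu(1-\mu)$, i.e., a functional logistic regression. The sketch below truncates the linear predictor to $K$ scores and fits the coefficients by Newton iterations (iteratively reweighted least squares). The truncation, the simulated data, the fixed iteration count and all variable names are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, K = 500, 101, 3
t = np.linspace(0.0, 1.0, m)
dt = t[1] - t[0]
basis = np.sqrt(2.0) * np.sin(np.outer(np.arange(1, K + 1), np.pi * t))
X = (rng.normal(size=(n, K)) * np.sqrt([1.0, 0.5, 0.25])) @ basis
Xc = X - X.mean(axis=0)

# Hypothetical true model: eta = beta_0 + int X^c(t) beta(t) dt, mu = g(eta).
beta_true = 1.5 * basis[0] - 1.0 * basis[1]
eta_true = 0.5 + Xc @ beta_true * dt
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta_true)))

# Scores of X^c in the sample eigenbasis.
evals, evecs = np.linalg.eigh(Xc.T @ Xc / n * dt)
phi = evecs[:, ::-1][:, :K] / np.sqrt(dt)
scores = Xc @ phi * dt

# Newton / IRLS iterations for the truncated linear predictor.
design = np.column_stack([np.ones(n), scores])
b = np.zeros(K + 1)
for _ in range(25):
    mu = 1.0 / (1.0 + np.exp(-(design @ b)))        # mu = g(eta)
    var = mu * (1.0 - mu)                            # V(mu)
    hessian = design.T @ (var[:, None] * design)
    b = b + np.linalg.solve(hessian, design.T @ (y - mu))

beta0_hat = b[0]
beta_hat = phi @ b[1:]                               # beta(t) on the grid
mu_hat = 1.0 / (1.0 + np.exp(-(design @ b)))
accuracy = np.mean((mu_hat > 0.5) == (y == 1))
```

A fixed number of Newton steps is used only to keep the sketch short; a convergence check and a safeguard against perfectly separable data would be needed in practice.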

