Title: | Engression Modelling |
Version: | 0.1.4 |
Description: | Fits engression models for nonlinear distributional regression. Predictors and targets can be univariate or multivariate. Functionality includes estimation of the conditional mean, estimation of conditional quantiles, and sampling from the fitted distribution. Training is done full-batch on CPU (the Python version offers GPU-accelerated stochastic gradient descent). Based on "Engression: Extrapolation for nonlinear regression?" by Xinwei Shen and Nicolai Meinshausen (2023) <doi:10.48550/arXiv.2307.00835>. Also supports classification (experimental). |
URL: | https://github.com/xwshen51/engression/ |
BugReports: | https://github.com/xwshen51/engression/issues |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.3 |
Imports: | torch |
NeedsCompilation: | no |
Packaged: | 2023-11-21 11:10:49 UTC; nicolai |
Author: | Xinwei Shen [aut], Nicolai Meinshausen [aut, cre] |
Maintainer: | Nicolai Meinshausen <meinshausen@stat.math.ethz.ch> |
Repository: | CRAN |
Date/Publication: | 2023-11-22 08:40:02 UTC |
Convert Data Frame to Numeric Matrix
Description
This function converts a data frame into a numeric matrix. If the data frame contains factor or character variables, they are first converted to numeric.
Usage
dftomat(X)
Arguments
X |
A data frame to be converted to a numeric matrix. |
Value
A numeric matrix corresponding to the input data frame.
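The manual does not say how factor and character columns are encoded. The base-R sketch below assumes one plausible choice: each such column is replaced by its integer factor-level codes. The function name dftomat_sketch is hypothetical and is not the package's own code.

```r
# Hypothetical re-implementation for illustration only (not the package's code).
# Assumption: factor/character columns become integer level codes.
dftomat_sketch <- function(X) {
  cols <- lapply(X, function(col) {
    if (is.character(col)) col <- factor(col)   # character -> factor
    if (is.factor(col)) col <- as.integer(col)  # factor -> integer level codes
    as.numeric(col)
  })
  mat <- do.call(cbind, cols)                   # one matrix column per data frame column
  colnames(mat) <- names(X)
  mat
}

df <- data.frame(a = c(1.5, 2.5, 3.5),
                 b = factor(c("lo", "hi", "lo")),
                 c = c("x", "y", "z"),
                 stringsAsFactors = FALSE)
M <- dftomat_sketch(df)
print(M)
```

Integer codes impose an artificial ordering on the categories; a one-hot encoding would avoid that at the cost of more columns.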
Energy Loss Calculation
Description
This function calculates the energy loss for given tensors. The loss is calculated as the mean of the L2 norms between yt and mxt and between yt and mxpt, minus half the mean of the L2 norm between mxt and mxpt.
Usage
energyloss(yt, mxt, mxpt)
Arguments
yt |
A tensor representing the target values. |
mxt |
A tensor representing the model's stochastic predictions. |
mxpt |
A tensor representing another draw of the model's stochastic predictions. |
Value
A scalar representing the calculated energy loss.
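To make the formula concrete, here is a base-R sketch that operates on plain matrices instead of torch tensors. It assumes that each row is one observation, that norms are taken row-wise, and that the first term averages the two draws mxt and mxpt with equal weight. The names energyloss_sketch and row_norms are hypothetical.

```r
# Base-R sketch of the energy loss (the package itself works on torch tensors).
# Assumptions: rows are observations; L2 norms are row-wise; the first term
# averages both draws with equal weight.
row_norms <- function(A) sqrt(rowSums(A^2))

energyloss_sketch <- function(yt, mxt, mxpt) {
  s1 <- 0.5 * mean(row_norms(yt - mxt)) + 0.5 * mean(row_norms(yt - mxpt))
  s2 <- mean(row_norms(mxt - mxpt))
  s1 - s2 / 2
}

# Tiny worked example: two observations with target 0 and draws at +1/-1.
yt   <- matrix(0, nrow = 2, ncol = 1)
mxt  <- matrix(c(1, -1), nrow = 2, ncol = 1)
mxpt <- matrix(c(-1, 1), nrow = 2, ncol = 1)
energyloss_sketch(yt, mxt, mxpt)
```

In the example s1 = 1 and s2 = 2, so the loss is 1 - 2/2 = 0. The spread between the two draws offsets their distance from the target.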
Energy Loss Calculation (Extended Output)
Description
This function calculates the energy loss for given tensors, similar to energyloss(). The loss is calculated as the mean of the L2 norms between yt and mxt and between yt and mxpt, minus half the mean of the L2 norm between mxt and mxpt. Unlike energyloss(), this function also returns the prediction loss s1 = E(||yt - mxt||) and the variance loss s2 = E(||mxt - mxpt||) as part of the output.
Usage
energylossall(yt, mxt, mxpt)
Arguments
yt |
A tensor representing the target values. |
mxt |
A tensor representing the model's stochastic predictions. |
mxpt |
A tensor representing another draw of the model's stochastic predictions. |
Value
A vector containing the calculated energy loss, s1, and s2.
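The base-R sketch below returns all three quantities and makes their relationship explicit: loss = s1 - s2/2. It uses plain matrices (rows = observations) instead of torch tensors. It assumes that E(||yt - mxt||) is estimated by averaging over both draws. The name energylossall_sketch is hypothetical.

```r
# Base-R sketch returning the loss together with its two components.
# Assumption: the prediction loss s1 averages the distances of both draws to yt.
energylossall_sketch <- function(yt, mxt, mxpt) {
  row_norms <- function(A) sqrt(rowSums(A^2))   # row-wise L2 norms
  s1 <- 0.5 * (mean(row_norms(yt - mxt)) + mean(row_norms(yt - mxpt)))  # prediction loss
  s2 <- mean(row_norms(mxt - mxpt))                                       # variance loss
  c(loss = s1 - s2 / 2, s1 = s1, s2 = s2)
}

set.seed(1)
yt   <- matrix(rnorm(20), ncol = 2)
mxt  <- matrix(rnorm(20), ncol = 2)
mxpt <- matrix(rnorm(20), ncol = 2)
out <- energylossall_sketch(yt, mxt, mxpt)
print(out)
```

Tracking s1 and s2 separately during training shows whether a falling loss comes from better predictions (smaller s1) or from a wider predictive spread (larger s2).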
Energy Loss Calculation with Beta Scaling
Description
This function calculates the energy loss for given tensors. The loss is calculated as the mean of the L2 norms between yt and mxt and between yt and mxpt, each raised to the power of beta, minus half the mean of the L2 norm between mxt and mxpt, also raised to the power of beta.
Usage
energylossbeta(yt, mxt, mxpt, beta)
Arguments
yt |
A tensor representing the target values. |
mxt |
A tensor representing the model's stochastic predictions. |
mxpt |
A tensor representing another draw of the model's stochastic predictions. |
beta |
The exponent applied to the L2 norms in the energy loss. |
Value
A scalar representing the calculated energy loss.
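In this loss, beta enters as an exponent on every norm, not as a multiplicative factor. The base-R sketch below uses plain matrices (rows = observations) under the same equal-weight averaging assumption as for energyloss(). The name energylossbeta_sketch is hypothetical. With beta = 1 it reduces to the ordinary energy loss. The underlying energy score is known to be strictly proper only for 0 < beta < 2.

```r
# Base-R sketch of the beta-energy loss: every L2 norm is raised to the power beta.
# Assumption: equal-weight averaging over the two draws, as for energyloss().
energylossbeta_sketch <- function(yt, mxt, mxpt, beta = 1) {
  row_norms <- function(A) sqrt(rowSums(A^2))
  s1 <- 0.5 * (mean(row_norms(yt - mxt)^beta) + mean(row_norms(yt - mxpt)^beta))
  s2 <- mean(row_norms(mxt - mxpt)^beta)
  s1 - s2 / 2
}

yt   <- matrix(0, nrow = 2, ncol = 1)
mxt  <- matrix(c(1, -1), nrow = 2, ncol = 1)
mxpt <- matrix(c(-1, 1), nrow = 2, ncol = 1)
energylossbeta_sketch(yt, mxt, mxpt, beta = 1)  # s1 = 1, s2 = 2
energylossbeta_sketch(yt, mxt, mxpt, beta = 2)  # s1 = 1, s2 = 4
```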
Engression Function
Description
This function fits an engression model to the data. It allows for the tuning of several parameters related to model complexity. By default, variables are standardized internally; predictions are returned on the original scale.
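Internal standardization means that the model is fitted on centred and scaled data and that its outputs are mapped back afterwards. The base-R sketch below illustrates this round trip with column means and standard deviations. It is an assumption that engression() uses exactly these statistics, and pred_std is a placeholder standing in for model output.

```r
# Hypothetical illustration of standardization and back-transformation.
set.seed(2)
X <- matrix(rnorm(50 * 3, mean = 5, sd = 2), ncol = 3)
Y <- 3 * X[, 1] + rnorm(50)

mu_x <- colMeans(X); sd_x <- apply(X, 2, sd)   # per-column centre and scale
mu_y <- mean(Y);     sd_y <- sd(Y)

Xs <- sweep(sweep(X, 2, mu_x, "-"), 2, sd_x, "/")  # standardized predictors
Ys <- (Y - mu_y) / sd_y                             # standardized target

# A model would be fitted on (Xs, Ys). Its output on the standardized scale
# is returned to the original scale by inverting the transformation.
pred_std  <- Ys                        # placeholder standing in for model output
pred_orig <- pred_std * sd_y + mu_y
```

Test data must be standardized with the training statistics (mu_x, sd_x), not with its own.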
Usage
engression(
X,
Y,
noise_dim = 5,
hidden_dim = 100,
num_layer = 3,
dropout = 0.05,
batch_norm = TRUE,
num_epochs = 1000,
lr = 10^(-3),
beta = 1,
silent = FALSE,
standardize = TRUE
)
Arguments
X |
A matrix or data frame representing the predictors. |
Y |
A matrix or vector representing the target variable(s). If Y is a factor a classification model is fitted (experimental). |
noise_dim |
The dimension of the noise introduced in the model (default: 5). |
hidden_dim |
The size of the hidden layer in the model (default: 100). |
num_layer |
The number of layers in the model (default: 3). |
dropout |
The dropout rate used in the model. Only active if batch normalization is off (default: 0.05). |
batch_norm |
A boolean indicating whether to use batch-normalization (default: TRUE). |
num_epochs |
The number of epochs to be used in training (default: 1000). |
lr |
The learning rate to be used in training (default: 10^-3). |
beta |
The exponent beta applied to the L2 norms in the energy loss (default: 1). |
silent |
A boolean indicating whether to suppress output during model training (default: FALSE). |
standardize |
A boolean indicating whether to standardize the input data (default: TRUE). |
Value
An engression model object with class "engression".
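The roles of noise_dim, hidden_dim and num_layer can be illustrated with a stochastic network g(x, eps). Here Gaussian noise eps of dimension noise_dim is appended to the predictors, so repeated evaluation at the same x yields a distribution of outputs. The base-R sketch below uses random, untrained weights and ReLU activations. The architecture details, including where the noise enters and how num_layer counts layers, are assumptions for illustration. make_generator is a hypothetical name.

```r
# Hypothetical stochastic network with random (untrained) weights, only to show
# what noise_dim, hidden_dim and num_layer control. Not the package's architecture.
make_generator <- function(p, noise_dim = 5, hidden_dim = 100, num_layer = 3, out_dim = 1) {
  # Layer widths: input (predictors + noise), num_layer - 1 hidden layers, output
  dims <- c(p + noise_dim, rep(hidden_dim, num_layer - 1), out_dim)
  weights <- lapply(seq_len(length(dims) - 1), function(l)
    matrix(rnorm(dims[l] * dims[l + 1], sd = 1 / sqrt(dims[l])),
           nrow = dims[l], ncol = dims[l + 1]))
  function(x) {
    # Fresh Gaussian noise for every row and every call
    eps <- matrix(rnorm(nrow(x) * noise_dim), nrow = nrow(x), ncol = noise_dim)
    h <- cbind(x, eps)
    for (l in seq_along(weights)) {
      h <- h %*% weights[[l]]
      if (l < length(weights)) h[h < 0] <- 0  # ReLU on hidden layers
    }
    h
  }
}

set.seed(3)
g <- make_generator(p = 2, noise_dim = 5, hidden_dim = 20, num_layer = 3)
x <- matrix(c(0.5, -1), nrow = 1)
draws <- replicate(200, g(x)[1, 1])  # 200 draws of g(x, eps) at the same x
```

Training with the energy loss adjusts such weights so that the spread of these draws matches the conditional distribution of Y given x.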
Examples
n = 1000
p = 5
X = matrix(rnorm(n*p),ncol=p)
Y = (X[,1]+rnorm(n)*0.1)^2 + (X[,2]+rnorm(n)*0.1) + rnorm(n)*0.1
Xtest = matrix(rnorm(n*p),ncol=p)
Ytest = (Xtest[,1]+rnorm(n)*0.1)^2 + (Xtest[,2]+rnorm(n)*0.1) + rnorm(n)*0.1
## fit engression object
engr = engression(X,Y)
print(engr)
## prediction on test data
Yhat = predict(engr,Xtest,type="mean")
cat("\n correlation between predicted and realized values: ", signif(cor(Yhat, Ytest),3))
plot(Yhat, Ytest,xlab="prediction", ylab="observation")
## quantile prediction
Yhatquant = predict(engr,Xtest,type="quantile")
ord = order(Yhat)
matplot(Yhat[ord], Yhatquant[ord,], type="l", col=2,lty=1,xlab="prediction", ylab="observation")
points(Yhat[ord],Ytest[ord],pch=20,cex=0.5)
## sampling from estimated model
Ysample = predict(engr,Xtest,type="sample",nsample=1)
## plot of realized values against first variable
## switch to a 1x2 layout, saving the previous settings
oldpar <- par(mfrow=c(1,2))
plot(Xtest[,1], Ytest, xlab="Variable 1", ylab="Observation")
## plot of sampled values against first variable
plot(Xtest[,1], Ysample, xlab="Variable 1", ylab="Sample from engression model")
par(oldpar)
Engression Fit Function
Description
This function fits an Engression model to the provided data. It allows for the tuning of several parameters related to model complexity and training. The function is not meant to be exported but can be used within the package or for internal testing purposes.
Usage
engressionfit(
X,
Y,
noise_dim = 100,
hidden_dim = 100,
num_layer = 3,
dropout = 0.01,
batch_norm = TRUE,
num_epochs = 200,
lr = 10^(-3),
beta = 1,
silent = FALSE
)
Arguments
X |
A matrix or data frame representing the predictors. |
Y |
A matrix representing the target variable(s). |
noise_dim |
The dimension of the noise introduced in the model (default: 100). |
hidden_dim |
The size of the hidden layer in the model (default: 100). |
num_layer |
The number of layers in the model (default: 3). |
dropout |
The dropout rate used in the model. Only active if batch normalization is off (default: 0.01). |
batch_norm |
A boolean indicating whether to use batch-normalization (default: TRUE). |
num_epochs |
The number of epochs to be used in training (default: 200). |
lr |
The learning rate to be used in training (default: 10^-3). |
beta |
The exponent beta applied to the L2 norms in the energy loss (default: 1). |
silent |
A boolean indicating whether to suppress output during model training (default: FALSE). |
Value
A list containing the trained engression model and a vector of loss values.
Prediction Function for Engression Models
Description
This function computes predictions from a trained engression model. It allows for the generation of point estimates, quantiles, or samples from the estimated distribution.
Usage
## S3 method for class 'engression'
predict(
object,
Xtest,
type = c("mean", "sample", "quantile")[1],
trim = 0.05,
quantiles = 0.1 * (1:9),
nsample = 200,
drop = TRUE,
...
)
Arguments
object |
A trained engression model returned from engression, engressionBagged or engressionfit functions. |
Xtest |
A matrix or data frame representing the predictors in the test set. |
type |
The type of prediction to make. "mean" for point estimates, "sample" for samples from the estimated distribution, or "quantile" for quantiles of the estimated distribution (default: "mean"). |
trim |
The proportion of extreme values to trim when calculating the mean (default: 0.05). |
quantiles |
The quantiles to estimate if type is "quantile" (default: 0.1*(1:9)). |
nsample |
The number of samples to draw if type is "sample" (default: 200). |
drop |
A boolean indicating whether to drop dimensions of length 1 from the output (default: TRUE). |
... |
additional arguments (currently ignored) |
Value
A matrix or array of predictions.
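The trim, quantiles and nsample arguments suggest that predictions are summaries of Monte Carlo draws from the fitted model. The base-R sketch below assumes exactly that. samples is a hypothetical n x nsample matrix of draws, simulated here with rnorm instead of coming from a fitted model. Each row is summarized by a trimmed mean and by empirical quantiles. Whether predict.engression uses the same quantile definition is an assumption.

```r
# Sketch: summarizing draws per test point, as type = "mean" / "quantile" suggest.
set.seed(4)
n <- 3; nsample <- 200
# Hypothetical draws: row i holds nsample draws centred at i
samples <- matrix(rnorm(n * nsample, mean = 1:n), nrow = n)

trim      <- 0.05
quantiles <- 0.1 * (1:9)
pred_mean  <- apply(samples, 1, mean, trim = trim)                 # trimmed mean per row
pred_quant <- t(apply(samples, 1, quantile, probs = quantiles))    # n x 9 matrix
```

Larger nsample reduces Monte Carlo error in both summaries, and trimming guards the mean against occasional extreme draws.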
Examples
n = 1000
p = 5
X = matrix(rnorm(n*p),ncol=p)
Y = (X[,1]+rnorm(n)*0.1)^2 + (X[,2]+rnorm(n)*0.1) + rnorm(n)*0.1
Xtest = matrix(rnorm(n*p),ncol=p)
Ytest = (Xtest[,1]+rnorm(n)*0.1)^2 + (Xtest[,2]+rnorm(n)*0.1) + rnorm(n)*0.1
## fit engression object
engr = engression(X,Y)
print(engr)
## prediction on test data
Yhat = predict(engr,Xtest,type="mean")
cat("\n correlation between predicted and realized values: ", signif(cor(Yhat, Ytest),3))
plot(Yhat, Ytest,xlab="prediction", ylab="observation")
## quantile prediction
Yhatquant = predict(engr,Xtest,type="quantile")
ord = order(Yhat)
matplot(Yhat[ord], Yhatquant[ord,], type="l", col=2,lty=1,xlab="prediction", ylab="observation")
points(Yhat[ord],Ytest[ord],pch=20,cex=0.5)
## sampling from estimated model
Ysample = predict(engr,Xtest,type="sample",nsample=1)
Print an Engression Model Object
Description
This function is a utility that displays a summary of a fitted Engression model object.
Usage
## S3 method for class 'engression'
print(x, ...)
Arguments
x |
A trained engression model returned from the engression function. |
... |
additional arguments (currently ignored) |
Value
This function does not return anything. It prints a summary of the model, including information about its architecture and training process, and the loss values achieved at several epochs during training.
Examples
n = 1000
p = 5
X = matrix(rnorm(n*p),ncol=p)
Y = (X[,1]+rnorm(n)*0.1)^2 + (X[,2]+rnorm(n)*0.1) + rnorm(n)*0.1
## fit engression object
engr = engression(X,Y)
print(engr)