dsiNMF
In this vignette, we consider approximating multiple non-negative matrices as a product of binary (or non-negative) low-rank matrices (a.k.a. factor matrices).
Test data are available from toyModel.
library("dcTensor")
X <- dcTensor::toyModel("dsiNMF_Easy")
The data matrices contain block structures, as shown below.
suppressMessages(library("fields"))
layout(t(1:3))
image.plot(X[[1]], main="X1", legend.mar=8)
image.plot(X[[2]], main="X2", legend.mar=8)
image.plot(X[[3]], main="X3", legend.mar=8)
Here, we consider the approximation of \(K\) binary data matrices \(X_{k}\) (\(N \times M_{k}\)) as the matrix product of \(W\) (\(N \times J\)) and \(H_{k}\) (\(M_{k} \times J\)):
\[ X_{k} \approx W H_{k}^{\top} \ \mathrm{s.t.}\ W \in \{0,1\}^{N \times J},\ H_{k} \in \{0,1\}^{M_{k} \times J} \]
This is a combination of binary matrix factorization (BMF; Z. Zhang et al. 2007) and simultaneous non-negative matrix factorization (siNMF; Badea 2008; S. Zhang et al. 2012; Yilmaz 2010; Cichocki et al. 2009), implemented by adding a binary regularization term to siNMF.
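To make the model concrete, here is a minimal sketch of an objective of this kind. It assumes a squared-error fit term plus a binary penalty \((a^2 - a)^2\) summed over the entries of each factor, in the spirit of Z. Zhang et al. (2007); the helper names (binary_penalty, bsmf_objective) and toy matrices are hypothetical, and this is not necessarily the exact objective that dsiNMF minimizes. Each \(H_{k}\) is stored as an \(M_{k} \times J\) matrix, matching the W %*% t(H) reconstruction used later in this vignette.

```r
# Hypothetical helper (assumed penalty form): sum over entries of (a^2 - a)^2,
# which is zero only when every entry is exactly 0 or 1.
binary_penalty <- function(A) sum((A^2 - A)^2)

# Hypothetical objective: squared reconstruction error summed over the K
# matrices plus weighted binary penalties on W and on each H_k.
# X: list of N x M_k matrices; W: N x J; H: list of M_k x J matrices.
bsmf_objective <- function(X, W, H, Bin_W, Bin_H) {
  fit <- sum(vapply(seq_along(X), function(k) {
    sum((X[[k]] - W %*% t(H[[k]]))^2)
  }, numeric(1)))
  reg <- Bin_W * binary_penalty(W) +
    sum(vapply(seq_along(H), function(k) {
      Bin_H[k] * binary_penalty(H[[k]])
    }, numeric(1)))
  fit + reg
}

# Toy data (toy_ names so that X in this vignette is not overwritten)
toy_W <- matrix(c(1, 1, 0, 0,
                  0, 0, 1, 1), nrow = 4)          # N = 4, J = 2
toy_H <- list(matrix(c(1, 0, 1,
                       0, 1, 0), nrow = 3),        # M_1 = 3
              matrix(c(0, 1,
                       1, 0), nrow = 2))           # M_2 = 2
toy_X <- lapply(toy_H, function(Hk) toy_W %*% t(Hk))

# Exact binary factors: no fit error and no penalty, so this returns 0
bsmf_objective(toy_X, toy_W, toy_H, Bin_W = 10, Bin_H = c(10, 10))
# Moving W away from {0, 1} raises both the fit error and the penalty
bsmf_objective(toy_X, 0.5 * toy_W + 0.25, toy_H, Bin_W = 10, Bin_H = c(10, 10))
```

The penalty is zero exactly at 0 and 1, so a large weight on it favors binary entries without forbidding other values outright.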
See also the siNMF function of the nnTensor package.
In Binary Simultaneous Matrix Factorization (BSMF), the rank parameter \(J\) (\(\leq \min(N, M_{k})\)) must be set in advance. Other settings, such as the number of iterations (num.iter) and the factorization algorithm (algorithm), are also available. For details of the arguments, see ?dsiNMF. After the calculation, dsiNMF returns a list of objects (see the str() output below). BSMF is achieved by setting the binary regularization parameters to large values, Bin_W for \(W\) and Bin_H (one value per data matrix) for the \(H_{k}\)s, as below:
set.seed(123456)
out_dsiNMF <- dsiNMF(X, Bin_W=1E+1, Bin_H=c(1E+1, 1E+1, 1E+1), J=3)
str(out_dsiNMF, 2)
## List of 6
## $ W : num [1:100, 1:3] 0.000534 0.000534 0.000534 0.000534 0.000534 ...
## $ H :List of 3
## ..$ : num [1:300, 1:3] 1.37e-10 8.92e-11 1.77e-10 3.42e-10 9.71e-11 ...
## ..$ : num [1:200, 1:3] 1.11e-10 3.15e-10 1.14e-10 1.58e-10 5.23e-10 ...
## ..$ : num [1:150, 1:3] 0.998 0.998 0.998 0.998 0.998 ...
## $ RecError : Named num [1:101] 1.00e-09 1.27e+02 1.16e+02 1.11e+02 1.09e+02 ...
## ..- attr(*, "names")= chr [1:101] "offset" "1" "2" "3" ...
## $ TrainRecError: Named num [1:101] 1.00e-09 1.27e+02 1.16e+02 1.11e+02 1.09e+02 ...
## ..- attr(*, "names")= chr [1:101] "offset" "1" "2" "3" ...
## $ TestRecError : Named num [1:101] 1e-09 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 ...
## ..- attr(*, "names")= chr [1:101] "offset" "1" "2" "3" ...
## $ RelChange : Named num [1:101] 1.00e-09 5.25e-01 9.17e-02 4.06e-02 2.12e-02 ...
## ..- attr(*, "names")= chr [1:101] "offset" "1" "2" "3" ...
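To see why a large binary regularization parameter such as Bin_W=1E+1 pushes entries toward 0 or 1, consider a single entry fitted to a target of 0.6 under an assumed penalty \(\lambda (a^2 - a)^2\), where the hypothetical weight lambda plays the role of Bin_W or Bin_H. A fine grid search (used instead of a local optimizer so that the global minimum on [0, 1] is found) shows the minimizer moving from 0.6 toward 1 as lambda grows. This is a one-dimensional illustration only, not how dsiNMF updates its factors.

```r
# Loss for one entry a fitted to a target value, with an assumed binary
# penalty lambda * (a^2 - a)^2 (lambda stands in for Bin_W / Bin_H).
penalized_loss <- function(a, target, lambda) {
  (a - target)^2 + lambda * (a^2 - a)^2
}

# Global minimizer over a fine grid on [0, 1]
best_entry <- function(target, lambda) {
  grid <- seq(0, 1, length.out = 10001)
  grid[which.min(penalized_loss(grid, target, lambda))]
}

lambdas <- c(0, 0.1, 1, 10, 100)
sapply(lambdas, function(l) best_entry(0.6, l))
```

With lambda = 0 the minimizer is the target itself (0.6); with lambda = 100 it lies just below 1.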
The reconstruction error (RecError) and the relative change (RelChange, the change in the reconstruction error from the previous step) can be used to diagnose whether the calculation has converged.
layout(t(1:2))
plot(log10(out_dsiNMF$RecError[-1]), type="b", main="Reconstruction Error")
plot(log10(out_dsiNMF$RelChange[-1]), type="b", main="Relative Change")
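The relative change can also be recomputed from a reconstruction-error trace. The sketch below assumes the definition \(|e_{t} - e_{t-1}| / e_{t}\), which matches the description above but may differ in detail from dcTensor's internal computation; the helper names and the tolerance tol are hypothetical. Note that the first element of RecError is an "offset" entry, which is why the plots above drop it with [-1]; the same should be done before passing it to these helpers.

```r
# Relative change between consecutive reconstruction errors, assuming
# |e[t] - e[t-1]| / e[t]; unname() drops labels such as "1", "2", ...
rel_change <- function(rec_error) {
  e <- unname(rec_error)
  abs(diff(e)) / e[-1]
}

# Hypothetical stopping rule: index (in rec_error) of the first step whose
# relative change falls below tol, or NA if it never does
first_converged <- function(rec_error, tol = 1e-4) {
  hit <- which(rel_change(rec_error) < tol)
  if (length(hit) == 0) NA_integer_ else hit[1] + 1L
}

toy_err <- c(100, 50, 40, 39.99, 39.989)
rel_change(toy_err)                   # 1, 0.25, then two values below 1e-3
first_converged(toy_err, tol = 1e-3)  # 4
```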
The products of \(W\) and the \(H_{k}\)s show whether the original data matrices are well recovered by dsiNMF.
recX <- lapply(seq_along(X), function(x){
out_dsiNMF$W %*% t(out_dsiNMF$H[[x]])
})
layout(rbind(1:3, 4:6))
image.plot(X[[1]], main="X1", legend.mar=8)
image.plot(X[[2]], main="X2", legend.mar=8)
image.plot(X[[3]], main="X3", legend.mar=8)
image.plot(recX[[1]], main="Reconstructed X1", legend.mar=8)
image.plot(recX[[2]], main="Reconstructed X2", legend.mar=8)
image.plot(recX[[3]], main="Reconstructed X3", legend.mar=8)
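Besides the visual comparison, recovery can be summarized as a relative reconstruction error \(\|X_{k} - W H_{k}^{\top}\|_{F} / \|X_{k}\|_{F}\) for each matrix. The helper relative_errors below is a hypothetical sketch demonstrated on toy matrices; for this vignette it would take X together with the W and H elements returned by dsiNMF, with each \(H_{k}\) stored as \(M_{k} \times J\) as in the reconstruction code above.

```r
# Relative Frobenius error of W %*% t(H_k) against each X_k.
# X: list of N x M_k matrices; W: N x J; H: list of M_k x J matrices.
relative_errors <- function(X, W, H) {
  stopifnot(length(X) == length(H))
  vapply(seq_along(X), function(k) {
    Xk <- X[[k]]
    # Check that the dimensions agree with X_k ~ W H_k^T
    stopifnot(nrow(Xk) == nrow(W), ncol(Xk) == nrow(H[[k]]),
              ncol(W) == ncol(H[[k]]))
    norm(Xk - W %*% t(H[[k]]), type = "F") / norm(Xk, type = "F")
  }, numeric(1))
}

# Toy example: exact factors for the first matrix, a shifted second matrix
toy_W <- cbind(c(1, 1, 0, 0), c(0, 0, 1, 1))      # N = 4, J = 2
toy_H <- list(cbind(c(1, 1, 0), c(0, 1, 1)),      # M_1 = 3
              cbind(c(1, 0), c(0, 1)))            # M_2 = 2
toy_X <- lapply(toy_H, function(Hk) toy_W %*% t(Hk))
toy_X[[2]] <- toy_X[[2]] + 0.1
relative_errors(toy_X, toy_W, toy_H)
```

A value near 0 means the matrix is reconstructed almost exactly; here the first matrix gives 0 and the shifted second one about 0.13.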
The histograms below show the values of \(W\) and of each \(H_{k}\); the \(H_{k}\)s look binary.
layout(rbind(1:2, 3:4))
hist(out_dsiNMF$W, main="W", breaks=100)
hist(out_dsiNMF$H[[1]], main="H1", breaks=100)
hist(out_dsiNMF$H[[2]], main="H2", breaks=100)
hist(out_dsiNMF$H[[3]], main="H3", breaks=100)
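Whether a factor matrix looks binary can also be checked numerically, for example as the share of entries within a tolerance of 0 or 1, and an exact 0/1 matrix can then be obtained by thresholding. The helper names, the tolerance of 0.05, and the cut-off of 0.5 below are arbitrary illustrative choices, not values used by dsiNMF; for this vignette the helpers would be applied to out_dsiNMF$W or to an element of out_dsiNMF$H.

```r
# Share of entries lying within tol of 0 or of 1 (1 means fully binary)
binary_share <- function(A, tol = 0.05) {
  mean(abs(A) < tol | abs(A - 1) < tol)
}

# Threshold a nearly binary factor matrix into an exact 0/1 matrix;
# multiplying the logical matrix by 1 keeps its dimensions
binarize <- function(A, cutoff = 0.5) {
  (A > cutoff) * 1
}

toy_A <- matrix(c(0.998, 1e-10, 0.002, 1.01,
                  0.40,  0.97,  3e-4,  0.0), nrow = 4)
binary_share(toy_A)   # 0.875: 7 of 8 entries are near 0 or 1
binarize(toy_A)
```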
Semi-Binary Simultaneous Matrix Factorization (SBSMF) is an extension of BSMF in which binary regularization is applied only to selected factor matrices.
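One way to picture the semi-binary setting is a binary penalty weighted separately for each factor matrix, where a weight of zero leaves that matrix constrained only to be non-negative. The sketch below assumes a penalty of the form \((a^2 - a)^2\) summed over entries; the helper name semi_binary_penalty and the weights are hypothetical, and in dsiNMF the corresponding settings are the Bin_W and Bin_H arguments.

```r
# Hypothetical semi-binary penalty: one weight per factor matrix
# (W, H_1, ..., H_K); a zero weight leaves that factor unpenalized.
semi_binary_penalty <- function(factors, weights) {
  stopifnot(length(factors) == length(weights))
  sum(mapply(function(A, w) w * sum((A^2 - A)^2), factors, weights))
}

toy_W <- matrix(c(1, 0, 1, 0, 1, 0), nrow = 3)   # binary
toy_H <- matrix(c(3.2, 0, 2.5, 0.3), nrow = 2)   # non-negative, not binary

# Penalize W only: the non-binary H costs nothing
semi_binary_penalty(list(toy_W, toy_H), c(100, 0))    # 0
# Penalize both: the non-binary H now contributes a large penalty
semi_binary_penalty(list(toy_W, toy_H), c(100, 100))
```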
To demonstrate SBSMF, we next use non-negative matrices from the nnTensor package.
suppressMessages(library("nnTensor"))
X2 <- nnTensor::toyModel("siNMF_Easy")
layout(t(1:3))
image.plot(X2[[1]], main="X1", legend.mar=8)
image.plot(X2[[2]], main="X2", legend.mar=8)
image.plot(X2[[3]], main="X3", legend.mar=8)
As in BSMF, the rank parameter \(J\) (\(\leq \min(N, M_{k})\)) must be set in advance, and num.iter and algorithm can also be specified (see ?dsiNMF). SBSMF is achieved by setting a large binary regularization parameter only for the factor matrices that should be binary. Below, only Bin_W is set to a large value (Bin_H is left at its default), so that only \(W\) is regularized toward binary values:
set.seed(123456)
out_dsiNMF2 <- dsiNMF(X2, Bin_W=1E+2, J=3)
str(out_dsiNMF2, 2)
## List of 6
## $ W : num [1:100, 1:3] 0.0988 0.1006 0.1057 0.1023 0.1003 ...
## $ H :List of 3
## ..$ : num [1:300, 1:3] 4.32e-10 2.58e-10 6.12e-10 1.84e-09 3.78e-10 ...
## ..$ : num [1:200, 1:3] 5.59e-15 2.28e-14 2.49e-14 2.74e-14 8.56e-14 ...
## ..$ : num [1:150, 1:3] 95.6 92.7 94 96.2 95.1 ...
## $ RecError : Named num [1:101] 1.00e-09 1.18e+04 1.14e+04 1.09e+04 1.08e+04 ...
## ..- attr(*, "names")= chr [1:101] "offset" "1" "2" "3" ...
## $ TrainRecError: Named num [1:101] 1.00e-09 1.18e+04 1.14e+04 1.09e+04 1.08e+04 ...
## ..- attr(*, "names")= chr [1:101] "offset" "1" "2" "3" ...
## $ TestRecError : Named num [1:101] 1e-09 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 ...
## ..- attr(*, "names")= chr [1:101] "offset" "1" "2" "3" ...
## $ RelChange : Named num [1:101] 1.00e-09 1.07e-01 3.54e-02 4.11e-02 1.25e-02 ...
## ..- attr(*, "names")= chr [1:101] "offset" "1" "2" "3" ...
RecError and RelChange can be used to diagnose whether the calculation has converged.
layout(t(1:2))
plot(log10(out_dsiNMF2$RecError[-1]), type="b", main="Reconstruction Error")
plot(log10(out_dsiNMF2$RelChange[-1]), type="b", main="Relative Change")
The products of \(W\) and the \(H_{k}\)s show whether the original data matrices are well recovered by dsiNMF.
recX <- lapply(seq_along(X2), function(x){
out_dsiNMF2$W %*% t(out_dsiNMF2$H[[x]])
})
layout(rbind(1:3, 4:6))
image.plot(X2[[1]], main="X1", legend.mar=8)
image.plot(X2[[2]], main="X2", legend.mar=8)
image.plot(X2[[3]], main="X3", legend.mar=8)
image.plot(recX[[1]], main="Reconstructed X1", legend.mar=8)
image.plot(recX[[2]], main="Reconstructed X2", legend.mar=8)
image.plot(recX[[3]], main="Reconstructed X3", legend.mar=8)
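The histograms show the values of \(W\), to which binary regularization was applied, and of the \(H_{k}\)s, which are only constrained to be non-negative; the \(H_{k}\)s are not binary (for example, \(H_{3}\) takes values around 95 in the str() output above).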
layout(rbind(1:2, 3:4))
hist(out_dsiNMF2$W, main="W", breaks=100)
hist(out_dsiNMF2$H[[1]], main="H1", breaks=100)
hist(out_dsiNMF2$H[[2]], main="H2", breaks=100)
hist(out_dsiNMF2$H[[3]], main="H3", breaks=100)
## R version 4.4.3 (2025-02-28)
## Platform: x86_64-pc-linux-gnu
## Running under: Rocky Linux 9.5 (Blue Onyx)
##
## Matrix products: default
## BLAS: /opt/R/4.4.3/lib64/R/lib/libRblas.so
## LAPACK: /opt/R/4.4.3/lib64/R/lib/libRlapack.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Asia/Tokyo
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] nnTensor_1.3.0 fields_16.3.1 viridisLite_0.4.2 spam_2.11-1
## [5] dcTensor_1.3.1
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.6 jsonlite_1.8.9 dplyr_1.1.4 compiler_4.4.3
## [5] maps_3.4.3 tidyselect_1.2.1 Rcpp_1.1.0 plot3D_1.4.2
## [9] tagcloud_0.7.0 jquerylib_0.1.4 scales_1.3.0 yaml_2.3.10
## [13] fastmap_1.2.0 ggplot2_3.5.1 R6_2.6.1 generics_0.1.3
## [17] tcltk_4.4.3 knitr_1.50 MASS_7.3-65 dotCall64_1.1-1
## [21] misc3d_0.9-1 tibble_3.3.0 munsell_0.5.1 pillar_1.10.1
## [25] bslib_0.9.0 RColorBrewer_1.1-3 rlang_1.1.6 cachem_1.1.0
## [29] xfun_0.53 sass_0.4.10 cli_3.6.5 magrittr_2.0.3
## [33] digest_0.6.37 grid_4.4.3 rTensor_1.4.9 lifecycle_1.0.4
## [37] vctrs_0.6.5 evaluate_1.0.3 glue_1.8.0 colorspace_2.1-1
## [41] rmarkdown_2.29 pkgconfig_2.0.3 tools_4.4.3 htmltools_0.5.8.1
Badea, L. 2008. “Extracting Gene Expression Profiles Common to Colon and Pancreatic Adenocarcinoma Using Simultaneous Nonnegative Matrix Factorization.” Pacific Symposium on Biocomputing, 279–90.
Cichocki, A. et al. 2009. Nonnegative Matrix and Tensor Factorizations. Wiley.
Yilmaz, Y. K. 2010. “Probabilistic Latent Tensor Factorization.” IVA/ICA 2010, 346–53.
Zhang, S. et al. 2012. “Discovery of Multi-Dimensional Modules by Integrative Analysis of Cancer Genomic Data.” Nucleic Acids Research 40(19): 9379–91.
Zhang, Z. et al. 2007. “Binary Matrix Factorization with Applications.” ICDM 2007, 391–400.