Version: | 2.2 |
Title: | Fast Embedding Guided by Self-Organizing Map |
Depends: | R (≥ 3.2) |
Suggests: | knitr, rmarkdown |
Imports: | ggplot2, igraph, Matrix, Rtsne, umap, uwot |
Description: | Provides a smooth mapping of multidimensional points into low-dimensional space defined by a self-organizing map. Designed to work with 'FlowSOM' and flow-cytometry use-cases. See Kratochvil et al. (2019) <doi:10.12688/f1000research.21642.1>. |
License: | GPL (≥ 3) |
URL: | https://github.com/exaexa/EmbedSOM |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
VignetteBuilder: | knitr |
NeedsCompilation: | yes |
Packaged: | 2025-01-31 20:08:19 UTC; exa |
Author: | Mirek Kratochvil [aut, cre], Sofie Van Gassen [cph], Britt Callebaut [cph], Yvan Saeys [cph], Ron Wehrens [cph] |
Maintainer: | Mirek Kratochvil <exa.exa@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-01-31 22:30:02 UTC |
An acceptable cluster color palette
Description
An acceptable cluster color palette
Usage
ClusterPalette(n, vcycle = c(1, 0.7), scycle = c(0.7, 1), alpha = 1)
Arguments
n |
How many colors to generate |
vcycle , scycle |
Small vectors with cycles of saturation/value for hsv |
alpha |
Opacity of the colors |
Examples
EmbedSOM::ClusterPalette(10)
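The vcycle and scycle arguments describe short cycles of value and saturation that are repeated over the generated colors. The following is a minimal base-R sketch of that idea, not the package's implementation: the function name cycle_palette and the even hue spacing are assumptions made for illustration only.

```r
# Hypothetical sketch; EmbedSOM::ClusterPalette may compute hues differently.
cycle_palette <- function(n, vcycle = c(1, 0.7), scycle = c(0.7, 1), alpha = 1) {
  hues <- (seq_len(n) - 1) / n      # assumption: evenly spaced hues in [0, 1)
  sats <- rep_len(scycle, n)        # saturation cycle repeated to length n
  vals <- rep_len(vcycle, n)        # value cycle repeated to length n
  grDevices::hsv(h = hues, s = sats, v = vals, alpha = alpha)
}

cols <- cycle_palette(6)
print(cols)
```

Alternating saturation and value between neighboring hues is one way to keep adjacent cluster colors distinguishable even when their hues are close.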
Process the cells with SOM into a nice embedding
Description
Process the cells with SOM into a nice embedding
Usage
EmbedSOM(
data = NULL,
map = NULL,
fsom = NULL,
smooth = NULL,
k = NULL,
adjust = NULL,
importance = NULL,
coordsFn = NULL,
coords = NULL,
emcoords = NULL,
emcoords.pow = 1,
parallel = F,
threads = if (parallel) 0 else 1
)
Arguments
data |
Data matrix with points that optionally overrides the one from |
map |
Map object in FlowSOM format, to optionally override |
fsom |
FlowSOM object with a built SOM (used if data or map are missing) |
smooth |
Produce smoother (positive values) or more rough approximation (negative values). |
k |
How many neighboring landmarks (e.g. SOM nodes) to take into the whole computation |
adjust |
How much non-local information to remove from the approximation |
importance |
Scaling of the landmarks, will be used to scale the incoming data (should be same as used for training the SOM or to select the landmarks) |
coordsFn |
A coordinates-generating function (e.g. |
coords |
A matrix of embedding-space coordinates that correspond to |
emcoords |
Provided for backwards compatibility, will be removed. Use |
emcoords.pow |
Provided for backwards compatibility, will be removed. Use a parametrized |
parallel |
Boolean flag whether the computation should be parallelized (this flag is just a nice name for threads=0) |
threads |
Number of threads used for computation, 0 chooses hardware concurrency, 1 (default) turns off parallelization. |
Value
matrix with 2D or 3D coordinates of the embedded data, depending on the map
Examples
d <- cbind(rnorm(10000), 3*runif(10000), rexp(10000))
colnames(d) <- paste0("col",1:3)
map <- EmbedSOM::SOM(d, xdim=10, ydim=10)
e <- EmbedSOM::EmbedSOM(data=d, map=map)
EmbedSOM::PlotEmbed(e, data=d, 'col1', pch=16)
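The Usage block above gives threads the default `if (parallel) 0 else 1`, which R evaluates lazily from the value of parallel. The sketch below reproduces only that default-argument pattern in a standalone function; resolve_threads is a hypothetical name and is not part of EmbedSOM.

```r
# Hypothetical helper mirroring the documented defaults; not an EmbedSOM function.
resolve_threads <- function(parallel = FALSE, threads = if (parallel) 0 else 1) {
  threads
}

t_default  <- resolve_threads()                  # parallelization off
t_parallel <- resolve_threads(parallel = TRUE)   # 0 stands for hardware concurrency
t_explicit <- resolve_threads(parallel = TRUE, threads = 4)  # explicit value wins
```

Because the default is only evaluated when threads is not supplied, an explicit threads value always takes precedence over the parallel flag.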
Generate colors for multi-color marker expression labeling in a single plot
Description
Generate colors for multi-color marker expression labeling in a single plot
Usage
ExprColors(
exprs,
base = exp(1),
scale = 1,
cutoff = 0,
pow = NULL,
col = ClusterPalette(dim(exprs)[2], alpha = alpha),
nocolor = grDevices::rgb(0.75, 0.75, 0.75, alpha/2),
alpha = 0.5
)
Arguments
exprs |
Matrix-like object with marker expressions (extract it manually from your data) |
base , scale |
Base(s) and scale(s) for softmax (convertible to numeric vectors of size |
cutoff |
Gray level (expressed in sigmas of the sample distribution) |
pow |
Obsolete, now renamed to |
col |
Colors to use, defaults to colors taken from 'ClusterPalette' |
nocolor |
The color to use for sub-gray-level expression, default gray. |
alpha |
Default alpha value. |
Examples
d <- cbind(rnorm(1e5), rexp(1e5))
EmbedSOM::PlotEmbed(d, col=EmbedSOM::ExprColors(d, pow=2))
The ggplot2 scale gradient from ExpressionPalette.
Description
The ggplot2 scale gradient from ExpressionPalette.
Usage
ExpressionGradient(...)
Arguments
... |
Arguments passed to |
Examples
library(EmbedSOM)
library(ggplot2)
# simulate a simple dataset
e <- cbind(rnorm(10000),rnorm(10000))
data <- data.frame(Val=log(1+e[,1]^2+e[,2]^2))
PlotGG(e, data=data) +
geom_point(aes_string(color="Val"), alpha=.5) +
ExpressionGradient(guide=FALSE)
Marker expression palette generator based off ColorBrewer's RdYlBu, only better for plotting of half-transparent cells
Description
Marker expression palette generator based off ColorBrewer's RdYlBu, only better for plotting of half-transparent cells
Usage
ExpressionPalette(n, alpha = 1)
Arguments
n |
How many colors to generate |
alpha |
Opacity of the colors |
Examples
EmbedSOM::ExpressionPalette(10)
Train a Growing Quadtree Self-Organizing Map
Description
Train a Growing Quadtree Self-Organizing Map
Usage
GQTSOM(
data,
init.dim = c(3, 3),
target_codes = 100,
rlen = 10,
radius = c(sqrt(sum(init.dim^2)), 0.5),
epochRadii = seq(radius[1], radius[2], length.out = rlen),
coords = NULL,
codes = NULL,
coordsFn = NULL,
importance = NULL,
distf = 2,
nhbr.distf = 2,
noMapping = F,
parallel = F,
threads = if (parallel) 0 else 1
)
Arguments
data |
Input data matrix |
init.dim |
Initial size of the SOM, default c(3, 3) |
target_codes |
Make the SOM grow linearly to at most this amount of nodes (default 100) |
rlen |
Number of training iterations |
radius |
Start and end training radius, as in |
epochRadii |
Precise radii for each epoch (must be of length rlen) |
coords |
Quadtree coordinates of the initial SOM nodes. |
codes |
Initial codebook |
coordsFn |
Function to generate/transform grid coordinates (e.g. |
importance |
Weights of input data dimensions |
distf |
Distance measure to use in input data space (1=manhattan, 2=euclidean, 3=chebyshev, 4=cosine) |
nhbr.distf |
Distance measure to use in output space (as in |
noMapping |
If |
parallel |
Parallelize the training by setting appropriate |
threads |
Number of threads to use for training. Defaults to 0 (chooses maximum available hardware threads) if parallel is TRUE, otherwise to 1. |
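The default radius schedule in the Usage block can be evaluated by hand to see what the training radii look like. This sketch only re-evaluates the default expressions for radius and epochRadii with the default init.dim and rlen; it does not call GQTSOM.

```r
# Re-evaluating the documented defaults outside of GQTSOM.
init.dim <- c(3, 3)
rlen <- 10
radius <- c(sqrt(sum(init.dim^2)), 0.5)   # start at the grid diagonal, end at 0.5
epochRadii <- seq(radius[1], radius[2], length.out = rlen)
print(round(epochRadii, 3))
```

The schedule shrinks linearly from the diagonal of the initial grid down to 0.5, with one radius per training epoch.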
Add Kamada-Kawai-generated embedding coordinates to the map
Description
This uses a complete graph on the map codebook, which causes overcrowding problems. It is therefore useful to transform the distances to avoid that (e.g. by exponentiating them slightly using the distFn function).
Usage
GraphCoords(
dim = NULL,
dist.method = NULL,
distFn = function(x) x,
layoutFn = igraph::layout_with_kk
)
Arguments
dim |
Dimension of the result (passed to layoutFn) |
dist.method |
The method to compute distances, passed to |
distFn |
Custom transformation function of the distance matrix |
layoutFn |
iGraph-compatible graph layouting function (default igraph::layout_with_kk) |
Value
a function that transforms the map, usable as coordsFn parameter
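To illustrate what a distFn transformation does to a distance matrix, the sketch below applies a power transform to the pairwise distances of a small random codebook. The exponent 1.5 and the random codebook are assumptions chosen for illustration; the source does not state which transformation works best.

```r
set.seed(1)
codes <- matrix(rnorm(5 * 3), ncol = 3)   # five hypothetical codebook vectors
d <- as.matrix(stats::dist(codes))        # pairwise Euclidean distances

distFn <- function(x) x^1.5               # assumption: a mild power transform
d_t <- distFn(d)

# Spread between the largest and smallest nonzero distance, before and after.
spread_before <- max(d) / min(d[upper.tri(d)])
spread_after  <- max(d_t) / min(d_t[upper.tri(d_t)])
```

Following the Usage block above, such a function would be supplied as `GraphCoords(distFn = function(x) x^1.5)`. A power above 1 keeps the ordering of distances but stretches large distances relative to small ones.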
Create a grid from first 2 PCA components
Description
Create a grid from first 2 PCA components
Usage
Initialize_PCA(data, xdim, ydim, zdim = NULL)
Arguments
data |
matrix in which each row represents a point |
xdim , ydim , zdim |
Dimensions of the SOM grid |
Value
array containing the selected rows
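One way to picture a PCA-based grid initialization is to lay a regular grid over the plane of the first two principal components and map it back into data space. The sketch below does this with stats::prcomp. It is not the package's Initialize_PCA: the function name pca_grid and the span of 2 standard deviations per component are assumptions.

```r
# Hypothetical sketch of a PCA-plane grid; Initialize_PCA may differ in detail.
pca_grid <- function(data, xdim, ydim, span = 2) {
  pc <- stats::prcomp(data, center = TRUE, scale. = FALSE)
  gx <- seq(-span, span, length.out = xdim) * pc$sdev[1]   # positions along PC1
  gy <- seq(-span, span, length.out = ydim) * pc$sdev[2]   # positions along PC2
  g <- as.matrix(expand.grid(gx, gy))                      # (xdim*ydim) x 2
  # Map grid points back to the original coordinates and re-add the center.
  sweep(g %*% t(pc$rotation[, 1:2]), 2, pc$center, "+")
}

set.seed(1)
d <- cbind(rnorm(200), 3 * runif(200), rexp(200))
init <- pca_grid(d, xdim = 4, ydim = 3)
```

Each row of the result is one initial codebook vector, so the output has xdim*ydim rows and as many columns as the data.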
Add MST-style embedding coordinates to the map
Description
Add MST-style embedding coordinates to the map
Usage
MSTCoords(
dim = NULL,
dist.method = NULL,
distFn = function(x) x,
layoutFn = igraph::layout_with_kk
)
Arguments
dim |
Dimension of the result (passed to layoutFn) |
dist.method |
The method to compute distances, passed to |
distFn |
Custom transformation function of the distance matrix |
layoutFn |
iGraph-compatible graph layouting function (default igraph::layout_with_kk) |
Value
a function that transforms the map, usable as coordsFn parameter
Assign nearest node to each datapoint
Description
Assign nearest node to each datapoint
Usage
MapDataToCodes(
codes,
data,
distf = 2,
parallel = F,
threads = if (parallel) 0 else 1
)
Arguments
codes |
matrix with nodes of the SOM |
data |
datapoints to assign |
distf |
Distance function (1=manhattan, 2=euclidean, 3=chebyshev, 4=cosine) |
threads , parallel |
Use parallel computation (see |
Value
array with nearest node id for each datapoint
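The nearest-node assignment can be sketched in base R for the Euclidean case (distf = 2). This is an illustration of the technique only: nearest_code is a hypothetical name, and the indexing convention of the value returned by MapDataToCodes is not confirmed by the text above.

```r
# Hypothetical base-R sketch of nearest-node assignment (Euclidean distance).
nearest_code <- function(codes, data) {
  # Squared distances: rows are datapoints, columns are SOM nodes.
  d2 <- outer(rowSums(data^2), rowSums(codes^2), "+") - 2 * data %*% t(codes)
  max.col(-d2, ties.method = "first")   # index of the smallest distance per row
}

codes <- rbind(c(0, 0), c(10, 10))
data  <- rbind(c(1, 1), c(9, 8), c(-2, 0))
assignment <- nearest_code(codes, data)
print(assignment)
```

The first and third points lie closest to the node at the origin and the second lies closest to the node at (10, 10).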
Helper for computing colors for embedding plots
Description
Helper for computing colors for embedding plots
Usage
NormalizeColor(data, low = NULL, high = NULL, pow = 0, sds = 1)
Arguments
data |
Vector of scalar values to normalize between 0 and 1 |
low , high |
Originally quantiles for clamping the color. Only kept for backwards compatibility, now ignored. |
pow |
The scaled data are transformed to data^(2^pow). If set to 0, nothing happens. Positive values highlight differences in the data closer to 1, negative values highlight differences closer to 0. |
sds |
Inverse scale factor for measured standard deviation (greater value makes data look more extreme) |
Examples
EmbedSOM::NormalizeColor(c(1,100,500))
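The effect of pow can be checked directly from the formula data^(2^pow) given above. The sketch applies that formula to values that are already scaled to the interval from 0 to 1; pow_transform is a hypothetical helper and covers only this step of the normalization.

```r
# Only the power step described for the pow argument; not the full NormalizeColor.
pow_transform <- function(x, pow) x^(2^pow)

x <- c(0.1, 0.5, 0.9)
unchanged <- pow_transform(x, 0)    # exponent 2^0 = 1, nothing happens
squared   <- pow_transform(x, 1)    # exponent 2, spreads values near 1
rooted    <- pow_transform(x, -1)   # exponent 1/2, spreads values near 0
```

With pow = 1 the gap between 0.5 and 0.9 grows while the gap between 0.1 and 0.5 shrinks; pow = -1 does the opposite.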
Export a data frame for plotting with marker intensities and density.
Description
Export a data frame for plotting with marker intensities and density.
Usage
PlotData(
embed,
fsom,
data = fsom$data,
cols,
names,
normalize = cols,
pow = 0,
sds = 1,
vf = PlotId,
density = "Density",
densBins = 256,
densLimit = NULL,
fdens = sqrt
)
Arguments
embed , fsom , data , cols |
The embedding data, columns to select |
names |
Column names for output |
normalize |
List of columns to normalize using |
pow , sds |
Parameters for the normalization |
vf |
Custom value-transforming function |
density |
Name of the density column |
densBins |
Number of bins for density calculation |
densLimit |
Upper limit of density (prevents outliers) |
fdens |
Density-transforming function; default sqrt |
Default plot
Description
Default plot
Usage
PlotDefault(pch = ".", cex = 1, ...)
Arguments
pch , cex , ... |
correctly defaulted and passed to 'plot' |
Helper function for plotting the embedding
Description
Convenience plotting function. Takes the embed matrix which is the output of EmbedSOM(), together with a multitude of arguments that set how the plotting is done.
Usage
PlotEmbed(
embed,
value = 0,
red = 0,
green = 0,
blue = 0,
fr = PlotId,
fg = PlotId,
fb = PlotId,
fv = PlotId,
powr = 0,
powg = 0,
powb = 0,
powv = 0,
sdsr = 1,
sdsg = 1,
sdsb = 1,
sdsv = 1,
clust = NULL,
nbin = 256,
maxDens = NULL,
fdens = sqrt,
limit = NULL,
alpha = NULL,
fsom,
data,
col,
cluster.colors = ClusterPalette,
expression.colors = ExpressionPalette,
na.color = grDevices::rgb(0.75, 0.75, 0.75, if (is.null(alpha)) 0.5 else alpha/2),
plotf = PlotDefault,
...
)
Arguments
embed |
The embedding from |
value |
The column of |
red , green , blue |
The same, for individual RGB components |
fv , fr , fg , fb |
Functions to transform the values before they are normalized |
powv , powr , powg , powb |
Passed to corresponding |
sdsv , sdsr , sdsg , sdsb |
Passed to |
clust |
Cluster labels (used as a factor) |
nbin , maxDens , fdens |
Parameters of density calculation, see |
limit |
Low/high offset for |
alpha |
Default alpha value of points |
fsom |
FlowSOM object |
data |
Data matrix, taken from |
col |
Overrides the computed point colors with exact supplied colors. |
cluster.colors |
Function to generate cluster colors, default ClusterPalette |
expression.colors |
Function to generate expression color scale, default ExpressionPalette |
na.color |
Color to assign to |
plotf |
Plot function, defaults to PlotDefault |
... |
Extra params passed to the plot function |
Examples
EmbedSOM::PlotEmbed(cbind(rnorm(1e5),rnorm(1e5)))
Wrap PlotData result in ggplot object.
Description
This creates a ggplot2 object for plotting.
Usage
PlotGG(embed, ...)
Arguments
embed |
Embedding data |
... |
Extra arguments passed to |
Examples
library(EmbedSOM)
library(ggplot2)
# simulate a simple dataset
e <- cbind(rnorm(10000),rnorm(10000))
PlotGG(e, data=data.frame(Expr=runif(10000))) +
geom_point(aes_string(color="Expr"))
Identity on whatever
Description
Identity on whatever
Usage
PlotId(x)
Arguments
x |
Just the x. |
Value
The x.
Create a map by randomly selecting points
Description
Create a map by randomly selecting points
Usage
RandomMap(data, k, coordsFn)
Arguments
data |
Input data matrix, with individual data points in rows |
k |
How many points to sample |
coordsFn |
a function to generate embedding coordinates (default none) |
Value
map object (without the grid, if coordsFn was not specified)
Examples
d <- iris[,1:4]
EmbedSOM::PlotEmbed(
EmbedSOM::EmbedSOM(
data = d,
map = EmbedSOM::RandomMap(d, 30, EmbedSOM::GraphCoords())),
pch=19, clust=iris[,5]
)
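The landmark selection behind a random map amounts to picking k rows of the data as codebook vectors. The sketch below shows just that sampling step in base R. Sampling without replacement is an assumption, and the structure of the map object returned by RandomMap is not reproduced here.

```r
set.seed(7)
d <- as.matrix(iris[, 1:4])
k <- 30

# Assumption: landmarks are k distinct rows of the data.
picked <- sample(nrow(d), k)
codes <- d[picked, , drop = FALSE]
```

Every landmark is an actual datapoint, so dense regions of the data receive proportionally more landmarks than sparse ones.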
Build a self-organizing map
Description
Build a self-organizing map
Usage
SOM(
data,
xdim = 10,
ydim = 10,
zdim = NULL,
batch = F,
rlen = 10,
alphaA = c(0.05, 0.01),
radiusA = stats::quantile(nhbrdist, 0.67) * c(1, 0),
alphaB = alphaA * c(-negAlpha, -0.1 * negAlpha),
radiusB = negRadius * radiusA,
negRadius = 1.33,
negAlpha = 0.1,
epochRadii = seq(radiusA[1], radiusA[2], length.out = rlen),
init = FALSE,
initf = Initialize_PCA,
distf = 2,
codes = NULL,
importance = NULL,
coordsFn = NULL,
nhbr.method = "maximum",
noMapping = F,
parallel = F,
threads = if (parallel) 0 else 1
)
Arguments
data |
Matrix containing the training data |
xdim |
Width of the grid |
ydim |
Height of the grid |
zdim |
Depth of the grid, causes the grid to be 3D if set |
batch |
Use batch training (default FALSE) |
rlen |
Number of training epochs; or number of times to loop over the training data in online training |
alphaA |
Start and end learning rate for online learning (only for online training) |
radiusA |
Start and end radius |
alphaB |
Start and end learning rate for the second radius (only for online training) |
radiusB |
Start and end radius (only for online training; make sure it is larger than radiusA) |
negRadius |
easy way to set radiusB as a multiple of default radius (use lower value for higher dimensions) |
negAlpha |
the same for alphaB |
epochRadii |
Vector of length |
init |
Initialize cluster centers in a non-random way |
initf |
Use the given initialization function if init==T (default: Initialize_PCA) |
distf |
Distance function (1=manhattan, 2=euclidean, 3=chebyshev, 4=cosine) |
codes |
Cluster centers to start with |
importance |
array with numeric values. Columns of |
coordsFn |
Function to generate/transform grid coordinates (e.g. |
nhbr.method |
Way of computing grid distances, passed as |
noMapping |
If TRUE, do not compute the mapping (default FALSE). Makes the process quicker by 1 |
parallel |
Parallelize the batch training by setting appropriate |
threads |
Number of threads of the batch training (has no effect on online training). Defaults to 0 (chooses maximum available hardware threads) if parallel is TRUE, otherwise to 1. |
Value
A map useful for embedding (EmbedSOM() function) or further analysis, e.g. clustering.
See Also
FlowSOM::SOM
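Several SOM() defaults are derived from one another, which is easier to follow when evaluated step by step. The sketch re-evaluates the default expressions from the Usage block for a 10x10 grid. The construction of nhbrdist is an assumption (pairwise grid distances computed with the "maximum" method), because the text above does not define it.

```r
xdim <- 10; ydim <- 10; rlen <- 10
grid <- expand.grid(x = seq_len(xdim), y = seq_len(ydim))
# Assumption: nhbrdist holds pairwise distances between grid nodes.
nhbrdist <- as.numeric(stats::dist(grid, method = "maximum"))

alphaA <- c(0.05, 0.01)
negAlpha <- 0.1
negRadius <- 1.33
radiusA <- unname(stats::quantile(nhbrdist, 0.67)) * c(1, 0)
alphaB <- alphaA * c(-negAlpha, -0.1 * negAlpha)   # small negative learning rates
radiusB <- negRadius * radiusA                     # wider than radiusA
epochRadii <- seq(radiusA[1], radiusA[2], length.out = rlen)
```

With these defaults alphaB is negative and radiusB starts above radiusA, consistent with the note that radiusB should be larger than radiusA.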
Add UMAP-based coordinates to a map
Description
Add UMAP-based coordinates to a map
Usage
UMAPCoords(dim = NULL, UMAPFn = NULL)
Arguments
dim |
Dimension of the result (passed to |
UMAPFn |
UMAP function to run (default umap::umap configured by umap::umap.defaults) |
Value
a function that transforms the map, usable as coordsFn parameter
Add U-Matrix-optimized embedding coordinates to the map
Description
The map must already contain a SOM grid with corresponding xdim, ydim (possibly zdim)
Usage
UMatrixCoords(
dim = NULL,
dist.method = NULL,
distFn = function(x) x,
layoutFn = igraph::layout_with_kk
)
Arguments
dim |
Dimension of the result (passed to layoutFn) |
dist.method |
The method to compute distances, passed to |
distFn |
Custom transformation function of the distance matrix |
layoutFn |
iGraph-compatible graph layouting function (default igraph::layout_with_kk) |
Value
a function that transforms the map, usable as 'coordsFn' parameter
Create a map from k-Means clusters
Description
May give better results than 'RandomMap' on data where random sampling is complicated.
This does not use actual kMeans clustering, but re-uses the batch version of SOM() with a tiny radius (which makes it work the same as kMeans). In consequence, the speedup of the SOM function applies here as well. Additionally, because that amount of clustering precision is not needed, parameters ‘batch=F, rlen=1’ may give a satisfactory result very quickly.
Usage
kMeansMap(data, k, coordsFn, batch = T, ...)
Arguments
data |
Input data matrix, with individual data points in rows |
k |
How many points to sample |
coordsFn |
a function to generate embedding coordinates (default none) |
batch |
Use batch-SOM training (effectively kMeans, default TRUE) |
... |
Passed to |
Value
map object (without the grid, if coordsFn was not specified)
Examples
d <- iris[,1:4]
EmbedSOM::PlotEmbed(
EmbedSOM::EmbedSOM(
data = d,
map = EmbedSOM::kMeansMap(d, 10, EmbedSOM::GraphCoords())),
pch=19, clust=iris[,5]
)
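The description above notes that batch SOM training with a tiny radius behaves like kMeans. The sketch below shows one such update in base R: each datapoint is assigned to its nearest code, and each code moves to the mean of its members. This illustrates the equivalence only; kmeans_step is a hypothetical helper and not how kMeansMap is implemented.

```r
# One batch update with a zero-size neighborhood, i.e. a k-means (Lloyd) step.
kmeans_step <- function(codes, data) {
  d2 <- outer(rowSums(data^2), rowSums(codes^2), "+") - 2 * data %*% t(codes)
  nearest <- max.col(-d2, ties.method = "first")
  for (i in seq_len(nrow(codes))) {
    members <- data[nearest == i, , drop = FALSE]
    # Codes without members are left where they are.
    if (nrow(members) > 0) codes[i, ] <- colMeans(members)
  }
  codes
}

d <- as.matrix(iris[, 1:4])
set.seed(42)
codes <- d[sample(nrow(d), 3), , drop = FALSE]
for (epoch in 1:10) codes <- kmeans_step(codes, d)
```

Each step can only lower or keep the total squared distance of points to their nearest code, which is why a few epochs are often enough for landmark selection.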
Add KNN-topology-based embedding coordinates to the map
Description
Internally, this does not use FNN::get.knn() anymore.
Usage
kNNCoords(
k = 4,
dim = NULL,
dist.method = NULL,
distFn = function(x) x,
layoutFn = igraph::layout_with_kk
)
Arguments
k |
Size of the neighborhoods (default 4) |
dim |
Dimension of the result (passed to layoutFn) |
dist.method |
The method to compute distances, passed to |
distFn |
Custom transformation function of the distance matrix |
layoutFn |
iGraph-compatible graph layouting function (default igraph::layout_with_kk) |
Value
a function that transforms the map, usable as coordsFn parameter
Add tSNE-based coordinates to a map
Description
Add tSNE-based coordinates to a map
Usage
tSNECoords(dim = NULL, tSNEFn = Rtsne::Rtsne, ...)
Arguments
dim |
Dimension of the result (passed to |
tSNEFn |
tSNE function to run (default Rtsne::Rtsne) |
... |
passed to |
Value
a function that transforms the map, usable as coordsFn parameter
Add UMAP-based coordinates to a map, using the 'uwot' package
Description
Add UMAP-based coordinates to a map, using the 'uwot' package
Usage
uwotCoords(dim = NULL, uwotFn = uwot::umap, ...)
Arguments
dim |
Dimension of the result (passed to |
uwotFn |
UMAP function to run (default uwot::umap) |
... |
passed to |
Value
a function that transforms the map, usable as coordsFn parameter