Title: Hilbert Similarity Index for High Dimensional Data
Description: Quantifying similarity between high-dimensional single cell samples is challenging, and usually requires simplifying assumptions to be made. By transforming the high dimensional space into a high dimensional grid, the number of cells in each sub-space of the grid is characteristic of a given sample. Using a Hilbert curve, each sample can be visualized as a simple density plot, and the distance between samples can be calculated from the distribution of cells using the Jensen-Shannon distance. Bins that correspond to significant differences between samples can be identified using a simple bootstrap procedure.
Authors: Yann Abraham [aut, cre], Marilisa Neri [aut], John Skilling [ctb]
Maintainer: Yann Abraham <[email protected]>
License: CC BY-NC-SA 4.0
Version: 0.4.3.9000
Built: 2024-11-03 03:36:05 UTC
Source: https://github.com/yannabraham/hilbertsimilarity
Add new manual cuts to the cuts matrix generated using make.cut
add.cut(cuts, new.cuts, cut.id = "manual", update = FALSE)
cuts: a list of cuts generated using make.cut
new.cuts: a list of new cut thresholds to be added to cuts
cut.id: a string identifying the new cuts
update: if FALSE (the default), adding a cut.id that already exists in cuts raises an error; if TRUE, the existing thresholds are overwritten
The matrix can be cut using either the fixed cuts (type='fixed') or the combined cuts (type='combined'), where the limits have been adjusted to match local minima and maxima.
an updated cuts matrix with an extra set of thresholds named cut.id
Yann Abraham
# generate a random 3D matrix with 2 peaks
mat <- rbind(matrix(rnorm(300), ncol = 3),
             matrix(rnorm(300, 5, 1), ncol = 3))
dimnames(mat)[[2]] <- LETTERS[1:3]
# estimate the Hilbert order
hilbert.order(mat)
# generate 2 bins with a minimum bin size of 5
cuts <- make.cut(mat, n = 3, count.lim = 5)
show.cut(cuts)
# generate the cuts
cut.mat <- do.cut(mat, cuts, type = 'fixed')
head(cut.mat)
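Note that the example above never calls add.cut itself. A minimal sketch follows; the per-column list structure assumed for new.cuts is not confirmed by this page:

# a sketch, assuming new.cuts is a list of threshold vectors named by column
new.cuts <- list(A = c(-2, 2, 8), B = c(-2, 2, 8), C = c(-2, 2, 8))
manual.cuts <- add.cut(cuts, new.cuts, cut.id = 'manual')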
Use a Fourier series to project the Hilbert curve, based on the number of points per Hilbert index. See Wikipedia - Andrews plot for a description of the method.
andrewsProjection(x, breaks = 30)
x: a matrix of counts, where rows correspond to samples and columns to Hilbert indices
breaks: the number of points used to display the Andrews curve
The Andrews curve corresponds to a projection of each item x = (x_1, ..., x_k) to
f_x(t) = x_1/sqrt(2) + x_2*sin(t) + x_3*cos(t) + x_4*sin(2t) + x_5*cos(2t) + ...
where t (the Andrews index) varies between -pi and pi.
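As a worked illustration of the projection formula, this sketch evaluates the Andrews projection of a single 5-dimensional point at one index t, independently of the package implementation:

x <- c(1, 0.5, -0.3, 2, 0.8)    # one item in 5 dimensions
t <- pi/4                       # one Andrews index in (-pi, pi)
basis <- c(1/sqrt(2), sin(t), cos(t), sin(2*t), cos(2*t))
sum(x * basis)                  # the projection of x at index t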
a list with 2 items:
freq: a matrix with breaks rows and ncol(x) columns containing the Andrews vectors for projection
i: a vector with breaks elements corresponding to the Andrews indices
Yann Abraham
# generate a random matrix
ncols <- 5
mat <- matrix(rnorm(ncols*1000), ncol = ncols)
dimnames(mat)[[2]] <- LETTERS[seq(ncols)]
# generate categories
conditions <- sample(letters[1:3], nrow(mat), replace = TRUE)
# generate 4 bins with a minimum bin size of 5
horder <- 4
cuts <- make.cut(mat, n = horder + 1, count.lim = 5)
# generate the cuts and compute the Hilbert index
cut.mat <- do.cut(mat, cuts, type = 'fixed')
hc <- do.hilbert(cut.mat, horder)
# compute the Hilbert index distribution per condition
condition.mat <- table(conditions, hc)
condition.pc <- apply(condition.mat, 1, function(x) x/sum(x))
condition.pc <- t(condition.pc)
# project the matrix to the Andrews curve
av <- andrewsProjection(condition.pc)
proj <- condition.pc %*% t(av$freq)
plot(range(av$i), range(proj), type = 'n',
     xlab = 'Andrews index', ylab = 'Projection')
for (i in seq(nrow(proj))) {
  lines(av$i, proj[i, ], col = i)
}
legend('bottomleft', legend = letters[1:3], col = seq(1, 3), pch = 16, bty = 'n')
Apply cuts generated using the make.cut function to the reference matrix
do.cut(mat, cuts, type = "combined")
mat: the matrix to cut
cuts: a list of cuts generated using make.cut
type: the type of cuts to use, either 'fixed' or 'combined' (defaults to 'combined')
The matrix can be cut using either the fixed cuts (type='fixed') or the combined cuts (type='combined'), where the limits have been adjusted to match local minima and maxima.
Returned values correspond to the bin defined between the first and second threshold of the specified cuts, then between the second and third threshold, and so on. The values will range between 0 (the first bin) and n-1, where n is the number of values in the specified cuts.
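Conceptually, binning a single column against its thresholds behaves like base R's findInterval, as in this sketch (the package's actual implementation may differ):

thresholds <- c(-3, 0, 3, 6)     # n = 4 threshold values => bins 0 to 2
x <- c(-1.2, 4.5, 0.3)
findInterval(x, thresholds) - 1  # returns 0, 2, 1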
a matrix of the same dimensionality as mat, where values correspond to the bins defined by the type thresholds in cuts
Yann Abraham
# generate a random 3D matrix with 2 peaks
mat <- rbind(matrix(rnorm(300), ncol = 3),
             matrix(rnorm(300, 5, 1), ncol = 3))
dimnames(mat)[[2]] <- LETTERS[1:3]
# estimate the Hilbert order
hilbert.order(mat)
# generate 2 bins with a minimum bin size of 5
cuts <- make.cut(mat, n = 3, count.lim = 5)
show.cut(cuts)
# generate the cuts
cut.mat <- do.cut(mat, cuts, type = 'fixed')
head(cut.mat)
Generate the Hilbert index corresponding to the sub-spaces defined by the coordinates generated via do.cut
do.hilbert(mat, horder)
mat: the cut reference matrix
horder: the Hilbert order, i.e. the number of bins in each dimension
For each line in mat, the function will compute the corresponding Hilbert index. Each index corresponds to a specific sub-cube of the original high-dimensional space, and consecutive Hilbert indices correspond to adjacent sub-cubes.
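The locality property can be checked directly on a small grid. This sketch assumes do.hilbert accepts any integer matrix of bin coordinates, not only the output of do.cut:

grid <- as.matrix(expand.grid(A = 0:3, B = 0:3))  # all 16 sub-cubes in 2D, horder = 2
h <- do.hilbert(grid, 2)
grid[order(h), ]  # consecutive rows differ by one step in one dimension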
a vector of indices, one for each line in mat
Marilisa Neri
Yann Abraham
John Skilling (for the original C function)
# generate a random 3D matrix
mat <- matrix(rnorm(300), ncol = 3)
dimnames(mat)[[2]] <- LETTERS[1:3]
# generate 2 bins with a minimum bin size of 5
cuts <- make.cut(mat, n = 3, count.lim = 5)
show.cut(cuts)
# generate the cuts
cut.mat <- do.cut(mat, cuts, type = 'fixed')
head(cut.mat)
# generate the Hilbert index
hc <- do.hilbert(cut.mat, 2)
plot(table(hc), type = 'l')
Estimate the Hilbert order, or the number of bins in each dimension, so that if the matrix were random every row in the matrix would correspond to a single bin.
hilbert.order(mat)
mat: the matrix for which to estimate the Hilbert order
Assuming the matrix is fully random, there is no need to generate more voxels (the combination of bins over all dimensions) than there are rows in the matrix. The number can be derived from the following formula:
c^d = N
where c is the number of bins, d is the number of dimensions and N is the total number of cells in the dataset. c can be computed easily using the following formula:
c = exp(log(N)/d) = N^(1/d)
The number of cuts for do.cut is the number of bins plus 1.
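As a worked example of the formula, take N = 200 rows in d = 3 dimensions (whether hilbert.order rounds the result up is an assumption here):

N <- 200; d <- 3
c <- exp(log(N)/d)  # equivalently N^(1/d)
c                   # ~5.85, suggesting about 6 bins per dimension
ceiling(c) + 1      # the corresponding number of cuts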
the suggested number of bins to use for the specified mat
Yann Abraham
# generate a random 3D matrix with 2 peaks
mat <- rbind(matrix(rnorm(300), ncol = 3),
             matrix(rnorm(300, 5, 1), ncol = 3))
dimnames(mat)[[2]] <- LETTERS[1:3]
# estimate the Hilbert order
hilbert.order(mat)
# generate 2 bins with a minimum bin size of 5
cuts <- make.cut(mat, n = 3, count.lim = 5)
show.cut(cuts)
# generate the cuts
cut.mat <- do.cut(mat, cuts, type = 'fixed')
head(cut.mat)
hilbertMapping will compute the Hilbert index for each row of a matrix of integer coordinates corresponding to sub-cubes in a high dimensional space.
x: a matrix of integer coordinates (see do.cut)
bits: the Hilbert order, i.e. the number of cuts in each dimension
Functions: TransposetoAxes, AxestoTranspose
Purpose: transform in-place between Hilbert transpose and geometrical axes
Example: b=5 bits for each of n=3 coordinates; the 15-bit Hilbert integer A B C D E F G H I J K L M N O is stored as its transpose, with X[0] = A D G J M, X[1] = B E H K N, X[2] = C F I L O (high bits to low bits). Axes are stored conventionally as b-bit integers.
Author: John Skilling, 20 Apr 2001 to 11 Oct 2003
The source code includes the correction suggested in a StackOverflow discussion.
a vector of Hilbert indices, one for each line in x
Marilisa Neri
Yann Abraham
John Skilling
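This entry ships no example; below is a minimal sketch that assumes the signature hilbertMapping(x, bits) implied by the arguments above:

# a sketch, assuming hilbertMapping(x, bits) with x an integer coordinate matrix
coords <- as.matrix(expand.grid(0:3, 0:3))  # all cells of a 4x4 grid
hc <- hilbertMapping(coords, bits = 2)      # one Hilbert index per row
coords[order(hc), ]                         # rows ordered along the Hilbert curve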
Starting from a Hilbert Index generated in a high dimensional space, returns a set of coordinates in a new (lower) dimensional space
hilbertProjection(hc, target = 2)
hc: the Hilbert index returned by do.hilbert
target: the number of dimensions in the target space (defaults to 2)
Based on the maximum index and the targeted number of dimensions the number of target bins is computed and used to generate a reference matrix and a reference index. The reference matrix is returned, ordered by the reference index.
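The details above can be made concrete with a small sketch; the rounding rule used here is an assumption, not taken from the package:

# a sketch: pick enough bins per target dimension to hold every source index
max.index <- 2^(4*5) - 1                              # e.g. horder 4 over 5 dimensions
target <- 2
target.order <- ceiling(log2(max.index + 1)/target)  # bits per target dimension
2^target.order                                        # bins in each target dimension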
a matrix with target columns, corresponding to the projection of each Hilbert index to target dimensions
Marilisa Neri
Yann Abraham
John Skilling (for the original C function)
# generate a random matrix
ncols <- 5
mat <- matrix(rnorm(ncols*5000), ncol = ncols)
dimnames(mat)[[2]] <- LETTERS[seq(ncols)]
# generate 4 bins with a minimum bin size of 5
horder <- 4
cuts <- make.cut(mat, n = horder + 1, count.lim = 5)
# generate the cuts and compute the Hilbert index
cut.mat <- do.cut(mat, cuts, type = 'fixed')
hc <- do.hilbert(cut.mat, horder)
chc <- table(hc)
idx <- as.numeric(names(chc))
# project the matrix to 2 dimensions
proj <- hilbertProjection(hc)
# visualize the result
img <- matrix(0, ncol = max(proj[, 2]) + 1, nrow = max(proj[, 1]) + 1)
img[proj[idx, ] + 1] <- chc
image(img)
This package provides a method to compute similarity between single cell samples in high dimensional space. After dividing the space into voxels, each sample is summarized as a number of cells per voxel. Voxels are ordered using a Hilbert curve, so that each sample can be represented as a 1-dimensional density plot. The distance between 2 samples corresponds to the Jensen-Shannon distance between the 2 probability vectors.
# generate 3 samples over 5 dimensions
# samples 1 and 2 are similar, sample 3 has an extra population
# set the seed for reproducible examples
set.seed(1234)
my.samples <- lapply(LETTERS[1:3], function(j) {
  # each sample has a different number of events
  n <- floor(runif(1, 0.5, 0.8)*10000)
  # the matrix is random normal over 5 dimensions
  cur.mat <- matrix(rnorm(5*n), ncol = 5)
  # rescale cur.mat to a [0,3] interval
  cur.mat <- 3*(cur.mat - min(cur.mat))/diff(range(cur.mat))
  dimnames(cur.mat)[[2]] <- LETTERS[(length(LETTERS)-4):length(LETTERS)]
  if (j == 'C') {
    # select 30% of the points
    cur.rws <- sample(n, round(n*0.3, 0))
    # select 2 columns at random
    cur.cls <- sample(ncol(cur.mat), 2)
    # create an artificial sub-population
    cur.mat[cur.rws, cur.cls] <- 4*cur.mat[cur.rws, cur.cls]
  }
  return(cur.mat)
})
names(my.samples) <- LETTERS[1:3]
# check the population sizes
lapply(my.samples, nrow)
# assemble a sample matrix
my.samples.mat <- do.call('rbind', my.samples)
my.samples.id <- lapply(names(my.samples),
                        function(cur.spl) rep(cur.spl, nrow(my.samples[[cur.spl]])))
my.samples.id <- unlist(my.samples.id)
# estimate the maximum required Hilbert order
hilbert.order(my.samples.mat)
# estimate the cut positions
my.cuts <- make.cut(my.samples.mat, n = 5, count.lim = 5)
# visualize the cuts
show.cut(my.cuts)
# cut the matrix and compute the Hilbert index
my.samples.cut <- do.cut(my.samples.mat, my.cuts, type = 'combined')
system.time(my.samples.index <- do.hilbert(my.samples.cut, horder = 4))
# visualize samples as density plots
my.samples.dens <- density(my.samples.index)
my.samples.dens$y <- (my.samples.dens$y - min(my.samples.dens$y))/diff(range(my.samples.dens$y))
plot(my.samples.dens, col = 'grey3', lty = 2)
ksink <- lapply(names(my.samples), function(cur.spl) {
  cat(cur.spl, '\n')
  cur.dens <- density(my.samples.index[my.samples.id == cur.spl],
                      bw = my.samples.dens$bw)
  cur.dens$y <- (cur.dens$y - min(cur.dens$y))/diff(range(cur.dens$y))
  lines(cur.dens$x, cur.dens$y, col = match(cur.spl, names(my.samples)) + 1)
})
legend('topright',
       legend = names(my.samples),
       col = seq(length(my.samples)) + 1,
       pch = 16,
       bty = 'n')
# assemble a contingency table
my.samples.table <- table(my.samples.index, my.samples.id)
dim(my.samples.table)
heatmap(log10(my.samples.table + 0.00001),
        col = colorRampPalette(c('white', blues9))(24),
        Rowv = NA, Colv = NA,
        scale = 'none')
# compute the Jensen-Shannon distance
my.samples.dist <- js.dist(t(my.samples.table))
my.samples.clust <- hclust(my.samples.dist)
plot(my.samples.clust)
The Jensen-Shannon distance is a method to measure the distance between discrete probability distributions. To measure the distance between 2 high-dimensional datasets, we cut the space into sub-cubes, then count the number of events per cube. The resulting probability distributions can be compared using the Jensen-Shannon distance.
js.dist(mat, pc = 1e-04)
mat: a matrix of counts, where rows correspond to samples and columns to Hilbert indices
pc: a pseudo-count added to all samples to avoid divide-by-zero errors
an S3 distance object
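For reference, here is a from-scratch sketch of the Jensen-Shannon distance between two count vectors; the use of base-2 logarithms and the final square root are assumptions about js.dist, not confirmed by this page:

js <- function(p, q, pc = 1e-4) {
  p <- p + pc; q <- q + pc       # pseudo-counts avoid log(0)
  p <- p/sum(p); q <- q/sum(q)   # normalize counts to probabilities
  m <- (p + q)/2                 # mixture distribution
  kl <- function(a, b) sum(a * log2(a/b))
  sqrt((kl(p, m) + kl(q, m))/2)  # JS distance = sqrt of JS divergence
}
js(c(10, 0, 5), c(8, 2, 5))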
Yann Abraham
# generate 3 samples over 5 dimensions
# samples 1 and 2 are similar, sample 3 has an extra population
# set the seed for reproducible examples
set.seed(1234)
my.samples <- lapply(LETTERS[1:3], function(j) {
  # each sample has a different number of events
  n <- floor(runif(1, 0.5, 0.8)*10000)
  # the matrix is random normal over 5 dimensions
  cur.mat <- matrix(rnorm(5*n), ncol = 5)
  # rescale cur.mat to a [0,3] interval
  cur.mat <- 3*(cur.mat - min(cur.mat))/diff(range(cur.mat))
  dimnames(cur.mat)[[2]] <- LETTERS[(length(LETTERS)-4):length(LETTERS)]
  if (j == 'C') {
    # select 30% of the points
    cur.rws <- sample(n, round(n*0.3, 0))
    # select 2 columns at random
    cur.cls <- sample(ncol(cur.mat), 2)
    # create an artificial sub-population
    cur.mat[cur.rws, cur.cls] <- 4*cur.mat[cur.rws, cur.cls]
  }
  return(cur.mat)
})
names(my.samples) <- LETTERS[1:3]
# check the population sizes
lapply(my.samples, nrow)
# assemble a sample matrix
my.samples.mat <- do.call('rbind', my.samples)
my.samples.id <- lapply(names(my.samples),
                        function(cur.spl) rep(cur.spl, nrow(my.samples[[cur.spl]])))
my.samples.id <- unlist(my.samples.id)
# estimate the maximum required Hilbert order
hilbert.order(my.samples.mat)
# estimate the cut positions
my.cuts <- make.cut(my.samples.mat, n = 5, count.lim = 5)
# visualize the cuts
show.cut(my.cuts)
# cut the matrix and compute the Hilbert index
my.samples.cut <- do.cut(my.samples.mat, my.cuts, type = 'combined')
system.time(my.samples.index <- do.hilbert(my.samples.cut, horder = 4))
# visualize samples as density plots
my.samples.dens <- density(my.samples.index)
my.samples.dens$y <- (my.samples.dens$y - min(my.samples.dens$y))/diff(range(my.samples.dens$y))
plot(my.samples.dens, col = 'grey3', lty = 2)
ksink <- lapply(names(my.samples), function(cur.spl) {
  cat(cur.spl, '\n')
  cur.dens <- density(my.samples.index[my.samples.id == cur.spl],
                      bw = my.samples.dens$bw)
  cur.dens$y <- (cur.dens$y - min(cur.dens$y))/diff(range(cur.dens$y))
  lines(cur.dens$x, cur.dens$y, col = match(cur.spl, names(my.samples)) + 1)
})
legend('topright',
       legend = names(my.samples),
       col = seq(length(my.samples)) + 1,
       pch = 16,
       bty = 'n')
# assemble a contingency table
my.samples.table <- table(my.samples.index, my.samples.id)
dim(my.samples.table)
heatmap(log10(my.samples.table + 0.00001),
        col = colorRampPalette(c('white', blues9))(24),
        Rowv = NA, Colv = NA,
        scale = 'none')
# compute the Jensen-Shannon distance
my.samples.dist <- js.dist(t(my.samples.table))
my.samples.clust <- hclust(my.samples.dist)
plot(my.samples.clust)
Given a density object, find the positions of local maxima.
localMaxima(x)
x: a vector of density values, as generated through a call to density
a vector of indices corresponding to local maxima
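A minimal sketch of the sign-change approach from the linked StackOverflow answer (not necessarily the exact implementation used here):

localMaximaSketch <- function(y) {
  # a peak is where the first difference changes sign from + to -
  which(diff(sign(diff(y))) == -2) + 1
}
localMaximaSketch(c(1, 3, 2, 5, 4))  # returns 2 and 4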
Tommy (http://stackoverflow.com/questions/6836409/finding-local-maxima-and-minima)
x <- c(rnorm(100), rnorm(100, 3))
dx <- density(x)
plot(dx)
abline(v = dx$x[localMaxima(dx$y)], col = 2, lty = 2)
Given a density object, find the positions of local minima.
localMinima(x)
x: a vector of density values, as generated through a call to density
a vector of indices corresponding to local minima
Tommy (http://stackoverflow.com/questions/6836409/finding-local-maxima-and-minima)
x <- c(rnorm(100), rnorm(100, 3))
dx <- density(x)
plot(dx)
abline(v = dx$x[localMinima(dx$y)], col = 2, lty = 2)
For a given column cur.ch of a matrix, and a given number of cuts n, compute n-1 bins using either fixed or combined limits.
make.cut(mat, n = 5, count.lim = 40)
mat: the matrix to cut
n: the number of cuts to generate (defaults to 5)
count.lim: the minimum number of counts to consider for density (defaults to 40)
The fixed limits correspond to n equally spaced values over the range of the column (5 by default; see the sketch after this list). The combined limits take the local minima and maxima determined using the localMinima and localMaxima functions to adjust the limits, using the following algorithm:
- define d as half the distance between 2 fixed limits
- merge local minima and local maxima that are closer than d
- if any fixed limit is closer to a local minimum than d, move the fixed limit to the local minimum; move the limits that have not been moved yet, and that are before and after the moved limit, so that they are evenly spread; repeat until no fixed limit can be moved
- if some limits have been moved to a local minimum, remove limits that are closer than d to a local maximum; move the limits that have not been moved yet, and that are before and after the deleted limit, so that they are evenly spread; repeat until no fixed limit can be moved
- if no limit has been moved to a local minimum, move limits that are closer than d to a local maximum; move the limits that have not been moved yet, and that are before and after the moved limit, so that they are evenly spread; repeat until no fixed limit can be moved
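The sketch referenced in the list above: fixed limits for a single column are just equally spaced values over its range.

v <- rnorm(100)                                      # one column of data
fixed.limits <- seq(min(v), max(v), length.out = 5)  # n = 5 equally spaced limits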
The function returns a list of lists, one for each column in mat, consisting of:
cur.dens: the density used to describe the data
cur.hist: the histogram used to describe the data
fixed: the fixed, evenly spaced cuts
minima: the local minima detected in the data
maxima: the local maxima detected in the data
combined: the cuts defined using a combination of fixed positions, local minima and local maxima
a list of cuts for each column in mat (see details)
Yann Abraham
# generate a random 3D matrix with 2 peaks
mat <- rbind(matrix(rnorm(300), ncol = 3),
             matrix(rnorm(300, 5, 1), ncol = 3))
dimnames(mat)[[2]] <- LETTERS[1:3]
# estimate the Hilbert order
hilbert.order(mat)
# generate 2 bins with a minimum bin size of 5
cuts <- make.cut(mat, n = 3, count.lim = 5)
show.cut(cuts)
# generate the cuts
cut.mat <- do.cut(mat, cuts, type = 'fixed')
head(cut.mat)
Visualize the cuts in relation to the distribution of the data for each dimension in the original matrix
show.cut(cuts, type = "all", local = FALSE)
cuts: the output of the make.cut function
type: which cuts to show; must be one of "all", "fixed" or "combined" (any unambiguous substring can be given)
local: defaults to FALSE; if TRUE, the local minima and maxima detected in each dimension are shown
"fixed" will show n
equally spaced cuts (see make.cut
for the definition of n
).
"combined" will show the cuts after adjustment for local minima and maxima.
"all" will show both. Setting local
to TRUE
will enable the visualization of
local minima and maxima detected by the algorithm in each dimension.
The function returns an invisible NULL.
Yann Abraham
# generate a random 3D matrix with 2 peaks
mat <- rbind(matrix(rnorm(300), ncol = 3),
             matrix(rnorm(300, 5, 1), ncol = 3))
dimnames(mat)[[2]] <- LETTERS[1:3]
# estimate the Hilbert order
hilbert.order(mat)
# generate 2 bins with a minimum bin size of 5
cuts <- make.cut(mat, n = 3, count.lim = 5)
show.cut(cuts)
# generate the cuts
cut.mat <- do.cut(mat, cuts, type = 'fixed')
head(cut.mat)