Help for package BoSSA

Type: Package

Title: A Bunch of Structure and Sequence Analysis

Version: 3.7

Date: 2020-10-16

Author: Pierre Lefeuvre

Maintainer: Pierre Lefeuvre <pierre.lefeuvre@cirad.fr>

Depends: R (≥ 3.3.0)

Imports: ape, RSQLite, jsonlite, phangorn, plotrix

Suggests: prettydoc, knitr, rmarkdown, XML, rentrez, httr

VignetteBuilder: knitr

Description: Reads and plots phylogenetic placements.

License: GPL-2 | GPL-3 [expanded from: GPL]

NeedsCompilation:

Packaged: 2020-10-20 04:28:26 UTC; lefeuvre

Repository: CRAN

Date/Publication: 2020-10-20 07:20:05 UTC

A Bunch of Structure and Sequence Analysis

Description

Reads and plots phylogenetic placements.

Details

The DESCRIPTION file:

Package:	BoSSA
Type:	Package
Title:	A Bunch of Structure and Sequence Analysis
Version:	3.7
Date:	2020-10-16
Author:	Pierre Lefeuvre
Maintainer:	Pierre Lefeuvre <pierre.lefeuvre@cirad.fr>
Depends:	R (>= 3.3.0)
Imports:	ape, RSQLite, jsonlite, phangorn, plotrix
Suggests:	prettydoc, knitr, rmarkdown, XML, rentrez, httr
VignetteBuilder:	knitr
Description:	Reads and plots phylogenetic placements.
License:	GPL

Index of help topics:

BoSSA-package           A Bunch of Structure and Sequence Analysis
circular_tree           Plot an inside-out circular tree
plot.pplace             Plot a pplace or jplace object
pplace                  A placement object as obtained with the
                        read_sqlite function
pplace_to_matrix        Pplace to contingency matrix
pplace_to_table         Merge the multiclass and the placement table of
                        pplace object
pplace_to_taxonomy      Convert a pplace object to a taxonomy table
print.pplace            Compact display of pplace and jplace objects
print.protdb            Compact display of protdb object
read_jplace             Read a jplace file
read_protdb             Read Protein Data Bank (PDB) file
read_sqlite             Read a pplacer/guppy sqlite file
refpkg                  Summary data and plots for reference packages
sub_pplace              Subsets a pplace object
write_jplace            Write a jplace or pplace object to the disk

BoSSA contains functions to read and plot phylogenetic placement files obtained using software such as pplacer, guppy, EPA and RAPPAS.

Author(s)

Pierre Lefeuvre

Maintainer: Pierre Lefeuvre <pierre.lefeuvre@cirad.fr>

References

- pplacer and guppy http://matsen.fhcrc.org/pplacer/ http://matsen.github.io/pplacer/
- EPA https://sco.h-its.org/exelixis/web/software/epa/index.html
- RAPPAS https://github.com/benclaff/RAPPAS
- Common file format http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0031009

Plot an inside-out circular tree

Description

Plot a tree in a circular manner with the tips pointing inward

Usage

circular_tree(phy,ratio=0.5,def=1000,pos_out=FALSE,tip_labels=TRUE,cex_tips=0.5)

Arguments

phy

an object of class "phylo"

ratio

the ratio of the tree size compared to the plot size

def

the def parameter controls the granularity of the curves

pos_out

if set to TRUE, a matrix with the x and y coordinates of the branch extremities (i.e. nodes and tips) is returned

tip_labels

whether or not the tip labels should be plotted

cex_tips

the size of the tip labels

Details

The function plots a tree in a circular manner. Note that the output will be correct only if the topology has not been modified after reading the original tree with the ape read.tree function.

Value

a plot

Author(s)

Pierre Lefeuvre

Examples


library(ape)

test_tree <- rtree(20)

circular_tree(test_tree)

Plot a pplace or jplace object

Description

Plot the tree and placements from a pplace or a jplace object

Usage

## S3 method for class 'pplace'
plot(x,type="precise",simplify=FALSE,
		main="",N=NULL,transfo=NULL,legend=TRUE,stl=FALSE,
		asb=FALSE,edge.width=1,max_width=10,cex.number=0.5,
		cex.text=0.8,transp=80,add=FALSE,color=NULL,discrete_col=FALSE,
		pch=16,run_id=NULL, ...)

Arguments

x

A pplace or jplace object

type

The type of plot desired: one of "precise", "color", "fattree" or "number". For each type, the placement size is the product of the N value and the placement ML ratio.

simplify

If set to TRUE, only plot the best position for each placement. Default is FALSE.

main

An optional title to plot along the tree

N

An optional vector of weights, one per placement. It must have the same length and order as the placements in the multiclass table. Note that the placement mass (potentially) available in the original files is imported into R but is not used in the analysis. The N parameter should be used instead.

transfo

An optional function to transform the placement size when type is set to "precise". Beware that it is also applied to the legend, which then no longer shows the placement size but the transformed dot size

legend

Plot a legend. Not available for type "number" or "fattree"

stl

Show tip labels

asb

Add scale bar

edge.width

The tree edge width

max_width

The maximum edge width when type is set to "fattree"

cex.number

Control the size of the text when type is set to "number"

cex.text

Control the size of the main title

transp

Control the transparency of the placement when type is "precise" and the transparency of the branch without placement when type is set to "color". Encoded in hexadecimal scale (i.e. range from "00" to "FF")

add

Add placements to an existing plot when type is set to "precise". Default is FALSE. If a legend was drawn, it won't be updated. Be sure to use the same value for the "transfo" option in each plot. The dot color scale won't be accurate when using the "add" option, so using a single color is highly recommended.

color

The colors used for pendant branch length scale when type is set to "precise". Default is a color ramp with "blue", "green", "yellow" and "red"

discrete_col

Discretise the color scale for pendant branch length

pch

The dot style used for placements when type is set to "precise"

run_id

A vector of run_id to subset

...

Further arguments passed to or from other methods.
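
The transp value is a two-digit hexadecimal code, which is the same encoding base R uses for the last two digits of "#RRGGBBAA" colors. The sketch below uses only base R (grDevices) to convert between an opacity fraction and such a code. It assumes that the value is applied as an alpha channel; how plot.pplace uses it internally is not shown here, and the helper name opacity_to_hex is hypothetical.

```r
# Hypothetical helper (not part of BoSSA): convert an opacity fraction
# (0 = fully transparent, 1 = fully opaque) to a two-digit hexadecimal string.
opacity_to_hex <- function(opacity) {
  stopifnot(opacity >= 0, opacity <= 1)
  sprintf("%02X", as.integer(round(opacity * 255)))
}

opacity_to_hex(0)    # lower bound of the scale, "00"
opacity_to_hex(1)    # upper bound of the scale, "FF"

# Base R reads the same two digits as the alpha channel of a color.
# Read as hexadecimal, the default value 80 is an alpha of 128 out of 255.
alpha_80 <- col2rgb("#0000FF80", alpha = TRUE)["alpha", 1]
alpha_80
```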

Author(s)

Pierre Lefeuvre

Examples


data(pplace)

### number type
plot(pplace,type="number",main="number")

### color type without and with legend
plot(pplace,type="color",main="color without legend",legend=FALSE)
plot(pplace,type="color",main="color with legend",legend=TRUE)

### fattree type
plot(pplace,type="fattree",main="fattree")

### precise type
plot(pplace,type="precise",main="precise vanilla")
plot(pplace,type="precise",simplify=TRUE,main="precise simplify")

# weighting the placements by read numbers (random values are used here;
# with real data, the read number may be encoded in the sequence name)
Npplace <- sample(1:100,nrow(pplace$multiclass),replace=TRUE)
# in the following example, the dots are too large...
plot(pplace,type="precise",main="precise N",legend=TRUE,N=Npplace,simplify=TRUE)

# using the transfo option to modify dot sizes
# note that placement sizes below 1 won't
# behave properly with log10 as a transformation function.
# In this case, you should rather use simplify (every placement
# will then correspond to at least one sequence).
# Beware that when using the transfo option,
# the legend no longer corresponds to the actual placement
# size but to the transformed placement size
# (i.e. the transform function applied to the dot size).
# we will use the log10 function
plot(pplace,type="precise",main="precise log10",
	legend=TRUE,N=Npplace,transfo=log10)
# or without simplify, you can use a custom function
# as transfo that will produce positive sized dots
plot(pplace,type="precise",main="precise custom"
	,legend=TRUE,N=Npplace,transfo=function(X){log10(X+1)})
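
The caveat about log10 in the comments above can be checked with plain arithmetic. The sketch below uses made-up dot sizes (the product of N and the ML ratio, as described for the type argument) and base R only; it does not call plot.pplace.

```r
# Made-up placement sizes (N multiplied by the ML ratio). Values below 1
# arise when the weight of a sequence is split across several placements.
sizes <- c(0.2, 0.9, 1, 10, 100)

log10(sizes)       # negative for sizes below 1, zero for a size of 1
log10(sizes + 1)   # strictly positive for every positive size

# Count the sizes that log10 turns into negative values, and check that
# the custom transformation keeps every size positive.
negative_with_log10 <- sum(log10(sizes) < 0)
all_positive_custom <- all(log10(sizes + 1) > 0)
```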

A placement object as obtained with the read_sqlite function

Description

A placement object as obtained with the read_sqlite function. In this example, a set of 100 sequence reads is placed on a 16S phylogeny. This example is a subset of the data available for download at http://fhcrc.github.io/microbiome-demo/

Usage

data("pplace")

References

http://fhcrc.github.io/microbiome-demo/

Examples

data(pplace)
str(pplace)

Pplace to contingency matrix

Description

Convert the pplace object into a contingency matrix OTUs / sample

Usage

pplace_to_matrix(pplace, sample_info, N = NULL, tax_name = FALSE
				 ,run_id=NULL,round_type=NULL)

Arguments

pplace

A pplace object

sample_info

A vector or list specifying the association between placements (in the multiclass table) and samples. In the case of a list, multiple samples can be associated with a single placement.

N

An optional vector or list with a number of occurrences (or weight) associated with each placed sequence. If "sample_info" is a list, "N" must also be a list. Note that the placement mass (potentially) available in the original files is imported into R but is not used in the analysis. The N parameter should be used instead.

tax_name

Either the tax ids (when set to FALSE, default) or the tax names (when set to TRUE) are used as column names. The tax names are obtained from the "taxo" table of the pplace object.

run_id

A vector of run_id to subset

round_type

The name of the rounding function to apply to the product of the number of individuals classified in a given category and the likelihood ratio of this classification. Should be set to NULL (no rounding) or one of "trunc", "round", "ceiling" or "floor".

Value

A contingency matrix with OTUs / species in rows and samples in columns.
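
As a rough illustration of the round_type options, the following base R sketch applies each rounding function to the product of a count and two likelihood ratios. The numbers and taxon names are made up, and the code is independent of pplace_to_matrix; it only shows how the four functions differ.

```r
# Made-up example: 10 sequences whose classification is split between
# two taxa with likelihood ratios of 0.66 and 0.34.
n_seq <- 10
lik_ratio <- c(taxon_A = 0.66, taxon_B = 0.34)
weighted <- n_seq * lik_ratio   # unrounded products, as with round_type = NULL

# Apply each rounding function by name, since round_type takes a function name.
round_types <- c("trunc", "round", "ceiling", "floor")
rounded <- sapply(round_types, function(f) match.fun(f)(weighted))
rounded   # one row per taxon, one column per rounding function
```

In this toy case, "trunc" and "floor" give a total of 9 and "ceiling" a total of 11 instead of 10, so the choice of rounding can change the totals.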

Author(s)

Pierre Lefeuvre

Examples


data(pplace)

### simple example
pplace_to_matrix(pplace,c(rep("sample1",27),rep("sample2",50),rep("sample3",23)))

### using the N option to specify the number of sequences each placement represents
Npplace <- sample(1:20,100,replace=TRUE)
pplace_to_matrix(pplace,c(rep("sample1",27),rep("sample2",50),rep("sample3",23)),N=Npplace)

### with tax_name=TRUE
pplace_to_matrix(pplace,c(rep("sample1",27),rep("sample2",50),rep("sample3",23)),tax_name=TRUE)

Merge the multiclass and the placement table of pplace object

Description

Merge the multiclass and the placement table of pplace object

Usage

pplace_to_table(pplace, type = "full",run_id=NULL)

Arguments

pplace

a pplace object

type

the placement type to consider

run_id

A vector of run_id to subset

Details

The type argument accepts either "full" or "best". With "full", all the placements are considered; with "best", only the best placement for each sequence is considered.
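
The difference between the two types can be pictured on a toy table. The sketch below is base R only and does not reproduce the package's code. The column names (placement_id, edge, ml_ratio) are assumptions chosen for illustration, and "best" is taken here to mean the row with the highest ml_ratio for each placement_id.

```r
# Toy placement table (made-up values): placement 1 has two candidate
# positions and placement 2 has three.
toy <- data.frame(
  placement_id = c(1, 1, 2, 2, 2),
  edge         = c(10, 12, 3, 4, 7),
  ml_ratio     = c(0.8, 0.2, 0.5, 0.3, 0.2)
)

# "full"-like view: every row is kept.
full <- toy

# "best"-like view: for each placement_id, keep the row with the highest ml_ratio.
best <- do.call(rbind, lapply(split(toy, toy$placement_id),
                              function(d) d[which.max(d$ml_ratio), ]))
best
```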

Value

a data frame with the same column names as the multiclass and placements tables

Author(s)

Pierre Lefeuvre

Examples


data(pplace)

### with every placement
pplace_to_table(pplace)

### keeping only the best placement for each sequence
pplace_to_table(pplace,type="best")

Convert a pplace object to a taxonomy table

Description

Convert a pplace object to a taxonomy table

Usage

pplace_to_taxonomy(pplace,taxonomy,
rank=c("phylum","class","order","family","genus","species"),
type="all",tax_name=TRUE,run_id=NULL)

Arguments

pplace

A pplace object

taxonomy

The taxonomy table as obtained using the refpkg function with type set to "taxonomy"

rank

The desired rank for the taxonomy table

type

Whether all the possible classifications available in the multiclass table are returned (type="all") or only the best one (type="best")

tax_name

Whether to use taxonomy names (default) or tax_id numbers

run_id

A vector of run_id to subset

Value

A matrix with taxonomic ranks for each sequence

Author(s)

Pierre Lefeuvre

Compact display of pplace and jplace objects

Description

Compact display of pplace and jplace objects

Usage

## S3 method for class 'pplace'
print(x, ...)

Arguments

x

a pplace or jplace object

...

further arguments passed to or from other methods

Author(s)

Pierre Lefeuvre

Examples

data(pplace)
print(pplace)

Compact display of protdb object

Description

Function to print the header section of the protdb object.

Usage

## S3 method for class 'protdb'
print(x, ...)

Arguments

x

a protdb class object

...

further arguments passed to or from other methods

Author(s)

Pierre Lefeuvre

Examples

pdb_file <- system.file("extdata", "1L2M.pdb", package = "BoSSA")
pdb <- read_protdb(pdb_file)
print(pdb)

Read a jplace file

Description

Read a jplace file

Usage

read_jplace(jplace_file, full = TRUE)

Arguments

jplace_file

A jplace file name

full

If set to FALSE, only the tree is read from the jplace file

Details

When the jplace or sqlite files are imported into R, the node numbering available in the original file is converted to the class "phylo" numbering. The class phylo is defined in the "ape" package.

Value

A list with

arbre

The tree, of class "phylo", on which placements are performed

placement

The placement table

multiclass

The multiclass table

run

The command line used to obtain the jplace file

Author(s)

Pierre Lefeuvre

Read Protein Data Bank (PDB) file

Description

Read Protein Data Bank (PDB) file

Usage

read_protdb(X)

Arguments

X

The path/name of a pdb file.

Value

The output is a list of objects

header

The header of the pdb file

compound

A data frame summarizing the COMPND part of the pdb file. This includes the molecule ID, the molecule name and the chain ID

atom

A data frame with the atom type, the amino acid, the amino acid number, the chain and the Euclidean X, Y, Z coordinates of the atoms

sequence

A list with the numbering of the amino acid and the amino acid sequence for each chain

Author(s)

Pierre Lefeuvre

References

http://www.rcsb.org/pdb/home/home.do

Examples

pdb_file <- system.file("extdata", "1L2M.pdb", package = "BoSSA")
pdb <- read_protdb(pdb_file)
pdb

Read a pplacer/guppy sqlite file

Description

Read a pplacer/guppy sqlite file

Usage

read_sqlite(sqlite_file,jplace_file=gsub("sqlite$","jplace",sqlite_file),
rank="species")

Arguments

sqlite_file

A pplacer/guppy sqlite path/file name

jplace_file

An optional jplace file name. By default, the sqlite file name with the suffix changed from "sqlite" to "jplace" is used. If the jplace file is named differently, its path/name must be specified.

rank

The desired taxonomic assignment rank to extract. Default is "species".

Details

As the tree information is not available in the sqlite file, the jplace file is also required. When the jplace or sqlite files are imported into R, the node numbering available in the original file is converted to the class "phylo" numbering.
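
The default value of jplace_file is built with the gsub call shown in the Usage section. The following base R sketch shows what that substitution yields on hypothetical file names (neither file exists in the package).

```r
# Hypothetical file names, for illustration only.
sqlite_file <- "results/run1.sqlite"
default_jplace <- gsub("sqlite$", "jplace", sqlite_file)
default_jplace   # same path with the "sqlite" suffix replaced by "jplace"

# If the file name does not end in "sqlite", the substitution leaves it
# unchanged, so jplace_file has to be supplied explicitly in that case.
other_file <- "results/run1.db"
unchanged <- gsub("sqlite$", "jplace", other_file)
unchanged
```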

Value

A list with

runs

The command line used to obtain the sqlite file

taxa

The taxonomic information table

multiclass

The multiclass table

placement_positions

A data frame with the position of each placement in the reference tree

arbre

The tree, of class "phylo", on which placements are performed

edge_key

A matrix with the correspondence of node numbering between the original tree in the jplace file and the class phylo tree of the "arbre" component

original_tree

The tree string from the jplace file

For details on the other components (i.e. "placement_classifications", "placement_evidence", "placement_median_identities", "placement_names", "placement_nbc", "placements", "ranks" and "sqlite_sequence"), please refer to http://erick.matsen.org/pplacer/generated_rst/guppy_classify.html

Author(s)

Pierre Lefeuvre

References

http://erick.matsen.org/pplacer/generated_rst/guppy_classify.html

Examples

### the path to the sqlite and jplace files
sqlite_file <- system.file("extdata", "example.sqlite", package = "BoSSA")
jplace_file <- system.file("extdata", "example.jplace", package = "BoSSA")
pplace <- read_sqlite(sqlite_file,jplace_file)

Summary data and plots for reference packages

Description

Summary data and plots for reference packages

Usage

refpkg(refpkg_path,type="summary",rank_tree="species",
rank_pie=c("phylum","class","order","family","genus"),
scale_pie=TRUE,alpha_order=TRUE,cex.text=0.7,
cex.legend=1,asb=TRUE,rotate_label=TRUE,
out_krona="for_krona.txt",text2krona=NULL)

Arguments

refpkg_path

The path of the reference package directory

type

The type of summary to perform with "summary", "taxonomy", "info", "tree", "pie" or "krona" available

rank_tree

The desired rank for tree coloring

rank_pie

The ranks to be plotted in the taxonomy pie chart

scale_pie

Whether or not to take into account the number of sequences available within the reference package for the pie chart

alpha_order

Whether or not the colors should follow the alphabetical order of the taxa when type is set to "tree"

cex.text

The tip labels cex parameter when type is set to "tree" and the text cex parameter when type is set to "pie"

cex.legend

The size of the legend when type is set to "tree"

asb

Add a scale bar on the tree

rotate_label

Rotates the pie slice labels

out_krona

The name of the output file when type is set to "krona".

text2krona

The full path to the krona "ImportText.pl" script when KronaTools is installed and you wish to directly produce the html krona file.

Value

A summary printed on screen when type is set to "summary". A data frame when type is set to "taxonomy" or "info". A file written to the disk when type is set to "krona". A plot otherwise.

Author(s)

Pierre Lefeuvre

References

https://github.com/marbl/Krona/wiki/KronaTools http://fhcrc.github.io/taxtastic/

Examples


refpkg_path <- system.file("extdata", "example.refpkg", package = "BoSSA")

### summary
refpkg(refpkg_path)

### taxonomy
taxonomy <- refpkg(refpkg_path,type="taxonomy")
head(taxonomy)

### info
refpkg(refpkg_path,type="info")

### tree
refpkg(refpkg_path,type="tree",rank_tree="order",cex.text=0.5)

### pie
refpkg(refpkg_path,type="pie",rank_pie=c("class","order","family"),cex.text=0.6)

### krona
# it will produce a flat text file
# this file can be used as input for the "ImportText.pl" krona script
# see https://github.com/marbl/Krona/wiki/KronaTools for more details on krona
## Not run: 
refpkg(refpkg_path,type="krona",out_krona="for_krona.txt")

## End(Not run)

Subsets a pplace object

Description

Subsets a pplace or jplace object based on the placement_id, the name of the placement or a regular expression of the name of the placement

Usage

sub_pplace(x, placement_id = NULL, ech_id = NULL, ech_regexp = NULL, run_id = NULL)

Arguments

x

The pplace or jplace object to subset

placement_id

A vector of the placement_id to subset

ech_id

A vector of the names of the placement to subset

ech_regexp

A regular expression of the name of the placement to subset

run_id

A vector of run_id to subset

Details

When using placement_id, the subset is performed based on the placement_id column of the multiclass, placements, placement_positions, placement_names, placement_classifications, placement_evidence, placement_median_identities and placement_nbc data frames. When using ech_id and ech_regexp, the subset is performed from the multiclass$name column. When using run_id, the subset is performed based on the placements$run_id column.
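
To preview which sequences an ech_regexp pattern would select, the pattern can be tried on the names with base R before subsetting. This is a sketch on made-up names, and whether sub_pplace relies on grepl internally is an assumption. With a real object, the names are found in the multiclass$name column.

```r
# Made-up sequence names standing in for the multiclass$name column.
seq_names <- c("HWI-ST745:1:1101:1234:5678_12",
               "GWZHISEQ01:514:HMCLFBCXX:2:1108:1739:60356_90",
               "HWI-ST745:1:1101:4321:8765_3")

# The pattern "^HWI" keeps the names that start with "HWI".
selected <- seq_names[grepl("^HWI", seq_names)]
selected
```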

Value

A pplace object

Author(s)

Pierre Lefeuvre

Examples


data(pplace)

### subsetting using placement ids. Here placements 1 to 5
sub1 <- sub_pplace(pplace,placement_id=1:5)
sub1

### subsetting using sequence ids
id <- c("GWZHISEQ01:514:HMCLFBCXX:2:1108:1739:60356_90",
"GWZHISEQ01:514:HMCLFBCXX:2:1114:13665:31277_80")
sub2 <- sub_pplace(pplace,ech_id=id)
sub2

### subsetting using a regular expression of sequence ids
sub3 <- sub_pplace(pplace,ech_regexp="^HWI")
sub3

Write a jplace or pplace object to the disk

Description

Write a jplace or pplace object to the disk in the jplace JSON format

Usage

write_jplace(x,outfile)

Arguments

x

A pplace or jplace object

outfile

The name of the output file

Note

Note that the placement mass (potentially) available in the original files is imported into R but is not used in the analysis. However, the write_jplace function takes into account any weight/mass information available in the "nm" column of the multiclass table for jplace objects and in the "mass" column of the placement_names table for pplace objects. The values in these columns can be edited before writing the jplace file if one wants to use distinct masses/weights in downstream analyses (e.g. using the guppy program functionalities).

Author(s)

Pierre Lefeuvre

Examples


data(pplace)
## Not run: 
write_jplace(pplace,"test.jplace")

## End(Not run)
