| Title: | Functional Shannon Entropy for Virome Mutational Analysis |
|---|---|
| Description: | Estimates Shannon entropy, per gene and per genomic position, associated with non-synonymous mutation frequencies in viral populations, such as wastewater samples. The package uses codon translations for functional insights. Each amino acid can be treated as an individual state, resulting in a 20-state entropy computation, or grouped into one of six physicochemical classes, adding further functional context. Provides normalized values (0-1 scale) to facilitate the direct comparison of different genomic positions or total functional entropy across multiple metagenomes. Designed to analyze mutational data using tabular 'Single Nucleotide Variant' (SNV) frequency tables generated by variant callers (e.g., 'iVar' or 'LoFreq'), operating independently of consensus sequence estimation and multiple sequence alignment. |
| Authors: | Leandro Roberto Jones [aut, cre] (ORCID: <https://orcid.org/0000-0002-5877-4194>), Julieta Marina Manrique [aut] (ORCID: <https://orcid.org/0000-0001-8712-6666>) |
| Maintainer: | Leandro Roberto Jones <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.3 |
| Built: | 2026-06-07 07:16:54 UTC |
| Source: | https://github.com/cran/MetaEntropy |
entropyProfile to a Data Frame
Function to extract summary information from an entropyProfile
object. This function is used internally for plotting.
## S3 method for class 'entropyProfile'
as.data.frame(x, row.names = NULL, optional = FALSE, ...)
x |
An object of class |
row.names |
Please see |
optional |
Please see |
... |
Additional arguments passed to the function. |
A data frame with tabular information on an entropy profile.
This information includes the name of the proteins presenting
mutations, the corresponding genomic positions, and the resulting
entropies in the metagenome.
Graphical and formal analyses of contiguous amino acids.
assessHotSpot(profile, boundaries, chartType = "boxplot")
profile |
An object of class |
boundaries |
Numeric vector with the first and last genomic positions of the region to be evaluated. To be set interactively if not provided. |
chartType |
Chart type; either "boxplot", "stripchart" or "swarm". |
The query stretch (e.g. a protein domain with neutralizing epitopes) is compared against the full set of proteins. Hot spot boundaries should be indicated relative to the reference genome used in variant calling.
htest object. This function is called primarily for its side
effects.
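For readers unfamiliar with htest objects, the following is a minimal sketch of a region-versus-rest comparison on made-up entropies. The test that assessHotSpot actually applies is not stated here; wilcox.test is used only as a stand-in, and the toy data frame and its column names are hypothetical.

```r
# Illustrative sketch only: toy data, not output from MetaEntropy.
set.seed(1)

# Hypothetical per-position entropies along a stretch of genome.
toy <- data.frame(
  position = seq(21563, 25384, length.out = 60),
  entropy  = runif(60, 0, 0.2)
)

# First and last genomic positions of the query region.
boundaries <- c(22517, 23186)
in_region <- toy$position >= boundaries[1] & toy$position <= boundaries[2]

# Artificially raise entropy inside the region to mimic a hot spot.
toy$entropy[in_region] <- toy$entropy[in_region] + 0.3

# Stand-in test (assumption): rank-sum comparison of region vs. the rest.
res <- wilcox.test(toy$entropy[in_region], toy$entropy[!in_region])
class(res)    # an object of class "htest"
res$p.value
```

Any base R function that returns an htest object can be inspected the same way, for example through its p.value and statistic components.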
omicron <- getEntropySignature(wWater[wWater$wave == "third", ])
# Entropy hotspot at SARS-CoV-2 receptor binding domain
assessHotSpot(omicron, c(22517, 23186), chartType = "swarm")
This function is intended primarily for internal use by
getEntropySignature.
entropyProfile(
  polymorphisms,
  position = "position",
  linkage = "linkage",
  ref = "ref",
  alt = "alt",
  protein = "protein",
  aa_position = "aa_position",
  ref_aa = "ref_aa",
  alt_aa = "alt_aa",
  alt_aa_freq = "alt_aa_freq",
  entropies = NA_real_,
  genome = mn908947.3
)
polymorphisms |
A data frame. Please see Details and Examples in
documentation for |
position |
Name of the |
linkage |
Information on linked positions. |
ref |
Column name with reference bases. |
alt |
Column name with the alternative bases observed in the metagenome. |
protein |
Name of the column carrying protein names. |
aa_position |
Name of the column that indicates the protein positions of the mutated amino acids. |
ref_aa |
Name of the column that carries the reference amino acids. |
alt_aa |
Name of the column carrying alternative amino acids observed in the metagenome. |
alt_aa_freq |
Name of the column giving the frequencies of alternative amino acids in the metagenome. |
entropies |
|
genome |
A list providing CDS data and length of the reference genome. |
The documentation for getEntropySignature details the type of
input needed to create a profile. entropyProfile uses the same parameters as
getEntropySignature, with the exception of categories and
entropies.
An (empty) object of class entropyProfile.
Calculates genome-wide Shannon entropies from SNV data.
getEntropySignature(
  polymorphisms,
  position = "position",
  linkage = "linkage",
  ref = "ref",
  alt = "alt",
  protein = "protein",
  aa_position = "aa_position",
  ref_aa = "ref_aa",
  alt_aa = "alt_aa",
  alt_aa_freq = "alt_aa_freq",
  categories = "robust",
  genome = mn908947.3
)
polymorphisms |
A data frame. Please see Details and Examples. |
position |
Name of the |
linkage |
Information on linked positions. |
ref |
Column name with reference bases. |
alt |
Column name with the alternative bases observed in the metagenome. |
protein |
Name of the column carrying protein names. |
aa_position |
Name of the column that indicates the protein positions of the mutated amino acids. |
ref_aa |
Name of the column that carries the reference amino acids. |
alt_aa |
Name of the column carrying alternative amino acids observed in the metagenome. |
alt_aa_freq |
Name of the column giving the frequencies of alternative amino acids in the metagenome. |
categories |
Whether a class per amino acid should be used ("sensitive") or they should be grouped into aliphatic, aromatic, polar, positively charged, negatively charged, and special ("robust") (Mirny and Shakhnovich, 1999). |
genome |
A list providing CDS data and length of the reference genome. |
You provide a data frame with SNV information, including reference
and alternative amino acids, their frequencies, and the corresponding positions
relative to a reference sequence.
This type of data can be generated by numerous programs and pipelines.
The objective is to assess the biological impact of nonsynonymous
variation within a viral population, such as an environmental sample (e.g.
wastewater) or a single infection (aka quasispecies).
Entropy is calculated within the metagenome and is therefore independent
of the reference sequence.
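As a rough illustration of the two categories settings, the sketch below computes a normalized Shannon entropy, H = -sum(p * log(p)) / log(K), for a single made-up site, with K = 20 ("sensitive") or K = 6 ("robust"). The helper normalized_entropy, the class table aa_classes, and the site frequencies are hypothetical and are not part of the package; the six-class grouping is an assumption based on Mirny and Shakhnovich (1999), and the package's exact normalization may differ.

```r
# Illustrative sketch only; these objects are not MetaEntropy functions.

# Assumed six-class grouping (after Mirny and Shakhnovich, 1999).
aa_classes <- c(
  A = "aliphatic", V = "aliphatic", L = "aliphatic",
  I = "aliphatic", M = "aliphatic", C = "aliphatic",
  F = "aromatic",  W = "aromatic",  Y = "aromatic", H = "aromatic",
  S = "polar",     T = "polar",     N = "polar",    Q = "polar",
  K = "positive",  R = "positive",
  D = "negative",  E = "negative",
  G = "special",   P = "special"
)

# Normalized Shannon entropy on a 0-1 scale.
normalized_entropy <- function(freqs, n_states) {
  p <- freqs[freqs > 0]   # zero-frequency states contribute nothing
  p <- p / sum(p)         # make sure frequencies sum to one
  -sum(p * log(p)) / log(n_states)
}

# Hypothetical site: residues and their within-sample frequencies.
site <- c(S = 0.70, T = 0.20, L = 0.10)

# "sensitive": each amino acid is its own state (20 states).
h_sensitive <- normalized_entropy(site, n_states = 20)

# "robust": pool frequencies by physicochemical class first (6 states).
by_class <- tapply(site, aa_classes[names(site)], sum)
h_robust <- normalized_entropy(by_class, n_states = 6)
```

In this toy site S and T fall in the same assumed class, so the grouped entropy is lower than the 20-state value: a conservative substitution adds less functional diversity than a change of class.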
Some mutations may be part of the same codon.
This is to be indicated in the linkage column, providing a downstream
linked position, or the closest upstream position if there are no downstream
positions that are part of the same codon.
For example, in the wWater dataset, mutations T22673C and C22674T are linked
to each other and affect codon 371 of the S gene:
| | wave | position | linkage | ref | alt | protein | ... |
|---|---|---|---|---|---|---|---|
| ... | | | | | | | |
| 105 | third | 22599 | NA | G | A | S | ... |
| 106 | third | 22673 | 22674 | T | C | S | ... |
| 107 | third | 22674 | 22673 | C | T | S | ... |
| 108 | third | 22679 | NA | T | C | S | ... |
| ... | | | | | | | |
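The linkage convention can be sanity-checked before building a profile. The sketch below re-creates the four rows shown above (first columns only) and verifies that each linked pair falls in the same codon. The helper codon_index is hypothetical, and the S CDS start (21563 in MN908947.3) is stated here as an assumption rather than read from the package's genome object.

```r
# The four rows shown in the table above (first columns only).
snv <- data.frame(
  position = c(22599, 22673, 22674, 22679),
  linkage  = c(NA, 22674, 22673, NA),
  ref      = c("G", "T", "C", "T"),
  alt      = c("A", "C", "T", "C"),
  protein  = "S"
)

# Assumption: the S CDS starts at genomic position 21563 in MN908947.3.
s_start <- 21563

# Hypothetical helper: 1-based codon index of a genomic position in a CDS.
codon_index <- function(pos, cds_start) (pos - cds_start) %/% 3 + 1

# For every row with a linkage entry, both positions should share a codon.
linked <- !is.na(snv$linkage)
same_codon <- codon_index(snv$position[linked], s_start) ==
  codon_index(snv$linkage[linked], s_start)

all(same_codon)
codon_index(22673, s_start)
```

A FALSE in same_codon would point to a linkage entry that crosses a codon boundary and is worth reviewing in the variant table.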
The genome parameter is a list that provides data on the topology of
protein-coding regions in the genome and its length, used internally
primarily for graphical and summary purposes.
The package provides an example (mn908947.3) of how this
information is to be organized.
An object of class entropyProfile. It contains a tidy,
summarized version of the SNV table, a data frame with
information on genome-wide entropy, a data frame with
information on each CDS and corresponding mutations observed in the
virome, and a list with CDS data and length of the reference
genome used in variant calling.
Mirny and Shakhnovich, 1999. J Mol Biol 291:177-196. doi:10.1006/jmbi.1999.2911.
Shannon, 1948. Bell System Technical Journal, 27:379-423. doi:10.1002/j.1538-7305.1948.tb01338.x.
# Entropy across the genome in ancestral lineages
ancestral <- getEntropySignature(wWater[wWater$wave == "first", ], categories = "sensitive")
# Inspect profile
plot(ancestral, chartType = "entroScan")
This type of data can be obtained from .gff (General Feature Format)
files using applications such as the rtracklayer package, or manually
from the corresponding entry in the GenBank database.
mn908947.3
An object of class list of length 2.
Nucleotide [Internet]. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information; [1988] – . Accession No. MN908947.3, Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome. Available from: https://www.ncbi.nlm.nih.gov/nuccore/MN908947.3.
Creates entropy charts along a genome.
## S3 method for class 'entropyProfile'
plot(x, chartType = "bp", ...)
x |
Object of class |
chartType |
Whether to graph per-protein summaries ("bp"), per-protein stripcharts ("stripchart" / "swarm"), or position-wise entropy ("entroScan"). |
... |
Additional arguments passed to the function. |
Unrendered gg/ggplot object produced by ggplot2. This
function is primarily called for its side effects.
ancestral <- getEntropySignature(wWater[wWater$wave == "first", ])
omicron <- getEntropySignature(wWater[wWater$wave == "third", ])
# Enhanced Spike entropy plus pervasive negative selection in Omicron
# sublineages
anc_plot <- plot(ancestral, chartType = "stripchart")
omi_plot <- plot(omicron, chartType = "stripchart")
patchwork::wrap_plots(anc_plot/omi_plot)
profileSummary objects
This function formats and prints compact entropy profile summaries
(profileSummary objects) on the console.
## S3 method for class 'profileSummary'
print(x, ...)
x |
An object of class |
... |
Additional arguments passed to the function. |
Invisibly returns NULL. This function is used for its side
effect.
tidyMutations objects
This function formats and prints compact mutation summaries
(tidyMutations objects) on the console.
## S3 method for class 'tidyMutations'
print(x, ...)
x |
An object of class |
... |
Additional arguments passed to the function. |
Invisibly returns NULL. Called for side effect.
Displays SNVs, and corresponding protein mutations, at specific genomic positions.
showMutations(profile, positions)
profile |
An object of class |
positions |
A vector with genome positions relative to the reference genome. |
The user provides a vector of genome positions and the function prints the mutations associated with them. The output format is "ref_res###alt_res / protein:ref_res###alt_res", where ref_res is the residue (either nucleotide or amino acid) in the reference strain, alt_res is the alternative residue in the metagenome, "###" is the position (either nucleotide or amino acid) where the mutation was observed, and "protein" is the name of the affected protein.
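To make the output format concrete, here is a minimal sketch that assembles one such label with sprintf. The helper format_mutation and its argument names are hypothetical, the input values are illustrative, and the sketch does not reproduce how showMutations builds its table.

```r
# Hypothetical helper, not part of MetaEntropy: builds a label of the form
# "ref_res###alt_res / protein:ref_res###alt_res".
format_mutation <- function(ref_nt, nt_pos, alt_nt,
                            protein, ref_aa, aa_pos, alt_aa) {
  sprintf(
    "%s%d%s / %s:%s%d%s",
    ref_nt, as.integer(nt_pos), alt_nt,   # nucleotide change
    protein,                              # affected protein
    ref_aa, as.integer(aa_pos), alt_aa    # amino acid change
  )
}

# Illustrative input: a T-to-C change at genomic position 22679,
# read as serine to proline at residue 373 of the S protein.
label <- format_mutation("T", 22679, "C", "S", "S", 373, "P")
label
```

The part before the slash describes the nucleotide change relative to the reference genome, and the part after it describes the resulting amino acid change within the named protein.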
An object of class c("tidyMutations", "data.frame"),
containing summary information about user-supplied genomic
positions. This information includes the mutations themselves
relative to the reference genome, their positions within it, and the
corresponding abundances in the virome. Intended to be displayed by
print.tidyMutations.
# High entropy at the RBD in Omicron lineages
omicron <- getEntropySignature(wWater[wWater$wave == "third", ])
plot(omicron, chartType = "stripchart")
# Identify the high-entropy positions
omicron$Entropy$position[omicron$Entropy$entropy > 0.3]
#[1] 22882 22898 22917 23013 23040 23048 23055 23063
# Get a descriptive table
showMutations(omicron, c(22882, 22898, 22917, 23013, 23040, 23048, 23055, 23063))
Prints a report about an entropy profile (an object of class "entropyProfile").
## S3 method for class 'entropyProfile'
summary(object, ...)
object |
An object of class |
... |
Other parameters passed to the function. |
An object of class c("profileSummary", "list") summarizing
an entropy profile. Intended to be displayed via
print.profileSummary.
SNVs inferred from Illumina (2 x 150) sequences from pooled ultra-pure virus concentrates representative of the 1st and 3rd COVID-19 waves in Trelew. Reads were mapped against the Wuhan-Hu-1 reference genome (MN908947.3) with bwa, and variants were called with iVar using a 3% frequency cutoff for minor variants. First wave cases were caused by ancestral strains, whereas third wave cases were mainly due to highly human-adapted Omicron sublineages.
wWater
An object of class data.frame with 148 rows and 10 columns.
Manrique, Julieta Marina, and Leandro Roberto Jones. 2025. A Cost-Effective Wastewater-Based Workflow for Community-Level Insights into SARS-CoV-2 Evolution. Unpublished 0 (0): 000-000.