Documentation

The gelviz.basic module

gelviz.basic.createGeneNameMap(gene_name_mapping_filename)

Function that creates a mapping from Ensembl gene ids to HUGO gene names.

Parameters

gene_name_mapping_filename (str) – Path to a tab separated file, for which the first column is an Ensembl gene id, and the second column is the HUGO gene name.

Returns

Dictionary containing the gene id mapping.

Return type

dictionary
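The expected input format can be illustrated with a minimal sketch. It does not call gelviz: build_gene_name_map is a hypothetical stand-in that reads a two-column tab separated file into a dictionary, assuming keys are Ensembl gene ids and values are HUGO gene names as described above. How gelviz itself handles headers or malformed lines is not stated here, so the skipping of incomplete lines is an assumption.

```python
import csv
import os
import tempfile


def build_gene_name_map(mapping_filename):
    """Hypothetical stand-in: map column 1 (gene id) to column 2 (gene name)."""
    gene_name_map = {}
    with open(mapping_filename, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Assumption: blank or incomplete lines are skipped.
            if len(row) < 2:
                continue
            gene_name_map[row[0]] = row[1]
    return gene_name_map


# Write a small example mapping file to a temporary location.
with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False) as tmp:
    tmp.write("ENSG00000141510\tTP53\n")
    tmp.write("ENSG00000012048\tBRCA1\n")
    example_path = tmp.name

example_map = build_gene_name_map(example_path)
os.remove(example_path)
print(example_map)
```

A dictionary of this shape is what the gene_names_map parameters of the expression plotting functions are described as expecting.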

gelviz.basic.determineYPosGene(genes_bed, region_size, distance_ratio)

Function that determines the max y position for gene plotting via function plotGenes.

Parameters
  • genes_bed (pybedtools.BedTool) – pybedtools.BedTool object containing genes to be plotted.

  • region_size (int) – Size of region to be plotted in base pairs.

  • distance_ratio (float) – Minimal distance between two genes, as ratio of ax width, such that two genes are plotted side by side. If the distance falls below this ratio, the genes will be stacked.

Returns

Tuple of

  1. max_y_pos: Defines the number of stacked genes.

  2. y_pos_dict: Dictionary with keys = gene ids and values = y position of gene.

Return type

tuple
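The role of distance_ratio can be sketched with a simple greedy row assignment. This is not the gelviz implementation: assign_gene_rows is a hypothetical helper, genes are plain (gene_id, start, end) tuples instead of a pybedtools.BedTool, and the assumption is that two genes share a row only if the gap between them is at least distance_ratio * region_size. Here max_y_pos is taken to be the number of rows used.

```python
def assign_gene_rows(genes, region_size, distance_ratio):
    """Hypothetical greedy stacking of (gene_id, start, end) tuples."""
    min_distance = distance_ratio * region_size
    row_ends = []    # end position of the last gene placed in each row
    y_pos_dict = {}
    for gene_id, start, end in sorted(genes, key=lambda gene: gene[1]):
        for row, last_end in enumerate(row_ends):
            # Reuse a row only if the gap to its last gene is large enough.
            if start - last_end >= min_distance:
                row_ends[row] = end
                y_pos_dict[gene_id] = row
                break
        else:
            # No existing row has enough space, so the gene is stacked.
            row_ends.append(end)
            y_pos_dict[gene_id] = len(row_ends) - 1
    max_y_pos = len(row_ends)
    return max_y_pos, y_pos_dict


genes = [("geneA", 100, 400), ("geneB", 450, 800), ("geneC", 2000, 2500)]
max_y_pos, y_pos_dict = assign_gene_rows(
    genes, region_size=10000, distance_ratio=0.1
)
print(max_y_pos, y_pos_dict)
```

With a region of 10000 bp and a ratio of 0.1, geneB lies closer than 1000 bp to geneA and is stacked, while geneC is far enough away to share the first row.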

gelviz.basic.distanceEqualizer(genomic_segments, start, end, direction='top_down', color='k', ax=None)

Function that plots arcs connecting unequally spaced genomic segments to equally spaced positions.

Parameters
  • genomic_segments (list) – List of segments for which distances shall be equalized (each segment is of the form [<chrom>, <start>, <end>])

  • start (int) – Start position of the genomic region.

  • end (int) – End position of the genomic region.

  • color (str, optional) – Color of lines equalizing distances, defaults to “k”.

  • direction (str, optional) – Direction of distance equalization (top_down | bottom_up), defaults to “top_down”.

  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which to plot, defaults to None.

Returns

List of equalized region midpoints.

Return type

list
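One way to picture the returned midpoints is to divide the region into as many equally wide bins as there are segments and take the center of each bin. The helper equalized_midpoints below is hypothetical and only illustrates that idea; whether gelviz computes the midpoints in exactly this way is an assumption.

```python
def equalized_midpoints(n_segments, start, end):
    """Hypothetical helper: centers of n equally wide bins in [start, end]."""
    bin_width = (end - start) / n_segments
    return [start + (i + 0.5) * bin_width for i in range(n_segments)]


# Segments of the form [<chrom>, <start>, <end>] with unequal spacing.
genomic_segments = [
    ["chr1", 1000, 1200],
    ["chr1", 1300, 4000],
    ["chr1", 9000, 9500],
]
midpoints = equalized_midpoints(len(genomic_segments), start=0, end=12000)
print(midpoints)
```

A list of this kind corresponds to the gene_mid_points parameter described for plotGeneExpressionEqualDist.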

gelviz.basic.plotCNVs(cnvs_bed, chromosome, start, end, ploidy=2, cnv_threshold=0.7, color_gain='g', color_loss='r', color_neutral='k', ax=None)

Function for plotting CNV segments

Parameters
  • cnvs_bed (pybedtools.BedTool) –

    pybedtools.BedTool object containing CNVs with the following entries:

    1. Chromosome,

    2. Start Position,

    3. End Position,

    4. Deviation from ploidy,

    5. True Copy Number

  • chromosome (str) – Chromosome for which to plot CNVs.

  • start (int) – Start position on chromosome.

  • end (int) – End position on chromosome.

  • ploidy (int, optional) – Assumed ploidy of tumor, defaults to 2.

  • cnv_threshold (float, optional) – Minimal deviation from ploidy to be considered as a CNV, defaults to 0.7.

  • color_gain (str, optional) – Plot color of copy number gains, defaults to “g”.

  • color_loss (str, optional) – Plot color of copy number losses, defaults to “r”.

  • color_neutral (str, optional) – Plot color of copy number neutral regions, defaults to “k”.

  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis used for plotting.

Returns

Nothing to be returned.

Return type

None
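The interplay of cnv_threshold and the three colors can be sketched as follows. The helper classify_cnv is hypothetical and assumes that the deviation from ploidy is signed and that its absolute value is compared against cnv_threshold; the color codes are the defaults from the signature above.

```python
def classify_cnv(deviation, cnv_threshold=0.7):
    """Hypothetical classification of a segment by its deviation from ploidy."""
    # Assumption: deviations smaller than the threshold count as neutral.
    if abs(deviation) < cnv_threshold:
        return "neutral"
    return "gain" if deviation > 0 else "loss"


# Default colors taken from the plotCNVs signature.
default_colors = {"gain": "g", "loss": "r", "neutral": "k"}

# Entries: chromosome, start, end, deviation from ploidy, true copy number.
cnvs = [
    ("chr1", 0, 1000, 1.2, 3.2),
    ("chr1", 1000, 2000, -0.9, 1.1),
    ("chr1", 2000, 3000, 0.1, 2.1),
]
segment_colors = [default_colors[classify_cnv(cnv[3])] for cnv in cnvs]
print(segment_colors)
```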

gelviz.basic.plotCNVsHeat(cnvs_bed, chromosome, start, end, ploidy=2, cnv_threshold=0.7, cmap='bwr', max_dev=None, ax=None)

Function for plotting CNV segments as heatmap

Parameters
  • cnvs_bed (pybedtools.BedTool) –

    pybedtools.BedTool object containing CNVs with the following entries:

    1. Chromosome,

    2. Start Position,

    3. End Position,

    4. Deviation from ploidy,

    5. True Copy Number

  • chromosome (str) – Chromosome for which to plot CNVs.

  • start (int) – Start position on chromosome.

  • end (int) – End position on chromosome.

  • ploidy (int, optional) – Assumed ploidy of tumor, defaults to 2.

  • cnv_threshold (float, optional) – Minimal deviation from ploidy to be considered as a CNV, defaults to 0.7.

  • cmap (str, optional) – Colormap used for plotting CNVs, defaults to “bwr”.

  • max_dev (float, optional) – Maximal deviation from ploidy to plot, defaults to None.

  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis used for plotting, defaults to None.

Returns

Nothing to be returned.

Return type

None

gelviz.basic.plotChIPSignals(chip_signals, r_chrom, r_start, r_end, ax=None, color='b', offset=None, merge=None)

Function that plots bedGraph like iterators.

Parameters
  • chip_signals (iterator) –

    Iterator for which each element is a list-like object containing:

    1. Chromosome

    2. Start position

    3. End position

    4. Value to be plotted as bar

  • r_chrom (str) – Chromosome of region to be plotted.

  • r_start (int) – Start position of region to be plotted.

  • r_end (int) – End position of region to be plotted.

  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis of plot, defaults to None.

  • color (str, optional) – Color of bars, defaults to “b”.

  • offset (int, optional) – Length of intervals, defaults to None.

  • merge (int, optional) – Number of elements to be merged. If this value is set and not equal to 0, then merge elements are averaged and plotted, defaults to None.

Returns

Nothing to be returned.

Return type

None
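The effect of the merge parameter can be sketched as averaging groups of consecutive elements. The helper merge_signals is hypothetical; it assumes that merge consecutive elements are combined into one interval spanning from the first start to the last end, with the mean of their values as bar height. The grouping used by gelviz itself may differ.

```python
def merge_signals(chip_signals, merge):
    """Hypothetical: average every `merge` consecutive bedGraph-like elements."""
    merged = []
    for i in range(0, len(chip_signals), merge):
        chunk = chip_signals[i:i + merge]
        mean_value = sum(float(element[3]) for element in chunk) / len(chunk)
        # The merged interval spans from the first start to the last end.
        merged.append(
            [chunk[0][0], int(chunk[0][1]), int(chunk[-1][2]), mean_value]
        )
    return merged


# Elements: chromosome, start position, end position, value.
signals = [
    ["chr1", 0, 100, 2.0],
    ["chr1", 100, 200, 4.0],
    ["chr1", 200, 300, 6.0],
    ["chr1", 300, 400, 8.0],
]
merged_signals = merge_signals(signals, merge=2)
print(merged_signals)
```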

gelviz.basic.plotCoordinates(chrom, start, end, color='k', ax=None, upper=True, loc_coordinates='up', revert_coordinates=False, rotation=0)

Function that plots genomic coordinates in a linear fashion.

Parameters
  • chrom (str) – Chromosome of the region to be plotted.

  • start (int) – Start position of the region to be plotted.

  • end (int) – End position of the region to be plotted.

  • color (str, optional) – Color of the genomic scales elements, defaults to “k”.

  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis of plot, defaults to None.

  • upper (bool, optional) – If True, make fewer ticks, else if False make more ticks, defaults to True.

  • loc_coordinates (str, optional) – Either of “up” | “down”. If “up”, plot ticks to upper direction, else if “down”, plot ticks to lower direction, defaults to “up”.

  • revert_coordinates (bool, optional) – If True, coordinates are reversed to decreasing order. Else, coordinates stay in increasing order, defaults to False.

  • rotation (int, optional) – Rotational angle of coordinate strings, defaults to 0.

Returns

Nothing to be returned.

Return type

None

gelviz.basic.plotGeneExpression(genes_bed, region_bed, expression_df_g1, expression_df_g2, gene_names_map, blacklist=None, ax=None, plot_legend=False, color_g1='#fb8072', color_g2='#80b1d3', g1_id='tumor', g2_id='normal', plot_gene_names=True)

Function for plotting paired gene expression (e.g. tumor and normal) on a gene region scale retaining the position of genes.

Parameters
  • genes_bed (pybedtools.BedTool) – pybedtools.BedTool object containing TXstart, and TXend of genes.

  • region_bed (pybedtools.BedTool) – pybedtools.BedTool object containing the region to be plotted.

  • expression_df_g1 (pandas.DataFrame) – pandas.DataFrame containing the expression values of g1 samples (columns: sample ids; index: gene ids).

  • expression_df_g2 (pandas.DataFrame) – pandas.DataFrame containing the expression values of g2 samples (columns: sample ids; index: gene ids).

  • gene_names_map (dict) – Dictionary with keys: ENSEMBL GENE IDs, and values: HUGO GENE SYMBOLs.

  • blacklist (set, optional) – Set containing gene ids not to be plotted, defaults to None.

  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis used for plotting, defaults to None.

  • plot_legend (bool, optional) – If True legend is plotted, False otherwise, defaults to False.

  • color_g1 (str, optional) – Color used for plotting g1 samples expression, defaults to “#fb8072”.

  • color_g2 (str, optional) – Color used for plotting g2 samples expression, defaults to “#80b1d3”.

  • g1_id (str, optional) – ID of g1 used for legend plotting, defaults to “tumor”.

  • g2_id (str, optional) – ID of g2 used for legend plotting, defaults to “normal”.

  • plot_gene_names (bool, optional) – If True, the HUGO GENE SYMBOLs will be shown, else the GENE SYMBOLs are hidden, defaults to True.

Returns

Axis on which plot was placed.

Return type

matplotlib.axes._subplots.AxesSubplot

gelviz.basic.plotGeneExpressionEqualDist(genes_bed, gene_mid_points, region, expression_df, groups, gene_names_map=None, blacklist=None, ax=None, plot_legend=False, colors=None, ids=None, plot_gene_names=True, position_gene_names='bottom', log_transformed=True, plot_points=False, alpha=0.5)

Function for plotting grouped gene expression (e.g. tumor and normal) on a gene region scale equalizing the position of genes.

Parameters
  • genes_bed (pybedtools.BedTool) – pybedtools.BedTool object containing gene regions.

  • gene_mid_points (list) – list of integer values containing center positions of genes.

  • region (list) – List containing the region to be plotted ([<chrom>, <start>, <end>]).

  • groups (list) – List of lists containing the IDs of the different groups.

  • gene_names_map (dict, optional) – Dictionary with keys: ENSEMBL GENE IDs, and values: HUGO GENE SYMBOLs, defaults to None.

  • expression_df (pandas.DataFrame) – pandas.DataFrame object containing the expression values of all samples (columns: sample ids; index: gene ids).

  • blacklist (set, optional) – Set containing gene ids not to be plotted, defaults to None.

  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis used for plotting, defaults to None.

  • plot_legend (bool, optional) – If True plot legend, False otherwise, defaults to False.

  • colors (list, optional) – List of colors used for plotting samples expression. The number of colors must be the same as the number of groups, defaults to None.

  • ids (list, optional) – IDs used for legend plotting. The number of ids must be the same as the number of groups, defaults to None.

  • plot_gene_names (bool, optional) – True if gene names shall be plotted, False otherwise, defaults to True.

  • position_gene_names (str, optional) – Either of “top”, or “bottom”, defaults to “bottom”.

  • log_transformed (bool, optional) – If True use log transformed values for plotting, non-transformed values otherwise, defaults to True.

  • plot_points (bool, optional) – If True, a point per expression value is plotted in addition to the boxplot, no points are plotted otherwise, defaults to False.

  • alpha (float, optional) – Alpha value for the background color of the boxplots boxes, defaults to 0.5.

Returns

Axis on which plot was placed.

Return type

matplotlib.axes._subplots.AxesSubplot

gelviz.basic.plotGenes(genes_bed, exons_bed, introns_bed, region_bed, blacklist=None, gene_map=None, plot_gene_ids=True, y_max=None, distance_ratio=0.1, ax=None, plot_legend=False, legend_loc='lower right', color_plus='#80b1d3', color_minus='#fb8072')

Function for plotting gene structures, i.e. introns and exons of genes.

Parameters
  • genes_bed (pybedtools.BedTool) – pybedtools.BedTool object containing TX start, and TX end of genes.

  • exons_bed (pybedtools.BedTool) – pybedtools.BedTool object containing exons of genes.

  • introns_bed (pybedtools.BedTool) – pybedtools.BedTool object containing introns of genes.

  • region_bed (pybedtools.BedTool) – pybedtools.BedTool object containing the one region, for which the gene plot is created.

  • blacklist (list, optional) – List of gene names, for genes that should not be shown on the plot, default is None.

  • plot_gene_ids (bool, optional) – If True, all gene ids will be included in the plot, False otherwise, default is True.

  • y_max (float, optional) – Max y value in the gene plot. If not set, then y_max is the max number of stacked genes, default is None.

  • distance_ratio (float, optional) – Minimal distance between two genes, as ratio of ax width, such that two genes are plotted side by side. If the distance falls below this ratio, the genes will be stacked, default is 0.1.

  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axes instance on which the genes are plotted, default is None.

  • plot_legend (bool, optional) – If True, a legend describing plus or minus stranded genes is plotted, False otherwise. Default is False.

  • legend_loc (str, optional) – Location of the legend. Either of “lower left”, “lower right”, “upper left”, “upper right”, default is “lower right”.

  • color_plus (str, optional) – Color code for plus stranded genes, default is “#80b1d3”.

  • color_minus (str, optional) – Color code for minus stranded genes, default is “#fb8072”.

Returns

Tuple of max_y_pos+1.5, patch_list, patch_description_list, where

  1. max_y_pos+1.5 is the maximal y position plus 1.5, where max_y_pos defines the number of stacked genes.

  2. patch_list is the list of patches drawn on the ax.

  3. patch_description_list is the list of descriptions for the patches drawn on the ax.

Return type

tuple

gelviz.basic.plotGenomicSegments(segments_list, chrom, start, end, ax=None)

Function for plotting genomic segments in different colors.

Parameters
  • segments_list – Segments read from a tabixed bed file containing (chrom, start, end, name, score, strand, start, end, color). The color field is used to determine the color for plotting (R,G,B).

  • chrom (str) – Chromosome of the region to be plotted.

  • start (str) – Start position of the region to be plotted.

  • end (str) – End position of the region to be plotted.

  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis used for plotting, defaults to None.

Returns

Dictionary with keys = names of segments, and values = patches.

Return type

dict
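The color field of a bed entry is a comma separated (R,G,B) string. A minimal sketch of turning such a string into the float tuple that matplotlib accepts as a color is given below. The helper parse_bed_rgb is hypothetical, and scaling by 255 is an assumption based on the usual 0 to 255 range of bed colors; it is not taken from the gelviz source.

```python
def parse_bed_rgb(color_field):
    """Hypothetical helper: convert 'R,G,B' (0-255) to floats in [0, 1]."""
    red, green, blue = (int(value) for value in color_field.split(","))
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError("RGB values must lie between 0 and 255")
    return (red / 255.0, green / 255.0, blue / 255.0)


# Example bed-like segment; the last field holds the color.
segment = ["chr1", "1000", "2000", "segmentA", "0", "+", "1000", "2000", "255,0,0"]
segment_color = parse_bed_rgb(segment[-1])
print(segment_color)
```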

gelviz.basic.plotHiCContactMap(contact_map, start, end, segment_size, cmap='Greys', vmin=None, vmax=None, location='top', ax=None)

Function that plots HiC contact maps as pyramid plots

Parameters
  • contact_map (pandas.DataFrame) – Matrix that contains the intensity values of HiC contacts.

  • start (int) – Chromosomal start position of region to be plotted.

  • end (int) – Chromosomal end position of region to be plotted.

  • segment_size (int) – Size of the segments for which contacts were called.

  • cmap (str, optional) – Name of the colormap to be used for plotting HiC intensities, defaults to “Greys”.

  • vmin (float, optional) – Minimal value of intensity range to be plotted, defaults to None.

  • vmax (float, optional) – Maximal value of intensity range to be plotted, defaults to None.

  • location (str, optional) – Either of “top” | “bottom”. If location == “top”, the pyramid points upwards, else if location == “bottom” the pyramid points downwards, defaults to “top”.

  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which to plot contact map, defaults to None.

Returns

Nothing to be returned.

Return type

None

gelviz.basic.plotMethylationProfile(meth_calls, chrom, start, end, color='k', ax=None)

Function that plots methylation values as dot plots.

Parameters
  • meth_calls (iterator) –

    Iterator containing list-like elements with the following entries:

    1. Chromosome

    2. Start position

    3. End position

    4. Number of methylated cytosines

    5. Number of unmethylated cytosines

  • chrom (str) – Chromosome of region to be plotted.

  • start (int) – Start position of region to be plotted.

  • end (int) – End position of region to be plotted.

  • color (str, optional) – Color of points representing methylation values, defaults to “k”.

  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis of plot, defaults to None.

Returns

Nothing to be returned.

Return type

None
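The entries of meth_calls carry counts rather than a methylation value. A common way to derive the value plotted per position is the fraction of methylated cytosines, sketched below. The helper methylation_fraction is hypothetical, and both the formula and the handling of positions without coverage are assumptions, not statements about the gelviz implementation.

```python
def methylation_fraction(n_methylated, n_unmethylated):
    """Hypothetical helper: methylated / (methylated + unmethylated)."""
    total = n_methylated + n_unmethylated
    if total == 0:
        # Assumption: positions without coverage yield no value.
        return None
    return n_methylated / total


# Entries: chromosome, start, end, methylated Cs, unmethylated Cs.
meth_calls = [
    ["chr1", 100, 101, 8, 2],
    ["chr1", 250, 251, 0, 5],
    ["chr1", 400, 401, 0, 0],
]
points = [
    (int(call[1]), methylation_fraction(int(call[3]), int(call[4])))
    for call in meth_calls
]
print(points)
```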

gelviz.basic.plotMethylationProfileHeat(methylation_bed, chrom, start, end, bin_size=1000, ax=None)

Function for plotting methylation values as heatmap

Parameters
  • methylation_bed (pybedtools.BedTool) – Methylation calls. Following fields must be included: Chrom, Start, End, Methylated Cs, Unmethylated Cs.

  • chrom (str) – Chromosome of region to be plotted.

  • start (int) – Start position of region to be plotted.

  • end (int) – End position of region to be plotted.

  • bin_size (int, optional) – Size of the bins in which methylation values are averaged, defaults to 1000.

  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis to be used for plotting, defaults to None.

Returns

Nothing to be returned.

Return type

None
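The role of bin_size can be sketched as averaging per-position methylation fractions within fixed-width bins. The helper bin_methylation is hypothetical and works on plain tuples instead of a pybedtools.BedTool; assigning a call to a bin by its start position and leaving empty bins as None are assumptions made for this illustration.

```python
def bin_methylation(calls, start, end, bin_size=1000):
    """Hypothetical helper: mean methylation fraction per bin of bin_size bp."""
    n_bins = (end - start + bin_size - 1) // bin_size  # ceiling division
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for _chrom, pos_start, _pos_end, n_meth, n_unmeth in calls:
        total = n_meth + n_unmeth
        # Skip uncovered positions and positions outside the region.
        if total == 0 or not start <= pos_start < end:
            continue
        index = (pos_start - start) // bin_size
        sums[index] += n_meth / total
        counts[index] += 1
    # Bins without any call are reported as None.
    return [s / c if c else None for s, c in zip(sums, counts)]


# Fields: Chrom, Start, End, Methylated Cs, Unmethylated Cs.
calls = [
    ("chr1", 100, 101, 3, 1),
    ("chr1", 900, 901, 1, 3),
    ("chr1", 1500, 1501, 4, 0),
]
binned = bin_methylation(calls, start=0, end=3000, bin_size=1000)
print(binned)
```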

gelviz.basic.plotMotifDirections(motifs_bed, start, end, head_width=0.2, head_length=1000, overhang=0, color_plus='#80b1d3', color_minus='#fb8072', ax=None)

Function that plots TF motifs as arrows, indicating their directionality.

Parameters
  • motifs_bed (pybedtools.BedTool) – pybedtools.BedTool object containing regions of the TF sites to be plotted.

  • start (int) – Start position of the region to be plotted.

  • end (int) – End position of the region to be plotted.

  • head_width (float, optional) – Width of the arrow head as proportion of the arrow, defaults to 0.2.

  • head_length (int, optional) – Length of the arrow head in bp (depends on the region that is plotted), defaults to 1000.

  • overhang (float, optional) – Fraction that the arrow is swept back (0 overhang means triangular shape). Can be negative or greater than one. Defaults to 0.

  • color_plus (str, optional) – Color of plus stranded TF regions, defaults to “#80b1d3”.

  • color_minus (str, optional) – Color of minus stranded TF regions, defaults to “#fb8072”.

  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which to plot the motifs, defaults to None.

Returns

Nothing to be returned.

Return type

None

gelviz.basic.plotRegions(regions, start, end, color='#cbebc4', edgecolor=False, alpha=1, ax=None)

Function that plots genomic regions as simple rectangles.

Parameters
  • regions (iterator) –

    Iterator containing list-like elements with the following entries:

    1. Chromosome

    2. Start position

    3. End position

  • start (int) – Start position of the region to be plotted.

  • end (int) – End position of the region to be plotted.

  • color (str, optional) – Color of the rectangles representing the regions to be plotted, defaults to “#cbebc4”.

  • edgecolor (str, optional) – Color of region edge. If False, no edge is plotted, defaults to False.

  • alpha (float, optional) – Alpha value of the rectangle, representing the region to be plotted, defaults to 1.

  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis of plot, defaults to None.

Returns

Nothing to be returned.

Return type

None

gelviz.basic.plotTX(chrom_r, start_r, end_r, TX_pos, direction='right', color='k', ax=None)

Function that plots a translocation event as a bar, showing the part of the genome that is translocated.

Parameters
  • chrom_r (str) – Chromosome of the region to be plotted.

  • start_r (int) – Start position of the region to be plotted.

  • end_r (int) – End position of the region to be plotted.

  • TX_pos (int) – Position of the translocation.

  • direction (str, optional) – Direction of the genomic part that is translocated. Either of “left” (upstream), or “right” (downstream), defaults to “right”.

  • color (str, optional) – Color of the bar representing the translocation, defaults to “k”.

  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis of plot, defaults to None.

Returns

Nothing to be returned.

Return type

None
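The meaning of direction can be sketched by computing which interval of the plotted region the bar would cover. The helper translocated_interval is hypothetical; it assumes that “left” marks the part from the region start up to TX_pos and “right” the part from TX_pos to the region end, which follows the upstream/downstream wording above but is not confirmed against the gelviz source.

```python
def translocated_interval(start_r, end_r, tx_pos, direction="right"):
    """Hypothetical helper: interval of the region covered by the bar."""
    if direction not in ("left", "right"):
        raise ValueError("direction must be 'left' or 'right'")
    if not start_r <= tx_pos <= end_r:
        raise ValueError("TX_pos must lie within the plotted region")
    if direction == "left":
        # Upstream part: from the region start to the translocation.
        return (start_r, tx_pos)
    # Downstream part: from the translocation to the region end.
    return (tx_pos, end_r)


downstream = translocated_interval(1000, 5000, 3000, direction="right")
upstream = translocated_interval(1000, 5000, 3000, direction="left")
print(downstream, upstream)
```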

gelviz.basic.readACESeqAsBed(input_filename)

Function that reads CNVs from ACESeq (“most_important”) files and converts them to a pybedtools.BedTool object.

Parameters

input_filename (str) – Full path to ACESeq “most_important” file.

Returns

pybedtools.BedTool object containing CNVs from ACESeq.

Return type

pybedtools.BedTool