User Manual

Overview

NMPFamsDB is a database that hosts novel metagenome protein clusters with no hits, or only weak hits, to Pfam or reference genomes. It aims to significantly expand the protein family space known to date.

With NMPFamsDB you can:

Explore the novel protein families and their genomic content
See their habitat distribution
Follow their biogeographical trademarks
See their taxonomy information
Explore potential novel structures and folds
Use Hidden Markov Model (HMM) and sequence-based searches for querying the protein clusters
Filter information by smart filters at any stage of analysis

NMPFamsDB has the following pages/sections:

Home: The NMPFamsDB Home Page
Browse: The database browser. You can browse NMPFamsDB contents from the following subcategories: Families, Sequences, Scaffolds, Datasets and Ecosystems.
Sequence Search Tools: Perform sequence-based queries. The following options are offered:
- LAST Search: Perform pairwise alignment searches using LAST.
- HMMER Search: Perform HMM-powered sequence searches using HMMER.
- Pattern Search: Search sequences using PROSITE-like sequence patterns or regular expressions.
Visualization Tools: Perform various types of analyses. The following options are offered:
- Ecosystem & Phylogeny: Create various types of plots for subsets of the database.
- Geographical Distribution: Visualize the geographical distribution of database components.
Programmatic Access: The NMPFamsDB Application Programming Interface (API).
Statistics: An overview of NMPFamsDB's contents and their distributions.
Downloads: Download NMPFamsDB contents in various file formats.
Manual: This Help Section.
Contact: The database's contact information.

Availability

NMPFamsDB is publicly available through http://nmpfamsdb.pavlopouloslab.info or https://bib.fleming.gr/NMPFamsDB.

How to use this manual

Topics in the Manual are divided into separate tabs, accessible through the header buttons at the top of the page. Click any of these header buttons to navigate to its respective section.

Long sections are divided into subsections that can be scrolled up and down. At the end of each subsection there is a link labeled [Back to top]. Clicking it returns you to the top of the section.

Ecosystem & Phylogeny

The Ecosystem & Phylogeny tool allows you to visualize and plot the distribution of NMPFs and their relationships, based on their associations with user-defined ecosystems or taxonomic groups. Through the tool, you can create several graph types (matrix plot, Venn diagram, circos plot, total-vs-specific bar chart and Upset plot), customize them and export them as images.

The tool is available by navigating to Visualization Tools → Ecosystem & Phylogeny from the top menu. You will be redirected to the tool's input form.

The input form has two panels, labeled "Ecosystem" and "Phylogeny", which are accessible through the respective tab buttons at the top of the form.

Through the "Ecosystem" panel you can select ecosystem types by expanding or collapsing the tree structure on the left of the form and clicking the checkbox next to each ecosystem name. The names of the selected ecosystems appear in the text panel on the right of the form. In addition, you can set an association cut-off by clicking the Apply association cut-off checkbox and dragging the slider to the value of your choice (for example, 5%). Finally, to limit your run to families associated exclusively with the selected ecosystems (i.e. a 100% association cut-off), click the relevant checkbox.

Similar to "Ecosystem", you can select different taxonomic groups by navigating to the "Phylogeny" tab and using the search form there. The taxonomic groups are organized in a hierarchical tree in the same manner as the ecosystems. An association cutoff can also be set, by clicking the Apply association cut-off checkbox and dragging the slider button to the value of your choice.
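The effect of the association cut-off and of the exclusive option can be illustrated with a small sketch. It is not the tool's actual implementation: the manual does not define how association is computed, so the sketch assumes that each family carries a percentage per category (ecosystem or taxonomic group). The function name, the data layout and the family identifiers are hypothetical.

```python
def filter_families(associations, selected, cutoff=0.0, exclusive=False):
    """Return, for each selected category, the families that pass the cut-off.

    associations: {family_id: {category: percent}} (assumed layout).
    cutoff: minimum association in percent (the slider value).
    exclusive: if True, keep only families with 100% association.
    """
    threshold = 100.0 if exclusive else cutoff
    hits = {}
    for category in selected:
        hits[category] = set()
        for family, per_category in associations.items():
            pct = per_category.get(category, 0.0)
            # A family with no association at all never qualifies.
            if pct > 0 and pct >= threshold:
                hits[category].add(family)
    return hits


# Hypothetical association percentages for three families.
associations = {
    "F1": {"Aquatic": 100.0},
    "F2": {"Aquatic": 60.0, "Human": 40.0},
    "F3": {"Aquatic": 3.0, "Human": 97.0},
}

with_cutoff = filter_families(associations, ["Aquatic", "Human"], cutoff=5.0)
only_exclusive = filter_families(associations, ["Aquatic", "Human"], exclusive=True)
```

Under these assumptions, a 5% cut-off drops F3 from Aquatic because its association is only 3%, while the exclusive option keeps only F1.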

When you are ready to begin, click the "Submit" button.

After the tool has finished, you will see the results in the following manner:

The results are presented in an interactive viewer at the top of the page. They include the run type, the association cut-off and a table listing all selected categories (ecosystems or taxonomic groups), their taxonomy tree and the number of associated families. Each family count is a link; clicking it opens the Browse Families page in a new tab and retrieves the corresponding families.

The interactive viewer is vertically split into two parts. To the left, there is a control panel. At the top of the control panel you will find your selected category datasets. You can select or deselect them by clicking on the checkboxes or using the Select / Deselect all checkbox. You can rename or remove them by clicking on the Rename or Remove buttons, respectively. You can change the order of a set in the list by selecting only that set and clicking the Move Up or Move Down buttons at the right of the list. Finally, you can apply your changes by clicking the Create Plots button.

Note: The above actions will re-calculate and re-create all visualizations and plots. This can be time-consuming, especially if the number of families in the datasets is large.

At the bottom of the control panel, you will see a Color Palette panel. Each item in the palette corresponds to a dataset. Clicking an item opens a color picker window, where you can change its color. To use the updated colors in your plots, click the Update Plots button. To reset the palette to its original values, click the Reset Palette button. Note that the Update Plots button only changes the colors of the plots; it does not re-calculate them.

You can export your color palette in a tab-delimited format using the Export color profile button. You can also import your own color palette using the Import Color Profile upload button. The palette file should be tab-delimited. The first column should contain the dataset names and the second column the color values in hexadecimal format. The first line should contain the column names, Dataset and Color, respectively. An example of a color palette file is shown below:

Dataset Color
Terrestrial #996600
Air     #00CC00
Aquatic #00FFFF
Human   #0066FF
Mammals #FF0000
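A palette file in this layout can be written and checked with a short script before uploading it. The following is a minimal sketch and not part of NMPFamsDB: the function names are hypothetical, and the check for six-digit `#RRGGBB` values is an assumption based on the example above.

```python
import csv
import io
import re

# Assumption: colors are six-digit hexadecimal values such as #996600.
HEX_COLOR = re.compile(r"^#[0-9A-Fa-f]{6}$")


def format_color_profile(palette):
    """Serialize {dataset: color} as tab-delimited text with a header line."""
    lines = ["Dataset\tColor"]
    lines += [f"{name}\t{color}" for name, color in palette.items()]
    return "\n".join(lines) + "\n"


def parse_color_profile(text):
    """Parse tab-delimited palette text into a {dataset: color} dict."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    if header != ["Dataset", "Color"]:
        raise ValueError(f"unexpected header: {header}")
    palette = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        name, color = row
        if not HEX_COLOR.match(color):
            raise ValueError(f"invalid hex color for {name}: {color}")
        palette[name] = color
    return palette


palette = {
    "Terrestrial": "#996600",
    "Air": "#00CC00",
    "Aquatic": "#00FFFF",
    "Human": "#0066FF",
    "Mammals": "#FF0000",
}
profile_text = format_color_profile(palette)
round_trip = parse_color_profile(profile_text)
```

Writing `profile_text` to a file produces a palette with real tab characters between the two columns, which is easy to get wrong when typing the file by hand.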

The plots of the results are shown to the right of the interactive viewer. Each plot is given in its own subtab, accessible by clicking the tab buttons at the top of the plots panel. At the bottom of each plot, an Export Image button appears, enabling you to download each graph in the PNG image format. The different plot types are shown below:

The color-coded matrix is a square N × N map (where N is the number of datasets) showing the number of families shared by each pair of datasets. The names of the sets are given on the horizontal and vertical axes, and the number in each cell is the size of the intersection between two sets (e.g. the intersection between Aquatic and Mammals, in this case, is 15). The diagonal of the matrix presents the dataset-specific components, i.e. the families that appear only in that specific dataset.
A Venn diagram shows the overlaps between the different sets in a graphical manner. All the different combinations are shown. For reasons of efficiency, Venn diagrams are created for collections of up to 5 datasets.
The circos plot shows the relationships between the sets in a circular chart.
The bar plot shows the total vs. dataset-specific numbers of families in a bar chart.
The Upset plot is a combinatorial chart, featuring three plots in one: a horizontal bar chart, showing the total components of each dataset, a dot plot, displaying various combinations of sets, and a vertical bar chart, showing the values of each combination.
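The numbers behind the matrix plot can be reproduced with plain set operations. The sketch below follows the description above (off-diagonal cells hold pairwise intersections, the diagonal holds dataset-specific families); the function name and the family identifiers are hypothetical, and the tool's own computation may differ.

```python
def overlap_matrix(datasets):
    """Build an N x N matrix from {dataset_name: set_of_family_ids}.

    Off-diagonal cells count the families shared by two datasets.
    Diagonal cells count the families found only in that dataset.
    """
    names = list(datasets)
    matrix = []
    for a in names:
        row = []
        for b in names:
            if a == b:
                # Union of every other dataset, then keep what is unique to a.
                others = set().union(*(datasets[n] for n in names if n != a))
                row.append(len(datasets[a] - others))
            else:
                row.append(len(datasets[a] & datasets[b]))
        matrix.append(row)
    return names, matrix


# Hypothetical family identifiers per dataset.
datasets = {
    "Aquatic": {"F1", "F2", "F3"},
    "Mammals": {"F2", "F3", "F4"},
    "Air": {"F3", "F5"},
}
names, matrix = overlap_matrix(datasets)
```

Here Aquatic and Mammals share two families (F2 and F3), and each dataset has exactly one family that appears nowhere else.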

As the Upset is the most complex of all plot types offered, a number of additional options are given in its panel to allow further customization:

Type: Change the type of information given. The available options offered are:
- Default: The default type, containing a combination of intersections and unions.
- Intersection: Show only dataset intersections.
- Distinct intersection: Show only intersection combinations that appear nowhere else, including the distinct per dataset components.
- Distinct per file: Show the distinct per dataset families.
The above are also connected to the two sliders below:
- Number of datasets in plot: Define the number of datasets used in the plot.
- Number of datasets in combinations: Define the minimum and maximum number of sets in the combinations of the Upset plot.
Colors: Display the chart in color, or in black and white.
Bar labels: Show or hide bar labels.

To update the Upset plot, click the Update Upset button. Note that this recalculates the values of the Upset plot, so it can be slow, depending on the set sizes.
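The difference between the Intersection and Distinct intersection types can be shown with a small sketch. It is an illustration of the general set logic, not the tool's code: the function name, the `min_size` and `max_size` parameters (standing in for the "Number of datasets in combinations" slider) and the family identifiers are all hypothetical.

```python
from itertools import combinations


def upset_combinations(datasets, min_size=1, max_size=None):
    """Count plain and distinct intersections for every dataset combination.

    datasets: {dataset_name: set_of_family_ids}.
    Returns {combination_tuple: {"intersection": n, "distinct": m}}.
    """
    names = list(datasets)
    if max_size is None:
        max_size = len(names)
    result = {}
    for size in range(min_size, max_size + 1):
        for combo in combinations(names, size):
            # Families present in every dataset of the combination.
            shared = set.intersection(*(datasets[n] for n in combo))
            # Families present in any dataset outside the combination.
            outside = set().union(
                *(datasets[n] for n in names if n not in combo)
            )
            result[combo] = {
                "intersection": len(shared),
                "distinct": len(shared - outside),
            }
    return result


# Hypothetical family identifiers per dataset.
datasets = {
    "Aquatic": {"F1", "F2", "F3"},
    "Mammals": {"F2", "F3", "F4"},
    "Air": {"F3", "F5"},
}
counts = upset_combinations(datasets)
pairs_only = upset_combinations(datasets, min_size=2, max_size=2)
```

In this example Mammals and Air share one family (F3), but their distinct intersection is empty because F3 is also found in Aquatic. The number of combinations grows quickly with the number of datasets, which is consistent with the note that updating the plot can be slow.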
