OVERVIEW
1.1 - ABOUT DENOVOSEQ
DeNovoSeq is a desktop client application of the GPRO suite dedicated to managing and running pipelines and workflows for de novo reconstruction and annotation of new genome and transcriptome sequences, using NGS data and a state-of-the-art protocol (Fig. 1). The application is coupled with an infrastructure of server-side dependencies (pipelines, databases, and tools) that we distribute in a container that can be installed on a remote server or on a PC with sufficient RAM. The application also includes a File Transfer Protocol (FTP) system to upload and download files between the user’s computer and the server, a progress tracker (job tracking system), and two execution modes (a “step-by-step” mode and a “pipeline-like” mode).
The Step-by-Step mode is a procedure similar to those implemented in Galaxy (Afgan et al. 2018) and other GUI-based solutions for NGS data analysis. This mode organizes the different steps of the protocol for de novo analysis (i.e. quality analysis, preprocessing, de novo assembly, gene prediction, annotation and functional analysis) into an intuitive menu that provides a selection of third-party command line interface (CLI) software for each step. Each CLI tool in turn has its own interface, with distinct fields to declare the input and output files and to tune the options and parameters provided by that tool.
In contrast, the pipeline mode is a pipeline configuration system that lets the user execute all the steps of a given protocol automatically, one after the other. To this end, the user just needs to select a specific pipeline from a list, declare the experiment design and the input and output data, configure the options and parameters, and finally run the pipeline, which executes the distinct analyses sequentially.
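The difference between the two execution modes can be pictured with a minimal sketch. It is not DeNovoSeq code: the step names come from the protocol described above, while the placeholder tools, the function names and the idea that each step consumes the previous step's output are assumptions made only for illustration.

```python
from typing import Callable, Dict, List

# Protocol steps in the order listed in this manual.
PROTOCOL_STEPS: List[str] = [
    "quality analysis",
    "preprocessing",
    "de novo assembly",
    "gene prediction",
    "annotation",
    "functional analysis",
]

Step = Callable[[str], str]


def make_placeholder(name: str) -> Step:
    """Hypothetical stand-in for a CLI tool: tags its input with the step name."""
    def step(data: str) -> str:
        return f"{data} -> {name}"
    return step


# Hypothetical registry: one placeholder tool per protocol step.
TOOLS: Dict[str, Step] = {name: make_placeholder(name) for name in PROTOCOL_STEPS}


def run_step(name: str, data: str) -> str:
    """Step-by-step mode: the user launches a single step by hand."""
    return TOOLS[name](data)


def run_pipeline(data: str, steps: List[str] = PROTOCOL_STEPS) -> str:
    """Pipeline mode: every step runs in order without user intervention."""
    for name in steps:
        data = run_step(name, data)
    return data
```

In this sketch, run_step("preprocessing", "reads") corresponds to one manual launch in Step-by-Step mode, while run_pipeline("reads") chains all six steps in a single call.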
Figure 1: Bioinformatic protocol implemented in DeNovoSeq for the first-time reconstruction and annotation of new genome and transcriptome sequences without the aid of a reference sequence template. The tool provides two execution modes (Step-by-Step and Pipeline-like). An interactive version of this protocol is available at GENIE, our virtual assistant.
CITING DeNovoSeq:
Hafez A, et al. DeNovoSeq: an application of the GPRO suite for de novo characterization of genomes and transcriptomes and other GPRO updates (in preparation).
CITING THE GPRO SUITE:
Futami R, Muñoz-Pomer A, Viu JM, Dominguez-Escribá L, Covelli L, Bernet GP, Sempere JM, Moya A, Llorens C. 2011. GPRO: the professional tool for management, functional analysis and annotation of omic sequences and databases. Biotechvana Bioinformatics: 2011-SOFT3, http://bioinformatics.biotechvana.com/index.php/article/35
1.2 - VERSIONS AND DOWNLOADS
DeNovoSeq includes an installer for Windows 7 (64 bit), a self-extracting disk image for Mac OS X 10.6 or later (64 bit), and a compressed tarball archive for Linux 2.6 kernel series or later (64 bit). You can download the latest version of these executables at this link:
1.3 - INSTALLATION AND REQUIREMENTS
1.3.1 - INSTALLING DENOVOSEQ IN YOUR PC
DeNovoSeq is a Java application that can be easily installed on PCs with at least 2 GB of RAM and the Java Runtime Environment (JRE, also included in the Java JDK) version 17 or above. To check whether Java is already installed, open a command line interface and type:
$ java -version
To install the JRE, go to the official JRE repository here and download the version that suits your operating system. Once it is installed, check the output of the java -version command shown above again in your command line interface. Sometimes the JRE is installed but is not on the system PATH, in which case the command is not recognized.
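The same check can be scripted. The following is a minimal sketch, assuming Python 3.7 or later is available; the function names are illustrative and not part of DeNovoSeq. It runs java -version, reads the version banner and compares the major version with the minimum of 17 stated above.

```python
import re
import subprocess

MIN_JAVA_MAJOR = 17  # minimum version stated in this manual


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Releases up to Java 8 report themselves as 1.x (e.g. "1.8.0_291").
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major():
    """Run `java -version`; return the major version, or None if unavailable."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=False
        )
    except OSError:
        return None  # java is not installed or not on the PATH
    # The version banner is normally written to stderr rather than stdout.
    return parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    major = installed_java_major()
    if major is None:
        print("Java was not found on the PATH.")
    elif major < MIN_JAVA_MAJOR:
        print(f"Java {major} found; version {MIN_JAVA_MAJOR} or above is required.")
    else:
        print(f"Java {major} found; the requirement is met.")
```

A result of None usually points to the PATH problem mentioned above rather than to a missing installation.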
To install the Windows version
Download the DeNovoSeq-win32.win32.x86_64.zip file and unzip it. Then browse to the executable file “DeNovoSeq.exe” and run it.
To install the Mac version
Download the DeNovoSeq-macosx.cocoa.x86_64.zip file and unzip it. Then browse to the executable binary file “DeNovoSeq.app” and run it.
To install the Linux version
Download the DeNovoSeq-linux.gtk.x86_64.zip file and unzip it. Then browse to the executable binary file “DeNovoSeq” and run it.
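For scripted setups, the choice of package can be derived from the operating system. The sketch below is only an illustration: the archive and executable names are the ones listed above, while the function name and the lookup by platform.system() are assumptions of this example.

```python
import platform

# Archive and executable names as listed in this manual, keyed by the
# value returned by platform.system().
PACKAGES = {
    "Windows": ("DeNovoSeq-win32.win32.x86_64.zip", "DeNovoSeq.exe"),
    "Darwin": ("DeNovoSeq-macosx.cocoa.x86_64.zip", "DeNovoSeq.app"),
    "Linux": ("DeNovoSeq-linux.gtk.x86_64.zip", "DeNovoSeq"),
}


def package_for(system=None):
    """Return (archive name, executable name) for the given or current OS."""
    system = system or platform.system()
    try:
        return PACKAGES[system]
    except KeyError:
        raise ValueError(f"No DeNovoSeq package listed for {system!r}") from None
```

Calling package_for() without arguments uses the operating system of the machine running the script.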
1.3.2 - SERVER SIDE DEPENDENCIES
DeNovoSeq is a client-side plus server-side solution: the application is coupled via an API with a bioinformatic infrastructure called GPRO Server Side, which contains all the dependencies DeNovoSeq needs to execute the workflows and pipelines. These dependencies are scripts, databases and the following third-party CLI software:
- For quality analysis and preprocessing:
  - Fastx Toolkit (Hannon Lab 2016)
  - Cutadapt (Martin 2011)
  - Prinseq (Schmieder and Edwards 2011)
  - FastQC (Andrews 2016)
  - Trimmomatic (Bolger et al. 2014)
- For assembly, gap filling and scaffolding:
  - Oases (Schulz et al. 2012)
  - SOAPdenovo-trans (Xie et al. 2014)
  - Velvet (Zerbino and Birney 2008)
  - Spades (Bankevich et al. 2012)
  - CANU (Koren et al. 2017)
  - SOAPdenovo2 (Luo et al. 2012)
  - Gap Closer (Luo et al. 2012)
  - BESST (Sahlin et al. 2014)
  - OPERA (Gao et al. 2011)
- For gene prediction and annotation:
  - AUGUSTUS (Stanke et al. 2008)
  - NCBI BLAST (Altschul et al. 1990)
  - HMMER3 (Eddy 2011)
The GPRO Server Side can be installed on the user’s PC or on remote servers as a cloud computing resource. However, its installation is a complex task because of the number of dependencies and requirements (besides the CLI software) needed to install and run this infrastructure. For this reason, we distribute the GPRO Server Side in a Docker container that the user can install in a couple of steps. Instructions for installing the GPRO Server Side Docker container are available here.
1.3.3 - LINKING DENOVOSEQ WITH THE SERVER SIDE
Once the GPRO Server Side Docker container has been installed, you need to link DeNovoSeq to it. To do this, go to [Preferences → Pipeline connection settings] in the top menu and enter the following in the configuration dialog (Fig. 2):
- Your email address: to receive notifications from the server.
- Host / IP address: type localhost here (see Figure 2).
- Port number: fill in this field only if you installed the Server Side manually and need to access it via SSH. In that case the default port number is 22.
- Username and password: the credentials provided to access the host server.
As Figure 2 also shows, you can check the option “Run GPRO server locally using Docker” to start the GPRO container automatically each time you run DeNovoSeq (note that with this option checked you do not need to type the port). You can test whether the application is connected to the Server Side by clicking on the tab “Test connection settings”. Alternatively, if you installed the Server Side manually (without Docker), enter the IP address of the remote server where the Server Side is hosted, add the port (22 by default) and keep the option “Run GPRO server locally using Docker” unchecked.
Figure 2: Server connection dialog.
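If the connection test fails for a manually installed Server Side, it can help to confirm independently that the SSH port of the host is reachable. The sketch below is not what “Test connection settings” does internally. It is a generic TCP check written with the Python standard library, and the function name is illustrative.

```python
import socket


def port_is_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False
```

For example, port_is_open("localhost") checks the default SSH port 22 on the local machine. A False result suggests a network, firewall or SSH service problem rather than a wrong username or password.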
1.3.4 - RAM ASSIGNATION TO YOUR PC
To modify the RAM assigned to DeNovoSeq, edit two parameters (‘Xms’ and ‘Xmx’) in the “DeNovoSeq.ini” configuration file. On Linux or Windows computers, this file is located inside the DeNovoSeq application folder. On macOS computers, the file can be found by right-clicking on the application and following [DeNovoSeq.app → Show package contents → Contents → MacOS → DeNovoSeq.ini].
Within the “DeNovoSeq.ini” file, the Xms and Xmx parameters look like this:
Xms1024m (minimum allocated memory)
The Xmx parameter is written in the same form and sets the maximum allocated memory.
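Editing the two values can also be scripted. The following is a minimal sketch under stated assumptions: the function name is hypothetical, each parameter is assumed to sit at the start of its own line, and an optional leading dash is accepted because JVM memory options are commonly written as -Xms and -Xmx. Make a backup copy of “DeNovoSeq.ini” before overwriting it.

```python
import re


def set_heap_sizes(ini_text, xms, xmx):
    """Return ini_text with the Xms and Xmx values replaced (e.g. "1024m", "4g")."""
    size = re.compile(r"\d+[kKmMgG]?")
    for name, value in (("Xms", xms), ("Xmx", xmx)):
        if not size.fullmatch(value):
            raise ValueError(f"Invalid memory size for {name}: {value!r}")
        # Keep an optional leading dash so "Xms1024m" and "-Xms1024m" both work.
        pattern = re.compile(rf"^([ \t]*-?{name})\S+", flags=re.MULTILINE)
        ini_text, count = pattern.subn(lambda m, v=value: m.group(1) + v, ini_text)
        if count == 0:
            raise ValueError(f"{name} not found in the ini text")
    return ini_text
```

A typical use is to read the file into a string, call set_heap_sizes(text, "2048m", "4096m"), and write the result back. The minimum should not exceed the maximum, and neither should exceed the RAM available on the PC.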
1.4 - GETTING FAMILIAR
1.4.1 - DENOVOSEQ LAYOUT
The layout of DeNovoSeq is structured in the following sections: the “Directory Browser”, the “FTP Browser”, the “Working Space”, the “Top Menu” and the “Step-by-step Interface Menu” (Fig. 3).
- DIRECTORY BROWSER: Provides access to any folder on the user’s PC so that its contents can be uploaded into the DeNovoSeq application.
- FTP BROWSER: Provides access to the files on the server, allowing files to be transferred between the server and the directory browser by dragging them from one side to the other.
- WORKSPACE: A central working space for viewing the different pipeline interfaces.
- TOP MENU: Main menu that allows for the selection of work mode, selection and execution of pipelines, and server connection preferences.
- STEP-BY-STEP INTERFACE MENU: The menu for managing the step-by-step protocols. For details of the Step-by-step menu, see the section “Step-by-step mode usage” of this manual.
Figure 3: Main layout of DeNovoSeq. Both the Directory and FTP Browser windows can be resized or hidden by clicking on the window icons at the top right corner of their respective windows. All files and folders contained in either of these browsers can be managed manually using the mouse. Please keep in mind that the window views shown in this manual will vary depending on the operating system used.
1.4.2 - FUNCTIONS OF THE TOP MENU
The Top Menu presents the following tabs, each of which has a drop-down list with the following functions:
- DIRECTORY:
[Directory → Select directory folder]: Selects a workspace from the directory browser.
[Directory → Show]: Shows the Directory Browser.
[Directory → Hide]: Hides the Directory Browser.
- DENOVO PROTOCOLS:
[Denovo protocols → Pipeline Mode]: Configuration view to set up your assembly project and launch the de novo assembly pipeline.
[Denovo protocols → Input Configuration file]: Configures the file used to declare the number and type of samples, as well as the type of experiment.
[Denovo protocols → Step-by-Step Mode → De Novo Protocols]: View of the available protocols for de novo analysis in manual mode.
- PIPELINE JOBS:
[Pipeline Jobs → Jobs Tracking System]: For job tracking.
[Pipeline Jobs → FTP Transfers]: Screen for tracking the jobs of the sFTP protocol.
- PREFERENCES:
[Preferences → Pipeline Connections Settings]: Gives access to the server login details, where you set up your user credentials for accessing the server.
- HELP:
[Help → About DeNovoSeq]: Technical details and copyright of DeNovoSeq, as well as links to the license and user manual.
1.4.3 - SOME BASICS ON THE DENOVOSEQ INTERFACE
A typical DeNovoSeq interface presents at least two blocks of fields. The first contains the fields for managing input and output files and folders, and the second is the set of forms for configuring options and parameters. The procedure is illustrated in the animation provided in Fig. 4.
Figure 4: Example of the interface for a CLI tool provided by the DeNovoSeq Step-by-Step mode. The figure shows the interface for Oases.
From the FTP browser, select the input file(s) with the mouse and drag them to their respective fields. The same can be done for the folder(s) where you would like the output to be deposited. Next, fill in all other mandatory fields in the input/output block. If an input field is invalid or missing, an error icon appears beside the field (to see the error message, hover the mouse over that icon). Next, configure the form for options and parameters in the second block of fields. If you set a parameter outside the allowed range, a warning icon appears. If you need more information about any input or parameter, click on the exclamation mark (!) in front of the input field name.
Filling in the interface block for options and parameters is not mandatory. If you run the program without setting any parameters or conditions, DeNovoSeq executes it with the default settings of that program. Once the job input fields have been filled in and the parameters configured, click the start button at the end of the interface form to run the job. If the job has been launched successfully, you will get a confirmation message. Otherwise, check all uploaded inputs, the outputs and any options again.