At Biotechvana we are pleased to offer our users de novo data analysis services aimed at assembling and annotating new genomes and transcriptomes, both prokaryotic and eukaryotic, for which no prior reference is available. These services also allow the study of microbial metagenomes and metatranscriptomes for the characterization of complex communities and non-model species.
Within the scope of de novo studies, we offer different types of analyses tailored to the needs and objectives of each project, including:
- Genomes and metagenomes
- Transcriptomes and metatranscriptomes
- Viruses and Mobile Genetic Elements
Each project is approached in a personalized manner, adapting the workflow to the experimental design.
We perform assembly and analysis of complex sequencing data to obtain reliable genomic and transcriptomic representations in the absence of a reference. This approach supports work at both the individual-organism and community levels, enabling the study of gene content and expression profiles as a basis for subsequent functional analyses.
- Quality analysis of sequencing reads (sff, fastq, sam, fasta, etc.).
- Read preprocessing, including demultiplexing and removal of low-quality sequences, primer/adapter remnants, and artifacts.
- Assembly of processed reads (into contigs) and scaffolding of contigs.
- Gap filling and re-scaffolding in genomic studies.
- Consensus reference reconstruction by merging two or more assemblies.
- Isoform reconstruction (for transcriptome-oriented studies).
- ORF prediction and extraction (prokaryotic) or exon–intron structure prediction (eukaryotic).
- Calculation of assembly metrics.
- Repeat masking where appropriate.
- Automatic annotation of coding and non-coding genes.
- Functional analysis and metabolic pathway characterization.
- Detection of sequence signals such as start/stop codons, promoters, and other regulatory elements.
- Data integration.
- Correction of homopolymers and artifactual frame shifts in coding sequences.
- Characterization of paralogous and orthologous genes.
- Phylome annotation.
- Data mining oriented to knowledge discovery.
- Comparative analysis.
- Curation and post-processing of sequences.
- Database implementation.
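As an illustration of the assembly-metrics step in the list above, the following is a minimal sketch of how contig count, total length, N50, and GC fraction might be computed from assembled sequences. The function name `assembly_metrics` and the returned field names are assumptions made for this example, not a description of the tooling actually used in the service.

```python
def assembly_metrics(contigs):
    """Compute basic summary metrics for a list of contig sequences (strings).

    Hypothetical helper for illustration only.
    """
    # Sort contig lengths from longest to shortest.
    lengths = sorted((len(seq) for seq in contigs), reverse=True)
    total = sum(lengths)

    # N50: length of the contig at which the running sum of lengths
    # (longest first) reaches at least half of the total assembly length.
    n50 = 0
    running = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break

    # GC fraction over all bases in the assembly.
    gc = sum(seq.upper().count("G") + seq.upper().count("C") for seq in contigs)

    return {
        "num_contigs": len(lengths),
        "total_length": total,
        "longest": lengths[0] if lengths else 0,
        "n50": n50,
        "gc_fraction": gc / total if total else 0.0,
    }
```

In practice the contigs would be read from a FASTA file; plain strings are used here to keep the sketch self-contained.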
We identify, assemble, and structurally characterize viral genomes and mobile genetic elements present in sequencing data. These analyses make it possible to describe their genomic organization, evaluate their diversity, and study their role in the genetic and evolutionary dynamics of the systems analyzed.
- Quality analysis of sequencing reads (sff, fastq, sam, fasta, etc.).
- Read preprocessing, including demultiplexing and removal of low-quality sequences, primer/adapter remnants, and artifacts.
- Curation of reads when appropriate.
- De novo assembly of processed reads.
- Genome circularization of the mobile element, if applicable.
- Calculation of assembly metrics.
- Characterization of long terminal repeats (LTRs) and terminal inverted repeats (TIRs), where applicable.
- Annotation of ORFs, genes, and regulatory elements of the viral or mobile element genome.
- Phylogenetic analysis.
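The circularization step listed above can be sketched as follows. A linear contig assembled from a circular genome often repeats the same bases at both ends, so one simple check is to look for an exact overlap between the start and the end of the sequence and trim one copy. This is only a minimal sketch: the function names and the `min_overlap` threshold are assumptions for this example, and real data usually requires tolerance for mismatches, which is not handled here.

```python
def find_terminal_overlap(seq, min_overlap=10):
    """Return the length of the longest exact prefix/suffix overlap, or 0.

    Only overlaps of at least `min_overlap` bases and at most half the
    sequence length are considered.
    """
    seq = seq.upper()
    for k in range(len(seq) // 2, min_overlap - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


def circularize(seq, min_overlap=10):
    """Trim the duplicated end of a contig if a terminal overlap is found.

    Returns (sequence, is_circular). The sequence is returned unchanged
    when no qualifying overlap exists.
    """
    k = find_terminal_overlap(seq, min_overlap)
    if k:
        return seq[:-k], True
    return seq, False
```

For example, a contig built as a 36-base genome followed by a copy of its first 12 bases would be trimmed back to the 36-base genome and flagged as circular.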
We carry out functional and taxonomic annotation of viromes and mobilomes to interpret the genetic content of the viruses and mobile elements present in a sample. This analysis enables the identification of functions, families, and genes of interest, as well as the exploration of processes related to genetic transfer, adaptation, and evolution.
- Masking of repeats and transposons using comparative analysis with various reference databases.
- De novo identification of repeats by self-comparison of the characterized genome.
- Search for tandem repeats in the genome.
- Characterization of tight junctions.
- Characterization of LTRs and TIRs.
- ORF annotation for genes and other features.
- Functional analysis using biological vocabularies such as Gene Ontology (GO).
- Characterization of metabolic pathways.
- Annotation and reconstruction of the viral phylome.
- Complete reconstruction of a mobile element or viral genome.
- Analysis of orthologous mobile elements or viruses.
- Analysis of differential insertion in the host genome.
- Data integration.
- Other statistical analyses.
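As a rough illustration of the tandem-repeat search mentioned in the list above, the sketch below uses a regular expression to find short motifs repeated at least three times in a row. The function name, the motif size range (2 to 6 bases), and the minimum copy number are assumptions chosen for this example. It only finds exact repeats and is not a substitute for dedicated repeat-finding software.

```python
import re

# A motif of 2-6 bases (group 1) followed by at least two more exact copies.
TANDEM_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{2,}")


def find_tandem_repeats(seq):
    """Return a list of (start, motif, copies) for exact tandem repeats.

    `start` is the 0-based position of the first repeated base.
    """
    hits = []
    for match in TANDEM_PATTERN.finditer(seq.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits
```

For the sequence `TTGACACACACGGT`, this reports the motif `AC` repeated four times starting at position 3.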