This tab lets you perform downstream curation of transcriptome sequence data in two different ways.
Filter best isoform
By selecting the option “Filter best isoform”, you can load the annotation CSV file of the whole transcriptome under analysis and then set one or more filters to select the most representative sequence among the distinct cDNAs annotated for each transcribed gene. The selection relies on an algorithm that scores each sequence with a normalized combination of the most relevant BLAST statistics, such as the high-scoring segment pairs (HSPs) of both the query and the hit, the similarity and the inverse of the E-value, together with the sequencing depth. You can filter on any one of these criteria or on all of them.
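As an illustration only, the scoring idea described above can be sketched as follows. This is not the tool's actual implementation: the field names (`gene`, `id`, `query_hsp`, `hit_hsp`, `similarity`, `evalue`, `depth`), the min-max normalization, the equal weighting and the floor applied to the E-value are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical metric names; the real annotation CSV columns may differ.
METRICS = ("query_hsp", "hit_hsp", "similarity", "inv_evalue", "depth")


def min_max(values):
    """Scale values to [0, 1]; if all values are equal, return 1.0 for each."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def best_isoforms(rows, metrics=METRICS):
    """Return {gene: id of the isoform with the highest combined score}."""
    by_gene = defaultdict(list)
    for row in rows:
        row = dict(row)
        # Assumption: floor the E-value so that an E-value of 0 can be inverted.
        row["inv_evalue"] = 1.0 / max(row["evalue"], 1e-180)
        by_gene[row["gene"]].append(row)

    best = {}
    for gene, isoforms in by_gene.items():
        scores = [0.0] * len(isoforms)
        for metric in metrics:
            normalized = min_max([iso[metric] for iso in isoforms])
            for i, value in enumerate(normalized):
                scores[i] += value / len(metrics)  # equal weights (assumed)
        winner = max(range(len(isoforms)), key=scores.__getitem__)
        best[gene] = isoforms[winner]["id"]
    return best


rows = [
    {"gene": "A", "id": "iso1", "query_hsp": 300, "hit_hsp": 300,
     "similarity": 90.0, "evalue": 1e-50, "depth": 20},
    {"gene": "A", "id": "iso2", "query_hsp": 100, "hit_hsp": 120,
     "similarity": 70.0, "evalue": 1e-10, "depth": 50},
    {"gene": "B", "id": "iso3", "query_hsp": 200, "hit_hsp": 210,
     "similarity": 85.0, "evalue": 1e-30, "depth": 10},
]
print(best_isoforms(rows))                     # all criteria combined
print(best_isoforms(rows, metrics=("depth",)))  # a single criterion
```

Passing a subset of `metrics` mimics filtering on one criterion instead of all of them.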
You can also filter the clusters by positional redundancy. This detects all non-overlapping sets of isotigs/contigs that belong to a partially characterized gene and then selects the best isoform within each of these sets (marked with a red circle in the figure).
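One possible way to picture the positional-redundancy filter is as interval grouping on the subject coordinates. The sketch below assumes each contig carries hypothetical `hit_start`, `hit_end` and `score` fields; it groups contigs whose hit ranges overlap and keeps the top-scoring contig of each group. The tool's own procedure may differ.

```python
def positional_groups(contigs):
    """Group contigs whose hit ranges overlap; groups do not overlap each other."""
    ordered = sorted(contigs, key=lambda c: (c["hit_start"], c["hit_end"]))
    groups = []
    current_end = None
    for contig in ordered:
        if current_end is None or contig["hit_start"] > current_end:
            # No overlap with the running group: start a new one.
            groups.append([contig])
            current_end = contig["hit_end"]
        else:
            # Overlaps the running group: extend it.
            groups[-1].append(contig)
            current_end = max(current_end, contig["hit_end"])
    return groups


def best_per_group(contigs):
    """Return the id of the highest-scoring contig in each positional group."""
    return [max(group, key=lambda c: c["score"])["id"]
            for group in positional_groups(contigs)]


contigs = [
    {"id": "c1", "hit_start": 1, "hit_end": 100, "score": 0.5},
    {"id": "c2", "hit_start": 50, "hit_end": 150, "score": 0.9},
    {"id": "c3", "hit_start": 200, "hit_end": 300, "score": 0.4},
    {"id": "c4", "hit_start": 250, "hit_end": 320, "score": 0.3},
]
print(best_per_group(contigs))
```

Here `c1` and `c2` cover the same region of the subject, as do `c3` and `c4`, so one representative is kept for each region.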
Using this utility, you can upload both the FASTA file with your cDNA sequences and its associated annotation CSV file and then trim the FASTA sequences according to one of two options:
- Option “1” combines two algorithms based on the HSPs of both the query and the hit to detect and classify sequences as full-length cDNAs or partial cDNAs, depending on whether the query shares a user-defined core percentage with the subject hit. For instance, a sequence can be defined as full-length if the query shares a core of more than 80% with the subject. The tool assumes that you use appropriate subject models as reference.
- Option “2” is simpler than option “1”: it considers only the ratio between the HSPs of the query and the subject to classify sequences as full-length or as partial domains, according to the shared-core criterion defined by the user.
Finally, the tool lets you label the sequences as “full-length cDNAs”, “Partial Sequence” or “Related Domain” depending on the core shared, and trim each sequence to remove the regions upstream of the start codon and downstream of the stop codon, or the regions outside the defined core.
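The labeling and trimming steps can be pictured with the minimal sketch below. It rests on several assumptions that are not taken from the tool: the shared core is approximated as the fraction of the subject covered by the HSP, the second cutoff (`partial_cutoff`) separating “Partial Sequence” from “Related Domain” is hypothetical, coordinates are 1-based and inclusive, and the ORF is taken from the first ATG to the first in-frame stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify(hit_start, hit_end, hit_len, full_cutoff=0.8, partial_cutoff=0.3):
    """Label a sequence from the fraction of the subject covered by the HSP."""
    core = (hit_end - hit_start + 1) / hit_len
    if core > full_cutoff:
        return "full-length cDNA"
    if core > partial_cutoff:  # hypothetical second threshold
        return "Partial Sequence"
    return "Related Domain"


def trim_to_core(seq, q_start, q_end):
    """Keep only the query region aligned to the subject (1-based, inclusive)."""
    return seq[q_start - 1:q_end]


def trim_to_orf(seq):
    """Keep the region from the first ATG to the first in-frame stop codon."""
    seq = seq.upper()
    start = seq.find("ATG")
    if start == -1:
        return ""  # no start codon found
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return seq[start:i + 3]
    return seq[start:]  # no in-frame stop codon: keep through the end


print(classify(1, 95, 100))
print(trim_to_core("AAACCCGGGTTT", 4, 9))
print(trim_to_orf("ccATGAAATAGgg"))
```

The 80% default mirrors the example cutoff given for option “1”; in practice the cutoff is whatever core percentage the user defines.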