The SeqSphere+ consensus caller is used to calculate a consensus

  • for read alignments resulting of SKESA assemblies that were remapped with BWA
  • for read alignments resulting of SPAdes assemblies that were remapped with BWA
  • for read alignments created by reference mapping with BWA
  • for correcting the consensus of Velvet assemblies

Consensus Base

Search for each position for forward and reverse reads for a base:

  1. if majority of reads at this position contain a gap, a gap is called for the consensus.
  2. else, if the minimum coverage threshold is set, and the threshold is not exceeded for the position, an N is called.
  3. else the sum of read qualities is calculated for each base (A, C, G, T) at the position (or base count, if no qualities are available)
  4. the base with the highest sum of read qualities is taken.

If different bases are found for forward and reverse reads, unite forward and reverse and repeat algorithm above on this single list.

For the result base an ambiguity check is done:

  • If the sum of qualities of reads that support the resulting base at the position is less than 60% of the total sum of read qualities at the position, an ambiguity 'N' is returned. If no quality values are given, the base count sums are used.

Finally, a minimum coverage check is done, if the result base is not a gap:

  • If a minimum coverage threshold is defined and coverage at the position is below this threshold, an N is called in the consensus. The default value for this threshold is depending on the intended use of the consensus caller. For remapping alignments of SPAdes assemblies and for reference mappings with that have an estimated average genome coverage below 50, the threshold is set to 5. Else no threshold is set, so minimum coverage is required.


The thresholds for the required read support (60%) and minimum coverage can be configured in the settings of SKESA, SPAdes, BWA and Velvet.

Consensus Base Quality

The consensus base qualities are caluclated from the two best base qualities for each strand (based on a MIRA-like process):

For each strand, the best quality is taken.
Then 10% of the next best base quality is added.
The total of both strand qualities gives the final quality, with cap 90.
Consensus gaps have always quality 0.

Example:
forward:
    A 30   -> 30     \
    A 20   ->  2     /  -> 32  \
    A 20                        \
    A 20                         \
                                   ->  60     
reverse:                         /
    A 26   -> 26     \          /
    A 20   ->  2     /  -> 28  /
    A 15                         

Uncovered Reference Genome Positions in BAM Files

If the reference genome has regions that are uncovered in the genome under study, those gaps are deleted in the resulting consensus sequence and juxtapositional covered regions are concatenated. This rule applies also for FASTA files that are exported.

Filtering Reads in BAM Files

If one of the following flags are found, the reads are skipped when calling the consensus, and are not imported into samples:

  • Read marked as unmapped
  • Read mapping quality is below 10 (configurable)
  • Read has not a single best hit (i.e., X0 flag is not equal to 1)
  • Read fails vendor quality check
  • Read marked as duplicate (however, marking duplicates is not performed within SeqSphere+)
  • Read alignment start is 0, length is 0 or name is *