2. Representations of pairwise alignments#

Before delving into the intricacies of the alignment algorithms, it’s crucial to understand the matrix representation of alignments. This approach is not just a computational convenience but a fundamental framework that encapsulates the complexity of aligning biological sequences. Here’s an overview of why matrices are used in sequence alignment and the significance of the traceback process.

2.1. Understanding Matrix Representation in Sequence Alignments#

2.1.1. The Essence of Matrices in Alignments#

Aligning sequences involves comparing each character of one sequence with every character of another to identify regions of similarity or difference. Given the combinatorial explosion of possible alignments with even moderate-length sequences, a brute-force approach to comparison is computationally infeasible. This is where matrices come into play, offering a structured and efficient way to represent all possible alignments between two sequence.

2.1.2. How Matrices Facilitate Alignment#

A matrix for sequence alignment is essentially a grid. Each cell in this grid represents a comparison point between a character from one sequence and another from the second sequence. Each cell of the matrix correspond to the characters in the two sequences being aligned. And a path from the first left cell to the bottom right cell of the matrix represent a global alignment.

One step of an alignment

Fig. 2.1 A step in the path representing an alignment.#

If your alignment is in position \(a_i,b_j\) of to sequences \(\mathbf{a}=(a_1,\ldots,a_N)\) and \(\mathbf{b}=(b_1,\ldots,b_N)\) then the next step in the alignment is either a match/mismatch between \(a_{i+1},b_{j+1}\) (diagonal movement), a delete \(a_{i+1},-\), (a vertical movement), or an insert \(-,b_{j+1}\), (a horizontal movement).

2.1.3. Alignments as paths between the cells#

The path we follow tells us how the sequences should be aligned. A move diagonally signifies a match or mismatch, indicating that the characters from both sequences should be aligned with each other. Moving up or to the left indicates a gap in one of the sequences.

2.1.4. Examles of alignments#

An example of some matrix representations of an alignment is found in Fig. 2.2.

Matrix representation of three alignments

Fig. 2.2 Matrix representation of three full length alignments of ACGTACT and ACTACGT, as well as one partial alignment of the same sequences.#

The path of Alignment 1 is given in red, Alignment 2 in green, and Alignment 3 in blue. We also included a partial (local) alignment, Alignment 4 in orange.

Alignment 1 (red)

Alignment 2 (green)

Alignment 3 (blue)

Alignment 4 (orange)

---ACGTACT

ACGTAC-T

ACGTACT----

ACG

ACTACGT---

AC-TACGT

----ACTACGT

ACG

In summary, the matrix representation of alignments and the traceback process are not merely computational techniques; they are integral to understanding the relationships between biological sequences. By breaking down the alignment process into a series of quantifiable steps, matrices allow us to systematically explore the vast space of alignment possibilities, leading us to the optimal alignment that best reflects the evolutionary or functional relationship between the sequences involved.

2.1.5. Excercises#

Exercise 2.1 (Matrix Representations of Alignments)

  • Which alignment does the following path represent?

Exercise on Matrix representation of alignments
Reveal Answer

GATTA-GA
G-CTACTA