Bioinformatics lecture notes
  • Introduction
  • Introduction to biology (for computer scientists)
  • Ethical considerations
  • Finding patterns in DNA
    • Introduction to pattern discovery
    • Looking for frequent k-mers
    • Leveraging biology
    • Finding genes
  • Exact string matching
    • Introduction to exact string matching
    • Semi-numerical matching
    • The Z algorithm
    • The KMP algorithm
  • Multiple sequence alignment
    • Introduction to multiple sequence alignment
    • Motif finding
  • String indexing
    • Introduction to string indexing
    • Introduction to suffix trees
    • Suffix trees: beyond the basics
    • Suffix arrays
    • The Burrows-Wheeler transform and the FM-index
  • Inexact alignment
    • Introduction to inexact alignment
    • Inexact alignment calculation with dynamic programming
    • Example: filling the dynamic programming table
    • Modeling alignment as a graph
    • Backtracking through the dynamic programming table
    • From edit distance to alignment scores
    • Local alignment
    • Exercises
  • Advanced inexact alignment
    • Gap penalties
    • Sequence alignment in linear space
    • Sequence alignment with bounded error
  • Proteomics data analysis
    • Introduction to proteomic data analysis
    • From peptides to theoretical spectra
    • Cyclopeptide sequencing
    • Dealing with errors in experimental spectra
  • Data clustering
    • Introduction to data clustering
    • K-means clustering
    • Hierarchical clustering
  • Phylogenetic analysis
    • Introduction to phylogenetic inference
    • Distance-based phylogenetic analysis
    • Trait-based phylogenetic inference
  • Sequence assembly
    • Introduction to sequence assembly
    • Graph formulations of sequence assembly
    • Finding Eulerian tours
  • Gene finding and annotation
    • Introduction to sequence annotation
    • Gene finding
    • Introduction to Hidden Markov Models
    • Taxonomic and functional annotation
Powered by GitBook
On this page
  1. Multiple sequence alignment

Introduction to multiple sequence alignment

PreviousThe KMP algorithmNextMotif finding

Last updated 3 months ago

In this section we will focus on the computational challenges that arise from trying to relate to each other more than two biological strings. We ask several questions. How do we determine the relationship between multiple protein sequences that have been shown to perform the same biological function? Such a question can be addressed as an extension of global sequence alignment (see Inexact alignment) that can handle multiple sequences, i.e., a global multiple sequence alignment. How do we determine the signals hidden in the upstream region of genes that determine the timing and order in which genes are turned on or off? This is a variant of local multiple sequence alignment (since we are only interested in a small segment of the upstream region of genes), and can be solved through . How do we find the relationship between multiple similar genome sequences? This multiple genome alignment problem has similarities to global multiple sequence alignment, but is substantially more complex given the much longer sequences involved and the potential for large amount of differences between them.

motif finding