# Counting graph traversals

The formulation of sequence assembly in terms of Eulerian traversals of a graph implies that the assembly problem can be solved in polynomial time.&#x20;

This is only partly true – a solution to the assembly problem can be found in polynomial time, however there are an exponential number of possible solutions, i.e. the problem is underconstrained (Kingsford, Schatz et al. 2010). Adding additional constraints, such as adding information about the multiplicity of repeats, generally results in an NP-hard variant of the problem.

## Counting Eulerian tours

Just how many possible Eulerian tours are in a graph? The answer can be fairly easily computed for directed graphs (though it's intractable in undirected graphs). The reasoning is based on a simple algorithm for finding an Eulerian tour:

1. Pick an arbitrary node in the graph n
2. Construct a directed spanning tree having n as a root, such that all the edges are pointing upwards (the tree is a collection of directed paths connecting the leaves with the root)
3. Start traversing the graph in a greedy fashion, starting from n – after entering a node, leave the node using an edge that does not belong to the spanning tree, if possible. In other words, whenever a node is encountered on the traversal of the graph, the unique spanning tree edge leaving this node (linking the node to its parent) is traversed last.

This algorithm completes only after finding an Eulerian tour – the requirement that the tree edge is the last visited ensures that there is always an "out", thus the algorithm cannot get stuck before visiting all the edges in the graph.

Thus, the number of possible tours can be computed as a function of two terms: (i) the number of spanning trees in the graph; and (ii) the different possible traversals of the graph once the spanning tree is fixed. The latter is simply the product of all different permutations of the edges exiting each node in the graph (minus the spanning tree edge), i.e.:

$$N\_{\text{Eulerian tours}} = N\_{\text{spanning trees}} \prod\_{v \in V(G)} (\text{outdeg}(v) - 1)!$$

## Counting spanning trees

The number of spanning trees in a graph can be computed in polynomial time by examining the adjacency matrix A of the graph (for edge $$j \rarr k$$, with $$j < k, A\_{jk} = 1$$ and $$A\_{kj} = -1$$. More specifically, the computation focuses on the Laplacian matrix of the graph: $$L = AA^T$$. Kirchhoff's theorem states that the number of spanning trees of the graph is the product of the non-zero eigenvalues of the Laplacian matrix divided by the number of vertices:

$$N\_{\text{spanning trees}} = \frac{1}{n}\lambda\_1\lambda\_2...$$

Equivalently, this number is the same as any cofactor of the Laplacian matrix. Note that the Laplacian matrix is simply a matrix that contains the degree of the graph nodes along the main diagonal ($$L\_{ii} = \text{deg}(i)$$)) and -1 for every edge $$j \rarr k$$ in the graph ($$L\_{jk} = L\_{kj} = -1$$) and 0 for all other entries.

The proof is not immediately obvious, however a very basic intuition can be obtained by noting that once we choose a directed edge $$j \rarr k$$ to become part of the spanning tree, we are effectively eliminating a row ($$j$$) and a column ($$k$$) from the adjacency matrix of the graph, thus selecting the other edges can proceed by analyzing the matrix minor defined by row $$j$$ and $$k$$. The similarity to the computation of a determinant should be apparent.

## Conclusion

If you put together the equations described above, you will notice that the number of possible traversals of an assembly graph is very large. Only one of these is the correct sequence of the genome being assembled.  Further constraining the problem to ensure that the assembler finds the correct genome sequence yields problems that are NP-hard. Thus, the apparent fundamental advantage De Bruijn graph assemblers have over overlap-layout-consensus algorithms is misleading.  Genome assembly is difficult irrespective of how you formulate it. &#x20;

## References

* Kingsford,C., Schatz,M.C., and Pop,M. *Assembly complexity of prokaryotic genomes using short reads*. BMC Bioinformatics, 2010, 11, 21.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://mpop.gitbook.io/bioinformatics-lecture-notes/sequence-assembly/counting-graph-traversals.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
