Travelogue: ASM NGS 2022, Baltimore

On October 16th around 6 AM the plane taking me to Baltimore, Maryland took off from the Hobby airport in Houston. I generally do not mind very early morning flights, and since I still retain the magical (and much envied) ability to sleep on the planes the commute was fine. I arrive in BWI around 10 AM and after picking up my luggage headed directly for the Sheraton Inner Harbor hotel. The hotel did not have an early check-in option available, so I dropped off my bags and went for a walk. I found a neat coffee shop and spent a bit of time catching up on some reading.

Inner harbor in Baltimore, Maryland, around 10:45 AM. Not the best shot, but I managed to catch a bird in this one, so I am posting it.
Inner harbor in Baltimore, Maryland, still around 10:45 AM. Still not the best shot, but also I didn’t take any particularly good ones this time. I guess being awake since 3 AM ain’t helpful for my artistic eye.
Washington monument at Mt. Vernon in Baltimore, Maryland, circa noon. Quality of the shots improves as the day goes on.

After the coffee break, I met up with Evan Mata and we caught up on school life, travel stories, and I finally got my oyster fix. Two key takeaways from our lunch were as follows: (1) we both wish we took more math courses in college (and in fact this appears to be a common trend for many of my college friends), and (2) Maryland oysters are still delicious, albeit not as cheap as I remembered.

After the lunch I made it back to the hotel where I finally could check in and catch my breath before the opening keynotes for the ASM NGS 2022. Besides the keynotes, and a brief chat over dinner the first day was not jam packed with activities, so I was able to crash early and catch up on some of the sleep that I missed due to the early flight.

Next day I attended a series of the talks in the morning, after which I headed for the University of Maryland, College Park. UMD campus is quite expansive, and in some strange way it reminded me of the University of Tennessee in Knoxville. The computer science building at UMD is quite impressive both in size and design, and many offices tend to have nice floor to ceiling windows, which is an awesome touch. I did not have too much time to explore the campus, as the visit was relatively tightly packed with meetings, and as soon as those were wrapped up I headed back to Baltimore.

E. A. Fernandez IDEA Factory on the University of Maryland College Park campus. The only shot I took at the UMD campus (sadly, since the campus is quite pretty IMHO).

Although, I unfortunately missed most of the afternoon talks on the first day, I still managed to catch the poster session. In general, I am a big fan of poster sessions. The main benefit of a poster (as opposed to a talk) is the ability to ask a multitude of questions about the nitty-gritty details of the work without feeling guilty of encroaching on other people’s time. I found a good number of quite interesting posters relating to SARS-CoV-2, bacteria and horizontal gene transfer, and even a poster on a speed up for neighbor-joining algorithm. After the poster session I briefly caught up with Mike Nute over dinner which importantly included a crab cake (another checklist food item for me on this trip).

Grabbing dinner with Mike Nute at Phillips Seafood in Baltimore, Maryland. Quite solid crab cake and overall good spread of seafood. Very enjoyable dinner option not too far from the conference hotel location.

The crab cake ended up being quite good, although I have a feeling that with a bit of effort I can probably make a comparable if not a better one at home. However, crab meat is not cheap to come by, so I am not likely to test that hypothesis any time in the near future.

Tuesday was a full conference day for me. I attended all of the talks that day, and tried to keep a list of close notes for most of them. Unfortunately the tables at the conference did not have any outlets or power strips connecting to them or running along the floor. Thus, my note taking was cut short towards the second half of the afternoon, as my laptop ran out of its battery.

The last talk session of they day was a wastewater focused mini-symposium. The talks in this session were quite interesting, and indeed reflected a lot of my experiences and concerns when dealign with wastewater data. Namely, issues of the data quality, metadata annotation, and general complexity of the wastewater samples, which in certain cases ends up being overlooked. I was very happy that I was able to give a talk in this session as well. I was presenting the joint work between Treangen and Stadler labs at Rice University in conjunction with the Houston Health Department. My talk focused on QuaID: a software package we developed for sensitive detection of recently emerged SARS-CoV-2 variants in wastewater sequencing data (preprinted at medRxiv).

My first conference talk looked something like this. ASM NGS 2022, Day 2, wastewater session.
Some scallops (which were yummy) at the Watershed in Baltimore, Maryland. Post talk dinner feels good, although sitting on a rooftop terrace might not have been the best decision given it was in the low 50 F range?

Overall the talk went very well, and I enjoyed the experience. Of course, as any public performance giving the talk came with a fair deal of stage fright and nervousness. However, extensive practice runs with my labmates, adviser, and collaborators helped a lot in mitigating anxiety, and polishing the presentation content. As per usual, I do think that several moments could have been executed better on my part, but I’ll hope to use that as the knowledge for preparing my next presentations.

The evening continued on into another poster session with a fair number of exciting wastewater and clinical SARS-CoV-2 work, as well as some benchmarking studies of interest. I felt that the second poster session was a tad bit more lively than the first one, but it also could have been an artifact of my post-talk state. After the poster session, I joined a large-ish group of attendees for dinner and chats. Scallops were not on my Baltimore food checklist, but among the seafood options offered at the Watershed they stood out as an interesting choice. Overall I enjoyed them, although next time I will probably go for the crab cakes.

The last day of the conference was relatively short with only a morning session of the talks. There was a plenty of interesting talks this day, and I was quite pleasantly surprised to hear several talks involving deep learning in a principled and well motivated way. The only drawback of this session was the amount of lightning talks given back to back. It was at times hard to properly note down all the details, and due to the format and the fact that this was the last day it was harder to follow up with the presenters to discuss some of the finer details of their work. However, I still managed to take a good amount of notes, and expand my “papers to read” list by a dozen or so manuscripts.

After the conference, I checked out of the hotel and headed to our last stop before the airport: Johns Hopkins University. First we stopped by the medical campus and grabbed a lunch, which consisted of an absolutely wonderful lamb stew with rice. Then we moved to the Homewood campus pictures of which are attached below.

Afghani food at Kabobi in Baltimore, Maryland near JHU Medical Campus. Amazing lamb stew with veggies, a very hearty and comfortable meal.

Being on JHU Homewood campus reminded me of one of the greatest pleasures in the fall season: observing the colorful trees in a light cool breeze. However, similarly to the UMD visit, I did not have much time to wander around and observe the trees, as we mainly focused on the scheduled meetings.

At the end of the trip I was absolutely exhausted, but overall happy with the experience. I managed to tick off several of the key food checklist items on this visit. Of course more importantly, I really enjoyed attending the ASM NGS conference, and finally being able to experience an in-person conference. Giving a talk was a huge added benefit, and all the meetings, lunches, and dinners with multiple new and old acquaintances and collaborators (some of whom I finally got to see in person) made this a very memorable trip.

Getting back to Houston late at night I was reminded that this is a spooky season 🎃!

Scheduled Maintenance

Hi, if you expected a weekly post here I am extremely glad that you are someone who checks out my blog regularly. However, this week I am skipping my regular post. No, I am not traveling nor am I absolutely crushed by 1001 deadlines. In fact, I had a solid post mapped out and even partially written, but I did not have the energy to fully wrap it up in time.

What’s going to happen next?

First, I am going to be traveling mid of October onwards (within the USA), so catch me on tour (not really). Hence, I am likely to have more travel and conference content rather than the “vignettes” style posts. Nevertheless stay tuned as I will try to provide some form of content be it travel updates or micro-posts at least every Monday.

Second, I am seriously debating taking a tropical vacation. I wish it meant going to an actual tropical resort, but rather I mean a cursory exploration of: (a) tropical geometry (via Maclagan and Sturmfels, 2009); (b) algebraic statistics (via Sullivant, 2018) ; and (c) how all of this relates to bioinformatics (via papers + some of the Lior Pachter’s blog posts). I am not perfectly sure this is the journey I am interested in for the next few months, but it is one of the possibilities I am exploring.

Third, I believe that my most recent posts were avoidant of code snippets which is in part justified by the lack of time to write out meaningful code primers. However, this is not the format I am fully enjoying. I believe that code snippets bring the life to abstract concepts and provide an extra layer of interactive experience that can be beneficial in learning process. Hence, I will be making more effort to bring some code to the posts going forward.

Have a good spooky month! 🎃 👻

Phylogenetic Vignettes: Tree Rearrangements

As I mentioned in the previous post, I am taking some time off to figure out which direction to pursue next in the blog post series. However, that does not mean that the whole series itself is going into hiatus. In the meantime, I am going to cover a few other topics in phylogenetics. Additionally, I will cross post some of the one-off posts on Dr. Treangen’s lab website. Now, with logistics out of the way let’s get into tree rearrangements.

Introduction

Whenever we are faced with a problem in combinatorial optimization it is often convenient (or even necessary as the majority of interesting problems turn out to be NP-hard) to come up with some heuristic methods. In particular, we often want to explore a “neighborhood” of an object and locate optimal candidates within this neighborhood after which the search can continue from the new local optima. In order to talk about neighborhoods we need to define a metric on the space of objects we are working with (we talked about metric spaces on phylogenetic trees in several previous posts, but in this case we are focusing solely on the tree topology and associated finite metric spaces). In the context of a search problem it is useful to think about the neighborhood of an object as a set of objects that can be obtained from the initial object via some fixed operation. In phylogenetics such operations on trees are typically called tree rearrangements (Felsenstein, 2004). In this post we will take a look at three classical tree rearrangement operations, but keep in mind that there are other ways to mutate trees which lead to different geometries on the tree space.

In the next three sections we will introduce in detail each of the rearrangement operations in the order of increasing neighborhood sizes considered and then briefly discuss the relationships between these three classes, as well as some implications for computational complexity of related problems.

Nearest neighbor interchange

Nearest neighbor interchange (NNI) operation on a tree consists of picking an internal edge (i.e. an edge that is not incident to a leaf) deleting it along with the four edges incident to the vertices of the original edge, after obtaining four components as the result of the deletion there are three distinct ways of reconnecting them back out of which one yields the original tree. It is easy to see that for an unrooted bifurcating tree on n taxa (i.e. a tree with n leaves) there is a total of n-3 internal edges, and hence a total of 2(n-3) NNI-neighbors. We also note that the operation in this classical sense is defined for bifurcating trees (hence the guarantee that the NNI yields for connected components after the edge removal step). The figure below provides an example of NNI and the resulting tree neighbors.

The original tree is shown in the top left corner and the marked edge is the internal edge along which we perform the NNI. The four connected components labeled A, B, C, and D are shown in the top right corner. The resulting two new trees are shown at the bottom. For each of the three trees we also include the split corresponding to the marked edge.

For a small value of n we can reasonably visualize the adjacency structure on the tree space induced by the NNI operation. Namely we can identify non-isomorphic (read: distinct from the point of view of the properties we care about) labeled tree topologies as vertices and let two vertices be connected by an edge iff we can obtain one topology from another via a single NNI.

An example of adjacency graph on the space of 5 taxa unrooted bifurcating trees under the NNI operation. Original figure and caption are taken from Felsenstein, 2004, p. 40.

In general a phylogenetic tree encountered (or I guess inferred) in the wild might not be bifurcating, so it is quite natural to ask how does one generalize the notion of NNI to an arbitrary labeled unrooted tree. At the core of this operation is the process of picking a distinguished edge and then swapping components resulting from an edge deletion operation. Thus, in the case of a multifurcating tree we simply can do the same operation but with a larger set of neighbors being generated. An example is provided in the figure below.

A few possible neighbors of the original multifurcating tree (top left) under the NNI on trees with varying internal node degrees. Note that in the case where we allow multifurcations we ought to be careful in the way we define the NNI operation as some interchanges can be problematic (consider what happens if we reconnect 5 components into a star graph).

There is some amount of work on the NNI operations and NNI induced metric on phylogenetic trees (Hon and Lam, 1999), although the area tends to feel relatively sparse in terms of the research done.

Subtree prune and regraft

Subtree prune and regraft (SPR) operation on a tree consists of snipping off a branch (which can be an internal or external edge) and then inserting the snipped off subtree onto one of the edges of the remaining tree. This operation is quite well illustrated by its name, especially if you ever had to deal with grafts on actual trees (my grandfather loved to experiment with fruit tree grafting, which to his credit managed to give us a wonderful apricot half-tree on a wild apricot that used to grow in the garden). The figure below is an example of a SPR operation performed on a tree.

A subtree is snipped off from the original tree along the edge marked in red (top left) and then re-attached to the remaining tree on an edge marked in orange. It can be helpful to think of this operation as temporarily turning one of the subtrees into a rooted tree and then reinserting its root into the remaining tree.

It is easy to see that SPR operation results in a larger neighborhood set than the NNI operation. It also useful to note that any tree T’ that can be obtained from a tree T via a single NNI can also be obtained via a single SPR. In other words, if we define the set NNI(T) of trees that can be obtained from T via a single NNI, and the set SPR(T) of trees that can be obtained from T via a single SPR, then the following inclusion holds: NNI(T)SPR(T) (Maddison, 1991). In the spirit of ancient geometers instead of a formalized proof, we will provide the following figure and leave it up to the reader to convince themselves that the inclusion above is indeed true.

Similarly to the NNI case we can compute the exact number of neighbors under the SPR operation for an unrooted bifurcating binary tree on n taxa. The size of this neighborhood similarly to the NNI case is independent of the topology of the tree T and is equal to 2(n-3)(2n-7) with a detailed counting argument provided by Allen and Steel, 2001 on p. 4.

We also note that the way in which we graft a subtree back in will always create an internal node of degree 3, so without a modification to this operation the question of generalizing it to multifurcating trees can be ill-posed.

Tree bisection and reconnection

Tree bisection and reconnection (TBR) operation is the most expansive operation among the three we are considering in this post, in the sense that it yields the largest neighborhoods. TBR consists of splitting the original tree T into two subtrees T’ and T” along a branch and then reconnecting them by joining any two branches of T’ and T” via an edge (in case if one of the subtrees is just a leaf the leaf is simply joined back to the tree via a new branch; in all cases vertices might have to be added to maintain a proper bifurcating tree). The figure below illustrates a TBR operation being performed.

The original tree is shown in the upper left corner. The edges to be joined are marked in orange (bottom), and the new joining edge is marked in red (bottom right). Note, that the tree obtained after the specified TBR operation is isomorphic to the tree obtained via SPR with regrafting onto the orange edge of the bottom left subtree.

The inclusion we observed for the NNI and SPR neighborhoods extends further and the following holds: NNI(T)SPR(T)TBR(T) (Maddison, 1991). Furthermore, by noting that any TBR can be thought of as a two step process where we first re-attach T” onto the specified edge of T’ and then reposition the attached T’ in order to get correct edge match, we can show that any TBR operation can be replicated by at most two successive SPR operations.

Finally, we note that unlike NNI and SPR, the TBR neighborhood size does depend on the topology of the tree T. However, and upper bound on the size of the neighborhood can be computed and is given by (2n-3)(n-3)2 (Allen and Steel, 2001, p.5).

Induced metrics and computational (in)tractability

We already noted in the introduction that the tree rearrangement operations induce corresponding metrics on the space of phylogenetic trees (more precisely in our current case: the space of unrooted bifurcating tree topologies). We will denote these metrics dNNI(T, T’), dSPR(T, T’), and dTBR(T, T’), respectively. From the inclusion relation on the neighborhoods it immediately follows that for any two unrooted bifurcating trees we have the following inequality: dTBR(T, T’)dSPR(T, T’)dNNI(T, T’). Furthermore, from our observation in the previous section we also can conclude that dSPR(T, T’)dTBR(T, T’).

Although we have provided an explicit example of the adjacency graph under NNI metric for the space of unrooted bifurcating trees on 5 taxa, it is not immediately obvious that in the general case the NNI adjacency graph will be connected (i.e. we are not a priori guaranteed to have a connected metric space). However, it turns out that indeed the adjacency graph under the NNI metric is indeed connected, and hence the metric space induced by dNNI(T, T’) is connected. It is easy to see that as a corollary we also get connectedness for the SPR and TBR spaces.

Since the metric spaces are connected and finite we can ask what are their respective diameters (i.e. the smallest distance between two points furthest apart in the space; this is the same definition as the graph diameter). An interesting non-trivial bound on the diameter of the NNI space was derived by Li et al., 1996: authors show that the diameter of the NNI adjacency graph (which we will denote as ΔNNI) is bounded below and above by nlogn terms. More precisely, the following inequality holds: \frac{n}{4}\log_2 n - o(n\log n)\leq \Delta_{NNI}\leq n\log_2 n + O(n). Allen and Steel, 2001 extend this result to the SPR and TBR cases giving the following inequalities: (a) n/2-o(n)ΔSPRn-3; (b) n/4-o(n)ΔTBRn-3.

However, knowing the diameter of a space does not imply that we can easily compute the distances within it (i.e. shortest paths via NNI, SPR or TBR operations from one tree to another). In fact, the problems of computing NNI, SPR or TBR distance between two arbitrary unrooted bifurcating trees are all NP-hard (NNI: DasGupta et al., 2000; SPR: Bordewich and Semple, 2005, Hickey et al., 2008; TBR: Hein et al., 1996). Allen and Steel, 2001 show that the SPR distance is fixed-parameter tractable, but in general the fixed-parameter tractability results in algorithms that can work efficiently only on small trees or pairs of trees which are relatively close to each other (for a more principled discussion see Whidden and Matsen, 2018). In general, it appears to be a rare case for a metric that is biologically interpretable to be efficiently computable, and vice-a-versa (take for example the Robinson-Foulds distance which is efficiently computable, but not biologically well grounded). There are some recent attempts at developing variants of tree rearrangement operations that have both biological interpretability and are computationally tractable (Collienne and Gavryushkin, 2021).

While we could possibly discuss other tree rearrangement operations or explore the discrete geometries arising from the ones we mentioned in more detail, such work would likely require multiple posts. Thus, we will wrap up here and invite the reader to continue the exploration via following the links provided in the references section.

Data and code availability

All illustrations with the exception of the one from Felsenstein, 2004 were made with draw.io. No additional materials or code were used in this post.

What’s next

I am still deliberating on the exact direction to take for the next stretch of posts, so in the meantime I am intending to continue the one-off posts where I pick a random, but interesting to me, topic in mathematical phylogeny and try to write a coherent text about it. As per usual, if you are interested in me writing about a specific topic do not hesitate to reach out.

References

Allen, Benjamin L., and Mike Steel. “Subtree transfer operations and their induced metrics on evolutionary trees.” Annals of combinatorics 5, no. 1 (2001): 1-15. https://doi.org/10.1007/s00026-001-8006-8

Bordewich, Magnus, and Charles Semple. “On the computational complexity of the rooted subtree prune and regraft distance.” Annals of combinatorics 8, no. 4 (2005): 409-423. https://doi.org/10.1007/s00026-004-0229-z

Collienne, Lena, and Alex Gavryushkin. “Computing nearest neighbour interchange distances between ranked phylogenetic trees.” Journal of Mathematical Biology 82, no. 1 (2021): 1-19. https://doi.org/10.1007/s00285-021-01567-5

Felsenstein, Joseph. Inferring phylogenies. Vol. 2. Sunderland, MA: Sinauer associates, 2004.

Gupta, Bhaskar Das, Xin He, Tao Jiang, Ming Li, and John Tromp. “On computing the nearest neighbor interchange distance.” In Discrete Mathematical Problems with Medical Applications: DIMACS Workshop Discrete Mathematical Problems with Medical Applications, December 8-10, 1999, DIMACS Center, vol. 55, p. 125. American Mathematical Soc., 2000. PDF on author’s page

Hein, Jotun, Tao Jiang, Lusheng Wang, and Kaizhong Zhang. “On the complexity of comparing evolutionary trees.” Discrete Applied Mathematics 71, no. 1-3 (1996): 153-169. https://doi.org/10.1016/S0166-218X(96)00062-5

Hickey, Glenn, Frank Dehne, Andrew Rau-Chaplin, and Christian Blouin. “SPR distance computation for unrooted trees.” Evolutionary Bioinformatics 4 (2008): EBO-S419. https://doi.org/10.4137/EBO.S419

Hon, Wing-Kai, and Tak-Wah Lam. “Approximating the nearest neighbor interchange distance for evolutionary trees with non-uniform degrees.” In International Computing and Combinatorics Conference, pp. 61-70. Springer, Berlin, Heidelberg, 1999. https://doi.org/10.1007/3-540-48686-0_6

Li, Ming, John Tromp, and Louxin Zhang. “On the nearest neighbour interchange distance between evolutionary trees.” Journal of Theoretical Biology 182, no. 4 (1996): 463-467. https://doi.org/10.1006/jtbi.1996.0188

Maddison, David R. “The discovery and importance of multiple islands of most-parsimonious trees.” Systematic Biology 40, no. 3 (1991): 315-328. https://doi.org/10.1093/sysbio/40.3.315

Whidden, Chris, and Frederick A. Matsen. “Calculating the unrooted subtree prune-and-regraft distance.” IEEE/ACM transactions on computational biology and bioinformatics 16, no. 3 (2018): 898-911. https://doi.org/10.1109/TCBB.2018.2802911