I went to Chicago

Usually I tend to write long-winded posts with many side topics, but today is not that day. The circumstances of our current days are not happy ones, so I wish to share something that is dear to my heart (and recent in my memory).

I visited Chicago (again) from March 11th to 20th in the year of 2022. My wife was attending the APS March Meeting and I was just tagging along for a fun visit. The weather was more pleasant than I usually recall from March in Chicago (thanks no thanks, global warming). It was a fun trip, less so on the sightseeing side (due to the busy schedule), and much more so on the social side, since I finally reconnected in person with my friends.

The feeling of being back in the city was amazing. The CTA, albeit slower due to worker shortages, was lovely. The food and bar scene was great as always. I was sad about the closure of Lost Lake, and even more so about my favorite mezcal bar Todos Santos going under. However, Three Dots and a Dash was still a shining spot with all the usual buzz, and even though I didn’t end up grabbing a spot at Havana Grill I am glad it’s still around. I finally made my way into the Aviary (with many thanks to Carter Grieve and Hana for actually pushing through on this), and it was a lovely experience.

I didn’t catch any museums this trip due to my stupid stomach acting up for a few days in a row, but I still caught a few murals and architectural pieces in my free time.

All in all, I am just happy to share a few photos that capture some of the highlights of the trip, and I hope I’ll be back to more regular blogging some time in the near future.

One day I had a bunch of free time to take a couple-hour walk from the West Loop area to downtown and then to the Adler Planetarium. It was a beautiful day, so I snapped this little panorama of the Chicago skyline on the way.
Salonica diner on 57th Street. A place associated with so many memories and good times. Two goofy fellas who accompanied me through most of my undergraduate years are pictured. Steak and eggs, as per usual, are my go-to choice, and it never disappoints.
Huge thanks to Carter Grieve for hosting us for a few nights and providing a wonderful short ribs dinner. After spending a few evenings at his place I am convinced I should start making my own clear ice for cocktails. Also, the gears of kitchen optimization have started turning in my head. Thanks for the lovely ambiance and wholesome hangout time.
Catching a dinner at Las Tablas. If you have some extra time in Chicago, I highly recommend stopping by this place. The best skirt steak I’ve ever had is served here, and combining it with some seafood or chicken turns the whole experience into a protein-loaded fiesta.
Diego, pictured in anticipation of the above-mentioned steak and seafood combination. By the way, do not skip the Cafe Colombiano option either; it’ll stimulate your appetite and give you enough energy to work through the big portions served here.
Good old Ryerson Laboratory on the University of Chicago campus never fails to greet me with the same quote.
Last but not least, hanging out by the mural right outside of the Honeybear Cafe in Rogers Park, Chicago. Many thanks to Carter for introducing us to the place, and to Hana for snapping this wonderful picture.

One day I might have something intelligent to say about the world and what we are doing in it; for now, however, I will stick to being happy about having wonderful friends and sharing great memories together.

2021: A year in review

Good (local time option) world! I am back with another year in review. This time we are back to thematic groupings, since time ceased existing (not really but something like that). The intro won’t be long because either (a) you also lived through 2021 and are probably quite tired OR (b) you are doing some sort of history research; in both cases let’s not waste too much time.

Travel

Two major trips this year, one back home to Chisinau, Moldova, and one to Paris, France. Let’s start with Chisinau.

I went back to Chisinau around May 20th, which meant that I got to spend my birthday back home for the first time since 2014. Now, if you know me, you will realize that this fact actually meant nothing in particular, since I am not the biggest fan of my birthday. However, a home-cooked meal that I didn’t even have to cook is something I’ll gladly take. Other than that my visit was quite routine: some paperwork here and there, some bars (mostly in the city center), and a missed opportunity to see Morgenshtern live. On the brighter side, I got a personal tour of some artworks by @e.art.n and got to hang out with two adorable beagles for a bit.

Paris was amazing. That’s it. I had a good time being a tourist; as expected, the food was great and the weather cloudy. Versailles was quite interesting and gave me flashbacks to the Hermitage. The Louvre was also quite cool, and just as expected Mona Lisa is way too overhyped. Musee d’Orsay was sadly somewhat disappointing due to subpar room organization and endless flocks of people. Musee de l’Orangerie was the highlight of the trip, not due to the water lilies, however, but due to a temporary exhibit of Chaïm Soutine’s works. Say what you want about my taste in art, but now I have a nice bœuf magnet on my fridge (pic of the original below).

You thought I was done with the trips? Nope. Minor trip time: Denver, Colorado, plus Yosemite, Sacramento, and Berkeley, California. Both Rocky Mountain and Yosemite National Parks were great: outdoors with beautiful views, long (apparently labeled as strenuous) hikes, and nice animal-spotting episodes (skunk, pika, marmot?, elk, and a black bear cub). Sacramento and Berkeley allowed me to reconnect with some of my friends whom I hadn’t seen in a very long time. It was nice to see familiar faces accompanied by the reminiscence of days past.

Books/Shows/Films/Audio

This year I have read the following books:

  • Hilbert by Constance Reid
  • The end of everything by Katie Mack
  • Flash boys by Michael Lewis
  • You look like a thing and I love you by Janelle Shane
  • What is life? by Paul Nurse
  • The statue within by François Jacob

Sadly all of the above are non-fiction books, since my two attempts at reading some fiction (The servant by Fatima Sharafeddine, and Slaughterhouse-five by Kurt Vonnegut) failed at various points in the respective books. I am planning to actually read both, but that is a job for 2022.

Overall my reading pace and habits were sporadic. I read What is life? in a single evening (the book is short), and I binged The statue within for a bit shy of 11 hours on the flight back from Paris. At the same time, after my return from Moldova (where I finished The end of everything) I had a two-month hiatus, which got interrupted by the reading spree of August, which then gave way to the desolation of September and October.

All six books make it into my recommendations list, but if I had to pick only one it would be What is life?. The book is brief, crisp and extremely inspiring. The ideas explained in it hit a great balance between simplicity and profundity. Finally, since I am myself somewhat involved in biology, I think this book gives a great entry point into the modern view of life.

Besides books, I discovered the wonderful world of Thelonious Monk’s music, and got extra excited about the long-awaited new album from Oxxxymiron. In the realm of TV shows I picked up and binged The Expanse after returning from France, and earlier in the year watched Dopesick and Arcane. Also, for the first time since the beginning of the pandemic, I went to the cinema to see Dune, and it was amazing.

Work/Research/Classes

It’s been a busy year for work. For a quick summary you can check my Google Scholar profile, but the main focus this year has been on wastewater monitoring. Long story short: SARS-CoV-2 sheds into human waste, so by screening wastewater we can get insights into what’s happening in a region (e.g. which variants of concern are circulating) [1, 2, 3, 4, 5].

I’ve felt that my overall rate of paper reading went down in 2021. My suspicion is that I suffered from what I’ll call “COVID research fatigue”. The high pace of anything that has to do with SARS-CoV-2 research meant that every week an extra set of 5-10 open tabs got added to my never-ending life of 4 browser windows. Reading non-SARS-CoV-2 papers was thus much more exciting, but due to the energy expenditure and the limited battery in my brain it was also less frequent. Overall I think I’ve been keeping up with most of the stuff I was interested in, but I wish I had those extra 4 hours every day just for reading.

Classes-wise this year has been aggressively meh, with the sole exception of the Information Theory (ELEC 535 @ Rice) course by Ashutosh Sabharwal. It was a delightful course which, while only covering the basics, still managed to inspire me, and as a result led to me going down a few lengthy rabbit holes of reading (maybe one day I’ll write more on that).

Finally, I spent 6 weeks this summer working as a Research Mentor at the Summer STEM Institute (fully online). I have mixed feelings about the experience, since on one hand I learned a lot about trying to lead multiple independent research projects, but on the other hand the end results could have been more impressive. Overall, I take this experience as an exercise in management, which pointed out to me some weak spots I plan to address in the future.

Social

The social life in 2021 was full of ups and downs, as we were navigating the world after vaccination, but with re-emerging threats posed first by the Delta and then by the Omicron variants of SARS-CoV-2. One constant presence in my social life was the D&D campaign in the (heavily customized) world of Ixalan led by the long-standing DM of our group, Diego Bejarano. The ability to escape the world for 4 hours every week and explore dense jungles, tall mountains, and vast oceans (and rivers), while RPing as a devil from Hell turned into a giant pug (named Pugmeister) or as Jinx from League/Arcane, is definitely something that helped me stay sane.

Coda

Overall it’s been an incredibly busy year that had some good and some bad in it. I don’t think I managed to write up a particularly exhaustive review, and it definitely ended up being biased towards the end of the year. I guess my memory does some sort of exponentially weighted moving average over the experiences, so that’s what I end up with without a weekly journal.

As is customary, I will try to write more on this blog, and most likely I won’t. See you in a year or so!

2020: A different kind of year in review

Good evening, world. First of all, this is not exactly a review of the year 2020, but rather a review of a specific period roughly starting around mid-March 2020 and lasting till today, mid-March 2021. The “lasting till” part is not particularly meaningful, since as of today the future is not much more certain than it was a year ago; it rather demarcates the amount of time I am willing, and feel the need, to write about. As per usual I will try to break this down into sections for easier navigation, although this time the sections are likely to be chronologically rather than thematically ordered.

March-April 2020

Around the beginning of March I was self-isolating in my apartment, so by the time the WHO declared COVID-19 a pandemic and Rice University suspended classes (followed immediately by spring break) I was already settling into the realities of the work-from-home routine. I had already been working on SARS-CoV-2 sequence analyses for about 2.5 months by then, and was steadily getting deeper into genome and variant analyses and related topics. By mid-April we were working in collaboration with Baylor College of Medicine and Weill Cornell Medical College on a study of intrahost genomic variation in SARS-CoV-2.

Outside of the immediate research work, this period was also full of the challenges of adjusting schoolwork to the virtual environment. On my end most of these transformations were not particularly impactful, since in the spirit of good old college days I simply stopped attending most of the lectures, opting for 2x replays of recordings at convenient times. However, for the seminar-style classes the impact was noticeable. Giving a presentation into the void of a Zoom call felt eerie and rather disappointing, a feeling that hasn’t really left me since, though it gained a new set of characteristics as Zoom fatigue became a well-known phenomenon.

May-June 2020

By early to mid-May classes were mostly wrapped up, and to the credit of my instructors for that semester, I did feel that I managed to learn something new and exciting, even if a good chunk of the process had to be scrambled and re-built in less than two weeks. On the flip side, the Zoom call festival was just picking up. This time period was likely my busiest so far with respect to the number of Zoom calls I was present on. In retrospect I can’t help but feel that in some way the work-from-home routine and the initial panic about “social distancing” (which really is more of a physical distancing) made a lot of people crave meetings a bit more than they should have. Nevertheless, a lot of the meetings I attended in this period were instrumental for my work, and helped greatly in shaping the initial draft of the paper that we ended up submitting towards the end of June.

This two-month stretch was also the most work-intensive part of the whole year, and arguably that was in big part made possible by the work-from-home imperative. Without spending time on grocery shopping (we were rather panicked at that time and relied on Amazon Fresh…), bar hopping, museums, or pretty much any activities outside the apartment (except periodically joining for walking the dog), I could funnel all of my waking time into research, interspersing it with cooking and home-brewing to maintain an ounce of sanity. Needless to say, that was also only possible because the exhaustion from said routine hadn’t kicked in hard yet, and I could indeed focus for a solid 8 to 8 on a typical day. Well, you probably know what’s coming next.

July-August 2020: The swamp

First, I hate Houston summers! Second, I hate Chicago summers. Third, I am sure that I just genuinely have not enjoyed summers, in any way, shape or form, since graduating high school. “The swamp” comes, of course, from the great humid and hot weather of the bayou that I live upon nowadays, but also from the general feeling I had throughout that period of time. The April-June push took its toll on me, and I was not in the best working mood for the rest of the summer. I mostly concerned myself with another collaboration, this time looking into the Influenza A virus. However, my overall progress was slow, and on certain weeks I could barely squeeze out 8-12 hours of solid work for the entire week.

Additionally, dropping out of a work-only mindset made me observe the realities of the world a bit more clearly, and with COVID-19 cases spiking in Texas and globally, racial injustices being brought front and center in the USA, and the relatively shaky situation with visa policies, those realities were not pretty. For people who might have had more energy at that point in time it might have been a time of action and/or reflection; for me it was mostly a time of sadness, discomfort and a general overwhelming feeling of “the swamp”.

September-October 2020

With summer over, the air in Houston became more breathable, but my exhaustion was still going strong. The beginning of classes helped alleviate that to some extent, and at some point almost re-energized me to the April levels of productivity, though that period was relatively short. The month of September went by relatively uneventfully for me, although I had to take care of a few things in preparation for October.

October was hectic. My parents were visiting me from Moldova (a trip they booked way before the pandemic, and rescheduling it would have been an even bigger mess), so I had to figure out safe housing options and some form of an itinerary for their visit. On top of that, I was going to give a talk at the Ken Kennedy Institute Data Science conference mid-October, and that required some polishing and prepping. At the end of the day, both went smoothly: my parents were lucky to visit roughly in between two major COVID-19 waves in Texas, they took proper precautions, and most of their visit consisted of a road trip and hiking in the Grand Canyon/Sedona area (which I could not join, due to the aforementioned talk). The talk went fine, as by that time I had gotten accustomed to the practice of speaking into the Zoom void.

Towards the end of this period I felt more accustomed to the new life. Not the “dive into work” kind of accustomed as in the beginning, but rather routine-accustomed. I had my balance of work and off-days, meetings I wanted or needed to attend and ones that could be skipped, and some workable sense of interaction with the world through mostly digital means.

November 2020

Most of November was consumed by responding to reviewers’ feedback on the SARS-CoV-2 paper, tying up loose ends on other projects, and getting into the groove of wrapping up schoolwork. Importantly, I also finally got back to reading (no, scientific papers don’t count, even if I do enjoy a solid 60-75% of them). I made my way through I want to be a mathematician: An automathography by Paul Halmos, which was a nice and witty book and to a degree helped me in the process of recovering some of the energy and excitement about work. I also finally completed A Random Walk Down Wall Street, which, while not extremely underwhelming, was somewhat medium-tier prose in my opinion. Also, inspired by my parents’ adventures in the parks of Arizona, I started planning a short retreat of my own into the wilderness.

December 2020

Classes were now officially done for the semester, the revised manuscript had been submitted, and I was ready to take a week off. We travelled west and south to the small town of Terlingua, TX, where we found a nice clean little motel. The two main attractions (both about a 20-40 minute drive from our base) were the Big Bend National and State parks. Both spots were amazing; with nice mountain hikes, desert views and scenic drives along the Rio Grande, it was a good way to let my eyes rest for a bit. The location was not busy with people, making hiking in these places a much more insular and reflective experience (not to mention that it was also nice to travel without too much worry about COVID circumstances).

The rest of December went by fast and wasn’t particularly eventful.

2021

At the start of the new year I was getting back into the groove of things, trying to spend a bit more time walking outside in parks and along the bayou. Work got into a steady and solid flow, and with a new semester picking up I got a few more things to work on, which brought a bit of fresh air into the workflow. I also continued reading more, having just finished Hilbert by Constance Reid and set my eye on a few fiction books, since I have been missing out on reading fiction for a while. I also had a great opportunity to give a talk at the UChicago Center for Translational Data Science discussing the SARS-CoV-2 work we did over the past year. In general, even though in the global sense life is still far from stable, I feel like I have managed to find some local stability, which makes me somewhat optimistic for the next few months to come.

Other notes and miscellanea from past year

I heavily ignored personal projects throughout this period, or at the very least the ones I’d typically write about here. I did spend a lot of time cooking, had a long foray into home-brewing mead, and dabbled in some music for a bit at different points in time. The closest thing to the personal-projects scale was getting comfortable with ggplot2 and RShiny, but at the current moment neither really got beyond the level of purely professional engagement. I think that a rather sizable increase in screen time drove me away from my standard set of personal projects, since I got reasonably tired of spending time in front of my laptop with the regular work and study activities.

Talks in the Zoom age: Speak into the void and let the void speak back

I don’t think that this note warrants its own post, so I will tuck it in at the end of this review for those interested in my opinion on the Zoom talks.

At the start I felt discouraged by Zoom talks. First, they completely remove the energy feedback loop between the presenter and the audience, hence the “speaking into the void” claims. Not to mention that the same problem of speaking into a device also makes it somewhat physically uncomfortable for the presenter, since you need to maintain roughly the same posture, limit hand gesturing and stare into the tiny dot of a webcam for the “engaging with the audience” feeling (to be precise, the feeling is not for you, it’s for the people watching the talk). Second, Zoom fatigue is very real, and concentrating on a speaking screen can be hard for some people (including me). Hence, “the void” is really often empty, with just a few people tracking the presentation for limited blocks of time. In the meantime I developed some techniques that help me stay more engaged, including chat discussions, sidebar banter with friends attending the same talk, and pen-and-paper note taking. All of these help to a degree, but nevertheless the problem of fatigue still stands. Third, just as with regular talks, when “the void” speaks back the questions are often disconnected from the talk, and without the clear prospect of “taking it offline” it feels that some interesting in-depth discussions never end up happening.

However, with time and practice comes some form of acceptance. I ended up finding the positives in Zoom talks as I attended and gave more of them. First, it really helps with boosting my confidence. People not asking questions or being disengaged doesn’t burn the same way when their cameras are off as they fall asleep or switch to watching TikToks. Second, for lectures (on the receiving end) Zoom or YouTube is great. You can play the recording at the speed you want, rewind, and replay fragments as you need. Interactivity takes a hit, but as long as there is some venue for asking more questions it is really convenient. Third, Zoom really opens up the possibility of attending talks globally. I’ve been to Berkeley, Chicago and a couple of other places for talks without the cumbersome process of getting onto a plane.

At the end of the day, I hope that some of the changes Zoom has brought into the world of lectures and talks are here to stay, since not all change is bad. Of course, I also hope that at some point later in the year I will be able to grab a beer after a talk to discuss the finer points with the presenter (not over Zoom).

Week(ly quick)ies: Random walks

Introduction

Since my blog has been experiencing a shortage of content (mostly due to the fact that I am busy with my current research work), I decided to try out a new format that will hopefully motivate me to post more often. “Weekies”, i.e. “weekly quickies”, are going to be weekly posts of small puzzles that I come up with or come across on the web. I plan to follow up each “weeky” with a longer write-up of the solution and discussion roughly within a month or so of the original post. In the meantime, do not hesitate to post solutions and discuss the puzzle in the comments under the original post.

Random walks

Consider the K4 graph/tetrahedron depicted below. The following questions concern different random walks starting at the vertex A.

Q1. Assume that at every time step we move to a random neighboring vertex, each with probability 1/3. What’s the probability that we will visit all vertices of the graph before returning to A?

Q2. Given the same walk as in question 1, what’s the probability that we will visit any one vertex out of {B, C, D} at least n times before returning to A?

Q3. Now, let’s change our walk to have a 1/2 probability of remaining at the current vertex and probabilities of 1/6 to move to each of the neighboring vertices. How do answers to questions 1 and 2 change for this walk?

Q4. With the walk described in question 3, what is the expected number of steps we need to take to visit all vertices at least once?

Q5. Let us record the sequence of visited vertices as a string of characters A, B, C, D. Consider a random walk in question 3 and record its first 10 vertices. What’s the probability that the string “ABCD” occurs as a subsequence in the recorded sequence?

Q6. We can generalize the walk described in question 3 using a parameter q ∈ [0, 1] by letting the probability of staying at the current vertex be q and the probability of going to any given neighbor be (1-q)/3. Additionally, let the start vertex be picked uniformly at random. Let n be the length of the sequence of vertices that we record, analogously to question 5. Let P_{q,n} be the distribution on Ω = {A, B, C, D}^n generated by the described process, and let U_n be the uniform probability distribution on Ω. Let D be the total variation distance, and denote d(q,n) = D(P_{q,n}, U_n). Investigate the behavior of the function d : [0, 1] ⨉ ℕ → [0, 1].
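In case you want to sanity-check your answers numerically before the follow-up post, below is a small Monte Carlo sketch (my quick addition, not part of the puzzle itself) that estimates the probability asked for in Q1; the other questions can be checked with minor modifications. It uses plain rand() for brevity, so treat its output as a rough estimate only.

    #include <stdio.h>
    #include <stdlib.h>

    #define TRIALS 10000000L

    int main(void) {
        srand(2022);
        long success = 0;
        for (long t = 0; t < TRIALS; t++) {
            int visited[4] = {1, 0, 0, 0};   /* vertex 0 is A, already visited */
            int cur = 0;
            for (;;) {
                /* move to one of the three other vertices uniformly at random */
                int next = (cur + 1 + rand() % 3) % 4;
                if (next == 0)
                    break;                   /* returned to A */
                visited[next] = 1;
                cur = next;
            }
            if (visited[1] && visited[2] && visited[3])
                success++;                   /* visited B, C and D before returning */
        }
        printf("Q1 estimate: %f\n", (double)success / TRIALS);
        return 0;
    }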

Monte Carlo your way out of a puzzle: Why events of measure 0 matter.

I was sitting in my office a couple of weeks ago, minding my regular bioinformatics-flavored business with a cup of oolong tea, when Senthil stopped by and told me that he had a puzzle he’d like some help with. Now, both Senthil and I enjoy math, and we especially enjoy small, nifty, tricky problems. The puzzle he shared was the following.

Consider a unit grid on a plane. Pick a non-negative real number L. Now, we can place a random segment of length L in the plane by picking a point and an angle uniformly at random and drawing a segment of length L originating at the chosen point and oriented along the chosen angle. Which value of L maximizes the probability that this segment will intersect exactly one line of the grid?

We spent about 5 minutes chatting about different placements of the segment, probabilities, and our lack of desire to compute a nasty triple integral. We had to wrap up quickly, since I was short on time and had to finish up some other tasks.

Last Thursday I was co-organizing our department’s happy hour, and Senthil reminded me of the outstanding problem. I made a pinky promise that I’ll help on Friday, which brings us to the story I am about to tell you.

Do not integrate, just simulate!

Since I was not sold on the idea of computing integrals first thing on a Friday morning, I decided to quickly write up a simulation that would do exactly what the problem statement described. After about 20 minutes of typing away, I had my tiny C program up and running and spitting out probabilities with 6 sweet significant digits. Senthil came by quite soon, and within 40 minutes of back-and-forth computer-assisted linear search we were quite certain our answer for L was 1. Yay, problem solved: L = 1, the probability is approximately 0.63774, and we can all go drink coffee and do whatever Computer Science PhD students do on a Friday. Well, no.

Of course we were not happy. I mean, what is 0.63774? Have you heard of this number before? Does it have any meaning? Is this the number of goats you must sacrifice to Belial before you learn the great mysteries of the world? We took to OEIS and found (among other amazing things) that 0, 6, 3, 7, 7 is a subsequence in the decimal expansion of Mills’ constant, assuming the Riemann hypothesis is true. Other than that the number didn’t quite give us an immediate answer. We also checked the magic of WolframAlpha, which gave us some neat ideas, like (3 π)/(2 e^2). However, as you can guess, reverse engineering this fraction was 1) stupid; 2) hard; and finally 3) something we didn’t do.

In the meantime, we decided to put our theory hats back on and figure out what was happening, and how we could get an analytical expression for this probability. Senthil recalled that there is in fact a very similar well-known question, called Buffon’s needle problem, which has an amazingly clean solution. However, we could not see how one can easily translate the setting of that problem to ours. After some amount of drawing, complaining about integrals and debating which events are measure zero, we decided to take a coffee break and walk outside for a bit. This is when the major theoretical breakthrough happened.

Senthil noticed that instead of thinking about the grid as a whole, we can consider the two sets of parallel lines independently. The advantage of this view was clear: now we have Buffon’s needle problem embedded directly into ours! We multiplied some probabilities, got wrong numbers and complained about the futility of theoretical efforts. Then we of course reimagined how stuff should be multiplied and counted things again, and again, and again. Are the events independent? Do we count this in or subtract it out? Do we go back to the integral formula?

Clearly the two intersection events are not independent, since an orientation (based on the angle) that works well for the horizontal intersection is not favorable for the vertical one. Thus we can’t just multiply stuff out and subtract from one. So we went back to the integral. This is where things got tangled up. We recomputed the triple integral a few times, but something was not tying out. We computed an integral of a product of integrals and something still didn’t tie out. We were running in circles, seeing numbers like 0.202 and 0.464 all over the place. Meanwhile, the original answer to Buffon’s needle problem, 2/π ≈ 0.636619, was temptingly close to our own. Finally, around 4pm we threw in the towel and parted ways.

Simulation clearly saved the day, and even if we were not sure of the theory behind this number, we knew the answer with high confidence, and that was already a win. But…

Back to the drawing and integration

I took out a piece of paper, folded it in four, and started sketching. There is a nice video out there that explains how to derive the solution to Buffon’s needle problem. Thus, I just went through the steps, drawing out the circle, and thinking about the exact conditions we need to satisfy for our segment to intersect exactly one grid line.

The shaded region above represents the range of admissible positions for the needle (black arrow) when the angle formed by needle and x-axis does not exceed 90 degrees. Note that we only consider intersection with the line A_0A_1, since the intersection with the line B_0B_1 is identical by symmetry.

Now, taking the angle of the needle to the x-axis to be \theta\leq\frac{\pi}{2} and the distances to the nearest horizontal and vertical lines to be y and x respectively, we can define the conditions for an admissible position as 0\leq\theta\leq\frac{\pi}{2},\,y\leq\frac{L}{2}\sin\theta,\,x\geq\frac{L}{2}\cos\theta. Thus, we can write down our desired integral and evaluate it as \displaystyle\int_0^{\frac{\pi}{2}}\displaystyle\int_0^{\frac{L}{2}\sin\theta}\displaystyle\int_{\frac{L}{2}\cos\theta}^{\frac{1}{2}}dx\,dy\,d\theta=-\frac{1}{8}(L-2)L, which equals \frac{1}{8} for L=1. Now, dividing through by the total measure of the (\theta, y, x) box, \frac{\pi}{2}\cdot\frac{1}{2}\cdot\frac{1}{2}=\frac{\pi}{8}, and multiplying by 2 to account for the symmetric intersection with the second line, we arrive at the coveted \frac{2}{\pi}\approx 0.636619. The case can be rested: we now have a neat theoretical solution to the question, and a beautiful mathematical expression as an answer. However, I am still unhappy, because I cannot ascribe the error we see in the simulation to pure numerical precision at this scale.

The devil is in the details: wording and measure zero events

One of the first things you learn in a probability theory class is that events of measure zero don’t matter. Picking a specific point on a line, or even picking a rational point on a line, are measure zero events, hence they don’t really matter when you calculate probabilities. Nevertheless, simulations are not only countable, they are finite! This means that accidentally dropping in an event of measure zero that shouldn’t be there can trip you up. This is exactly what happened in our case. We want to count the number of times a random segment intersects exactly one line of the grid. I, however, assumed this condition meant something else: the segment intersects the grid at exactly one point! The difference shows up at the points where the two sets of parallel lines forming the grid meet: each such point is a single grid point, but a segment crossing it in fact intersects not one, but two lines!

I adjusted my counting scheme to correctly reflect this edge case as intersecting two lines, and therefore not being a favorable outcome. Guess what happened next… Exactly, the sweet shiny 0.6366313 popped out of my simulation as the predicted probability of the event. This number is much closer to \frac{2}{\pi} than our previous candidate, and now all the doubts are gone: the theory and the simulation agree!
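For reference, here is a minimal sketch of this kind of simulation (not my original program, which is linked at the end of the post, and using plain rand() instead of the generator mentioned there). It counts crossings of vertical and horizontal grid lines separately, so a segment passing through a lattice corner is counted as hitting two lines, matching the corrected scheme; for L = 1 the estimate should hover around 2/π.

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    #define TRIALS 10000000L
    #define PI 3.14159265358979323846

    /* uniform in (0,1), avoiding the endpoints */
    static double uniform01(void) {
        return (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    }

    int main(void) {
        const double L = 1.0;                 /* segment length */
        long hits = 0;
        srand(2022);
        for (long t = 0; t < TRIALS; t++) {
            /* by periodicity of the grid, the start point can live in a single unit cell */
            double x0 = uniform01(), y0 = uniform01();
            double theta = 2.0 * PI * uniform01();
            double x1 = x0 + L * cos(theta), y1 = y0 + L * sin(theta);
            /* number of vertical and horizontal grid lines crossed by the segment */
            long v = labs((long)floor(x1) - (long)floor(x0));
            long h = labs((long)floor(y1) - (long)floor(y0));
            if (v + h == 1)
                hits++;
        }
        printf("P(exactly one grid line) ~ %f\n", (double)hits / TRIALS);
        return 0;
    }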

Discussion and lessons learned

First of all, if my simulation runs in about 2 minutes (going through 500,000,000 random segments), why not run a longer simulation to counter the effect of the nasty accidental event? Well, numbers have their limits on computers, and overflows happen. I tried running the simulation for 2,000,000,000 and 20,000,000,000 segments. The first run gave a slightly smaller number than the original 0.6377, but still not quite 0.6366. The second one gave a whopping probability of -8.648, which even those not too familiar with measure-theoretic probability would call out as absurd. Can I spend more time bugfixing this and get a great simulator that can scale to a 1,000,000,000,000-segment trial? Yes. Is it reasonable to pour time into this? No!
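For the curious, the kind of failure mode at play looks roughly like the snippet below (an illustration of 32-bit counters giving up, not a diagnosis of the exact bug in my simulator): a signed 32-bit integer tops out at 2,147,483,647, so counters in a 20,000,000,000-trial run silently stop making sense.

    #include <stdio.h>

    int main(void) {
        long long trials = 20000000000LL;   /* 2e10 fits comfortably in 64 bits */
        int narrow = (int)trials;           /* forcing it into 32 bits mangles the value */
        printf("as a 64-bit counter: %lld\n", trials);
        printf("as a 32-bit counter: %d\n", narrow);
        return 0;
    }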

Second, why not do the theory neatly from the start? At the end of the day the problem really takes about 30 minutes of careful drawing and setup and 3 minutes of WolframAlpha-ing the integral. Well, first, this involves dissecting the problem theoretically, which takes time. Second, writing up the simulator took 20 minutes; for contrast, understanding how Buffon’s needle problem is solved took about the same 15-20 minutes. The latter on its own didn’t give a solution, while the former at least provided a good numerical estimate. Finally, recall how the problem is actually about finding the optimal length L? Well, that was quite a block to developing a nice theoretical answer; once we were quite certain L was 1, we could actually focus on the theory that mattered.

Third, why not read the problem statement carefully and not jump to wild assumptions? Well, this one goes into the lessons-learned bucket, alongside the entire finiteness-of-simulations shenanigans. You would expect that years of problem solving would have taught me that, and yet again, I make the same mistakes.

Overall I can say that this was an amazing exercise in basic probability, geometry and mocking up simulations to solve puzzle problems. I am happy to learn that writing a quick program can help in figuring out the theory behind a question. Furthermore, I am also quite delighted to experience the nuances of probability first-hand, without putting important questions and results on the line.

Final notes and resources

Many thanks to Senthil Rajasekaran for bringing this problem to my attention and guiding me through the crucial steps of getting the theoretical answer in our sights. Looking forward to more puzzles to solve!

My code for the simulator is available in this (zipped) file. Note that the random number generator was lifted from a course by Benoit Roux at UChicago, and the rest of the code is mine. The commented-out portion of the intersect function is exactly what caused the annoying off-by-0.0011 error.

Thanks for your attention, and happy bug catching, y’all!

2019: A year in review

Travel

This year was quite eventful in terms of travel, both domestic and international. I started off with getting on a project in Connecticut, so starting the last week of January I was flying from Chicago to LaGuardia every Monday, and flying back out every Thursday. Of course this was a prime opportunity to rack up miles and hotel points, which came in handy later throughout the year. I also used my constant travel as an excuse to visit a few of my friends spread all across the US. I made stops in NYC, Philadelphia, Austin, and Boston. I also took a trip to New Orleans, which was a long standing bucket list item for me and my friend Cris. In between these fun trips, I also squeezed in a few more career related ones, including visiting Baltimore and Houston for graduate program weekends, and returning to Baltimore again for BPS’19.

The next major trip I undertook was going back home to Chisinau, Moldova for the entire month of July. It was a relaxing and fun time, and I mostly used it to unwind after a work-intensive year. It also served as an interim stop in my move from Chicago down to Houston. At the beginning of August, I flew back to Chicago, packed the last few suitcases (not really) and headed south.

I got to Houston mid-August, and the entire move-in process went extremely smoothly thanks to my awesome roommate Robert, who helped organize common spaces in the apartment. I took another quick trip to Boston to enjoy the last grill session of the (northern) season, and began my studies. I then also took two trips to Chicago, one for the autumn recess and one for Thanksgiving. Both trips were fun, and coming back to Chicago felt like coming home.

Finally, at the end of this year I took a trip to Spain. It was an ambitious itinerary listing more than 5 cities, mostly organized and planned by Rachel. I pre-gamed the trip by spending a day in NYC, and then headed to Barcelona. From Barcelona we visited Girona and Figueres as day trips. Both were amazing, and I would totally recommend them to anyone spending some time in Barcelona. Then we took a train to Madrid. It was a packed schedule, and seeing the Prado in one day is obviously a challenge, but we succeeded. The next stops were all in the south of Spain. Starting from Seville (with a day trip to Cordoba), on to Ronda and finally Granada, it was a route packed with great views and amazing food. We rounded everything off by flying back to Barcelona and then, a day later, to NYC.

Thus, this year brings me into the great New York City! I will be heading back to Houston soon, but in the meantime, I am going to enjoy some bagels, pizza and public transport.

Work

I spent the first half of this year working on a project as a consultant with TruQua Enterprises, which involved quite a bit of travel and interaction with clients. It was an engaging opportunity to learn more about the inner workings of a large business, and to assist people by providing technological insights into their processes. I feel like I learned to be a better speaker and conversationalist, as well as an attentive and active listener. Even if some days were slower or heavier on the grunt work, I feel like I grew a lot through this experience.

In the second half of the year I started my PhD in Computer Science at Rice University. My first semester was a combination of learning from courses and from peers in the lab. I took a great course on optimization taught by Tasos Kyrillidis. It was a mix of nice and fun math with solid practical motivation coming from the field of machine learning. I would absolutely recommend this course to anyone who has a chance to attend and is at all interested in anything mathematical or machine learning related. I also spent a lot of time working on a project in the Treangen Lab, which was both stimulating and fruitful. I learned a lot of new material through this work, and strengthened some of my skills in data preparation and visualization.

Research

In the first half of this year I continued my work on the conformational transition pathways of the insulin degrading enzyme, and in early March I presented a poster on it at the Biophysical Society Annual Meeting. It is an exciting project, and it involved a lot of hours and effort in both learning and implementing simulations and analysis. I continued this work by conducting more analysis, proposing a coarse-grained model and starting a set of swarm-of-trajectories simulations for determining the conformational transition dynamics of this protein. As the year progressed, I got busy with other projects and my involvement with the project became minimal. I am currently finalizing some write-ups and planning on handing over the project to the next students.

At Rice, I joined a project that involved graph-theoretic analysis of metagenomic read data. It was my first research experience in genomics and metagenomics, and I felt amazing about it. I was happy to contribute my knowledge of graph theory and general computer science to this exciting area. This project was developed in collaboration with Advait Balaji, a second-year PhD student in the Treangen lab. Throughout the entire process I had amazing support from my mentors and our collaborators.

Overall, I’m happy to continue my work in the area of computational genomics and metagenomics. This is an exciting area, and I feel like I can both learn a lot and contribute significant work in the process.

As usual my research interests remain broad and lie at the intersection of mathematics, computer science and biology. I am looking forward to developing more work that can be directly applicable in both research and clinical environments, but of course that is a lengthy and complicated process.

Personal projects

This year not too many of my personal projects saw the light of day, although I made a lot of progress on some. Overall, I feel that I carved out a few main directions for my personal work, and I plan to strengthen and pursue those further in 2020.

The first major initiative is the exploration and adaptation of different data visualization techniques and tools. I have learned more about color palettes and colorblind-friendly design. I feel that this will make my subsequent projects more aesthetically pleasing and accessible. I also plan to round up a solid review of some basic data analysis pipelines and eventually release it as a useful resource for my lab, as well as the community at large. Finally, I plan to finish up my “Where in the world” project, which was a foray into D3.js and geographic visualization.

Next, I am looking to reinforce my paper sorting and reading habits. I have been meaning to organize my personal digital library for a while, but kept falling short on convenient tools to do so. Therefore, I might eventually settle either for a mix of practices and tools, or possibly write some code myself to further streamline the process.

I haven’t worked much with Raspberry Pi or Arduino with the exception of Scav projects this year. While home automation appeals to me, I am not sure I’ll have enough time to invest into it next year, so I think most of these projects will remain on the shelf for a while.

Finally, this blog, while not too active, will stay alive and will be an outlet for updates throughout the next year.

Thanks for reading, and see y’all in 2020!

Update on my whereabouts

Good evening world! Long time no see.

A lot of stuff has happened since the last time I updated this blog. I’ll go through most of it briefly, focusing on the things relevant to the blog content and my current plans for the upcoming few months.

I gave a super brief lecture on Fourier analysis to several first-year students at the University of Chicago (non-math majors). The full write-up for this lecture is long overdue, but I hope to finish it some time soon and post the full thing on here. As per usual, since I was trying to include several worked examples in the write-up, it got bloated very fast and as a result I wasn’t able to finish it on time.

I have also developed a lot of new content for teaching the basics of Python and programming in general. I plan to roll those out as a series of twice-monthly write-ups with Jupyter notebooks attached. The tentative idea is to post those on the first and third Sundays of the month.

I finished up my job in consulting just shy of seeing the project go fully live. I had a wonderful time at TruQua Enterprises, and I am glad that my work made a meaningful contribution to the cause. I am not sure whether I’ll end up writing a note about my experiences and what I learned while working in consulting, but if it happens I’ll share it here as well.

I also traveled back home to Moldova for a month. It was a relaxing experience and I got a chance to be a tourist in my own country, visiting two wineries and touring their production and maturation facilities (with a tasting included). I also wrote up a set of problems for the IOI training camp, but due to issues with creating enough test sets, those have not been used yet. Once the full set has been used by the team for training purposes, I will make its contents, both problems and solutions, available to the general public.

Last but not least, I moved to Houston, Texas and have started the PhD program in Computer Science. I am finishing up the orientation week activities and getting ready for an exciting semester ahead of me.

Cheers, and see you soon.

Teaching kids to code: I’m a computer scientist and I think it teaches kids important skills

Good afternoon world!

To those of you who know me, it doesn’t come as a surprise that I care a lot about teaching, in particular mathematics and computer science. Recently, while browsing r/programming, I came across an article that gives a software developer’s perspective on why you should not teach your kids how to code. I think the article brings up several good points, but it is not quite complete and draws a conclusion with which I thoroughly disagree. Thus, I have decided to present my take on the issue, and point out a few nuances which I think are important to consider when making claims about the usefulness of teaching coding.

Why is teaching coding a bad idea?

First of all, I need to say that the article is well written and definitely tackles some major issues with the hype around learning how to code. However, I would like to take a closer look at the following paragraph.

A former co-worker of mine was trained at a coding boot camp with the motto “Coding Is the New Literacy”. That sentiment is at the heart of all the programming books and games. The description on one popular book says starting coding early is “essential to prepare kids for the future”. This gives the impression that not teaching kids to code is somehow equivalent to not teaching them to read.

Joe Morgan, Slate.com

I think this is a valid criticism of the aggressive advertisement of coding boot camps and resources targeted at children. However, the language of “new literacy” can be taken apart from a different angle. Just like we learn basic reading skills, we also learn basic mathematical reasoning at a young age. In many ways coding is a field that brings together abstract mathematical reasoning and applied results. Hence, while the branding of “new literacy” is misleading, we can think of coding as a rather “old literacy” repackaged into a modern set of tools and scenarios.

In particular, a good amount of the critique presented in the article is aimed at learning the syntax of a particular language rather than general problem solving. I do agree that teaching coding, regardless of the age group in fact, should be aimed at cultivating problem-solving skills and developing abstract thinking. However, the second point is exactly why I disagree with the article’s author. While we talk about problem solving a lot, it seems to be a common pattern to think of teaching problem solving in the context of applied life skills: assembling furniture, cooking, playing with LEGO, etc. However, what many of these examples lack is the abstraction aspect that is essential to mathematics and computer science. Even a notion as natural as a natural number hides in itself a deep abstraction leap that is often taken for granted. When we think of the number 3, we are thinking of an abstract notion of count. Three apples are inherently different from three oranges or three chairs, but we are thinking of some abstract property that unites all of these examples, namely the count. The number 3 does not exist on its own, we can’t create a palpable 3, but we still are capable of thinking and knowing exactly what we mean by it.

Hence, mathematics, and by natural extension coding, is not only about problem solving, drive and perseverance. It is also about abstract thinking, which is something that needs to be cultivated early on. I have encountered multiple people who struggle in college-level proof-based mathematics classes, because the school system has failed to teach them abstract thinking and reasoning to a satisfactory degree. I want to reiterate that this is not a flaw in those people, and it is not some special quirk of mathematics as a subject. Anyone can learn mathematics, and everyone should learn some basic skills from it. The most valuable such skill is exactly the power of abstract thinking.

So what exactly is abstract thinking?

It is hard to define exactly what I mean by abstract thinking, but there are a few common trends that occur throughout examples of it. First, there is a pattern recognition part to any abstraction. Namely, an abstract property often arises as a natural recognition of a pattern in the observed world. For example, we can go back to the counting example. We observe a certain natural property of the objects that surround us: they can appear in different quantities. One way of abstracting the idea of quantity is precisely counting. When we think of apples on the table, we can consider their individual volumes (another abstraction) or masses (abstraction again), but perhaps we can also consider the number of individual apples, i.e. their count. We recognize some pattern in our world, and create an abstract concept to reflect it.

Now, there is a second common trend: classification, or identification of the equivalence classes of patterns. This sounds complicated and is probably peppered with strict mathematical definitions (listed in the Appendix). However, I am arguing that it is in fact one of the most natural things that people do! This idea was brought to my attention by Prof. Laszlo Babai during one of his lectures on discrete mathematics. We notice same colors and group things based on color, without realizing that we are in fact identifying an equivalence class. We recognize that three apples have the same count as three oranges, thereby identifying an equivalence class among things that can be counted, a class that we simply call 3. The same can be said about 5, and 7, and so on. We identify an abstract equivalence through observation of the natural world.

The third commonality of abstractions is generalization, or cross-applicability, if you wish. Once we develop an abstraction, we start noticing it in new places, and realizing that the same logical process can be repeated and applied anew to a different scenario. First, let me tell you a classic joke.

Working late at night, a mathematician doesn’t notice one of her candles tip over and set some of the papers on the table on fire. Once she realizes what is going on, she rushes to the kitchen, grabs a glass of water, pours it over the table and extinguishes the fire.

A few weeks later she has her friend over for an evening of whiskey and cigars. The careless friend throws a still lit match into the trashcan setting some papers in it on fire. The mathematician dumps out the flaming papers on the table, rushes to the kitchen for a glass of water, and then puts out the fire.

Her puzzled friend asks: “Why did you dump the papers on the table first?”

Mathematician replies: “Oh, I just reduced the problem, to the one I have solved two weeks ago!”

Folklore

This is a classical example of reducing to a previously solved problem, or, thought about slightly differently, of recognizing the applicability of the same abstract pattern to a new case. In our apple arithmetic example, we can think of the following: we have already grasped the abstract notions of the numbers 3 and 5, and the pattern of them forming the number 8 when put together. Now, if we suddenly find ourselves with the same pattern for oranges, we will already know the answer: 3 + 5 = 8. What helps us is the abstraction (the object doesn’t matter, the count does) and its generalization to any type of countable objects.

Thus, while not exactly answering what abstract thinking is, I have outlined three important aspects of it, namely: pattern recognition, equivalence recognition, and generalization.

How does one teach kids to develop abstract thinking, and what does all of this have to do with coding?

We are incredibly lucky here, because a lot of the basics of abstract thinking come to us for free as a perk of being human. Furthermore, a lot of basic children’s literature is already aimed at developing skills tied to pattern and equivalence recognition. The generalization of abstractions, on the other hand, is not always common in early teaching, and it is one of the important aspects of mastering abstract thinking. Kids often struggle with basic algebra concepts, such as a variable or an unknown. What is important is teaching them in a way that allows these notions to be genuinely recognizable as common patterns that simply generalize the rules already learned. In that line of thought, a function f(x)=x+2 can be thought of as the notion of taking some number of objects in a countable class and adding two more of the same object. Adding two apples to however many apples you already have (but recall, the apples did not matter in the end).

So how does coding tie into this entire story? Well, coding in itself is full of abstractions, and therefore presents a rich playground for maturing the concepts and ideas of abstract thinking. However, unlike mathematics or physics, coding has a unique aspect to it that allows us to see practical implications of our abstract reasoning.

It is exciting to see how some words that you wrote turn into a circle on your screen, or a square, or a flag (more on that in the next post). However, it is also important that this exemplification allows kids to solidify and check their abstract reasoning. A for loop is an abstract construction that allows you to repeat an action some prescribed number of times. It is good to have an understanding of this abstraction, but solidifying it by seeing how changing the number of repetitions changes the number of figures drawn is extra nice, as in the toy example below. It brings back that natural observation component of abstract thinking, which should enable a young learner to think creatively and develop a new, graspable connection between abstract generalized concepts and tangible everyday observations.
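As a toy illustration of that last point (my own example, written in C only to keep the code on this blog in one language; with kids one would of course reach for a friendlier drawing environment), here is a loop where changing a single number changes how many “figures” get drawn:

    #include <stdio.h>

    int main(void) {
        int figures = 3;                      /* change this number and re-run */
        for (int i = 0; i < figures; i++) {
            printf("[]\n");                   /* stand-in for drawing one square */
        }
        return 0;
    }

The abstraction (repeat an action a prescribed number of times) stays the same, while the observable output tracks the number, which is exactly the feedback loop between abstract concept and tangible observation described above.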

Conclusion

Coding gives us an opportunity to learn abstract thinking while continuously supporting it with natural observations. In a similar way to cooking and tinkering with LEGO, we get a combination of ideas and observable consequences within one process. We should shift the aim of coding books and boot camps for children from “one true answer” syntax-oriented problems to thinking- and skill-oriented puzzles. The goal of such education is not to foster a particular skill in a programming language, but to create a thinker who can notice patterns and effectively generalize them to new problems encountered.

We have an amazing tool that can easily grasp attention and provide a rich and exciting framework for learning. Instead of shunning it due to novelty or perceived shallowness, we should think about how we can use it to teach and learn what is truly fundamental: abstract thinking!

Appendix

A little bit of dry math with formal definitions for some of the stuff I have been talking about. Keep in mind that these highly formal and abstract definitions in fact tie back to our natural observations.

Definition. A binary relation R is defined as a set of ordered pairs (a,b) of elements a from a set A and b from a set B, i.e. a subset of A\times B. We say that two elements a,b are related, denoted aRb iff (a,b)\in R.

Definition. We call a binary relation R defined between a set A and itself an equivalence relation iff it satisfies the following properties:

  1. Reflexivity: aRa
  2. Symmetry: aRb \implies bRa
  3. Transitivity: If aRb and bRc, then aRc.

Definition. Given a set A, an element a of that set, and an equivalence relation \sim on that set, we call the set [a]=\{x\in A|x\sim a\} the equivalence class of the element a.

Definition. A partition of a set A is a collection of non-empty, pairwise disjoint sets B_1, B_2, ..., B_n s.t. their union equals A.

Theorem. The set of equivalence classes of A under an equivalence relation \sim is a partition of A.

Definition. We call the set of equivalence classes of a set A under an equivalence relation \sim a quotient set, and denote it by A/\sim.
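Example. To tie these definitions back to something tangible, take A=\mathbb{Z} and declare x\sim y iff x-y is even. The relation is reflexive, symmetric and transitive, so it is an equivalence relation. It has exactly two equivalence classes, [0] (the even integers) and [1] (the odd integers), which are disjoint and together cover all of \mathbb{Z}, i.e. they form a partition. The quotient set is \mathbb{Z}/\sim=\{[0],[1]\}: the abstract notion of parity, distilled away from the individual numbers.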

Why any (comprehensive) course on computer programming should cover the basics of C: A story.

Good evening world!

Recently I came across a small task that reinforced my belief in the importance of C programming. The task was the following:

  1. Generate 500,000,000 pseudo-random numbers using the Linear Congruential Generator (LCG) algorithm [1, 2].
  2. Use the Box-Muller transform [3] to get normally distributed random variables (RVs).
  3. Compute the next step in the evolution of a deterministic system, and add noise using the generated normally distributed RV.
  4. Write the result of every 1000th step to a file.

In other words, we need to run a for-loop for 500,000,000 steps, doing some calculations at each step (generating an RV + evaluating the deterministic function), and writing to a file once every 1000 steps.

This doesn’t sound particularly challenging, and the whole thing can be done in less than 80 lines of C code. The same task can also be done in about 45 lines of Python. However, LOC is not the metric I want to look at here. I want to talk about the performance of the code written, and about some general educational caveats.
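For a sense of what the C side looks like, here is a compressed sketch rather than the exact program I timed: the LCG constants are the Park-Miller “minimal standard” from [1, 2], and the deterministic update is a stand-in, since the particular system is beside the point here.

    /* generate.c -- a sketch of the task; build with: gcc -O3 generate.c -o generate -lm */
    #include <stdio.h>
    #include <math.h>

    #define N_STEPS     500000000L
    #define WRITE_EVERY 1000
    #define TWO_PI      6.283185307179586

    static long seed = 1;

    /* Park-Miller LCG: x_{k+1} = 16807 * x_k mod (2^31 - 1), scaled into (0, 1). */
    static double lcg(void) {
        seed = (16807L * seed) % 2147483647L;
        return (double)seed / 2147483647.0;
    }

    /* Box-Muller: two uniforms in (0, 1) -> one standard normal deviate. */
    static double gaussian(void) {
        double u1 = lcg(), u2 = lcg();
        return sqrt(-2.0 * log(u1)) * cos(TWO_PI * u2);
    }

    int main(void) {
        FILE *out = fopen("trajectory.dat", "w");
        if (out == NULL) return 1;

        double x = 0.0;
        for (long i = 0; i < N_STEPS; i++) {
            x = 0.99 * x + 0.1 * gaussian();   /* placeholder deterministic step + noise */
            if (i % WRITE_EVERY == 0)
                fprintf(out, "%ld %f\n", i, x);
        }

        fclose(out);
        return 0;
    }

Timing either version is then just a matter of prefixing the run command with time.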

Is the playing field level?

Let’s talk a tiny bit about optimization, under-optimization and over-optimization here.

The moment I shared this story with my friend, he immediately said that the performance comparison doesn’t make sense. The argument provided was the following: if the Python code takes that much longer to run, it clearly was not written well. I agree that when I was coming up with the comparison I was not using any fancy race-track wheels for Python. The entire script I wrote is as vanilla as Python can possibly get. Does this mean that I am cheating by employing “bad” or “inefficient” Python coding practices? I would say no, or at worst, just a little bit.

In both cases, C and Python, I wrote a vanilla implementation of the given task: no parallelism, no non-standard libraries, no pre-compiled or hardware-optimized shenanigans. Did I manage to cheat a bit? Yes, of course I did: I compiled my C code with the -O3 optimization flag. This, of course, is not the full story either. I ran my Python script naively, invoking python ./generate.py, rather than trying to compile it into an optimized binary and then running that. However, for all of these “sins” I have a quite simple answer: I don’t do any of that with Python 99.9% of the time. I do not compile my Python scripts. I do not roam the web for pre-compiled computational libraries for Python. I do not tend to care that much about performance in the first place when I code in Python.

How is this a conversation about optimization then? Well, I think we need to consider several parameters to be optimized, and then check what we get in terms of relative performance. Hence, I will be thinking about three metrics here:

  1. Human minutes spent writing code (including debugging).
  2. Human minutes spent waiting for the program to finish running.
  3. sys runtime of the programs written.

In the context of these three parameters I can clearly define what I mean by optimizing, over-optimizing and under-optimizing the performance of a task.

Over-optimizing: This is the case when I spend a lot of time writing code that is supposedly great in terms of wall and sys times. Not surprisingly, the majority of over-optimization in my case does not come from assembly injections leveraging the latest microarchitecture features. When I over-optimize, with probability 0.9 it is because I found a paper proposing a fast algorithm and am trying to write it from scratch. Clearly this brings a caveat: asymptotically better performance does not always translate into cycle-count performance on small enough examples.

Optimizing: Once in a blue moon, when working on a one-off personal project, I do hit the right spot: just enough complexity in the implementation to get a good average across the runtimes. Any properly optimized code should be optimized both in terms of human minutes spent writing it and human minutes spent waiting for the results. However, as with anything in the world of computer programming, or life at large, there is a caveat: optimization is context dependent. Spending more development hours on code that has to be reused on a regular basis is worth it, as long as the eventual benefit in runtime pays for it.

Under-optimizing: This is what happens when the deadline is far away. You hack together 25 lines of your favorite vanilla high-level language and let it run overnight, because you still have a week left and one run only takes 14 hours. Surprisingly, I think that from a practical perspective this is more justified than over-optimizing. If I had to choose between code that takes 14 hours to run but gets the job done, and code that takes 12 hours to develop and only 2 to run, I might go for the first one, because at least I can sleep or read during those 12 hours of difference. However, the caveat here is simple: if you need to run the code more than once, the unoptimized runtime will cost you a lot.

Performance Analysis

I was compiling and running all code on my personal laptop. The specs are listed below.

MacBook Pro (Retina, 13-inch, Mid 2014)

  • 2.6 GHz Intel Core i5 (4278U)
  • 8 GB 1600 MHz DDR3 RAM

Runtimes were measured with the time utility.

Runtime comparison for the generator scripts.

As you can see across all of the measured times, the performance differs drastically. This is by no means a shocking or unexpected result, but it matters for the rest of the discussion.

Pedagogical Discussion

This post ultimately is about teaching and learning, so let’s finally talk about why any comprehensive course[F1] on computer programming must cover some basics of the C language.

First, C is a compiled language. While the intricacies of compilation as a process lie beyond the introductory level, the acknowledgement of compilation as a step in the lifecycle of a program is critical. Virtually anything that has to do with computer programming, in its broadest definition, can benefit from a better understanding of the full picture. As an example I can bring up a recent workplace story, where, as we discovered, certain business-logic scripts were ultimately compiled into SQL statements. When the underlying tables changed, the SQL statements became invalid, while the surface-level logic remained perfectly sound. It took a bit of tinkering around to figure out that we in fact had to trigger a re-compile for the SQL to become valid again. Hence, if you have a better knowledge of the full picture, your bug-fixing abilities are better too.

Second, C has great performance metrics. As the first part of this story shows, C does in fact yield quite great performance in its vanilla form. Of course, you have to be mindful of your project scope. In terms of over-optimization failures C is probably at the top of the list, in close competition with C++. Just think of all the linked list implementations ever written in C. Now think of all the doubly linked list implementations, and FFT implementations, and Dijkstra’s algorithm implementations, and so on ad nauseam. Writing code in C oftentimes feels like re-inventing the wheel, in fair part because it is. However, when the task at hand boils down to arguably simple arithmetic operations that need to be performed at a medium scale, writing it up in C is probably the best bet.

Third, C is ubiquitous (unless you are on Windows). If you have a *nix system, it comes with gcc, clang, or some other form of C compiler. No need to download a myriad of heavy IDEs and weird things. To be fair, the same can be said about vanilla Python, which in part is why I love it so much (but I still use Anaconda).

Fourth, C builds discipline and rigor. I am not talking about the painstaking debugging of memory leaks and segfaults, or the arcane magic of pointer arithmetic. Those things are clearly important, but I am talking about very, very basic C. You need to declare variable types. You need to allocate stuff ahead of time. You need to keep track of what moving parts you have in the game. These things amount to cleaner code and better style. You have to at least minimally organize your thoughts before writing in C. Hopefully, that generalizes to all of the code you will ever write.
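Even a made-up toy like the one below forces you to decide on types, sizes and ownership before anything runs, which is the kind of discipline I mean.

    #include <stdlib.h>

    int main(void) {
        int n = 100;                                   /* the size is decided up front     */
        double *samples = malloc(n * sizeof *samples); /* memory is requested explicitly   */
        if (samples == NULL) return 1;                 /* failure is handled, not hidden   */

        for (int i = 0; i < n; i++)
            samples[i] = (double)i / n;                /* every value has a declared type  */

        free(samples);                                 /* and memory is released when done */
        return 0;
    }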

Finally, C is just another programming language. I firmly believe that breadth of programming knowledge is equally important as depth, if not more so, for about 80% of the people who will ever write code. In the job market it is hard to guarantee that your amazing knowledge of C++ will always be in equal demand. In academia you might realize that the lab you just joined does things in Perl. You can probably still write half of your code in Java, but then you need to interface with the rest of the Perl codebase… You get the general idea. On the other hand, the “jack of all trades, but master of none” kind of programmer will be more likely to pick up a new language from the docs, simply because it is needed. In this regard C serves as a good training ground for general programming principles.

Hence, in the end we have a performant language that exposes you to some fundamental programming concepts and builds up a better coding discipline.

References

  1. S. K. Park, K. W. Miller. Random number generators: good ones are hard to find. Comm. ACM (31):10, pp. 1192-1201, 1988. [DOI: 10.1145/63039.63042]
  2. P. A. W. Lewis, A. S. Goodman, J. M. Miller. A pseudo-random number generator for the System/360. IBM Systems Journal (8):2, pp. 136-146, 1969. [DOI: 10.1147/sj.82.0136]
  3. G. E. P. Box, M. E. Muller. A Note on the Generation of Random Normal Deviates. Ann. Math. Statist. (29):2, pp. 610-611, 1958. [DOI: 10.1214/aoms/1177706645]

Footnotes

[F1] By a comprehensive course I mean an introductory sequence on computer programming that runs anywhere from an academic year to a calendar year. Examples would include any “Intro to Computer Science… n = 1, 2, 3, …” sequences, and any bootcamps that aim to teach you computer programming. I do agree that there are shorter courses that clearly cannot cover C. However, I would also argue that such courses are by no means comprehensive.

Biophysical Society Annual Meeting 2019

Evening world! (I have rewritten this 5 times at this point)

My visit to the BPS AM’19 was very short, but nevertheless I am happy that I had the opportunity to travel to this conference. Those few days were eventful, and I met and talked with multiple amazing people. Below I would like to share some of my thoughts on the talks, presentations and posters I saw and engaged with. Since I left early Monday morning, I cannot comment on most of the content presented, so take this as a short snippet of my thoughts and not a comprehensive review.

Style

I firmly believe that style and presentation are extremely important in the scientific world, especially at conferences. Good content organization and clear, simple-to-follow figures greatly improve the audience’s ability to engage with the material. I also think that organizing your content in a parsable way is the minimal courtesy you owe your readers.

I will start with some positive notes and remarks. First of all, black backgrounds in presentations are great! Of course there is the caveat of choosing appropriate color schemes for your figures (which I bet will deter people from going this route, since who wants to spend time making two sets of figures, one for the presentation and one for the paper), but when done well a black background makes the content shine. The few presentations at the AM that had black backgrounds warmed my heart.

Next, I need to applaud the usage of space on the slides. I haven’t seen a single overcrowded slide (maybe I just got lucky). Spreading out plots and figures so that they are readable is absolutely essential. Space usage is something I’ll come back to when talking about posters (including mine).

Now, let me move on to my incoherent rambling about style. I am in the process of learning Beamer, and it is definitely on the list of things I plan to master by the end of the year. It seems that the usage of LaTeX and LaTeX-based tools and packages at this conference is almost exclusive to people coming in from the theory angle. The only talk I saw for which the slides were Beamer-generated was a theory talk on inverse temperature dependence. I am not too surprised that this is the case, for two reasons. On one hand, the majority of CS, math and physics theory people would learn and use LaTeX anyway, which makes copy-pasting and rearranging stuff easier. On the other hand, picking up LaTeX just for presentations, especially if they are not peppered with formulas all the way through, seems like overkill. I do hold a firm belief that people working in STEM fields should pick up some basic LaTeX skills as a part of their undergrad experience or early professional life, but I would not go as far as to claim that we need to build everything in LaTeX. This is mostly an observation, as I reserve my LaTeX evangelism for a different post and time.

One thing that I do not grasp at all is why in 2019 we still have Comic Sans being used in presentations. Honestly, there are plenty of great fonts that can catch people’s eyes if you are going for catchy. I do not share the ridiculous disgust towards Comic Sans, but even the original design specs say it is supposed to be an informal, children-oriented font. Of course, great research typed in Wingdings will be ridiculous to read through, but that won’t change the fact that the underlying work is important. However, font choices like this scream lack of professionalism and absolute style illiteracy to me. Just to reiterate the point: your style choices affect my perception of your attitude and skill, they do not affect my perception of your research work. Please, stop the Comic Sans abuse at science conferences!

Whew, okay, now let’s move on to posters. First of all, while still far from perfect, I am fairly satisfied with the way my poster turned out. My main concerns are the crowded middle region, where I had to have partially overlapping plots, and some VMD renders that were not properly cleaned up, which I simply did not have enough time to redo due to data availability. Modulo those nuances, I was fairly happy with the color scheme and organization, including some accidental design features. Coloring both the protein renders from VMD and the lines in the plots with the same color convention was definitely a good move, as it allowed for seamless parsing of the plots and saved much-needed space by reducing the number of legends I had to carry. An accidental feature was positioning the initial question with its illustrations at roughly the same horizontal level as the conclusions. Since both contain a very similar illustration, it was easy to visually grasp what had changed over the course of our work. I am not fully pleased with some of the text in the poster, and I definitely would have wanted another 2-3 days of re-reading and editing it, but alas.

The vast majority of the posters I saw at the conference were great: clear organization, good posting sentences, and clear figures that stood out from a distance. However, there were a few common issues that I noticed, predominantly in undergraduate posters. The first is a simple lack of content, or rather an overabundance of white space. I have seen one too many posters with 2 or 3 figures, 3-4 paragraphs of text, and a canvas large enough to fit at least 3 more figures. I assume that this problem partially stems from the poster size expectation. We all think of a 36″ x 24″ poster as the standard. However, I’d rather see more 24″ x 18″ posters that are packed with content than 36″ x 24″ ones full of wide margins. The worst thing is that sometimes a poster with huge margins will still have overcrowded plots in its middle. No bueno. Spacing out and arranging stuff is a pain in the butt, I know this myself. Plots come in weird proportions, plotting software is a monster to tame, and things tend to look different on your screen vs. a matte letter print vs. a glossy/matte poster print. However, the grind against just going with “Egh, good enough” is an essential step in improving one’s style. My work on this poster taught me a good principle that my PI learned from his PI, so let me pass it on to you.

Make your poster, then print it out on Letter/A4 paper. If you can read the main figures clearly, then the font size, line widths and spacing are good. If you are having trouble making sense of it on this print, go back to the drawing board.

NS paraphrasing Esmael Haddadian paraphrasing someone else…

On the other hand, I was impressed by some very creative approaches to poster presentations. Namely, the poster on Structure of a Non-canonical Middle Domain in Protocadherin-15 by Brandon L. Neel had an interesting twist to it. In addition to the regular static imagery, there was also a tablet set up so that interested people could watch MD trajectories evolve over time. I think that with the technology we have at hand right now, the people who go the extra mile to get their point across and present data in new and informative ways are the people with the highest potential for impact.

On this note, I will wrap up my discussion of the style and presentation choices and do a quick dip into some of the research related moments and ideas.

Research

An obligatory disclaimer: I am not a biophysicist by training. I am a computer scientist; I code things, analyze how fast they run, and think about how mathematical models can help us solve real-world problems. Thus, the rant below is not necessarily driven by deep professional concerns from a biophysics angle, but rather by some common-sense scientific practices and beliefs.

I attended some talks by scientists from the D. E. Shaw research group, and while the content was impressive, I do feel deeply troubled by the messaging and the approaches. Let me elaborate a little bit. Science to a large degree consists of trial and error, especially repeated trial and repeated error. This applies to both experimental and simulation-based results. If you were to measure or simulate something once, you would not publish a paper claiming it as a result. (Well, sure, in some rare cases you would, but that’s a more involved discussion.) Repeatability and multiple data points are essential to any kind of result that you are willing to cite as evidence for your hypothesis. Now, let us take this one step further. If I say that I did 700 independent experiments, and 95% of them yield a certain result with an error margin of epsilon, you will probably agree that this sounds like a solid dataset. However, what if I also tell you that at this point in time no other research group can replicate my results? Sure, that also happens; sometimes the machinery is too complex and in many senses unique to the experiment. After all, neither the LHC nor LIGO is easy to come by at every other lab. On the other hand, those are large projects open to multiple collaborators within the field. They are not exactly a proprietary system with access limited to the employees of a particular private company. I think you see where I am going with this.

The work done by the D. E. Shaw group is marvelous. However, I do have an issue with the irreproducibility of the results. Say someone really wants to push for a certain drug affecting the catalytic cycle of a protein involved in a disease. Assume that this someone has the ability to simulate the said catalytic cycle at high resolution over an appropriate timescale. Now, what if they accidentally mess up the parameters of their simulation? We get a perfect proof of effectiveness backed up by 500+ simulations, which then turns into active drug development. If 3 years down the line we discover that this in fact is bogus and doesn’t work, well, we have wasted 3 years. If we don’t, then we have a new drug, which might still end up not being quite as good as promised.

This is a long-stretch hypothetical, but it has some ground behind it. Datasets have been cooked before (the example I am thinking of is a very particular test/train split in a study on the effects of certain drug combinations in cancer treatment), especially when money is on the line, and when pharmaceutical companies are in play there will always be money on the line. Thus, while I hope that the researchers are keeping level heads and cold judgement, I cannot simply dismiss the concern about the exclusivity of the technology that puts the MD research at D. E. Shaw beyond the cutting edge.

I believe that we should strive for repeatability and cross-validation in science. There will always be obstacles on this path, and I do believe in intellectual property rights, patents and related concepts. However, I think that a balance should be maintained, and tools could and should be shared. After all, making the first hammer is an invention, but once we have one, it’s about who can do the most with it.

Concluding remarks

I started writing this almost a month ago, so I am not sure how connected these pieces feel. This was a long ramble about random stuff, so critique away.

Cheers, ns.