Discrete Mathematics Vignettes: Counting

This spring semester I am co-teaching a first-year undergraduate course in computer science at Rice University. The name of the course is COMP 182: Algorithmic Thinking. In reality the name of the course can be somewhat misleading, as I would rather title it “Introduction to Discrete Mathematics and Algorithm Design and Analysis”, but it doesn’t quite roll off the tongue as easily. However, this is not a post about the course or teaching experience, this is a post about one of the upcoming topics in the class: counting.


What is counting? In general in discrete mathematics counting concerns deriving mathematical expressions which capture the total number of certain objects that exist. Of course, since we can always make an identical copy of an object and thus increase the count by one, what we really mean is counting the objects up to an appropriate notion of isomorphism, or put plainly, counting distinct objects up to whatever notion of being distinct we choose.

While deriving expressions that capture the counts of objects is an important task, what I want to talk about today goes one step beyond the basics of counting, and is rather aimed to provide you with some applied tools you can sue in the future (of course, in the spirit of Alexander Razborov, here by applied I mean applied to the study of mathematics). However, in order to make this primer relatively complete we will start with some basic results in counting, and then switch to the applications of counting techniques and ideas to more interesting problems.


This is the classic problem with which most counting modules of discrete math books start: Find the number of ways in which n distinct objects can be arranged in a line. Key information here is that we are talking about distinct objects and an arrangement in a line. It is easy to show that the total number of such permutations is n!=1\cdot 2\cdot 3\cdot\ldots\cdot n. We can do so by observing that given an ordering of n-1 objects we have n places to put the new object in, and this holds for every ordering of n-1 objects. More formally, if we denote the number of permutations of n objects by P(n) then we have P(n)=nP(n-1). Also, while we are here, it is useful to note that there is exactly one way of putting no objects down at all, or in other words 0!=1.

Now, we can take a look at the number of permutations in a slightly different way. For the sake of brevity we will be denoting the set of n positive integers as [n]=\{1, 2, 3, ..., n\}. Consider the set of functions f:[n]\to[n], we will call this set \mathcal{F}_n=\{f|f:[n]\to[n]\}. Recall that a function is an assignment of outputs for any given input, such that exactly one output value is assigned to any input value. Thus, we can see that |\mathcal{F}_n|=n^n, as for any given input i\in[n] we have n choices of the output value, and since all choices are independent of each other we have a total of n^n functions. A reasonable follow up question, is to ask how many of those functions are bijections. Recall that we call a function a bijection iff for any two distinct input values it produces distinct outputs, and for every possible output value there exists an input value that maps to it via our function. In other words, when we look at \mathcal{F}_n bijections are precisely the functions that assign to each input i\in[n] a distinct output j\in[n] s.t. for any pair of i_1\neq i_2 we have f(i_1)\neq f(i_2). Note, that since the functions are from [n] to [n] it follows that such an assignment has to use all possible output values. If we look closer at the structure of such bijection, we can realize that it is essentially a permutation of n distinct objects, since we can think of input values as positions on the line, and outputs as the distinct objects we place in those positions. Hence, it follows that there is n! bijections in \mathcal{F}_n.

What we saw above is the fundamental use of counting in combinatorics, we find a combinatorial object of interest (say bijections on the set of n elements) and we count the number of such objects. Often we will see seemingly unrelated families of objects that result in the same total counts, offering us different combinatorial interpretations of the same formula. Furthermore, once we are able to count the number of objects with a given property (say being a bijection) among the total collection of objects (say functions between two sets of n objects) we can immediately get the probability of an object selected uniformly at random to possess the property of interest. In the case of the bijections above we have that Pr(f\in\mathcal{F}_n,\,f\text{ is a bijection})=\frac{P(n)}{|\mathcal{F}_n|}=\frac{n!}{n^n}. We will see why such simple observations can be of use a bit later in the post.

Count once, count twice

Whenever we count number of certain objects we can sometimes derive the formula based on different inputs. This fact becomes useful in cases when we want to show equality between two expressions. In other words, if we want to prove that two expression are equal, one way to do so can be expressing the count of a particular combinatorial object in two ways, each corresponding to one of the expressions. A quick simple example of this proof technique is given by the handshake lemma in graph theory. Consider a simple graph G=(V, E). We want to evaluate the sum of degrees of all vertices \sum_{v\in V}\mathrm{deg}(v). In order to do so, we can think of a counting argument. What is a degree of any given vertex? It is exactly the number of edges incident to that vertex. Hence, if we sum degrees of all the vertices in a graph it follows that we are indirectly counting the edges of the graph. However, each edge in a simple graph is incident to exactly two vertices. This means that when we summed our degrees, we actually counted occurrence of each edge exactly twice (once per endpoint). Thus, it follows that \sum_{v\in V}\mathrm{deg}(v)=2|E|. While this result at first might not appear as a counting argument, if you think about it for a bit you can recognize that we essentially counted the number of edges in a graph in two ways, once by considering degrees of vertices and once, by simply counting all edges.

Another common example for double counting comes from the problem of finding the number of subsets of a set with n elements. Specifically, on one hand each subset is determined by whether each element is in or out of it, i.e. for each element in [n] we have two possible choices either it is in the subset (1) or it is not (0), thus the total number of possible subsets is the number of all possible strings of n characters over the alphabet of {0, 1}, which is 2^n. On the other hand, each subset has some size k where 0\leq k\leq n, and we know that the number of ways to pick k elements from n is \binom{n}{k}. Hence, we can conclude that \sum_{k=0}^n\binom{n}{k}=2^n.

A fun extension of the argument above can give us a way to count a slightly more complicated set of objects. We are now interested in how many subsets of a set of n elements have an even number of elements in them. On one hand we can express this number as the sum of \binom{n}{2k} with k\leq n/2, while on the other hand we can consider the following argument: take an arbitrary subset of a set with n-1 elements, now there is a unique way to extend it to an even sized subset of an n element set. Namely, if our starting subset has an even number of elements, we have to exclude, i.e. assign value 0, to the last element in the n element set, if the starting subset has an odd number of elements then we need to add the last element to it to make the size even, i.e. we assign the value of 1 to the last element. It follows that for any of the 2^{n-1} subsets of a n-1 element set, there is a unique extension to an even sized subset of n element set. Hence, we can conclude that \sum_{k=0}^{n/2}\binom{n}{2k}=2^{n-1}, or in other words, exactly half of all subsets have an even number of elements in them. This is a nice confirmation to an intuitive guess we might have had originally, but sadly the argument does not extend for the number of subsets with a multiple of 3 elements in them. Let’s formulate this more general version of the question. Let S(n, k) be the number of subsets of a n element set that have size divisible by k. We have from our previous work the values S(n, 1)=2^n, and S(n, 2)=2^{n-1}. It would be nice if S(n, 3) was simply 2^n/3, but sadly that is not an integer. However, with a little bit of elbow grease and usage of binomial theorem we can show that |S(n, 3)-\frac{2^n}{3}|< 1. I will leave the solution to this problem as an exercise for a curious reader.

To summarize, we just saw how counting the same combinatorial objects from two perspectives can serve us as a proof technique for showing equalities between expressions. Thus, we are starting to explore the field of applications that counting techniques provide us.

It exists! But I can’t show you an example

Recall how in the bijection counting question we brought up the probability of a randomly picked function from [n] to [n] to be a bijection. Besides the classical uses of probabilities, such as analyses of randomized algorithms, we can also wield probability as a tool for showing existence of objects with desirable properties. This technique is called probabilistic method and it was pioneered in combinatorics by Paul Erdős. The premise for the application is quite simple at the first glance, in order to show that an object x with a desirable property exists, we will show that the probability that a random element x belongs to the set of elements with desirable property x\in A, where x is picked from out universe \Omega is greater than 0.

In order to make this more concrete we will work through an application of this method to a graph theoretic question. Consider a complete graph on n vertices K_n. We will call a tournament an orientation of the complete graph, i.e. for every edge {u, v} in a complete graph we will pick a direction and replace it with a directed edge u\to v or v\to u. It is easy to check that the total number of tournaments on n vertices is 2^{\binom{n}{2}}=2^{\frac{n(n-1)}{2}}. Now, we are interested in determining whether for n\geq 3 there exists a tournament with n vertices that contains at least (n-1)!/2^n Hamiltonian cycles.

To start, consider the sample space of all tournaments on n vertices and assume the uniform distribution on it. Now, we want to introduce a random variable X that counts the number of Hamiltonian circuits (recall that a random variable is a function from the sample space to some set, in our case we have X:\Omega\to\mathbb{N}) in a tournament. Fix a node and consider some permutation of the remaining nodes in the tournament (we have a total of (n-1)! such permutations), we want to know when there is a Hamiltonian cycle that achieves that permutation. Clearly we need to have a directed edge from every predecessor to its successor in the cycle, hence in total we need n edges to point in the correct direction determined by our permutation. Now, let’s define an indicator random variable X_\sigma which is equal to 1 whenever the Hamiltonian cycle defined by the permutation \sigma:[n]\to[n] exists in a tournament. We want to compute Pr(X_\sigma =1). In order to do so, we note that we need exactly n edges to have fixed orientation, and hence it follows that the probability of such an event occurring is 1/2^n (since we can think of flipping a coin for orientation of each edge, it’s up to reader to check that this process indeed results in uniform distribution over tournaments). Now, by construction we have X=\sum_{\sigma}X_\sigma where \sigma goes over all possible permutations on n-1 vertices. Hence, by the linearity of expectation we have that \mathbb{E}[X]=\sum_{\sigma}\mathbb{E}[X_\sigma]=\sum_{\sigma}Pr(X_\sigma =1). Now, using our previous result we can conclude that \mathbb{E}[X]=(n-1)!/2^n. Hence, since the expected value is (n-1)!/2^n it follows that there exists some tournament that has at least (n-1)!/2^n Hamiltonian cycles in it.

While it is a stretch to claim that this result is purely enabled by our ability to count combinatorial objects, a lot of key ingredients to the proof rely on our ability to count. Furthermore, this showcases the power of probabilistic method as a non-constructive proof technique for showing existence. In other words, if we can properly count the number of certain objects, we may have a hope of proving interesting results about existence of objects with certain properties.

Instead of conclusion

Of course this is a cursory and incomplete account of counting techniques and applications in discrete mathematics. We haven’t touched upon some key counting ideas such as the inclusion-exclusion principle nor did we discuss any of the more advanced approaches such as generating functions. Furthermore, my account of probabilistic method was purposely short and lacked some of the proper rigor required. However, my aim here was to showcase some of the fun applications of counting techniques and to kindle the spark of interest in the reader. I am not certain whether I will write a follow up to this post any time soon, but stay tuned in case if I do.

Cheers, and don’t forget to count your chickens both before and after they hatch! 🐣

Teaching kids to code: I’m a computer scientist and I think it teaches kids important skills

Good afternoon world!

To those of you who know it doesn’t come as a surprise that I care a lot about teaching, in particular mathematics and computer science. Recently while browsing r/programming I came across an article that gives a perspective of a software developer on why you should not teach your kids how to code. I think the article brings up several good points, but is not quite complete and draws a conclusion with which I thoroughly disagree. Thus, I have decided to present my take on the issue, and point out a few nuances which I think are important to consider when making claims about usefulness of teaching coding.

Why is teaching coding a bad idea?

First of all, I need to say that the article is well written and definitely tackles some major issues with the hype around learning how to code. However, I would like to take a closer look at the following paragraph.

A former co-worker of mine was trained at a coding boot camp with the motto “Coding Is the New Literacy”. That sentiment is at the heart of all the programming books and games. The description on one popular book says starting coding early is “essential to prepare kids for the future”. This gives the impression that not teaching kids to code is somehow equivalent to not teaching them to read.

Joe Morgan, Slate.com

I think this is a valid criticism of aggressive advertisement of coding boot camps and resources target at children. However, the language of “new literacy” can be taken apart from a different angle. Just like we learn basic reading skills, we also learn basic mathematical reasoning at the young age. In many ways coding is a field that brings together the abstract mathematical reasoning and applied results. Hence, while the branding of “new literacy” is misleading, we can think of coding as a rather “old literacy” repackaged into a modern set of tools and scenarios.

In particular, a good amount of the critique presented in the article is aimed at learning syntax of particular language rather then general problem solving. I do agree that teaching coding, regardless of the age group in fact, should be aimed at cultivating problem solving skills and developing abstract thinking. However, the second point is exactly why I disagree with the article’s author. While we talk about problem solving a lot, it seems like a common pattern to think of teaching problem solving in the context of applied life skills: assembling furniture, cooking, playing with LEGO, etc. However, what many of these examples lack is the abstraction aspect that is essential to mathematics and computer science. Even the notion of something so natural as a natural number hides in itself a deep abstraction leap that is often taken for granted. When we think of the number 3, we are thinking of an abstract notion of count. Three apples are inherently different from three oranges or three chairs, but we are thinking of some abstract property that unites all of these examples, namely the count. The number 3 does not exist on its own, we can’t create a palpable 3, but we still are capable of thinking and knowing exactly what we mean by it.

Hence, mathematics and by a natural extension coding is not only about problem solving, drive and perseverance. It is also about abstract thinking, which is something that needs to be cultivated early on. I have encountered multiple people who struggle in college level proof-based mathematics classes, because the school system has failed at teaching them abstract thinking and reasoning to a satisfactory degree. I want to reiterate, that it is not a flaw within those people, and it is not some special quirk of mathematics as a subject. Anyone can learn mathematics, and everyone should learn some basic skills from it. The most valuable skill being exactly the power of abstract thinking.

So what exactly is abstract thinking?

It is hard to define exactly what do I mean by abstract thinking, but there are a few common trends that occur throughout examples of it. First, there is a pattern recognition part of any abstraction. Namely, an abstract property arises often as a natural recognition of a pattern in observed world. For example, we can go back to counting example. We observe a certain natural property of the objects that surround us. They can appear in different quantities. One way of abstracting the idea of quantity is precisely counting. When we think of apples on the table, we can consider their individual volumes (another abstraction) or masses (abstraction again), but perhaps we can also consider the number of individual apples, i.e. their count. We recognize some pattern to our world, and create an abstract concept to reflect it.

Now, there is a second common trend, classification or identification of the equivalence classes of patterns. This sounds complicated and is probably peppered with strict mathematical definitions (listed in Appendix). However, I am arguing that in fact it is one of the most natural things that people do! This idea was brought to my attention by prof. Laszlo Babai during one of his lectures on discrete mathematics. We do notice same colors, and group things based on the color, without realizing that in fact we are identifying an equivalence class. We do recognize that three apples have the same count as three oranges, therefore identifying an equivalence class among things that can be counted, a class that we simply call 3. The same can be said about 5, and 7 and so on. We identify an abstract equivalence through observation of natural world.

The third commonality of abstractions is generalization or cross-applicability, if you wish. Once we develop an abstraction, we start noticing it new places, and realizing that the same logical process can be repeated and applied anew to a different scenario. First, let me tell you a classic joke.

Working late at night, a mathematician doesn’t notice how one of her candle tips over and sets some of the papers on the table on fire. Once she realizes what is going on, she rushes to the kitchen grabs a glass of water, pours it over the table and extinguishes the fire.

A few weeks later she has her friend over for an evening of whiskey and cigars. The careless friend throws a still lit match into the trashcan setting some papers in it on fire. The mathematician dumps out the flaming papers on the table, rushes to the kitchen for a glass of water, and then puts out the fire.

Her puzzled friend asks: “Why did you dump the papers on the table first?”

Mathematician replies: “Oh, I just reduced the problem, to the one I have solved two weeks ago!”


This is a classical example of reducing to the problem previously solved, or if thought about slightly differently, recognizing the applicability of the same abstract pattern to a new case. In our apple arithmetic example, we can think of the following: we already realized the abstract notion of the numbers 3 and 5, and the pattern of them forming the number 8 when put together. Now, if we suddenly find ourselves with the same pattern for oranges, we already will know the answer 3 + 5 = 8. What helps us is the abstraction (the object doesn’t matter, the count does) and its generalization to any type of countable objects.

Thus, while not exactly answering what abstract thinking is I outline three important aspects of it, namely: pattern recognition, equivalence recognition, and generalization.

How does one teach kids to develop abstract thinking and what all of this has to do with coding?

We are incredibly lucky here, because a lot of basics of abstract thinking come to us for free as a perk of being human. Furthermore, a lot of basic children literature is already aimed at developing skills tied to pattern and equivalence recognition. The generalization of the abstractions on the other hand is not always common in early teaching, and is one of the important aspects of mastering abstract thinking. Kids would often struggle with basic algebra concepts, such as a variable or unknown. What is important is teaching them in a way that allows these notions to be genuinely recognizable as common patterns, that simply generalize the rules already learned. In that line of thought, a function f(x)=x+2 can be though of as the notion of taking some number of objects in a countable class and adding two more of the same object. Adding two apples to how many apples you already have (but recall, apples did not matter in the end).

So how does coding tie into this entire story? Well, coding in itself is full of abstractions, and therefore presents a rich playground for maturing the concepts and ideas of abstract thinking. However, unlike mathematics or physics, coding has a unique aspect to it that allows us to see practical implications of our abstract reasoning.

It is exciting to see how some words that you wrote turn into a circle on your screen, or a square, or a flag (more on that in the next post). However, it is also important that this exemplification allows kids to solidify and check their abstract reasoning. A for loop is an abstract construction that allows you to repeat an action some prescribed number of times. It is good to have an understanding of this abstraction, but solidifying it by seeing how changing the number of repetitions, changes the number of figures drawn is extra nice. It brings back that natural observation component to the abstract thinking, which should enable a young learner to thinking creatively and develop a new graspable connection between abstract generalized concepts and tangible everyday observations.


Coding gives us an opportunity to learn abstract thinking while continuously supporting it with natural observations. In the similar way to cooking and tinkering with LEGO, we get a combination of ideas and observable consequences within one process. We should shift the aim of coding books and boot camps for children from “one true answer” syntax oriented problems, to thinking and skills oriented puzzles. The goal of such education is not to foster a particular skill in a programming language, but to create a thinker, who can notice patterns and effectively generalize them to new problems encountered.

We have an amazing tool that can easily grasp attention and provide a rich and exciting framework for learning. Instead of shunning it due to novelty or perceived shallowness, we should think about how we can use it to teach and learn what is truly fundamental: the abstract thinking!


A little bit of dry math for the formal definitions to some of the stuff I have been talking about. Keep in mind that these highly formal and abstract definitions in fact tie back to our natural observations.

Definition. A binary relation R is defined as a set of ordered pairs (a,b) of elements a from a set A and b from a set B, i.e. a subset of A\times B. We say that two elements a,b are related, denoted aRb iff (a,b)\in R.

Definition. We call a binary relation R defined on the pair of sets A and A an equivalence relation iff it satisfies the following properties:

  1. Reflexivity: aRa
  2. Symmetry: aRb \implies bRa
  3. Transitivity: If aRb and bRc, then aRc.

Definition. Given a set A, an element of that set a and an equivalence relation on this set \sim. We call the set [a]=\{x\in A|x\sim a\} the equivalence class of the element a.

Definition. A partition of a set A is collection of disjoint sets B_1, B_2, ..., B_n s.t. their union equals A.

Theorem. The set of equivalence classes of A under an equivalence relation \sim is a partition.

Definition. We call the set of equivalence classes of a set A under an equivalence relation \sim a quotient set, and denote it by A/\sim.

Why any (comprehensive) course on computer programming should cover basics of C: A story.

Good evening world!

Recently I came across a small task that reinforced my belief in the importance of C programming. The task was the following:

  1. Generate 500,000,000 pseudo-random numbers using Linear Congruential Generator algorithm [1, 2].
  2. Use Box-Mueller transform [3] to get normally distributed random variables (RV).
  3. Compute the next step in the evolution of a deterministic system, and add noise using the generated normally distributed RV.
  4. Write the result of every 1000th step to a file.

In other words, we need to run a for-loop for 500,000,000 steps, doing some calculations (generating RV + evaluating deterministic function), and writing to a file once in 1000 steps.

This doesn’t sound particularly challenging, and the whole thing can be done in less than 80 lines of C code. Same task can also be done in about 45 lines of Python. However, LOC is not the metric I want to look at here. I want to talk about performance of the code written, and some general educational caveats.

Is the field set level?

Let’s talk a tiny bit about optimization, under-optimization and over-optimization here.

The moment I shared this story with my friend, he immediately said that the performance comparison doesn’t make sense. However, the argument provided was the following: if the Python code takes that much longer to run, it clearly was not written well. I agree, when I was coming up with the comparison I was not using any fancy race-track wheels for Python. The entire script I wrote is as vanilla Python, as one can possibly imagine. Does this mean that I am cheating by employing “bad” or “inefficient” Python coding practices? I would say no, or at worst, just a little bit.

In both cases: C and Python, I wrote a vanilla implementation of the given task. Hence, no parallelism, no non-standard libraries, no pre-compiled/hardware optimized shenanigans. Did I manage to cheat a bit? Yes, of course I did, I compiled my C code with -O3 optimization flag. This of course is not the full story either. I did run my Python script naively invoking python ./generate.py rather than trying to compile it into optimized binary and then running it. However, for all of these “sins” I have a quite simple answer: I don’t do that with Python 99.9% of the time. I do not compile my Python scripts. I do not roam the web for pre-compiled computational libraries for Python. I do not tend to care that much about performance in the first place, when I code in Python.

How is this a conversation about optimization then? Well, I think we need to consider several parameters to be optimized, and then checkout what we get in terms of the relative performance. Hence, I will be thinking about 3 metrics here:

  1. Human minutes spent writing code (including debugging).
  2. Human minutes spent waiting for the program to finish running.
  3. sys runtime of the programs written.

In the context of these 3 parameters I can clearly define what I mean by optimizing, over-optimizing and under-optimizing performance of a task.

Over-optimizing: This is the case when I will spend a lot of time writing code that supposedly is great in terms of wall and sys times. Not surprisingly majority of the over-optimization in my case does not come from the assembly injections leveraging latest microarchitecture features. When I over-optimize with probability 0.9 it is due to me finding a paper proposing a fast algorithm that I am trying to write from scratch. Clearly this brings a caveat: asymptotically better performance, does not always translate into cycle-count performance on small enough examples.

Optimizing: Once in a blue moon, when working on a one-off personal project, I do hit the right spot. Just enough of complexity in the implementation to get a good average for the runtimes. Any properly optimized code should be optimized both in terms of human minutes spent writing it, and human minutes spent waiting for the results. However, as with anything in the world of computer programming, or life at large, there is a caveat: optimization is context dependent. Spending more development hours over code that has to be reused on a regular basis is worth it, as long as the eventual benefit in runtime pays for it.

Under-optimizing: This is what happens when the deadline is far away. Hacking together 25 lines of your favorite vanilla high-level language, and letting it run overnight, because you still have a week of time left, and one run only takes 14 hours. Surprisingly, I think that from a practical perspective this is more justified than over-optimizing. If I had to choose between code that takes 14 hours to run, but gets the job done, and code that takes 12 hours to develop and only 2 to run, I might go for the first one, because at least I can sleep or read for those 12 hours of difference. However, the caveat here is simple: if you need to run the code more than once, the unoptimized runtime will cost you a lot.

Performance Analysis

I was compiling and running all code on my personal laptop. The specs are listed below.

MacBook Pro (Retina, 13-inch, Mid 2014)

  • 2.6 GHz Intel Core i5 (4278U)
  • 8 GB 1600 MHz DDR3 RAM

Runtimes measured with time utility.

Runtime comparison for the generator scripts.

As you can see all across the times, the performance differs drastically. This is by no means a shocking or unexpected result, but it matters for the rest of the discussion.

Pedagogical Discussion

This post ultimately is about teaching and learning, so let’s finally talk about why any comprehensive course[F1] on computer programming must cover some basics of C language.

First, C is a compiled language. While the intricacies of compiling as a process lie beyond the introduction level, the acknowledgement of compilation as a step in a lifecycle of a program is critical. Virtually anything that has to do with computer programming in its broadest definitions can benefit from a better understanding of the full picture. As an example I can bring up a recent workplace story, where as we discovered certain business logic scripts where ultimately compiled into SQL statements. When the underlying tables changed, SQL statements became invalid, while the surface level logic remained perfectly sound. Thus, it took a bit of tinkering around to find out that in fact we had to trigger a re-compile for the SQL to become valid again. Hence, if you have a better knowledge of the full picture, then your bug fixing abilities are also better.

Second, C has great performance metrics. As the first part of this story shows, C does in fact yield quiet great performance in its vanilla form. Of course you have to be mindful of your project scope. In terms of over-optimization failures C is probably at the top of the list in close competition with C++. Just think of all the linked list implementations ever written in C. Now, think of all double linked list implementations, and FFT implementations, and Dijkstra’s algorithm implementations, and so on ad nauseam. Writing code in C oftentimes feels like re-inventing the wheel. In fair part because it is. However, when the task at hand boils down to arguably simple arithmetic operations that need to performed at medium scale, writing it up in C is probably the best bet.

Third, C is ubiquitous (unless you are on Windows). If you have *nix system it comes with either gcc or clang or some other form of C compiler. No need to download a myriad of heavy IDEs and weird things. To be fair the same can be said about vanilla Python, which in part is why I love it so much (but still use Anaconda).

Fourth, C builds discipline and rigor. I am not talking about painstaking debugging of memory leaks and segfaults. I am not talking about arcane magic of pointer arithmetic. Those things are clearly important, but I am talking about very very basic C. You need to declare variable types. You need to allocate stuff ahead of time. You need to keep track of what moving parts you’ve got in the game. These things amount to cleaner code and better style. You have to at least minimally organize your thoughts before writing in C. Hopefully, that generalizes to the same concept for all of the code you will write.

Finally, C is just another programming language. I firmly believe that breadth of programming knowledge is equally if not more important than depth for about 80% of people who will ever write code. In the job market it is hard to guarantee that your amazing knowledge of C++ will always be equally demanded. In the academia you might realize that the lab you just joined does things in Perl. You can probably still write half of your code in Java, but then you need to interface to the rest of the Perl codebase… You get the general idea. On the other hand, “the jack of all trades, but master of none” kind of programmer will be more likely to pick up a new language from the docs, because it is needed. In this regard C serves as a good training ground for general programming principles.

Hence, in the end we have a performant language that exposes you to some fundamental programming concepts and builds up a better coding discipline.


  1. S. K. Park, K. W. Miller. Random number generators: good ones are hard to find. Comm. ACM (31):10, pp. 1192-1201, 1988. [DOI: 10.1145/63039.63042]
  2. P. A. W. Lewis, A. S. Goodman, J. M. Miller. A pseudo-random number generator for the System/360. IBM Systems Journal (8):2, pp. 136-146, 1969. [DOI:10.1147/sj.82.0136]
  3. G. E. P. Box, M. E. Muller. A Note on the Generation of Random Normal Deviates. Ann. Math. Statist. (29):2, pp. 610-611, 1958. [DOI: 10.1214/aoms/1177706645]


[F1] By comprehensive course I mean an academic to a calendar year long introductory sequence on computer programming. Examples would include any “Intro to Computer Science… n = 1, 2, 3, …” sequences, and any bootcamps that aim to teach you computer programming. I do agree that there are shorter courses that clearly cannot cover learning C. However, I would also argue that such courses by no means are comprehensive.


General definitions

From Wikipedia:

In mathematics, a set is a collection of distinct objects, considered as an object in its own right. For example, the numbers 2, 4, and 6 are distinct objects when considered separately, but when they are considered collectively they form a single set of size three, written {2, 4, 6}. The concept of a set is one of the most fundamental in mathematics. Developed at the end of the 19th century, set theory is now a ubiquitous part of mathematics, and can be used as a foundation from which nearly all of mathematics can be derived. In mathematics education, elementary topics from set theory such as Venn diagrams are taught at a young age, while more advanced concepts are taught as part of a university degree.

To break this down into simpler terms there are two important aspects of what constitutes a set:

  1. A set is a collection of distinct objects.
  2. A set itself constitutes an object, i.e. we can think of it as a tangible collection.

An example of a set can be pizza offerings at Giordano’s (a pizzeria in Chicago). This set contains distinct elements: Pepperoni pizza, Supreme pizza, Goat cheese and spinach pizza, Italian sausage pizza, Margherita pizza; and is in itself an object: a pizza menu.

The code below illustrates how we can declare a set in Python.

my_set = {0,3,4,0,7,9,13,35,0}
{0, 3, 4, 35, 7, 9, 13}

We can see that in fact, even if we declared some non-distinct (i.e. repeated) elements, the set doesn’t contain them, as evidenced by the print() function.

Set Membership and Subsets

Given an object and a set we can test whether this object belongs to the given set. This is a check for set membership. We can also verify if an object does not belong to a set.

Given a set $A$ and an object $x$, we use the notation $x\in A$ to denote that $x$ is an element of $A$. We also use notation $x\notin A$ to denote that $x$ is not an element of $A$.

The code below illustrates how we can test these conditions in Python.

A = {2, 3, 5, 7, 11, 13, 17, 19}      # A is the set of prime numbers less than 20.
print(3 in A)                         # Will print True, because 3 is an element of A.
print(6 in A)                         # Will print False, because 6 is not an element of A.
print(7 not in A)                     # Will print False, because 7 is an element of A.    (note the use of "not")
print(8 not in A)                     # Will print True, because 8 is not an element of A. (note the use of "not")

Another important relation is that of being a subset. If membership is a relation between an object and a set, then being a subset is a relation between two sets. Namely we say that $B$ is a subset of $A$, denoted $B\subseteq A$, if every element of $B$ is also an element of $A$. We also will say that $B$ is a proper subset of $A$, denoted $B\subset A$, if every element of $B$ is also an element of $A$, but there are elements in $A$ that are not in $B$.

The code below illustrates how we can test these relations in Python, and provides some examples of subsets and proper subsets.

A = {2, 3, 5, 7, 11, 13, 17, 19}      # A is the set of prime numbers less than 20.
B = {2, 3, 5, 7}                      # B is the set of prime numbers less than 10.

print(B.issubset(A))                  # Check if B is a subset of A. Will print True.
print(B <= A)                         # Check if B is a subset of A. Will print True.
print(B < A)                          # Check if B is a proper subset of A. Will print True,
                                      # since all elements of B are in A (subset condition),
                                      # but 11 is in A, and not in B (proper condition).
print(A.issubset(A))                  # Check if A is a subset of A. Will print True. 
print(A <= A)                         # Check if A is a subset of A. Will print True.
print(A < A)                          # Check if A is a proper subset of A. Will print False,
                                      # since all elements of A are in A.
                                      # Note: a set is always a subset of itself.

Set Operations

Now, let us take a look at some common set operations. As many things in mathematics, these concepts can become more natural if visualized. Hence, let us briefly introduce the idea of Venn diagrams.

A Venn diagram is a schematic representation of a set and its possible relations with other sets. We usually will use (possibly misshapen) circles to denote the “set” and colors or the elements itself to denote the elements of this set. The few examples below will illustrate this idea.

Venn diagram of letters

Venn's four ellipses diagram

Set Union

The first set operation we will look at is set union. We can think of it as addition for the sets. The result of a set union is the set containing elements that appear in either of the sets. The following Venn diagram shows in red the union of sets $A$ and $B$, denoted $A\cup B$.

A union B

We can compute a union of two sets in Python by using the union method or by using | operation on sets. The code below illustrates this.

A = {2, 3, 5, 7, 11, 13, 17, 19}      # A is the set of prime numbers less than 20.
B = {2, 4, 6, 8, 10, 12, 14, 16, 18}  # B is the set of even numbers > 0 and < 20.
C = A | B
{2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 16, 17, 18, 19}
A = {2, 3, 5, 7, 11, 13, 17, 19}      # A is the set of prime numbers less than 20.
B = {2, 4, 6, 8, 10, 12, 14, 16, 18}  # B is the set of even numbers > 0 and < 20.
C = A.union(B)
{2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 16, 17, 18, 19}
# Note that any possible "overlapping" elements will be only accounted for once,
# and thus the result will be a set (elements are distinct). This can be seen from
# the Venn diagram (the intersecting region is covered once) and from the example
# below.
A = {4, 5, 6, 7, 8, 9}
B = {6, 7, 8, 9, 10, 11}
C = A | B
{4, 5, 6, 7, 8, 9, 10, 11}

Set Intersection

The next operation is set intersection. A set intersection is a set (possibly an empty one) that contains elements that appear in both sets. In other words, intersection is the overlap of the original sets. The following diagram shows the intersection of sets $A$ and $B$, denoted $A\cap B$.

A intersection B

We can compute an intersection of two sets in Python by using the intersection method or by using & operation on sets. The code below illustrates this.

A = {2, 3, 5, 7, 11, 13, 17, 19}      # A is the set of prime numbers less than 20.
B = {2, 4, 6, 8, 10, 12, 14, 16, 18}  # B is the set of even numbers > 0 and < 20.
C = A & B
A = {2, 3, 5, 7, 11, 13, 17, 19}      # A is the set of prime numbers less than 20.
B = {2, 4, 6, 8, 10, 12, 14, 16, 18}  # B is the set of even numbers > 0 and < 20.
C = A.intersection(B)
# Only the elements present in BOTH sets get into the intersection. Thus in some
# cases the intersection can be empty. A Venn diagram for thsi case would be two
# non-overlapping circles.
A = {4, 5, 6, 7, 8, 9}
B = {10, 11, 12, 13, 14}
C = A & B

Set Difference

Next operation we will look at is the set difference. It is useful to know which elements belong to one set, but not the other. The set difference is a set that contains elements from the first set, but not the second one. The following diagram shows the difference of sets $A$ and $B$, denoted $A – B$ or $A\setminus B$.

A difference B

We can compute a difference between two sets in Python by using the difference method or by using - operation on sets. The code below illustrates this.

A = {2, 3, 5, 7, 11, 13, 17, 19}      # A is the set of prime numbers less than 20.
B = {2, 4, 6, 8, 10, 12, 14, 16, 18}  # B is the set of even numbers > 0 and < 20.
C = A - B
{3, 5, 7, 11, 13, 17, 19}
A = {2, 3, 5, 7, 11, 13, 17, 19}      # A is the set of prime numbers less than 20.
B = {2, 4, 6, 8, 10, 12, 14, 16, 18}  # B is the set of even numbers > 0 and < 20.
C = A.difference(B)
{3, 5, 7, 11, 13, 17, 19}
# Note that just like the difference of two numbers depends on the order, the difference
# of two sets also depends on which one we want to subtract from. The example below
# illustrates this idea.
A = {4, 5, 6, 7, 8, 9}
B = {7, 8, 9, 10, 11}
C = A - B
D = B - A
print("A - B is {}".format(C))
print("B - A is {}".format(D))
A - B is {4, 5, 6}
B - A is {10, 11}

Set Symmetric Difference

The last set operation we will talk about is the symmetric difference. There are several ways you can think about the symmetric difference, but all of them encapsulate the same idea. We want to have a set that has elements that appear in either $A$ or $B$, but not in the both sets. Using the notation defined above we can write this as $(A\cup B) – (A\cap B)$ (the union/sum of the sets minus their intersection) or alternatively as $(A – B) \cup (B – A)$ (the $A$ without $B$ union $B$ without $A$). The following diagram shows the symmetric difference of sets $A$ and $B$, denoted $A \Delta B$.

A symmetric difference B

We can compute the symmetric difference between two sets in Python by using the symmetric_difference method or by using ^ operation on sets. The code below illustrates this.

A = {2, 3, 5, 7, 11, 13, 17, 19}      # A is the set of prime numbers less than 20.
B = {2, 4, 6, 8, 10, 12, 14, 16, 18}  # B is the set of even numbers > 0 and < 20.
C = A ^ B
{3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 16, 17, 18, 19}
A = {2, 3, 5, 7, 11, 13, 17, 19}      # A is the set of prime numbers less than 20.
B = {2, 4, 6, 8, 10, 12, 14, 16, 18}  # B is the set of even numbers > 0 and < 20.
C = A.symmetric_difference(B)
{3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 16, 17, 18, 19}
# The symmetric difference, unlike the regular set difference is symmetric. 
# Which means that the order of sets does not matter, the result will be
# the same, as illustrated by code below.
A = {4, 5, 6, 7, 8, 9}
B = {6, 7, 8, 9, 10, 11}
C = A ^ B
D = B ^ A
{4, 5, 10, 11}
{4, 5, 10, 11}

Additional remarks

Just like arithmetic operations are defined using two numbers, but can be extended to lengthier expressions, the set operation can be applied to multiple sets. In Python the easiest way to encapsulate this concept is using the set operations |, &, -, ^ and the appropriate () to group those operations. Examples below illustrate this idea.

A = {2, 3, 5, 7, 11, 13, 17, 19}      # A is the set of prime numbers less than 20.
B = {2, 4, 6, 8, 10, 12, 14, 16, 18}  # B is the set of even numbers > 0 and < 20.
C = {3, 5, 7, 9, 11, 13, 15, 17, 19}  # C is the set of odd numbers > 1 and < 20.
D = {3, 6, 9, 12, 15, 18}             # D is the set of numbers divisble by 3, > 0 and < 20. 
# This set is the union of C and D minus A. 
# Effectively it will contain numbers divisible by 3 or odd, that lie between 1 and 20,
# but will not contain the numbers that are prime.
E = (C | D) - A 
{6, 9, 12, 15, 18}
# This set is a symmetric difference of A, B and difference between
# C and D. 
# Effectively it will contain numbers that are either prime, or even, or
# odd, but not divisible by 3. However, it will not contain numbers that satisfy
# more than two of those conditions at the same time (i.e. it won't contain 11,
# since it is both prime and not divisible by 3).
F = A ^ B ^ (C - D)
{3, 4, 6, 8, 10, 12, 14, 16, 18}

Use Cases

We went through the trouble of learning the definitions for basic set arithmetic (the Operations section) and membership and subset relations (the Membership and Subsets section), so now is a good time to present some use cases for these structures and operations.

Besides being an essential building block in modern mathematics, sets often present a highly convenient data structure in programming. The examples below will guide you through some useful applications of sets in programming. Some of these examples are inspired by real production code.

Filtering down unwieldy lists

Sometimes we are faced with the problem of filtering a rather large list to only show unique values. A few common examples include the following:

  1. Identifying the unique caller IDs for a large list of phone calls.
  2. Identifying categories of the items carried by a store from the full inventory list.

Below we will address both of the problems by leveraging the property that a set contains distinct elements, and hence will effectively filter out only the unique elements.

# Problem 1.
# ----------
# Write a function that takes in a list of phone numbers (as strings),
# and returns a list containing the unique phone numbers from the original
# list.
# Input: list of phone numbers.
# Output: list of unique phone numbers.
def phone_id_unique(numbers):
    unique_numbers_set = set(numbers)
    unique_numbers_list = list(unique_numbers_set)
    return unique_numbers_list
# Problem 1.
# ----------
# Tests:
# 1. Input:  ["800-000-0000" repeated 1 000 000 times]
#    Output: ["800-000-0000"]
test_input = ["800-000-0000"] * 1000000

# 2. Input:  ["800-100-0000" repeated 1 000 000 times, "800-200-0000" repeated 1 000 000 times, ...,
#             "800-900-0000" repeated 1 000 000 times]
#    Output: ["800-000-0000", "800-100-0000", ..., "800-900-0000"]
test_input = []
for i in range(1, 10):
    test_input = test_input + ["800-{}00-0000".format(i)] * 1000000

# 3. Input: ["800-000-0000", "800-010-0000", "800-020-0000", "800-030-0000"]
#    Output: ["800-000-0000", "800-010-0000", "800-020-0000", "800-030-0000"]
test_input = ["800-000-0000", "800-010-0000", "800-020-0000", "800-030-0000"]
['800-200-0000', '800-500-0000', '800-700-0000', '800-300-0000', '800-600-0000', '800-400-0000', '800-800-0000', '800-100-0000', '800-900-0000']
['800-020-0000', '800-000-0000', '800-010-0000', '800-030-0000']
# Problem 2.
# ----------
# Write a function that takes in a list of store carried product (as dictioanries),
# and returns a list containing the product categories that appear in the original
# list.
# Input: list of items.
# Output: list of product categories.
def product_categories(items):
    categories_set = set([item["category"] for item in items])
    categories = list(categories_set)
    return categories
# Before testing we will load some data from .csv files. These files should be put into
# the same directory as the notebook. CSV stands for comma-separated values, and is a 
# common standard for representing data in text format.
items = []

import csv
with open("data_produce.csv", "r") as f:
    reader = csv.reader(f, delimiter = ",")
    for line in reader:
        items.append({"id": line[0], 
                      "category": line[1],
                      "stock": line[2],
                      "price": line[3]})

# Tests:
# 1. Input:  [1000000 items from 8 categories]
#    Output: ["perishables", "water", "kitchen", "furniture", "electronics", "paper", "pantry", "misc"]
[' pantry', ' furniture', ' water', ' kitchen', ' perishables', ' paper', ' electronics', ' misc']

Implementing common logical operations

Mathematical logic and set arithmetic are tightly connected. This allows us to use set arithmetic to model common logical operations, which in turn can easily encapsulate some everyday tasks we want to perform with out data.

Set union is analogous to logical OR operation, set intersection to logical AND, and the symmetric difference is analogous to logical XOR (exclusive OR) operation. Thus, we can use these operations to translate common tasks into set operations. Let us look at some of the examples below.

# Problem 3.
# ----------
# Write a function that takes in a set of items on mom's shopping list,
# a set of items on dad's shopping list, a set of items already bought by
# mom, a set of items already bought by dad, and finally a set of items
# that are currently in the fridge. The output should be a consolidated
# shopping list, i.e. it should only include the items that are not in the
# fridge and are not yet bought. 
# Input: 5 sets of items as described above.
# Output: list of items that need to be procured.
def shopping_list_cons(mom_to_buy, dad_to_buy,
                       mom_bought, dad_bought,
    all_to_buy = (mom_to_buy | dad_to_buy)
    all_bought = (mom_bought | dad_bought)
    to_buy = list(all_to_buy - all_bought - in_fridge)
    return to_buy
# Problem 3.
# ----------
# Tests:
# 1. Input:  mom_to_buy = {"apples", "candy", "chicken", "beef"}
#            dad_to_buy = {"candy", "beef", "bread", "cola"}
#            mom_bought = {"bread", "biscuits"}
#            dad_bought = {"milk", "coffee"}
#            in_fridge  = {"eggs", "chicken"}
#    Output: ["apples", "candy", "beef", "cola"]
mom_to_buy = {"apples", "candy", "chicken", "beef"}
dad_to_buy = {"candy", "beef", "bread", "cola"}
mom_bought = {"bread", "biscuits"}
dad_bought = {"milk", "coffee"}
in_fridge  = {"eggs", "chicken"}
print(shopping_list_cons(mom_to_buy, dad_to_buy, mom_bought, dad_bought, in_fridge))
['apples', 'candy', 'beef', 'cola']

Here we take advantage of some set operations to solve the problem. First, we consolidate both “to buy” lists taking their union, thus ensuring that all items are accounted for and none are double counted. Then we consolidate the already bought items by taking another union. Finally, we take the difference between what we need to buy and what is already bought or already in the fridge.

In the next problem will take a look at some applications of the symmetric difference.

# Problem 4.
# ----------
# Students at Chicken Soup High-school are offered two
# options for Calculus classes. There is an "Intro to Calculus"
# class and "Calculus" class. Some students take only 
# the first class through their time in high-school,
# some only take the second class, by placing out of the
# first one, and finally some students take both classes
# as a sequence.
# At the end of each year, average for these classes performance
# is computed to evaluate effectiveness of instructors. 
# The average is computed according to a strange formula,
# because statistics and performance department of 
# Chicken Soup high loves hard to understand numbers.
# You are provided with the formula and the list of 
# students in each class and their grades. To protect
# students' privacy you are given unique StudentIDs.
# AVG = (avg grade for students who only took "Intro") +
#     + (avg grade for students who only took "Calculus") +
#     + 1.75 * (avg grade for students who took both)
# Write a function that takes in two dictionaries of 
# StudentIDs and grades and computes the average according
# to the given formula.
# Input: a dictionary for students who took "Intro to Calculus",
#        a dictionary for students who took "Calculus".
# Output: the average grade.
import numpy as np
def chicken_soup_high_avg(intro, calc):
    one_course = set(intro.keys()) ^ set(calc.keys())
    both_courses = set(intro.keys()) & set(calc.keys())
    one_grades = []
    both_grades = []
    for studentID in one_course:
    for studentID in both_courses:
    avg = np.mean(one_grades) + 1.75 * np.mean(both_grades)
    return avg
# Problem 4.
# ----------
# Tests:
# 1. Input: {"1": 4.0, "2": 3.75, "3": 3.4},
#           {"2": 3.8, "3": 3.0, "4": 4.0}
#    Output: 10.10
print("{:.2f}".format(chicken_soup_high_avg({"1": 4.0, "2": 3.75, "3": 3.4},
                                            {"2": 3.8, "3": 3.0, "4": 4.0})))

Further remarks

While not strictly necessary, sets can make several classic graph algorithms easier to write and explain. We will cover those in the graphs section.