Teaching kids to code: I’m a computer scientist and I think it teaches kids important skills

Good afternoon world!

To those of you who know me, it will not come as a surprise that I care a lot about teaching, in particular mathematics and computer science. Recently, while browsing r/programming, I came across an article giving a software developer's perspective on why you should not teach your kids how to code. I think the article raises several good points, but it is not quite complete and draws a conclusion with which I thoroughly disagree. So I have decided to present my take on the issue and point out a few nuances that I think are important to consider when making claims about the usefulness of teaching coding.

Why is teaching coding a bad idea?

First of all, I need to say that the article is well written and definitely tackles some major issues with the hype around learning how to code. However, I would like to take a closer look at the following paragraph.

A former co-worker of mine was trained at a coding boot camp with the motto “Coding Is the New Literacy”. That sentiment is at the heart of all the programming books and games. The description on one popular book says starting coding early is “essential to prepare kids for the future”. This gives the impression that not teaching kids to code is somehow equivalent to not teaching them to read.

Joe Morgan, Slate.com

I think this is a valid criticism of the aggressive advertisement of coding boot camps and resources targeted at children. However, the language of “new literacy” can be taken apart from a different angle. Just like we learn basic reading skills, we also learn basic mathematical reasoning at a young age. In many ways, coding is a field that brings together abstract mathematical reasoning and applied results. Hence, while the branding of “new literacy” is misleading, we can think of coding as a rather “old literacy” repackaged into a modern set of tools and scenarios.

In particular, a good amount of the critique presented in the article is aimed at learning the syntax of a particular language rather than general problem solving. I do agree that teaching coding, regardless of the age group, should be aimed at cultivating problem solving skills and developing abstract thinking. However, that second point, abstract thinking, is exactly why I disagree with the article’s author. While we talk about problem solving a lot, it seems to be a common pattern to think of teaching problem solving in the context of applied life skills: assembling furniture, cooking, playing with LEGO, etc. What many of these examples lack is the abstraction aspect that is essential to mathematics and computer science. Even a notion as natural as a natural number hides a deep abstraction leap that is often taken for granted. When we think of the number 3, we are thinking of an abstract notion of count. Three apples are inherently different from three oranges or three chairs, but we are thinking of some abstract property that unites all of these examples, namely the count. The number 3 does not exist on its own, we can’t create a palpable 3, yet we are still capable of thinking about it and knowing exactly what we mean by it.

Hence, mathematics, and by natural extension coding, is not only about problem solving, drive, and perseverance. It is also about abstract thinking, which is something that needs to be cultivated early on. I have encountered multiple people who struggle in college-level proof-based mathematics classes because the school system has failed to teach them abstract thinking and reasoning to a satisfactory degree. I want to reiterate that this is not a flaw in those people, nor is it some special quirk of mathematics as a subject. Anyone can learn mathematics, and everyone should learn some basic skills from it, the most valuable being exactly the power of abstract thinking.

So what exactly is abstract thinking?

It is hard to define exactly what I mean by abstract thinking, but there are a few common trends that run through examples of it. First, there is a pattern recognition component to any abstraction: an abstract property often arises as a natural recognition of a pattern in the observed world. For example, we can go back to the counting example. We observe a certain natural property of the objects that surround us: they can appear in different quantities. One way of abstracting the idea of quantity is precisely counting. When we think of apples on a table, we can consider their individual volumes (another abstraction) or masses (abstraction again), but we can also consider the number of individual apples, i.e. their count. We recognize some pattern in our world and create an abstract concept to reflect it.

The second common trend is classification, or the identification of equivalence classes of patterns. This sounds complicated and comes with strict mathematical definitions (listed in the Appendix). However, I argue that it is in fact one of the most natural things people do! This idea was brought to my attention by Prof. Laszlo Babai during one of his lectures on discrete mathematics. We notice matching colors and group things by color, without realizing that we are identifying an equivalence class. We recognize that three apples have the same count as three oranges, thereby identifying an equivalence class among things that can be counted, a class that we simply call 3. The same can be said about 5, and 7, and so on. We identify an abstract equivalence through observation of the natural world.

The third commonality of abstractions is generalization, or cross-applicability if you wish. Once we develop an abstraction, we start noticing it in new places and realizing that the same logical process can be repeated and applied to a different scenario. But first, let me tell you a classic joke.

Working late at night, a mathematician doesn’t notice that one of her candles has tipped over and set some of the papers on the table on fire. Once she realizes what is going on, she rushes to the kitchen, grabs a glass of water, pours it over the table, and extinguishes the fire.

A few weeks later she has her friend over for an evening of whiskey and cigars. The careless friend throws a still lit match into the trashcan setting some papers in it on fire. The mathematician dumps out the flaming papers on the table, rushes to the kitchen for a glass of water, and then puts out the fire.

Her puzzled friend asks: “Why did you dump the papers on the table first?”

The mathematician replies: “Oh, I just reduced the problem to one I had already solved two weeks ago!”

Folklore

This is a classic example of reducing a problem to one previously solved, or, thought about slightly differently, of recognizing the applicability of the same abstract pattern in a new case. In our apple arithmetic example: we have already grasped the abstract notions of the numbers 3 and 5, and the pattern of them forming the number 8 when put together. Now, if we suddenly find ourselves facing the same pattern with oranges, we already know the answer: 3 + 5 = 8. What helps us is the abstraction (the object doesn’t matter, the count does) and its generalization to any type of countable objects.

Thus, while not exactly answering what abstract thinking is, I have outlined three important aspects of it: pattern recognition, equivalence recognition, and generalization.

How does one teach kids to develop abstract thinking, and what does all of this have to do with coding?

We are incredibly lucky here, because a lot of the basics of abstract thinking come to us for free as a perk of being human. Furthermore, a lot of early children’s literature is already aimed at developing skills tied to pattern and equivalence recognition. Generalization of abstractions, on the other hand, is not as common in early teaching, and it is one of the most important aspects of mastering abstract thinking. Kids often struggle with basic algebra concepts, such as a variable or an unknown. What matters is teaching them in a way that lets these notions be genuinely recognized as common patterns that simply generalize rules already learned. In that line of thought, a function f(x) = x + 2 can be thought of as taking some number of objects in a countable class and adding two more of the same kind: adding two apples to however many apples you already have (but recall, the apples did not matter in the end).
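To make this concrete, here is a tiny sketch in C (my own illustration, not taken from the article or any teaching material discussed): the function only ever sees a count, so it behaves identically whether we are counting apples or oranges.

    #include <stdio.h>

    /* f(x) = x + 2: take however many objects you have and add two more.
       The function only ever sees a count, so whether we were counting
       apples or oranges genuinely does not matter. */
    static int add_two(int count) {
        return count + 2;
    }

    int main(void) {
        int apples = 3;
        int oranges = 3;
        printf("%d apples  -> %d apples\n", apples, add_two(apples));    /* 3 -> 5 */
        printf("%d oranges -> %d oranges\n", oranges, add_two(oranges)); /* 3 -> 5 */
        return 0;
    }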

So how does coding tie into this entire story? Well, coding is itself full of abstractions, and therefore presents a rich playground for maturing the concepts and ideas of abstract thinking. However, unlike mathematics or physics, coding has a unique aspect that lets us immediately see the practical implications of our abstract reasoning.

It is exciting to see how some words that you wrote turn into a circle on your screen, or a square, or a flag (more on that in the next post). However, it is also important that this exemplification allows kids to solidify and check their abstract reasoning. A for loop is an abstract construction that allows you to repeat an action some prescribed number of times. It is good to have an understanding of this abstraction, but solidifying it by seeing how changing the number of repetitions changes the number of figures drawn is even better. It brings the natural observation component back into abstract thinking, which should enable a young learner to think creatively and develop a graspable connection between abstract, generalized concepts and tangible everyday observations.
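Kids’ environments typically draw circles and flags rather than print characters, but the idea is easy to sketch even in plain C (a stand-in example of mine, not the exercise hinted at for the next post): change the repetition count and the number of figures on the screen changes with it.

    #include <stdio.h>

    /* A for loop as an abstraction: "repeat this action a prescribed number
       of times". Changing `repetitions` changes how many "squares" appear,
       so the abstract count has an immediately visible consequence. */
    int main(void) {
        int repetitions = 5;              /* try 3, 7, 10, ... */
        for (int i = 0; i < repetitions; ++i) {
            printf("[] ");                /* one figure per iteration */
        }
        printf("\n");
        return 0;
    }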

Conclusion

Coding gives us an opportunity to learn abstract thinking while continuously supporting it with natural observations. In a similar way to cooking or tinkering with LEGO, we get a combination of ideas and observable consequences within one process. We should shift the aim of coding books and boot camps for children from syntax-oriented, “one true answer” problems to thinking- and skills-oriented puzzles. The goal of such education is not to foster skill in a particular programming language, but to create a thinker who can notice patterns and effectively generalize them to new problems.

We have an amazing tool that can easily capture attention and provide a rich and exciting framework for learning. Instead of shunning it due to novelty or perceived shallowness, we should think about how to use it to teach and learn what is truly fundamental: abstract thinking!

Appendix

A little bit of dry math with formal definitions for some of the ideas I have been discussing. Keep in mind that these highly formal and abstract definitions in fact tie back to our natural observations.

Definition. A binary relation R is defined as a set of ordered pairs (a,b) of elements a from a set A and b from a set B, i.e. a subset of A\times B. We say that two elements a, b are related, denoted aRb, iff (a,b)\in R.

Definition. We call a binary relation R defined on the pair of sets A and A (i.e. on a single set A) an equivalence relation iff it satisfies the following properties for all a, b, c \in A:

  1. Reflexivity: aRa
  2. Symmetry: aRb \implies bRa
  3. Transitivity: If aRb and bRc, then aRc.

Definition. Given a set A, an element a\in A, and an equivalence relation \sim on A, we call the set [a]=\{x\in A|x\sim a\} the equivalence class of the element a.

Definition. A partition of a set A is a collection of non-empty, pairwise disjoint sets B_1, B_2, ..., B_n s.t. their union equals A.

Theorem. The set of equivalence classes of A under an equivalence relation \sim is a partition of A.

Definition. We call the set of equivalence classes of a set A under an equivalence relation \sim a quotient set, and denote it by A/\sim.
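To tie these definitions back to the counting story in the main text, here is a small worked example of my own (not part of the formal definitions above).

Example. Let A be the collection of all finite sets of everyday objects, and define X\sim Y iff X and Y have the same count, i.e. |X|=|Y|. Reflexivity, symmetry, and transitivity follow from the corresponding properties of equality of numbers, so \sim is an equivalence relation. The equivalence class of a particular set of three apples contains every three-element set whatsoever; it is exactly the abstract “3” from the main text. The quotient set A/\sim is our familiar collection of counting numbers.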

Why any (comprehensive) course on computer programming should cover the basics of C: A story.

Good evening world!

Recently I came across a small task that reinforced my belief in the importance of C programming. The task was the following:

  1. Generate 500,000,000 pseudo-random numbers using the Linear Congruential Generator (LCG) algorithm [1, 2].
  2. Use the Box-Muller transform [3] to get normally distributed random variables (RVs).
  3. Compute the next step in the evolution of a deterministic system, and add noise using the generated normally distributed RVs.
  4. Write the result of every 1000th step to a file.

In other words, we need to run a for loop for 500,000,000 steps, doing some calculations (generating the RVs and evaluating the deterministic function), and writing to a file once every 1000 steps.
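For concreteness, here is a minimal sketch of such a loop in C. It is an illustration of mine rather than the original script: the LCG constants are the Park-Miller “minimal standard” ones from [1], while the deterministic update, noise scale, seed, and output file name are placeholder assumptions.

    #include <stdio.h>
    #include <stdint.h>
    #include <math.h>

    #define TWO_PI 6.283185307179586
    #define LCG_A  16807ULL        /* Park-Miller multiplier [1] (illustrative choice) */
    #define LCG_M  2147483647ULL   /* modulus 2^31 - 1 */

    static uint64_t lcg_state = 42;          /* arbitrary non-zero seed */

    /* Linear Congruential Generator: uniform pseudo-random number in (0, 1). */
    static double lcg_uniform(void) {
        lcg_state = (LCG_A * lcg_state) % LCG_M;
        return (double)lcg_state / (double)LCG_M;
    }

    /* Box-Muller transform [3]: two uniforms give one standard normal deviate. */
    static double gaussian(void) {
        double u1 = lcg_uniform();
        double u2 = lcg_uniform();
        return sqrt(-2.0 * log(u1)) * cos(TWO_PI * u2);
    }

    int main(void) {
        const long steps = 500000000L;
        double x = 0.0;                      /* state of the deterministic system */
        FILE *out = fopen("trajectory.txt", "w");
        if (out == NULL) return 1;

        for (long i = 0; i < steps; ++i) {
            x = 0.5 * x + 1.0;               /* placeholder deterministic update */
            x += 0.01 * gaussian();          /* add normally distributed noise */
            if (i % 1000 == 0)
                fprintf(out, "%ld %f\n", i, x);
        }

        fclose(out);
        return 0;
    }

With the -O3 flag mentioned below, this would be built with something like gcc -O3 generate.c -o generate -lm and timed with the time utility; a vanilla Python version of the same task would follow the same overall structure.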

This doesn’t sound particularly challenging, and the whole thing can be done in less than 80 lines of C code. The same task can also be done in about 45 lines of Python. However, LOC is not the metric I want to look at here. I want to talk about the performance of the code written, and about some general educational caveats.

Is the playing field level?

Let’s talk a tiny bit about optimization, under-optimization and over-optimization here.

The moment I shared this story with a friend, he immediately said that the performance comparison doesn’t make sense. The argument was the following: if the Python code takes that much longer to run, it clearly was not written well. I agree that when I set up the comparison I was not using any fancy race-track wheels for Python. The entire script I wrote is as vanilla Python as one can possibly imagine. Does this mean that I am cheating by employing “bad” or “inefficient” Python coding practices? I would say no, or at worst, just a little bit.

In both cases, C and Python, I wrote a vanilla implementation of the given task: no parallelism, no non-standard libraries, no pre-compiled/hardware-optimized shenanigans. Did I manage to cheat a bit? Yes, of course I did: I compiled my C code with the -O3 optimization flag. This is not the full story either. I ran my Python script naively, invoking python ./generate.py, rather than trying to compile it into an optimized binary and running that. However, for all of these “sins” I have a rather simple answer: I don’t do that with Python 99.9% of the time. I do not compile my Python scripts. I do not roam the web for pre-compiled computational libraries for Python. I do not tend to care that much about performance in the first place when I code in Python.

How is this a conversation about optimization then? Well, I think we need to consider several parameters to be optimized, and then check what we get in terms of relative performance. Hence, I will be thinking about three metrics here:

  1. Human minutes spent writing code (including debugging).
  2. Human minutes spent waiting for the program to finish running.
  3. sys runtime of the programs written.

In the context of these three metrics I can define what I mean by optimizing, over-optimizing, and under-optimizing the performance of a task.

Over-optimizing: This is the case where I spend a lot of time writing code that is supposedly great in terms of wall and sys times. Not surprisingly, the majority of over-optimization in my case does not come from assembly injections leveraging the latest microarchitecture features. When I over-optimize, with probability 0.9 it is because I found a paper proposing a fast algorithm that I am trying to implement from scratch. This brings a caveat: asymptotically better performance does not always translate into cycle-count performance on small enough inputs.

Optimizing: Once in a blue moon, when working on a one-off personal project, I do hit the sweet spot: just enough complexity in the implementation to get a good balance of runtimes. Properly optimized code should be optimized both in terms of human minutes spent writing it and human minutes spent waiting for the results. However, as with anything in the world of computer programming, or life at large, there is a caveat: optimization is context dependent. Spending more development hours on code that will be reused regularly is worth it, as long as the eventual benefit in runtime pays for it.

Under-optimizing: This is what happens when the deadline is far away. Hacking together 25 lines of your favorite vanilla high-level language and letting it run overnight, because you still have a week left and one run only takes 14 hours. Surprisingly, I think that from a practical perspective this is more justified than over-optimizing. If I had to choose between code that takes 14 hours to run but gets the job done, and code that takes 12 hours to develop and only 2 to run, I might go for the first, because at least I can sleep or read during those 12 hours of difference. However, the caveat here is simple: if you need to run the code more than once, the unoptimized runtime will cost you a lot.

Performance Analysis

I was compiling and running all code on my personal laptop. The specs are listed below.

MacBook Pro (Retina, 13-inch, Mid 2014)

  • 2.6 GHz Intel Core i5 (4278U)
  • 8 GB 1600 MHz DDR3 RAM

Runtimes were measured with the time utility.

[Figure: runtime comparison for the generator scripts.]

As you can see across all the timings, the performance differs drastically. This is by no means a shocking or unexpected result, but it matters for the rest of the discussion.

Pedagogical Discussion

This post is ultimately about teaching and learning, so let’s finally talk about why any comprehensive course[F1] on computer programming must cover some basics of the C language.

First, C is a compiled language. While the intricacies of compilation lie beyond the introductory level, acknowledging compilation as a step in the lifecycle of a program is critical. Virtually anything that has to do with computer programming, in its broadest definition, benefits from a better understanding of the full picture. As an example, I can bring up a recent workplace story, where, as we discovered, certain business logic scripts were ultimately compiled into SQL statements. When the underlying tables changed, the SQL statements became invalid, while the surface-level logic remained perfectly sound. It took a bit of tinkering to figure out that we had to trigger a re-compile for the SQL to become valid again. Knowing the full picture makes your bug-fixing abilities better.

Second, C has great performance. As the first part of this story shows, C does in fact yield quite good performance in its vanilla form. Of course, you have to be mindful of your project scope. In terms of over-optimization failures, C is probably at the top of the list, in close competition with C++. Just think of all the linked list implementations ever written in C. Now think of all the doubly linked list implementations, and FFT implementations, and Dijkstra’s algorithm implementations, and so on ad nauseam. Writing code in C often feels like re-inventing the wheel, in fair part because it is. However, when the task at hand boils down to arguably simple arithmetic operations that need to be performed at medium scale, writing it in C is probably the best bet.

Third, C is ubiquitous (unless you are on Windows). If you have a *nix system, it comes with gcc, clang, or some other C compiler. There is no need to download a myriad of heavy IDEs and other oddities. To be fair, the same can be said about vanilla Python, which is in part why I love it so much (though I still use Anaconda).

Fourth, C builds discipline and rigor. I am not talking about the painstaking debugging of memory leaks and segfaults, or the arcane magic of pointer arithmetic. Those things are clearly important, but I am talking about very, very basic C. You need to declare variable types. You need to allocate things ahead of time. You need to keep track of what moving parts you have in the game. These things amount to cleaner code and better style. You have to at least minimally organize your thoughts before writing C. Hopefully that discipline carries over to all of the code you will write.
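As a tiny illustration of my own (not taken from any course material): even a trivial C program makes you state the type, the size, the failure case, and the cleanup explicitly.

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        /* You state up front what you need: a type and a size. */
        size_t n = 1000;
        double *samples = malloc(n * sizeof *samples);  /* explicit allocation */
        if (samples == NULL) {
            return 1;                                   /* explicit failure handling */
        }

        for (size_t i = 0; i < n; ++i) {
            samples[i] = (double)i / (double)n;         /* fill with known values */
        }

        printf("last sample: %f\n", samples[n - 1]);
        free(samples);                                  /* explicit cleanup */
        return 0;
    }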

Finally, C is just another programming language. I firmly believe that breadth of programming knowledge is equally if not more important than depth for about 80% of the people who will ever write code. In the job market, it is hard to guarantee that your amazing knowledge of C++ will always be in equal demand. In academia, you might realize that the lab you just joined does everything in Perl. You can probably still write half of your code in Java, but then you need to interface with the rest of the Perl codebase… You get the general idea. The “jack of all trades, master of none” kind of programmer, on the other hand, is more likely to pick up a new language from the docs when it is needed. In this regard, C serves as a good training ground for general programming principles.

Hence, in the end we have a performant language that exposes you to some fundamental programming concepts and builds up a better coding discipline.

References

  1. S. K. Park, K. W. Miller. Random number generators: good ones are hard to find. Comm. ACM (31):10, pp. 1192-1201, 1988. [DOI: 10.1145/63039.63042]
  2. P. A. W. Lewis, A. S. Goodman, J. M. Miller. A pseudo-random number generator for the System/360. IBM Systems Journal (8):2, pp. 136-146, 1969. [DOI:10.1147/sj.82.0136]
  3. G. E. P. Box, M. E. Muller. A Note on the Generation of Random Normal Deviates. Ann. Math. Statist. (29):2, pp. 610-611, 1958. [DOI: 10.1214/aoms/1177706645]

Footnotes

[F1] By a comprehensive course I mean an introductory sequence on computer programming lasting an academic to a calendar year. Examples would include any “Intro to Computer Science n”, n = 1, 2, 3, … sequences, and any boot camps that aim to teach you computer programming. I do agree that there are shorter courses that clearly cannot cover C. However, I would also argue that such courses are by no means comprehensive.