Where Should Data Structures Be Taught?

At the welcome reception of Software Craftsmanship North America, I was bombarded with folks wanting to know more about the academy. These folks all agree that the universities are failing to produce relevant software developers and believe this is a much better way. They were asking a lot about what and how we would teach. I explained that we would teach software development from a Lean Craftsmanship perspective. We would also teach Computer Science and soft skills in the context of software development, as it should be. Here is a true story of what I mean.

In the mid-90s, I was working on a very large Smalltalk project at Nortel. One evening when I was about to leave, a young man who had just joined the project several weeks earlier after graduating college with a BS in Computer Science caught my attention. "Ken, can I show you what I've been working on?" Though I wanted to go home, I knew taking a few minutes to encourage this new guy was the right thing to do. "Sure," I said.

He proceeded to show me how he had implemented a binary tree in Smalltalk and had converted a piece of the application to use it because he had noticed that they have been using a linear search to find certain objects.

As he showed me the code, I exercised self-control before speaking... well, as much as I could muster. The code did not follow any of the conventional protocol of the well-established Smalltalk Collections. In order to make use of it, he changed a huge amount of code in the existing system when he could/should have merely changed a couple of references to class names (and isolated that further with some careful refactoring) had he employed the standard collection protocols to exploit the concept of polymorphism. But polymorphism wasn't taught in his Computer Science courses. So, I gently pointed out to him that it would be better to follow the traditional protocol.

Since the young man wasn't on my sub-team, I asked him whether the other more senior people on his team had seen the code he was writing. I was fearful that these "senior" people either didn't know what the young man was doing or, worse yet, that the more senior people didn't have very good practices either. He said, "not yet."

I asked him whether they had assigned him this task, and he told me that they just told him to look over the code (for about 4 weeks without supervision/explanation?!?) and, when he did, he noticed the "sin" of a linear search being used (as it was implemented in the standard Smalltalk OrderedCollection class). I asked, "Was there a particular performance problem in this area of the application?".

He responded, "It was doing a linear search and a binary search is much faster."

I said, "I know that, but was there a particular performance problem?"

He again responded, "They were doing a linear search!" speaking with a bit of annoyance that the respected Smalltalk guru didn't seem to understand the horrors of linear searching.

I said, "How many objects were in the particular collections that you replaced with the binary tree? A few thousand? Tens of thousands?"

He said, "No. I think it was typically about 15 or 20. But the size doesn't really matter because a binary search is always better."

I asked, "Do you know how fast a computer can do a linear search through 15 or 20 objects?"

He stared at me, not knowing what to say.

I decided to turn this one back to the team leaders who let the kid waste 4 weeks of the company's money. "Why don't you talk to Lee tomorrow and see what he says. However, I think that if you want to keep the binary tree, you should make it more like the protocol of the other Smalltalk Collections, but I'm not sure you'll need it."

I went home shaking my head about how he had been taught the data structure theory well, but out of context of so many other important concepts of software development. He had been given the knowledge but no wisdom to guide him as to how, when, and why to use it.

Fast forward 5-ish years.

Nathaniel Talbott, RoleModel's first apprentice, is working with me on another project with Organon Teknika, whose team we are trying to teach how to apply Java and Extreme Programming. The project was fairly large and, though we rotated who paired with whom at least once/day, there were parts of the system I hadn't had my hands in for a while. During a stand-up meeting - where everyone at least checks in on what others are up to so a NOOB doesn't end up thrashing on his own for several weeks providing no value to the project - someone brought up that as we were adding more data, part of the system was taking a very long time to respond. Several others said they had noticed the same thing. I said that I had some guesses as to the source of the problem and I could go over my theories with whoever was interested after the stand-up.

Just about the entire development group decided to follow me to the whiteboard. Most of the development team had Computer Science degrees or some other form of college training besides Nathaniel. So, I started scribbling on the whiteboard and describing the data structure behind the scenes and how I suspected it was growing and copying elements from the old buffer to the new buffer. I pointed out that because the collection started out small, and was being added to one at a time in a linear fashion, I suspected that the result was a proliferation of creating new buffers and copying the elements to the new buffers from the old buffers. I also shared a couple of other theories I had about this particular section of code that would make it worse. Included in my theory was the idea that this was a sorted collection and the sort criteria was being applied in a fashion that added a relatively inefficient linear search. I then proceeded to point out some solutions to the problems that would avoid so much unnecessary processing. In the midst of the discussion, I mentioned that most collections in Java were simple collections of buffers, but you could use linked lists, or hashed collections at times. Nathaniel, who had been working with us for about a year said, "I've heard the term hash before, but I don't really know what it means." I then explained the concept, having just explained linked lists to him and the team a few minutes earlier.

This entire discussion took about 45 minutes. Most of the others in the room learned a lot, too. Though some of them had heard a lot of the theory I presented while they were in college, it was years earlier and out of context so they hadn't recognized the applicability of these concepts that had been tucked in the back of their minds. Now that I had just brought it back to the forefront, they were ready to apply it for the first time. I confirmed that they all understood what to look for and how to fix it if they found it. They all acknowledged their understanding, were dismissed, and went back to their desks... all but one.

Jeff Canna, one of the more experienced developers on the team, just sat at the end of the table shaking his head.

I asked, "what's the matter, Jeff, did I say something wrong?"

"No," he responded. "I was just marveling that you just taught an entire semester of data structures in less than an hour and the kid absorbed it all."

I objected, "I didn't teach the entire data structures course..." I then paused to reflect on whether my reactionary statement was accurate. I then realized, "Well, I guess the parts I left out were the exercises we were taught in the class that were out of context. He's been using the data structures for more than a year very effectively and learning how to write good clean code using them without knowing the theory behind the structures and algorithms supporting them. I guess this was the first time he really needed to know... just in time learning."

At that moment, I was reminded of the earlier newbie who had been taught all of the theory first.

I was thankful that I hadn't paid a lot of money for a college grad with a Computer Science degree, but had invested in an apprentice who was a less expensive and more productive developer at the ripe age of 19, and who already had a year of experience with object-oriented and extreme programming practices as well as learning how to work well with a team.

Object-oriented programming, design patterns, extreme programming, and other agile practices are not taught at any depth, if mentioned at all, in just about any college in the country. Almost no colleges offer "Working With A Software Team 101". One person at the reception said that a college professor he knew said that test-driven-development shouldn't be taught at the university because it is just a passing fad. At least that professor has heard of it. I know others that haven't. But at least they all teach data structures.

Where should data structures be taught? In college, or in context?

Photo by Fatos Bytyqi on Unsplash