by Kevin Wayne, Robert Sedgewick
Welcome back. Today we’re going to do some math and some science. Not a lot, but we need to have a scientific basis for understanding the performance of our algorithms to properly deploy them in practise. So today we’re going to talk, about how to, observe performance characteristics of algorithms. We’re going to look at how to make mathematical models and how to classify algorithms according to the order of growth of their running time. We’ll talk a bit about the theory of algorithms and also how to analyze memory usage. So to put this all in perspective, we’re going to think about these issues from the point of view of different types of characters.
So the first one is the programmer who needs to solve a problem and get it working and get it deployed. Second one is the client who wants to use the whatever program did to get the job done. Third one is the theoretician, that’s somebody who really wants to understand what’s going on. And, and the last one is kind of a team, this basic blocking and tackling sometimes necessary to get, you know, all these things done. So, there’s a little bit of each one of these in today’s lecture.
And actually when you’re a student you have to think that you might be playing any or all of these roles some day.
So, it’s pretty important to understand the different points of view. So, the key that we’ll focus on is running time. And actually the idea of understanding the running time of a computation goes way back even to Babbage and probably before. And here’s a quote from Babbage, “As soon as an analytical engine exists, it will necessarily guide the future course of the science. Whenever any result is sought by its aid, the question will arise by what course of calculation can these results be arrived at by the machine in the shortest time”. If you look at Babbage’s machine called the analytic engine, it’s got a crank on it. And literally the concern that Babbage had in knowing how long a computation would take is, how m any times do we have to turn the crank. It’s, it’s not that different, in today’s world. The crank may be something electronic that’s happening a billion times a second. But still, we’re looking for, how many times does some discreet operation have to be performed in order to get a computation done.
So, there are lot of reasons to analyse algorithms. In the context of this course we are mainly interested in performance prediction. And we also want to compare the performance of different algorithms for the same task, and to be able to provide some guarantees on how well they perform. Along with this, is understanding some theoretical basis for how algorithms perform. But primarily, the practical reason that we want to be analyzing algorithms and understanding them is to avoid performance bugs. We want to have some confidence that our algorithms going to complete the job in the amount of time, that, that we think it will. And it’s very, very frequent to see, in today’s computational infrastructure, a situation where the client gets bad performance, because the programmer did not understand the performance characteristics of the algorithm. And today’s lecture is about trying to avoid that.
Now, we’re going to focus on performance and comparing algorithms in this course. There’s later courses in typical computer science curricula that have more information about the theoretical basis of algorithms and I’ll mention a little bit about that later on. But our focus is on being able to predict performance and comparing algorithms.
Now there’s a long list of success stories in designing algorithm with better performance in, in enabling the solution of problems that would otherwise not be solved. And I’ll just give a couple of examples. One of the first and most famous is the so called FFT algorithm. That’s an algorithm for breaking down the wave form of n samples of a signal into periodic components. And that’s at the basis for dvds and jpegs and, and many other applications. There’s an easy way to do it that takes time proportional to N^2. But the FFT algorithm, takes only N log N steps. And the difference between N log N and N^2 is, is the difference between being able to solve a large problem and not being able to solve it. A lot of the digital technology, digital media technology that we have today is enabled by that fast algorithm.
Another example was actually developed by Andrew Appel, who’s now the chair of computer science here at Princeton. And it was developed when he was an undergraduate for his senior thesis. It’s a fast algorithm for the N body simulation problem. The easy algorithm takes time proportional to N^2, but Appel’s algorithm was an N log N algorithm that again, meant that scientists can do N body simulation for huge values of N. And that enables new research.
S0, the challenge is that we usually face is, will my program be able to solve a large practical input? And, and actually, the working programmer is actually faced with that all the time. Why, why is my program running so slowly? Why does it run out of memory? And that’s faced programmers for a really long time and the insight to address this. Deuter Kanoof, in the 1970s, was that, we really can use the scientific method to understand the performance of algorithms in operation.
Maybe we’re not unlocking new secrets of the universe but, we can use the, scientific method, and treat the computer, as something to be studied in that way and come to an understanding of how our program are going to perform. And let’s take a look at that in more detail.
So this just a quick summary of what we mean by the scientific method, which has, been successful for a couple of centuries now. So, what we’re going to do is, observe from some feature of the natural world. In this case, it’s going to be the running time of our program on a computer. Then we’re going to develop hypothesis some model that’s consistent with the observations, and we’re going to hope that, that hypothesis is good enough that it’ll allow us to predict something.
Usually predict a running time for larger problem size, or on a different computer. And then we’ll verify the predictions by making more observations, and validate until we’re comfortable that our model hypothesis and observations all agree. That’s a way to get comfort that we understand the performance of our programs. Now, the within the scientific method, there’s some basic principles and the, the first is that if you’re going to run experiments, you should expect that somebody else should be able to run experiments and get the same result. And also the hypotheses have to have a specific property that the experiment can show the hypothesis to be wrong. So, it has to be carefully crafted, and we’ll be sure to try to do that. So, and again the future of the natural world that we’re studying is some particular computer that exists in the natural world. It changes the algorithm from an abstraction to a, some, some kind of actual physical thing happening like electrons racing around inside the computer.