CSC 153: Computer Science Fundamentals Grinnell College Spring, 2005 Laboratory Exercise Reading

# Time and Space Complexity

## Summary

This reading introduces some principles of algorithm effectiveness, including the amount of time and memory required for the algorithm. Big-O notation is introduced to provide an informal measure of the time or space required by an algorithm. These ideas are applied to the linear and binary search algorithms, discussed in the lab on searching.

## Factors related to Algorithm Effectiveness

In considering the solution to a problem, it is natural to ask how effective that solution might be. Also, when comparing solutions, one might wonder whether one solution is better than another. Altogether, one might use many criteria to evaluate such solutions, including:

• Accuracy: Of course, any program should produce correct answers. (If we were satisfied with wrong results, it is trivial to produce many such answers very quickly.) However, it may not be immediately clear just how accurate results should be in a specific instance. For example, one algorithm may be simple to program and may run quickly, but it only may be accurate to 5 decimal places. A second algorithm may be more complex and much slower, but may give 15 place accuracy. If 5-place accuracy is adequate for a specific application, the first algorithm is the better choice. However, if 10 or 12 place accuracy is required, the slower algorithm must be used.

• Efficiency: Efficiency can be measured in many ways: programmer time, algorithm execution time, memory used, and so on. If a program is to be used once, then programmer time may be a major consideration, and a simple algorithm might be preferred. If a program is to be used many times, however, then it may be worth spending more development time with a complex algorithm, so the procedure will run very quickly.

• Use of Memory: One algorithm may require more computer memory in which to execute. If space is a scarce resource, then the amount of space an algorithm requires should be taken into consideration when comparing algorithms.

• Ease of Modification: It is common practice to modify old programs to solve new problems. A very obscure algorithm that is difficult to understand, therefore, is usually less desirable than one that can be easily read and modified.

For this laboratory exercise, we focus on algorithm execution time.

## Algorithm Execution Time

In determining algorithm execution time, we may proceed in several ways:
• We may time how long an algorithm takes on a specific machine.
• We may analyze the algorithm at a detailed level to determine how many instructions a computer must execute to solve the problem.
• We may analyze the algorithm at a high level to approximate time factors, independent of a specific machine.
Each of these approaches has advantages, but each also has drawbacks. Execution times on a specific machine normally depend upon details of the machine and on the specific data used. Timings may vary from data set to data set and from machine to machine, so experiments from one machine and one data set may not be very helpful in general.

The analysis of instructions may take into account the nature of the data −− for example, one might consider what happens in a worst case. Also, such analysis commonly is based on the size of the data being processed −− the number of items or how large or small the data are. This is sometimes called a microanalysis of program execution. Once again, however, the specific instructions may vary from machine to machine, and detailed conclusions from one machine may not apply to another.

A high-level analysis may identify types of activities performed, without considering exact timings of instructions. This is sometimes called a macroanalysis of program execution. This can give a helpful overall assessment of an algorithm, based on the size of the data. However, such an analysis cannot show fine variations among algorithms or machines.

For many purposes, it turns out that a high-level analysis provides adequate information to compare algorithms. For the most part, we follow that approach here.

## Analysis: Linear Search

Consider a simple linear search of an array a for a specific item. A typical code segment follows:
```
int[] a = new int [arraySize];

...

// linear search algorithm
int j = 0;
while (j < a.length && item != a[j])
    j++;
boolean result = (j != a.length);

```
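For reference, the segment above can be packaged as a complete, runnable program; the class and method names here are illustrative, not part of the lab code:

```java
// A self-contained version of the linear search segment above.
// The class and method names are illustrative.
public class LinearSearchDemo {
    // Returns true if item occurs anywhere in a, scanning left to right.
    public static boolean linearSearch(int[] a, int item) {
        int j = 0;
        while (j < a.length && item != a[j])
            j++;
        return j != a.length;   // true when the loop stopped before the end
    }

    public static void main(String[] args) {
        int[] a = {4, 8, 15, 16, 23, 42};
        System.out.println(linearSearch(a, 15));  // true
        System.out.println(linearSearch(a, 7));   // false
    }
}
```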

In executing this code, the machine first initializes j, then works through the loop (say, t times), and finally computes a result. In working through the loop, the condition (j < a.length && item != a[j]) is checked on each pass and once more at the end (t+1 times in all), and the variable j is incremented t times. Putting all of this together, the amount of work is:

1. initialization (once)
2. checking of loop condition (t+1 times)
3. incrementing j (t times)
4. final computation of result (once)

Of course, the amount of time for each action varies from one machine to another. However, suppose that A is the time for initialization, C is the time for checking the loop condition once, I is the amount of time for incrementing j once, and F is the time required for the final computation. Then, the total time for the computation will be:

Overall time = A + (t+1)C + tI + F = t(C+I) + (A+C+F)

Next, suppose the array contains N elements. How many times might we expect to go through the loop? That is, what is a reasonable estimate for t?

If the desired item is not in the array, the answer is easy. We must go through all elements of the array before concluding item is not in the array, and t = N. If item is in the array, we might be lucky and find it at the very beginning of the search, or we might be unlucky and find it at the very end. On average, we might expect to go about halfway through the array. This analysis gives rise to three alternatives:

• Best Case: t = 1
• Worst Case: t = N
• Average Case: t = N/2
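These three cases can be observed directly by counting how many array elements the search examines; the class and method names below are illustrative:

```java
// Instrumented linear search that counts examined elements,
// illustrating the best, worst, and average cases above.
public class SearchCost {
    // Counts how many array elements the linear search examines
    // before it stops (either by finding item or reaching the end).
    public static int comparisons(int[] a, int item) {
        int count = 0;
        for (int j = 0; j < a.length; j++) {
            count++;                  // item is compared with a[j]
            if (a[j] == item)
                return count;
        }
        return count;                 // item absent: all N elements examined
    }

    public static void main(String[] args) {
        int[] a = {10, 20, 30, 40, 50};
        System.out.println(comparisons(a, 10));  // best case: 1
        System.out.println(comparisons(a, 50));  // worst case for a hit: 5
        System.out.println(comparisons(a, 99));  // miss: 5 (= N)
    }
}
```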

In practice, it is rarely realistic to hope for the best case, and computer scientists tend not to spend much time analyzing this possibility. The average case often is of interest, but is sometimes hard to estimate. Thus, computer scientists often focus on the worst case. The worst case gives a pessimistic, but possible, view, and usually it is relatively easy to identify. In this case, the average case and worst case analyses have similar forms, although the constants are different:

• Worst Case: Overall time = N(C+I) + (A+C+F)
• Average Case: Overall time = N(C+I)/2 + (A+C+F)

In a microanalysis, we could now substitute specific values for the various constants to describe the precise amount of time required for the linear search on a specific machine. While this might be helpful for a specific environment, we would have to redo the analysis for each new machine (and compiler). Instead, we take a more conceptual view. The key point of these expressions is that they represent lines: a linear relationship between overall time and the size N of the array. Also, for relatively large values of N (i.e., for large arrays), the constant term A+C+F has relatively little effect. We can summarize this qualitative analysis by saying that the overall time is approximately constant * N. As the constant depends on details of a machine and compiler, we focus on this dominant term (ignoring constants), and we say the linear search has order N, written O(N).

## Analysis: Binary Search

During a recent class discussion, we developed code to search for an item in an array using a binary search. What follows is one possible version of this code:

```
// binary search algorithm
int lo = 0;
int hi = a.length;
int mid = (hi + lo) / 2;
boolean result = false;
while (!result && lo < hi) {
    if (a[mid] == item)
        result = true;
    else if (a[mid] < item)
        lo = mid + 1;
    else
        hi = mid;
    mid = (hi + lo) / 2;
}

```
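As with the linear search, the segment above can be packaged as a complete, runnable program; the class and method names are illustrative, and the array is assumed to be sorted in ascending order:

```java
// A self-contained version of the binary search segment above.
// The array a must be sorted in ascending order.
public class BinarySearchDemo {
    public static boolean binarySearch(int[] a, int item) {
        int lo = 0;
        int hi = a.length;
        boolean result = false;
        while (!result && lo < hi) {
            int mid = (hi + lo) / 2;      // middle of the current segment
            if (a[mid] == item)
                result = true;
            else if (a[mid] < item)
                lo = mid + 1;             // item must lie in the upper half
            else
                hi = mid;                 // item must lie in the lower half
        }
        return result;
    }

    public static void main(String[] args) {
        int[] a = {2, 3, 5, 7, 11, 13};
        System.out.println(binarySearch(a, 7));   // true
        System.out.println(binarySearch(a, 6));   // false
    }
}
```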

As for the linear search, we would like to estimate the work involved to locate an item in array a, which we will assume has size N. This code allows somewhat more variety than the linear search, as the work within the loop involves several options (either of two conditions could be true or false, and various assignments could result). Thus, we will need some averages about the work needed at various stages. Suppose I is the time for initialization, C is the time for checking the loop condition once, and L is the average time required for one execution of the if statements in the body of the loop. Suppose also that the loop is executed t times. Then, the total time for the computation will be:

Overall time = I + (t+1)C + tL = t(C+L) + (I+C)

While this provides a good start for the analysis, we need some additional study to determine how t relates to the array size N. Here, we might be lucky and find the desired item on the first test, but that seems unlikely, and we ignore that possibility. Also, an average-case analysis is a bit tricky here, so we focus on the worst-case. In the binary search, we start by considering the entire array −− of size N. After one step, we have checked the middle of this array, determined which half the item might be in, and restricted our search to that half. After the second step, we have checked the middle of this half, and restricted the search to half of the half −− or a quarter of the array. More generally, at each stage, the size of the array segment under consideration is halved again. This progression of sizes is shown in the following table:

| Step number | Size of array still under consideration |
|-------------|-----------------------------------------|
| 0           | N                                       |
| 1           | N/2 = N/2^1                             |
| 2           | N/4 = N/2^2                             |
| 3           | N/8 = N/2^3                             |
| ...         | ...                                     |
| t           | N/2^t                                   |

The process continues until there is nothing left to search. That is, the size of the array segment under consideration falls below 1, or N/2^t < 1. This will happen when N is about 2^t. Solving for t gives t = log₂N. Plugging this into the above equation gives:
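The halving argument can be checked directly by counting how many times N can be halved before at most one element remains; the class and method names below are illustrative:

```java
// Counts halvings until at most one element remains: t is about log2(N).
public class Halving {
    public static int halvings(int n) {
        int t = 0;
        while (n > 1) {
            n = n / 2;   // each binary-search step discards half the segment
            t++;
        }
        return t;
    }

    public static void main(String[] args) {
        System.out.println(halvings(8));     // 3  (log2 of 8)
        System.out.println(halvings(1024));  // 10 (log2 of 1024)
        System.out.println(halvings(1000));  // 9  (floor of log2 of 1000)
    }
}
```

Note that doubling N increases the count by exactly one, the key property used below.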

Overall time = (C+L)·log₂N + (I+C)

As before, a macroanalysis ignores the proportionality constants of a microanalysis: differences from machine to machine change a proportionality constant, not the nature of the main terms. As we suggested informally before, the order of an algorithm is the amount of time required to execute it, ignoring proportionality constants. In this case, we say the binary search has order log₂N, written O(log₂N). The overall shape of the timing curve follows the logarithm function: it rises quickly for small N and then flattens out. While this analysis may seem rough, it still provides useful insights. For example, the function log₂N increases by only 1 if N doubles. Applying this to the above estimate of overall time for the binary search, if the size of an array doubles, then we would expect the time for a binary search to increase only by a small, constant amount (C+L in the above formula).

## More Experimentation

Program ~walker/java/examples/searching/searchTest.java provides a framework for timing the linear and binary search algorithms, as described above. This program illustrates the use of the timing method System.currentTimeMillis(), which returns the current time in milliseconds. As the algorithms run very quickly, the program repeats each search 1000 times, so that the measured times in milliseconds are large enough to be meaningful.

The program asks the user to set the minimum and maximum array sizes to be tested, as well as the number of trials to be tested at each array size. Program execution then picks elements at random, applies the search algorithms, and reports the timings. After arrays of one size are tested, the array size is doubled, and the process repeats.
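The doubling experiment can be sketched as follows; this is a minimal version in the spirit of searchTest.java, with illustrative class and method names (the actual program is at the path given above):

```java
// Minimal timing sketch: repeat worst-case linear searches and report
// elapsed milliseconds as the array size doubles. Names are illustrative.
public class TimingSketch {
    // Times `repeats` worst-case linear searches (for an absent item)
    // in a sorted array of size n; returns elapsed milliseconds.
    public static long timeLinearMisses(int n, int repeats) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++)
            a[i] = 2 * i;                       // sorted, even values only
        long start = System.currentTimeMillis();
        for (int r = 0; r < repeats; r++) {
            int j = 0;
            while (j < a.length && a[j] != -1)  // -1 is never present
                j++;
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        // Double the array size and watch the time roughly double (O(N)).
        for (int n = 50000; n <= 400000; n *= 2)
            System.out.println(n + " elements: "
                    + timeLinearMisses(n, 1000) + " ms");
    }
}
```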

This document is available on the World Wide Web as

```
http://www.cs.grinnell.edu/~walker/courses/153.sp05/readings/reading-complexity.shtml
```

Created January 14, 1998; last revised March 24, 2005. For more information, please contact Henry M. Walker at walker@cs.grinnell.edu.