Intel's Teach Parallel @ ISNTV

Home

Introduction
Experiments
Conclusion
The code
Video

Introduction

In the Multicore programming course here at Universidad de Concepción, some of us wanted to share experiences with other people regarding parallel programming, particularly in the classroom. After a few calls back and forth, Paul Steinberg and Jen Teal of Intel suggested that we met with Tom Murphy of Contra Costa College in California to discuss a way to open the channels of communication between the two institutions. The four of us thought it would be a good idea to ask students to solve a given problem using parallelism, and then discuss each other's results live during an ISNTV "Teach Parallel" show. Tom chose Project Euler's fourth problem, reproduced here:

Problem 4. 16 November 2001. A palindromic number reads the same both ways. The largest palindrome made from the product of two 2-digit numbers is 9009 = 91 99. Find the largest palindrome made from the product of two 3-digit numbers.

However, to make things interesting for parallelism, he suggested we increased the search space from three digits to five digits.

Here is some extra material we thought would be interesting to have for the discussion with Tom and Tom's students, together with a little writeup of our solutions.

The solutions were proposed mainly by two students: Juan Olate and Donald Inostroza, with a bit of coaching from Erick Elejalde, José Fuentes, Michel Dumontier (from Carleton University's Biology Department) and myself, but all ideas, experiments, proofs and conclusions are theirs.

Experiments

Since the problem is about finding the largest palindrome number which is the result of multiplying 2 other numbers of an arbitrary number of digits, we can represent our solution space as a matrix of all the possible results of multiplying 2 numbers in the specified range (Like the incomplete matrix of Table 1). Taking this into account, we present 2 different algorithms to solve this problem:

The naïve solution

In algorithm analysis we always need some baseline. Our baseline was simply multiplying all numbers in the 5-digit space (100000x100000 matrix) and assign this to a single thread. We called this the "NaiveSeq" algorithm. This algorithm is very simple and it just calculates all numbers in the matrix. After calculating each number, it will decide if its a palindrome and in the case it is, it will store it only if it's bigger than the last palindrome stored. The sequential version of this algorithm will just calculate all numbers in a determined order (for example, for a 3-Digit problem the order could be: 100*100, 100*101, 100*102, ..., 101*100, 101*101, ..., 999*100, 999*101, ..., 999*999) and keep the highest palindrome result stored. The parallel version of this algorithm will assign a subset, equal in amount, of the solution space to each thread to be solved separately, after all threads are done each will have stored the highest palindrome found for that particular subset of the solution space. We then compare the different palindromes and keep the highest. We could easily improve the naive solution to only search within half of the total search space by cutting off all numbers which are the result of a previously calculated number combination. For example, for the 3-Digit problem it is the same to calculate 100*101 than to calculate 101*100, so adding just a few tweaks to the previous naive solution we can cut the total solution space to half, so that we do not repeat a calculation.

An alternative solution

By observing the matrix in the Table 1,

Table 1

you will notice that the highest values are at the top of the matrix and that the lowest values are at the bottom of the matrix. We can use this information to try and search palindromes from highest result values to lowest result values. If we could find a way to multiply numbers so that we ensure that we will always get the highest possible value, we would only need to search till the first palindrome is found, and then discard all the rest of the numbers because none of them will be higher. This is not easy to accomplish and it is not always clear when a multiplication will yield a higher number than another multiplication with different operands. For instance, it is very obvious that 20*20 will yield a higher result than 19*20, because we have only reduced one of the operands. On the other hand, it is not obvious if 15*10 or 11*14 will give a higher result without doing the multiplication (15*10=150 and 11*14=154). Still, we found out some properties about the multiplication matrix that helped us build a faster algorithm. If we look at the matrix figure, we will notice that the number 165 is higher than all the grayed out numbers. It is trivial that the numbers contained to the left and above 165 will be smaller, because all the operands for those numbers will have to be inevitably smaller or equal to 15 and 11 (operands of number 165). What is not obvious is that the grayed numbers contained above the diagonal to the right will also be smaller than 165. The math proof for this property is shown below

Proof 1

With this information, we can then search from top to bottom, and from right to left, till we find a palindrome. Once we find one, we can eliminate from our search space all the numbers which we know will be smaller than our actual palindrome solution, thus reducing the search space dramatically. If, for example, 165 was a palindrome, then all the grayed out numbers would be discarded from the search space and we would only have few numbers to keep searching through. In the example matrix the search order would be: 20*20, 19*20, 19*19, 18*20, 18*19, 18*18, and so on. The first palindrome found in that matrix would be the number 323 (17*19) and the only remaining number to be searched would be 340 (17*20), discarding all the remaining search space. The parallel approach to this solution would be to assign different rows of the matrix to different threads. For example, if we had 2 threads to solve the problem represented by the matrix in the figure, we could assign red rows to one thread and green rows to another thread. Any thread that finds a palindrome will store it in a global variable, if its higher than the previously stored palindrome, and every thread will check what is the value of the palindrome in this global variable before starting a new row so that it knows which portion of the search space is already discarded. In the the example matrix, if 165 were a palindrome found by the red thread, then both threads would know that all the gray area is now to be avoided when searching.

Results

Graph 1

Graph 2

Conclusion

For all algorithms and for any problem which size is considerable, we get an experimentally measurable speed up of around 3x for a machine with 8 cores (4 physical cores with hyperthreading). This implies that the bigger the problem, the more the time we save. For this particular machine, a problem which takes 3 minutes with the sequential solution, the parallel solution will take about 1 minute, saving 2 minutes, but if the problem takes 30 minutes, then the parallel solution will take 10 minutes, which is a considerable reduction in effective waiting time. This also means that we will be using a higher percentage of our machine's total CPU resources to get the work done in a shorter time, which should be the goal of parallel programming.

Comparing the naive and fast algorithm solutions, we could notice that in both the speedup is of around 3x, so parallelism does not only work well for the naive inefficient solution, which we are unlikely to implement, it can also work on improved solutions, such as the alternative fast algorithm we presented. As long as the algorithms keeps a high degree of data independence, which happens for this particular palindrome finding problem, we can build a parallel solution with a considerable speedup.

The main conclusion to be drawn is that for good enough algorithms, parallelism kicks in when the problem is large enough, but not before.

The code

You can download our experimental code from here. If you have any suggestions, please email me at lferres, followed by the "at" sign, followed by inf.udec.cl.

Video

The TV show aired on Oct 25, 2011 an was uploaded on Nov 1, 2011. You can watch it here. Unfortunately, the sound is not that good. We have written a transcript of the show, to make it more accessible. We will be providing subtitles and a Spanish translation as soon as we get some time off our other activities.


[ Updated: Thu Nov 24 15:23:39 2011 ]