Allegro CL 9.0 on Windows, Linux, and the Mac comes with a version which supports Symmetric Multiprocessing (SMP), which means that it can utilize more than one hardware processor at the same time. (On all supported platforms, a non-SMP version of Allegro CL 9.0 is also available.) This is a second of a series of Tech Corner articles which discuss aspects of SMP.
One feature of SMP is that it is difficult to do certain simple store and read operations on lists, like pop and push and certain read/modify/store operations, such as is done by incf and friends, and pushnew. All these operators work fine in SMP if the place they are operating on is read and stored by one process only, indeed most work fine if one process only stores. But as soon as multiple processes try to store, things can go wrong.
The problem is that operations like incf first read the current value in a place (such as the value of a variable), then increment it by 1, and then store the new value. If in the time that between the read and the store, another process also calls incf (or decf) on the same location, the operations of the two proocesses can get interleaved and the result is not what is expected. The following shows this. Assume the steps are ordered by the exact instant in time they occur:
1. Value of *var* is 6 2. Process 1 calls (incf *var*) 3. Process 2 calls (incf *var*) 4. Process 1 reads value, gets 6 5. Process 2 reads value, gets 6 6. Process 1 adds 1 to 6, gets 7 7. Process 2 adds 1 to 6, gets 7 8. Process 1 stores 7 as value of *var* 9. Process 2 stores 7 as value of *var*
The result is *var* ends up with value 7 when it should have value 8. More processes and combined incf's and decf's can result is an array of potential end values, almost all incorrect, if by 'correct' we mean the result if all the incf's and decf's were done in a single thread.
Here we have a log of doing this calculation running Allegro CL 64-bit 9.0 SMP on a 4-processor Windows 7 machine. We define functions that do incf's and decf's 100 times, and we run them in twenty processes. Because we have an equal number of incf and decf calls, the value of *v* at the end should be 0. But it is -4 in this example and can be any value between -1000 (no incf succeeds) to 1000 (no decf succeeds).
cg-user(1): (defvar *v* 0) *v* cg-user(2): (defun foo () (dotimes (i 100) (incf *v*) (sleep 1))) foo cg-user(3): (defun bar () (dotimes (i 100) (decf *v*) (sleep 1))) bar cg-user(4): (compile 'foo) foo nil nil cg-user(5): (compile 'bar) bar nil nil cg-user(6): (dotimes (i 10) (mp:process-run-function (symbol-name (gensym)) 'foo) (mp:process-run-function (symbol-name (gensym)) 'bar)) nil cg-user(7): :proc P Id Bix Dis Sec dSec Pri State Process Name, Whostate, Arrest * 20f8 14 0 2 2.2 0 waiting g1181, Sleeping * 2058 25 0 2 2.2 0 runnable g1192 * 31e8 23 0 2 2.0 0 runnable g1190 * 47c 3 0 2 1.7 8 runnable IDE GUI * 33d8 15 0 1 1.4 0 runnable g1182 * 154c 7 0 1 1.4 0 waiting g1174, Sleeping * 25b4 9 0 1 1.3 0 waiting g1176, Sleeping * 112c 22 0 1 1.3 0 runnable g1189 * 2c78 8 0 1 1.3 0 waiting g1175, Sleeping * 2750 11 0 1 1.2 0 runnable g1178 * 37c8 6 0 1 1.2 0 runnable g1173 * 1904 16 0 1 1.2 0 waiting g1183, Sleeping * 2054 21 0 1 1.0 0 runnable g1188 * 1d14 20 0 1 1.0 0 waiting g1187, Sleeping * 1b3c 12 0 1 0.8 0 runnable g1179 * 1198 10 0 1 0.7 0 waiting g1177, Sleeping * 2bf8 24 0 1 0.6 0 waiting g1191, Sleeping * 3280 19 0 1 0.6 0 runnable g1186 * 3694 13 0 0 0.4 0 waiting g1180, Sleeping * 2ac4 17 0 0 0.3 0 waiting g1184, Sleeping * 234c 2 1 0 0.2 0 waiting Initial Lisp Listener, waiting-for-input * 2f9c 18 0 0 0.2 0 runnable g1185 * fa0 5 0 0 0.0 0 runnable Listener 1 cg-user(8): :proc P Id Bix Dis Sec dSec Pri State Process Name, Whostate, Arrest * 47c 3 0 2 0.3 8 runnable IDE GUI * fa0 5 0 0 0.0 0 runnable Listener 1 * 234c 2 0 0 0.0 0 waiting Initial Lisp Listener, waiting-for-input cg-user(9): *v* -4 cg-user(10):
cg-user(1): (defvar *v1* 0) *v* cg-user(2): (defun foo () (dotimes (i 100) (incf-atomic *v1*) (sleep 1))) foo cg-user(3): (defun bar () (dotimes (i 100) (decf-atomic *v1*) (sleep 1))) bar
Now any calls to foo and bar in any process in any order will, when they have all completed, result in *v1* having the expected value. However, the order of calculation is not guaranteed.
These macros also have problems with SMP. The push macro places a new item at the front of a list. push works be taking an item, creating a new cons object, making the cdr point to the current first cons of the list, and making the list header point to the new cons. (Non-empty lists are stored as a collection of conses and a header. The header points to the first cons in the list, and the cdr of each cons points to the next list cons, except the last where the cdr is nil -- ignoring, for this discussion, circular lists.) pushnew works by first testing whethe rthe new item is on the list, and adding it like push if it is not. popmoves (and returns) the first item from the list.
Here we illustrate the problem with push. The following is what can happen with two processors both evaluating a push on the same list:
Processor 1 get the pointer to the first list item Processor 2 get the pointer to the (same) first list item Processor 1 creates a new cons with the first argument to push the car and the pointed just obtained the cdr Processor 2 creates a new cons with the first argument to push the car and the pointed just obtained the cdr Processor X replaces the pointer in the list header with a pointer to its new cons Processor Y replaces the pointer in the list header with a pointer to its new cons
We put X and Y rather than 1 and 2 in the last lines. Whichever is X will find its item missing from the list after this all completes. This will work correctly with push-atomic used in place of push.
pop has the same issue (two processes could both get the same value from the list rather than one getting the first and the other getting the second). This will work correctly with pop-atomic.
pushnew adds an additional complication: making sure that the object to be added to the list is not already present. If two processes try pushnew with the same item at the same time, even if push-atomic is used, the result may be that the item is placed on the list twice. Rather than having an atomic primitive version of pushnew, implementations have to use locks or similar tools to guarantee the list is updated correctly without unwanted duplicates.
We will discuss locks in the next Tech Corner article. See smp.htm, which is the main document discussing SMP.
|Copyright © 2013 Franz Inc., All Rights Reserved | Privacy Statement||