Hands-on #3: Valgrind
Complete the following hands-on assignment. Do the activities described, and submit your solutions for following questions to ftp://535603986:public@public.sjtu.edu.cn/upload/hands-on-3 Please submit a doc or a pdf with title like 513037xxxx.pdf. You are free to use either Chinese or English. Due time is 2015-4-7 23:59.
With any problems, contact Shi Lei (sleepytodeath@gmail.com). ( What you saw early today was the draft. Sorry for the mistake. )
Introduction
This assignment is about parallel programming with Pthreads. You will get familiar with Valgrind, a debugging tool we are using to detect race conditions. Although Valgrind contains many convenient tools you can utilize to accelerate your daily debugging job, we only focus on race condition detection here.
You may want to skim the Eraser paper, which is close in spirit to the tool we are using here. You might also want to refresh your memory on what race conditions are and the troubles that they can cause. You can refer to relative chapters in textbook for that.
Warmup
Download the program ph.c and store it in a file ph.c.
Then compile and run it with one thread:
$ gcc -g -o ph ph.c -pthread
$ ./ph 1
completion time for put phase = 2.633956
0: 0 keys missing
completion time for get phase = 2.649134
The 1 specifies the number of threads that execute put and get operations on the the hash table. The program inserts NKEYS into a hash table in the put phase and then looks them all up in the get phase.
Let's run with 2 threads. Then each thread does NKEYS/2 puts in the put phase and NKEYS/2 lookups in the get phase. The program is written in C, the default language for low-level systems programming, and uses pthreads for creating several threads that can run in parallel if our machine has several cores (most processors today have at least 2 cores).
If we achieve perfect parallelism, then the two phases complete in half the time. Here is the output:
$ ./ph 2
completion time for put phase = 1.415325
0: 60 keys missing
1: 68 keys missing
completion time for get phase = 3.151615
We see that put phase is faster now, so we were able to exploit 2 cores (but it didn't get twice as fast). The get phase, however, is slower with 2 cores, which is disappointing: using more cores, the program ran slower. More on that later, because there is a bigger problem.
You will likely observe that the code is incorrect. The application inserted 27+39 keys in phase 1 that phase 2 couldn't find. In your runs, there may be more or fewer keys missing. There may be even 0 keys missing in some runs. If you run with 1 thread, there will never be any keys missing.
Run the application with 4 threads:
$ ./ph 4
completion time for put phase = 1.352910
3: 27 keys missing
0: 64 keys missing
2: 26 keys missing
1: 62 keys missing
completion time for get phase = 3.441338
Three points: 1) the put runs faster with 4 cores than 2 cores, but not 4 times as fast as a single core; 2) the get phase ran even slower; 3) more keys are missing.
Using valgrind to find race condition
The ph.c program is simple, and you may be able to spot the mistake immediately, but we will use a tool, helgrind that helps us find such errors automatically. helgrind is similar in spirit to the Eraser system that you are reading about, and is part of a suite of tools collectively called valgrind.
Run helgrind with ph. You will see output as follows:
$ valgrind --tool=helgrind ./ph 2
==5500== Helgrind, a thread error detector
==5500== Copyright (C) 2007-2013, and GNU GPL'd, by OpenWorks LLP et al.
==5500== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==5500== Command: ./ph 2
==5500==
==5500== ---Thread-Announcement------------------------------------------
==5500==
==5500== Thread #3 was created
==5500== at 0x514123E: clone (in /usr/lib/libc-2.21.so)
==5500== by 0x4E421B9: create_thread (in /usr/lib/libpthread-2.21.so)
==5500== by 0x4E43AFA: pthread_create@@GLIBC_2.2.5 (in /usr/lib/libpthread-2.21.so)
==5500== by 0x4C302CD: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==5500== by 0x401040: main (ph.c:148)
==5500==
==5500== ---Thread-Announcement------------------------------------------
==5500==
==5500== Thread #2 was created
==5500== at 0x514123E: clone (in /usr/lib/libc-2.21.so)
==5500== by 0x4E421B9: create_thread (in /usr/lib/libpthread-2.21.so)
==5500== by 0x4E43AFA: pthread_create@@GLIBC_2.2.5 (in /usr/lib/libpthread-2.21.so)
==5500== by 0x4C302CD: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==5500== by 0x401040: main (ph.c:148)
==5500==
==5500== ----------------------------------------------------------------
==5500==
==5500== Possible data race during read of size 4 at 0x1CE5008 by thread #3
==5500== Locks held: none
==5500== at 0x400BDC: put (ph.c:61)
==5500== by 0x400E32: put_thread (ph.c:98)
==5500== by 0x4C30466: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==5500== by 0x4E43373: start_thread (in /usr/lib/libpthread-2.21.so)
==5500==
==5500== This conflicts with a previous write of size 4 by thread #2
==5500== Locks held: none
==5500== at 0x400C6F: put (ph.c:64)
==5500== by 0x400E32: put_thread (ph.c:98)
==5500== by 0x4C30466: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==5500== by 0x4E43373: start_thread (in /usr/lib/libpthread-2.21.so)
==5500== Address 0x1ce5008 is 24000008 bytes inside data symbol "table"
==5500==
==5500==
==5500== More than 10000000 total errors detected. I'm not reporting any more.
==5500== Final error counts will be inaccurate. Go fix your program!
==5500== Rerun with --error-limit=no to disable this cutoff. Note
==5500== that errors may occur in your program without prior warning from
==5500== Valgrind, because errors are no longer being displayed.
==5500==
( The whole program may take minutes to finish in valgrind, so you can stop it here. )
Question 1: Study helgrind's output and the ph.c program. Clearly something is wrong with put() on line 61. Why are there missing keys with 2 or more threads, but not with 1 thread? Identify a sequence of events that can lead to keys missing for 2 threads.
Fixing the error
To avoid this sequence of events, insert lock and unlock statements in put so that the number keys missing is always 0. The relevant pthread calls are (for more see the manual pages, man pthread):
pthread_mutex_t lock; // declare a lock
pthread_mutex_init(&lock, NULL); // initialize the lock
pthread_mutex_lock(&lock); // acquire lock
pthread_mutex_unlock(&lock); // release lock
The function get() has an example use of locks, and main initializes it.
Modify ph.c to make it correct and recompile with gcc. Test your code first with 1 thread, then test it with 2 threads. Is it correct (i.e. have you eliminated missing keys)? Check the correctness using helgrind. (Note that valgrind slows down ph a lot, so you may want to modify NKEYS to be some small number.)
Question 2: Describe your changes and argue why these changes ensure correctness.
Question 3: Is the two-threaded version faster than the single-threaded version in the put phase? Report the running times for the two-threaded version and the single-threaded version. (Make sure to undo your NKEYS change.)
Question 4: Most likely (depending on your exact changes) ph won't achieve any speedup. Why is that?
Living dangerously
Remove the locks from get() in ph.c, recompile, and rerun the program.
Question 5: You should observe speedup on several cores for the get phase of ph. Why is that?
Question 6: Why does valgrind report no errors for get()? Can you imagine a execution sequence where valgrind may also report error for get(), with current get() and put() functions?
Challenge
Question 7: Can you think of a way of modifying ph.c so that you get speedup for both phases and valgrind won't report race conditions? (If you have time, implement that plan and check.)
Task for fun
If you find the previous impressively long hands-on very exciting and get unsatisfied with this short one, you can try this interesting task. Of course, you don’t have to. No score is concerned.
As is described before, Valgrind contains many useful tools besides helgrind. For example, it is capable of detecting memory leakage. If your program is written in C or C++ and contains a number of pointer operations, possibly there is memory leakage.
Use this tool to analyze a program you write recently. See if you can find any leak. If any, try to fix it. Report where and how this memory leak occurs.
You may use a command like this:
$ valgrind --leak-check=full your_program
Use -g tag while compiling for more info in valgrind's output.
If you can’t find any suitable program, consider this. This is a stupid simulation of a layer in network. It receives package from lower layer, checks and stores them in buffer and combines them into upper layer message. ( Hint: a single line can fix this bug )