LNC Episode 032: Lockless Thread-Safe Data Structures
Scotty and Mike discuss sharing data between threads using lockless thread safe data structures.
Download/Listen
Running Time: 55 mins 14 secs
Download Size: 51MB
Files
Episode 32 (Enhanced)
Episode 32 (MP3)
Show Notes
Guest: Mike Ash
Blog: http://www.mikeash.com
Multithreading: why we want it
- Responsiveness
- Multicore performance
- (Keep this bit short, just a quick overview)
Thread communication
- Need to communicate things between threads, otherwise what’s the point?
- Can use shared data or mesage passing
- Message passing is cleaner, easier, use it when you can, but not universally applicable
- Example: passing a parameter to +detachNewThreadSelector:…
- Example: performSelectorOnThread:
- Example: Distributed Objects
- Shared data is dirtier, tricker, but often more efficient, sometimes easier to design
- Example: progress count
- Example: shared work item array
Protecting shared data
- What happens when we don’t
- Example: shared array gets corrupted and crashes
- Example: x++ loses updates if multithreaded with no protection
- The standard solution: locks
- NSLock, @synchronized in Cocoa
- Advantages
- Reasonably straightforward, sometimes
- Blocks (stops using CPU) if lock is held by another thread for a long time
- Good built-in support, lots of documentation, tutorials, and guides
- Disadvantages
- Deadlocks
- Blocks (stops making progress) if lock is held by another thread for a long time
- Expensive when contended
- Must make system calls, context switches, sleep and wake the thread
- Areas protected by a lock are effectively single-threaded, only use one core
What is lockless
- Simple, shared data with no locks!
- But we saw why this fails, how do you make it work?
- OSAtomic.h
- Example: make shared x++ work with OSAtomicIncrement32Barrier
- Quick note to always use the Barrier version
Basic concepts
- Standard shared data is based around the idea of the lock
- Lockless shared data is based around the idea of compare-and-swap
- What the heck is compare and swap?
- Equivalent C function (if(*ptr == old) *ptr = new;)
- Gives you an *atomic* update of a location in memory
- Use as a building block to create “transactions” in memory
- Basic concept: fetch, update, commit
- The “commit” is a CAS
- CAS fails if the value doesn’t match
- When this happens, throw away your work and start over
- Normal use of CAS should always be in a loop
- Example: implement atomic increment using CAS
- Fetch the current value, add one, commit with CAS, repeat on failure
- Most of the functions in OSAtomic.h are just very simple variations on this idea
Examine the ramifications
- Efficient
- Never a system call
- Never single-threaded
- Not as efficient as you might think
- On multi-core system, CAS requires synchronization between cores
- OSAtomicIncrement is ~10x slower than a straight x++ on Mac Pro
- But still pretty cheap
- Non-blocking
- Your loop may repeat more than once, but it will finish, and generally very quickly
- Potential for “livelock”
- Not a real concern in most cases
- Keep your CAS loops small and consistent
- Don’t pound on them at 100% CPU without end
- Tricky
- Must fit all shared data manipulation into this fetch/update/commit pattern
- CAS only works with a single pointer-sized unit, so you must base your transactions on that
- i386 does 64-bit CAS even in 32-bit mode, but this does not extend to other archs
- Bugs can manifest rarely, and in difficult-to-diagnose ways
- Code defensively, go slowly, test heavily
More complicated uses: a work queue
- OSAtomicEnqueue and OSAtomicDequeue
- Have to make your own queue element struct
- Purpose of that “offset” argument
- Using offsetof macro
- Not really a queue, actually a stack
- Often don’t care about what order our work is done in, so doesn’t matter
- Outline of pushing in work units from a controller thread, pulling them out on worker threads
- OSQueue for work units
- Shared counter for completion count
- Thread that finishes all work posts notification to the controller thread using performSelectorOnThread:
- Or, controller thread polls completion count
Advanced techniques
- Thread safe linked list
- Can give a real queue (instead of OSQueue’s stack), or just a safe container
- Use CAS to update links
- Lots of difficulties to overcome here:
- Freeing items that another thread is iterating
- ABA problem when removing items
- Potential solutions:
- Garbage collection
- ObjC-GC must use objc_atomicCompareAndSwapInstanceVariableBarrier in objc-auto.h
- GC write barrier may take a lock for you!
- ObjC-GC stops your threads from time to time, also hurts
- Tag bits – still unsafe!
- Hazard pointers
- Restrictions on the data structure – allow insert but not delete
- Concurrent programming is an exercise in paranoia
- Lockless makes it ten times worse
- Try to encapsulate the lockless code as much as possible, minimize your exposure
- Go slow, think everything through, and test test test
Further reading
- OSAtomic.h
- Don’t fear the libkern, you can use it!
- TransactionKit
- Lockless transaction-based NSMutableDictionary
- Lots of literature on lockless data structures, use google
- Software transactional memory
- A continuation of lockless data structures
- Very advanced, active research area
- Probably not yet practical to use in your software today










Leave your response!