Home » Late Night Cocoa

LNC Episode 032: Lockless Thread-Safe Data Structures

5 September 2008 2,148 views No Comment

Scotty and Mike discuss sharing data between threads using lockless thread safe data structures.

Download/Listen
Running Time: 55 mins 14 secs
Download Size: 51MB

Files
Episode 32 (Enhanced)
Episode 32 (MP3)


Show Notes

 

Guest: Mike Ash
Blog: http://www.mikeash.com 

Multithreading: why we want it
- Responsiveness
- Multicore performance
- (Keep this bit short, just a quick overview)

Thread communication
- Need to communicate things between threads, otherwise what’s the point?
- Can use shared data or mesage passing
- Message passing is cleaner, easier, use it when you can, but not universally applicable
- Example: passing a parameter to +detachNewThreadSelector:…
- Example: performSelectorOnThread:
- Example: Distributed Objects
- Shared data is dirtier, tricker, but often more efficient, sometimes easier to design
- Example: progress count
- Example: shared work item array

Protecting shared data
- What happens when we don’t
- Example: shared array gets corrupted and crashes
- Example: x++ loses updates if multithreaded with no protection
- The standard solution: locks
- NSLock, @synchronized in Cocoa
- Advantages
- Reasonably straightforward, sometimes
- Blocks (stops using CPU) if lock is held by another thread for a long time
- Good built-in support, lots of documentation, tutorials, and guides
- Disadvantages
- Deadlocks
- Blocks (stops making progress) if lock is held by another thread for a long time
- Expensive when contended
- Must make system calls, context switches, sleep and wake the thread
- Areas protected by a lock are effectively single-threaded, only use one core

What is lockless
- Simple, shared data with no locks!
- But we saw why this fails, how do you make it work?
- OSAtomic.h
- Example: make shared x++ work with OSAtomicIncrement32Barrier
- Quick note to always use the Barrier version

Basic concepts
- Standard shared data is based around the idea of the lock
- Lockless shared data is based around the idea of compare-and-swap
- What the heck is compare and swap?
- Equivalent C function (if(*ptr == old) *ptr = new;)
- Gives you an *atomic* update of a location in memory
- Use as a building block to create “transactions” in memory
- Basic concept: fetch, update, commit
- The “commit” is a CAS
- CAS fails if the value doesn’t match
- When this happens, throw away your work and start over
- Normal use of CAS should always be in a loop
- Example: implement atomic increment using CAS
- Fetch the current value, add one, commit with CAS, repeat on failure
- Most of the functions in OSAtomic.h are just very simple variations on this idea

Examine the ramifications
- Efficient
- Never a system call
- Never single-threaded
- Not as efficient as you might think
- On multi-core system, CAS requires synchronization between cores
- OSAtomicIncrement is ~10x slower than a straight x++ on Mac Pro
- But still pretty cheap
- Non-blocking
- Your loop may repeat more than once, but it will finish, and generally very quickly
- Potential for “livelock”
- Not a real concern in most cases
- Keep your CAS loops small and consistent
- Don’t pound on them at 100% CPU without end
- Tricky
- Must fit all shared data manipulation into this fetch/update/commit pattern
- CAS only works with a single pointer-sized unit, so you must base your transactions on that
- i386 does 64-bit CAS even in 32-bit mode, but this does not extend to other archs
- Bugs can manifest rarely, and in difficult-to-diagnose ways
- Code defensively, go slowly, test heavily

More complicated uses: a work queue
- OSAtomicEnqueue and OSAtomicDequeue
- Have to make your own queue element struct
- Purpose of that “offset” argument
- Using offsetof macro
- Not really a queue, actually a stack
- Often don’t care about what order our work is done in, so doesn’t matter
- Outline of pushing in work units from a controller thread, pulling them out on worker threads
- OSQueue for work units
- Shared counter for completion count
- Thread that finishes all work posts notification to the controller thread using performSelectorOnThread:
- Or, controller thread polls completion count

Advanced techniques
- Thread safe linked list
- Can give a real queue (instead of OSQueue’s stack), or just a safe container
- Use CAS to update links
- Lots of difficulties to overcome here:
- Freeing items that another thread is iterating
- ABA problem when removing items
- Potential solutions:
- Garbage collection
- ObjC-GC must use objc_atomicCompareAndSwapInstanceVariableBarrier in objc-auto.h
- GC write barrier may take a lock for you!
- ObjC-GC stops your threads from time to time, also hurts
- Tag bits – still unsafe!
- Hazard pointers
- Restrictions on the data structure – allow insert but not delete
- Concurrent programming is an exercise in paranoia
- Lockless makes it ten times worse
- Try to encapsulate the lockless code as much as possible, minimize your exposure
- Go slow, think everything through, and test test test

Further reading
- OSAtomic.h
- Don’t fear the libkern, you can use it!
- TransactionKit
- Lockless transaction-based NSMutableDictionary
- Lots of literature on lockless data structures, use google
- Software transactional memory
- A continuation of lockless data structures
- Very advanced, active research area
- Probably not yet practical to use in your software today

Leave your response!

Add your comment below, or trackback from your own site. You can also subscribe to these comments via RSS.

Be nice. Keep it clean. Stay on topic. No spam.

You can use these tags:
<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre lang="" line="" escaped="">

This is a Gravatar-enabled weblog. To get your own globally-recognized-avatar, please register at Gravatar.