Late Night
Cocoa
Late Night Cocoa is a podcast that helps Cocoa developers improve their Cocoa skills. Each episode of the podcast involves an experienced Cocoa developer sharing his or her experience in a specific area of Cocoa development.
Subscribe in iTunes (AAC Enhanced Version)
Subscribe in iTunes (MP3 Version)
Subscribe using RSS (AAC Enhanced Version)
Subscribe using RSS (MP3 Version)
Late Night Cocoa is a podcast that helps Cocoa developers improve their Cocoa skills. Each episode of the podcast involves an experienced Cocoa developer sharing his or her experience in a specific area of Cocoa development.
Subscribe in iTunes (AAC Enhanced Version)
Subscribe in iTunes (MP3 Version)
Subscribe using RSS (AAC Enhanced Version)
Subscribe using RSS (MP3 Version)
Episode 32:
Lockless Thread-Safe Data Structures
Download/Listen:
Episode 32 (Enhanced)
/ Episode 32 (MP3)
Running Time: 55 mins 14 secs
Download Size: 51MB
Running Time: 55 mins 14 secs
Download Size: 51MB
Released: 5th
September 2008
Guest:
Mike Ash
Scotty and Mike
discuss sharing data between threads using lockless
thread safe data structures.
Multithreading: why we want it
- Responsiveness
- Multicore performance
- (Keep this bit short, just a quick overview)
Thread communication
- Need to communicate things between threads, otherwise what's the point?
- Can use shared data or mesage passing
- Message passing is cleaner, easier, use it when you can, but not universally applicable
- Example: passing a parameter to +detachNewThreadSelector:...
- Example: performSelectorOnThread:
- Example: Distributed Objects
- Shared data is dirtier, tricker, but often more efficient, sometimes easier to design
- Example: progress count
- Example: shared work item array
Protecting shared data
- What happens when we don't
- Example: shared array gets corrupted and crashes
- Example: x++ loses updates if multithreaded with no protection
- The standard solution: locks
- NSLock, @synchronized in Cocoa
- Advantages
- Reasonably straightforward, sometimes
- Blocks (stops using CPU) if lock is held by another thread for a long time
- Good built-in support, lots of documentation, tutorials, and guides
- Disadvantages
- Deadlocks
- Blocks (stops making progress) if lock is held by another thread for a long time
- Expensive when contended
- Must make system calls, context switches, sleep and wake the thread
- Areas protected by a lock are effectively single-threaded, only use one core
What is lockless
- Simple, shared data with no locks!
- But we saw why this fails, how do you make it work?
- OSAtomic.h
- Example: make shared x++ work with OSAtomicIncrement32Barrier
- Quick note to always use the Barrier version
Basic concepts
- Standard shared data is based around the idea of the lock
- Lockless shared data is based around the idea of compare-and-swap
- What the heck is compare and swap?
- Equivalent C function (if(*ptr == old) *ptr = new;)
- Gives you an *atomic* update of a location in memory
- Use as a building block to create "transactions" in memory
- Basic concept: fetch, update, commit
- The "commit" is a CAS
- CAS fails if the value doesn't match
- When this happens, throw away your work and start over
- Normal use of CAS should always be in a loop
- Example: implement atomic increment using CAS
- Fetch the current value, add one, commit with CAS, repeat on failure
- Most of the functions in OSAtomic.h are just very simple variations on this idea
Examine the ramifications
- Efficient
- Never a system call
- Never single-threaded
- Not as efficient as you might think
- On multi-core system, CAS requires synchronization between cores
- OSAtomicIncrement is ~10x slower than a straight x++ on Mac Pro
- But still pretty cheap
- Non-blocking
- Your loop may repeat more than once, but it will finish, and generally very quickly
- Potential for "livelock"
- Not a real concern in most cases
- Keep your CAS loops small and consistent
- Don't pound on them at 100% CPU without end
- Tricky
- Must fit all shared data manipulation into this fetch/update/commit pattern
- CAS only works with a single pointer-sized unit, so you must base your transactions on that
- i386 does 64-bit CAS even in 32-bit mode, but this does not extend to other archs
- Bugs can manifest rarely, and in difficult-to-diagnose ways
- Code defensively, go slowly, test heavily
More complicated uses: a work queue
- OSAtomicEnqueue and OSAtomicDequeue
- Have to make your own queue element struct
- Purpose of that "offset" argument
- Using offsetof macro
- Not really a queue, actually a stack
- Often don't care about what order our work is done in, so doesn't matter
- Outline of pushing in work units from a controller thread, pulling them out on worker threads
- OSQueue for work units
- Shared counter for completion count
- Thread that finishes all work posts notification to the controller thread using performSelectorOnThread:
- Or, controller thread polls completion count
Advanced techniques
- Thread safe linked list
- Can give a real queue (instead of OSQueue's stack), or just a safe container
- Use CAS to update links
- Lots of difficulties to overcome here:
- Freeing items that another thread is iterating
- ABA problem when removing items
- Potential solutions:
- Garbage collection
- ObjC-GC must use objc_atomicCompareAndSwapInstanceVariableBarrier in objc-auto.h
- GC write barrier may take a lock for you!
- ObjC-GC stops your threads from time to time, also hurts
- Tag bits - still unsafe!
- Hazard pointers
- Restrictions on the data structure - allow insert but not delete
- Concurrent programming is an exercise in paranoia
- Lockless makes it ten times worse
- Try to encapsulate the lockless code as much as possible, minimize your exposure
- Go slow, think everything through, and test test test
Further reading
- OSAtomic.h
- Don't fear the libkern, you can use it!
- TransactionKit
- Lockless transaction-based NSMutableDictionary
- Lots of literature on lockless data structures, use google
- Software transactional memory
- A continuation of lockless data structures
- Very advanced, active research area
- Probably not yet practical to use in your software today
Multithreading: why we want it
- Responsiveness
- Multicore performance
- (Keep this bit short, just a quick overview)
Thread communication
- Need to communicate things between threads, otherwise what's the point?
- Can use shared data or mesage passing
- Message passing is cleaner, easier, use it when you can, but not universally applicable
- Example: passing a parameter to +detachNewThreadSelector:...
- Example: performSelectorOnThread:
- Example: Distributed Objects
- Shared data is dirtier, tricker, but often more efficient, sometimes easier to design
- Example: progress count
- Example: shared work item array
Protecting shared data
- What happens when we don't
- Example: shared array gets corrupted and crashes
- Example: x++ loses updates if multithreaded with no protection
- The standard solution: locks
- NSLock, @synchronized in Cocoa
- Advantages
- Reasonably straightforward, sometimes
- Blocks (stops using CPU) if lock is held by another thread for a long time
- Good built-in support, lots of documentation, tutorials, and guides
- Disadvantages
- Deadlocks
- Blocks (stops making progress) if lock is held by another thread for a long time
- Expensive when contended
- Must make system calls, context switches, sleep and wake the thread
- Areas protected by a lock are effectively single-threaded, only use one core
What is lockless
- Simple, shared data with no locks!
- But we saw why this fails, how do you make it work?
- OSAtomic.h
- Example: make shared x++ work with OSAtomicIncrement32Barrier
- Quick note to always use the Barrier version
Basic concepts
- Standard shared data is based around the idea of the lock
- Lockless shared data is based around the idea of compare-and-swap
- What the heck is compare and swap?
- Equivalent C function (if(*ptr == old) *ptr = new;)
- Gives you an *atomic* update of a location in memory
- Use as a building block to create "transactions" in memory
- Basic concept: fetch, update, commit
- The "commit" is a CAS
- CAS fails if the value doesn't match
- When this happens, throw away your work and start over
- Normal use of CAS should always be in a loop
- Example: implement atomic increment using CAS
- Fetch the current value, add one, commit with CAS, repeat on failure
- Most of the functions in OSAtomic.h are just very simple variations on this idea
Examine the ramifications
- Efficient
- Never a system call
- Never single-threaded
- Not as efficient as you might think
- On multi-core system, CAS requires synchronization between cores
- OSAtomicIncrement is ~10x slower than a straight x++ on Mac Pro
- But still pretty cheap
- Non-blocking
- Your loop may repeat more than once, but it will finish, and generally very quickly
- Potential for "livelock"
- Not a real concern in most cases
- Keep your CAS loops small and consistent
- Don't pound on them at 100% CPU without end
- Tricky
- Must fit all shared data manipulation into this fetch/update/commit pattern
- CAS only works with a single pointer-sized unit, so you must base your transactions on that
- i386 does 64-bit CAS even in 32-bit mode, but this does not extend to other archs
- Bugs can manifest rarely, and in difficult-to-diagnose ways
- Code defensively, go slowly, test heavily
More complicated uses: a work queue
- OSAtomicEnqueue and OSAtomicDequeue
- Have to make your own queue element struct
- Purpose of that "offset" argument
- Using offsetof macro
- Not really a queue, actually a stack
- Often don't care about what order our work is done in, so doesn't matter
- Outline of pushing in work units from a controller thread, pulling them out on worker threads
- OSQueue for work units
- Shared counter for completion count
- Thread that finishes all work posts notification to the controller thread using performSelectorOnThread:
- Or, controller thread polls completion count
Advanced techniques
- Thread safe linked list
- Can give a real queue (instead of OSQueue's stack), or just a safe container
- Use CAS to update links
- Lots of difficulties to overcome here:
- Freeing items that another thread is iterating
- ABA problem when removing items
- Potential solutions:
- Garbage collection
- ObjC-GC must use objc_atomicCompareAndSwapInstanceVariableBarrier in objc-auto.h
- GC write barrier may take a lock for you!
- ObjC-GC stops your threads from time to time, also hurts
- Tag bits - still unsafe!
- Hazard pointers
- Restrictions on the data structure - allow insert but not delete
- Concurrent programming is an exercise in paranoia
- Lockless makes it ten times worse
- Try to encapsulate the lockless code as much as possible, minimize your exposure
- Go slow, think everything through, and test test test
Further reading
- OSAtomic.h
- Don't fear the libkern, you can use it!
- TransactionKit
- Lockless transaction-based NSMutableDictionary
- Lots of literature on lockless data structures, use google
- Software transactional memory
- A continuation of lockless data structures
- Very advanced, active research area
- Probably not yet practical to use in your software today