Sunday, November 2, 2008

How safe is a "thread safe" data structure?

I think it is obvious that no abstract data type can guarantee that any (non trivial) sequence of invocations is atomic (i.e. not interruptible by other threads using the same instance of the ADT) through mechanisms internal to any ADT implementation. Another good example is the common idiom to insert into a map without replacing:
  synchronized (map)
{
if (!map.containsKey("key"))
{
map.put("key", "value");
}
}
As a consequence each public statement about thread-safety of an ADT or one of its implementations is generally only a statement about behaviour of *single invocations* of the public API. But this statement has some value in its own because the client needs to know whether he is expected to protect single invocations as well. Since the JavaDoc of HashMap states that it is not thread-safe clients must protect the following, too, if concurrent access is possible:
  synchronized (map)
{
map.put("key", "value");
}
Otherwise the map could be internally corrupted. Interesting that in the case of the map ADT there is some special API/implementation couple that solves both problems, internal and to some degree external atomicity. A java.util.concurrent.ConcurrentHashMap guarantees atomicity of single invocations with a possibly higher concurrency than external synchronization. And the ConcurrentMap interface offers the putIfAbsent() operation which executes the "not-containsKey-put" sequence atomically (and even much faster because only one hash lookup is needed!). The following is thread-safe without external synchronization:
  String existingValue = map.putIfAbsent("key", "value");
if (existingValue != null)
{
// New value has not been inserted!
}
Sometimes we have a situation where two data structures together form a unity in the sense that both of them must be modified at a time or none of them to have a consistent state at any time. An example is a bi-directional mapping. While a ConcurrentMap is a good choice for other mutli-threaded scenarios, we usually don't use them for scenrios where multiple data structures are involved because we need external synchronization anyway.
A ConcurrentHashMap allows for higher concurrency than an externally synchronized map but it is more expensive than a completely unsynchronized HashMap. Or in other words, two ConcurrentHashMaps plus external synchronization are much more expensive than two HashMaps with external synchronization.

Apologies for re-iterating stuff that is known to so many already.
It is still Sunday morning...

1 comment:

  1. Just a comment on how MySQL JDBC driver achieves atomicity.

    They use a synchronizedMap() and do
    object = map.remove(key);

    and after working with object, they do
    map.put(key,object);

    So, what they do there is achieving thread-safety by removing currently used objects from the map and re-inserting them, if they are not to be used anymore.

    If course this is only sensible in settings which permit temporary removal from the map...

    ReplyDelete