Sunday, December 14, 2008

What is a Workspace?

Stupid question!
It's a space to work in.
Stupid question?

Stupidity, like beauty, is not context-free.

A friend of mine used to complain that we, the guys of the software industry, only have a maximum of four words available to give things a name. That can lead to horrible ambiguities. Acronyms do the rest (see Uncorking CDOs). I agree that meaningful names tend to be longer, which is not always nice either. Maybe someone can help me find a shorter name for a strategy class I had to create yesterday:

TakeRemoteChangesThenApplyLocalChangesObjectConflictResolver

Well, workspace, I'm wandering from the subject. Often it seems necessary to look at the roles and responsibilities of a technical concept to determine its exact purpose and derive adequate usage rules. So, to re-word my initial question:

How am I supposed to use Eclipse workspaces?

I fear there is no single truth, given the vast number of responsibilities of a workspace:
  1. Manage my projects, associate them with revision control
  2. Manage my JREs/JDKs to compile my Java projects against
  3. Manage my target platform to compile plug-in projects against
  4. Manage my API baselines for plug-in development
  5. Manage my working and display preferences
And certainly some more. Each of the above responsibilities has its own dependencies and implications. For example, the target platform and the API baselines depend on the projects that currently exist in my workspace. While multiple baselines can be defined in the global workspace preferences and assigned to particular projects, the target platform is a global setting for the whole workspace. It cannot be defined on a per-project basis. Since the set of projects I'm actively working on varies over time, this requires frequent changes to the target platform definition.

The only constant I'm seeing is the set of my personal usage and display preferences: the way I populate my perspectives, the key bindings I'm used to, and all the other things that proved to be useful for me. Exporting and importing these preferences is not ideal either, because it doesn't catch all settings (like stored passwords) and leads to redundancy problems.

My favorite decoration of changes

Now I get the feeling that the implications and inherent lifecycles of these responsibilities do not fit together particularly well. In fact, I have been living with this feeling since the beginning of my Eclipse experience. Because my usage preferences are effectively a singleton at any point in time, I'm somewhat reluctant to the idea of maintaining a bunch of separate workspaces. Or am I missing a trick to operate multiple workspaces on an identical set of IDE preferences?

I know that it's only a default behaviour of Eclipse to create and maintain my projects physically inside of my workspace. With some effort I can link in a set of projects that are physically stored in a folder outside of the workspace. The problem is that each set of open projects in the workspace requires a number of particular baselines to be defined and a number of particular plug-ins to be present in the global target platform. Not to mention the needed JREs and possibly other stuff that is required for the workspace projects.

An additional, but related, problem is that initial check-outs from a revision control system usually require me to know which of the remote folders correspond to proper Eclipse projects and belong to the overall set of projects required for a larger piece of functionality (also often called a project!). I know the concept of Team Project Set files (PSF), but they do not address the issue of required preference settings, and their management is somewhat redundant, given that I'm already managing the set of projects directly in the workspace. I regularly forget to add new projects to the PSF files or to remove the deleted ones.

Sometimes the right steps are obvious

What I would really like to see is some convenient and consistent tooling that enables me to swap these sets of related projects in and out, together with all their prerequisites! I don't want to clutter my office with a completely new desk each time someone drops a task on me, buying and arranging new pens and ink each time...

Given that Eclipse has been on the market for many years now, I find it hard to believe that there is no obvious solution to this everyday problem. Well, maybe it's just that I didn't find it.

Geeks, please tell me about your solutions!

From time to time I had the feeling that Buckminster addresses some or all of these issues. Unfortunately, I never managed to work out the exact responsibilities and implications of the concepts that Buckminster offers to materialize workspaces. At least I found it hard to map these concepts to my scenarios. I suspect that no flexible solution in this area can be darn simple, and I promise to put another evaluation on my todo list.

By the way, for my Net4j and CDO open source projects at Eclipse.org I came up with a mainly Ant-based approach to materialize nearly complete workspaces for the different development streams. Basically it's a two-phase process with a bootstrap phase to check out the setup scripts together with the appropriate PSFs. If you're interested, have a look at my description at http://wiki.eclipse.org/CDO_Source_Installation.

CDO workspace bootstrapping

Tuesday, December 9, 2008

Remoting with IProgressMonitor

Recently we added optional progress monitoring for remote requests to the Net4j Signalling Platform and I'd like to take this opportunity to explain some of the exciting aspects of Net4j. You can use Net4j to implement synchronous remoting applications as well as asynchronous messaging applications. Depending on the number of Net4j layers you are using you can work buffer-oriented or stream-oriented.

Another multi-purpose technology

The heart of Net4j is a fast and scalable, Java NIO-based buffer switching technology which is completely non-blocking and allows for multiple levels of buffer multiplexing. On top of this buffer-oriented transport layer there is an optional stream-oriented protocol layer which is used to implement application protocols. The framework automatically ensures that multiple such protocols can be multiplexed through a single physical transport connection like a TCP socket, and that the transport type can be transparently changed to other media like HTTP or JVM (in-process).


I think the idea behind Net4j becomes clearer when we look at an example. Let's develop a small protocol to upload files to a remote server with full support for IProgressMonitor over the network. In Net4j this is a single signal (aka communications use case) of a protocol, and we need to provide implementations for both sides of the protocol. We start with the client-side request implementation.
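The code originally appeared as screenshots; what follows is a minimal sketch of what the client-side request could look like, assuming the Net4j signal API of that era (RequestWithMonitoring, the extended data streams, OMMonitor). The chunk framing and the IUploadProtocol constants interface (shown further below) are illustrative choices, not the original code:

  import java.io.File;
  import java.io.FileInputStream;
  import java.io.InputStream;

  import org.eclipse.net4j.signal.RequestWithMonitoring;
  import org.eclipse.net4j.signal.SignalProtocol;
  import org.eclipse.net4j.util.io.ExtendedDataInputStream;
  import org.eclipse.net4j.util.io.ExtendedDataOutputStream;
  import org.eclipse.net4j.util.om.monitor.OMMonitor;

  public class UploadRequest extends RequestWithMonitoring<Boolean>
  {
    private static final int BUFFER_SIZE = 4096;

    private final File file;

    public UploadRequest(SignalProtocol<?> protocol, File file)
    {
      super(protocol, IUploadProtocol.SIGNAL_UPLOAD);
      this.file = file;
    }

    @Override
    protected void requesting(ExtendedDataOutputStream out, OMMonitor monitor) throws Exception
    {
      long size = file.length();
      monitor.begin((int)(size / BUFFER_SIZE) + 1); // one tick per chunk

      InputStream in = new FileInputStream(file);

      try
      {
        out.writeString(file.getName());
        out.writeLong(size);

        byte[] buffer = new byte[BUFFER_SIZE];
        int n;
        while ((n = in.read(buffer)) != -1)
        {
          out.writeInt(n);      // length of the following chunk
          out.write(buffer, 0, n);
          monitor.worked(1);    // drives the progress reporting
        }

        out.writeInt(-1);       // end-of-file marker
      }
      finally
      {
        in.close();
        monitor.done();
      }
    }

    @Override
    protected Boolean confirming(ExtendedDataInputStream in, OMMonitor monitor) throws Exception
    {
      return in.readBoolean();  // server acknowledges the upload
    }
  }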


Now the application can create instances of this request and call the send() method on them. The framework in turn calls the requesting() and confirming() methods when the transport layer is able to handle the data. The following example method encapsulates the sending process. Notice how the IProgressMonitor is converted to an OMMonitor.
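The method could look like this; EclipseMonitor (the IProgressMonitor adapter) and the send(OMMonitor) variant are assumptions about the Net4j API of that era:

  // Encapsulates the sending process. The IProgressMonitor is wrapped
  // as an OMMonitor before the request is sent.
  public boolean transferFile(SignalProtocol<?> protocol, File file, IProgressMonitor progressMonitor) throws Exception
  {
    OMMonitor monitor = new EclipseMonitor(progressMonitor);
    return new UploadRequest(protocol, file).send(monitor);
  }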


This simple method can now be used in a workbench action. The transferFile() call is wrapped in a Job to make it run in a background thread and to integrate with the Eclipse Progress view.
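A sketch of the action body (PLUGIN_ID is a placeholder):

  // Inside the action's run() method: wrapping transferFile() in a Job
  // runs it in a background thread and hooks it into the Progress view.
  new Job("Uploading " + file.getName())
  {
    @Override
    protected IStatus run(IProgressMonitor monitor)
    {
      try
      {
        transferFile(protocol, file, monitor);
        return Status.OK_STATUS;
      }
      catch (Exception ex)
      {
        return new Status(IStatus.ERROR, PLUGIN_ID, "Upload failed", ex);
      }
    }
  }.schedule();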


The complete source code of this action class is available via CVS. Now we only need a server that is able to receive and handle our request. We start with the server-side request implementation (called an indication).
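Again a sketch under the same API assumptions; the indication mirrors the request and reads exactly what requesting() wrote:

  import java.io.File;
  import java.io.FileOutputStream;
  import java.io.OutputStream;

  import org.eclipse.net4j.signal.IndicationWithMonitoring;
  import org.eclipse.net4j.signal.SignalProtocol;
  import org.eclipse.net4j.util.io.ExtendedDataInputStream;
  import org.eclipse.net4j.util.io.ExtendedDataOutputStream;
  import org.eclipse.net4j.util.om.monitor.OMMonitor;

  public class UploadIndication extends IndicationWithMonitoring
  {
    public UploadIndication(SignalProtocol<?> protocol)
    {
      super(protocol, IUploadProtocol.SIGNAL_UPLOAD);
    }

    @Override
    protected void indicating(ExtendedDataInputStream in, OMMonitor monitor) throws Exception
    {
      String fileName = in.readString();
      long size = in.readLong();
      monitor.begin((int)(size / 4096) + 1);

      File target = new File(System.getProperty("java.io.tmpdir"), fileName);
      OutputStream out = new FileOutputStream(target);

      try
      {
        int n;
        while ((n = in.readInt()) != -1) // chunk length or EOF marker
        {
          byte[] buffer = new byte[n];
          in.readFully(buffer);
          out.write(buffer);
          monitor.worked(1);
        }
      }
      finally
      {
        out.close();
        monitor.done();
      }
    }

    @Override
    protected void responding(ExtendedDataOutputStream out, OMMonitor monitor) throws Exception
    {
      out.writeBoolean(true); // acknowledge the successful upload
    }
  }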


When the data of an UploadRequest arrives at the server, the Net4j framework creates an instance of our UploadIndication and calls the indicating() and responding() methods appropriately. Therefore we need to register a signal factory, i.e. a customized protocol. In our case we create the protocol instance on-the-fly inside of the protocol factory.
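A sketch of such a factory; ServerProtocolFactory and the createSignalReactor() hook are assumptions about the API of that era, and package locations may differ between releases:

  import org.eclipse.net4j.signal.SignalProtocol;
  import org.eclipse.net4j.signal.SignalReactor;
  import org.eclipse.spi.net4j.ServerProtocolFactory;

  public class UploadServerProtocolFactory extends ServerProtocolFactory
  {
    public UploadServerProtocolFactory()
    {
      super(IUploadProtocol.PROTOCOL_NAME);
    }

    public Object create(String description)
    {
      // The protocol instance is created on the fly and acts as the
      // signal factory through createSignalReactor().
      return new SignalProtocol<Object>(IUploadProtocol.PROTOCOL_NAME)
      {
        @Override
        protected SignalReactor createSignalReactor(short signalID)
        {
          switch (signalID)
          {
          case IUploadProtocol.SIGNAL_UPLOAD:
            return new UploadIndication(this);

          default:
            return super.createSignalReactor(signalID);
          }
        }
      };
    }
  }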


If the server were expected to run in OSGi/Equinox, the only missing piece would be the contribution of this protocol factory to an extension point.


But we want to look at a simple stand-alone version of our upload server.
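A sketch, assuming the Net4j container utilities and the default TCP port 2036; package names shifted between releases:

  import org.eclipse.net4j.Net4jUtil;
  import org.eclipse.net4j.acceptor.IAcceptor;
  import org.eclipse.net4j.tcp.TCPUtil;
  import org.eclipse.net4j.util.container.ContainerUtil;
  import org.eclipse.net4j.util.container.IManagedContainer;

  public final class UploadServer
  {
    public static void main(String[] args) throws Exception
    {
      IManagedContainer container = ContainerUtil.createContainer();
      Net4jUtil.prepareContainer(container); // register Net4j factories
      TCPUtil.prepareContainer(container);   // register the TCP transport
      container.registerFactory(new UploadServerProtocolFactory());
      container.activate();

      IAcceptor acceptor = TCPUtil.getAcceptor(container, "0.0.0.0:2036");
      System.out.println("Accepting connections: " + acceptor);

      System.out.println("Press Enter to shut down");
      System.in.read();
      container.deactivate();
    }
  }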


The complete source code of the server class is available via CVS. To enhance the maintainability of our upload protocol, some protocol constants are defined in a common interface. This makes it particularly easy to add new signals.
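For example (the names are illustrative, not the original constants):

  public interface IUploadProtocol
  {
    String PROTOCOL_NAME = "upload";

    short SIGNAL_UPLOAD = 1;
  }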


Now we just need to start the server and watch the console.


And finally we start a runtime Eclipse application with our UploadClientAction deployed. When you click this action a file selection dialog is displayed. The chosen file will be uploaded to the server and stored there in a temp folder. The progress of the upload operation is properly reflected in the Eclipse Progress view and can even be cancelled at any point in time.


This is only one example of the many things you can do with Net4j. Please note that the actual client/server communication in Net4j has some characteristics that differ from other remoting technologies, among them:
  1. There are no discrete marshalling/un-marshalling phases. As the client writes data to a stream, this data is internally filled into fixed-size buffers which are passed to the transport layer as soon as they are full enough. As a consequence, the server already starts reading and processing the data while the client is still busy sending more.
  2. Multiple virtual channels can be multiplexed through a single physical transport connection. Each such channel can be associated with its own signal protocol and then be used to multiplex arbitrary numbers of signals in both directions.
  3. The client and server roles only apply while establishing a physical transport connection. Once connected, both sides can open virtual channels at arbitrary times and send or receive data, possibly through signals.
A larger example of a Net4j application protocol is the one that connects EMF models with a CDO model repository. It is also an asymmetric protocol and consists of a client-side implementation and a server-side implementation.


I hope some of you found this interesting. Maybe I have some more ideas for articles about Net4j in the future...

Tuesday, December 2, 2008

Eclipse Demo Camp in Berlin

Last week I was at the Eclipse Demo Camp in Berlin, my home town in Germany. This was my second demo camp so I was curious how it would compare to the one I visited early this year in Bern, Switzerland.

Berlin photos of my friend Andreas Mösching

It turned out that they were quite similar. A lot of very interesting presentations coupled with a delicious buffet. Another similarity was the presenters' tendency to completely overestimate their talking time. I remember that my talk at EclipseCon 2008 also suffered a lot from this. I think I learned from it, and I planned to talk for only 15 of my 30 minutes and leave the rest for discussion.

The "Deutsche Dom" (main cathedral in Berlin)

My talk about the CDO Model Repository was the first one in a sequence of eight. Fortunately I managed to show my three architecture slides in 10 minutes, followed by a live demo of another 10 minutes. The rest of the time was dedicated to a lot of questions and answers. Very informative for me and the audience. Thank you!

CDO Architecture

After me, Tom Ritter from the Fraunhofer Institut described their ModelBus effort. It reminded me slightly of what I saw in the TopcaseD framework, and indeed Tom talked about a former cooperation. I wonder if somebody would be interested in evaluating ways to integrate CDO with their ModelBus...

ModelBus

Then Volker Wegert from Siemens faced a handicap while showing his SAP R/3 Connector for Eclipse RCP Applications: none of the attendees had ever faced SAP back-ends. Nevertheless he managed to make it interesting!

SAP R/3 Connector for Eclipse RCP Applications

Very interesting was Jens von Pilgrim's demo of GEF3D. I was amazed to see what they're able to do with 2D user interfaces: with a handful of code changes they turn an ordinary class diagram editor into a multi-layered 3D editor with connections between the layers and so much more. Amazing. And Jens is a colleague of my new CDO committer Stefan Winkler, so I guess I'll hear from him in the future.

Gef3D

Enrico Schnepel explained to us how to use GenGMF to ease the development of GMF editors for large metamodels. I've also always thought that GMF's diagram configuration models are so flexible that they fail to easily support the 95-percent case. Could be worth a look at GenGMF...

GenGMF

One of the talks was not on the agenda, so unfortunately I cannot remember the name of the student who showed a demo of his distributed shared model editing framework on top of EMF (or GMF?). In his talk he explained that he had investigated CDO and found it inappropriate due to the lack of offline support. What a pity that he did not take the time to contact the CDO team. We are currently investigating ways to provide more disconnected modes of model sharing, and it would have been so much nicer to co-operate rather than duplicate efforts! I would very much appreciate his work in the CDO project, and he agreed to consider this. Ed, didn't you mention recently how much you enjoy seeing our team grow and prosper? :P

Shared editing in GMF diagrams

Theofanis Vassiliou-Gioles demonstrated their TTworkbench, an extensible Eclipse-based test environment, in a very detailed way. I felt like a trainee...

Buffet between the talks

Stephan Herrmann from the TU Berlin gave the last talk: Plugin reuse and adaptation with Object Teams: Don't settle for a compromise! It looked a bit like an alternative approach to AspectJ, and his examples were nearly as amazing as the ones at the ESE talk about Equinox Aspects. I must admit that I'm a bit scared of the security implications of such major changes to published and deployed code, unanticipated by their providers. But it's clear that many things can be achieved with it that were impossible without.

Object Teams

The demo camp started at 6 p.m. and now it was already far past 10. With only two short breaks, most of us looked a bit tired already, and I too decided to go home. For future demo camps I would really limit the presentation time to 15 minutes per talk and allow for discussion afterwards. It should be clear that a demo camp is not a training session where we learn every last detail of a tool or technology. It should create interest and the wish to dig deeper back at home.

The Fernsehturm in Berlin

Monday, November 24, 2008

How Scalable are my Models?

Recently I noticed an increasing hype around the topic of model scalability, paired with understandable concerns about what the term actually means. It is clear that, in the context of Eclipse, we are speaking about EMF, the Eclipse Modeling Framework. To answer the headline question we first need to reach a common understanding of what scalability means in this context. We can summarize the things we can do with a model into two categories:
  • Use the model in main memory
  • Preserve the model state between sessions

For a model to be scalable it is required that the resource consumption for its usage and preservation is not a function of the model size.

Scalability does not necessarily imply that it is always darned fast to use or preserve a single model object. Rather it guarantees that performance and footprint are the same, or at least similar, whether the object is the only one contained in a resource or whether it is contained in a huge object graph. Ideally, the resource consumption should be a function of the size of the model change.

Some persistence approaches obviously have to violate this constraint. For example, saving model changes to a text file will always write the whole model as opposed to only the changes (corresponding enhancement requests in the EMF newsgroup showed me that this is not as obvious as I had thought). Even loading a single object usually requires deserializing a whole resource file.

Other persistence systems like relational databases, object-oriented databases or even proprietary random-access files are likely to provide for more scalable preservation of models. An EMF change recorder could be attached to your resource set, and the resulting change description could be transformed into a set of modifications, executed against the back-end in O(n) where n is the size of the change.
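For illustration, a recorder can be attached with plain EMF API (ChangeRecorder, ChangeDescription and FeatureChange from org.eclipse.emf.ecore.change) like this; the translation into back-end updates is only hinted at:

  ChangeRecorder recorder = new ChangeRecorder(resourceSet);

  // ... the application modifies objects in the resource set ...

  ChangeDescription change = recorder.endRecording();
  for (Map.Entry<EObject, EList<FeatureChange>> entry : change.getObjectChanges())
  {
    // Translate each FeatureChange into an incremental back-end update,
    // so the cost stays O(n) in the size of the change.
  }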

While each model object is usually an instance of a generated subclass of EObjectImpl, there are other ancestors available, too. Understanding the cost of an object can be indispensable when trying to measure and optimize resource consumption. But even if we know the minimum size of a single object, it should be clear that we are unable to achieve real scalability just by reducing the size of our objects. The only way to handle models with arbitrary numbers of contained objects is to selectively load the objects that are currently needed into main memory and make them reclaimable by the Java garbage collector immediately after usage. A system with such characteristics no longer focuses on the size of single objects. So what is preventing our generated models from being scalable?

The default generation pattern of EMF creates subclasses of EObjectImpl for our model concepts. These generated classes contain member fields to store the values of references. At run-time these references strongly tie together our object graph. In EMF there are two basic types of references: containment references and cross references. Traditionally only cross references can be turned into proxies to be resolvable again later. As of EMF 2.4 containment references can become proxies as well, although this requires a non-default generation pattern and possibly the adaptation of the application to create and manage additional resources. It is important to note that turning an object into a proxy only sets its eProxyURI and nothing else. In particular, it does not unset any attributes or references. As a result, proxies are always bigger than their unproxied counterparts, and they still carry their strong references to other objects! Go figure…
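A small snippet makes this visible (the library model is hypothetical; InternalEObject comes from org.eclipse.emf.ecore, URI from org.eclipse.emf.common.util):

  InternalEObject object = (InternalEObject)library.getBooks().get(0);
  object.eSetProxyURI(URI.createURI("remote://library#book0"));

  System.out.println(object.eIsProxy()); // true
  // The attribute values and outgoing references of the object are
  // still set and still strongly reachable; the proxy is actually
  // bigger than the object was before.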

Now we could try to manually unset the nasty references that still prevent our objects from being garbage collected. But this can be a tedious and error-prone task. Especially bi-directional cross references can be hard to tackle because of the implicit inverse operations when unsetting one end. While it does not seem completely unfeasible it remains questionable whether the EMF proxy mechanism is appropriate to make our models scale well. To sum up:

  • Containment relationships between objects in a resource usually prevent proxying.
  • Hence only complete resources look like candidates for unloading.
  • Detection of incoming references is expensive.
  • Proxying of incoming references does not automatically influence strong reachability.
  • Manual removal of strong references is at least inconvenient.

It seems as if we are stuck now, but let us step back to look at our model from a distance. In the end, our model is just a directed graph, the nodes are Java objects and the edges are strong Java references. And this last observation seems to be the root cause of our scalability problem! Imagine all these objects had a unique identifying value and all these associations were more like unconstrained foreign keys in a relational database system. We could point to objects without making them strongly reachable. Can we?

Yes, we can! EMF offers a different generation pattern called reflective delegation and a different run-time base class called EStoreEObjectImpl which can be used to implement models that transparently support the needed characteristics. Fasten your seat belt…

Reflective delegation changes the code that is generated for your implementation classes in three ways. Member fields are no longer generated for features. The getters and setters for single-valued features no longer access a member field’s value but rather delegate to the reflective eGet and eSet methods. And the getters for many-valued features return special EList implementations which also delegate to some reflective methods. With this generation pattern we can effectively remove all modeled state from our EObjects, including the unloved strong references. But where does it go instead?
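Roughly, the generated accessors then look like this. This is simplified: the generator actually emits eDynamicGet/eDynamicSet calls with feature IDs, and MyPackage/Author are hypothetical:

  public String getName()
  {
    return (String)eGet(MyPackage.Literals.AUTHOR__NAME, true);
  }

  public void setName(String newName)
  {
    eSet(MyPackage.Literals.AUTHOR__NAME, newName);
  }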

Since we removed the state from our generated classes and the default base class EObjectImpl is not able to store modeled state, it is obvious that we need a different base class, which can easily be achieved with the generator property Root Extends Class. While we could write our own implementation of InternalEObject, it is usually sufficient to use or subclass EStoreEObjectImpl. Instances of this class delegate all their state access to an EStore which can be provided by the application. We only need to write our own EStore implementation with a dozen or so methods to fulfill the contract and ensure that each EStoreEObjectImpl instance points to an appropriate store instance. I have seen frameworks which maintain a separate store instance for each model object; others let all objects of a resource or a resource set share a single store; and others (like CDO, explained later on) are even more complex. I think the right choice depends on how exactly the store is required to handle the object data. Before we dive into CDO's approach we have to look at a tricky problem that all possible store implementations have to solve.
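For illustration, the wiring can be as simple as this sketch, which uses the simple map-based sample store that ships with EMF; real frameworks provide their own EStore implementation instead (the Author EClass is hypothetical):

  InternalEObject.EStore store = new EStoreEObjectImpl.EStoreImpl();

  // All state access of this object is now delegated to the store.
  InternalEObject author = new EStoreEObjectImpl(myPackage.getAuthor(), store);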

In addition to the modeled state of an object, all stores have to maintain the eContainer and eContainerFeatureID properties of an EObject. Although it is not immediately obvious, the EStore interface only provides methods to get these values but no methods to set them! Since our store needs to provide these values and the framework does not pass them in explicitly, we must, whether we want to or not, derive these values implicitly from the modification method calls (those that can influence the containment) and from our knowledge of the model (which are the containment references?). Solving this problem is typically not a one-hour task!

Now let us look at how the CDO Model Repository framework faces the problem. Here are some of the requirements for objects in CDO:

  • Loadable on demand, even across containment relationships
  • Garbage collectable, if not used anymore
  • Replaceable by newer versions (passive update) or older versions (temporality)
  • Easily and efficiently transferable through a network wire

These led to a considerably complex design which I am trying to strip down here a bit:

CDO’s implementation of EObject subclasses EStoreEObjectImpl and shares the same store instance with all objects in the resource set that come from the same repository which, together with the virtual current time, is represented by a CDOView. CDO’s implementation of EStore is stateless apart from knowing its view. The modeled state of an object is stored in CDORevision instances, which represent the immutable states of an object between commit operations. The revisions internally store the CDOIDs of target objects instead of strong references to them. Each object stores a strong reference to the revision that is active at the time configured in the view. A view softly or weakly caches objects keyed by their CDOID. The revisions are cached separately in the CDOSession, by default with a two-level cache (a configurable fixed-size LRU cache plus a memory-sensitive cache that takes over evicted revisions). Since revisions are immutable they can be shared among different local views.

With this design neither the framework nor the objects and revisions keep strong references to other objects or revisions, and the garbage collector is able to do its job as soon as the application releases its strong references. The reflective delegation causes each access to a model property to go through the store, which uses the revision of the object to determine the CDOID of the target object. This ID is then used to look up the target object in the view cache. If the object is missing, either because it was never loaded or because it has already been garbage collected, the needed revision is looked up in the session cache. The revision always knows the class of the object, so the view can create a new EObject instance and wire it with the revision. If revisions are missing from the session's cache they are loaded from the repository server.

I kept quiet about a certain aspect to avoid complicating things at the beginning. Notice that not only the framework but also the application is creating new EObject instances to populate the model. Usually this happens through calls to EFactory methods which are unable to provide the new object with the appropriate EStore pointer. It becomes obvious that CDO objects (like all EStoreEObjectImpls without a singleton EStore) generally operate in one of two basic modes, which we call TRANSIENT and PERSISTENT respectively. In the context of repository transactions and remote invalidation we further refined the hyper state PERSISTENT into the sub states NEW, CLEAN, DIRTY, PROXY and CONFLICT. The transitions are internally managed by a singleton CDOStateMachine:

In the TRANSIENT state, i.e. after the object was created but before it is attached to a view, the object has no CDOID and no revision. The store is by-passed and the values are stored in the eSettings array instead. The attach event of the state machine installs a temporary CDOID and an empty revision which is populated through a call-back to the object. During population the data values are moved from the eSettings array to the revision and at the same time the strong Java references are converted to CDOIDs. Finally the object state is set to NEW. The temporary CDOIDs of NEW objects are replaced after the next commit operation with permanent CDOIDs that the repository guarantees to be unique in its scope and all local references are adjusted accordingly.

Notice that no EObject/CDORevision pair is ever strongly reachable by anything other than the application. And the modeled state of an EObject can be atomically switched to older or newer versions by simply replacing the revision pointer. Since a revision does not store any Java references to other entities it’s easy to transfer its data over the wire. With this design it becomes feasible to traverse models of arbitrary sizes.
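To give a feel for the resulting programming model, here is a minimal client sketch. The API names reflect the CDO 2.0 era and may differ in detail; openSession() and createLibrary() are hypothetical helpers:

  CDOSession session = openSession();
  CDOTransaction transaction = session.openTransaction();

  CDOResource resource = transaction.getOrCreateResource("/library");
  resource.getContents().add(createLibrary()); // ordinary generated EMF objects

  transaction.commit(); // temporary CDOIDs are replaced by permanent ones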

CDO provides some additional mechanisms to make such traversals even more enjoyable. The partial collection loading feature, for example, makes it possible to page in configurable chunks of huge lists, and the current EStore implementation is able to record model usage patterns which can be used for pre-fetching revisions that are likely to be needed soon.

If you are interested in learning more about CDO you are welcome in the wiki and the newsgroup. You are also invited to attend my proposed talk at EclipseCon 2009: “Scale, Share and Store your Models with CDO 2.0”

Saturday, November 22, 2008

Being at ESE, not in Thailand...

Usually in late November we go to different places in wonderful Thailand for vacation, but this year the Eclipse Summit Europe was a month later than in all the other years. So I abstained from a big vacation and headed for the Summit in Ludwigsburg. I arrived there on Monday evening, but my luggage did not. Never again will I ask for priority baggage when checking in with Air Berlin!

Arriving at the airport Bangkok

Fortunately my luggage was delivered to the Nestor hotel before midnight, so I could fully concentrate on the fun of the conference. This fun caused an initial hangover on Tuesday, the symposia day. I missed the modeling symposium because I was told that it was already crammed when I arrived. And I did not prepare a position paper with the required minimum of 2 pages. I'd vote for exempting committers and other persons who are known to be so deeply involved from such effort. Later I was even told that the papers had not been checked very strictly. Anyway, I had some really nice talks with different people on Tuesday.

Me and the jungle on Koh Phi Phi

And I was able to attend the BREDEX GUIdancer presentation, given by Alexandra Imrie, who did a really great job. Later I was amazed that she, coming from Liverpool, could speak German without even the slightest accent. They seem to have a nice tool to create, maintain and execute user interface tests, and I am happy that they promised to consider providing me with a free license for my CDO Model Repository project. I also met Ibrahim Sallam from Objectivity, Inc., who is currently preparing the offer of free developer licenses for their wonderful and darned fast OO database system (if used in combination with our EMF/CDO stack). We scheduled a more detailed discussion about this effort for Thursday.

More jungle near Chumphon

In the evening we had dinner in a smaller group, this time in a small restaurant with lovely local food, which I think is reason enough to have the summit in Schwabenland every year. Here you get the best Rostbraten, Spätzle and Maultaschen ever. Although we had a lot of fun, I left early and went to bed to avoid another hangover during my CDO talk the next day.

Lovely Thai food

Wednesday started with (a nice breakfast and) six great talks, the most fascinating for me being the one about Aspect Weaving for OSGi. Heiko Seeberger and Martin Lippert presented amazing stuff about their Equinox Aspects project. I promised myself to give it a try as soon as possible. Some time after lunch I headed for Cedric Brun's interesting talk about Team Work with Models: Compare and Merge with EMF Compare. I appreciated that he finished on time, because my talk about the most interesting new features of the upcoming version 2.0 of my CDO Model Repository was the next to follow.

Dragonfly at a pond on Koh Samui

It seemed that I somehow managed to do both: give an initial impression to the newbies and make existing users look forward to the next release. And our next release will really be a major one. Our small team has already implemented 175 bugzillas since Ganymede, many of them being powerful new features. Special thanks go to Simon McDuff, who spends a considerable part of his parental leave providing the CDO community with cool features and friendly support!

A really huge guy

To avoid repeating my former underestimation of talking time, I focused on only a few architectural slides and some code snippets to demonstrate some of the most interesting new features:
  • External References
  • Distributed Transactions
  • Structured Resources
  • Resource Queries
  • Explicit Locking
  • Save Points
  • Configurable Passive Updates
  • Change Subscriptions
  • Query Framework

1, 2, 3, search for me!

I squeezed the large audience through my ten slides in only fifteen minutes, which proved to be a good decision because even the remaining twenty minutes were not enough to answer all of the questions. I was amazed at the great interest in CDO and particularly noticed the increasing concerns about the scalability of models. CDO transparently addresses this sort of issue, for example by loading and unloading single instances on demand or by partially loading huge lists of references. It is unbelievable, yet true, that we can easily traverse models of four gigabytes in size or more. Depending on the back-end type chosen, we can reach load rates of up to thirty thousand objects per second! I believe that such characteristics, together with well-thought-out APIs and our prompt support for the community, caused a lot of the hype we are currently experiencing.

Wonderful biota in Thailand

After my talk I enjoyed the presentation of Gilles Iachelini, Marc Hoffmann and Simon Eggler about „Eclipse on Rails: RCP at the Swiss Railway". It reminded me of an excellent live presentation they once gave to me alone during one of my business trips to Bern, Switzerland. Thank you guys, again! After that I missed the other presentations in favor of some more discussions in the hallways. Dinner, lots of wonderful wine and the chill-out in the Nestor lobby extended until five in the morning. As a consequence I missed the keynote on Thursday.

What the heck is that?

Ed Merks’ talk about The Unbearable Stupidity of Modeling was clearly one of the highlights of the whole summit! I'm glad that I was able to attend it. Many of the other talks that I had marked as interesting in my schedule became victims of some more private discussions. The only exception was Tom Schindl's presentation about Writing Datacentric Applications with RCP+EMF+Databinding. He excited the audience with some really nice design ideas and, last but not least, a demonstration of how easy it is to distribute model changes across machine boundaries with CDO. A meeting with some guys from the automotive scene prevented me from having lunch, but the results were so promising that I did not care. That's why I'm always carrying some chocolate with me. We enjoyed it together.

Ah, a salesgirl and the wind!

As I mentioned earlier, I also continued my discussion with Ibrahim about new licensing models for Objectivity's OO database. They are currently not only exploring ways to provide free developer licenses for the API and server runtimes but could also imagine providing us (the CDO project) with empty skeleton bundles (EPL licensed) to fake p2 at installation time. I'm really looking forward to seeing our existing integration with Objectivity as a back-end for CDO model repositories being open sourced in the near future. Unfortunately, time seemed to run even faster on Thursday, and after a last refreshing beer in the lobby, where most of my Eclipse friends met one last time, I headed towards Stuttgart airport to catch my flight home to Berlin. My luggage arrived with me…

Waiting for the flight back home

Thank you all for making ESE one of the nicest events in 2008 and see you at EclipseCon 2009 in Santa Clara!

Sunday, November 2, 2008

How safe is a "thread safe" data structure?

I think it is obvious that no abstract data type (ADT) can guarantee that a (non-trivial) sequence of invocations is atomic (i.e. not interruptible by other threads using the same instance of the ADT) through mechanisms internal to the ADT implementation. A good example is the common idiom to insert into a map without replacing an existing value:
  synchronized (map)
  {
    if (!map.containsKey("key"))
    {
      map.put("key", "value");
    }
  }
As a consequence, each public statement about the thread-safety of an ADT or one of its implementations is generally only a statement about the behaviour of *single invocations* of the public API. But this statement has value on its own, because clients need to know whether they are expected to protect single invocations as well. Since the JavaDoc of HashMap states that it is not thread-safe, clients must protect the following, too, if concurrent access is possible:
  synchronized (map)
  {
    map.put("key", "value");
  }
Otherwise the map could become internally corrupted. It is interesting that, in the case of the map ADT, there is a special API/implementation couple that solves both problems: internal and, to some degree, external atomicity. A java.util.concurrent.ConcurrentHashMap guarantees atomicity of single invocations with potentially higher concurrency than external synchronization. And the ConcurrentMap interface offers the putIfAbsent() operation, which executes the "not-containsKey-put" sequence atomically (and even much faster, because only one hash lookup is needed!). The following is thread-safe without external synchronization:
  String existingValue = map.putIfAbsent("key", "value");
  if (existingValue != null)
  {
    // The new value has not been inserted!
  }
Sometimes two data structures together form a unit in the sense that either both of them must be modified at a time, or none of them, to have a consistent state at any time. An example is a bi-directional mapping. While a ConcurrentMap is a good choice for many multi-threaded scenarios, we usually don't use one for scenarios where multiple data structures are involved, because we need external synchronization anyway.
A ConcurrentHashMap allows for higher concurrency than an externally synchronized map, but it is more expensive than a completely unsynchronized HashMap. In other words, two ConcurrentHashMaps plus external synchronization are much more expensive than two HashMaps with external synchronization.
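A minimal sketch of such a unit, where plain HashMaps suffice because every access path is guarded by the same lock:

  import java.util.HashMap;
  import java.util.Map;

  public class BiDirectionalMap
  {
    private final Map<String, Integer> forward = new HashMap<String, Integer>();

    private final Map<Integer, String> backward = new HashMap<Integer, String>();

    public synchronized void put(String key, Integer value)
    {
      forward.put(key, value);
      backward.put(value, key); // both maps change under one lock
    }

    public synchronized Integer getValue(String key)
    {
      return forward.get(key);
    }

    public synchronized String getKey(Integer value)
    {
      return backward.get(value);
    }
  }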

Apologies for re-iterating stuff that is known to so many already.
It is still Sunday morning...

Alexander and the Gordian Knot

'... What glory's due to him that could divide
Such ravelled interests; has the knot untied,
And without stroke so smooth a passage made,
Where craft and malice such impeachments laid?'
Edmund Waller ...to the King


Original medallion of Alexander.

Well, when I look outside, this first November Sunday is not so nice. Bugzilla is waiting, but it is Sunday! Anyway, I read the newsgroup and found an interesting post about a topic I have discussed on different occasions with Ed Merks and other people in the recent past. I looked out of the window and decided to answer. Hold on. I looked out of the window again and asked myself: "Is this the moment to start my own blog?" I always thought that I don't have much of importance to tell in public, but then came the idea: I solve this problem by declaring that this blog is not intended to tell important things! Looking at some other blogs, this seems to make sense, although I know that beauty is not the only thing in the eye of the beholder.

In the end, this blog enables me to drag my name through the public mud myself before others do it. For example, Ed recently blogged about my cool EMF-based model repository framework CDO and its relation to the world's financial crisis, as well as some security problems in Microsoft products. On the other hand, given the ten overly productive Chinese (is this PC??) locked in Ed's home office, writing all these articles, it is pretty unlikely that I'll manage to blame myself for the next disaster before they do.

Maybe it is best instead to try writing some posts that are at least semi-important. I leave that to the eye of the aforementioned beholder: you. Watch out for my next article about thread safety...

A super nova knot.

Ahh, apologies that I also leave it to you to find a relation between the Gordian knot and modern software development, or not. I need to go, the sun is coming out, and it is Sunday...