Configuration Management in Java EE
Configuration Management has a lot of relevance in Cloud Computing, as I tried to argue earlier. Actually, I would boldly claim that Configuration Management is a cornerstone in any serious attempt to squeeze a few dollars out of software.
So what is Configuration Management, and what are its key goals? Without over-complicating things, I think the following two goals are not too far from the truth.
- Establish a configuration in a predictable way that guarantees a correctly behaving system.
- Maintain configuration consistency as changes occur over time.
In other words, being able to manage change of behaviour in a reliable and secure way throughout the software lifecycle.
But what is configuration? Is it source code? Is it loaded statically into the current class loader? Can it be changed at runtime? Is it persistent data? Is it tracked by VCS? Actually, where is it stored? Can every computer in the cluster access it? What happens when configuration changes? Do we care if changes are validated? Is the user changing configuration authorized to do so? How are changes propagated to cluster members? Will applications be notified about configuration changes?
Before going into this I would like to recall something about maintenance.
Maintenance typically consumes about 40 to 80 percent (60 percent average) of software costs. Therefore, it is probably the most important life cycle phase.
In short (without arguing about numbers), OAM (operations, administration and maintenance) is difficult and costly, clearly more so in dynamic and elastic Cloud Computing environments. And from a productivity perspective, if we can design our software so that we can still manage change of behaviour without bouncing through the VCS and iterating a product release all the way through the deployment pipeline into production, maybe we should consider it, right? Obviously this would also make software more adaptable to different environments, the spirit and soul of Java.
I would argue that we use configuration to delay decisions, not only with respect to the environment and its resources, but also to application-business specific decisions. Business must be able to quickly configure offerings/rules that are not related to application server resources/infrastructure.
Therefore I believe that the parts that must not change after release (behavioral integrity) are part of the program, and configuration is a runtime behavioural invariant, strictly governed by program policies so that predictable system behaviour can be guaranteed, enforced on different levels depending on rate of change, BUT (and here is the pitch) in a productive, non-intrusive and reliable way. The Open/Closed principle comes to mind.
In the context of Java EE, this definition is still not clear enough. Java EE 6 introduced the @DataSourceDefinition annotation, which sort of assumes that configuration is code. A bit more configuration flexibility is given by the Assembler/Deployer roles. Simply put, the intention is that the application (in particular its XML descriptors) can be modified just before deployment, possibly overriding hardcoded values.
This approach has always puzzled me, but maybe it is a matter of perspective on what type of data different people consider to be configuration. However, I have never in my career heard of, read about or met anyone who actually uses this mechanism as intended. And there may be good reasons for that.
In the Maven feedback loop, compilation and packaging practically go hand in hand, and almost every Maven project is intended to produce an artifact in the form of an archive. Descriptors are generated by Maven or statically tracked by the VCS. Either way, this process seals the application against further modification, unless the archive is unzipped and modified.
But I cannot visualize a situation where it would be a good idea to open up a JAR file, modify a text file, repackage and redeploy (using tools that are proprietary, mind you: asadmin, wlst etc). Why? Consider what happens when a new release of the *authentic* archive is released. The changes that the assembler/deployer made will either be overwritten or need to be applied all over again. Because of this, it is arguably not a good idea to make ad-hoc changes to version-controlled files if those changes never make their way back to be tracked by the VCS. Even if they did, we would lose flexibility.
It may be worth mentioning that many open source projects sign releases with a digital signature so that security-conscious users can find a digital trust path to the tarball. How do you change configuration for such an archive without breaking the signature?
Consider the impact on development, where every developer may have a separate database tablespace for their integration tests. A clever developer probably builds some profile-sensitive Maven plugins to search/replace his private data in the deployment descriptors. But why should he be burdened with this, and take a turnaround hit whenever he changes a configuration value, for example between two JUnit tests (I dare not think what those tests would look like)? XML files alone cannot validate changes; we need a program to do that for us. Waiting 1-2 minutes for the application to deploy, only to discover that your values were invalid and then doing it all over again, would be a disaster for developer productivity.
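To make the point about validation concrete, here is a minimal sketch of a program checking configuration values before anything is deployed. The class name and the keys `db.url` and `db.pool.max` are made up for illustration; they do not come from any Java EE descriptor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical validator: the keys "db.url" and "db.pool.max" are assumptions
// made for this sketch, not part of any standard descriptor.
final class PoolSettingsValidator {

    // Returns a list of human-readable problems; an empty list means "valid".
    static List<String> validate(Properties settings) {
        List<String> errors = new ArrayList<>();

        String url = settings.getProperty("db.url");
        if (url == null || !url.startsWith("jdbc:")) {
            errors.add("db.url must be a JDBC URL");
        }

        String maxPool = settings.getProperty("db.pool.max");
        if (maxPool == null) {
            errors.add("db.pool.max is required");
        } else {
            try {
                if (Integer.parseInt(maxPool.trim()) < 1) {
                    errors.add("db.pool.max must be at least 1");
                }
            } catch (NumberFormatException e) {
                errors.add("db.pool.max is not a number: " + maxPool);
            }
        }
        return errors;
    }
}
```

A check like this runs in milliseconds from a unit test or an admin tool, so an invalid value is caught long before a 1-2 minute deployment.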
If we look further at how software might be deployed using a stage-then-switch approach for clustered systems, deployment descriptors become even more problematic, since two versions of the same archive would be needed. And why should the production system be disturbed (upgraded), dealing with quiescing, because an unrelated value needed to change? Think about what happens when a value is rejected: do you change the value back (repackage the application etc) and roll back over the cluster, or correct the value and try again? I don't know… but I am starting to feel uneasy about maintaining SLA reliability and configuration consistency across the cluster now.
In the context of multi-tenancy, a flat name=value type of configuration also feels constraining. A configuration specification that is hierarchical or graph-like is a better fit for modeling tenants, enabling configuration compositions etc. Maybe something like this:
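For contrast, here is a rough sketch of what rejecting a value could look like when configuration is changed at runtime instead of through a redeploy. `RuntimeConfigStore` and its methods are hypothetical names, and the sketch covers a single JVM only; propagating changes across a cluster is not addressed.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;
import java.util.function.Predicate;

// Hypothetical single-JVM store: validates a change before applying it,
// so a rejected value never replaces the one currently in use.
final class RuntimeConfigStore {
    private final Map<String, String> values = new ConcurrentHashMap<>();
    private final Map<String, Predicate<String>> rules = new ConcurrentHashMap<>();
    private final List<BiConsumer<String, String>> listeners = new CopyOnWriteArrayList<>();

    // Registers a validation rule for one key.
    void rule(String key, Predicate<String> rule) {
        rules.put(key, rule);
    }

    // Registers a listener that is called with (key, newValue) after each accepted change.
    void onChange(BiConsumer<String, String> listener) {
        listeners.add(listener);
    }

    // Returns true if the change was applied, false if a rule rejected it.
    synchronized boolean set(String key, String value) {
        Objects.requireNonNull(value, "value");
        Predicate<String> rule = rules.get(key);
        if (rule != null && !rule.test(value)) {
            return false; // rejected: the previous value stays in place
        }
        values.put(key, value);
        for (BiConsumer<String, String> listener : listeners) {
            listener.accept(key, value);
        }
        return true;
    }

    // Returns the current value, or null if the key was never set.
    String get(String key) {
        return values.get(key);
    }
}
```

With a rule such as `store.rule("timeout.ms", v -> v.matches("\\d+"))`, a call to `store.set("timeout.ms", "fast")` simply returns false and the running system keeps its old value, with no repackaging or rollback involved.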
[Code listing, 33 lines, not preserved.]
This would be the application view. Note that no assumption is made about where and how configuration is instantiated, and we can fail fast by enforcing type safety at compile time using an annotation processor.
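As a rough illustration (not the original listing), a hierarchical application view of tenant configuration might be sketched as below. Every name here, including the `@ConfigNode` annotation, is hypothetical. Nothing in the sketch processes the annotation, so the checks shown happen at construction time; a real annotation processor would be needed to move them to compile time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical marker: an annotation processor could inspect classes carrying
// it at compile time. Nothing in this sketch actually processes it.
@interface ConfigNode {
    String value();
}

@ConfigNode("datasource")
final class DataSourceConfig {
    final String url;
    final int maxPoolSize;

    DataSourceConfig(String url, int maxPoolSize) {
        // Fail fast: an invalid node can never be instantiated.
        if (url == null || url.isEmpty()) {
            throw new IllegalArgumentException("url is required");
        }
        if (maxPoolSize < 1) {
            throw new IllegalArgumentException("maxPoolSize must be at least 1");
        }
        this.url = url;
        this.maxPoolSize = maxPoolSize;
    }
}

@ConfigNode("offering")
final class OfferingConfig {
    final String name;
    final long monthlyPriceCents;

    OfferingConfig(String name, long monthlyPriceCents) {
        if (monthlyPriceCents < 0) {
            throw new IllegalArgumentException("price must not be negative");
        }
        this.name = name;
        this.monthlyPriceCents = monthlyPriceCents;
    }
}

// A tenant is composed of other configuration nodes rather than flat name=value pairs.
@ConfigNode("tenant")
final class TenantConfig {
    final String id;
    final DataSourceConfig dataSource;
    final List<OfferingConfig> offerings;

    TenantConfig(String id, DataSourceConfig dataSource, List<OfferingConfig> offerings) {
        this.id = id;
        this.dataSource = dataSource;
        this.offerings = Collections.unmodifiableList(new ArrayList<>(offerings));
    }
}
```

The classes say nothing about whether the values came from a file, a database or LDAP; they only describe the shape the application expects.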
This is my spontaneous reflection and maybe it is too enterprisey. But I still think that both large and small applications would benefit from being configured at runtime, unaware of and separated from configuration sources (file, db, ldap, mib etc) and how they are managed.
I even think Java SE would benefit from Configuration Management as well.
There are many more aspects around Configuration Management to discuss, such as security, administration, notifications, schema registration/discovery etc. But I'm going to stop here for comments/reflections/opinions: are deployment descriptors a good way of managing configuration, or do we need something more sophisticated?
This post is related to the “[jsr342-experts] Re: Configuration” and “[jsr342-experts] Re: resource configuration” threads on the Java EE 7 Expert Group mailing list. Please feel free to comment here or on the mailing list.
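A minimal sketch of that separation, assuming a small lookup interface that the application codes against. `ConfigSource`, `MapSource`, `ChainedSource` and `MailSettings` are all hypothetical names, and the in-memory map stands in for whatever real source (file, db, ldap) sits behind it.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical abstraction: the application sees only this interface,
// never the file, database or directory behind it.
interface ConfigSource {
    Optional<String> lookup(String key);
}

// In-memory source; a file-, db- or ldap-backed source would implement the same interface.
final class MapSource implements ConfigSource {
    private final Map<String, String> values;

    MapSource(Map<String, String> values) {
        this.values = new HashMap<>(values);
    }

    @Override
    public Optional<String> lookup(String key) {
        return Optional.ofNullable(values.get(key));
    }
}

// Consults several sources in order; the first one that knows the key wins.
final class ChainedSource implements ConfigSource {
    private final List<ConfigSource> sources;

    ChainedSource(ConfigSource... sources) {
        this.sources = Arrays.asList(sources);
    }

    @Override
    public Optional<String> lookup(String key) {
        for (ConfigSource source : sources) {
            Optional<String> value = source.lookup(key);
            if (value.isPresent()) {
                return value;
            }
        }
        return Optional.empty();
    }
}

// Application code: reads the value on every call, so a change in the
// underlying source is picked up without redeploying.
final class MailSettings {
    private final ConfigSource config;

    MailSettings(ConfigSource config) {
        this.config = config;
    }

    String smtpHost() {
        return config.lookup("mail.smtp.host").orElse("localhost");
    }
}
```

Swapping the source, for example putting a developer's private overrides first in a `ChainedSource`, requires no change to `MailSettings` and no repackaging of the archive.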