
Archive for the ‘Java’ Category

Processing huge files

January 14th, 2013

I recently had to process a set of files containing historical tick-by-tick FX market data and quickly realized that none of them could be read into memory using a traditional InputStream, because every file was over 4 gigabytes in size. Emacs couldn’t even open them.

In this particular case I could have written a simple bash script that divides the files into smaller pieces and reads them as usual. But I didn’t want that, since the approach would not work for binary formats.

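So the proper way to handle this problem is to process regions of data incrementally using memory-mapped files. What’s nice about memory-mapped files is that they do not consume Java heap or paging (swap) space, since the mapping is backed by the file data on disk. They do occupy virtual address space, though, and a single MappedByteBuffer can map at most Integer.MAX_VALUE bytes (about 2 GB), so files this size have to be mapped one region at a time anyway.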
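As a minimal illustration of the technique, separate from the reader developed below, the following sketch counts line feeds in a file of any size by mapping it one window at a time with FileChannel.map. MappedLineCounter and windowSize are names made up for this example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedLineCounter {
    // Counts '\n' bytes by mapping the file one window at a time, so that no single
    // mapping is larger than windowSize (one mapping cannot exceed Integer.MAX_VALUE bytes).
    public static long countLines(String path, long windowSize) throws IOException {
        if (windowSize <= 0 || windowSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("windowSize out of range: " + windowSize);
        }
        try (RandomAccessFile file = new RandomAccessFile(path, "r");
             FileChannel channel = file.getChannel()) {
            long size = channel.size();
            long lines = 0;
            for (long pos = 0; pos < size; pos += windowSize) {
                long length = Math.min(windowSize, size - pos);
                MappedByteBuffer window = channel.map(FileChannel.MapMode.READ_ONLY, pos, length);
                while (window.hasRemaining()) {
                    if (window.get() == '\n') {
                        lines++;
                    }
                }
            }
            return lines;
        }
    }
}
```

Mapped buffers are only unmapped when they are garbage collected, so several windows may be mapped at the same time for a while, but none of the file content is copied onto the heap.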

Okay, let’s have a look at these files and extract some data. They seem to contain rows of ASCII text with comma-delimited fields.

Format: [currency-pair],[timestamp],[bid-price],[ask-price]

Example: EUR/USD,20120102 00:01:30.420,1.29451,1.2949

Fair enough, I could write a program for that format. But reading and parsing files are orthogonal concepts, so let’s take a step back and think about a generic design that can be reused if we are confronted with a similar problem in the future.

The problem boils down to incrementally decoding a set of entries encoded in an arbitrarily long byte array without exhausting memory. The fact that the example format is comma/line-delimited text is irrelevant to the general solution, so a decoder interface is clearly needed to handle different formats.

Again, we cannot parse every entry and keep it in memory until the whole file is processed, so we need a way to incrementally hand off chunks of entries that can be written elsewhere (disk or network) before they are garbage collected. An iterator is a good abstraction for this requirement because it acts like a cursor, which is exactly the point: every iteration moves the file pointer forward and lets us do something with the data.

So first, the Decoder interface. The idea is to incrementally decode objects from a MappedByteBuffer, returning null when no complete object remains in the buffer.

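import java.nio.ByteBuffer;

public interface Decoder<T> {
    // Decodes the next complete object starting at buffer.position() and leaves the
    // position right after it; returns null, with the position left at the start of
    // the incomplete object, when no complete object remains in the buffer.
    public T decode(ByteBuffer buffer);
}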
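To make the contract concrete, here is a sketch of a hypothetical decoder for newline-terminated ASCII lines (LineDecoder is not part of the original code). It has the shape of a Decoder<String>; the implements clause is left out only so that the block compiles on its own. The detail that matters is the rewind on an incomplete line, because the FileReader below computes where the next chunk starts from the buffer position.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

// Hypothetical example decoder with the same shape as Decoder<String>.
public class LineDecoder {
    private static final Charset ASCII = Charset.forName("US-ASCII");

    public String decode(ByteBuffer buffer) {
        int start = buffer.position();
        while (buffer.hasRemaining()) {
            if (buffer.get() == '\n') {
                // copy the line without its line feed, using absolute gets
                byte[] bytes = new byte[buffer.position() - start - 1];
                for (int i = 0; i < bytes.length; i++) {
                    bytes[i] = buffer.get(start + i);
                }
                return new String(bytes, ASCII);
            }
        }
        // incomplete line: rewind so it can be decoded again from the next chunk
        buffer.position(start);
        return null;
    }
}
```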

Then comes the FileReader, which implements Iterable. Each iteration maps the next 4096 bytes of data and decodes them into a list of objects using the Decoder; an entry cut off at the end of a chunk is decoded again with the next chunk, which starts right after the last complete entry. Notice that FileReader accepts a list of files, which is nice since it enables traversal through the data without worrying about aggregation across files. By the way, 4096-byte chunks are probably a bit small for bigger files.

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Not to be confused with java.io.FileReader; don't import both.
public class FileReader<T> implements Iterable<List<T>> {
    private static final long CHUNK_SIZE = 4096;
    private final Decoder<T> decoder;
    private final List<File> files;

    private FileReader(Decoder<T> decoder, List<File> files) {
        this.decoder = decoder;
        this.files = files;
    }

    public static <T> FileReader<T> create(Decoder<T> decoder, List<File> files) {
        return new FileReader<T>(decoder, files);
    }

    public static <T> FileReader<T> create(Decoder<T> decoder, File... files) {
        return new FileReader<T>(decoder, Arrays.asList(files));
    }

    @Override
    public Iterator<List<T>> iterator() {
        final Iterator<File> fileIt = files.iterator();
        return new Iterator<List<T>>() {
            private List<T> entries;     // decoded, not yet returned by next()
            private FileChannel channel; // current file, null before the first one
            private long fileSize;
            private long chunkPos;       // file offset where the next chunk starts

            @Override
            public boolean hasNext() {
                // if next() has not taken the last decoded chunk yet, just report it
                while (entries == null) {
                    MappedByteBuffer buffer = nextBuffer();
                    if (buffer == null) {
                        return false; // all files processed and closed
                    }
                    boolean lastChunk = chunkPos + buffer.capacity() == fileSize;
                    T result;
                    while ((result = decoder.decode(buffer)) != null) {
                        if (entries == null) {
                            entries = new ArrayList<T>();
                        }
                        entries.add(result);
                    }
                    // the next chunk starts right after the last complete entry,
                    // so an entry cut off by the chunk boundary is decoded again
                    chunkPos += buffer.position();
                    if (entries == null) {
                        if (!lastChunk) {
                            closeQuietly(channel);
                            throw new IllegalStateException("No complete entry within "
                                    + CHUNK_SIZE + " bytes at offset " + chunkPos);
                        }
                        // only an unterminated tail was left: skip to the next file
                        chunkPos = fileSize;
                    }
                }
                return true;
            }

            private MappedByteBuffer nextBuffer() {
                try {
                    // open the next non-empty file when the current one is exhausted
                    while (channel == null || chunkPos >= fileSize) {
                        closeQuietly(channel);
                        channel = null;
                        if (!fileIt.hasNext()) {
                            return null;
                        }
                        channel = new RandomAccessFile(fileIt.next(), "r").getChannel();
                        fileSize = channel.size();
                        chunkPos = 0;
                    }
                    long chunkSize = Math.min(CHUNK_SIZE, fileSize - chunkPos);
                    return channel.map(FileChannel.MapMode.READ_ONLY, chunkPos, chunkSize);
                } catch (IOException e) {
                    closeQuietly(channel);
                    throw new RuntimeException(e);
                }
            }

            @Override
            public List<T> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                List<T> res = entries;
                entries = null;
                return res;
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    private static void closeQuietly(FileChannel channel) {
        if (channel != null) {
            try {
                channel.close(); // also closes the underlying RandomAccessFile
            } catch (IOException ignored) {
                // nothing sensible to do when closing a read-only file fails
            }
        }
    }
}

The next task is to write a Decoder. I decided to implement a generic TextRowDecoder for any delimited text format; it accepts the number of fields per row and a field delimiter, and returns each row as an array of byte arrays (one per field). TextRowDecoder can then be reused by format-specific decoders that may, for example, handle different character sets.

import java.nio.ByteBuffer;

public class TextRowDecoder implements Decoder<byte[][]> {
    private static final byte LF = '\n';
    private final int numFields;
    private final byte delimiter;

    public TextRowDecoder(int numFields, byte delimiter) {
        this.numFields = numFields;
        this.delimiter = delimiter;
    }

    @Override
    public byte[][] decode(ByteBuffer buffer) {
        int lineStartPos = buffer.position();
        while (buffer.hasRemaining()) {
            if (buffer.get() == LF) {
                // reached a line feed, so parse the line on a view of the buffer;
                // the main buffer's limit is never modified
                ByteBuffer line = buffer.duplicate();
                line.limit(buffer.position());
                line.position(lineStartPos);
                byte[][] entry = parseRow(line);
                if (entry == null) {
                    // malformed row: report no progress, like an incomplete line
                    buffer.position(lineStartPos);
                }
                return entry;
            }
        }
        // no complete line left: rewind so it is decoded again from the next chunk
        buffer.position(lineStartPos);
        return null;
    }

    public byte[][] parseRow(ByteBuffer line) {
        byte[][] fields = new byte[numFields][];
        int fieldNumber = 0;
        int fieldStartPos = line.position();
        while (line.hasRemaining() && fieldNumber < numFields) {
            byte b = line.get();
            if (b == delimiter || b == LF) {
                // the field excludes the delimiter (or line feed) just read
                int fieldEndPos = line.position();
                fields[fieldNumber++] = parseField(line, fieldStartPos, fieldEndPos - fieldStartPos - 1);
                fieldStartPos = fieldEndPos;
            }
        }
        // a row with fewer than numFields fields is rejected
        return fieldNumber == numFields ? fields : null;
    }

    private byte[] parseField(ByteBuffer buffer, int start, int length) {
        byte[] field = new byte[length];
        for (int i = 0; i < length; i++) {
            field[i] = buffer.get(start + i); // absolute get, position unchanged
        }
        return field;
    }
}

And this is how files are processed. Each list contains the elements decoded from a single buffer, and each element is an array of byte arrays as produced by the TextRowDecoder.

// file is the directory holding the data files
TextRowDecoder decoder = new TextRowDecoder(4, (byte) ',');
FileReader<byte[][]> reader = FileReader.create(decoder, file.listFiles());
for (List<byte[][]> chunk : reader) {
    // do something with each chunk
}
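The format-specific decoder used further down (TrueFxDecoder) is not shown in this post. As a guess at what its conversion step might look like, the following sketch turns one byte[][] row from TextRowDecoder into typed values. The Tick class, the US-ASCII charset and the assumption that timestamps are in UTC are all assumptions made for this sketch, not facts about the data files.

```java
import java.nio.charset.Charset;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

// Hypothetical typed tick; the post's TrueFxData is not shown and may differ.
public final class Tick {
    private static final Charset ASCII = Charset.forName("US-ASCII");

    public final String pair;
    public final long time; // epoch milliseconds
    public final double bid;
    public final double ask;

    private Tick(String pair, long time, double bid, double ask) {
        this.pair = pair;
        this.time = time;
        this.bid = bid;
        this.ask = ask;
    }

    // Converts one row: [currency-pair],[timestamp],[bid-price],[ask-price].
    // Assumes ASCII text and UTC timestamps such as "20120102 00:01:30.420".
    public static Tick fromFields(byte[][] fields) throws ParseException {
        // SimpleDateFormat is not thread-safe, so a new one is created per call
        SimpleDateFormat format = new SimpleDateFormat("yyyyMMdd HH:mm:ss.SSS");
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        long time = format.parse(text(fields[1])).getTime();
        return new Tick(text(fields[0]), time,
                Double.parseDouble(text(fields[2])), Double.parseDouble(text(fields[3])));
    }

    private static String text(byte[] field) {
        return new String(field, ASCII);
    }
}
```

Creating a formatter per row keeps the sketch simple; at tick-data volumes you would probably reuse one per thread or parse the digits by hand.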

We could stop here, but there was one more requirement. Every row contains a timestamp, and each batch must be grouped by periods of time (day-by-day or hour-by-hour) instead of by buffers. I still want to iterate through each batch, so my immediate reaction was to create an Iterable wrapper for FileReader that implements this behaviour. One additional detail is that each element must provide its timestamp to PeriodEntries by implementing the Timestamped interface (not shown here).

import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.TimeZone;

// Timestamped (not shown) is assumed to declare: long getTime();
public class PeriodEntries<T extends Timestamped> implements Iterable<List<T>> {
    private final Iterable<List<T>> entries;
    private final long interval;

    private PeriodEntries(Iterable<List<T>> entries, long interval) {
        this.entries = entries;
        this.interval = interval;
    }

    public static <T extends Timestamped> PeriodEntries<T> create(Iterable<List<T>> entries, long interval) {
        return new PeriodEntries<T>(entries, interval);
    }

    @Override
    public Iterator<List<T>> iterator() {
        final Iterator<List<T>> entriesIt = entries.iterator();
        return new Iterator<List<T>>() {
            private Iterator<T> entryIt = Collections.<T>emptyList().iterator();
            private T pending;     // already read, first entry of the next period
            private List<T> group; // built by hasNext(), handed out by next()

            @Override
            public boolean hasNext() {
                if (group != null) {
                    return true;
                }
                if (pending == null && !advance()) {
                    return false; // no entries left
                }
                long period = normalizeInterval(pending);
                group = new ArrayList<T>();
                group.add(pending);
                pending = null;
                // collect entries of the same period, across chunk boundaries
                while (advance()) {
                    if (normalizeInterval(pending) != period) {
                        break; // keep it as the first entry of the next group
                    }
                    group.add(pending);
                    pending = null;
                }
                return true;
            }

            // reads the next entry into pending; false when all chunks are consumed
            private boolean advance() {
                while (!entryIt.hasNext()) {
                    if (!entriesIt.hasNext()) {
                        return false;
                    }
                    entryIt = entriesIt.next().iterator();
                }
                pending = entryIt.next();
                return true;
            }

            // start of the period containing the entry, aligned to multiples of
            // interval in the default time zone (local midnight for one-day intervals)
            private long normalizeInterval(Timestamped entry) {
                long time = entry.getTime();
                long localTime = time + TimeZone.getDefault().getOffset(time);
                long elapsed = localTime % interval;
                return time - elapsed;
            }

            @Override
            public List<T> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                List<T> result = group;
                group = null;
                return result;
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }
}
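The grouping key comes from normalizeInterval: it shifts the timestamp to local time (the JVM's default zone), takes the remainder modulo the interval and subtracts it from the original timestamp, so every entry of the same local day (or hour) maps to the same period start. A standalone version with the zone passed in as a parameter (PeriodMath and periodStart are names made up for this sketch) makes the arithmetic easy to check.

```java
import java.util.TimeZone;

public class PeriodMath {
    // Same arithmetic as normalizeInterval, with the zone passed in explicitly:
    // shift to local time, find how far past the last interval boundary it is,
    // and subtract that amount from the original UTC timestamp.
    public static long periodStart(long time, long interval, TimeZone zone) {
        long localTime = time + zone.getOffset(time);
        long elapsed = localTime % interval;
        return time - elapsed;
    }
}
```

With a one-day interval, 2012-01-02 00:01:30.420 UTC maps to 2012-01-02 00:00 UTC in a UTC zone, but to 2012-01-01 23:00 UTC (local midnight) in a fixed GMT+01:00 zone. On a day with a daylight saving transition, entries before and after the switch use different offsets and therefore get period starts one hour apart, so a fixed-offset zone is the safer choice if that matters.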

Introducing this functionality did not change the final processing code much: it is still one clean and tight for-loop that does not have to care about grouping elements across files, buffers and periods. PeriodEntries is also flexible enough to handle intervals of any length.

// TrueFxDecoder (a Decoder<TrueFxData>) and TrueFxData (implements Timestamped) are not shown
TrueFxDecoder decoder = new TrueFxDecoder();
FileReader<TrueFxData> reader = FileReader.create(decoder, file.listFiles());
long periodLength = TimeUnit.DAYS.toMillis(1);
PeriodEntries<TrueFxData> periods = PeriodEntries.create(reader, periodLength);
for (List<TrueFxData> entries : periods) {
    // data for each day
    for (TrueFxData entry : entries) {
        // process each entry
    }
}

As you may realize, it would not have been possible to solve this problem with collections that hold every entry in memory; choosing iterators was a crucial design decision to be able to parse terabytes of data without consuming too much heap space.

Categories: Java, Uncategorized

WSDL sucks

December 7th, 2012

WSDL sucks. The whole WS-* protocol stack sucks. There, I said it.

It hurts; it’s a pain in the butt. Difficult to write and hard to debug. I cannot think of another technology that has wasted more of my time, and I have yet to find one person who can give me a clear and concise explanation of exactly how to use it. Allegedly one of the most overengineered technologies in the history of computing.

Doing wrong is easy, doing right is hard.

It’s bloated. Tooling support is poor and hides necessary complexity. Interoperability is hard. Caching is not an option. It is noncompliant with traditional web technology. The wire format is insanely verbose. Backward and forward compatibility is a nightmare. You have to read endless piles of specifications to design anything sophisticated, and sometimes even bend over backwards to do the simplest of things. All you are left with in the end is a pile of crap to maintain.

WTF? Where is KISS and productivity to be found in this mess?

System integration should not be this hard.

Categories: architecture, coding, Java

tools4j-config part 1, introduction

Tools4j-config 0.0.1, a framework that aims to support creating configurable applications in a productive and consistent way, was released to the Maven Central Repository about a week ago.

This is a quick introduction to the possibilities for defining configuration and constraints, and to how configuration is administered for applications.

Prerequisites
Tools4j-config is fully functional in any Java SE 6+ compatible environment and is distributed as a set of Maven 3+ projects.

Quickstart
First we’ll define the configuration used by our application. Create a Maven project with the following dependencies.

<dependency>
<groupId>org.deephacks.tools4j</groupId>
<artifactId>config-api-runtime</artifactId>
<version>0.0.1</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.deephacks.tools4j</groupId>
<artifactId>config-core</artifactId>
<version>0.0.1</version>
<scope>runtime</scope>
</dependency>

Let’s assume we need access to a database, so we create a class that will represent the configuration needed to connect to it.

@Config(desc = "User database.")
public class Database {
@Id(desc = "Identification of database.")
private String id;
@Config(desc = "Address for connecting to database.")
private URL url;
// username with default value 'test'
@Config(desc = "Username for connecting to database.")
private String username = "test";
// password with default value 'test'
@Config(desc = "Password for connecting to database.")
private String password = "test";
@Config(desc = "Database connections in pool.")
private Integer poolSize;
}

Now register this class with the framework and read its configuration.

RuntimeContext runtime = Lookup.get().lookup(RuntimeContext.class);
runtime.register(Database.class);
List<Database> tests = runtime.all(Database.class);

Done! Our application is now configurable. Pretty quick and simple.

But what did we read? Nothing actually. The list is empty since there is no configuration available yet. We need the administrator to provision configuration first. So let’s take the administrator perspective for a second.

Create a new Maven project with the following dependencies.

<dependency>
<groupId>org.deephacks.tools4j</groupId>
<artifactId>config-api-admin</artifactId>
<version>0.0.1</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.deephacks.tools4j</groupId>
<artifactId>config-core</artifactId>
<version>0.0.1</version>
<scope>runtime</scope>
</dependency>

Configuration is managed programmatically and this is how our imaginary administrator creates a database named user and sets values for it.

AdminContext admin = Lookup.get().lookup(AdminContext.class);
Bean bean = Bean.create(BeanId.create("user", Database.class.getName()));
bean.addProperty("url", "/dev/null");
bean.addProperty("username", "admin");
bean.addProperty("password", "admin123");
bean.addProperty("poolSize", "nonsense");
admin.create(bean);

All good? Not quite. The code above fails twice.

Bean Database@user have a property java.net.URL@url with value /dev/null not matching its type.
Bean Database@user have a property java.lang.Integer@poolSize with value nonsense not matching its type.

These failures should be obvious. The administrator provisioned values that did not conform to the types of the configurable class, which brings up an important point: type safety. Administrators should not be able to break the constraints of applications, accidentally or intentionally.
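Both failures are plain type-conversion errors. As a rough illustration using only JDK conversions (TypeCheck and conforms are hypothetical names, and this is not how tools4j-config does it internally): /dev/null has no scheme, so it cannot become an absolute java.net.URL, and "nonsense" cannot be parsed as an Integer.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class TypeCheck {
    // Hypothetical check: can a provisioned string be converted to the field type?
    public static boolean conforms(String value, Class<?> type) {
        if (type == String.class) {
            return true;
        }
        if (type != URL.class && type != Integer.class) {
            throw new IllegalArgumentException("Unsupported type: " + type.getName());
        }
        try {
            if (type == URL.class) {
                new URI(value).toURL(); // relative references such as /dev/null are rejected
            } else {
                Integer.valueOf(value);
            }
            return true;
        } catch (URISyntaxException | MalformedURLException | IllegalArgumentException e) {
            // IllegalArgumentException also covers NumberFormatException
            return false;
        }
    }
}
```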

Assuming values are corrected, we can switch back to the application’s perspective and read the user instance.

Database user = runtime.get("user", Database.class);

The user instance is now initialized with values provisioned by the administrator and the application can read it without having to redeploy itself or restart the JVM.

At this very basic level it is important to notice a couple of things. Neither developer nor administrator made any assumptions about the runtime environment, nor did they need to know or care where or how configuration was read and written.

Our quickstart use-case is complete but let’s dive a little deeper in order to understand the modelling capabilities of tools4j-config.

Built-in types
Configurable fields can have any of the following types.

  • java.lang.String
  • java.lang.Number and derived types
  • java.lang.Boolean
  • java.lang.Enum and derived types, including user-defined ones
  • java.util.Date
  • java.util.Currency
  • java.util.Locale
  • java.io.File
  • java.net.InetAddress
  • java.net.URL
  • java.net.URI
  • javax.xml.datatype.Duration

Fields can also be declared as a java.util.Collection implementation, generified with any of the above types. Fields initialized at declaration are considered default values (provisioned values take precedence). It is possible to declare user-defined types, but it requires some extra effort and will be covered in a future post.

All of these declarations are valid (annotations omitted for brevity).

// 1, 2, 3 is easy
List<Integer> counting = Arrays.asList(1, 2, 3);
// mathematical variable
Double x;
// measuring performance
TimeUnit precision = TimeUnit.NANOSECONDS;
// 5:01 developers
Set<Day> working = new HashSet<Day>(Arrays.asList(MON, TUE, WED, THU, FRI));
// content tags
List<String> labels;
// forever young
Date young = new Date(Long.MAX_VALUE);

References
Configurable classes can have references to other configurable classes, including themselves (recursive relationships). Circular references are allowed as well (person ‘a’ is the best friend of person ‘b’, and usually vice versa).

Declaring references is identical to regular Java types.

@Config(desc = "An individual (or application program) identity")
public class User {
@Id(desc = "username")
private String username;
 
@Config(desc = "password")
private String password;
 
@Config(desc = "Roles assigned to this user")
private Set<Role> roles = new HashSet<Role>();
}
 
@Config(desc = "Role for the permission to access a set of resources")
public class Role {
@Config(desc = "Permissions to access a set of resources")
private EnumSet<Privilege> privileges = EnumSet.noneOf(Privilege.class);
 
@Config(desc = "Roles assigned to this role")
private Set<Role> roles = new HashSet<Role>();
}

Referential integrity is enforced in order to keep relationships consistent. It is not possible to create references to instances that do not exist, or to remove instances that other instances still reference.

Administering references is almost identical to provisioning regular Java type values.

BeanId deployerRoleId = BeanId.create("deployer", Role.class.getName());
Bean deployer = Bean.create(deployerRoleId);

BeanId assemblerRoleId = BeanId.create("assembler", Role.class.getName());
Bean assembler = Bean.create(assemblerRoleId);

BeanId adminRoleId = BeanId.create("administrator", Role.class.getName());
Bean adminRole = Bean.create(adminRoleId);
adminRole.addReference("roles", deployerRoleId);
adminRole.addReference("roles", assemblerRoleId);

// the administrator is a User (it has a password), not a Role
BeanId adminId = BeanId.create("admin", User.class.getName());
Bean administrator = Bean.create(adminId);
administrator.addReference("roles", adminRoleId);
administrator.setProperty("password", "xxxxx");

Inheritance
Configurable classes support inheritance and configurable fields will be inherited from their parent class, enabling reuse of configurable fields and methods.

Validation
Tools4j-config integrates with JSR-303 Bean Validation to help developers to further constrain the premises under which configuration may be provisioned.

To enable Bean Validation we need to add the following dependencies to both maven projects mentioned earlier.

<dependency>
<groupId>javax.validation</groupId>
<artifactId>validation-api</artifactId>
<version>1.0.0.GA</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.hibernate</groupId>
<artifactId>hibernate-validator</artifactId>
<version>4.1.0.Final</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.deephacks.tools4j</groupId>
<artifactId>config-provider-jsr303</artifactId>
<version>0.0.1</version>
<scope>runtime</scope>
</dependency>

Next follows a hypothetical example where Bean Validation constraints are used to make sure that three properties of binary search trees are satisfied.

  • The left subtree of a node contains only nodes with keys less than the node’s key.
  • The right subtree of a node contains only nodes with keys greater than the node’s key.
  • Both the left and right subtrees must also be binary search trees.
@Config(desc = "A binary tree")
@BinaryTreeConstraint
public class BinaryTree {
@Id(desc = "id of current node")
private String id;
 
@Config(desc = "value of current node")
@NotNull @Min(1)
private Integer value;
 
@Config(desc = "left child")
private BinaryTree left;
 
@Config(desc = "right child")
private BinaryTree right;
 
public Integer getValue() { return value; }
public BinaryTree getLeft() { return left; }
public BinaryTree getRight() { return right; }
public String toString() { return id + "=" + value; }
}
 
public class BinaryTreeValidator implements
ConstraintValidator<BinaryTreeConstraint, BinaryTree> {
 
public boolean isValid(BinaryTree n, ConstraintValidatorContext c) {
if (n.getLeft() != null && n.getValue() < n.getLeft().getValue()) {
String msg = n.getLeft() + " must be to right of " + n;
c.buildConstraintViolationWithTemplate(msg).addConstraintViolation();
return false;
}
if (n.getRight() != null && n.getValue() > n.getRight().getValue()) {
String msg = n.getRight() + " must be to left of " + n;
c.buildConstraintViolationWithTemplate(msg).addConstraintViolation();
return false;
}
return true;
}
 
public void initialize(BinaryTreeConstraint constraintAnnotation) { }
}
 
@Target({ TYPE })
@Retention(RetentionPolicy.RUNTIME)
@Constraint(validatedBy = BinaryTreeValidator.class)
public @interface BinaryTreeConstraint {
String message() default "";
Class<?>[] groups() default {};
Class<? extends Payload>[] payload() default {};
}
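Note that BinaryTreeValidator only compares each node with its direct children, which is weaker than the subtree properties listed above: in the tree 10 with left child 5, whose right child is 15, every parent/child pair passes, yet 15 sits in the left subtree of 10. It also lets equal keys through. A bounds-based check covers whole subtrees. The sketch below is plain Java on a hypothetical Node class rather than the configurable BinaryTree, so it can run on its own; an isValid implementation could delegate to a similar recursive helper.

```java
public class BstCheck {
    // Hypothetical stand-in for the configurable BinaryTree class.
    public static final class Node {
        final int value;
        final Node left;
        final Node right;

        public Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    public static boolean isBst(Node root) {
        return isBst(root, null, null);
    }

    // Every node must lie strictly between the bounds inherited from its ancestors;
    // null means "no bound on this side".
    private static boolean isBst(Node node, Integer lower, Integer upper) {
        if (node == null) {
            return true;
        }
        if ((lower != null && node.value <= lower) || (upper != null && node.value >= upper)) {
            return false;
        }
        return isBst(node.left, lower, node.value) && isBst(node.right, node.value, upper);
    }
}
```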

Tools4j-config is fully compatible with JSR-303 and supports any combination of Bean Validation constraints.

import javax.validation.constraints.Max;
import javax.validation.constraints.NotNull;
import javax.validation.constraints.Past;
import javax.validation.constraints.Pattern;
import javax.validation.constraints.Size;

@Config @NotNull @Size(min=1, max=5)
List<Person> persons;

@Config @NotNull @Size(min=8, max=25)
String password;

@Config @Max(value=200, message="Area too large")
Integer area() { return length * width; }

@Config @Past
Date time;

@Config @Pattern(regexp="^[\\w\\-]([\\.\\w])+[\\w]+@([\\w\\-]+\\.)+[a-zA-Z]{2,4}$" , message="Bad email")
String email;

We will not explore the validation possibilities further, but please do so on your own. JSR-303 provides a great way to reuse constraint-related efforts.

What’s next?
Tools4j-config encourages developers to be very precise when defining configurable parts, strengthening the guarantee that applications will operate on correct and meaningful data. Tools4j-config also enables productivity, leaving room to focus on domain concerns instead of forcing a lot of boilerplate code around administration, persistence and many other unrelated concerns.

Tools4j-config has a lot more functionality than this post could hold – so this is the end. But the following topics will be explored in the future.

  • Persistence and data distribution

    How to choose between different ways of storing configuration such as XML files, SQL, NoSQL and more.

  • Compile-time checks

    How to catch schema faults at compile-time.

  • Administration interfaces

    How to provision configuration using interfaces such as JAX-WS, JAX-RS, an auto-generated GUI, a CLI and more.

  • Administration users and roles

    How to define users and enforce roles for administration.

  • Sessions and atomic commits

    How to make several changes in the scope of a session and commit changes atomically.

  • Schema discovery

    How to explore the configuration schema at runtime.

  • Change notifications

    How applications can subscribe for configuration changes.

  • Automatic documentation generation

    How to generate operational documentation from code.

  • Performance, scalability and consistency

    How to support low-latency, high-performance centralized management in large-scale environments with hundreds of thousands of configuration instances.

  • Java EE and OSGi integration and portability

    How to integrate tools4j-config in Java EE, OSGi, Spring, CDI and other programming models and frameworks.

  • Extendability

    How to provide tailor-made implementations using the SPI mechanism and verify them against the TCK.

Please visit the website if you think tools4j-config seems interesting.

Categories: Java, Java EE, open source, tools4j-config

tools4j-config

February 1st, 2012

I am proud to announce my open-source project tools4j-config, a project that will try to address configuration concerns in Java once and for all.

I have seen and heard about far too many projects that handle configuration carelessly, causing endless headaches when put into production. Some notable nuisances are non-uniform interfaces, unmanageable structural and data changes in disparate sources, diffuse configuration intent and unclear correlation to system concepts, and lack of documentation.

Tools4j-config is my way of scratching an itch: helping developers, operators and administrators (devops) cooperatively manage configuration. This is the starting point and an honest attempt to implement a framework that handles these concerns in a simple, productive, uniform, extensible and portable way.

The mission statement and motivation for tools4j-config is taken from the announcement on freecode.com and goes something like this.


Tools4j-config supports long-running enterprise Java applications with a framework for handling configuration changes without restarting themselves.

It also aids in developing applications which are decoupled from knowing how and where to store, retrieve, and validate configurations.

The aim is to liberate applications to use configurations seamlessly on the terms of their particular environment, without constraining them to Java SE, EE, OSGi, Spring, CDI, or any other programming model or framework.

Tools4j-config is a true open source project; contributions of ideas or criticism on any collaborative level are highly appreciated and will never be neglected or considered too small. Committers are welcomed with open arms.

The information on tools4j-config is presentable but a bit scarce at the moment; it will build up gradually towards a 1.0 release.

Sure been a long time coming but expect a lot more on the topic of Configuration Management from me this year :-)

Categories: Java, Java EE, open source

ThoughtWorks Technology Radar

September 22nd, 2011

Most experienced developers know about ThoughtWorks, I guess mostly because Martin Fowler is the Chief Scientist over there.

In our industry new technologies emerge at an insane rate, and I personally find it very hard to keep track of everything going on. The ThoughtWorks Technology Radar, openly available to anyone, is produced by a group of people who help decision makers understand emerging technologies and trends that affect the market today.

We all know that Java is doing really well (actually fantastically) at the moment. Java 7 was released this summer and Java 8 is already well defined and planned to be released about a year from now. But if you, as a developer, want to understand more about the specific technology trends in the industry that you should, or should not, be paying attention to, the ThoughtWorks Technology Radar is an invaluable source of information on how to strategically position yourself and your competence.

Some things that I find noteworthy are the endorsement of evolutionary database design, the acknowledgement that manual infrastructure management is a problem, the view that NoSQL is promising but not yet mature, and the advice that the WS-* stack should be treated with caution.

I advise you to read the latest ThoughtWorks Technology Radar report, published in July 2011, and form your own opinion.

You owe it to yourself as a professional software developer!

Categories: business, Java

Java Code Geeks

August 28th, 2011

I have followed Java Code Geeks regularly, reading almost every article. The content is always top notch and on point! So I’m really happy to announce that my blog just got accepted into the Java Code Geeks (JCG) program.

This will of course increase publicity but, more importantly, allow me to contribute to and reach a larger portion of the Java community and give something back. Hopefully my posts will spark discussions. I already feel motivated to write more and better posts.

Thank you for the support, Byron! I’m proud to be a Java Code Geek :-D

Categories: Java, writing

Uncle Bob

August 24th, 2011

I just went to a javaforum event to celebrate the release of Java 7 with some friends, and was happy to see that some of the Swedish JRockit core team were there to give their view on Java from an Oracle perspective.

The event was fully booked on the very day it was announced. The interest was so high that the organizers quickly arranged one more opportunity to celebrate within the next two weeks, which was also immediately fully booked! Considering the bad economy and the outsourcing insanity going on in the industry, it is really nice to see that Java is alive and kickin’ it in Sweden :-D

Another great surprise was also revealed at the event, Uncle Bob was present to give a talk on Clean Code. Seeing living legends in person does not happen very often in Sweden unfortunately. I was stoked!

Among other things, Robert talked about the importance of short methods. Hell, I want my methods to be short. I usually think 10 lines is reasonable... but the man insisted on 3!

This made me wonder… what does he think about the Builder pattern where multiple method calls are chained together in a one-liner to create a fluent interface? Would that count as one line even though multiple method calls would be nested within that line of code?
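For reference, a minimal fluent builder looks something like the following sketch (Email and its fields are made up for illustration). The one-liner in the usage note below is the kind of chained call the question is about.

```java
public class Email {
    private final String to;
    private final String subject;
    private final String body;

    private Email(Builder builder) {
        this.to = builder.to;
        this.subject = builder.subject;
        this.body = builder.body;
    }

    @Override
    public String toString() {
        return "to=" + to + ", subject=" + subject + ", body=" + body;
    }

    public static class Builder {
        private String to;
        private String subject = "";
        private String body = "";

        // each setter returns the builder itself, which is what makes chaining possible
        public Builder to(String to) { this.to = to; return this; }
        public Builder subject(String subject) { this.subject = subject; return this; }
        public Builder body(String body) { this.body = body; return this; }

        public Email build() {
            if (to == null) {
                throw new IllegalStateException("recipient is required");
            }
            return new Email(this);
        }
    }
}
```

Usage: `Email mail = new Email.Builder().to("bob@example.com").subject("Clean Code").body("3 lines?").build();` is one line of code containing five method calls.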

Just for the record: I’m not talking about a transitive coupling train wreck here.

Unfortunately I could not ask my question since the Q&A was very short, with only two questions answered. But I guess he would approve of it as one line, as long as the syntax is understandable.

Anyways, it was a really good and energizing presentation from Uncle Bob *and* Oracle, not to forget the free beer ;-)

Categories: Java, principles

The Devil is in the details

August 24th, 2011

I want code to be simple-n-short, on-point and easy to read. Unnecessary complexity distracts from and obscures what is really going on, and can be a real killer for productivity.

You know, tangled for-loops and indexes to track, if/else and switch cases, null/validation checks, converting/copying/deleting/sorting collections, exception handling … the list goes on, along with ever-increasing line counts and maintenance burden.

An excellent quote by Tony Hoare comes to mind.

There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies.

In other words: The Devil is in the details.

Apache Commons has some of the most wonderful libraries complementing the JDK APIs, but this post is not about Commons. It is about Google Guava, which is similar to Commons in many regards. It provides a single library for commonly used day-to-day tasks, such as collection handling, string manipulation, concurrency, IO, primitives, exceptions etc.

There is so much nice stuff in Guava that I won't have time to go through the complete library, but here are at least some examples of what it can do.

Objects
Objects makes it easy to implement hashCode/equals without cluttering your classes too much (Eclipse auto-generation tends to be a bit verbose for my taste).

Classes that implement toString are really pleasant to use when debugging and logging, but toString can be a real pain to implement. Objects.toStringHelper makes this really easy and also helps maintain a consistent format for printing objects.

import com.google.common.base.Objects;

public class Item {
    private final String id;
    private final String name;

    public Item(String id, String name) {
        this.id = id;
        this.name = name;
    }

    public String getId() { return id; }
    public String getName() { return name; }

    @Override
    public int hashCode() {
        return Objects.hashCode(getId(), getName());
    }

    @Override
    public String toString() {
        // newer Guava versions moved this to MoreObjects.toStringHelper
        return Objects.toStringHelper(this).add("id", getId()).add("name", getName()).toString();
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Item)) {
            return false;
        }
        Item other = (Item) o;
        // Objects.equal is null-safe
        return Objects.equal(getId(), other.getId())
                && Objects.equal(getName(), other.getName());
    }
}

Printing an instance of this class, e.g. new Item("1", "Book"), outputs something like this.

Item{id=1, name=Book}
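As a side note, Java 7 added java.util.Objects, which covers part of this: Objects.hash and a null-safe Objects.equals. As far as I know there is no toStringHelper counterpart in the JDK, so the sketch below writes the same format by hand. JdkItem is just a renamed copy of Item for comparison, not a library class.

```java
import java.util.Objects;

// A renamed copy of Item using only java.util.Objects (Java 7+), no Guava.
public class JdkItem {
    private final String id;
    private final String name;

    public JdkItem(String id, String name) {
        this.id = id;
        this.name = name;
    }

    public String getId() { return id; }
    public String getName() { return name; }

    @Override
    public int hashCode() {
        return Objects.hash(id, name); // varargs, like Guava's Objects.hashCode
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof JdkItem)) return false;
        JdkItem other = (JdkItem) o;
        // Objects.equals is null-safe, like Guava's Objects.equal
        return Objects.equals(id, other.id) && Objects.equals(name, other.name);
    }

    @Override
    public String toString() {
        // no toStringHelper in the JDK, so the same format is written by hand
        return getClass().getSimpleName() + "{id=" + id + ", name=" + name + "}";
    }
}
```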

Throwables
Wrapping the original exception object is not always appropriate, because it can cause a ClassNotFoundException in the client code if communication occurs between unrelated class loaders, or if the exception is serialized over the wire. Throwables can break this dependency while still allowing remote clients to see the stack trace, by converting it to a string.

try {
    // throws implementation specific exception
} catch (InternalException e) {
    throw new ApiException("reason", Throwables.getStackTraceAsString(e));
}
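If you are curious what getStackTraceAsString boils down to, a plain-JDK equivalent is only a few lines: let printStackTrace write into a StringWriter. ApiException and translate below are hypothetical names mirroring the snippet above; the point is that only a String crosses the API boundary, so the client never needs the internal exception class.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class StackTraces {

    // Renders the full stack trace (including causes) as text, the same output
    // printStackTrace() would write to System.err.
    public static String stackTraceAsString(Throwable t) {
        StringWriter buffer = new StringWriter();
        t.printStackTrace(new PrintWriter(buffer));
        return buffer.toString();
    }

    // Hypothetical API exception: carries the trace as a String, never the
    // original exception object, so clients need no internal classes.
    public static class ApiException extends RuntimeException {
        private final String remoteStackTrace;

        public ApiException(String message, String remoteStackTrace) {
            super(message);
            this.remoteStackTrace = remoteStackTrace;
        }

        public String getRemoteStackTrace() { return remoteStackTrace; }
    }

    // Hypothetical translation step at the API boundary.
    public static ApiException translate(Exception internal) {
        return new ApiException("reason", stackTraceAsString(internal));
    }
}
```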

Iterables
Concatenating two separate collections and performing operations on the result can cause quite a lot of clutter. Iterables to the rescue. Take a minute and think about how the code might look without Iterables.concat.

for (Item item : Iterables.concat(books, electronics)) {
    // do something useful
}
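For comparison, here is roughly what the code looks like without a concat helper, next to the Stream.concat alternative that came later with Java 8. The string lists are placeholders for the books and electronics collections, and upper-casing stands in for "something useful". Note the difference: Iterables.concat gives a lazy view over both collections, the manual version copies everything into a temporary list, and Stream.concat stays lazy until the result is collected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ConcatSketch {

    // Without a concat helper: copy both collections into a temporary list, then loop.
    public static List<String> manual(List<String> books, List<String> electronics) {
        List<String> all = new ArrayList<>(books.size() + electronics.size());
        all.addAll(books);
        all.addAll(electronics);
        List<String> result = new ArrayList<>();
        for (String item : all) {
            result.add(item.toUpperCase(Locale.ROOT)); // "do something useful"
        }
        return result;
    }

    // Java 8+: Stream.concat chains both sources lazily, without a temporary copy.
    public static List<String> streamed(List<String> books, List<String> electronics) {
        return Stream.concat(books.stream(), electronics.stream())
                .map(item -> item.toUpperCase(Locale.ROOT)) // "do something useful"
                .collect(Collectors.toList());
    }
}
```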

Multimaps
Multimap is like a Map, but allows multiple values to be stored for every key. The following example is a variant of a typesafe heterogeneous container, using a multimap to realize a product catalogue of items.

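import com.google.common.collect.ArrayListMultimap;
import com.google.common.collect.Multimap;
import java.util.ArrayList;
import java.util.List;

// Book and Movie are assumed to be subclasses of Item
public class ProductCatalogue {
    private final Multimap<Class<? extends Item>, Item> catalogue = ArrayListMultimap.create();

    public void add(Item item) {
        catalogue.put(item.getClass(), item);
    }

    public <T extends Item> List<T> list(Class<T> clazz) {
        List<T> result = new ArrayList<T>();
        for (Item item : catalogue.get(clazz)) {
            result.add(clazz.cast(item)); // safe: items are keyed by their exact class
        }
        return result;
    }
}

ProductCatalogue catalogue = new ProductCatalogue();
catalogue.add(new Book("1", "Book1"));
catalogue.add(new Movie("2", "Movie1"));
// only get books
System.out.println("Books " + catalogue.list(Book.class));
// only get movies
System.out.println("Movies " + catalogue.list(Movie.class));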
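Without Guava, the usual way to get the same behaviour is a Map of Lists. Here is a small sketch using computeIfAbsent, which is a Java 8 addition; the class and method names are my own. Like the catalogue above, lookups are by exact class, so a subclass of Book would end up under its own key.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical JDK-only catalogue: a Map from class to the list of its instances.
public class MultimapSketch {
    private final Map<Class<?>, List<Object>> catalogue = new HashMap<>();

    public void add(Object item) {
        // create the value list on first use of a key (Java 8+)
        catalogue.computeIfAbsent(item.getClass(), key -> new ArrayList<>()).add(item);
    }

    public <T> List<T> list(Class<T> type) {
        List<T> result = new ArrayList<>();
        for (Object o : catalogue.getOrDefault(type, Collections.emptyList())) {
            result.add(type.cast(o)); // safe: keyed by exact class
        }
        return result;
    }
}
```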

BiMap
BiMap implements a one-to-one bidirectional relationship between the keys and values of a Map. Here is an example using a language code to get the language, and vice versa.

BiMap<String, String> languageCodes = HashBiMap.create();
languageCodes.put("en", "English");
languageCodes.put("fr", "French");
languageCodes.put("zh", "Chinese");
// compare strings with equals, not ==
assert "English".equals(languageCodes.get("en"));
assert "en".equals(languageCodes.inverse().get("English"));
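The interesting part of a BiMap is keeping both directions consistent. A hand-rolled sketch with two HashMaps (TwoWayMap is a made-up name, and it is far less complete than HashBiMap) shows what has to happen on put: a value may belong to only one key, and re-binding a key must drop its old value from the reverse map. Guava's put likewise rejects a value already bound to a different key with an IllegalArgumentException.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal bidirectional map; Guava's HashBiMap does much more.
public class TwoWayMap<K, V> {
    private final Map<K, V> forward = new HashMap<>();
    private final Map<V, K> backward = new HashMap<>();

    public void put(K key, V value) {
        K owner = backward.get(value);
        if (owner != null && !owner.equals(key)) {
            // one-to-one: a value may belong to only one key
            throw new IllegalArgumentException("value already present: " + value);
        }
        V previous = forward.put(key, value);
        if (previous != null) {
            backward.remove(previous); // drop the key's old value from the inverse side
        }
        backward.put(value, key);
    }

    public V get(K key) { return forward.get(key); }

    public K getKey(V value) { return backward.get(value); }
}
```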

Preconditions
Most classes have restrictions on the values given to their constructors and methods. Invalid values should be reported as soon as possible by doing explicit validity checks before execution. It is a lot better to fail fast than to fail later with an unexpected exception or, worse, silently compute the wrong result.

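public Item(String id, String name) {
    // checkNotNull returns its argument, so check and assignment fit on one line
    this.id = Preconditions.checkNotNull(id, "id must not be null");
    this.name = Preconditions.checkNotNull(name, "name must not be null");
    // checkArgument throws IllegalArgumentException when the expression is false
    Preconditions.checkArgument(name.length() > 6, "name must be longer than 6 chars");
}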
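Java 7's java.util.Objects also has requireNonNull which, like checkNotNull, returns its argument. There is no checkArgument counterpart there, so a plain if/throw takes its place. CheckedItem is a hypothetical stand-in for Item, following the rule stated in the message above.

```java
import java.util.Objects;

// Hypothetical stand-in for Item, validated with JDK-only checks.
public class CheckedItem {
    private final String id;
    private final String name;

    public CheckedItem(String id, String name) {
        // requireNonNull throws NullPointerException, or returns its argument
        this.id = Objects.requireNonNull(id, "id must not be null");
        this.name = Objects.requireNonNull(name, "name must not be null");
        // no checkArgument in the JDK, so an explicit if/throw does the job
        if (name.length() <= 6) {
            throw new IllegalArgumentException("name must be longer than 6 chars");
        }
    }

    public String getId() { return id; }
    public String getName() { return name; }
}
```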

Constraints
Constraints are similar to preconditions in that they restrict which values can be added to a collection. This makes collections much easier to use and code a lot cleaner, since the constraints are separated from the business code.

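import com.google.common.base.Preconditions;
import com.google.common.collect.Constraint;
import com.google.common.collect.Constraints;
import java.util.ArrayList;
import java.util.List;

// Country and Cargo are domain classes assumed to exist elsewhere
public class Voyage {
    private Country targetcountry;
    private int capacity;
    private List<Cargo> items = Constraints.constrainedList(new ArrayList<Cargo>(), new Constraint<Cargo>() {
        @Override
        public Cargo checkElement(Cargo cargo) {
            Preconditions.checkNotNull(cargo);
            Preconditions.checkArgument(targetcountry.allows(cargo));
            Preconditions.checkArgument(cargo.getUnits() > 0);
            return cargo;
        }
    });

    public void load(List<Cargo> cargos) {
        items.addAll(cargos);
    }
}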
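Under the hood, a constrained list is basically a decorator that runs the check on every insertion path. A plain-JDK sketch could look like this; ConstrainedList is a made-up name, and a UnaryOperator that returns the element or throws stands in for Guava's Constraint. One design choice worth calling out: addAll checks every element before adding any, so a single bad Cargo does not leave a voyage half loaded.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical plain-JDK decorator; not Guava's Constraints implementation.
public class ConstrainedList<E> extends AbstractList<E> {
    private final List<E> delegate = new ArrayList<>();
    private final UnaryOperator<E> check; // returns the element or throws

    public ConstrainedList(UnaryOperator<E> check) {
        this.check = check;
    }

    @Override
    public E get(int index) { return delegate.get(index); }

    @Override
    public int size() { return delegate.size(); }

    // AbstractList routes add(E) through add(int, E), so this covers single adds.
    @Override
    public void add(int index, E element) {
        delegate.add(index, check.apply(element));
        modCount++;
    }

    @Override
    public E set(int index, E element) {
        return delegate.set(index, check.apply(element));
    }

    @Override
    public E remove(int index) {
        modCount++;
        return delegate.remove(index);
    }

    // Validate everything first so a rejected element leaves the list unchanged.
    @Override
    public boolean addAll(Collection<? extends E> elements) {
        List<E> checked = new ArrayList<>(elements.size());
        for (E element : elements) {
            checked.add(check.apply(element));
        }
        modCount++;
        return delegate.addAll(checked);
    }
}
```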

Predicates and Functions
Predicates evaluate whether something is true or false, but they can also be combined into more complex evaluations using “and”, “or”, “not” and “in”.

What would normally require a for-loop and a bunch of if statements can now be reduced to a one-liner. How sweet is that?

// order is assumed to be a Collection<Item>, with Item exposing getWeight()
Predicate<Item> heavyItemPolicy = new Predicate<Item>() {
    @Override
    public boolean apply(Item item) {
        return item.getWeight() > 1000;
    }
};
Collection<Item> heavyItems = Collections2.filter(order, heavyItemPolicy);

You can also use Maps.filterKeys or Iterables.filter in a similar way. But keep in mind that the filtered result is a live view, so changes go both ways: removing an element from the original collection affects the result, and vice versa.
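Java 8 later added java.util.function.Predicate with and, or and negate, so the same kind of composition can be sketched without Guava. Parcel is a hypothetical stand-in for Item with just a label and a weight. One difference to keep in mind: a stream collects into a new list, so unlike Collections2.filter the result is a copy, not a live view.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class PredicateSketch {

    // Hypothetical order line: just a label and a weight.
    public static final class Parcel {
        final String label;
        final int weight;

        public Parcel(String label, int weight) {
            this.label = label;
            this.weight = weight;
        }
    }

    static final Predicate<Parcel> HEAVY = p -> p.weight > 1000;
    static final Predicate<Parcel> FRAGILE = p -> p.label.startsWith("glass");

    // heavy AND NOT fragile, composed from two small predicates
    public static List<String> heavyButNotFragile(List<Parcel> order) {
        return order.stream()
                .filter(HEAVY.and(FRAGILE.negate()))
                .map(p -> p.label)
                .collect(Collectors.toList());
    }
}
```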

Functions, on the other hand, are a way of transforming one object into another. For example, converting the prices in an order of items to another currency.

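// order is assumed to be a Collection<Item>, and ANOTHER_CURRENCY an exchange-rate constant
// Function<F, T>: F is the input type, T the output type
Function<Item, Double> currencyConverter = new Function<Item, Double>() {
    @Override
    public Double apply(Item item) {
        return item.getPrice() * ANOTHER_CURRENCY;
    }
};
Collection<Double> prices = Collections2.transform(order, currencyConverter);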

You can also use Maps.transformValues or Iterables.transform in a similar way.
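The Java 8 counterpart is java.util.function.Function together with Stream.map. As with Guava's Function<F, T>, the first type parameter is the input and the second the output. RATE is a made-up exchange rate standing in for ANOTHER_CURRENCY, and andThen shows that functions also compose, here with a rounding step.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class TransformSketch {

    // Hypothetical exchange rate, standing in for ANOTHER_CURRENCY.
    static final double RATE = 2.0;

    public static List<Double> convert(List<Double> prices) {
        Function<Double, Double> toOtherCurrency = price -> price * RATE;
        // andThen composes a second step: round to two decimals.
        Function<Double, Double> converter =
                toOtherCurrency.andThen(p -> Math.round(p * 100) / 100.0);
        return prices.stream().map(converter).collect(Collectors.toList());
    }
}
```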

A Query API

I have been thinking about how to create simple but powerful Fake Objects for some time now. But I don't want the fakes themselves to turn into a maintenance burden, so they must be easy to implement. My intuition tells me I need a general-purpose state management framework for this to work. So, using predicates, I created a small fluent query interface for interacting with an in-memory storage.

InMemoryStorage storage = new InMemoryStorage();
// add a few Item.class objects to storage
Criteria middleprice = field("price").is(largerThan(100)).and(lessThan(200));
Criteria expired = field("expires").is(after(currentDate));
Collection<Item> result = storage.select(middleprice.and(not(expired))).from(Item.class);
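To give an idea of how little machinery such a query needs, here is a condensed sketch in plain Java 8. It is not the implementation referred to below: Criteria, field, largerThan, lessThan, before and InMemoryStorage here are simplified stand-ins, fields are read reflectively, and the expiry check is phrased as before(today).

```java
import java.lang.reflect.Field;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class QuerySketch {

    // A criterion is a test on any stored object; criteria combine with and/not.
    public interface Criteria {
        boolean matches(Object candidate);

        default Criteria and(Criteria other) {
            return c -> matches(c) && other.matches(c);
        }
    }

    public static Criteria not(Criteria criteria) {
        return c -> !criteria.matches(c);
    }

    // Criteria on one named field; and(Predicate) adds a condition on the same field.
    public static final class FieldCriteria implements Criteria {
        private final String field;
        private final Predicate<Object> condition;

        FieldCriteria(String field, Predicate<Object> condition) {
            this.field = field;
            this.condition = condition;
        }

        public FieldCriteria and(Predicate<Object> more) {
            return new FieldCriteria(field, condition.and(more));
        }

        @Override
        public boolean matches(Object candidate) {
            try { // read the field reflectively, even if it is private
                Field f = candidate.getClass().getDeclaredField(field);
                f.setAccessible(true);
                return condition.test(f.get(candidate));
            } catch (ReflectiveOperationException e) {
                return false; // objects without the field never match
            }
        }
    }

    public interface FieldSelector { FieldCriteria is(Predicate<Object> condition); }

    public static FieldSelector field(String name) {
        return condition -> new FieldCriteria(name, condition);
    }

    public static Predicate<Object> largerThan(double bound) {
        return v -> v instanceof Number && ((Number) v).doubleValue() > bound;
    }

    public static Predicate<Object> lessThan(double bound) {
        return v -> v instanceof Number && ((Number) v).doubleValue() < bound;
    }

    public static Predicate<Object> before(LocalDate date) {
        return v -> v instanceof LocalDate && ((LocalDate) v).isBefore(date);
    }

    public static final class InMemoryStorage {
        private final List<Object> objects = new ArrayList<>();

        public void add(Object o) { objects.add(o); }

        public Selection select(Criteria criteria) { return new Selection(criteria); }

        public final class Selection {
            private final Criteria criteria;

            Selection(Criteria criteria) { this.criteria = criteria; }

            // keep only objects of the requested type that match the criteria
            public <T> List<T> from(Class<T> type) {
                List<T> result = new ArrayList<>();
                for (Object o : objects) {
                    if (type.isInstance(o) && criteria.matches(o)) result.add(type.cast(o));
                }
                return result;
            }
        }
    }

    // Hypothetical stored type with the fields the query refers to.
    public static final class Item {
        public final String name;
        public final int price;
        public final LocalDate expires;

        public Item(String name, int price, LocalDate expires) {
            this.name = name; this.price = price; this.expires = expires;
        }
    }
}
```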

I feel quite satisfied with the result actually – short, compact, understandable and typesafe.

I'm not going to go into the details here, but please do inspect the implementation of Criteria and InMemoryStorage, as well as the tests.

I hope these examples will trigger you to explore Guava further and use it to make your code more readable, robust and maintainable.

And lastly, I really do hope many of these facilities reach standard Java some day soon.

Categories: coding, Java, principles, testing Tags:

Precarious case of API stability

August 22nd, 2011 2 comments

The NetBeans team just got themselves into a really tricky situation when switching to Java 7.

It turns out that Class.getMethods() in Java 7 does not return methods in a deterministic order; the order is unspecified and can differ between runs. This causes JUnit tests to be executed in a different order on every run. That alone would be fine, but the NetBeans team had unknowingly introduced tests that depend on each other and on the order in which they are executed, causing their suite of 8000 tests to fall like dominoes.
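The Javadoc for Class.getMethods() does state that the returned methods are not sorted and not in any particular order, so a runner that wants repeatable runs has to impose an order itself. A minimal sketch is to sort by name (SomeTests is a made-up class). Note that this only makes failures reproducible; it does not make dependent tests independent, which is the real fix.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class MethodOrderSketch {

    // Hypothetical test class; reflection may report these methods in any order.
    public static class SomeTests {
        public void testCreate() {}
        public void testUpdate() {}
        public void testDelete() {}
    }

    // Sorting by name gives the same order on every run and every JVM.
    public static List<String> deterministicOrder(Class<?> testClass) {
        Method[] methods = testClass.getDeclaredMethods();
        Arrays.sort(methods, Comparator.comparing(Method::getName));
        List<String> names = new ArrayList<>();
        for (Method m : methods) {
            if (m.getName().startsWith("test")) {
                names.add(m.getName());
            }
        }
        return names;
    }
}
```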

This is an excellent example of the kind of maintenance nightmare I have warned about earlier. I certainly don't blame them; they could hardly have anticipated this. I have suffered from this myself and know how hard it can be to write independent tests. But this incident convinces me even more that tests should be focused, robust and self-contained. I actually already wrote this down earlier in Open Config Developer Guidelines.

Jaroslav Tulach (who, by the way, also wrote this excellent book) explains the problem in more detail here, and you can follow the discussion between him and the JUnit crew on their mailing list.

Categories: coding, Java, principles, testing Tags:

Reuse fanatics

August 3rd, 2011 4 comments

I really enjoy working with JUnit. It has served me well in the past, and I really appreciate the effort put into this fantastic framework.

However, I need to write test methods that can be executed under unforeseen circumstances without changing the tests themselves. I want to be able to write tests that can easily be reused for a *combination* of unrelated sets of fixture inputs (or preconditions, if you will).

For example, I want to write JUnit tests that automatically assert success for all possible outcomes using a combination of application servers (weblogic, jboss, glassfish), JPA providers (eclipse-link, open-jpa, hibernate), databases (mysql, oracle, postgresql) and an unspecified set of boundary/randomized input values (possibly also including data provided by a QA database)… without writing redundant tests for these combinations.

Having such a mechanism would make it really easy to introduce additional test fixtures without changing tests, removing duplication and a significant amount of test maintenance.

It would also be reasonably easy to parallelize the tests (and possibly distribute them across machines) if verifying all combinations turns out to take too long.

I have tried org.junit.runners.Parameterized, but its mechanics feel a bit too constraining and verbose for my needs. So I created my own JUnitRunner. I will not go into the details of its implementation (you can check it out for yourself).

This is a trivial example where each fixture argument is of the same type, but nothing prevents using different fixture types, and the number of arguments can vary as needed.

@RunWith(JUnitRunner.class)
public class JUnitRunnerDemoTest {
    @Test
    public void demoFixtureArguments(FixtureExample one, FixtureExample two, FixtureExample three) {
        System.out.println("demonstrateFixtureArguments(" + one + ", " + two + ", " + three + ")");
    }
}

Assuming each fixture produces two separate sets of input data (0 and 1), the output would look something like this (where each line is a separate test execution).
demonstrateFixtureArguments(0, 0, 0)
demonstrateFixtureArguments(1, 0, 0)
demonstrateFixtureArguments(0, 1, 0)
demonstrateFixtureArguments(1, 1, 0)
demonstrateFixtureArguments(0, 0, 1)
demonstrateFixtureArguments(1, 0, 1)
demonstrateFixtureArguments(0, 1, 1)
demonstrateFixtureArguments(1, 1, 1)
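The runner's implementation is out of scope here, but its core idea is easy to sketch in plain Java: take the candidate values for each parameter and enumerate their Cartesian product, with the first parameter varying fastest so the order matches the output above. The names below are illustrative, not the JUnitRunner's API.

```java
import java.util.ArrayList;
import java.util.List;

public class FixtureCombinations {

    // Cartesian product of the candidate values for each parameter.
    // The first parameter varies fastest, like an odometer read backwards.
    public static <T> List<List<T>> combinations(List<List<T>> choices) {
        List<List<T>> result = new ArrayList<>();
        for (List<T> options : choices) {
            if (options.isEmpty()) return result; // an empty fixture means nothing to run
        }
        int[] index = new int[choices.size()];
        while (true) {
            List<T> combo = new ArrayList<>(choices.size());
            for (int i = 0; i < index.length; i++) {
                combo.add(choices.get(i).get(index[i]));
            }
            result.add(combo);
            // Advance position 0, carrying into the next position on overflow.
            int pos = 0;
            while (pos < index.length) {
                index[pos]++;
                if (index[pos] < choices.get(pos).size()) break;
                index[pos] = 0;
                pos++;
            }
            if (pos == index.length) return result; // every position wrapped: done
        }
    }
}
```

A runner would then invoke the test once per combination, for example with Method.invoke(testInstance, combination.toArray()). The number of executions is the product of the fixture sizes, which is exactly why parallelizing may become necessary.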

I also plan to implement data generation annotations for standard Java types, enabling me to do the following.

@Test
public void demoFixtureArguments(@Boundary @Random(10) Integer one, @Random Date two, FixtureExample three) {
}

And taking this even further, enabling data generation annotations in Fixture classes.

OK, mission accomplished. Moving on to the next topic of this post: JUnit Rules.

JUnit Rules enable you to manipulate the test running process itself in various ways. Every rule needs to be activated by declaring a public field annotated with @Rule in the test class.

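import static org.junit.Assert.assertTrue;

import org.junit.Rule;
import org.junit.Test;

// IgnoreLeadingFailure is a custom rule, not part of JUnit
public class DummyTest {
    @Rule
    public IgnoreLeadingFailure ilf = new IgnoreLeadingFailure();

    @Test
    public void testTest() {
        assertTrue(false);
    }
}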

This feels a bit too intrusive for my taste. I would like to just annotate the test, without the burden of @Rule activation.

So I incorporated this into my JUnitRunner as well, allowing me to write this instead (no need to activate @Transactional).

@RunWith(JUnitRunner.class)
public class DummyTest {
    @Test
    @Transactional
    public void testTest() {
        assertTrue(false);
    }
}

The magic lies in the RuleHandler annotation, which is used as a meta-annotation on @Transactional to indicate which org.junit.rules.MethodRule to run.

I can also see the need for annotation extension hierarchies, using the same technique to create a chain of MethodRule invocations from a single base annotation.

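import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// RUNTIME retention lets the runner find these annotations via reflection
@Reliable
@RuleHandler(ExampleRule.class)
@Retention(RetentionPolicy.RUNTIME)
public @interface ExampleAnnotation {
}

@Traceable
@RuleHandler(ReliableRule.class)
@Retention(RetentionPolicy.RUNTIME)
public @interface Reliable {
}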

These techniques also integrate seamlessly. Sweet!

@Test
@Reliable
public void demoFixtureArguments(FixtureExample one, FixtureExample two, FixtureExample three) {
    System.out.println("demonstrateFixtureArguments(" + one + ", " + two + ", " + three + ")");
}
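The real RuleHandler code is not shown here, but the mechanism behind such an annotation hierarchy can be sketched with plain reflection. Everything below is a stand-in: Statement and Rule only imitate the shape of JUnit's Statement and MethodRule without depending on JUnit, and TraceRule/ReliableRule just log. The sketch walks a test method's annotations, recurses into their meta-annotations (guarding against cycles such as @Retention, which annotates itself), and wraps the test call in every rule it finds.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RuleHandlerSketch {

    // Stand-ins shaped like JUnit's Statement and MethodRule (not the real types).
    interface Statement { void evaluate() throws Throwable; }
    interface Rule { Statement apply(Statement base); }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.ANNOTATION_TYPE)
    @interface RuleHandler { Class<? extends Rule> value(); }

    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.METHOD, ElementType.ANNOTATION_TYPE})
    @RuleHandler(TraceRule.class)
    @interface Traceable {}

    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.METHOD, ElementType.ANNOTATION_TYPE})
    @Traceable
    @RuleHandler(ReliableRule.class)
    @interface Reliable {}

    static final List<String> LOG = new ArrayList<>();

    static final class TraceRule implements Rule {
        public Statement apply(Statement base) {
            return () -> { LOG.add("trace"); base.evaluate(); };
        }
    }

    static final class ReliableRule implements Rule {
        public Statement apply(Statement base) {
            return () -> { LOG.add("reliable"); base.evaluate(); };
        }
    }

    // Collect rules from annotations and, recursively, their meta-annotations.
    // A meta-annotation's rule is added before the rule of the annotation it decorates.
    static void collect(Annotation[] annotations, Set<Class<?>> seen, List<Class<? extends Rule>> out) {
        for (Annotation a : annotations) {
            Class<? extends Annotation> type = a.annotationType();
            if (!seen.add(type)) continue; // @Retention, @Target etc. annotate themselves
            collect(type.getAnnotations(), seen, out);
            RuleHandler handler = type.getAnnotation(RuleHandler.class);
            if (handler != null) out.add(handler.value());
        }
    }

    public static class DemoTest {
        @Reliable
        public void test() { LOG.add("test"); }
    }

    // Runs DemoTest.test wrapped in its rules; the last rule collected ends up outermost.
    public static List<String> run() {
        LOG.clear();
        try {
            Method method = DemoTest.class.getMethod("test");
            List<Class<? extends Rule>> rules = new ArrayList<>();
            collect(method.getAnnotations(), new HashSet<>(), rules);
            Object target = new DemoTest();
            Statement statement = () -> method.invoke(target);
            for (Class<? extends Rule> rule : rules) {
                statement = rule.getDeclaredConstructor().newInstance().apply(statement);
            }
            statement.evaluate();
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
        return new ArrayList<>(LOG);
    }
}
```

With @Reliable on the method, both ReliableRule and (through @Traceable) TraceRule run around the test, without any @Rule field. Note that the annotations need RUNTIME retention, otherwise reflection never sees them.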



One final remark…

I'm not sure I understand why JUnit limits each test class to being run with *one* JUnit Runner, declared verbosely with @RunWith on every test class. I have multiple frameworks that I need to activate simultaneously, such as Arquillian, Unitils and soon my own stuff… but I cannot, because of this constraint. This bothers me.

I'm thinking: wouldn't it be possible to discover JUnit Runners at runtime, activating a chain of combined runners and removing the need for @RunWith? Hmm…

I'm not trying to criticise JUnit in any way. I would be more than happy to help contribute improvements, and I also realize that I may be making hasty and incorrect assumptions about JUnit's design principles.

Thoughts?

Categories: coding, Java, testing Tags: