Regularity in Language Mappings
July 25th, 2008
The Slice language mappings are very natural, and they are very regular. By that I mean that if you know one language mapping, you know what to expect from any other.
Does that mean things are always identical? No. There are necessary differences imposed by the target language. For example, for a Slice interface Foo, in C++ you use FooPrx::checkedCast, and in Java and C# you have to use FooPrxHelper.checkedCast. There are also cases where we take advantage of a language-specific feature, such as C# delegates in the new AMI Silverlight mapping. These differences make a mapping more convenient to use, but less regular.
Our general rule is to keep the mappings as similar as possible, but not at the expense of convenience: where a language feature produces a more natural mapping, we use it.
Regularity aids learning and comprehension. As modern-day programmers, we are expected to be multilingual, so anything Ice can do to aid comprehension is much appreciated! Unfortunately, the Google Protocol Buffers (PB) mappings do not share this regularity.
The first thing you will notice when using PB is that the generated type name is different in each target language. This is not the case with any of the Slice language mappings, where the target language type name, by default, always matches the Slice type name.
// Slice
module Demo
{
class Hello
{
int id;
};
};
For every target language, the generated code for this example places its types in Demo (a namespace in C++, a package in Java, a module in Python). For example, in C++, the types Demo::Hello and Demo::HelloPrx are emitted. With PB, the situation is different.
// Person.proto
package tutorial;
option java_outer_classname = "PersonPB";
message Person
{
required int32 id = 1;
}
As with the Slice language mappings, the C++ type generated by protoc is tutorial::Person. However, for Java, the type is tutorial.PersonPB.Person, and for Python, it is Person_pb2.Person.
Unlike the Slice-to-Java mapping, where the translator creates a package containing a separate class for each type, the PB-to-Java mapping produces a single source file that contains all the definitions in a .proto file, as classes nested inside one outer class. I don’t think this was a very good choice. For the application programmer, it results in lengthy, obscure class names. In this example, it also forces the use of the java_outer_classname option. Why? By default, the outer class (and the source file) emitted by protoc takes its name from the .proto file, so the example above would produce a single file tutorial/Person.java. That file would not compile, however, because it would also contain the nested message class Person, and Java does not allow a nested class to have the same name as an enclosing class. This is why the PB developers had to add the java_outer_classname option: it makes protoc use a different name for the outer class and its file.
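To make the naming problem concrete, here is a hand-written Java sketch that imitates the shape of the code protoc emits for Person.proto with java_outer_classname set to "PersonPB". It is not actual protoc output: the real generated classes contain far more, and message construction is simplified here to a plain constructor. The class layout and the name clash are the only points it illustrates.

```java
// Hypothetical, hand-written approximation of the PB-to-Java layout:
// one outer class per .proto file, with each message nested inside it.
final class PersonPB {
    private PersonPB() {} // the outer class only serves as a container

    // The message is a nested class, so client code must name it
    // PersonPB.Person (further qualified by the Java package, e.g. tutorial).
    static final class Person {
        private final int id;

        Person(int id) { this.id = id; } // simplification: real PB uses a Builder

        int getId() { return id; }
    }

    // If the outer class were named Person (the default, derived from
    // Person.proto), the nested Person class would not compile: Java
    // forbids a nested class from sharing the name of an enclosing class.

    public static void main(String[] args) {
        PersonPB.Person p = new PersonPB.Person(1);
        System.out.println(p.getId()); // prints 1
    }
}
```

Add the Java package and every use site reads tutorial.PersonPB.Person, which is the lengthy name complained about above.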
The Slice-to-Python mapping, again, produces a package containing all of the necessary classes. The PB-to-Python mapping produces a single file: Person.proto, for example, emits Person_pb2.py. The package declared in the .proto file has no effect on the generated Python module name. Personally, I find this very unexpected, and not at all desirable. For example, if you rename the .proto file, at the very least every import statement in your code must change. Far from ideal!
With the Slice language mappings, the names of the members and operations of an interface, class, or struct are the same in every language. For the example above, each mapping emits a class Hello containing the data member id. If you know one Slice mapping, you know them all.
With the PB language mappings, surprisingly, the names generated for the same field differ in each language. In C++, the field id maps to the member functions Person::has_id, Person::id, Person::set_id, and so on. In Java, the accessors are named Person.hasId and Person.getId, and in Python the mapping uses the attribute Person.id and the method Person.HasField("id"). Each of these is different, and, worse, for no obvious reason. Once again, knowledge of one mapping does not carry over to any other.
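All three spellings expose the same underlying idea: the message records whether a field has been set, separately from the field's value. The following Java sketch shows that idea in isolation. The class PersonFields and its methods are hypothetical, modeled loosely on the Java spellings hasId and getId; this is not protoc output.

```java
// Hypothetical sketch of field presence tracking, the concept behind
// has_id() in C++, hasId() in Java, and HasField("id") in Python.
final class PersonFields {
    private int id;        // the field value (defaults to 0)
    private boolean idSet; // presence bit: has id been assigned?

    boolean hasId() { return idSet; }

    int getId() { return id; }

    void setId(int value) {
        id = value;
        idSet = true;
    }

    void clearId() {
        id = 0;
        idSet = false;
    }

    public static void main(String[] args) {
        PersonFields p = new PersonFields();
        System.out.println(p.hasId()); // false: never assigned
        p.setId(0);
        System.out.println(p.hasId()); // true: assigned, even though the value is 0
        System.out.println(p.getId()); // 0
    }
}
```

The presence bit is what lets an unset field be told apart from one explicitly set to 0, whatever each mapping chooses to call the accessor.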
When it comes to creating instances, the situation is, again, different in each language. In C++ and Python, you create an instance of the class and populate its members, but Java uses a completely different mechanism: the builder pattern. You obtain a Builder, set the field values on it, and then call build() to create an immutable message instance. For example:
tutorial.PersonPB.Person p = tutorial.PersonPB.Person.newBuilder()
    .setId(1)
    .build();
I’m not saying that builders are bad; they are not. Because of builders, it is impossible to create an uninitialized message instance, which is a good thing. However, why the different paradigms? What is good in Java would also be good in C++ and Python.
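The builder approach described above can be sketched in a few lines of plain Java. This is a hypothetical, hand-written illustration that borrows the names newBuilder, setId, and build from the example; it is not protoc output. The real generated code reports a missing required field with its own exception type, while this sketch simply uses IllegalStateException.

```java
// Hypothetical sketch of the builder pattern: a mutable Builder collects
// field values, and build() produces an immutable message.
final class Person {
    private final int id;

    private Person(Builder b) { this.id = b.id; } // only the Builder can construct

    int getId() { return id; }

    static Builder newBuilder() { return new Builder(); }

    static final class Builder {
        private int id;
        private boolean idSet; // tracks whether the required field was set

        private Builder() {}

        Builder setId(int value) {
            id = value;
            idSet = true;
            return this; // returning this allows chained calls
        }

        Person build() {
            // Refuse to create a message whose required field is missing,
            // so no uninitialized Person can ever exist.
            if (!idSet) {
                throw new IllegalStateException("required field id not set");
            }
            return new Person(this);
        }
    }

    public static void main(String[] args) {
        Person p = Person.newBuilder().setId(1).build();
        System.out.println(p.getId()); // prints 1

        try {
            Person.newBuilder().build();
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Nothing in this structure depends on Java; the same design could be offered in C++ or Python.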
As you can see, the Slice language mappings are regular, and as close to each other as possible. This wasn’t an accident. We are all very aware, as daily users of our own product, that this is important. It is a pity that the PB mappings don’t take the same approach.