September 4th, 2008 | Filed under News |
Jose and Mark have just written a second article on our chat demo. This installment shows you how to build GUI clients for the chat demo in C# and Java, as well as how to build web-based clients with PHP or Silverlight.
The articles don’t just explain how to build a chat application, but discuss design trade-offs that confront many Ice applications, and how to choose an appropriate design. I suggest you give this a read—chances are that you will find it useful in your day-to-day development even if you are not into writing chat applications.
July 31st, 2008 | Filed under Distributed Computing |
Before I delve further into the merits and demerits of RPC, SOAP, REST, and so on, I want to take a look back at the history of distributed computing. Not because I’m a connoisseur of history, but because that history has shaped much of the technology we see today and, without that history, today’s distributed computing options would likely be different.
First Steps
Back in the eighties and early nineties, we saw the first few steps toward a kind of distributed computing that did not require developers to create half a networking stack on top of raw sockets. Sun ONC, Apollo NCS, and DCE marked these early steps. These technologies were a big improvement over sockets, and certainly made the job of writing a distributed application a lot easier. However, they also had their share of drawbacks: they provided only a C API, and much of the underlying network semantics was still exposed (such as the distinction between TCP and UDP, or the need to deal explicitly with domain names and port numbers). In addition, these technologies were not widely available.
A Step Forward
During the nineties, the industry witnessed the rise of C++ (together with other object-oriented languages) and anointed OO programming the latest silver bullet. (It was impossible to open any trade magazine in those days without finding "OO" in the title of just about every article.) The interest in OO also brought about a shift in distributed computing. In particular, we saw increasing research in component models and, naturally, distributed component models. This led to a whole raft of new technologies, particularly on the Microsoft side, which created DDE, OLE, OLE Automation, COM, ActiveX, DCOM, COM+, and MTS. (Isn’t it wonderful how developers were bestowed with successive new technologies? The list was later extended with .NET Remoting, which in turn was superseded by WCF.)
Meanwhile, the rest of the world wasn't asleep either… DEC morphed DCE into ObjectBroker, IBM created SOM and, later, DSOM, Trinity College created the beginnings of Orbix, ILog invented ILog Broker, Post Modern released Black Widow (which later became Visibroker), HP developed ORB Plus, and Expersoft came up with CORBAplus (among others).
So, there was plenty to choose from on both sides of the fence. While Microsoft continued to release a new spin on its distribution technology every few months, the other vendors closed ranks and joined the OMG, which published CORBA. During that time (the mid- to late-nineties), the industry witnessed a fierce battle between DCOM and CORBA: the Microsoft world mostly stuck with DCOM (although many CORBA implementations also ran on Windows), while the rest of the world went ahead and used CORBA.
A New Stomping Ground
By the late nineties, it had become clear that Microsoft were losing their battle with CORBA: CORBA could parade a large number of high-profile success stories, while there was never even a single large-scale or mission-critical DCOM application. (Even on Windows, CORBA had begun to displace DCOM, due to its platform-agnostic nature—by then, CORBA actually ran on more Windows versions than DCOM.)
Concurrently with these developments (beginning around 1995), a number of things happened that, together, reshaped the entire computing industry:
- The world-wide web began its meteoric rise.
- HTML created the world’s first truly platform-independent user interface.
- Sun invented Java and created the world’s first popular platform-independent programming language.
- The common household discovered the Internet.
Taken together, these factors meant that, suddenly, there were mega-dollars to be earned with distributed computing whereas, only a few years before, computer networking was strictly an activity for nerds. (As late as 1993, Bill Gates had said "The Internet? We are not interested in it.") All of a sudden, distributed computing had become the key to commercial success—there was no way that Microsoft was going to cede the distributed computing market to CORBA.
The explosion of the web and the ensuing high-tech bubble meant that anything based around web technology was in. (The OO silver bullet of the early nineties had given way to the XML silver bullet of the late nineties.) Rather than fight a battle they could not win, Microsoft created a completely new battlefield: the new paradigm was to be distributed computing based around the web and XML, and it begat SOAP in 1999, followed by Web Services shortly thereafter. Because the shine had worn off both DCOM and CORBA, many of the distributed computing vendors flocked to WS as the saviour of their ailing businesses. (Microsoft discontinued DCOM shortly thereafter, and CORBA had dwindled into insignificance by 2003.)
A Step Backwards
The shift to SOAP and WS was a severely retrograde step for a number of technical reasons:
- Many years of software engineering experience had demonstrated that encapsulation and separation of interface and implementation were key strategies in controlling cost and complexity. SOAP threw all this hard-earned wisdom out the window: there were no objects, there was no encapsulation, there were no standardized APIs and language mappings, and the data, once again, reigned supreme.
- SOAP and WS provided an excuse for hordes of developers with no prior distributed computing experience to reinvent the wheel (something that this industry loves to do more than anything else). Not only did they reinvent the wheel, but they reinvented it badly, repeating many of the mistakes of the past, while adding a considerable share of new mistakes that no-one had ever thought of before.
- The bandwidth requirements of XML made it staggeringly wasteful; SOAP messages require twenty to more than one hundred times the bandwidth of the average binary protocol.
- XML is expensive to marshal and unmarshal; it requires hundreds of times the number of CPU cycles needed by a binary protocol.
- SOAP was so primitive and low-level that the marketing machinery had to be called into action. It duly bestowed upon us an entirely new service-oriented architecture ("SOA"), which was necessary to sell the botched technology to the industry.
In addition, proponents of SOAP made various claims that did not stand up to scrutiny, such as the importance of having a character-based protocol (as opposed to a binary one), the self-describing nature of XML, the inherent security gained by sending everything through port 80, and the loose coupling of service-oriented applications. (I will return to these topics in more detail in future posts.)
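To make the bandwidth point concrete, here is a minimal sketch that encodes the same hypothetical call, add(1, 2), once as two little-endian 32-bit integers and once as a hand-written SOAP envelope. Both encodings are assumptions made for illustration: the binary layout is not the wire format of Ice or CORBA, the envelope omits the HTTP headers a real SOAP call would also carry, and a single tiny message is not a benchmark.

```python
import struct

# Hypothetical operation: add(a: int32, b: int32).
# The binary layout is a plain illustration, not any real middleware's wire format.
def binary_request(a: int, b: int) -> bytes:
    # Two 32-bit little-endian integers: 8 bytes of payload.
    return struct.pack("<ii", a, b)

# A hand-written SOAP 1.1 envelope for the same call (HTTP headers not included).
def soap_request(a: int, b: int) -> bytes:
    envelope = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
        "<soap:Body>"
        '<add xmlns="urn:example:calculator">'
        f"<a>{a}</a><b>{b}</b>"
        "</add>"
        "</soap:Body>"
        "</soap:Envelope>"
    )
    return envelope.encode("utf-8")

if __name__ == "__main__":
    binary = binary_request(1, 2)
    soap = soap_request(1, 2)
    print(len(binary), len(soap), round(len(soap) / len(binary), 1))
```

For this particular message the XML version comes out at roughly 25 times the size of the binary one; other payloads will shift the ratio, which is why the figures above are given as a range.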
A Step Sideways
While the SOAP crowd was busy re-inventing a square RPC-wheel, another paradigm shift started to take hold: representational state transfer (REST). In contrast to RPC and SOAP/WS, REST is not a technology, but a set of architectural principles. In a nutshell, it argues that the success of the web is due to a number of constraints that guided its design. By making these constraints and designs explicit, we can arrive at an architectural style that benefits scalability, extensibility, and performance.
Like SOAP, REST was strongly inspired by the web, and its principles are closely linked to web technologies, such as HTTP, URIs, caching, and processing of documents by intermediaries. Like the web, REST focuses on "Internet-scale distributed hypermedia interaction", that is, it models distributed computing as an exchange of documents.
Where We Stand Today
Today, we can separate general-purpose distributed computing into three different camps.
- The RPC camp. This camp is populated by applications that use various forms of RPC, such as CORBA or Ice.
- The web camp. This camp is populated by applications that use HTTP as a substrate. It includes applications built around web browsers, HTML, XML, applets, servlets, SOAP, and web services.
- The REST camp. This camp is populated by applications that apply RESTful principles to a variety of technologies (even though a majority use the web for their implementation).
These three broad categories capture most of the action in commercial and enterprise distributed computing. (Of course, there are many other technologies in use, such as SCADA, RTCP, or home-grown technologies; but these are not a part of general-purpose, B2B and e-commerce distributed computing.)
Now, regardless of the various merits of these camps, it is important to realize why they exist:
- Historically, the RPC camp came first. As developers applied the lessons of the past, RPC improved with successive iterations of the technology and managed to accumulate a number of considerable successes.
- The web and REST camps came second. The success of browsers, Java, XML, and the exploding e-commerce market inspired REST, and provided a convenient wave for SOAP, WS, and SOA to ride.
Another important point here is that the web camp exists not necessarily because of any inherent technological advantage over the RPC camp. Historically, technology had little to do with the use of the web for distributed computing. Instead, the web camp exists because of political manoeuvring, fighting for market dominance, opportunism, and the desire to leverage an industry trend. If there is a technological advantage for the web camp, that advantage is accidental because technological excellence could not have been further from the participants’ minds at the time they created the schism.
Where We Are Going
Clearly, these distributed computing camps are here to stay for the foreseeable future. I simply cannot see a world where the majority of distributed computing would be built around RPC, or a world where most applications would be built around SOAP/WS, or a world where distributed applications would be exclusively RESTful.
To declare one camp "good" and another camp "bad" is naive and falls into the trap of searching for yet another silver bullet. As with all designs and technologies, there are trade-offs, and correctly matching these trade-offs to the requirements of a particular project is what separates "good" from "bad"—not a belief in the superiority of one camp over another.
With that in mind, we can delve deeper into these trade-offs, which I will do in future posts.
Cheers,
Michi.
July 22nd, 2008 | Filed under Distributed Computing |
Steve Vinoski has been busy trying to convince the world at large that RPC is “fundamentally flawed”. I think it is interesting to take a look at RPC and see what those fundamental flaws are (and whether there are flaws, for that matter). Doing this will definitely take more than one post, so don’t expect the answers all at once. I will deal with various aspects of the topic in a number of posts over the next few weeks, so please bear with me.
Before we can delve into the details, let’s take a look at a definition of RPC. In one of the comments to his blog post, Steve states that RFC 707 defines RPC. Having just read through that RFC again (twice), I cannot find a definition of RPC anywhere in that document. What I can find is an outline of a protocol that allows a client to ask a server to perform some work and to get results back from the server. But that’s pretty much it. There is no mention of specific APIs, there is no mention of interface definition languages, and there is no requirement for specific interaction models. (RFC 707 explicitly states that interactions need not be synchronous and that the run-time environment can provide a non-blocking interaction model.)
The protocol that is described by RFC 707 is remarkably simple: just two message types, request and reply. This is attractive but, as we have found out in the intervening time, inadequate. For example, for a connection-oriented transport, we also need a message that confirms the acceptance of a connection; without such a message, the protocol can violate at-most-once semantics, which are important for operations that are not idempotent. The importance of at-most-once semantics and idempotent operations is another thing that we have learned in the intervening time. (RFC 707 does not mention either.)
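The role of that connection-confirmation message can be sketched as a client-side retry rule. This is a minimal sketch under stated assumptions, not the retry logic of Ice or any other product: Failure, may_retry, invoke, and Account are hypothetical names, and the network failure is simulated. A client may transparently retry a failed call only if the operation is idempotent or the request provably never reached the server. Without a message confirming that the server accepted the connection, the client can never establish the second condition, so it has to treat every failure as one that may have happened after the server dispatched the request.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Failure:
    # True if the failure provably happened before the request could reach the
    # server (e.g. the server never confirmed the connection). False if the
    # server may already have executed the request.
    before_send: bool

def may_retry(idempotent: bool, failure: Failure) -> bool:
    """A retry preserves at-most-once semantics only if the server cannot have
    executed the request, or if executing it twice is harmless."""
    return idempotent or failure.before_send

def invoke(op: Callable[[], None], idempotent: bool, failure: Failure) -> str:
    """Simulate a call whose first attempt fails with the given failure."""
    if not failure.before_send:
        op()  # the server ran the request; only the reply was lost
    if may_retry(idempotent, failure):
        op()  # transparent retry by the client-side run time
        return "ok"
    return "failure reported to caller"

class Account:
    def __init__(self) -> None:
        self.balance = 0

    def deposit(self, amount: int) -> None:
        self.balance += amount  # not idempotent: repeating it changes the result

    def set_balance(self, amount: int) -> None:
        self.balance = amount  # idempotent: repeating it is harmless

if __name__ == "__main__":
    account = Account()
    outcome = invoke(lambda: account.deposit(10), idempotent=False,
                     failure=Failure(before_send=False))
    print(outcome, account.balance)  # the deposit ran once; no blind retry
```

Retrying the non-idempotent deposit after the request may have been dispatched would credit the account twice, so the sketch reports the failure instead; the idempotent set_balance, or any call that failed before it was sent, can be retried safely.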
How much time? RFC 707 was published in January 1976, more than thirty-two years ago. To put things in perspective, that was at a time in computing history when RPC (and networking, for that matter) were truly in their infancy. People were taking their first few hesitant steps toward distributed computing back then: TCP/IP had only just been invented and there was no such thing as distributed computing as we understand it today; what counted as distributed computing back then were protocols such as telnet and ftp. (DCE did not exist, and even UUCP had not appeared yet.)
While RFC 707 defines one of the earliest protocols for RPC, we today know this protocol to be inadequate. Moreover, a lot has happened in the intervening years; RPC today is a far cry from “RPC” back in 1976, and RFC 707 is of interest mainly as a historical document that has essentially no relevance to modern middleware. So, let me see whether we can find a more current definition.
Wikipedia has an entry for RPC. That entry also refers to RFC 707, but states that RFC 707 “describes” RPC (which is not the same as defining it). The entry then offers the following explanation:
An RPC is initiated by the client sending a request message to a known remote server in order to execute a specified procedure using supplied parameters. A response is returned to the client where the application continues along with its process. […] While the server is processing the call, the client is blocked.
“While the server is processing the call, the client is blocked.” Huh? Only a few lines earlier, the same article references RFC 707, which explicitly states that the interaction can be non-blocking. I guess we are running into the limitations of Wikipedia—there is only so much quality control that can be applied to the various articles.
So, let’s turn to whatis.com. It says:
Remote Procedure Call (RPC) is a protocol that one program can use to request a service from a program located in another computer in a network without having to understand network details. (A procedure call is also sometimes known as a function call or a subroutine call.) RPC uses the client/server model. The requesting program is a client and the service-providing program is the server. Like a regular or local procedure call, an RPC is a synchronous operation requiring the requesting program to be suspended until the results of the remote procedure are returned.
Hmm… The same error as with Wikipedia here: the entry claims that interactions are synchronous when, according to RFC 707 (and lots of past and current RPC implementations), they need not be. (One wonders whether one copied from the other.)
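That RPC need not block is easy to show. The sketch below assumes a hypothetical CalculatorStub whose network round trip is simulated with time.sleep; it offers the same remote operation in a blocking form (add) and a non-blocking form (add_async) that returns a concurrent.futures.Future. It illustrates the two interaction styles only and is not modeled on any particular product's API.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor

class CalculatorStub:
    """Hypothetical client-side stub. The round trip is simulated with a sleep;
    a real stub would marshal the request and send it over the network."""

    def __init__(self, latency_s: float = 0.05) -> None:
        self._latency_s = latency_s
        self._pool = ThreadPoolExecutor(max_workers=4)

    def _remote_add(self, a: int, b: int) -> int:
        time.sleep(self._latency_s)  # stand-in for request, dispatch, and reply
        return a + b

    def add(self, a: int, b: int) -> int:
        # Synchronous style: the caller blocks until the reply arrives.
        return self._remote_add(a, b)

    def add_async(self, a: int, b: int) -> Future:
        # Asynchronous style: returns at once; the result arrives later.
        return self._pool.submit(self._remote_add, a, b)

    def close(self) -> None:
        self._pool.shutdown(wait=True)

if __name__ == "__main__":
    stub = CalculatorStub()
    print(stub.add(1, 2))           # blocks for one round trip
    pending = stub.add_async(3, 4)  # does not block
    print("client keeps working while the call is in flight")
    print(pending.result())         # collect the reply when it is needed
    stub.close()
```

Both methods are recognizably the same remote procedure call; only the interaction model differs, which is exactly the latitude RFC 707 allows.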
I did many more searches and checked a number of books, and what they turned up was very similar to the preceding “definitions” for RPC. The common theme is:
- One program is active and issues a request for service.
- Another program passively listens for requests and acts on them.
- When the service is complete, the service-providing program can return results back to the originator of the request.
Now, this is nice as far as it goes, but it is hardly a definition of RPC. Instead, it is a description of basic principles. But that description is so loose that just about anything fits it, including Ice, DCE, CORBA, SOAP, and REST.
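To see just how little it takes to satisfy that common theme, consider a deliberately bare-bones sketch in which one thread issues requests and another passively answers them over a socket pair. The one-JSON-object-per-line message format and the names serve and Client are assumptions for illustration. There is no interface definition, no type checking, and no generated code, yet an active requester, a passive listener, and returned results are all present.

```python
import json
import socket
import threading

def serve(conn: socket.socket, procedures: dict) -> None:
    """Passive side: wait for requests, execute them, send back replies."""
    with conn, conn.makefile("rb") as reader:
        for line in reader:  # one JSON request per line; ends when the peer closes
            request = json.loads(line)
            # An unknown procedure name simply raises KeyError in this sketch.
            result = procedures[request["proc"]](*request["args"])
            reply = {"id": request["id"], "result": result}
            conn.sendall((json.dumps(reply) + "\n").encode("utf-8"))

class Client:
    """Active side: issues a request and waits for the matching reply."""

    def __init__(self, conn: socket.socket) -> None:
        self._conn = conn
        self._reader = conn.makefile("rb")
        self._next_id = 0

    def call(self, proc: str, *args):
        self._next_id += 1
        request = {"id": self._next_id, "proc": proc, "args": list(args)}
        self._conn.sendall((json.dumps(request) + "\n").encode("utf-8"))
        reply = json.loads(self._reader.readline())
        if reply["id"] != request["id"]:
            raise RuntimeError("reply does not match request")
        return reply["result"]

    def close(self) -> None:
        self._reader.close()
        self._conn.close()

if __name__ == "__main__":
    client_sock, server_sock = socket.socketpair()
    server = threading.Thread(target=serve,
                              args=(server_sock, {"add": lambda a, b: a + b}))
    server.start()
    client = Client(client_sock)
    print(client.call("add", 2, 3))
    client.close()
    server.join()
```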
Yet, we all seem to somehow know what RPC is and is not:
- RPC is about procedure (or, for object-oriented RPC, method) calls. In other words, at the API level, the interaction feels like a procedure or method call (whether synchronous or asynchronous).
- RPC platforms require a contract that defines the procedures or methods, including the types of data that are exchanged as parameters and return values.
It seems to me that this gets a little closer to what RPC is about: we have APIs that mostly hide the grunt work involved in communicating over a network, and we have a formal contract that establishes a type system.
Typically, a compiler that creates stubs and skeletons from an interface definition language generates the API and enforces the contract; however, other ways to define the contract (such as reflection) can be used. Either way, the generated code or reflection takes care of marshalling chores.
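As an illustration of what such generated code does, here is a hand-written sketch of a stub and skeleton for a hypothetical contract, interface Calculator { int add(int a, int b); }. The operation number, the little-endian int32 layout, and all class names are invented for this example; real IDL compilers produce considerably more (exceptions, request IDs, object identities). The transport is any callable that takes request bytes and returns reply bytes, which lets the example run in-process.

```python
import struct

# Invented encoding: a one-byte operation number followed by int32 parameters.
OP_ADD = 1

class CalculatorStub:
    """Client side: turns a method call into bytes, and the reply back into an int."""

    def __init__(self, transport) -> None:
        self._transport = transport  # callable: request bytes -> reply bytes

    def add(self, a: int, b: int) -> int:
        request = struct.pack("<Bii", OP_ADD, a, b)  # marshal operation and parameters
        reply = self._transport(request)
        (result,) = struct.unpack("<i", reply)       # unmarshal the return value
        return result

class CalculatorSkeleton:
    """Server side: unmarshals requests and dispatches them to a servant."""

    def __init__(self, servant) -> None:
        self._servant = servant

    def dispatch(self, request: bytes) -> bytes:
        op = request[0]
        if op != OP_ADD:
            raise ValueError(f"unknown operation {op}")
        a, b = struct.unpack_from("<ii", request, 1)
        return struct.pack("<i", self._servant.add(a, b))

class CalculatorServant:
    """Application code: implements the operation and never sees any bytes."""

    def add(self, a: int, b: int) -> int:
        return a + b

if __name__ == "__main__":
    # Collocated "transport": hand the request bytes straight to the skeleton.
    stub = CalculatorStub(CalculatorSkeleton(CalculatorServant()).dispatch)
    print(stub.add(2, 3))
```

Because struct.pack refuses values that do not fit the declared int32 type, the stub also rejects arguments that violate the contract before anything is sent, a crude version of the type checking a generated stub provides.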
Note that, for modern RPC, a static contract is only one way of doing things. For example, Ice and CORBA provide dynamic invocation and dispatch that allow you to use RPC without an interface definition and without generated code.
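Dynamic invocation can be sketched the same way. In the hypothetical example below, the client names the operation as a string at run time, and the server locates the method by reflection (getattr) instead of through a generated skeleton. JSON stands in for a real encoding, and invoke, DynamicDispatcher, and Greeter are illustrative names; this is not the API of Ice's dynamic invocation or of CORBA's DII/DSI.

```python
import json

def invoke(transport, operation: str, *args):
    """Client side: build a request at run time, without any generated stub."""
    request = json.dumps({"op": operation, "args": list(args)}).encode("utf-8")
    return json.loads(transport(request))["result"]

class DynamicDispatcher:
    """Server side: dispatch by operation name using reflection."""

    def __init__(self, servant) -> None:
        self._servant = servant

    def __call__(self, request: bytes) -> bytes:
        msg = json.loads(request)
        op = msg["op"]
        # Refuse private names so callers cannot reach internals such as __init__.
        method = None if op.startswith("_") else getattr(self._servant, op, None)
        if not callable(method):
            raise AttributeError(f"no such operation: {op}")
        return json.dumps({"result": method(*msg["args"])}).encode("utf-8")

class Greeter:
    def greet(self, name: str) -> str:
        return f"Hello, {name}!"

if __name__ == "__main__":
    print(invoke(DynamicDispatcher(Greeter()), "greet", "Michi"))
```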
Of course, the preceding description is still a far cry from a proper definition. But at least it gets us in the right direction: RPC systems take care of networking chores and—at least much of the time—use generated code that gets data onto the wire and back off the wire again. (If anyone is aware of a more detailed and/or rigorous definition of RPC, please let me know; I’d be very interested to see it.)
To what extent various technologies, such as Ice, SOAP, and REST fit (or do not fit) that description is something I will explore in future postings.
Cheers,
Michi.
July 21st, 2008 | Filed under Design and Usability |
I am one of the people who were lucky enough to get an Apple iPhone 3G on its release day. Prior to the iPhone, I used a Sony-Ericsson S700i, which I was never happy with: it had a clunky user interface, tortured data entry via the keypad, and essentially useless contact and calendar functions.
I depend a lot on a reliable calendar that allows me to track my appointments and reminds me when I have to be somewhere. Back in 1997, I bought a PalmPilot Professional, which had nice contact management, calendar, and expense tracking functions, all of which I use regularly. I have carried various Palm organizers ever since.
And here is the snag: I have also carried a mobile phone ever since, and it has often annoyed me that I have to carry two gizmos, both of which need their batteries charged, need to be synchronized, require software updates, and so on.
The iPhone 3G is the perfect stone for killing two birds: it has a nice user interface and is much easier to use than either the Sony-Ericsson or the Palm, and the iPhone comes with calendar and contact management functions. Being able to replace two devices with a single one was enough for me to queue up and buy the iPhone.
Getting the iPhone to work and copying all my existing data onto it was a breeze. It really is as simple as plugging it in, setting a few preferences in iTunes and, bingo, after the first sync, all my calendar data appeared on the iPhone. Awesome!
Now, it is almost two weeks later, and the shine has worn off. The iPhone is great, and most things work well and are well designed. (We can quibble here and there about things such as the inability to use voice-dialling or not being able to record video, but those things are not all that important to me.) However, the one thing I need most from my iPhone, namely its calendar, turns out to be the worst-designed application on the entire phone.
After syncing the iPhone for the first time, I looked at the calendar. A few things immediately jumped out:
- The month view shows the week starting on Sunday and ending on Saturday. This is not the way things are done in Australia, where the week starts on Monday.
- There is no way to search my calendar so, if I want to find out when my next dentist appointment will be, I have to scroll through all entries to find it.
- There is no way to enter monthly repeating appointments for a particular week. For example, it is impossible to say “the second Tuesday of every month”.
In contrast, Apple’s iCal application for OS X allows me to easily do all of these things.
Because I tend to get absorbed in whatever I’m doing (especially when I’m programming), it’s easy for me to miss an appointment, which is why I like to set an alert for most appointments. However, when I entered an event, I found that I had to set the alert manually. There is no way to set a default alert period. Again, iCal can do this: for example, I can ask it to add a default alarm 30 minutes before each event. There is no reason the calendar on the iPhone could not do the same.
That night, I went to bed early and was reading a book, when my iPhone suddenly went “bling bling”. What the…? When I looked I found that it was reminding me of an all-day event for the following day. At 11:30 at night, when I’m possibly asleep already? The culprit is not the iPhone, but iCal: iCal applies the default alert period to all events I enter, including all-day events that do not have a start time. Which means that, unless I explicitly disable the default alarm for each all-day event, I get the alert the night before, at 11:30pm.
The next day, I was out and about and happened to glance at my iPhone, only to realize that I was fifteen minutes late for an appointment, even though the appointment was in my calendar and had a 30-minute alert. And the iPhone did indeed sound that alert. However, what I got was one measly “bling bling” that lasted for all of a second. I missed the alert because I was in a noisy environment.
My first thought was “I can install a different alert sound that lasts longer” but, digging through the calendar preferences, I quickly found out that I can’t. The alert sound is not customizable (even though ringtones are), so I’m stuck with that short “bling bling”.
I also looked for a way to set repeating alarms. For example, on my Palm, I can say “alert me every five minutes for up to 40 minutes” and my Palm will ring an alarm that lasts several seconds every five minutes. No such feature on the iPhone’s calendar: all I can do is set a second alert, so I can have an alert, say, fifteen minutes before an event, and another one five minutes before. But that is all I can do: I have to enter the alerts manually for every event, and both alerts are that single useless “bling bling” that I can hear only if I’m in a quiet environment.
After these disappointments, I experimented some more and found other unpleasant things:
- If I use my iPhone to listen to music at low volume level, calendar alerts are played at that same low volume level.
- If I use my iPhone with headphones and put it down for a while, calendar alerts play through the headphones where I can’t hear them.
Now, how is that for brilliance? There is no option to always play alerts through the speaker, whether the headphones are plugged in or not! If I don’t want to miss any appointments, I have to religiously unplug the headphones and turn the volume back up all the way every time I put down the phone. But even doing that doesn’t help much because that single “bling bling” is easily missed, for example, if I leave the room for a few minutes.
It is obvious that the designers at Apple have never used their own phone, at least not to remind them of appointments. Presumably, either a lot of people at Apple are missing their appointments, or they do what smart people do and carry a Palm instead…
A little googling reveals that these problems have been present since the original iPhone, and that many people have complained about them. Yet, Apple chose to do nothing about these very real problems for more than a year.
Not one to give up easily, I decided “well, what Apple won’t do, I can do easily enough.” Writing a calendar application isn’t all that hard, so I decided to write my own. I downloaded the iPhone SDK and started reading the documentation. It took about two minutes for me to find the following in Apple’s Audio and Video Coding How-To:
- How do I control playback level?
- On iPhone, the user controls playback level using the hardware volume control.
- How do I access the hardware volume controller?
- Global system volume, including your application’s volume, is handled by iPhone OS and is not accessible by applications.
- How do I control where sounds play (built-in speaker, dock connector, headphones)?
- iPhone OS sends audio to the appropriate device, depending on user preferences.
Well, so much for my ambitions to write my own calendar with decent alerts. It’s not possible to do what I want to do (unless I’m Apple). (I know that this is not a hardware limitation because when the phone rings or an SMS arrives, the sound indeed plays over the speakers at a preset volume level, even when I leave the headphones plugged in.)
How is it possible for a company that prides itself on ergonomics and ease of use to fall this far short of the mark? This is design at its worst: it adds alerts, supposedly to help people keep their appointments, but does not bother to check whether the feature actually serves its intended purpose. To top it off, customer complaints about this very real problem are ignored.
Sadly, I shall continue to carry my Palm, and I shall continue to live in the hope that, one day, someone will make a single device that can make both phone calls and remind me of my appointments. (I’m not holding my breath though.)
Now, what does all of this have to do with Ice? Well, on the surface, nothing. I wrote this entry mainly because I passionately care about good design and usability, and because I believe that a company that does not listen to its customers is losing the plot, and losing it fast. (And venting my spleen made me feel better…)
But, come to think of it, this story actually has quite a lot to do with Ice: ZeroC does listen to its developers and the suggestions they make. In fact, many features of Ice exist only because our developers pointed out that what we did wasn’t good enough. On top of that, we eat our own dog food: we use our own software and, believe me, when one of us finds something that is awkward to use or does not work as it should, we are not shy in pointing out to each other that it sucks and needs fixing.
There are two overriding rules when it comes to design:
Thou shalt use what you design yourself.
The proof of the pudding can only be found where the rubber meets the road, that is, in actual use—not with a test bench setup.
Thou shalt listen to your customers’ complaints.
Just because I think that my design is fine, that does not mean that other people think so too—chances are that they will use my design in ways I have never thought of.
Here at ZeroC, we keep both rules firmly in our minds.
Cheers,
Michi.
July 17th, 2008 | Filed under Distributed Computing |
A Note on Distributed Computing is among the most widely quoted papers on distributed computing. While I agree with much of what Jim Waldo et al. wrote, there is quite a bit I find myself disagreeing with, so here is “Another Note on Distributed Computing”, to iron out a few misconceptions.
What is right
Here is a quote from the paper:
Programming a distributed application will require the use of different techniques than those used for non-distributed applications. Programming a distributed application will require thinking about the problem in a different way than before it was thought about when the solution was a non-distributed application.
I could not agree more: I’ve been preaching for years that, if you go and design the APIs for a distributed application the same way as for a non-distributed one, you are likely to fall flat on your face. Waldo et al. cite a number of reasons for this, among them:
- Latency issues cannot be ignored.
- Yes, that is absolutely correct. Little surprise when one considers that a remote invocation is around four orders of magnitude (that’s 10,000 times) slower than a local invocation.
- It is impossible to provide uniform memory access for both local and remote objects.
- Correct. Even if we were to provide a programming model that allows completely transparent access to local and remote memory at the programming language level, the different error semantics of local and distributed access would create non-uniform semantics.
- Distributed invocations are subject to partial failure.
- Correct, they are. If a server goes down unexpectedly, and the invocations executing in the server at the time of the failure are not completely stateless, the system as a whole will be in an indeterminate state, which makes it harder to recover from a failure on re-start.
- Concurrency adds additional failure modes due to indeterminism.
- Correct. For example, depending on how operations are invoked by the client and how they are dispatched in the server, it is possible for sequential invocations made by a single client to be processed out of order in the server.
- Distributed invocations provide a fundamentally different quality of service.
- Correct. It is harder to create robust distributed applications than non-distributed ones. That should not come as a surprise, seeing that there are many more ways for the former to fail.
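A back-of-the-envelope calculation shows why latency alone forces a different API design. The figures are assumptions chosen to match the four-orders-of-magnitude ratio above (about 10 ns for a local call, 100 µs for a remote one); actual numbers vary widely with hardware, network, and middleware. The sketch compares reading 1,000 fields of a record through individual getters with fetching the whole record in a single call.

```python
# Assumed figures: a local call of about 10 ns and a remote call
# four orders of magnitude slower, i.e. about 100 µs.
LOCAL_CALL_S = 10e-9
REMOTE_CALL_S = LOCAL_CALL_S * 10_000

def elapsed(calls: int, per_call_s: float) -> float:
    """Total time spent in call overhead, ignoring the work each call does."""
    return calls * per_call_s

FIELDS = 1_000

chatty_local = elapsed(FIELDS, LOCAL_CALL_S)    # one getter per field, in-process
chatty_remote = elapsed(FIELDS, REMOTE_CALL_S)  # the same interface, made remote
bulk_remote = elapsed(1, REMOTE_CALL_S)         # one call returning the whole record

if __name__ == "__main__":
    print(f"chatty, local : {chatty_local * 1e6:.0f} µs")
    print(f"chatty, remote: {chatty_remote * 1e3:.0f} ms")
    print(f"bulk, remote  : {bulk_remote * 1e6:.0f} µs (larger reply ignored)")
```

Under these assumed figures, the fine-grained interface costs about 10 µs locally but about 100 ms remotely, while the coarse-grained call costs about 100 µs: an interface that is perfectly reasonable in-process becomes a bottleneck once it crosses the network.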
What is not so right
So far, I have not actually disagreed with anything, so it’s time to look a bit deeper…
Waldo et al. write that early distributed systems, such as CORBA, Arjuna, Emerald, and Clouds strived to provide a seamless view of distributed objects, such that “there is no essential distinction between objects that share an address space and objects that are on two machines with different architectures located on different continents. In such systems, an object, whether local or remote, is defined in terms of a set of interfaces declared in an interface definition language.”
The authors do not make it clear what they mean by “share an address space”, and do not further explain what they mean by “local” and “remote”. To talk meaningfully about this vision of unified objects, we need to be clear about what kinds of objects there actually are:
- Remotable objects. These are objects that can (but need not) reside in a different address space. If they do reside in a different address space, they can be reached only via inter-process communication, such as by sharing memory or sending messages over the backplane (for same-machine communication) or over a network (for communication with objects on other machines). If a remotable object is in the same address space, it offers the same interface as if it were remote. It just so happens that it can be reached via a more efficient communication mechanism.
- Local, language-native objects. These are the objects that come built into the programming language, such as C++ or Java objects. These objects have nothing to do with distribution.
For remotable objects, the implementation of operations is hidden behind their interface and, as far as the caller of an operation is concerned, the same API is used to invoke the operation, regardless of the actual location of the object (local or remote). However, that API is not necessarily the same as the API for a language-native object.
The authors go on to say that “The vision is that developers write their applications so that the objects within the application are joined using the same programmatic glue as objects between applications.” This suggests that the authors, when they talk about local objects, actually mean language-native objects. However, they also say that local objects have an interface declared in an interface definition language, which suggests collocated remotable objects.
Now, while it is true that systems such as CORBA and Ice indeed strive to make distributed computing as frictionless as possible and to make a remote operation as easy to call as a local or language-native one, they do not try to paper over the difference between language-native objects and remotable objects. It just so happens that, if an object can be called remotely, it can be called the same way whether the object happens to be collocated in the same address space or not.
This does not mean that a language-native object can be called the same way as a remotable one, or vice versa. In systems such as CORBA and Ice, remotable objects have a type that differs from the type of any language-native object, and pointers (or references) to remotable objects cannot be used interchangeably with language-native ones. Specifically, invocations on remotable objects are made via proxy types (such as Ice’s Prx types), and these proxy types are not type compatible with a language-native pointer or reference.
Similarly, CORBA never suggested that all objects within an application should have an IDL interface. In fact, CORBA makes a very clear distinction between objects that have an IDL interface (and, therefore, can be made remotely accessible), and objects that do not (and, therefore, are part of the implementation, not interface, of an application).
In fact, neither CORBA nor Ice ever attempted to provide a unified vision of objects. Instead, they make it easy to call and implement remotable objects regardless of whether client and server are collocated or not. This is a far cry from saying that local and remote objects are the same, or that they can be treated as if they were the same.
The authors then assert:
Writing a distributed application in this model proceeds in three phases. The first phase is to write the application without worrying about where objects are located and how their communication is implemented.
What? Where on earth do they get this from? I cannot recall a single instance where anyone with even the least shred of credibility claimed such a thing, even in the dim and distant days of DCOM and CORBA. There is a big difference between making it easy to call a remote operation, and claiming that, because remote operations are easy to call, we can ignore object location when we design an application. If this is how people start out writing their applications, they are guaranteed to fail, and that fact has been well known and well documented for at least 15 years.
The second phase is to tune performance by “concretizing” object locations and communication methods.
No, most definitely not. If I design the interfaces to my application as they say in phase 1, it is highly unlikely that any amount of performance tuning will save the day. True, performance tuning is necessary for distributed applications, just as it is for local ones. But the preceding two phases are tantamount to saying “Write your application any which way you like, with complete disregard of distribution, and you can fix things in phase 2.”
To suggest that such an approach could actually work is disingenuous, to say the least and, again, I am not aware of anyone with any reputation whatsoever having made such a claim, not now, and not 15 years ago.
The final phase is to test with “real bullets” (e.g., networks being partitioned, machines going down). Interfaces between carefully selected objects can be beefed up as necessary to deal with these sorts of partial failures introduced by distribution by adding replication, transactions, or whatever else is needed.
This, in all seriousness, suggests that I can succeed in dealing with partial failures after I have designed the application with complete disregard of distribution and carefully tuned its performance, and only then start worrying about partial failure semantics and their remedies. Clearly, this is utterly ridiculous: no-one in his right mind would do this.
Now, don’t get me wrong—I don’t for one moment believe that Jim Waldo and his co-authors actually believe these things. In fact, the second half of their paper makes it abundantly clear that they do not.
But why do they talk for the first nine of fourteen pages about something that no-one in his right mind ever believed in the first place, either now, or a long time ago? And why do they say that CORBA pretended that a distributed application could be written like a non-distributed one when, to the best of my knowledge, no-one even half-way competent ever made such a claim? To me, the answer is that they want to set the stage for the conclusion of the paper. In other words, the first part of the paper softens the ground for the second part.
Waldo et al. go on to cite NFS as an example of the consequences of ignoring the distinction between local and distributed computing at the interface level. They point out that, in a sense, NFS was doomed because it either provides non-transparent semantics to applications with soft mounts (which causes applications to fail in unexpected ways), or provides transparent semantics with hard mounts (which causes applications to hang in unexpected ways). Neither alternative is palatable because each leads to failures in the distributed case that simply do not happen in the local case.
They also correctly point out that the problem can be traced to the interface level: because NFS retained the original Unix system calls for file I/O, the catch-22 of NFS is inevitable. But, so what? All this shows is that it is a stupid idea to build a distributed application as if it were a non-distributed one.
The authors go on to say that
A better approach is to accept that there are irreconcilable differences between local and distributed computing, and to be conscious of those differences at all stages of the design and implementation of distributed applications.
Yes! That is exactly how we should build distributed applications. We cannot forget—ever—when we are dealing with distribution and when we are not. That is true regardless of the technology we use for distribution, regardless of the specific APIs, and regardless of the underlying protocol. The differences are due to distribution itself, not due to any artifact of design or implementation.
What is wrong
Now we get to the part of the paper where Waldo et al. jump to seriously wrong conclusions. Let me quote a few key passages.
Rather than trying to merge local and remote objects, engineers need to be constantly reminded of the differences between the two, and know when it is appropriate to use each kind of object.
This is a trivial truism, and hardly worth mentioning. Of course engineers need to know when it is appropriate to use a remote object. In fact, engineers need to know not only when it is appropriate to use a remote object, they also need to know when it is appropriate to use any object. That is, engineers must know not only about the side effects of remote invocations, they must know about the side effects of all invocations, whether they are remote or not.
Whenever I call any function or method, I must be aware of the potential consequences of doing so:
- I must know the performance characteristics of the function—O(log n), O(n), O(n²), or whatever.
- I must know whether the function performs disk I/O, reads user input, or may attempt to acquire a lock.
- I must know what state the function may leave the system in if something goes wrong. Does the function provide the strong or weak exception guarantee? What is the state of its in- and inout-parameters when something goes wrong?
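To make the last point concrete, here is a minimal Java sketch of a purely local class whose two methods differ only in the exception guarantee they give. The `AddressBook` class and its methods are hypothetical, made up for illustration. The strong version builds the new state on a copy and commits only once every step has succeeded; the weak version leaves the object valid but partially updated when something goes wrong.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical local class, used only to illustrate exception guarantees.
class AddressBook {
    private List<String> entries = new ArrayList<>();

    // Strong guarantee: either all names are added, or none are.
    // The work happens on a copy; the field is replaced only after every step succeeded.
    void addAll(List<String> names) {
        List<String> copy = new ArrayList<>(entries);
        for (String n : names) {
            if (n == null || n.isEmpty()) {
                // Failing here leaves 'entries' untouched.
                throw new IllegalArgumentException("invalid name");
            }
            copy.add(n);
        }
        entries = copy; // commit
    }

    // Weak (basic) guarantee: the object stays usable, but a failure
    // part-way through leaves the names before the bad one already added.
    void addAllWeak(List<String> names) {
        for (String n : names) {
            if (n == null || n.isEmpty()) {
                throw new IllegalArgumentException("invalid name");
            }
            entries.add(n);
        }
    }

    int size() {
        return entries.size();
    }
}
```

Nothing in the two signatures tells the caller which guarantee it gets; the caller has to know, and that is true whether or not a network is involved.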
In fact, just about everything I need to think about for a remote invocation, I also need to think about for a local invocation:
- Even if everything works perfectly, many local function invocations take a lot longer than many remote invocations. And many local function invocations take just as long as a remote invocation that processes the same data. (Just think of sorting a large set of values: the performance difference between the local and the remote case is negligible.)
- A local invocation can block or take a long time just as much as a remote invocation can. For example, any invocation that does disk I/O can block (potentially indefinitely, as Waldo et al. point out themselves). Similarly, any operation that gets input from a user or attempts to commit a transaction can block for extended amounts of time. And, of course, any operation that attempts to acquire locks can block and, depending on the exact operation, can block for an extended period.
- There are many local APIs that provide neither the strong nor the weak exception guarantee. For example, most APIs that perform I/O leave the system in an indeterminate state when something goes wrong during a write (unless we use transactions). Similarly, almost all libraries I have ever seen fail in weird ways in the face of memory starvation. If I am lucky, my process will crash and I’ll at least know that something went badly wrong. But, quite often, the programmer who wrote the code wrote it with a mind-set of “Memory never runs out.” What happens when memory does run out is anyone’s guess—it is not unusual for a program to survive temporary memory starvation, but to leave partially updated data structures behind that may (or may not) cause the program to fail later.
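The blocking problem in particular has the same shape locally as it does remotely. The following Java sketch (the `LocalCounter` class and its methods are hypothetical; the locking calls are the standard `java.util.concurrent` ones) contains no networking at all, yet a caller that must not hang indefinitely has to bound its wait and then decide what to do when no answer arrives in time, just as it would with a remote invocation that times out.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, purely local class: no networking anywhere.
class LocalCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private int value;

    // Simulates a long local operation (for example, disk I/O) done under the lock.
    void slowOperation(long millis) throws InterruptedException {
        lock.lock();
        try {
            Thread.sleep(millis);
        } finally {
            lock.unlock();
        }
    }

    // lock.lock() could wait indefinitely; bounding the wait means the caller
    // must handle "no answer in time", as with a remote invocation timeout.
    boolean tryIncrement(long timeoutMillis) {
        try {
            if (!lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false; // timed out; this call did not change value
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
        try {
            value++;
            return true;
        } finally {
            lock.unlock();
        }
    }

    int get() {
        lock.lock();
        try {
            return value;
        } finally {
            lock.unlock();
        }
    }
}
```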
The point is that Waldo and his colleagues discuss things such as performance and partial failure in great detail when, in fact, that discussion is largely orthogonal to distributed computing. That is because the same issues arise in the local case as well. Not as frequently maybe, but they do arise and, when they do, all the same issues come up as for a distributed system.
Note that even what looks like a non-distributed system may turn out to be distributed. In fact, any program that reads from a (local) disk and writes data back to that disk is distributed. Not distributed in space, but distributed in time: if a previous incarnation of the program crashed while it was writing to disk, the next incarnation of the program has to make sense of the mess that its predecessor left behind. This is little different from recovering a distributed system after a crash: either way, one side has to make sense of the mess that was left by “the other side”.
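One common way to limit the mess a crashed predecessor can leave behind is to never overwrite a file in place: write the new contents to a temporary file in the same directory, then rename it over the old one. The sketch below assumes a hypothetical `AtomicSave` helper and uses only standard `java.nio.file` calls; it is one possible technique, not the only one.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: a later incarnation of the program sees either the
// old file contents or the new ones, never a half-written file.
final class AtomicSave {
    static void save(Path target, String data) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        // Write to a temporary file in the same directory (same file system) first.
        Path tmp = Files.createTempFile(dir, ".save", ".tmp");
        try {
            Files.write(tmp, data.getBytes(StandardCharsets.UTF_8));
            // ATOMIC_MOVE switches from old to new contents in one step. The Javadoc
            // leaves replacing an existing target implementation-specific; on typical
            // POSIX systems the underlying rename replaces it.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            Files.deleteIfExists(tmp); // no-op if the move succeeded
        }
    }
}
```

The rename only makes the switch atomic; it does not by itself make the new contents durable across a power failure, which typically also requires forcing the data to disk (for example with `FileChannel.force`) before the move.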
A compiler for the interface definition language used to specify classes of objects will need to alter its output based on whether the class definition being compiled is for a class to be used locally or a class being used remotely. For interfaces meant for distributed objects, the code produced might be very much like that generated by RPC stub compilers today. Code for a local interface, however, could be much simpler, probably requiring little more than a class definition in the target language.
In other words, the APIs for local and remote objects should be different and local APIs “could be much simpler, probably requiring little more than a class definition in the target language”.
This statement is factually incorrect. First, with a modern platform such as Ice, if a remotable object is collocated, call dispatch to it is essentially as efficient as a language-native call. (If you know that an object will be called only locally, you can tell the compiler to get rid of unnecessary marshaling and dispatch code.) Second, with Ice, native APIs cannot be much simpler than remote ones, because remote invocations are already as simple as native ones, and implementing an interface already requires little more than a class definition in the target language. (In defense of the authors, at the time they wrote their paper, things were not as elegant as they are with Ice today.)
But here is where Waldo et al. really go off the deep end:
While writing code, engineers will have to know whether they are sending messages to local or remote objects, and access those objects differently. While this might seem to add to the programming difficulty, it will in fact aid the programmer by providing a framework under which he or she can learn what to expect from the different kinds of calls.
Translation: if we give fundamentally different APIs to local and remote objects, that will help programmers write better distributed applications.
I am stunned how the authors can possibly arrive at this conclusion, especially in light of what they so lucidly explain in the first part of their paper. The premises in no way support the conclusion; there simply is no logical link between them. The whole argument reads like:
All Greeks have beards.
Socrates was a Greek.
Therefore, income tax increases will stimulate the economy.
In fact, the authors themselves explain that much of the difficulty of writing distributed systems stems from problems that have nothing to do with any specific API. And yet, somehow, an API for remote calls that differs from the API for local calls is going to “aid the programmer” and solve all our distributed computing problems? Hardly.
What gets me is how patronizing (if not insulting) this conclusion is to programmers. Do Waldo et al. really believe that programmers who write distributed systems are so naive that they need a different syntax that “constantly reminds” them when they are making a remote call? That is giving distributed programmers a lot less credit than they deserve. (Not to mention that, as we have seen, programmers must be aware of the consequences of making any call, whether local or remote, anyway.)
But there is something else that Waldo et al. apparently did not consider. Let us suppose for a moment that I have what they ask for, and that any remote invocation has to be enclosed in a remote_call function or macro. (Let’s not quibble about the exact syntax—the point is that there is some syntactic reminder that a call is remote.) Now I write something like this:
```java
public class Person
{
    public void
    updateAddress(Address a)
    {
        // remote_call is the hypothetical syntactic marker for a remote invocation
        _person.remote_call("updateAddress", a);
    }

    private RemotePerson _person;
    // ...
}
```
The caller of this person object can now write:
```java
Person p = new Person();
Address a = new Address();
// ...
p.updateAddress(a); // no marker here: this looks like any local call
```
As far as I can see, the authors’ suggestion dies right there: as soon as I make any remote invocation inside another function or method, that function or method itself now must be called like a remote function, otherwise the syntactic marker that is supposed to “remind the programmer” is lost. In other words, the “remoteness” of invocations is transitive and very quickly permeates a program at almost all levels. But in turn, that greatly diminishes the already dubious value of a syntactic marker. Instead, it creates constant overexposure to an inconvenient syntax without any benefit.
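Java RMI happens to provide a marker of exactly this kind: every method of a remote interface (one that extends `java.rmi.Remote`) must declare the checked exception `java.rmi.RemoteException`. The sketch below mirrors the `Person` example; `RemotePerson`, `FakeRemotePerson`, and the two wrapper methods are hypothetical, and a local stand-in replaces a real RMI stub so that it runs without a registry. It shows the dilemma: the wrapper either declares the exception too, spreading the marker to every caller up the chain, or catches it, at which point its callers no longer see any marker.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// An RMI-style remote interface: the checked RemoteException is the "marker".
interface RemotePerson extends Remote {
    void updateAddress(String address) throws RemoteException;
}

// Local stand-in for an RMI stub, so the sketch runs without a registry.
class FakeRemotePerson implements RemotePerson {
    String address;

    public void updateAddress(String address) throws RemoteException {
        this.address = address;
    }
}

class Person {
    private final RemotePerson person;

    Person(RemotePerson person) {
        this.person = person;
    }

    // Option 1: propagate the marker. Every caller of this method, and every
    // caller of those callers, must now catch or declare RemoteException.
    void updateAddress(String address) throws RemoteException {
        person.updateAddress(address);
    }

    // Option 2: hide the marker. Callers see an ordinary local method,
    // and the "reminder" that a remote call happens is gone.
    boolean tryUpdateAddress(String address) {
        try {
            person.updateAddress(address);
            return true;
        } catch (RemoteException e) {
            return false;
        }
    }
}
```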
Conclusion
In the introduction to their paper, the authors say that a “unified view of objects is mistaken”, and then proceed to arrive at the recommendation that “engineers need to be constantly reminded of the differences” between local and distributed computing. I do believe that it is indeed good to remind engineers of the difference. And platforms such as Ice do exactly that, but in a way that does not get in the way of programming. For example, if I want to pass a proxy to a remotable person object to a function, I have to declare the function as follows:
```java
void doSomethingWithPerson(PersonPrx person);
```
Because the remotable version of a person has a type PersonPrx, and that type differs from and is not compatible with any language-native type Person, the act of passing a remotable object is made explicit, and there can be no doubt as to what kind of object we are dealing with. That is all the reminder the programmer needs.
As far as the unified view is concerned, it is not mistaken, at least not for remotable objects. Whether a remotable object is collocated or not should not matter at the point of call, and should not matter in the implementation of the object. Keeping the two the same does not provide a unified view, but location transparency. And location transparency is important. For example, moving out-of-process objects in-process is possible only with this transparency. (Anyone who has ever turned a stand-alone Ice server into an Icebox service will know how easy this is, and that it intrinsically relies on location transparency.) Another advantage of location transparency is that programmers do not constantly have to deal with different syntax and can put their attention where it is needed, namely on the application semantics.
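The following Java sketch illustrates location transparency in miniature. It is not Ice’s API or generated code (real Ice proxies are produced by the Slice compiler and do considerably more); `PersonPrx`, `PersonServant`, `CollocatedPersonPrx`, `MarshalingPersonPrx`, and the byte-array “marshaling” are assumptions made for illustration. The caller is written once against the proxy type, and the same call site works whether the proxy dispatches directly to a collocated servant or marshals the request as a remote proxy would.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical proxy type: a distinct type, not interchangeable with native classes.
interface PersonPrx {
    String getName();
}

// The servant implements the object and does not care where its callers are.
class PersonServant {
    private final String name;

    PersonServant(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

// Collocated proxy: dispatches straight to the servant in the same address space.
class CollocatedPersonPrx implements PersonPrx {
    private final PersonServant servant;

    CollocatedPersonPrx(PersonServant servant) {
        this.servant = servant;
    }

    public String getName() {
        return servant.getName();
    }
}

// Stand-in for a remote proxy: marshals request and reply to bytes, as
// middleware would before sending them over a connection (no network here).
class MarshalingPersonPrx implements PersonPrx {
    private final PersonServant servant;

    MarshalingPersonPrx(PersonServant servant) {
        this.servant = servant;
    }

    public String getName() {
        byte[] request = "getName".getBytes(StandardCharsets.UTF_8);
        byte[] reply = dispatch(request);
        return new String(reply, StandardCharsets.UTF_8);
    }

    // "Server side": unmarshal the operation name and invoke the servant.
    private byte[] dispatch(byte[] request) {
        String op = new String(request, StandardCharsets.UTF_8);
        if (!op.equals("getName")) {
            throw new IllegalArgumentException("unknown operation: " + op);
        }
        return servant.getName().getBytes(StandardCharsets.UTF_8);
    }
}

class Client {
    // The call site is identical whichever kind of proxy it receives.
    static String doSomethingWithPerson(PersonPrx person) {
        return "Person: " + person.getName();
    }
}
```

Because `doSomethingWithPerson` accepts only a `PersonPrx`, the parameter type still tells the reader that a remotable object is involved, while the choice between collocated and remote dispatch stays out of the caller’s code.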
A unified view is mistaken if it attempts to paper over the difference between language-native and remotable objects, or tries to pretend that programmers can treat remote objects the same way as local ones (whether remotable or not). But neither CORBA nor Ice ever tried to do this, and neither system is unified in that sense.
Distributed computing is hard enough as is, and Ice does its best to not make it harder still. But, ultimately, API style has little to do with the real reasons why distributed computing is hard. What we need to accept, first and foremost, is that, regardless of APIs, technologies, whether interactions are synchronous or asynchronous, and whether we use objects or “services” (whatever those might be), distributed computing is hard because it is distributed computing.
As far as A Note on Distributed Computing is concerned, it argues from false premises and arrives at conclusions that are not supported by these premises. In fact, the paper is largely irrelevant to modern middleware such as Ice.
Now, does it matter whether I use CORBA, or Ice, or REST, or something else? You bet it does! But that I will make the topic of other posts…
Cheers,
Michi.
July 14th, 2008 | Filed under News |
This post marks the end of one era, and the beginning of another. What is ending is our newsletter Connections. After some navel gazing, we decided that the newsletter format was too inflexible as an effective communication tool. Besides creating the content for the newsletter, quite a bit of work goes into the typesetting for each issue that, apart from making for pretty PDF pages, does not really contribute anything. Moreover, because we publish Connections every two months, it does not provide us with a way to deal with “little” topics that are too small for an FAQ or an article. And, occasionally, we would like to discuss a news item or a topic that has popped up on someone else’s blog but, by the time we publish the next issue of the newsletter, the topic is yesterday’s news…
What we really need is something that gives us more flexibility and allows us to throw in short and sharp snippets about middleware without having to go to the trouble of publishing a newsletter, and blogs are an ideal way to do that. So, here we go—you are reading the first of our blog entries, which marks the start of the new era.
In this new era, are you going to have to make do without all the awesome information we used to publish in Connections? (Big pat on our collective shoulders here…) The answer is “Of course not!”
So, everything you have come to love (yes—I know I’m in a presumptuous mood today) will still be there, just in a different format. In addition, you’ll get more topical and off-the-cuff contributions from us that, up to now, we did not really have an outlet for. And, of course, blogs allow you to comment and discuss things whereas communication with PDF documents is very much like Ice oneway invocations. I hope that you will take advantage of the comment feature and praise us for our latest brilliant ideas. (For example, you’ll be at liberty to tell me straight away that, today, I’m being too presumptuous and put me back in my place, instead of having to send me an email that I will dutifully ignore.) In other words, we can have a good old chat about all things Ice, middleware, ZeroC’s plans for world domination (just kidding…), and whatever else happens to pop up.
Which brings me to another way of chatting… If you look back through past issues of Connections, you will notice that most articles have concentrated on a specific feature of Ice, or explored a single design or implementation technique. All these designs and techniques are useful and apply in the real world, and a typical Ice application will likely use several of them. And that is exactly the point: a typical Ice application will use several of these designs and techniques, not just a single one. As always in software development, different designs do not stand in isolation and, when combined, can interact with each other and present design challenges that would not otherwise arise. Ice is no different, so we thought it would be useful to present a more complex Ice application that combines several features and services.
This show-case application is a chat application that allows people to communicate with each other in real time over the internet, much like MSN Messenger or Yahoo! Messenger. We have no ambitions to supplant these existing services, which do a fine job. Rather, we chose a chat application because it presents design and implementation challenges that are common to many distributed applications. For example, it must be possible to authenticate clients that want to use the application, eavesdroppers must not be able to monitor other people’s conversations, the application must co-exist with existing client- and server-side firewalls, and it should be robust in the face of misbehaved or malicious clients. In other words, the idea is to show how to design and implement an industrial-strength Ice application that meets real-world requirements and is more than just a toy demonstration.
Obviously, you would expect Ice to be suitable for this (otherwise we would not dare to present the application in the first place). But you may not expect how remarkably easy it is to develop such an application. Despite the realistic requirements, code complexity rises (almost) linearly with the number of requirements and not as some higher-order function. This is because Ice provides features and services that are well designed, address orthogonal concerns, and, as a result, are easy to combine and adapt to specific application requirements. This is no accident; our design philosophy is that features do not make it into Ice because we think someone may find them useful. Instead, features make it into Ice when we encounter a real shortcoming that cannot reasonably be overcome without new functionality in the platform. This means that everything in the platform exists because it really was needed and that its design and implementation were driven by real (instead of imaginary) requirements. In the few cases where requirements interact with each other, we show you how to deal with them without suffering a blowout in complexity and development cost.
So, check out our new chat demo pages, where you can read about how the application works and get an overview of its design and implementation. If you want more meat, you can read our in-depth article that gives you the complete run-down, and, for even more meat, you can download the source code for the application. Of course, you can also point your browser at our chat server and chat via HTTPS, or use one of the clients to chat via the Ice protocol straight from your machine into our server (securely, of course).
If you do like (or don’t like) what you see, you can let us know: by posting in our forum, by adding comments to this blog entry, or in real time, by chatting in our chat room.
At ZeroC, there are many ways to chat!
Cheers,
Michi.