Tag Archive for "computing-history" - Michi Henning

Step By Step

July 31st, 2008 | Filed under Distributed Computing

Before I delve further into the merits and demerits of RPC, SOAP, REST, and so on, I want to take a look back at the history of distributed computing. Not because I’m a connoisseur of history, but because that history has shaped much of the technology we see today and, without that history, today’s distributed computing options would likely be different.

First Steps

Back in the eighties and early nineties, we saw the first few steps toward a kind of distributed computing that did not require developers to create half a networking stack on top of raw sockets. Sun ONC, Apollo NCS, and DCE marked these early steps. These technologies were a big improvement over sockets, and certainly made the job of writing a distributed application a lot easier. However, they also had their share of drawbacks, among them that they only provided a C API, and that many of the underlying network semantics were still exposed (such as the distinction between TCP and UDP, or the need to explicitly deal with domain names and port numbers). In addition, these technologies were not widely available.

A Step Forward

During the nineties, the industry witnessed the rise of C++ (together with other object-oriented languages) and anointed OO programming the latest silver bullet. (It was impossible to open any trade magazine in those days without finding "OO" in the title of just about every article.) The interest in OO also brought about a shift in distributed computing. In particular, we saw increasing research in component models and, naturally, distributed component models. This led to a whole raft of new technologies, particularly on the Microsoft side, which created DDE, OLE, OLE Automation, COM, ActiveX, DCOM, COM+, and MTS. (Isn’t it wonderful how developers were bestowed with successive new technologies? The list was later extended with .NET Remoting, which in turn was superseded by WCF.)

Meanwhile, the rest of the world wasn’t asleep either… DEC morphed DCE into ObjectBroker, IBM created SOM and, later, DSOM, Trinity College created the beginnings of Orbix, ILog invented ILog Broker, Post Modern released Black Widow (later VisiBroker), HP developed ORB Plus, and Expersoft came up with PowerBroker (among others).

So, there was plenty to choose from on both sides of the camp. While Microsoft continued to release a new spin on its distribution technology every few months, the other vendors closed ranks and joined the OMG, which published CORBA. During that time (the mid- to late-nineties), the industry witnessed a fierce battle between DCOM and CORBA—the Microsoft world mostly stuck with DCOM (although many CORBA implementations also ran on Windows), while the rest of the world went ahead and used CORBA.

A New Stomping Ground

By the late nineties, it had become clear that Microsoft were losing their battle with CORBA: CORBA could parade a large number of high-profile success stories, while there was never even a single large-scale or mission-critical DCOM application. (Even on Windows, CORBA had begun to displace DCOM, due to its platform-agnostic nature—by then, CORBA actually ran on more Windows versions than DCOM.)

Concurrently with these developments (beginning around 1995), a number of things happened that, together, reshaped the entire computing industry:

  • The world-wide web began its meteoric rise.
  • HTML created the world’s first truly platform-independent user interface.
  • Sun invented Java and created the world’s first popular platform-independent programming language.
  • The common household discovered the Internet.

Taken together, these factors meant that, suddenly, there were mega-dollars to be earned with distributed computing whereas, only a few years before, computer networking was strictly an activity for nerds. (As late as 1993, Bill Gates had said "The Internet? We are not interested in it.") All of a sudden, distributed computing had become the key to commercial success—there was no way that Microsoft was going to cede the distributed computing market to CORBA.

The explosion of the web and the ensuing high-tech bubble meant that anything based around web technology was in. (The OO silver bullet of the early nineties had given way to the XML silver bullet of the late nineties.) Rather than fight a battle they could not win, Microsoft created a completely new battlefield: the new paradigm was to be distributed computing based around the web and XML, and it begat SOAP in 1999, followed by Web Services shortly thereafter. Because the shine had worn off both DCOM and CORBA, many of the distributed computing vendors flocked to WS as the saviour of their ailing businesses. (Microsoft discontinued DCOM shortly thereafter, and CORBA had dwindled into insignificance by 2003.)

A Step Backwards

The shift to SOAP and WS was a severely retrograde step for a number of technical reasons:

  • Many years of software engineering experience had demonstrated that encapsulation and separation of interface and implementation were key strategies in controlling cost and complexity. SOAP threw all this hard-earned wisdom out the window: there were no objects, there was no encapsulation, there were no standardized APIs and language mappings, and the data, once again, reigned supreme.
  • SOAP and WS provided an excuse for hordes of developers with no prior distributed computing experience to reinvent the wheel (something that this industry loves to do more than anything else). Not only did they reinvent the wheel, but they reinvented it badly, repeating many of the mistakes of the past, while adding a considerable share of new mistakes that no-one had ever thought of before.
  • The bandwidth requirements of XML made it staggeringly wasteful; SOAP messages require twenty to more than one hundred times the bandwidth of the average binary protocol.
  • XML is expensive to marshal and unmarshal; it requires hundreds of times the number of CPU cycles needed by a binary protocol.
  • SOAP was so primitive and low-level that the marketing machinery had to be called into action. It duly bestowed upon us an entire new service-oriented architecture ("SOA"), which was necessary to sell the botched technology to the industry.
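To get a rough feel for the bandwidth point, here is a minimal sketch that encodes the same three values once as a compact binary message and once as a bare-bones SOAP-style envelope. The getQuote operation, its parameters, and the binary layout are made up for illustration; the exact ratio depends entirely on the payload and the protocol, so this is not a measurement of the figures quoted above.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical request: getQuote(symbol, quantity, limit). Names are illustrative only.
symbol, quantity, limit = "ZC", 100, 12.5

# Assumed binary layout: 1-byte string length, the string bytes,
# a 32-bit integer, and a 64-bit float (little-endian, no padding).
sym = symbol.encode("utf-8")
binary = struct.pack(f"<B{len(sym)}sid", len(sym), sym, quantity, limit)

# Minimal SOAP 1.1-style envelope carrying the same three values.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
call = ET.SubElement(body, "getQuote")
ET.SubElement(call, "symbol").text = symbol
ET.SubElement(call, "quantity").text = str(quantity)
ET.SubElement(call, "limit").text = str(limit)
xml_bytes = ET.tostring(envelope, encoding="utf-8")

# Compare the two encodings of the same logical request.
print(len(binary), len(xml_bytes), round(len(xml_bytes) / len(binary), 1))
```

A real SOAP message would typically add namespaces, type attributes, headers, and HTTP framing on top of this, so the sketch errs on the side of flattering XML.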

In addition, proponents of SOAP made various claims that did not stand up to scrutiny, such as the importance of having a character-based protocol (as opposed to a binary one), the self-describing nature of XML, the inherent security gained by sending everything through port 80, and the loose coupling of service-oriented applications. (I will return to these topics in more detail in future posts.)

A Step Sideways

While the SOAP crowd was busy re-inventing a square RPC-wheel, another paradigm shift started to take hold: representational state transfer (REST). In contrast to RPC and SOAP/WS, REST is not a technology, but a set of architectural principles. In a nutshell, it argues that the success of the web is due to a number of constraints that guided its design. By making these constraints and designs explicit, we can arrive at an architectural style that benefits scalability, extensibility, and performance.

Like SOAP, REST was strongly inspired by the web, and its principles are closely linked to web technologies, such as HTTP, URIs, caching, and processing of documents by intermediaries. Like the web, REST focuses on "Internet-scale distributed hypermedia interaction", that is, it models distributed computing as an exchange of documents.

Where We Stand Today

Today, we can separate general-purpose distributed computing into three different camps.

  • The RPC camp. This camp is populated by applications that use various forms of RPC, such as CORBA or Ice.
  • The web camp. This camp is populated by applications that use HTTP as a substrate. It includes applications built around web browsers, HTML, XML, applets, servlets, SOAP, and web services.
  • The REST camp. This camp is populated by applications that apply RESTful principles to a variety of technologies (even though a majority use the web for their implementation).

These three broad categories capture most of the action in commercial and enterprise distributed computing. (Of course, there are many other technologies in use, such as SCADA, RTCP, or home-grown technologies; but these are not a part of general-purpose, B2B and e-commerce distributed computing.)

Now, regardless of the various merits of these camps, it is important to realize why they exist:

  • Historically, the RPC camp came first. As developers applied the lessons of the past, RPC improved with successive iterations of the technology and managed to accumulate a number of considerable successes.
  • The web and REST camps came second. The success of browsers, Java, XML, and the exploding e-commerce market inspired REST, and provided a convenient wave for SOAP, WS, and SOA to ride.

Another important point here is that the web camp exists not necessarily because of any inherent technological advantage over the RPC camp. Historically, technology had little to do with the use of the web for distributed computing. Instead, the web camp exists because of political manoeuvring, fighting for market dominance, opportunism, and the desire to leverage an industry trend. If there is a technological advantage for the web camp, that advantage is accidental because technological excellence could not have been further from the participants’ minds at the time they created the schism.

Where We Are Going

Clearly, these distributed computing camps are here to stay for the foreseeable future. I simply cannot see a world where the majority of distributed computing would be built around RPC, or a world where most applications would be built around SOAP/WS, or a world where distributed applications would be exclusively RESTful.

To declare one camp "good" and another camp "bad" is naive and falls into the trap of searching for yet another silver bullet. As with all designs and technologies, there are trade-offs, and correctly matching these trade-offs to the requirements of a particular project is what separates "good" from "bad"—not a belief in the superiority of one camp over another.

With that in mind, we can delve deeper into these trade-offs, which I will do in future posts.

Cheers,

Michi.

A Matter of Definition

July 22nd, 2008 | Filed under Distributed Computing

Steve Vinoski has been busy trying to convince the world at large that RPC is “fundamentally flawed”. I think it is interesting to take a look at RPC and see what those fundamental flaws are (and whether there are flaws, for that matter). Doing this will definitely take more than one post, so don’t expect the answers all at once. I will deal with various aspects of the topic over a number of posts over the next few weeks, so please bear with me.

Before we can delve into the details, let’s take a look at a definition of RPC. In one of the comments to his blog post, Steve states that RFC 707 defines RPC. Having just read through that RFC again (twice), I cannot find a definition of RPC anywhere in that document. What I can find is an outline of a protocol that allows a client to ask a server to perform some work and to get results back from the server. But that’s pretty much it. There is no mention of specific APIs, there is no mention of interface definition languages, and there is no requirement for specific interaction models. (RFC 707 explicitly states that interactions need not be synchronous and that the run-time environment can provide a non-blocking interaction model.)

The protocol that is described by RFC 707 is remarkably simple: just two message types, request and reply. This is attractive but, as we have found out in the intervening time, inadequate. For example, for a connection-oriented transport, we also need a message that confirms the acceptance of a connection; without such a message, the protocol can violate at-most-once semantics, which are important for operations that are not idempotent. The importance of at-most-once semantics and idempotent operations is another thing that we have learned in the intervening time. (RFC 707 does not mention either.)
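To make the idea of at-most-once semantics concrete, here is a minimal sketch of why a retried request is dangerous for a non-idempotent operation, and of one common way to guard against it. Note that this is not the connection-acceptance message mentioned above: the sketch uses a different technique (suppressing duplicates by request ID), and the Account and AtMostOnceServer classes are hypothetical names invented for the example.

```python
class Account:
    """Server-side object with a non-idempotent operation."""

    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        # Executing this twice for one logical request would be a bug.
        self.balance += amount
        return self.balance


class AtMostOnceServer:
    """Caches replies by request ID so a retransmitted request is not re-executed."""

    def __init__(self, target):
        self._target = target
        self._replies = {}  # request ID -> reply already sent

    def handle(self, request_id, operation, *args):
        if request_id in self._replies:
            # Duplicate: return the cached reply instead of running the operation again.
            return self._replies[request_id]
        result = getattr(self._target, operation)(*args)
        self._replies[request_id] = result
        return result


account = Account()
server = AtMostOnceServer(account)
first = server.handle("req-1", "deposit", 50)
retry = server.handle("req-1", "deposit", 50)  # client retries after a lost reply
print(first, retry, account.balance)
```

An idempotent operation (reading the balance, say) could simply be re-executed on retry; it is the deposit that needs the protection.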

How much time? RFC 707 was published in January 1976, more than thirty-two years ago. To put things in perspective, that was at a time in computing history where RPC (and networking, for that matter) were truly in their infancy. People were taking their first few hesitant steps toward distributed computing back then: TCP/IP had only just been invented and there was no such thing as distributed computing as we understand it today—what counted as distributed computing back then were protocols such as telnet and ftp. (DCE did not exist, and even UUCP and email had not been invented yet.)

While RFC 707 defines one of the earliest protocols for RPC, we now know this protocol to be inadequate. Moreover, a lot has happened in the intervening years; RPC today is a far cry from “RPC” back in 1976, and RFC 707 is of interest mainly as a historical document that has essentially no relevance to modern middleware. So, let me see whether we can find a more current definition.

Wikipedia has an entry for RPC. That entry also refers to RFC 707, but states that RFC 707 “describes” RPC (which is not the same as defining it). The entry then offers the following explanation:

An RPC is initiated by the client sending a request message to a known remote server in order to execute a specified procedure using supplied parameters. A response is returned to the client where the application continues along with its process. […] While the server is processing the call, the client is blocked.

“While the server is processing the call, the client is blocked.” Huh? Only a few lines earlier, the same article references RFC 707, which explicitly states that the interaction can be non-blocking. I guess we are running into the limitations of Wikipedia—there is only so much quality control that can be applied to the various articles.

So, let’s turn to whatis.com. It says:

Remote Procedure Call (RPC) is a protocol that one program can use to request a service from a program located in another computer in a network without having to understand network details. (A procedure call is also sometimes known as a function call or a subroutine call.) RPC uses the client/server model. The requesting program is a client and the service-providing program is the server. Like a regular or local procedure call, an RPC is a synchronous operation requiring the requesting program to be suspended until the results of the remote procedure are returned.

Hmm… The same error as with Wikipedia here: the entry claims that interactions are synchronous when, according to RFC 707 (and lots of past and current RPC implementations), they need not be. (One wonders whether one copied from the other.)    
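As a rough illustration of a non-blocking interaction, the following sketch issues a call, carries on with local work, and collects the result only when it is needed. It is an assumption-laden stand-in: remote_square is a hypothetical operation, and a thread pool with a short sleep plays the role of the network and the server.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def remote_square(x):
    """Hypothetical remote operation; the sleep stands in for network and server latency."""
    time.sleep(0.05)
    return x * x


with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(remote_square, 7)  # returns immediately with a future
    local_work = sum(range(10))             # the client is not suspended meanwhile
    result = future.result()                # block only at the point the value is needed

print(local_work, result)
```

At the API level this still looks like a procedure call with parameters and a return value; only the point at which the caller waits has moved.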

I did many more searches and checked a number of books, and what they turned up was very similar to the preceding “definitions” for RPC. The common theme is:

  • One program is active and issues a request for service.
  • Another program passively listens for requests and acts on them.
  • When the service is complete, the service-providing program can return results back to the originator of the request.

Now, this is nice as far as it goes, but it hardly is a definition of RPC. Instead, it is a description of basic principles. But that description is so loose, just about anything fits it, including Ice, DCE, CORBA, SOAP, and REST.

Yet, we all seem to somehow know what RPC is and is not:

  • RPC is about procedure (or, for object-oriented RPC, method) calls. In other words, at the API level, the interaction feels like a procedure or method call (whether synchronous or asynchronous).
  • RPC platforms require a contract that defines the procedures or methods, including the types of data that are exchanged as parameters and return values.

It seems to me that this gets a little closer to what RPC is about: we have APIs that mostly hide the grunt work involved in communicating over a network, and we have a formal contract that establishes a type system.

Typically, a compiler that creates stubs and skeletons from an interface definition language generates the API and enforces the contract; however, other ways to define the contract (such as reflection) can be used. Either way, the generated code or reflection takes care of marshalling chores.
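To show the division of labour between stub and skeleton, here is a hand-written sketch of the kind of code such a compiler might generate for a contract with a single operation, add(a, b), taking and returning 32-bit integers. All class names and the wire format are assumptions made for the example, and a direct function call stands in for the network.

```python
import struct


class CalculatorServant:
    """The application's implementation of the contract."""

    def add(self, a, b):
        return a + b


class CalculatorSkeleton:
    """Server side: unmarshals the request, calls the servant, marshals the reply."""

    def __init__(self, servant):
        self._servant = servant

    def dispatch(self, request: bytes) -> bytes:
        a, b = struct.unpack("<ii", request)  # two 32-bit ints, as the contract says
        return struct.pack("<i", self._servant.add(a, b))


class CalculatorStub:
    """Client side: looks like a local object, but marshals every call."""

    def __init__(self, transport):
        self._transport = transport  # any callable taking request bytes, returning reply bytes

    def add(self, a: int, b: int) -> int:
        reply = self._transport(struct.pack("<ii", a, b))
        (result,) = struct.unpack("<i", reply)
        return result


skeleton = CalculatorSkeleton(CalculatorServant())
calc = CalculatorStub(skeleton.dispatch)  # in-process stand-in for a network connection
total = calc.add(2, 3)
print(total)
```

The caller only ever sees calc.add(2, 3); the marshalling and the type system of the contract live entirely in the stub and skeleton.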

Note that, for modern RPC, a static contract is only one way of doing things. For example, Ice and CORBA provide dynamic invocation and dispatch that allow you to use RPC without an interface definition and without generated code.
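The general idea of invoking and dispatching without generated code can be sketched as follows. This is not the Ice or CORBA API: the dynamic_invoke and dynamic_dispatch helpers, the Greeter class, and the JSON wire format are all invented for the example, and a direct function call stands in for the network.

```python
import json


class Greeter:
    """Hypothetical server-side object; no interface definition is compiled for it."""

    def greet(self, name):
        return f"Hello, {name}"


def dynamic_dispatch(servant, request: bytes) -> bytes:
    """Server side: look up the operation by name at run time and call it."""
    msg = json.loads(request)
    method = getattr(servant, msg["op"])
    return json.dumps({"result": method(*msg["args"])}).encode("utf-8")


def dynamic_invoke(transport, op: str, *args):
    """Client side: build the request from an operation name and arguments; no stub involved."""
    request = json.dumps({"op": op, "args": list(args)}).encode("utf-8")
    return json.loads(transport(request))["result"]


servant = Greeter()
reply = dynamic_invoke(lambda req: dynamic_dispatch(servant, req), "greet", "Michi")
print(reply)
```

The price of this flexibility is that a misspelled operation name or a wrong argument type only shows up at run time, whereas a static contract catches such mistakes when the code is compiled.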

Of course, the preceding description is still a far cry from a proper definition. But at least it gets us in the right direction: RPC systems take care of networking chores and—at least much of the time—use generated code that gets data onto the wire and back off the wire again. (If anyone is aware of a more detailed and/or rigorous definition of RPC, please let me know; I’d be very interested to see it.)

To what extent various technologies, such as Ice, SOAP, and REST fit (or do not fit) that description is something I will explore in future postings.

Cheers,

Michi.

Copyright © 2008 ZeroC, Inc.