A Matter of Definition
July 22nd, 2008 | Filed under Distributed Computing | 11 comments
Steve Vinoski has been busy trying to convince the world at large that RPC is “fundamentally flawed”. I think it is interesting to take a look at RPC and see what those fundamental flaws are (and whether there are flaws, for that matter). Doing this will definitely take more than one post, so don’t expect the answers all at once. I will deal with various aspects of the topic in a number of posts over the next few weeks, so please bear with me.
Before we can delve into the details, let’s take a look at a definition of RPC. In one of the comments to his blog post, Steve states that RFC 707 defines RPC. Having just read through that RFC again (twice), I cannot find a definition of RPC anywhere in that document. What I can find is an outline of a protocol that allows a client to ask a server to perform some work and to get results back from the server. But that’s pretty much it. There is no mention of specific APIs, there is no mention of interface definition languages, and there is no requirement for specific interaction models. (RFC 707 explicitly states that interactions need not be synchronous and that the run-time environment can provide a non-blocking interaction model.)
The protocol that is described by RFC 707 is remarkably simple: just two message types, request and reply. This is attractive but, as we have found out in the intervening time, inadequate. For example, for a connection-oriented transport, we also need a message that confirms the acceptance of a connection; without such a message, the protocol can violate at-most-once semantics, which are important for operations that are not idempotent. The importance of at-most-once semantics and idempotent operations is another thing that we have learned in the intervening time. (RFC 707 does not mention either.)
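To make the point about at-most-once semantics concrete, here is a minimal sketch in Python. Nothing in it comes from RFC 707: the `Server` class, the request IDs, and the `deposit` operation are hypothetical names invented for illustration. The server caches the reply for each request ID, so a retransmitted request is answered from the cache instead of being executed a second time.

```python
# Hypothetical illustration: not part of RFC 707 or any real RPC run time.

class Server:
    def __init__(self):
        self.balance = 0
        self._replies = {}  # request ID -> cached reply

    def deposit(self, amount):
        # Not idempotent: each execution changes the balance again.
        self.balance += amount
        return self.balance

    def handle(self, request_id, amount):
        # At-most-once: execute a given request ID only the first time;
        # a duplicate (for example, a client retry) gets the cached reply.
        if request_id not in self._replies:
            self._replies[request_id] = self.deposit(amount)
        return self._replies[request_id]


server = Server()
first = server.handle("req-1", 100)   # executed
retry = server.handle("req-1", 100)   # duplicate, not executed again
print(first, retry, server.balance)   # 100 100 100
```

Calling `deposit` directly twice would leave the balance at 200, which is exactly the kind of surprise that at-most-once delivery is meant to prevent. An idempotent operation (say, setting the balance to a fixed value) would not need the cache at all.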
How much time? RFC 707 was published in January 1976, more than thirty-two years ago. To put things in perspective, that was at a time in computing history where RPC (and networking, for that matter) were truly in their infancy. People were taking their first few hesitant steps toward distributed computing back then: TCP/IP had only just been invented and there was no such thing as distributed computing as we understand it today—what counted as distributed computing back then were protocols such as telnet and ftp. (DCE did not exist, and even UUCP and email had not been invented yet.)
While RFC 707 defines one of the earliest protocols for RPC, we today know this protocol to be inadequate. Moreover, a lot has happened in the intervening years; RPC today is a far cry from “RPC” back in 1976, and RFC 707 is of interest mainly as a historical document that has essentially no relevance to modern middleware. So, let me see whether we can find a more current definition.
Wikipedia has an entry for RPC. That entry also refers to RFC 707, but states that RFC 707 “describes” RPC (which is not the same as defining it). The entry then offers the following explanation:
An RPC is initiated by the client sending a request message to a known remote server in order to execute a specified procedure using supplied parameters. A response is returned to the client where the application continues along with its process. […] While the server is processing the call, the client is blocked.
“While the server is processing the call, the client is blocked.” Huh? Only a few lines earlier, the same article references RFC 707, which explicitly states that the interaction can be non-blocking. I guess we are running into the limitations of Wikipedia—there is only so much quality control that can be applied to the various articles.
So, let’s turn to whatis.com. It says:
Remote Procedure Call (RPC) is a protocol that one program can use to request a service from a program located in another computer in a network without having to understand network details. (A procedure call is also sometimes known as a function call or a subroutine call.) RPC uses the client/server model. The requesting program is a client and the service-providing program is the server. Like a regular or local procedure call, an RPC is a synchronous operation requiring the requesting program to be suspended until the results of the remote procedure are returned.
Hmm… The same error as with Wikipedia here: the entry claims that interactions are synchronous when, according to RFC 707 (and lots of past and current RPC implementations), they need not be. (One wonders whether one copied from the other.)
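As a rough sketch of why "synchronous" does not belong in the definition, the following Python snippet issues the same request/reply exchange in both styles. The function `remote_add` is a hypothetical stand-in for a remote call (it only sleeps to simulate latency), and the thread pool is just one assumed way for a run time to offer a non-blocking model.

```python
# Hypothetical illustration: remote_add stands in for a request/reply
# exchange with a server; no real network is involved.
import time
from concurrent.futures import ThreadPoolExecutor

def remote_add(a, b):
    time.sleep(0.05)  # simulated network and server latency
    return a + b

# Synchronous style: the caller is suspended until the reply arrives.
sync_result = remote_add(1, 2)

# Asynchronous style: the caller gets a future and keeps working.
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(remote_add, 3, 4)
    local_work = sum(range(10))      # runs while the call is in flight
    async_result = future.result()   # collect the reply when needed

print(sync_result, local_work, async_result)  # 3 45 7
```

The request and the reply are identical in both cases; only the client-side API differs, which is why blocking is a property of the API rather than of RPC as such.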
I did many more searches and checked a number of books, and what they turned up was very similar to the preceding “definitions” for RPC. The common theme is:
- One program is active and issues a request for service.
- Another program passively listens for requests and acts on them.
- When the service is complete, the service-providing program can return results back to the originator of the request.
Now, this is nice as far as it goes, but it hardly is a definition of RPC. Instead, it is a description of basic principles. But that description is so loose that just about anything fits it, including Ice, DCE, CORBA, SOAP, and REST.
Yet, we all seem to somehow know what RPC is and is not:
- RPC is about procedure (or, for object-oriented RPC, method) calls. In other words, at the API level, the interaction feels like a procedure or method call (whether synchronous or asynchronous).
- RPC platforms require a contract that defines the procedures or methods, including the types of data that are exchanged as parameters and return values.
It seems to me that this gets a little closer to what RPC is about: we have APIs that mostly hide the grunt work involved in communicating over a network, and we have a formal contract that establishes a type system.
Typically, a compiler that creates stubs and skeletons from an interface definition language generates the API and enforces the contract; however, other ways to define the contract (such as reflection) can be used. Either way, the generated code or reflection takes care of marshalling chores.
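For readers who have not seen generated code, here is a hand-written sketch in Python of what a stub and a skeleton do. Everything in it is an assumption made for illustration: the `Calculator` names, the JSON wire format, and the in-process "transport" are invented and do not mirror what any particular platform generates.

```python
# Hypothetical illustration of a stub and a skeleton; the names and the
# JSON wire format are invented here and do not mirror any real platform.
import json

class CalculatorServant:
    """Server-side implementation of the contract."""
    def add(self, a: int, b: int) -> int:
        return a + b

class CalculatorSkeleton:
    """Unmarshals a request, calls the servant, marshals the reply."""
    def __init__(self, servant):
        self._servant = servant

    def dispatch(self, wire_request: bytes) -> bytes:
        request = json.loads(wire_request.decode("utf-8"))
        if request["op"] != "add":
            raise ValueError("unknown operation: " + request["op"])
        a, b = request["args"]
        # Enforce the contract's parameter types on the way in.
        if not (isinstance(a, int) and isinstance(b, int)):
            raise TypeError("add expects two integers")
        result = self._servant.add(a, b)
        return json.dumps({"result": result}).encode("utf-8")

class CalculatorStub:
    """Client-side proxy: looks like a local object, hides the wire."""
    def __init__(self, transport):
        self._transport = transport  # callable: bytes -> bytes

    def add(self, a: int, b: int) -> int:
        wire_request = json.dumps({"op": "add", "args": [a, b]}).encode("utf-8")
        wire_reply = self._transport(wire_request)
        return json.loads(wire_reply.decode("utf-8"))["result"]

# In-process "network": the stub hands bytes directly to the skeleton.
skeleton = CalculatorSkeleton(CalculatorServant())
calculator = CalculatorStub(skeleton.dispatch)
print(calculator.add(2, 3))  # 5
```

The caller only ever sees `calculator.add(2, 3)`; the marshalling and the type check live in code that a compiler would normally write from the interface definition.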
Note that, for modern RPC, a static contract is only one way of doing things. For example, Ice and CORBA provide dynamic invocation and dispatch that allow you to use RPC without an interface definition and without generated code.
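The general idea of dynamic invocation can be sketched in a few lines of Python. This is not the dynamic invocation API of Ice or CORBA; the `invoke` and `dispatch` functions, the `Servant` class, and the JSON encoding are all hypothetical and only show what it means to name the operation at run time instead of calling a generated stub.

```python
# Hypothetical illustration: this is not the dynamic invocation API of Ice
# or CORBA, only the general idea of naming the operation at run time.
import json

class Servant:
    def add(self, a, b):
        return a + b

    def greet(self, name):
        return "hello, " + name

def dispatch(servant, wire_request: bytes) -> bytes:
    # Dynamic dispatch: look the operation up by name when the request
    # arrives. A real system would restrict which names may be called.
    request = json.loads(wire_request.decode("utf-8"))
    operation = getattr(servant, request["op"])
    return json.dumps(operation(*request["args"])).encode("utf-8")

def invoke(transport, op: str, *args):
    # Dynamic invocation: no generated stub; the caller supplies the
    # operation name and the arguments as plain data.
    wire_request = json.dumps({"op": op, "args": list(args)}).encode("utf-8")
    return json.loads(transport(wire_request).decode("utf-8"))

servant = Servant()
transport = lambda data: dispatch(servant, data)
print(invoke(transport, "add", 2, 3))       # 5
print(invoke(transport, "greet", "world"))  # hello, world
```

The price of this flexibility is that mistakes in operation names or argument types surface at run time rather than at compile time.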
Of course, the preceding description is still a far cry from a proper definition. But at least it gets us in the right direction: RPC systems take care of networking chores and—at least much of the time—use generated code that gets data onto the wire and back off the wire again. (If anyone is aware of a more detailed and/or rigorous definition of RPC, please let me know; I’d be very interested to see it.)
To what extent various technologies, such as Ice, SOAP, and REST fit (or do not fit) that description is something I will explore in future postings.
Cheers,
Michi.
Tags: computing history, interface definition, on-the-wire contract, RPC
Michi, you say that RFC 707 didn’t define RPC, yet you’re unable to find a clear definition. If you do your homework and start tracing RPC research papers back in time, you will indeed find that they all wind up at the work of James E. White and RFC 707. I’ve mentioned this in several of my columns over the years; I’m fortunate to have some extremely knowledgeable distributed systems folks reviewing them for me before and after they’re published, and nobody has ever pointed me to a source other than RFC 707 or questioned whether RFC 707 is where RPC got started. The giants from whom I started learning distributed computing 20 years ago seemed to think that was the starting point of RPC as well.
I suggest that if you want to see the relevance of RFC 707, you work through the brief analysis of it that I provided here (I hope this link shows up):
http://steve.vinoski.net/blog/2008/07/13/protocol-buffers-leaky-rpc/#comment-1141
What White referred to as the “procedure call model” is, as I said above and to the best of my knowledge, what distributed systems researchers generally agree on as the earliest definition of RPC.
It’ll be interesting to see how you’ll explain how RPC is not fundamentally flawed, since it very clearly is, and has been known to be ever since 1976. Regardless, despite your claims in your post above, fundamental RPC flaws are actually not what I’m trying to convince the world of; rather, I merely aim to point out that yes, it is 32 years later, and now there are much better ways of building distributed systems than trying to extend and adapt general-purpose programming languages and their function invocation abstractions to distributed computing.
One nice thing is that I now work in an industry that’s completely different from and unrelated to middleware, and I no longer earn my living from particular middleware approaches or systems. I can therefore speak freely and talk about what approaches I actually use in practice and what really works for me, rather than being beholden to some marketing message of my employer. I’m not sure you can say the same, so your analysis is likely to be biased.
Comment by Steve Vinoski — July 22, 2008 @ 1:06 am
Yes, I wasn’t able to find a clear definition of RPC. I made some attempt to narrow it at least a little bit. The aspects of procedural API and defined contract seem to be key (but I admit that this is still far from a formal definition).
I have no doubt that RFC 707 is amongst the earliest widely published papers on RPC. But I still believe that it is not a definition in any rigorous way. As far as I can see, going by RFC 707, any web browser fetching a web page performs RPC: an HTTP GET is the request; the operation name and/or parameters are part of the URL; the HTTP response is the reply and delivers the results in the form of a new page. (HTTP POST can be substituted for HTTP GET in this scenario.)
No one seems to think of HTTP traffic as RPC, yet the definitions in RFC 707 don’t capture what makes the two different.
RFC 707 is too old to be applied to modern-day middleware. It does not address a whole number of things that have popped up since, and that are important to distinguishing current-day technologies from each other. Hence, RFC 707, venerable as it may be, is irrelevant in a discussion of current-day distributed computing.
There are many things that we have learned in the past thirty-two years, and RFC 707 obviously could not take those into account.
Comment by Michi — July 22, 2008 @ 7:35 am
I disagree that an HTTP GET conforms to the RFC 707 definition of RPC, because it’s not tied at all into any programming language’s procedure call abstraction. Again, see paragraphs 4b1 and 4b4 of the RFC; the latter says, for example:
HTTP GET has nothing to do with procedures or local programming environments. Could someone wrap up HTTP verbs behind some programming language framework and misguidedly use HTTP as RPC? Sure, WS-* does exactly that (though it tends to stick to POST because only that verb allows a full range of procedure-like capabilities to be tunneled through).
I agree that much has changed in 32 years, but it’s always instructive to understand history and thereby understand how things got to be the way they are. With RPC, we see a history of proposals of what at first appear to be simple, convenient models for distributed computing that are later determined to be flawed, followed by further proposals that attempt to tackle those flaws by adding more and more complexity (e.g., CORBA) while attempting to preserve the procedure/method call model, in turn followed by new proposals that react to the complexity by stripping it out, thereby re-exposing some of the flaws (as we’ve seen recently with Protocol Buffers), but of course still maintaining that procedure/method call model. Seen in that light, it leads one to wonder whether preserving that call model is worth all the effort; as I pointed out in my recent “Serendipitous Reuse” column:
Much has indeed changed in 32 years, which is precisely why I’m wondering why some are still bothering with RPC. My overall goals in bringing up RFC 707 and working forward from it are:
- to help people understand the original motivations for RPC and its original design goals
- to illuminate the repeated cycles of making trade-offs between complexity and simplicity around the flaws of the RPC model
- to question whether the primary design center of RPC of extending the programming language call model for distribution is the best primary abstraction for applications, regardless of its convenience
- to show how the choice of programming language makes a huge difference in the convenience and correctness of distributed applications
- to talk about simpler yet more effective alternatives such as REST-based approaches that don’t suffer the fundamental flaws of RPC or require workarounds for them
All of my Internet Computing columns for 2008 so far comprise a series exploring these issues.
Comment by Steve Vinoski — July 22, 2008 @ 9:32 am
I have no problems finding examples where an HTTP GET is tied into a programming language’s procedure call abstraction. For example, in .NET, we have the HttpWebRequest class with its GetResponse method (which synchronously sends a GET request and waits for the reply). In J2ME, we have the HttpConnection class, which does much the same thing. Python provides urllib for the same purpose:
>>> import urllib
>>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query?%s" % params)
>>> print f.read()
All this looks very RPC-like to me. It certainly seems that it isn’t just WS-* that treats HTTP as if it were RPC.
Comment by Michi — July 22, 2008 @ 7:50 pm
Not that I would hold it up as an example of a state-of-the-art RESTful approach, but the Python call doesn’t look RPCish to me at all. There’s no endpoint-specific procedure being invoked, for example. The URL is “opened,” which does a GET of the resource, and then the code reads the body that’s returned, much as a file is read. No parameters are passed in the programming language sense (params is an argument to the string formatting operation, not to the urlopen function, and URI query string parameters are certainly not the same as regular function parameters), and no language-specific return types are returned.
As I already said, anyone can wrap HTTP up and invoke it any way they want to. In general, you could write the best chunk of software ever and still some fool would probably put some horrible wrapper around it, and there’s nothing you can do about that. The key here, though, is that HTTP was not at all designed around the programming language procedure call model abstraction, which means it’s not RPC. Considering Roy Fielding’s very significant contributions to the design of HTTP, and also considering that he coined the term REST, let’s see what he has to say:
That last sentence is especially important here.
(BTW, I hope all the line breaks come out OK in that quoted section; you should consider adding a comment preview plugin.)
Comment by Steve Vinoski — July 22, 2008 @ 9:28 pm
I’m far from an expert, but I’m not convinced that web technologies would be the right thing for our applications. We have a set of C++ libraries for controlling motion, acquiring data and images, etc. for a large control system. I feel that the Ice RPC mechanism fits my needs a lot better than some web service thing would - talk about an impedance mismatch. I can use native C++ types and STL containers easily with Ice, and it is easy to wire an Ice servant to a native C++ object. But, maybe that’s just my comfort level talking.
I am always conscious of the fact that I am doing distributed computing, and understand the latency and reliability issues. Ice does not try to hide any of that. No matter what technology I use, that awareness has to be there. If a programmer doesn’t understand the issues he/she should not be doing distributed programming of any kind, right?
Comment by Mark Wilson — July 23, 2008 @ 2:30 pm
Thanks for the suggestion. We’ve added this.
Comment by Michi — July 23, 2008 @ 7:44 pm
@Mark: I agree with your last sentence, but unfortunately it happens quite a bit in the real world from what I’ve seen.
For your particular case you may indeed be correct that Ice is the best way to go. (Some try to turn this discussion into a black-and-white argument (not saying you’re doing that, BTW), but it’s not and it can’t be.) I will say, though, that RESTful approaches can work well for embedded systems — I know that because I’ve personally developed such systems and was very pleased with the results. And I guess that’s part of my point: these approaches are more widely applicable than people think they are. The other part of my overall point is just going back to basics and questioning a lot of the assumptions that we make around RPC-oriented approaches, because based on quite a bit of experience over the past few years I feel that for a variety of applications there are better alternatives out there.
Comment by Steve Vinoski — July 24, 2008 @ 1:48 am
You’re basically saying that one size does not fit all, and I agree. An embedded web service for an isolated embedded device can work very well. For instance, we have XPS motion control devices that allow us to use a web page to query and configure them - very handy. But, for integrating that device into our very large control system with GUI, database, and operational protocols, I’m not so sure. Seems like throwing all those big HTML/XML strings around, and having to build and interpret them could get pretty clunky. The RPC model, IMO, works better for that.
Comment by Mark Wilson — July 24, 2008 @ 9:46 am
@Mark: I guess we’ll just have to agree to disagree. Let’s just say I’m quite confident that RESTful HTTP works very well in just the type of system you’re describing.
RESTful HTTP doesn’t have to mean big HTML and XML strings — this sort of wrong assumption is exactly why I encourage people to really look into REST in detail and with an open mind before passing judgment. Just to be clear, Mark, I’m not intending to pick on you specifically here at all, but in general the wildly inaccurate assumptions that always seem to accompany discussions about REST lead many to reach wildly inaccurate conclusions about it, which is unfortunate for them.
Comment by Steve Vinoski — July 24, 2008 @ 10:30 pm