Why Gnutella Can't Scale. No, Really.

by Jordan Ritter

Foreword
------------------------------------------------------------------------

In the spring of 2000, when Gnutella was a hot topic on everyone's mind, a concerned few of us in the open-source community just sat back and shook our heads. Something just wasn't right. Any competent network engineer who observed a running Gnutella application could tell you, through simple empirical observation alone, that the application was an incredible burden on modern networks and would probably never scale. I myself was simply stupefied at the gross abuse of my limited bandwidth -- and that was just over DSL. Other informative articles persist on O'Reilly's P2P website, and elsewhere.

So where's my paper, and why haven't you seen it? Well, in case you didn't know, I'm one of the founding developers of Napster, and for several good reasons, including the sobering fact that I was one of the leaders of the main competitor, I did not release my material to the public. Several times I resigned myself to rewriting my paper to accommodate the release of new information and analyses, but I never finished. Now I regret having sat on this for so long, for every paper on Gnutella that has come out in the last year has served as nothing but vindication of my conclusion from so early on: Gnutella will never scale.

Following is what remains of my paper, hacked up, sliced, diced and rewritten. The information and analyses are still useful, but as I just said, the conclusions are the same. This paper simply proves those conclusions through mathematics.

Onward, Through the Fog
------------------------------------------------------------------------

This paper assumes a working knowledge of Gnutella networks and internals, and therefore uses terminology and phrasing specific to Gnutella. If the wording seems somewhat strange or foreign to you, please stop reading this paper and seek other documentation before proceeding.
Furthermore, the explanation of the accompanying math is intentionally terse. Every effort has been made to verify the accuracy of the equations herein, but the discussion is intentionally limited to what is solely relevant to Gnutella, in order to keep distraction from an already complex topic to a minimum.

To Scale, or Not to Scale

Scaling Gnutella will require more than just better resource management tools -- in its current incarnation, Gnutella is mathematically and technologically unable to scale to a network of any reasonably large size. Following herein is a discussion focused on mathematically describing the metrics of a GnutellaNet topology, and on using the derived equations to interpret and visualize realistic limits of the technology. In order to keep the math as simple as possible, let's assume we're examining a relatively quiet GnutellaNet, and dissect the flow of information one step at a time.

Variables and Equations

P           The number of users connected to the GnutellaNet.

N           The number of connections held open to other servents in the
            network. In the default configuration of the original
            Gnutella client, this is 4.

T           Our TTL, or Time To Live, on packets. TTLs are used to age a
            packet and ensure that it is relayed a finite number of times
            before being discarded.

B           The amount of available bandwidth, or alternatively, the
            maximum capacity of the network transport.

f(n, x, y)  A function describing the maximum number of reachable users
            that are at least x hops away, but no more than y hops away.

            f(n, x, y) = Sum[((n-1)^(t-1))*n, t = x->y]

g(n, t)     A function describing the maximum number of reachable users
            for any given n and t.

            g(n, t) = f(n, 1, t)

h(n, t, s)  A function describing the maximum amount of bandwidth
            generated by relaying a transmission of s bytes given any n
            and t. Generation is defined as the formulation and outbound
            delivery of data.
            h(n, t, s) = n*s + f(n, 1, t-1)*(n-1)*s

i(n, t, s)  A function describing the maximum amount of bandwidth
            incurred by relaying a transmission of s bytes given any n
            and t. Incurrence is defined as the reception or transmission
            of data across a unique connection to a network.

            i(n, t, s) = (1 + f(n, 1, t-1))*n*s + f(n, t, t)*s

It benefits the casual reader to explain first in terms of a balanced, equally distributed GnutellaNet, so for this exercise assume that everyone has the same N and T. In the initial release of Gnutella, N = 4 and T = 5. Further, let P = 2000, arbitrarily. Finally, let us assume no other interfering factors exist (for now).

Early reports of Gnutella's usage claimed upwards of 2000 to 4000 users on the GnutellaNet. This is significant because these reports inaccurately implied that all 4,000 users on the GnutellaNet were reachable and searchable. The reality is that even in an ideally balanced GnutellaNet, P is never relevant to your potential reach; N and T are the only limiting factors.

Reachable Users

      T=1   T=2   T=3    T=4     T=5      T=6        T=7        T=8
N=2     2     4     6      8      10       12         14         16
N=3     3     9    21     45      93      189        381        765
N=4     4    16    52    160     484    1,456      4,372     13,120
N=5     5    25   105    425   1,705    6,825     27,305    109,225
N=6     6    36   186    936   4,686   23,436    117,186    585,936
N=7     7    49   301  1,813  10,885   65,317    391,909  2,351,461
N=8     8    64   456  3,200  22,408  156,864  1,098,056  7,686,400

Raising N (the number of connections held open) and T (the number of hops) extends the number of reachable users geometrically. Keep in mind that the above illustrates potential reach given two assumptions: the network is fully balanced, and everyone shares the same N and T.

So, the next obvious step for an intrepid and now better-informed Gnutella user is to increase N and T, so as to extend their potential reach into the GnutellaNet web. Not so fast! As your reach increases geometrically, so does the amount of bandwidth generated and incurred. Let's now move the discussion toward B.
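Before moving on, the four functions above can be made concrete with a small sketch. The code and names below are mine, not part of the original analysis; it simply transcribes the equations:

```python
def f(n, x, y):
    # Maximum number of users at least x hops away but no more than y:
    # in an ideal balanced net, tier t contains ((n-1)^(t-1))*n nodes.
    return sum(((n - 1) ** (t - 1)) * n for t in range(x, y + 1))

def g(n, t):
    # Maximum number of users reachable within t hops.
    return f(n, 1, t)

def h(n, t, s):
    # Bandwidth generated relaying s bytes: the originator transmits on
    # n links, and each node in tiers 1..t-1 relays on its other n-1.
    return n * s + f(n, 1, t - 1) * (n - 1) * s

def i(n, t, s):
    # Bandwidth incurred: every link crossing counts as both a
    # transmission and a reception.
    return (1 + f(n, 1, t - 1)) * n * s + f(n, t, t) * s

# Original client defaults (N=4, T=5) and an 83-byte query:
print(g(4, 5))      # 484 reachable users
print(h(4, 5, 83))  # 40172 bytes generated
print(i(4, 5, 83))  # 80344 bytes incurred
```

These values reproduce the N=4, T=5 entry in the Reachable Users table above, and the corresponding entries in the bandwidth tables later in the paper.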
Delving Deeper into B

Before proceeding, it is very important to understand that many assumptions must be made in order to carry out these computations. Observed characteristics of GnutellaNet topologies simply vary too widely to generalize accurately. That said, I still believe that there exists a statistical mean for each characteristic of a GnutellaNet, which is to say that if I were to take a snapshot of the current topology of a public GnutellaNet, I could derive an average N, T, and so forth. While potentially inaccurate as a realistic representation, these means can still produce a useful generalization.

In our discussion of B, there are really two different perspectives on how to measure the amount of bandwidth: the amount generated, and the amount incurred. This is a very important distinction to make, because knowing the amount of raw data generated is statistically useful, but understanding the bandwidth cost incurred by individual events in the network is much more important, since it more realistically signifies the impact on an Internet connection. As previously stated, h(n, t, s) represents the amount of bandwidth generated by relaying a packet through the network, counting only data that is outbound to another destination. i(n, t, s), on the other hand, counts all outbound and inbound transmissions, yielding a more accurate perspective on bandwidth usage.

Let's introduce an example. Joe Smith likes classic rock, and is desperately searching for any live recordings of The Grateful Dead. Joe loads up his Gnutella client, connects to the GnutellaNet, and executes his search, "grateful dead live". What actually happens?
Search Query Packet Makeup

IP header            20 bytes
TCP header           20 bytes
Gnutella header      23 bytes
Minimum speed         2 bytes
Search string        18 bytes
                     --------
Total:               83 bytes

Bandwidth Generated in Bytes (S=83)

      T=1    T=2     T=3      T=4        T=5         T=6         T=7          T=8
N=2   166    332     498      664        830         996       1,162        1,328
N=3   249    747   1,743    3,735      7,719      15,687      31,623       63,495
N=4   332  1,328   4,316   13,280     40,172     120,848     362,876    1,088,960
N=5   415  2,075   8,715   35,275    141,515     566,475   2,266,315    9,065,675
N=6   498  2,988  15,438   77,688    388,938   1,945,188   9,726,438   48,632,688
N=7   581  4,067  24,983  150,479    903,455   5,421,311  32,528,447  195,171,263
N=8   664  5,312  37,848  265,600  1,859,864  13,019,712  91,138,648  637,971,200

From above, given a concurrent demographic comparable to Napster's (assuming an equally balanced network), searching for the simple 18-byte string "grateful dead live" unleashes 90 megabytes' worth of data to be transmitted. Even so, I don't consider h(n, t, s) to be the best measure. Let's now look at i(n, t, s), which comprises the originating transmission, 1 reception and N-1 transmissions for tiers 1 through T-1, and 1 reception for the last tier.

Bandwidth Incurred in Bytes (S=83)

        T=1     T=2     T=3      T=4        T=5         T=6          T=7            T=8
N=2     332     664     996    1,328      1,660       1,992        2,324          2,656
N=3     498   1,494   3,486    7,470     15,438      31,374       63,246        126,990
N=4     664   2,656   8,632   26,560     80,344     241,696      725,752      2,177,920
N=5     830   4,150  17,430   70,550    283,030   1,132,950    4,532,630     18,131,350
N=6     996   5,976  30,876  155,376    777,876   3,890,376   19,452,876     97,265,376
N=7   1,162   8,134  49,966  300,958  1,806,910  10,842,622   65,056,894    390,342,526
N=8   1,328  10,624  75,696  531,200  3,719,728  26,039,424  182,277,296  1,275,942,400

i(n, t, s) has the unique property of being exactly double h(n, t, s). From above, a whopping 1.2 gigabytes of aggregate data could potentially cross everyone's networks, just to relay an 18-byte search query. This is of course where Gnutella suffers greatly from being fully distributed. Also, let's not forget that there is no consideration of time in this set of calculations. In the average case, 1.2 gigabytes' worth of data takes a very long time to generate and propagate through the Internet.
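The doubling relationship between incurred and generated bandwidth is not an accident of the chosen parameters. A quick sketch (my code, transcribing the paper's formulas) verifies i(n, t, s) = 2 * h(n, t, s) across the entire range covered by the tables above:

```python
def f(n, x, y):
    return sum(((n - 1) ** (t - 1)) * n for t in range(x, y + 1))

def h(n, t, s):
    # Bandwidth generated (outbound deliveries only).
    return n * s + f(n, 1, t - 1) * (n - 1) * s

def i(n, t, s):
    # Bandwidth incurred (each link crossing counted on both ends).
    return (1 + f(n, 1, t - 1)) * n * s + f(n, t, t) * s

# Every generated transmission is received exactly once on the far end
# of its link, so incurred bandwidth is always double generated bandwidth.
assert all(i(n, t, 83) == 2 * h(n, t, 83)
           for n in range(2, 9) for t in range(1, 9))
```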
However, even in more realistic cases, propagating a few megabytes' worth of data through several hundred thousand nodes across the Internet still takes a considerable amount of time. At this point, though, our exercise is still incomplete. What percentage of Gnutella clients share content? Of them, what percentage are likely to respond to Joe's query? And of those, what would be the mean number of responses, and their mean length?

The Anatomy of a Firestorm
------------------------------------------------------------------------

This is where we'll begin to see generalizations diverging from reality. Still, though, let's take a quick gander at what evangelists thought Gnutella would be capable of. For this, we'll need to introduce a few more variables and equations.

More Variables and Equations

a           Mean percentage of users who typically share content.

b           Mean percentage of users who typically have responses to
            search queries.

r           Mean number of search responses the typical respondent
            offers.

l           Mean length of search responses the typical respondent
            offers.

R           A function representing the Response Factor, a constant
            value that describes the product of the percentage of users
            responding and the amount of data generated by each user.

            R = (a*b) * (88 + r*(10 + l))

j(n, T, R)  A function describing the amount of data generated in
            response to a search query by tier T, given any n and
            Response Factor R.

            j(n, T, R) = f(n, T, T) * R

k(n, t, R)  A function describing the maximum amount of bandwidth
            generated in response to a search query, including relayed
            data, given any n and t and Response Factor R.

            k(n, t, R) = Sum[ j(n, T, R) * T, T = 1->t ]

Assuming that a mean exists for each characteristic we measure makes these calculations much simpler. Recall that I believe this assumption to hold: at any given moment there does exist some measurable a, b, r and l.
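The response-side equations can be sketched the same way (again, the code and the response_factor name are mine, not the paper's; f() is the reachability function defined earlier in the paper):

```python
def f(n, x, y):
    return sum(((n - 1) ** (t - 1)) * n for t in range(x, y + 1))

def response_factor(a, b, r, l):
    # R: the fraction of reachable users who answer (a*b), times the
    # size of one response packet, 88 + r*(10 + l) bytes.
    return (a * b) * (88 + r * (10 + l))

def j(n, T, R):
    # Response data generated by the respondents exactly T hops away.
    return f(n, T, T) * R

def k(n, t, R):
    # Total response bandwidth: tier T's responses are relayed back
    # across T hops, hence the factor of T inside the sum.
    return sum(j(n, T, R) * T for T in range(1, t + 1))

# The first test case used below: a=30%, b=50%, r=5, l=40.
R = response_factor(0.30, 0.50, 5, 40)
print(round(R, 2))        # 50.7
print(round(k(4, 5, R)))  # 110932 bytes for the default N=4, T=5
```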
Let's assume conservative estimates for now, and apply observed behaviour from other reports later. The difficulty in gauging the sheer amount of data coming back to us stems from our inability to realistically discern where in the partial mesh of connections the data is coming from. By design, the only thing we will know about the packets received is the (hopefully) unique message ID. If the message ID correlates to the message ID of one of our pending queries, the response is ours. Otherwise, the response is someone else's traffic, and if it correlates to a known ID in our routing table, it is simply passed along.

Search Response Packet Makeup

IP header            20 bytes
TCP header           20 bytes
Gnutella header      23 bytes
Number of hits        1 byte
Port                  1 byte
IP Address            4 bytes
Speed                 3 bytes
Result Set           r * (8 + l + 2) bytes
Servent Identifier   16 bytes
                     ---------------------
Total:               88 + r*(10 + l) bytes

Let's take a look now at what varying N and T yields in terms of bandwidth costs. For our first case, let's choose some reasonable values: a = 30%, b = 50%, r = 5 and l = 40, giving R = 50.7.

Bandwidth Generated in Bytes (R=50.7)

        T=1      T=2       T=3       T=4        T=5         T=6          T=7            T=8
N=2
N=3   152.1    760.5   2,585.7   7,452.9   19,620.9    48,824.1      116,965        272,715
N=4   202.8  1,419.6   6,895.2  28,797.6    110,932     496,614    1,441,500      4,989,690
N=5   253.5  2,281.5  14,449.5  79,345.5    403,826   1,961,330    9,229,680     42,456,400
N=6   304.2  3,346.2  26,161.2   178,261  1,128,890   6,832,640   40,104,500    230,230,000
N=7   354.9  4,613.7  42,942.9   349,577  2,649,330  19,207,500  135,115,000    929,909,000
N=8   405.6    6,084  65,707.2   622,190  5,491,420  46,392,900  380,422,000  3,052,650,000

(Precision is limited to 6 or fewer digits; sorry, I don't know how to make Mathematica behave differently in this case.)

With 30% of Gnutella users sharing, and only half of them responding, the standard client settings yield over 14MB of return responses.
I believe this particular R value to be near reality as far as the percentages are concerned, but r and l are probably conservative, given recent reports by Clip2 DSS and others. Let's raise R a bit; here's R = 72.

Bandwidth Generated in Bytes (R=72)

      T=1    T=2     T=3      T=4        T=5         T=6          T=7            T=8
N=2
N=3   216  1,080   3,672   10,584     27,864      69,336      166,104        387,288
N=4   288  2,016   9,792   40,896    157,536     577,440    2,047,104      7,085,952
N=5   360  3,240  20,520  112,680    573,480   2,785,320   13,107,240     60,293,160
N=6   432  4,752  37,152  253,152  1,603,152   9,703,152   56,953,152    326,953,152
N=7   504  6,552  60,984  496,440  3,762,360  27,276,984  191,879,352  1,320,581,304
N=8   576  8,640  93,312  883,584  7,798,464  65,883,456  540,244,224  4,335,130,368

These different values don't appear to have much of an impact on the overall bottom line; just over 13MB of traffic is generated in response with standard client settings. Let's take one more look and adjust some of the values: a = 30%, b = 40%, r = 10 and l = 60, giving R = 94.56. I believe this R to be the most realistic.

Bandwidth Generated in Bytes (R=94.56)

         T=1       T=2       T=3        T=4         T=5         T=6          T=7            T=8
N=2
N=3   283.68   1,418.4  4,822.56   13,900.3    36,594.7    91,061.3      218,150        508,638
N=4   378.24  2,647.68  12,860.2   53,710.1     206,897     758,371    2,688,530      9,306,220
N=5    472.8   4,255.2  26,949.6    147,986     753,170   3,658,050   17,214,200     79,185,000
N=6   567.36  6,240.96    48,793    332,473   2,105,470  12,743,500   74,798,500    429,398,000
N=7   661.92  8,604.96  80,092.3    651,991   4,941,123  35,823,800  252,002,000  1,734,360,000
N=8   756.48  11,347.2   122,550  1,160,440  10,242,000  86,526,900  709,521,000  5,693,470,000

Standard client settings yield a whopping 17MB generated in response to Joe's search query.

In order to better understand the results above, one must understand the Response Factor, R, and the reasoning behind it. Recent analyses of Gnutella networks show a small percentage of participants actually sharing content, and a disproportionately small percentage of those sharing actually having most of the content.
It is highly improbable that any means exists to statistically describe the widely varying response characteristics of participants in a GnutellaNet. R is a compromise for this difficult task, representing a gross mean, across an ideal GnutellaNet, of the responses we can expect the average query to generate. The key word here is ideal; we know these gross means exist, but they are as yet unmeasurable, or at least at this point unverifiable, given the quickly changing network topology.

Bringing it all together

So, now that we have all the pieces of the puzzle, let's fit them together. How much aggregate data, including request and response, is generated by Joe's search for "grateful dead live"? Let's combine h(n, t, s) with k(n, t, R) to get The Big Picture.

Bandwidth Generated in Bytes (S=83, R=94.56)

           T=1       T=2       T=3        T=4         T=5         T=6          T=7            T=8
N=2
N=3     532.68   2,165.4  6,565.56   17,635.3    44,313.7     106,748      249,773        572,133
N=4     710.24  3,975.68  17,176.2   66,990.1     247,069     879,219    3,051,410     10,395,200
N=5      887.8   6,330.2  35,664.6    183,261     894,685   4,224,530   19,480,500     88,250,700
N=6   1,065.36  9,228.96    64,231    410,161   2,494,410  14,688,700   84,524,900    478,031,000
N=7   1,242.92    12,672   105,075    802,470   5,844,690  31,245,100  284,530,000  1,929,530,000
N=8   1,420.48  16,659.2   160,398  1,426,040  12,101,800  99,546,700  800,659,000  6,331,440,000

The Big Picture: h(n, t, s) and k(n, t, R) combined.

What's really stunning about the above table is the stark realization that, in supporting numbers of users comparable to Napster's, Gnutella would generate an unbelievably significant 800MB-plus worth of data for just one of those users to search the entire network for "grateful dead live" and receive responses.

Our job is still not finished, though. What remains is to apply these statistics to observed query rates, to gain an understanding of the real-time impact of a GnutellaNet on a network.

Behold, The Firestorm

When Napster, Inc.
was served with an injunction designed to halt all file-sharing service through the Napster network, Gnutella and similar services experienced what is now commonly referred to as the "Napster Flood". While an inordinate number of users perceived the injunction as their personal charge to download as much as possible from Napster before the service was brought down, a great many still flocked to other file-sharing services such as Gnutella. During this period, Clip2 DSS observed query rates peaking at 10 queries per second, double the normal 3-5 per second. Exceeding 10 qps during periods of heavy usage these days is not unlikely.

The final item of interest in this paper is the extrapolation of bandwidth rates (per second) from the bandwidth costs calculated above and the observed query rates. For thoroughness, query rates for a quiet (3 qps), normal (5 qps), and burdened (10 qps) GnutellaNet are examined. For each test case, the main assumption is that Joe Smith's behaviour satisfies the typical user demographic.
Bandwidth rates for 3 qps (S=83, R=94.56)

          T=1        T=2        T=3        T=4        T=5        T=6        T=7      T=8
N=2
N=3   1.6KBps    6.5KBps   19.7KBps   52.9KBps  132.9KBps  320.2KBps  749.3KBps  1.7MBps
N=4   2.1KBps   11.9KBps   51.5KBps     201KBps    741KBps    2.6MBps    9.1MBps 31.2MBps
N=5   2.7KBps     19KBps    107KBps  548.8KBps    2.7MBps   12.7MBps   58.4MBps   264MBps
N=6   3.2KBps   27.7KBps  192.7KBps    1.2MBps    7.5MBps   44.1MBps  253.6MBps  1.4GBps
N=7   3.7KBps   38.1KBps  315.2KBps    2.4MBps   17.5MBps  123.7MBps  853.6MBps  5.8GBps
N=8   4.2KBps     50KBps  481.2KBps    4.3MBps   36.3MBps  298.6MBps    2.4GBps   19GBps

Bandwidth rates for 5 qps (S=83, R=94.56)

          T=1        T=2        T=3        T=4        T=5        T=6        T=7       T=8
N=2
N=3   2.7KBps   10.8KBps   32.8KBps   88.1KBps  221.6KBps  533.7KBps    1.2MBps   2.9MBps
N=4   3.6KBps   19.9KBps   85.9KBps     335KBps    1.2MBps    4.4MBps   15.3MBps    52MBps
N=5   4.4KBps   31.7KBps  178.3KBps  916.3KBps    4.5MBps   21.1MBps   97.4MBps 441.3MBps
N=6   5.3KBps   46.1KBps  321.2KBps    2.1MBps   12.5MBps   73.4MBps  422.6MBps   2.4GBps
N=7   6.2KBps   63.4KBps  525.4KBps      4MBps   29.2MBps  206.2MBps    1.4GBps   9.6GBps
N=8   7.1KBps   83.3KBps    802KBps    7.1MBps   60.5MBps  497.7MBps      4GBps  31.7GBps

Bandwidth rates for 10 qps (S=83, R=94.56)

           T=1        T=2        T=3        T=4        T=5        T=6        T=7        T=8
N=2
N=3    5.4KBps   21.6KBps   65.6KBps  176.2KBps  443.2KBps    1.1MBps    2.4MBps    5.8MBps
N=4    7.2KBps   39.8KBps  171.8KBps    670KBps    2.4MBps    8.8MBps   30.6MBps    104MBps
N=5    8.8KBps   63.4KBps  356.6KBps    1.8MBps      9MBps   42.2MBps  194.8MBps  882.6MBps
N=6   10.6KBps   92.2KBps  642.4KBps    4.2MBps     25MBps  146.8MBps  845.2MBps    4.8GBps
N=7   12.4KBps  126.8KBps    1.1MBps      8MBps   58.4MBps  412.4MBps    2.8GBps   19.2GBps
N=8   14.2KBps  166.6KBps    1.6MBps   14.2MBps    121MBps  995.4MBps      8GBps   63.4GBps

Keeping things in Perspective

From the charts above, it becomes mind-numbingly clear that the Gnutella distributed architecture is fundamentally flawed and can have a horrific impact on any network. On a slow day, a GnutellaNet would have to move 2.4 gigabytes per second in order to support numbers of users comparable to Napster. On a heavy day, 8 gigabytes per second.
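The rate tables above are just the per-query aggregate cost (query plus responses) scaled by the observed query rate; a sketch under the same assumptions (code mine, using the paper's formulas):

```python
def f(n, x, y):
    return sum(((n - 1) ** (t - 1)) * n for t in range(x, y + 1))

def h(n, t, s):
    # Query bandwidth generated (outbound relays of the s-byte query).
    return n * s + f(n, 1, t - 1) * (n - 1) * s

def k(n, t, R):
    # Response bandwidth generated (tier T's responses relayed T hops).
    return sum(f(n, T, T) * R * T for T in range(1, t + 1))

def rate(qps, n, t, s=83, R=94.56):
    # Sustained bandwidth: per-query aggregate cost times query rate.
    return qps * (h(n, t, s) + k(n, t, R))

# A "slow" day (3 qps) versus a burdened day (10 qps), at N=8, T=7:
print(rate(3, 8, 7) / 1e9)   # roughly 2.4 GBps
print(rate(10, 8, 7) / 1e9)  # roughly 8 GBps
```

These reproduce the 2.4GBps and 8GBps figures quoted above for the N=8, T=7 configuration.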
A lot of potentially obscure assumptions are made here, though, and they should be carefully examined and understood before drawing conclusions:

  * the test GnutellaNet is ideal, which is to say that all participants
    form a topology conforming to g(n, t);
  * being ideal, its topology is static -- meaning all responses to a
    search query are received by the requestor, without being cut off by
    transient nodes;
  * query rates are constant;
  * query demographics correlate to the average case presented above;
  * all GnutellaNet participants are capable of supporting the bandwidth
    rates incurred;
  * search queries and responses represent the only relevant and
    bandwidth-significant activity on the GnutellaNet.

So why should the above charts be taken with a grain of salt? Well, the real GnutellaNet that exists today is certainly not ideal, and has occasionally been observed persisting as several smaller, fractured GnutellaNets. Also, there is a great deal of transience in the GnutellaNet; observations show that only roughly 30-40% of participants remain for 24 hours or more. And it should be obvious to even the most casual observer that query rates are not constant, and are more likely to burst and lull as the topology shifts and usage varies.

One important factor in evaluating the usefulness of the above is the usage demographic. Current usage may show 3-5 queries per second with anywhere between 4,000 and 8,000 users, but if Gnutella were ever to grow in size, both in users and consequently in files, search rates would likely increase dramatically. This would be for at least two reasons: more users equates to more people interested in locating content, which equates to more aggregate queries per second; and more content equates to wider variance in the type of material, which equates to, quite simply, more to search for.
So, applying query rates involving only thousands of users to GnutellaNet populations orders of magnitude greater in size is probably inaccurate; instead, at greater sizes, the bandwidth rates computed above are probably much too small. Indeed, one can extrapolate from the above, using a test case of 1,000,000 users:

  * 8,000 users generate 5 queries per second, which simplified means
  * 1,600 users generate 1 query per second, which then leads to
  * 1,000,000 users / 1,600 users per query per second == 625 queries
    per second

Therefore it is more likely that, given an ideal GnutellaNet and a capable Internet, Gnutella with one million users would generate 625 queries per second, rather than our test case's 5 qps -- which by itself already generates 4GBps worth of traffic. So how much data does a query rate of 625 qps generate? The calculation is left as a thoughtful exercise for the reader.

Most important of all, though, the above numbers assume a capable network connection exists for all participants. If networks weren't capable of relaying the amounts of traffic discussed above, traffic jams would occur: query rates would drop, query response rates would drop, and overall traffic rates would drop as a result. And we know they aren't capable; we know that a significant percentage of participants are dialup users, whose low bandwidth capabilities cause significant traffic congestion and topology fragmentation when improperly configured.

Conclusions
------------------------------------------------------------------------

Even though many assumptions were made throughout the course of these calculations, some of them provably unrealistic, these exercises still yield a useful perspective. In an ideal world, Gnutella is truly a "broadband killer app" in the most literal of senses -- it can easily bring the Internet infrastructure to its knees.
And it should also be noted that only search query and response traffic was accounted for, omitting various other types of Gnutella traffic such as PING and PONG, and most importantly, the bandwidth costs incurred by actual file transfers. 2.4GBps is just search and response traffic; what about the obnoxiously large amount of bandwidth necessary to transfer files between clients?

Those reading this paper should be careful to note that non-intended uses of the GnutellaNet also incur noticeable bandwidth hits: using search queries to chat with other participants, SPAM placed inside search queries and results to advertise various things, and gibberish, typically resulting from misbehaving users or clients.

Furthermore, with individuals writing their own clients and protocol extensions, we may begin to see loop detection rendered useless. Depending on how individual clients implement loop detection (comparing message IDs versus comparing message IDs plus a checksum of the packet's payload), protocol extensions may interfere with legacy clients and result in more traffic than necessary being generated and relayed.

The main argument against this paper is that GnutellaNets are never ideal, and as adoption and usage grow, are statistically less likely to be ideal, given the increase in complexity of the topology as the number of participants increases. I would agree with this principle, but I believe it only serves as better proof of the premise: if an ideally distributed and fully capable network generates 2.4GBps to accommodate 1M users (and we already know this figure to be unrealistic in terms of what the modern Internet is capable of), then a poorly distributed network with insufficient bandwidth will certainly not be able to support the same number of participants or the traffic they generate. In other words, again, Gnutella can't scale.
Another key argument against these computations is that they all focus on the center of an ideal GnutellaNet, and that applying this generalization to all configurations of nodes is misleading and inaccurate. Traffic is measured and generalized from a maximizable point; that is, the "center" node will always generate the most traffic given the same configuration throughout, whereas a leaf node in an ideal GnutellaNet generates only a fraction of that bandwidth. However, empirical analysis yields the observation that, in practice, leaf nodes generally don't have only one connection into the GnutellaNet. As a matter of fact, leaf nodes don't tend to occur naturally at all, since it is rarely in a participant's best interest to limit themselves to one connection when weighing bandwidth capacity against search depth. To date I've only observed this happening on a large scale with Reflectors: strategically placed Gnutella "proxies" at high-bandwidth locations on the Internet, aimed at serving dialup and other small-capacity clients. So the inaccuracy of these numbers likely lies in their being, again, much too small. Also, regardless of how intertwined and convoluted the connection paths are, the data path is effectively rendered semi-ideal through loop detection, so the methodology turns out to be more realistic than first thought.

Yet another valid question to raise against the premise is: what is a reasonable size? Is it 100 users? Is it 1,000? Or 100,000? Or 1,000,000? Nothing short of global domination? Discerning what's reasonable is assuredly a subjective comparison; however, I use the phrase interchangeably with original statements like "Gnutella will kick Napster in the pants." Common sense dictates that in order to accomplish that, Gnutella would have to perform more efficiently, scale higher, and be more capable. These exercises prove that, on a perfect level, Gnutella just can't rise to meet the challenge.
Consequently, they prove that on an imperfect level Gnutella has no hope of performing at the same level. In the final assessment, it's painfully obvious that Gnutella needs a complete overhaul. The major architectural flaws are fundamental in nature and cannot be mitigated effectively without redesign at the most basic level. Some intelligent caching could likely benefit the Gnutella architecture, since observation shows that many searches and responses result in repetitive, duplicate transmissions. However, given the transience of GnutellaNet participants and the wide variety of participating clients, it would be difficult to predict with any accuracy how effective such technology would be.

Various efforts claim to be underway to redesign the protocol; among them, gPulp stands out as the farthest along, with message boards and mailing lists set up for those wanting to get involved. But with its mission of consensual changes implemented through a working group, I harbor significant doubt as to whether it will ever be timely and effective at producing an alternative. GnutellaWorld, another revamp effort recently publicized by CNet's news.com, takes the lead on the initiative for developing Gnutella2. J.C. Nicholas, apparently representing GnutellaWorld, claimed in an interview with CNet that Gnutella2 technology would be out "soon". Characterized as an "Internet Earthquake" and promised to be "the greatest revolution since Linux", Gnutella2 sounds more like the same old hype than anything else. And with only 8-9 months under their collective belt as an organization, I personally wonder how far along the effort could be. If this open-source project's quite empty CVS repository, or its apparently dormant mailing lists, present any indication of progress, the Internet probably has some time to go before experiencing the next Internet cataclysm.
Considering GnutellaWorld's intention of supporting 20 million people or more, I can only hope that it's nothing like the original Gnutella.

Permission to reproduce this document in part or in full is granted only under the condition that credit for the work is visibly given to the author, Jordan Ritter.