Reverse-Engineering the Internet

Posted by Jeff Barnett on April 17th, 2006

I recently read an article on www.physicsweb.org by Albert-Laszlo Barabasi titled “The Physics of the Web.” The full text of the article can be found here [http://physicsweb.org/articles/world/14/7/9]. It was a feature in Physics World in July 2001, which makes it quite old given the constantly evolving nature of the internet, but I found it immensely interesting. Mr. Barabasi attempts to make sense of the physical and virtual web by analyzing each network with graph theory, a branch of mathematics that examines the nodes and links that comprise a network.

First, he distinguishes between the terms “internet” and “world wide web,” or simply “web.” The internet is made up of physical workstations and the connections that carry information between them. The internet that we know is a system of computers, routers, switches, ethernet cables, phone lines, and fiber optic cable. Each node of the internet (a computer, router, etc.) is connected to another node via a hard-wired or wireless data link. The web, however, is made up of documents that are stored, transferred, and viewed on the internet. The web is the endless number of html, pdf, and other files that you view on your computer, often in a web browser. Nodes on the web (documents) are connected to other nodes by URL links. Unlike most networks, however, on the web a connection from one node to the other (a link from one site to another) does not imply a reverse link. For example, I have a link on this page to the University of Alabama, Huntsville, but UAH has no link to my page on their site. In contrast, all links on the internet are two-way links.
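To make the one-way versus two-way distinction concrete, here is a minimal Python sketch. The node names (`this_blog_post`, `uah_home_page`, `home_pc`, `isp_router`) and the helper functions are hypothetical and chosen only for illustration; they do not come from the article.

```python
# Web links are one-way: each page maps to the set of pages it links to.
# Node names here are hypothetical placeholders, not real URLs.
web_links = {
    "this_blog_post": {"uah_home_page"},  # this page links to UAH
    "uah_home_page": set(),               # UAH does not link back
}

# Internet links are two-way: a cable carries traffic in both directions.
internet_links = {}


def add_cable(graph, a, b):
    """Record a physical connection between a and b in both directions."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def has_link(graph, source, target):
    """Return True if source connects directly to target."""
    return target in graph.get(source, set())


add_cable(internet_links, "home_pc", "isp_router")

print(has_link(web_links, "this_blog_post", "uah_home_page"))  # True
print(has_link(web_links, "uah_home_page", "this_blog_post"))  # False
print(has_link(internet_links, "isp_router", "home_pc"))       # True
```

In graph theory terms, the web behaves like a directed graph and the internet like an undirected one.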

The study set out to determine whether the seemingly random construction of the internet and web was actually random at all. New nodes are added to each network every second of every day, with no central plan and no one able, for that matter, to keep track of it all. The study determined that both networks are what the author terms “scale-free,” meaning that the distribution of links among web pages and connections among internet nodes follows a power-law distribution rather than a binomial distribution. I understand this to mean the following: The author expected to find that most nodes on the network would have an average number of links to other nodes, let’s arbitrarily say five, and that the number of nodes with significantly more or fewer than five links would be small, and would get smaller the further away from five links you get, similar to a bell curve. Instead, the researchers found that most nodes have very few links, and that nodes with a great number of links are rare but far more common than a bell curve would allow.
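A quick calculation shows how different the two distributions are in the tail. This is only a sketch under stated assumptions: the network size of 1,000 nodes, the average of five links, and the power-law exponent of 2.1 are illustrative values I picked, not numbers taken from this post.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k links when each of n possible links exists with probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def power_law_pmf(k, gamma, k_max):
    """Probability of exactly k links under a power law k**-gamma, normalized over 1..k_max."""
    norm = sum(j**-gamma for j in range(1, k_max + 1))
    return k**-gamma / norm


N_OTHERS = 1000   # assumed number of other nodes a node could link to
P_LINK = 0.005    # gives an average of 5 links, as in the example above
GAMMA = 2.1       # assumed power-law exponent, for illustration only

for k in (5, 20, 50):
    bell = binomial_pmf(k, N_OTHERS, P_LINK)
    scale_free = power_law_pmf(k, GAMMA, N_OTHERS)
    print(f"{k:>3} links: bell curve {bell:.3e}, power law {scale_free:.3e}")
```

Under these assumptions, a node with 50 links is essentially impossible in the bell-curve network but merely uncommon in the power-law one, which is why heavily linked hubs show up at all.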

This has a few implications. First, both the internet and the web were found to follow approximately the same pattern, that of scale-free networks. Even though they are intimately connected in our minds, there is no real reason I can think of that this should have happened. Second, this implies both networks are “clustered,” meaning that just a few nodes are responsible for a great many links. This makes sense in the light of search engines, which, according to the article, map a maximum of only 16% of the web as of 2001. I also believe this may be due to groups that share similar interests, such as a corporation or a gaming forum. These groups have a vested interest in being connected, the former physically and the latter virtually, to others that share their interests. For example, I would venture a guess that ISPs provide internet service to a vast multitude of home users that comprise the “very few links” category. In contrast, they probably also provide service to just a handful of large corporations that comprise the “very many links” category. On the web you can see a similar pattern. Sites like Google, Yahoo, ESPN, Digg, and various other news blogs provide a huge number of links to documents all over the web. In contrast, the sites they link to are often smaller sites (this one, perhaps) with a comparatively smaller number of links. In both networks the vast majority of nodes have just a few links, and it is the rare node that has an abundance of links.

Another interesting point is the principle of “19 clicks of separation.” Much like the six degrees of separation between all of us and Kevin Bacon (or so I’m told), on the world wide web an average of 19 links (or clicks) separate any two randomly selected web pages, assuming a path exists between the two (yes, a big assumption). The study took the number of web pages thought to exist in 1999, 800 million, and calculated that an average of 19 links separate any two. This number grows with the number of nodes on the network, but only slowly (logarithmically), so it is no doubt a bit larger today. It’s still neat to think that you might be able to start at the home page of the British Embassy in South Africa and 20-21 clicks later (assuming the growth of the web since 1999) end up at Midnight in Iraq.
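The 19-click figure can be reproduced with a short calculation. The formula below, average separation = 0.35 + 2.06 × log10(number of pages), is not quoted anywhere in this post; it is the fit I recall from the 1999 study of the web's diameter by Albert, Jeong, and Barabasi, so treat the constants as an assumption. The function name is my own.

```python
import math


def average_separation(num_pages):
    """Estimated average number of clicks between two pages, assuming a path exists.

    Uses the assumed logarithmic fit 0.35 + 2.06 * log10(N).
    """
    return 0.35 + 2.06 * math.log10(num_pages)


pages_1999 = 800_000_000
print(round(average_separation(pages_1999)))       # 19
print(round(average_separation(10 * pages_1999)))  # 21
```

Because the growth is logarithmic, a web ten times larger adds only about two clicks, which is consistent with the 20-21 click guess above.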

Also present in the web is a principle of “the rich get richer,” with respect to links and nodes, of course. The author found that some web pages were inherently more attractive to links than others because of their content. A well-designed page with regularly updated content will be linked to more often than a poorer page. These web sites often continue to grow by leaps and bounds at a pace that outstrips that of slightly lesser sites. If you have a good web site, it will continue to grow and receive links, and this will only compound over time as long as it retains its attractiveness. Poor web sites, however, are locked into a vicious cycle of never having many links.
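The “rich get richer” effect is easy to simulate. The sketch below uses plain preferential attachment: each new page links to existing pages with probability proportional to how many links they already have. It deliberately leaves out the content-quality factor described above, and the function name, network size, and two-links-per-page setting are my own assumptions.

```python
import random
import statistics
from collections import Counter


def grow_network(num_nodes, links_per_node=2, seed=0):
    """Grow a network by preferential attachment and return each node's link count."""
    rng = random.Random(seed)
    # Every node appears in this list once per link it has, so a uniform
    # random pick from the list favors nodes that are already well linked.
    endpoints = [0, 1]  # start with two nodes joined by one link
    for new_node in range(2, num_nodes):
        targets = set()
        while len(targets) < links_per_node:
            targets.add(rng.choice(endpoints))
        for target in targets:
            endpoints.extend([new_node, target])
    return Counter(endpoints)


degrees = grow_network(10_000)
print("typical (median) links:", statistics.median(degrees.values()))
print("most-linked node has:", max(degrees.values()))
```

Even though every node joins with the same two links, the earliest and luckiest nodes end up with far more links than the typical node, simply because links attract more links.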

The study goes on to investigate the resilience of scale-free networks to both random failures and the efforts of hackers. It found the internet surprisingly resistant to random failures, noting that a large number of random nodes have to be removed before the network will fragment into small, unconnected parts. It quotes the statistic that 3% of the internet’s routers are down at any given time, and this doesn’t even come close to fragmenting the network. Unfortunately, a deliberate attack is a different story, as the removal of a few of the most-connected nodes can cripple a scale-free network. As we know, attacks from hackers are never random, and are usually directed at the targets of greatest value, i.e., those with the largest numbers of nodes connected to them.
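The difference between random failures and targeted attacks can be sketched in a few lines. This is a toy experiment under my own assumptions (a 2,000-node network grown by preferential attachment, 5% of nodes removed), not a reproduction of the study's method, and all function names are hypothetical.

```python
import random


def build_scale_free(num_nodes, links_per_node=2, seed=0):
    """Grow a network where new nodes prefer already well-linked nodes."""
    rng = random.Random(seed)
    neighbors = {0: {1}, 1: {0}}
    endpoints = [0, 1]  # each node listed once per link it has
    for new_node in range(2, num_nodes):
        targets = set()
        while len(targets) < links_per_node:
            targets.add(rng.choice(endpoints))
        neighbors[new_node] = set(targets)
        for target in targets:
            neighbors[target].add(new_node)
            endpoints.extend([new_node, target])
    return neighbors


def largest_component(neighbors, removed):
    """Size of the biggest connected piece once the removed nodes are gone."""
    seen = set(removed)
    best = 0
    for start in neighbors:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        best = max(best, size)
    return best


network = build_scale_free(2000)
count = len(network) // 20  # knock out 5% of the nodes

random_failures = random.Random(1).sample(sorted(network), count)
targeted_attack = sorted(network, key=lambda n: len(network[n]), reverse=True)[:count]

after_failures = largest_component(network, random_failures)
after_attack = largest_component(network, targeted_attack)
print("largest piece after random failures:", after_failures)
print("largest piece after targeted attack:", after_attack)
```

Removing the same number of nodes does noticeably more damage when the most-connected nodes are chosen, because those hubs are what hold the rest of the network together.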

So what does this all mean besides a mental workout? Well, I think there are some lessons to be learned here for publicizing web pages, that is, for the effort to become that rare node with a multitude of links. First, have a website with “attractive” content. Easier said than done, huh? If your site lacks good content it appears you will never get off the ground. However, the presence of good content will cause you to grow exponentially. Attractive content seems to be directly linked to new or regularly updated web pages. Second, try to be a site with lots of incoming links. I believe you achieve this by connecting different communities. When I have unrestricted high-bandwidth access to the internet my surfing generally spans a few different genres: shooting and firearms, video gaming, electronics and technology, and news. Each of these genres has popular sites with many similarly themed sites linked to them. If you can get sites from multiple genres to link to your site, then you have expanded your audience to the many web surfers of each of those genres. Third, get over the hump. The study quoted predicts that above a certain level of popularity (linked-ness) a website will flourish and grow exponentially, and below that threshold it will stagnate. I can’t tell you where the “hump” is, but apparently it’s an important obstacle to overcome.

The last point I want to address was not really addressed in the original article at all, but is a product of my own reasoning. One quote from the article reads “we all feel that behind every complex system there is an underlying network with non-random topology.” This brought to mind the hotly debated topic of evolution versus intelligent design and what is allowed to be taught in public schools. With all the negative press given to intelligent design as of late, I found this statement intriguing. I believe most critically thinking individuals look at a system as complex as our world and know that “there is an underlying network with non-random topology.” Just like the internet, somebody made it. The second law of thermodynamics states (in a nutshell) that all natural systems will tend towards disorder over time. This raises the question, “If the earth is a natural system, why has it tended towards order in the form of the creation of life?” I’m not here to preach, but I couldn’t help but notice the correlation between finding a structure at the heart of the internet and world wide web, and the idea that there was and is structure present in this place we call earth.



Reader Comments

Wow, interesting. Kinda hard to understand all that Internet vs. Web stuff though. :) I’ve always been under the impression that they were one and the same. Just goes to show how much I know.

I did not tackle this gem until Tuesday AM, being content to comment along simpler lines last evening, but today I have read this and it was worth it. Though the entirety is incredibly interesting to explore and to consider, what is the most noteworthy is the “product of your own reasoning”.

I read somewhere in the last year that a scientist, an elderly atheist, who had been at the center of the Darwinian support system, had acknowledged publicly in a scientific journal that he no longer gave full “devotion” to the concept. When a fellow scientist had sufficiently recovered from his apoplexy to question why, the other fellow said that from the time he had heard of DNA he could no longer credit natural selection. Asked the even more appalling thought about his “other” belief, he further acknowledged that he no longer could consider himself an atheist but had moved to being an agnostic - another journey of a million miles, accomplished within the cranial synapse system.

A man or woman with an open mind will find all the proofs necessary to support what the aforementioned scientist is journeying toward. Despite the chaos that man manages to endlessly produce, there is order in the universe, on earth and in the DNA.

Thank you for this early morning brain stretch and your inspired and inspiring reasoning.

Be safe, Midnight.

I enjoy all that you share here but I especially enjoyed this one. I had to save it for Friday night so I had time to read it since a quick skim of the first paragraph was enough to know I’d need more time to follow and absorb all you wrote. The last in-depth book I read on the Internet and World Wide Web was 10 or 15 years ago and you can imagine how different that was, with gophers and archies being the cool techy terms of the now-ancient original surfers =)
(Yes I was a cool techy nerd back then for about a minute until all of my vast expertise became outdated in the time it took to simply read the book)
I found your current discussion very interesting.
And your personal reasoning in conclusion was not only logical but, as always, insightful and thought provoking.
Thanks!

Yrizaria,
Thanks for the comments. I’m sure it’s all the more interesting to have been an internet “old codger” and have seen it grow into what it is today.

Just to clear up any confusion (since places like dictionary.com like to completely eff up a definition), an Atheist is simply someone who lacks belief in Theism.

Theism is the belief in a god or gods that intervene in the world or communicate directly with humans (e.g., Christianity, Islam, etc.). Atheism is NOT the proclamation that god doesn’t exist, period. It’s NOT the “doctrine that there is no god or gods”; it’s the position that “none of the presented theistic gods do I believe to be real.”

One can believe in a god and still be Atheist as long as the God you believe in isn’t intervening. Such people are called Deists, and all Deists are also Atheists since they lack a belief in theism.

In other words, one can be Atheist and also believe in Intelligent Design (Creation). Atheism says nothing about what a person believes to be true; it simply describes ONE thing someone believes NOT to be true: all currently presented theistic religions.

Thank you.

Really, there is no such thing as Atheism. Everyone knows there is a Creator; it’s so obvious in nature and the complexity of the human body. Everyone knows there is a God, they just choose to suppress the truth in unrighteousness and not acknowledge it.

D,

Interesting points. Thanks for the clarification. I would point out that the word “all” is pretty inclusive.

Slayer,

Interesting point as well.

Keep this civil, gentlemen.

“Keep this civil, gentlemen.”

Yes, Sir. :)

Um, a friend of mine, who reads this blog, pointed something out to me that I totally missed… I’m just that slow.

Diesel, I did not realize by what you posted that you may be an Atheist. If you are, I did not mean to offend you in any way. Please accept my sincere apologies.

Midnight, I didn’t know what you were saying “keep this civil” for… now I do. Heh, sry. I feel stupid, as well as embarrassed.

Don’t feel bad or embarrassed, Slayer. The common understanding of the word Atheist is simply “someone who doesn’t believe there is a God”. Diesel’s definition was enlightening and pointed out a very important misconception.
There are many people who consider themselves “religious” who believe firmly that an intelligent force created the universe but also believe that whatever Supreme Being may have done so, he/she/it is either (1) long dead, (2) has lost interest in this particular experiment which we call planet earth, or (3) is deliberately observing but not interfering for reasons so far beyond our understanding we can’t even begin to ponder (although many religions try to explain those reasons in terms we can understand and stress patience while we await the outcome).
Either way, the point here was Intelligent Creation rather than proof of the continued existence of God or divine intervention in our lives today.
Ancient history versus current events. =)

Slayer,
No need for embarrassment. It appears the discussion is proceeding as one should between ladies and gentlemen.

Yrizaria,
Very interesting points on 1,2,3.