The problem with blogging
Thursday, April 05, 2007
I want to solve one of the problems with blogging. And I have a rather silly solution.
It's not that blogging has many issues (some people may disagree here), but there is one particular problem I'm interested in: the dead link, or 404, problem.
By the "404 linking" problem I mean, essentially, dead or stale links.
404 is the status code the HTTP specification assigns to the "resource not found" error (officially "Not Found"). It is a convention: anyone writing a web server knows what to return when something can't be found, and the browser can then show the user a friendly message, i.e., the dreaded "page not found" error.
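To make the convention concrete, here is a minimal link-checker sketch in Python using only the standard library. The function names `classify_status` and `check_link` are hypothetical, and sending a HEAD request is an assumption on my part (some servers answer HEAD with 405, in which case a GET would be needed). It just shows how a dead link shows up as status 404.

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Map an HTTP status code to a rough link-health label."""
    if status == 404:
        return "dead"          # 404 Not Found: the resource is gone
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "redirect"
    return "error"             # any other 4xx/5xx status


def check_link(url: str, timeout: float = 10.0) -> str:
    """Send a HEAD request and classify the response (needs network access)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        # urlopen follows redirects, so a moved page reports its final status
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx; the status code is still available
        return classify_status(err.code)
    except OSError:
        # URLError (DNS failure, refused connection) and socket timeouts
        # are both OSError subclasses
        return "unreachable"
```

Running `check_link` over every link in an old entry would flag the ones that have turned into 404s.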
One of the major aspects of blogging is linking to other sources on the internet (it is, after all, the lifeline of the web), but many sites go out of date quickly, so whenever anyone looks at older entries in a blog, some of the links are no longer valid.
What's a blogger to do, if he or she wants to keep blog entries relevant for all eternity?
My solution is simple, but I'm not sure whether it breaks copyright law. Maybe not; isn't everything on the web free? ;)
What I do is take a snapshot of the original content, either as an image or as a copy of the actual HTML, host it locally, and link to it in my blog entry.
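For the HTML route, a small script could automate the copying. The sketch below is hypothetical: the names `snapshot_name`, `save_snapshot` and `fetch_and_snapshot`, the hashed file names, and the JSON "sidecar" file recording the original URL are my own assumptions, not a description of how I actually do it. The page bytes are stored unchanged so the local file really is an exact copy, and the credit to the original lives in the separate metadata file.

```python
import hashlib
import json
import urllib.request
from datetime import datetime, timezone
from pathlib import Path


def snapshot_name(url: str) -> str:
    """Derive a stable, filesystem-safe file name from the original URL."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]


def save_snapshot(url: str, content: bytes, directory: Path) -> Path:
    """Store an exact copy of a page plus a sidecar file crediting the source."""
    directory.mkdir(parents=True, exist_ok=True)
    name = snapshot_name(url)
    page_path = directory / f"{name}.html"
    page_path.write_bytes(content)  # bytes written unchanged: an exact copy
    meta = {
        "original_url": url,
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
        "note": "Exact copy of the original page; see original_url for the source.",
    }
    meta_path = directory / f"{name}.json"
    meta_path.write_text(json.dumps(meta, indent=2), encoding="utf-8")
    return page_path


def fetch_and_snapshot(url: str, directory: Path, timeout: float = 10.0) -> Path:
    """Download a page (needs network access) and store it locally."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return save_snapshot(url, response.read(), directory)
```

The blog entry would then link to both the original URL and the returned local path. Hashing the URL keeps file names short and avoids characters like `?` or `/` that don't belong in file names.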
I only do this for a few sites that I think are particularly important to the context of my entries. Of course, I still link to the original site, and I clearly mark the local content as an exact copy of the original. My legal defense is that I'm not selling the information, I'm giving full credit for the content, and my site is not commercial.
It's more of a referencing exercise. Without references, there would be no new academic work or any new books written, for that matter. So I can use any idea, as crazy as it may be, as long as I give credit where credit is due.
And there are some things you can copy without permission. For example, in Canada I can reproduce up to 10% of any copyrighted material for academic purposes without breaking the law. The source of this information was the WLU librarian on duty a few weeks ago, and I'm assuming he knows what he is talking about; I recommend checking with your friendly neighborhood librarian to find out what the rules are where you live.
So my solution is not really a solution, but a hack. What I'm trying to design is a permanent, fully legal, open source solution: one that will work for everyone, forever.
What's similar out there? On social news sites such as Slashdot or Digg, mirrors are used temporarily to solve the "slashdotting" problem.
These mirror sites copy the content locally so web surfers can visit the mirror instead of the original, which relieves the immense load that Slashdot or Digg generates when a story makes it to the front page.
This works, and it has actually spawned a fringe industry based on advertising revenue. But it's only temporary: there is neither the bandwidth nor the incentive to keep a permanent mirror of other sites' content. In addition, it's probably illegal.
Another possible solution is the Internet Archive's Wayback Machine. But there are two main reasons this won't work: first, who knows when the site will go belly up and stop archiving web sites; second, it only takes snapshots at predetermined dates, so it is not really archiving everything published.
To me, the 404 blogging issue needs fixing, but my solution of copying the content and offering it locally does not scale.
So, unless every site on the internet stays up for as long as mine does, which is the ideal, what other solution is there?
I'm hoping to come up with a rather good solution, although I'm not working too hard on it.