Export or Print
Is there a smart way to export all stories (including their comments) for one month, or to print them out?
Even though I am a net addict, I would like to keep a paper copy at some point.
nex
I don't know of any way to export the raw data directly from the database and, for example, import it into your local Antville installation. You could, however, create a local static mirror of the HTML pages using wget, or you could do something similar with your RSS feed. I'm not aware of an RSS reader that allows you to put all stories of one month on one page, e.g. for printing them out, but maybe you can find one at blogspace or so. And personally, I think you should keep the format electronic (full text search, perfect copies possible, no coffee spills) and let the poor trees alone :-)
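If you want to try the wget route, a single command like this already gives you a browsable copy on your disk (a rough, untested sketch; replace your.antville.org with your own blog's address):

wget -r -np -k http://your.antville.org/

-r follows the links, -np keeps wget from climbing up out of your blog, and -k rewrites the links so they work locally.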
Update: It turns out I lied above. I never actually used RSS, much less one of the various newsreaders. But now I checked out the link I provided above :-) Turns out that RSS is only about delivering headlines; if you want the full story, you get the HTML version anyway. Of the clients listed there, I think the only one I don't hate is peerkat. It allows you to use a MySQL database as its data store, which means that you could easily export the data from there (see the mysqldump line below) and process it further. But you won't get the whole stories, so I guess that's not what you want. The RSS people are working on a module that delivers all the content of a story, so once Antville supports this, we should really be able to suck a whole month's content into a local database in a bandwidth-efficient manner (spidering with wget downloads quite some redundant data).

By the way, I suggested wget several times already, but never wrote a HOWTO, because no one ever asked. If you need one, tell me.
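That mysqldump step would be nothing fancy, by the way; roughly something like this, where 'peerkat' just stands for whatever the database is actually called in your setup:

mysqldump -u yourusername -p peerkat > blog-export.sql

That leaves you with everything as plain SQL in blog-export.sql, which you can then massage into whatever format you like.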
sturmfisch
wow - that was a lengthy reply. thanks a lot for it.
why would I want a paper copy?
sometimes I think I am very conservative. say in 40 years I'm dead. wouldn't it be interesting for my children to know that I was maintaining a log, and to actually be able to read what was going on during that time? I can guarantee that my papers will still exist in 40 years; can antville do that as well?
I was also writing a "2002 review" and found it rather difficult to go through all the stories. A paper copy would still be easier and more user-friendly [needless to say I still love antville!!]
rss and wget
not sure if I am able to manage any of this, but I would give it a try if somebody would hold my hand or help a dummy ...
nex
I also don't rely on Antville to keep data that's important to me; I mirror my blog and save it with my own backups. I keep my backups in different places, so they'd survive even if the whole house burnt down, so this method is even safer than keeping a hardcopy. And I think for my children, paging through hypertext will be as natural as paging through folders of hardcopy is to us. However, this method requires regular care of the backups: CD-Rs won't last 40 years the way laser prints do, so you have to copy the data from time to time.
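In practice that just means copying the mirror somewhere else every now and then, e.g. something like this (both paths are only placeholders; the first is the folder wget creates when it mirrors your site):

cp -r your.antville.org /mnt/backup/blog-2003-01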
Anyway, about wget: This is a non-interactive spider/download tool, which is rather well known on Linux, but also available under DOS/Windows. You specify options in a command line or in a file that tell it what files to retrieve from the net (http or ftp) and it gets them. Just download it, skim over the help file and try it out.
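A first test can be as small as this; it only fetches your front page into the current directory:

wget http://your.antville.org/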
A perfect set of options would ensure that no redundant or unneeded pages are loaded; for example, you wouldn't need an edit form for every story in a static copy of your blog, since they wouldn't work anyway. It would also send a cookie with every request so your private offline-stories are also retrieved. I don't have the time to do this tonight, mainly because I'm tidying up my room and I have to finish that job before I go to bed, because all the stuff I'm moving around is temporarily stored on my bed :-)
But I can provide a starting point: Create a batch file/shell script named 'backup-blog', which executes this command:
wget http://your.antville.org --dot-style=binary -r -l 3 -np -k -t 1
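Under Linux, the whole 'backup-blog' file could literally be just this (on Windows, you would put the same wget line into a backup-blog.bat instead):

#!/bin/sh
# mirrors the blog into a folder named your.antville.org in the current directory
wget http://your.antville.org --dot-style=binary -r -l 3 -np -k -t 1

Make it executable with 'chmod +x backup-blog' and run it whenever you want a fresh copy.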
Explanation:
--dot-style=binary: just makes sure you get some neat feedback on what it's doing; it isn't important at all but looks cool.

-r: makes the retrieval recursive, i.e. it follows all links to other pages in the page located at the initial URL, and if these pages contain links, it follows them further and so on. You don't have to be afraid that it does any harm in your blog or trashes anything: firstly, wget is not logged in, so it doesn't get any edit or delete links, and it wouldn't be allowed to do something like that anyway. Secondly, everything that changes anything, like editing or commenting, isn't activated through a link, but through a button. wget will follow all 'comment' links, but it will get the 'login' page every time as a result.

-l 3: restricts the level of recursion to 3; i.e. if wget follows a link from your front page, then follows another link, and then another link again, it will stop there instead of going on forever. You might want to increase this number to cause all stories to be downloaded. Setting it to the number of months you have will make sure that every story will be available through the calendar, but not necessarily through the 'previous stories' links. E.g., suppose your blog is 4 months old and you set -l 4, but in one topic you have so many stories that they span 10 pages; then wget won't follow the 'previous' links all the way back to the 10th page, so they won't work in your local mirror. However, you will have saved all stories and can explore them by looking at the calendar or the folders on your disk.

-np: means "no parent" and is very, very important if you don't want to download the whole www. It restricts wget to only go deeper in the directory hierarchy and never up. This means, from your front page, it will advance into your topics and pages of single days, but it will never go up to www.antville.org and it won't follow links to yahoo.com or any other site, which would be quite a catastrophe.

-k: is a nifty option that converts absolute links to relative links, e.g. "your.antville.org/topics/bierdeckelsammeln" would be converted to something like "../../topics/bierdeckelsammeln". This is cool, because when you click the link in your local mirror, you won't be sent to antville, but to the local copy of that page. Of course, the link will only be converted if you really have a local copy of this page. In practice, this means you can click through the local copy of your blog and the page of every day will come from your disk in fractions of a second.

-t 1: specifies that if a page or file cannot be reached, wget will retry to get it only once. If your connection is unreliable, you might want to increase this number.

So there you are, a starting point. What this doesn't do is skip the redundant pages (like the useless edit forms), and it doesn't send a login cookie, so your private offline stories won't end up in the mirror.
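If you want to close those two gaps, something along these lines might work. It's untested: -R tells wget to skip pages whose name matches the listed patterns (assuming the edit and delete pages really are called 'edit' and 'delete'), and the cookie name and value are placeholders you'd have to copy out of your browser after logging in:

wget http://your.antville.org --dot-style=binary -r -l 3 -np -k -t 1 -R edit,delete --header="Cookie: <cookie-name>=<cookie-value>"

Be a bit careful with the cookie variant, though: with a login cookie, wget will also see the edit and delete links, so skipping them with -R is a good idea there.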
Sorry for writing such a long-winded story again! I'm suffering from geek syndrome and always have to explain everything I know; hope this helps :-)