Antville Help
Friday, 10. January 2003
Export or Print
Is there a smart way to export all stories (including their comments) for one month, or to print them out?

Even though I am a net addict, I would like to keep a paper copy at some point.


Re: Export or Print
I don't know of any way to export the raw data directly from the database and, for example, import it into your local Antville installation. You could, however, create a local static mirror of the HTML pages using wget, or you could do something similar with your RSS feed. I'm not aware of an RSS reader that allows you to put all stories of one month on one page, e.g. for printing them out, but maybe you can find one at blogspace or so.

And personally, I think you should keep the format electronic (full text search, perfect copies possible, no coffee spills) and leave the poor trees alone :-)

Update: It turns out I lied above. I had never actually used RSS, let alone any of the various newsreaders. But now I checked out the link I provided above :-) Turns out that RSS is only about delivering headlines; if you want the full story, you get the HTML version anyway. Of the clients listed there, I think the only one I don't hate is peerkat. It allows you to use a MySQL database as its data store, which means that you could easily export the data from there and process it further. But you won't get the whole stories, so I guess that's not what you want. The RSS people are working on a module that delivers all the content of a story, so once Antville supports this, we should really be able to suck a whole month's content into a local database in a bandwidth-efficient manner (spidering with wget downloads quite a lot of redundant data).

By the way, I suggested wget several times already, but never wrote a HOWTO, because no one ever asked. If you need one, tell me.


Re: Re: Export or Print
wow - that was a lengthy reply. thanks a lot for it.

why would I want a paper copy?
sometimes I think I am very conservative. say in 40 years I'm dead. wouldn't it be interesting for my children to know that I was maintaining a log, and actually be able to read what was going on during that time? I can guarantee that my papers will still exist in 40 years, can antville do that as well?
I was also writing a "2002 review" and found it rather difficult to go through all the stories. A paper copy would still be easier and more user-friendly [needless to say I still love antville!!]

rss and wget()
not sure if I am able to manage any of this, but I would give it a try if somebody would hold my hand or help a dummy ...


Re(2): Export or Print
I also don't rely on Antville to keep data that's important to me; I mirror my blog and save it with my own backups. I keep my backups in different places, so they'd survive even if the whole house burnt down, which makes this method even safer than keeping a hardcopy. And I think for my children paging through hypertext will be as natural as paging through folders of hardcopy is to us. However, this method requires regular care of the backups: CD-Rs won't last as long as 40 years (as laser prints do), so you have to copy the data from time to time.

Anyway, about wget: This is a non-interactive spider/download tool, which is rather well known on Linux, but also available under DOS/Windows. You specify options in a command line or in a file that tell it what files to retrieve from the net (http or ftp) and it gets them. Just download it, skim over the help file and try it out.

A perfect set of options would ensure that no redundant or unneeded pages are loaded; for example, you wouldn't need an edit form for every story in a static copy of your blog, since they wouldn't work anyway. It would also send a cookie with every request so your private offline-stories are also retrieved. I don't have the time to do this tonight, mainly because I'm tidying up my room and I have to finish that job before I go to bed, because all the stuff I'm moving around is temporarily stored on my bed :-)

But I can provide a starting point: Create a batch file/shell script named 'backup-blog', which executes this command:

wget http://your.antville.org --dot-style=binary -r -l 3 -np -k -t 1

Explanation:
  • wget is the name of the program
  • next comes the parameter for the program, namely the URL which we use as starting point
  • followed by some options... dot-style just makes sure you get some neat feedback on what it's doing; isn't important at all but looks cool
  • -r makes the retrieval recursive, i.e. it follows all links to other pages in the page located at the initial URL and if these pages contain links, it follows them further and so on. You don't have to be afraid that it does some harm in your blog or trashes anything: firstly, wget is not logged in, so it doesn't get any edit or delete links and it wouldn't be allowed to do something like that anyway. Secondly, everything that changes anything, like editing or commenting, isn't activated through a link, but through a button. wget will follow all 'comment' links, but it will get the 'login' page every time as a result.
  • -l 3 restricts the level of recursion to 3; i.e. if wget follows a link from your frontpage, then follows another link, and then another link again, it will stop there instead of going on forever. You might want to increase this number to cause all stories to be downloaded. Setting it to the number of months you have will make sure that every story will be available through the calendar, but not necessarily through the 'previous stories' links. E.g., suppose your blog is 4 months old and you set -l 4, but in one topic you have so many stories that they span 10 pages, then wget won't follow the 'previous' links all the way back to the 10th page, thus they won't work in your local mirror. However, you will have saved all stories and can explore them by looking at the calendar or the folders on your disk.
  • -np means no parent and is very, very important if you don't want to download the whole www. It restricts wget to only go deeper in the directory hierarchy and never up. This means, from your front page, it will advance into your topics and pages of single days, but it will never go up to a parent directory. (Strictly speaking, staying away from other hosts such as www.antville.org or yahoo.com is what wget does by default in recursive mode unless you add -H; -np additionally stops it from climbing above the starting directory.) Following links all over the web would be quite a catastrophe.
  • -k is a nifty option that converts absolute links to relative links, e.g. "your.antville.org/topics/bierdeckelsammeln" would be converted to something like "../../topics/bierdeckelsammeln". This is cool, because when you click the link in your local mirror, you won't be sent to antville, but to the local copy of that page. Of course, the link will only be converted if you really have a local copy of this page. In practice, this means you can click through the local copy of your blog and the page of every day will come from your disk in fractions of a second.
  • finally, -t 1 sets the number of tries to 1, i.e. if a page or file cannot be reached, wget gives up on it after the first attempt instead of retrying. If your connection is unreliable, you might want to increase this number.
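
Put together as a script, the command and options explained above might look like the sketch below. The variable names BLOG_URL, DEPTH and TRIES, the helper backup_command and the 'run' argument are placeholders made up for this illustration, not anything wget or Antville prescribes. Without 'run' the script only prints the command, so it can be checked first.

```shell
#!/bin/sh
# backup-blog: sketch of a wrapper around the wget command above.
# BLOG_URL, DEPTH and TRIES are placeholder names; adjust them to your blog.
BLOG_URL="${BLOG_URL:-http://your.antville.org}"
DEPTH="${DEPTH:-3}"   # -l: how many links deep wget follows
TRIES="${TRIES:-1}"   # -t: number of attempts per page

# Print the full wget command line without running it.
backup_command() {
    echo "wget $BLOG_URL --dot-style=binary -r -l $DEPTH -np -k -t $TRIES"
}

if [ "${1:-}" = "run" ]; then
    # Actually start the mirror.
    wget "$BLOG_URL" --dot-style=binary -r -l "$DEPTH" -np -k -t "$TRIES"
else
    # Default: only show what would be executed.
    backup_command
fi
```

Called as `DEPTH=6 ./backup-blog run`, this would follow links six levels deep instead of three.
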
So there you are, a starting point. What this doesn't do is
  1. get only one month
  2. get offline stories, the image pool, or the file pool
  3. avoid retrieving unnecessary pages
However, this could surely be implemented. Maybe someone else would like to go on from here and post a better set of options, or I'll have time to think something up next week.
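
A possible direction for points 2 and 3, as an untested sketch: wget's --load-cookies option reads cookies from a Netscape-format cookies.txt file and sends them with each request, and -R takes a comma-separated list of file name suffixes or patterns to reject. The cookie file name and the reject list below are hypothetical; what the login cookie and the edit or delete URLs of a given blog actually look like would have to be checked first. Note that a logged-in wget would see edit links that it does not see anonymously, hence the reject list.

```shell
#!/bin/sh
# Sketch only: extra wget options for offline stories and unneeded pages.
# COOKIE_FILE and REJECT_LIST are hypothetical values, not taken from Antville.
COOKIE_FILE="${COOKIE_FILE:-cookies.txt}"   # Netscape-format file holding your login cookie
REJECT_LIST="${REJECT_LIST:-edit,delete}"   # assumed names of pages to skip

# Print the extra options to append to the wget command line.
extra_options() {
    echo "--load-cookies $COOKIE_FILE -R $REJECT_LIST"
}

extra_options
```

The printed options would simply be appended to the wget command from the starting point above.
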

Sorry for writing such a long-winded story again! I'm suffering from geek syndrome and always have to explain everything I know; hope this helps :-)


