From 240e0c2451d02b3f1eea2da66b5fd964a6eb59f6 Mon Sep 17 00:00:00 2001
From: Lucian Mogosanu
Date: Sat, 9 Feb 2019 13:37:52 +0200
Subject: [PATCH] posts: 083, draft

---
 posts/y05/083-gutenberg-rsync.markdown | 63 ++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)
 create mode 100644 posts/y05/083-gutenberg-rsync.markdown

diff --git a/posts/y05/083-gutenberg-rsync.markdown b/posts/y05/083-gutenberg-rsync.markdown
new file mode 100644
index 0000000..df7bbac
--- /dev/null
+++ b/posts/y05/083-gutenberg-rsync.markdown
@@ -0,0 +1,63 @@
+---
+postid: 083
+title: Rsync'ing Project Gutenberg, a report
+date: February 10, 2019
+author: Lucian Mogoșanu
+tags: tmsr
+---
+
+From [the logs][btcbase-1892264]:
+
+> **mircea_popescu**: incidentally, either spyked or lobbes what do you
+> need to make a complete gutenberg.org copy ? it IS going away, for one
+> thing the initiator guy died and for the other thing, with their
+> world-famous
+> [http://btcbase.org/log/2017-03-15#1627828][btcbase-1627828] there's
+> no way they'll stay online all that long.
+> a111: Logged on 2017-03-15 23:50 **mircea_popescu**: which
+> incidentally - has been read TODAY by more people than read ALL of
+> marcel proust's works since the making of gutenberg.org
+>
+> **mircea_popescu**: should prolly also salvage
+> [http://www.perseus.tufts.edu/hopper/][hopper] but that's going to be
+> more work than a straight download & strip headers job.
+> **asciilifeform**: mircea_popescu: apparently gutenberg is rsync'able
+> ( [https://archive.is/PWeNA][gutenberg-mirroring] ) , tho i haven't
+> tried
+> **mircea_popescu**: aha. not much work.
+
+Thusly [proceeding][tmsr-schedule], I read the
+"[Mirroring How-To][gutenberg-mirroring]" guide, which pointed me to a
+place called [ibiblio.org][ibiblio], which supposedly contains a full
+mirror of gutenberg.org -- supposedly, because on a first attempt, one
+can easily notice that their ftp doesn't contain said item, or if it
+does, it's hidden so well that I could not find it.
+
+However, further down the line in the mirroring wiki-guide, we are given
+the anchor to a [list of mirrors][gutenberg-mirrors]. Similarly, I
+randomly selected a couple of links, finding that they either timed out
+or didn't contain the gutenberg mirror they purport to. Fortunately, the
+third choice, `rsync.mirrorservice.org`, worked, in that I could:
+
+~~~~
+$ rsync -av --del rsync://mirrorservice.org/gutenberg.org/ guten
+~~~~
+
+and after three days or so of downloading, I have sitting somewhere
+circa 800GB of files that on a cursory glance seem to contain books and
+other assorted items, e.g. mp3 files and DVD images.
+
+The mirror is currently resting on a private machine, but I will make it
+available in the following months, after some disk acquisition and
+swapping which will allow me to host it at house Mogosanu. Meanwhile, I
+expect that for now (and probably only in the very near future), the
+step above should be reproducible by other folks who wish to maintain
+their own mirror.
+
+[btcbase-1892264]: http://btcbase.org/log/2019-02-04#1892264
+[btcbase-1627828]: http://btcbase.org/log/2017-03-15#1627828
+[hopper]: http://www.perseus.tufts.edu/hopper/
+[gutenberg-mirroring]: https://archive.is/PWeNA
+[tmsr-schedule]: /posts/y05/082-tmsr-schedule-i.html#selection-200.0-200.7
+[ibiblio]: https://www.ibiblio.org/
+[gutenberg-mirrors]: https://archive.is/6k2uP
-- 
1.7.10.4