A request for help on wiki backup

Messages from and Discussions about IMSLP

Moderator: kcleung

imslp
Site Admin
Posts: 1639
Joined: Thu Jan 01, 1970 12:00 am

A request for help on wiki backup

Post by imslp »

Just to bring attention to this matter, I'm re-posting this as an announcement.

Currently I'm working on making available a copy of the IMSLP wiki database, including the page text. However, I am also working on several other IMSLP-related feature implementations, and I would be extremely grateful if someone with time could research a way to back up the wiki database that:
  1. Is not too system-intensive (I'd prefer SQL backups over XML),
  2. Can be used to recreate a fully functional wiki (i.e. the wiki after re-creation should be pretty much fully functional, not just dead text),
  3. Contains no private or sensitive information.
I note here that IMSLP uses MediaWiki 1.11.0. Also, this backup system will continue to be used after IMSLP is back online.

Edit: Actual code, whether shell code or script code, to accomplish the above is extremely appreciated.
Last edited by imslp on Sat Nov 17, 2007 4:38 pm, edited 1 time in total.
Leonard Vertighel
Groundskeeper
Posts: 553
Joined: Fri Feb 16, 2007 8:55 am

Re: A request for help on wiki backup

Post by Leonard Vertighel »

imslp wrote:
  2. Can be used to recreate a fully functional wiki (i.e. the wiki after re-creation should be pretty much fully functional, not just dead text),
  3. Contains no private or sensitive information.
The problem is that the database dump should not contain the user password hashes. This means that users could not log in to a wiki set up from the dump. An admin would thus have to re-enable the accounts as users claim them, trusting people to claim only accounts that were really theirs in the first place. Alternatively, all existing accounts would have to remain locked, and users would be forced to set up new accounts.

For static mirrors (no user editing) this is obviously not a problem.

I'm afraid that this is a fundamental problem that we have to live with.
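
If it came to that, the blanking itself would not be hard. Just as a sketch (untested, and assuming a separate staging copy of the database called wikidb_public with the default wiki_ table prefix, both of which are guesses on my part), the sensitive columns could be emptied on the copy before dumping it, so the accounts still exist but cannot be logged into:

Code: Select all

# Untested sketch: empty the private columns on a staging copy (wikidb_public is a made-up name)
# so the dump keeps user ids and names but no credentials or e-mail addresses.
mysql -u mysqluser -ppassword wikidb_public -e "UPDATE wiki_user SET user_password='', user_newpassword='', user_email='', user_real_name='', user_token='';"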
imslp
Site Admin
Posts: 1639
Joined: Thu Jan 01, 1970 12:00 am

Post by imslp »

Actually, I don't mind the old accounts being permanently frozen; it would be necessary to protect privacy (and passwords, even though they are hashed with salts). As long as new accounts can be created, that is fine.

The issue here is that several other tables actually use the user table (like page revisions), so I cannot just throw the user table away...
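
To make the dependency concrete (assuming the default wiki_ table prefix, which may not be what we actually use): every revision row stores both the numeric user id and the user name of the editor, so those user rows have to survive in some form:

Code: Select all

# Each revision carries rev_user (the user_id) and rev_user_text (the user name),
# which is why the user table cannot simply be dropped from the dump.
mysql -u mysqluser -ppassword wikidb -e "SELECT rev_id, rev_user, rev_user_text FROM wiki_revision LIMIT 5;"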
emeraldimp
active poster
Posts: 219
Joined: Tue Feb 27, 2007 9:18 pm

Post by emeraldimp »

Well, one way would be to exclude the user table on a mysqldump and handle it separately... Not sure exactly how you'd handle it separately, though; I didn't see an option in mysqldump to exclude a single column.
imslp
Site Admin
Posts: 1639
Joined: Thu Jan 01, 1970 12:00 am

Post by imslp »

That is my concern too.

Any code, whether shell code or script code, that can accomplish the goals I put forth in the first post would be most welcome.
emeraldimp
active poster
Posts: 219
Joined: Tue Feb 27, 2007 9:18 pm

Post by emeraldimp »

Well... This should handle the exclusion part. Might take a while to run, though.

Code: Select all

mysqldump -u mysqluser -ppassword wikidb --ignore-table=wikidb.wiki_user > wikidb.sql
Now for the users...

Code: Select all

mysql -u mysqluser -ppassword wikidb -X < userselect.sql | fgrep -v "<field name=\"user_password\">" | fgrep -v "<field name=\"user_email\">" | fgrep -v "<field name=\"user_real_name\">" > users.xml
Where userselect.sql is

Code: Select all

select user_id, user_name, user_real_name, "" as user_password, user_newpassword, user_email, user_options, user_touched, user_token from wiki_user;
Now we need to transform it...

Code: Select all

xsltproc usertransform.xsl users.xml > users.sql
Where usertransform.xsl is

Code: Select all

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="text" encoding="UTF-8"/>
    <xsl:strip-space elements="*"/>
    <xsl:template match="resultset">
        <xsl:for-each select="row">
            <xsl:text>INSERT INTO wiki_user (</xsl:text>
            <xsl:for-each select="field"><xsl:value-of select="@name" /><xsl:if test="position()!=last()">,</xsl:if></xsl:for-each>
            <xsl:text>) VALUES (</xsl:text>
            <xsl:for-each select="field">"<xsl:value-of select="." />"<xsl:if test="position()!=last()">,</xsl:if></xsl:for-each>);
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
Concatenate...

Code: Select all

cat wikidb.sql users.sql > backupwikidb.sql
Next we should probably compress it...

Code: Select all

tar cjf backupwikidb.sql.tar.bz2 backupwikidb.sql
And that should do it! (I think...)

You might have to change a couple of things (obviously the username and password, but also possibly the select statement; I have a rather old version of MediaWiki installed).
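
If the columns differ on 1.11.0, something like this should show what the user table actually looks like there (again assuming the wiki_ prefix), and the select statement can then be adjusted to leave out or blank anything private:

Code: Select all

# List the columns of the live user table so the SELECT above can be matched to 1.11.0.
mysql -u mysqluser -ppassword wikidb -e "DESCRIBE wiki_user;"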
imslp
Site Admin
Posts: 1639
Joined: Thu Jan 01, 1970 12:00 am

Post by imslp »

If anyone could test this on Mediawiki 1.11.0 it would be much appreciated :)

Also, I'm not sure if that is the extent of the user information or not; the official Wikipedia backup (http://download.wikimedia.org/enwiki/20071018/) shows several things as private...

On another note, would someone familiar with the XML backup method comment on its speed and the restoration process? I would like to avoid it for speed reasons, but if it turns out to be the only way to produce a usable backup that contains no private data, then unfortunately that is what we will have to use...
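
For reference, the XML route I mean is the standard maintenance scripts that ship with MediaWiki; roughly (paths relative to the wiki root, and untested by me on our installation):

Code: Select all

# Dump every revision of every page as XML...
php maintenance/dumpBackup.php --full > imslp-pages.xml
# ...and replay it into a freshly installed wiki on the restore side.
php maintenance/importDump.php < imslp-pages.xml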