Distributed Proofreaders

Messages from and Discussions about IMSLP

Moderator: kcleung

jhellingman
regular poster
Posts: 45
Joined: Tue Oct 23, 2007 10:00 am

Post by jhellingman »

emeraldimp wrote: [...] The workflow of Mutopia is a single 'typesetter' submitting a completed project, not a distributed group working on the same project. Again, this is similar to Project Gutenberg before DP.

Also, the project wouldn't have to use the LilyPond format for storage and output. It could use something else, MusicXML for example, translate it into LilyPond for rendering during proofing, and provide MusicXML + ly + whatever as the final output.
The crux of such a distributed project is keeping the commitment needed to contribute below a fairly low threshold. For music, this certainly means breaking pages up into single staves (software might help here), and perhaps introducing separate rounds for different types of information. Round one concentrates on just getting the notes right, round two adds slurs and other decorations, round three adds text annotations, and so on. A final round will be needed to sync everything up and produce the final output. At that point, you may also have to optimize the notation for various print and sheet sizes, and produce neat PDFs for easy printing (unless this is automated in the site code, so that such files are generated on demand from the sources).

Note that with each round, you can render the partial result, to help people decide how accurate they have been.
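The round structure described above can be sketched as a small task queue in which each staff-sized piece becomes available for a round only after the earlier rounds are finished. This is only a sketch: the round names, the `StaffTask` fields, and keeping everything in a Python list instead of a database are hypothetical choices, not taken from DP's or IMSLP's actual software.

```python
from dataclasses import dataclass, field

# Hypothetical per-staff rounds, in the order suggested above. The final
# "sync" round works on the whole score, so it is handled separately.
PER_STAFF_ROUNDS = ["notes", "decorations", "text"]


@dataclass
class StaffTask:
    """One staff-sized piece of a score: the unit a volunteer works on."""
    score_id: str
    page: int
    staff: int
    content: str = ""  # transcription so far, e.g. LilyPond source
    completed_rounds: list = field(default_factory=list)

    def next_round(self):
        """Return the next round this staff still needs, or None if all are done."""
        for name in PER_STAFF_ROUNDS:
            if name not in self.completed_rounds:
                return name
        return None


def next_task(tasks, round_name):
    """Return the first staff waiting for round_name, or None if there is none."""
    for task in tasks:
        if task.next_round() == round_name:
            return task
    return None


def complete(task, round_name, new_content):
    """Record a volunteer's result; rounds must be done in order."""
    if task.next_round() != round_name:
        raise ValueError(f"staff {task.staff} is not waiting for round {round_name!r}")
    task.content = new_content
    task.completed_rounds.append(round_name)


def ready_for_final_round(tasks):
    """The score-wide sync round can start once every staff has passed all rounds."""
    return all(task.next_round() is None for task in tasks)


if __name__ == "__main__":
    tasks = [StaffTask("demo-score", page=1, staff=n) for n in (1, 2)]
    first = next_task(tasks, "notes")
    complete(first, "notes", "c'4 d'4 e'4 f'4")
    print(next_task(tasks, "notes").staff)           # 2: staff 1 has moved on
    print(next_task(tasks, "decorations") is first)  # True
    print(ready_for_final_round(tasks))              # False
```

Because each staff carries its own progress, a volunteer can do a single round on a single staff and leave without blocking anyone else.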

LilyPond makes some effort to keep its notation compact and easy to type, which makes it preferable to formats such as MusicXML, which are designed to be easy for computers to process. However, since human time is so much more precious than computer time, we should use whatever works best for humans.
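To make the comparison concrete, here is a sketch that writes the same note in both notations. It assumes LilyPond's default absolute pitch mode (where `c'` is middle C) and a MusicXML `divisions` value of 4 per quarter note, and it ignores accidentals, dots, ties, and everything else a real converter would have to handle.

```python
# Note types used by MusicXML's <type> element, keyed by LilyPond duration number.
NOTE_TYPES = {1: "whole", 2: "half", 4: "quarter", 8: "eighth", 16: "16th"}


def to_lilypond(step, octave, duration):
    """LilyPond absolute mode: c is the octave below middle C (octave 3),
    each ' raises it an octave and each , lowers it one."""
    shift = octave - 3
    marks = "'" * shift if shift > 0 else "," * -shift
    return f"{step.lower()}{marks}{duration}"


def to_musicxml(step, octave, duration, divisions=4):
    """A bare MusicXML <note>; <duration> is counted in divisions per quarter
    note, which a complete file would declare once in <attributes>."""
    length = divisions * 4 // duration
    return (
        "<note><pitch>"
        f"<step>{step.upper()}</step><octave>{octave}</octave>"
        f"</pitch><duration>{length}</duration>"
        f"<type>{NOTE_TYPES[duration]}</type></note>"
    )


if __name__ == "__main__":
    phrase = [("E", 4, 4), ("D", 4, 4), ("C", 4, 2)]
    print(" ".join(to_lilypond(*n) for n in phrase))  # e'4 d'4 c'2
    print(to_musicxml(*phrase[0]))
```

The whole three-note LilyPond phrase is 11 characters, while the MusicXML for its first note alone is about a hundred, which is the typing burden the post is talking about.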

Of course, transcribed works cannot replace scans. They can supplement them, being easier to read and search (especially if a specialized search engine could be built to find particular themes and tunes, even when transposed, etc.).
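One common way to make such a search transposition-invariant is to compare the intervals between successive notes rather than the notes themselves. A minimal sketch, assuming melodies are stored as MIDI note numbers (middle C = 60) and that only exact interval matches count; a practical engine would also consider rhythm and tolerate small errors.

```python
def intervals(pitches):
    """Semitone steps between successive notes; unchanged by transposition."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def find_theme(query, catalogue):
    """Return (work, note_index) for every occurrence of the query's contour.

    catalogue maps a work title to its melody as MIDI note numbers.
    """
    if len(query) < 2:
        raise ValueError("a query needs at least two notes to define an interval")
    pattern = intervals(query)
    hits = []
    for work, melody in catalogue.items():
        steps = intervals(melody)
        for i in range(len(steps) - len(pattern) + 1):
            if steps[i:i + len(pattern)] == pattern:
                hits.append((work, i))
    return hits


if __name__ == "__main__":
    catalogue = {
        # Opening of Beethoven's "Ode to Joy" theme: E E F G G F E D C C D E E D D
        "Ode to Joy": [64, 64, 65, 67, 67, 65, 64, 62, 60, 60, 62, 64, 64, 62, 62],
        "C major scale": [60, 62, 64, 65, 67, 69, 71, 72],
    }
    # The first four notes, transposed up a fourth (A A B-flat C).
    print(find_theme([69, 69, 70, 72], catalogue))  # [('Ode to Joy', 0)]
```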

The distributed aspect is not that the server side is distributed, but that the effort of transcribing is spread over as many people as possible. Note that music notation, like mathematics, is complicated and has a smaller audience than plain English text, so throughput will not reach thousands of pages per day, as it does at PG's Distributed Proofreaders, which produces about 200 books per month.
emeraldimp
active poster
Posts: 219
Joined: Tue Feb 27, 2007 9:18 pm

Post by emeraldimp »

jhellingman wrote: The crux of such a distributed project is keeping the commitment needed to contribute below a fairly low threshold. For music, this certainly means breaking pages up into single staves (software might help here), and perhaps introducing separate rounds for different types of information. Round one concentrates on just getting the notes right, round two adds slurs and other decorations, round three adds text annotations, and so on. A final round will be needed to sync everything up and produce the final output. At that point, you may also have to optimize the notation for various print and sheet sizes, and produce neat PDFs for easy printing (unless this is automated in the site code, so that such files are generated on demand from the sources).
Exactly: use music OCR software to get the gist (hopefully) and human volunteers to ensure correctness. A certain amount of preparation will be needed anyway, though, to make sure each image (whether a single staff or a whole page) is properly associated with its score and its location within the score (as you said, Feld), but hopefully that would be done before work on a particular score even starts.

Also, I'd want the output to be pre-generated and uploaded to a repository, so that the system is used for processing scores rather than serving them, but that's just my preference.
jhellingman wrote: Note that with each round, you can render the partial result, to help people decide how accurate they have been.
I presume you mean apart from the staff-level presentation; good idea.
jhellingman wrote: LilyPond makes some effort to keep its notation compact and easy to type, which makes it preferable to formats such as MusicXML, which are designed to be easy for computers to process. However, since human time is so much more precious than computer time, we should use whatever works best for humans.
Very true; I got excited there for a minute! Without a notation tool of some kind (e.g., Sibelius), I would use LilyPond throughout (unless someone has written an ly-to-MusicXML translator). But I think a notation tool (either in the browser or one that could download snippets, as previously discussed) would be ideal anyway.
jhellingman wrote: Of course, transcribed works cannot replace scans. They can supplement them, being easier to read and search (especially if a specialized search engine could be built to find particular themes and tunes, even when transposed, etc.).
I know I've seen a 'music snippet search' before, but I can't find it right now. That could be a major undertaking!
jhellingman wrote: The distributed aspect is not that the server side is distributed, but that the effort of transcribing is spread over as many people as possible. Note that music notation, like mathematics, is complicated and has a smaller audience than plain English text, so throughput will not reach thousands of pages per day, as it does at PG's Distributed Proofreaders, which produces about 200 books per month.
True, although I think Feldmahler was drawing a parallel *ahem* between the two. He makes a good point, but the argument is more along the lines of SETI@home: even if the efficiency is much worse, the total effort will be greater, so more will get done. For example, I haven't worked on transcriptions in a long time, mostly because it takes a while to get going, and then you want to keep going as long as possible, and I just don't have the time, so my projects flounder. If I could work on a single staff at a time, confident that the project would get done even if I never worked on it again, then I could contribute actively.
Carolus
Site Admin
Posts: 2249
Joined: Sun Dec 10, 2006 11:18 pm

Post by Carolus »

I was wondering how a distributed proofreading scheme might work for detailed comparison of different editions. This could be very important for determining the validity of copyright claims on editions issued in the last 50 years, especially for separating interpretative editions, which qualify as 'adaptations' under Canadian law (and are thus subject to a full life-plus-50 term for the editor), from urtext editions, which lack sufficient originality to qualify for copyright.

Another area where such a system could be useful is items like the scans presently available from Google: files could be broken up and processed with software such as GIMP to remove logos, deskew pages, and clean up speckles, spots, and written-in markings such as fingerings, then reviewed before being reassembled into a PDF for posting on IMSLP.
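The speckle-removal step, for instance, can be approximated with a very simple filter on a binarised page: drop any ink pixel that has too few inked neighbours. This is a sketch using a stand-in technique (a neighbour-count filter, not what GIMP's own filters do), and it assumes the page is already a grid of 0/1 values. It also shows why the review step matters: at low resolution a genuine staccato dot may look just like a speck.

```python
def despeckle(image, min_neighbours=1):
    """Remove isolated ink pixels from a binarised page.

    image is a list of rows of 0 (paper) and 1 (ink). An ink pixel with fewer
    than min_neighbours inked pixels among its 8 neighbours is cleared.
    The input is left unchanged; a cleaned copy is returned.
    """
    height, width = len(image), len(image[0])
    cleaned = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            if not image[y][x]:
                continue
            # Count inked pixels in the 3x3 window, clipped at the page edges.
            neighbours = sum(
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours < min_neighbours:
                cleaned[y][x] = 0
    return cleaned
```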
jhellingman
regular poster
Posts: 45
Joined: Tue Oct 23, 2007 10:00 am

Post by jhellingman »

Carolus wrote: I was wondering how a distributed proofreading scheme might work for detailed comparison of different editions. This could be very important for determining the validity of copyright claims on editions issued in the last 50 years, especially for separating interpretative editions, which qualify as 'adaptations' under Canadian law (and are thus subject to a full life-plus-50 term for the editor), from urtext editions, which lack sufficient originality to qualify for copyright.

Another area where such a system could be useful is items like the scans presently available from Google: files could be broken up and processed with software such as GIMP to remove logos, deskew pages, and clean up speckles, spots, and written-in markings such as fingerings, then reviewed before being reassembled into a PDF for posting on IMSLP.
I would start with an edition clearly in the public domain (in Canada: published more than 50 years ago, with all identified contributors dead for more than 50 years) and transcribe it as-is. After that, somebody interested in the work could compare it with a more modern edition and determine whether the changes are substantial. However, if they are not substantial, why would you want them (unless they fix an apparent mistake, which you can then also fix in the existing transcription)? And if they are substantial, copyright restrictions mean you cannot use them. Of course, you would keep every change in a version control system, so you can track who made which changes and why.
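For the comparison itself, a version-control style diff at the level of measures would show a reviewer exactly where a modern edition departs from the public-domain transcription, leaving the "substantial or not" judgment to a human. A sketch using Python's standard `difflib`, assuming (hypothetically) that each edition is stored as a list of measures in LilyPond notation:

```python
import difflib


def compare_editions(old, new):
    """List the places where two editions differ, measure by measure.

    old and new are lists of measures (here, LilyPond strings).
    Returns (change, position_in_old, old_measures, new_measures), where
    position_in_old is 1-based.
    """
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # "replace", "delete" or "insert"
            changes.append((tag, i1 + 1, old[i1:i2], new[j1:j2]))
    return changes


if __name__ == "__main__":
    public_domain = ["c'4 d' e' f'", "g'2 g'", "a'4 a' g'2"]
    modern = ["c'4 d' e' f'", "g'2( g')", "a'4 a' g'2"]
    for change in compare_editions(public_domain, modern):
        print(change)  # ('replace', 2, ["g'2 g'"], ["g'2( g')"])
```

Here the only difference is an added slur in measure 2, exactly the kind of editorial change someone would then have to judge.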

After scanning some music, it is simply a matter of loading it into a graphics editor, setting a suitably sized selection box, and quickly cutting the page into pieces suitable for proofreading.
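The cutting step can be partly automated with a horizontal projection profile: count the ink in each pixel row of a binarised, deskewed page and split wherever there is a run of blank rows. A sketch under those assumptions; `min_gap` is a hypothetical tuning parameter, and real pages (with ledger lines, slurs, or lyrics bridging the gaps) would still need a human to adjust the boxes.

```python
def split_into_bands(image, min_gap=3):
    """Split a binarised page into horizontal bands separated by blank rows.

    image is a list of rows of 0 (paper) and 1 (ink). Returns (top, bottom)
    row ranges with bottom exclusive; a run of at least min_gap empty rows
    separates two bands.
    """
    ink = [sum(row) for row in image]  # horizontal projection profile
    bands = []
    start = None     # first row of the band being built
    last_ink = None  # last inked row in that band
    for y, count in enumerate(ink):
        if count == 0:
            continue
        if start is not None and y - last_ink - 1 >= min_gap:
            bands.append((start, last_ink + 1))
            start = None
        if start is None:
            start = y
        last_ink = y
    if start is not None:
        bands.append((start, last_ink + 1))
    return bands
```

Each band can then be cut out as `image[top:bottom]` and handed to a volunteer as one piece.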

Having a searchable melody and tune database has another benefit. In the past, some musicians have been sued for copyright infringement over just a few notes. If you can show that those notes appear in a public-domain work, it will be easier to get such cases thrown out of court.

Note that Greenstone links to a range of music libraries, some of which have a search interface for tunes: http://www.greenstone.org/examples

A very easy way to set up a distributed proofreading system as an interim measure is to use a wiki, as I have done here for the relatively simple notes found in an anthropological work: http://www.pgdp.net/wiki/User:Jhellingm ... own_Mexico

A more comprehensive experiment is under way on the Distributed Proofreaders wiki here: http://www.pgdp.net/wiki/The_English_Hy ... Experiment


For a more durable solution, dedicated software may be needed.
jhellingman
regular poster
Posts: 45
Joined: Tue Oct 23, 2007 10:00 am

Post by jhellingman »

For anyone who wants to experiment with a wiki with built-in LilyPond support, have a look here: http://www.wikisophia.org/wiki/Wikitex

(It also handles math, chemistry, chess, and a whole lot of other things. Just play around in their sandbox.)

That's very interesting. If one could align each scanned line with the corresponding Wikitex output, this would make a very handy candidate for such a project.