horndude77 wrote: Great! I'm glad you built leptonica to make it easier for others to use. The README looks good. I'm sorry I don't have much time to review in depth right now.
A couple other observations from working with the script:
- Look at line 63 of clean_pdf.rb. I added a pause so that the files could be checked/edited before the images are compiled into a pdf. I found it useful on scores where removal didn't work well. (musicprog has been asking me to take a look for a while now. It's slower from what I understand, but I'm wondering if it will fix these problem files. I still need to do this.)
- I found the 'harp and others' volumes are somewhat problematic with this approach. Manual work is required in renaming files.
- UTF-8 characters don't seem to work well in the pdf titles. I haven't investigated this much.
Good luck!
- The UTF-8 problem led me to ditch file_mapper.rb and just use the original name of the file (minus .pdf) as the name of the work (KISS principle). In the batch script, I regroup the music by work, and each pdf file has its instrument name appended to the original stem. Yagan said that the copyright reviewers should be clever enough to work out the identities of the scores.
- Yes, some of the instrument folder names have to be renamed to keep the filesystem and the bash scripts happy (removing spaces and non-ASCII characters), but they only need to be renamed once for each volume, so even though this is manual renaming, there is little work on the user's part.
- I noticed that at later stages you added a couple of extra patterns. They are really useful, and I haven't seen any wrong files so far, but if users are paranoid, they can uncomment line 63 of clean_pdf.rb to allow manual checking.
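For anyone who would rather script the renaming than do it by hand, here is a rough Ruby sketch of the two naming steps described above: stripping spaces and non-ASCII characters from an instrument folder name, and building an output name from the original file stem plus the instrument. The helper names `sanitize_name` and `output_stem`, the underscore separator, and the sample path are my own assumptions for illustration; none of this is taken from clean_pdf.rb or the batch script.

```ruby
# Hypothetical helper (not part of clean_pdf.rb): make a folder name safe
# for the filesystem and for bash scripts.
def sanitize_name(name)
  name.unicode_normalize(:nfkd)                 # split accented letters into base letter + mark
      .encode("US-ASCII", invalid: :replace,    # drop anything that is not plain ASCII
                          undef: :replace, replace: "")
      .gsub(/\s+/, "_")                         # spaces become underscores
      .gsub(/[^A-Za-z0-9._-]/, "")              # remove remaining shell-unfriendly characters
end

# Hypothetical helper: use the original file name (minus .pdf) as the work
# name and append the instrument name. The "_" separator is an assumption.
def output_stem(pdf_path, instrument)
  work = File.basename(pdf_path, ".pdf")
  "#{work}_#{sanitize_name(instrument)}"
end

# Example with a made-up path and instrument folder name.
puts sanitize_name("Fl\u00FBte \u00E0 bec")
puts output_stem("vol1/Symphony No.5.pdf", "Violin I")
```

Note that accented letters are reduced to their base letter rather than dropped entirely, which keeps the folder names readable. The work stem itself is left untouched, in keeping with the idea of using the original file name as-is.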