PDF size limit

Post by nysavoyard

Hello, I'm new to the forum, although I've enjoyed downloading some favorites from IMSLP over the past month since reading about it in a newspaper article.

Since I noticed that IMSLP lacks a complete vocal score for Gilbert and Sullivan's "Patience", I am currently scanning a copy to contribute.

I am using an Epson scanner set to 300 dpi black and white. I've tried lower settings, such as 150 or 200 dpi, but 300 dpi preserves the quality best, which I'm sure others will appreciate. However, at 125 pages the complete set of scans will exceed the current 150 MB limit by 20-30 MB. I'm not sure how to reduce this when rendering it as a PDF file, so I'd appreciate any advice.

Also, the edition dates back to the 1880s, but the copy I'm using belongs to my local public library. Any issues there?

Thanks,

nysavoyard

Post by KGill (Copyright Reviewer)

That will make a great contribution :) The best way to reduce file size is to compress the images with CCITT Group 4, a compression standard for black-and-white images that is supported in TIFF files. Many advanced image-editing programs can save in this format; if you're using Windows, the best free one I can recommend is IrfanView. (ImageMagick works as well if you're willing to delve into the command line, and it runs on any platform.) Note that this only works in monochrome (which obviously isn't a problem here, as you're already scanning in B/W).

In IrfanView, you would select 'batch processing' from the menu, select a directory containing the files to process, and (if memory serves right... I can't check, as at the moment I'm on a Mac :wink: ) click the 'advanced' button. This opens a large dialog box that lets you specify the colorspace (color, greyscale, B/W, etc.). After changing the appropriate settings, you can run the batch, and it will process all the images at once, very quickly. This compression will be retained when you put the images together into a PDF.
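
For the command-line route, here is a minimal Python sketch that drives ImageMagick over a folder of scans. It assumes ImageMagick 7 (whose entry point is `magick`; version 6 used `convert`), PNG input files and a 50% threshold; the folder layout and the `run` flag are illustrative choices, not part of any tool.

```python
import subprocess
from pathlib import Path


def group4_command(src, dst, threshold="50%"):
    """Build an ImageMagick command that writes a 1-bit, CCITT Group 4 TIFF."""
    # -threshold forces every pixel to pure black or pure white (true bilevel);
    # -compress Group4 selects CCITT Group 4 compression for the TIFF output.
    return ["magick", str(src), "-threshold", threshold,
            "-compress", "Group4", str(dst)]


def batch_group4(folder, run=False):
    """Convert every PNG in `folder` to a Group 4 TIFF saved next to it.

    With run=False (the default) the commands are only collected, so they
    can be inspected before anything is written to disk.
    """
    commands = []
    for src in sorted(Path(folder).glob("*.png")):
        cmd = group4_command(src, src.with_suffix(".tif"))
        commands.append(cmd)
        if run:
            subprocess.run(cmd, check=True)  # raises if ImageMagick fails
    return commands
```

Calling `batch_group4("scans/patience", run=True)` (a hypothetical folder) would convert the whole batch; the threshold is the knob to adjust if thin lines start to break up or the paper background turns grey.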
If the edition you're using does indeed date to the 1880s, then it is no problem if it comes from your local library. The copyright on it expired long ago, so it doesn't really matter where it came from (although I would personally recommend not indicating where it's from, just in case they find it here - librarians can be notoriously protective of their holdings).

Post by daphnis (Copyright Reviewer)

KGill wrote: "which obviously isn't a problem here, as you're already scanning in B/W"
Careful with your settings: an image that *appears* to be composed only of black and white is not necessarily true black and white (monochrome, 1-bit, lineart, etc.). If you open the image in IrfanView and look at its information/properties, both "Original colors" and "Current colors" should read 2 (1 BitsPerPixel). If they don't, you have a form of grayscale. And +1 on the recommendation for CCITT Group 4 compression. Also, in your scanner settings, make sure you're using the hardware ("optical") 300 dpi resolution and not a software-interpolated setting.
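
If there are many files to check, the same two facts (bit depth and resolution) can be read straight from each TIFF's header instead of opening every image. A minimal stdlib-only Python sketch, assuming classic (non-BigTIFF) TIFF files and reading only the first image directory; the tag numbers come from the TIFF 6.0 specification, and `build_test_tiff` is a hypothetical helper that fakes a header-only file for trying the check.

```python
import struct

# Tag numbers from the TIFF 6.0 specification.
BITS_PER_SAMPLE, X_RESOLUTION, RESOLUTION_UNIT = 258, 282, 296


def tiff_bits_and_dpi(data):
    """Return (bits_per_sample, horizontal_dpi) from the first image directory."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])  # little- or big-endian file
    if order is None:
        raise ValueError("not a TIFF file")
    magic, ifd = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    (count,) = struct.unpack(order + "H", data[ifd:ifd + 2])
    bits, dpi, unit = 1, None, 2  # spec defaults: 1 bit per sample, unit = inch
    for i in range(count):
        start = ifd + 2 + 12 * i  # each directory entry is 12 bytes
        tag, _type, n, raw = struct.unpack(order + "HHI4s", data[start:start + 12])
        if tag == BITS_PER_SAMPLE:
            if n > 2:  # more than two SHORTs: the field holds an offset instead
                (off,) = struct.unpack(order + "I", raw)
                raw = data[off:off + 2]
            (bits,) = struct.unpack(order + "H", raw[:2])
        elif tag == X_RESOLUTION:
            (off,) = struct.unpack(order + "I", raw)  # RATIONAL lives at an offset
            num, den = struct.unpack(order + "II", data[off:off + 8])
            dpi = num / den if den else None
        elif tag == RESOLUTION_UNIT:
            (unit,) = struct.unpack(order + "H", raw[:2])
    if dpi is not None and unit == 3:  # unit 3 = centimetre; convert to inches
        dpi *= 2.54
    return bits, dpi


def is_true_bilevel_300(data):
    """True when the scan is 1 bit per pixel at 300 dpi or more."""
    bits, dpi = tiff_bits_and_dpi(data)
    return bits == 1 and dpi is not None and dpi >= 300


def build_test_tiff(bits, dpi):
    """Hypothetical helper: a header-only little-endian TIFF for trying the check."""
    header = b"II" + struct.pack("<HI", 42, 8)
    entries = (
        struct.pack("<HHIH2x", BITS_PER_SAMPLE, 3, 1, bits)
        + struct.pack("<HHII", X_RESOLUTION, 5, 1, 50)  # rational stored at offset 50
        + struct.pack("<HHIH2x", RESOLUTION_UNIT, 3, 1, 2)  # 2 = inches
    )
    ifd = struct.pack("<H", 3) + entries + struct.pack("<I", 0)
    return header + ifd + struct.pack("<II", dpi, 1)
```

For a real scan you would pass the file's bytes, e.g. `is_true_bilevel_300(Path("page001.tif").read_bytes())` with a hypothetical file name; a result of 8 bits per sample is the "form of grayscale" described above.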

Post by nysavoyard

Thanks so much. Actually, I'm on a Mac too, so much of the technical information here is foreign to me, as it seems to be aimed at Windows users. I do have Acrobat Professional, Graphic Converter, and Photoshop CS3, which I believe can batch-convert multiple files. Could that be an option here? My understanding of compression formats remains sketchy, but ideally I'd like to bring the complete file in under 100 MB to make downloading easier (splitting it into two PDF downloads would be less preferable) while retaining good quality.
Post by daphnis (Copyright Reviewer)

Well, any technical operation involves some learning curve, but Acrobat Pro and Photoshop are enough to get the job done. Photoshop shows essentially the same image-property information. At 300 dpi, the final file should come in far under 100 MB; if it doesn't, your image settings are less than optimal.
Post by nysavoyard

OK, I've managed to reduce the PDF of the 127-page Patience vocal/piano score (with dialogue) from 153 MB to 84 MB. How can I submit it for inclusion in the IMSLP library?
Thanks,

nys
Post by kalliwoda (active poster, Berlin, Germany)

Something must be very wrong with how your PDF files are being compiled.
I'm on a Mac too (MacBook Pro), with an old version of Photoshop (7.0) and an older Epson scanner (Expression 1600 Pro). Bitmap (1-bit black-and-white) PDF files should come to 150-350 KB per A4 or letter page at 600 dpi, depending on complexity; if you make a greyscale scan monochrome by defining a split point, this increases only slightly (250-500 KB). So expect about 30 MB for 120 pages.
For a 300 dpi scan at A4/letter size, this should be 75-200 KB per page, or about 10-15 MB for 120 pages.
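
The totals above are just the per-page size times the page count. A quick Python check of that arithmetic, assuming decimal units (1 MB = 1000 KB) and using the 84 MB, 127-page file from earlier in the thread for comparison:

```python
def total_mb(kb_per_page, pages):
    """Total PDF size in MB (decimal units) for a given per-page size in KB."""
    return kb_per_page * pages / 1000


# Per-page ranges quoted above, for 120 letter/A4 pages of 1-bit scans.
print(total_mb(150, 120), total_mb(350, 120))  # 600 dpi -> 18.0 42.0
print(total_mb(75, 120), total_mb(200, 120))   # 300 dpi -> 9.0 24.0

# Average page size of the 84 MB, 127-page file.
print(round(84_000 / 127))  # -> 661 (KB per page)
```

So the full ranges work out to 18-42 MB at 600 dpi and 9-24 MB at 300 dpi, with typical pages landing near the figures quoted, while the 84 MB file averages roughly 661 KB per page, several times the expected size. That gap points at the image settings rather than the page count.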

There are two ways to create a PDF file on a Mac: you can print to PDF, which uses a Mac algorithm similar to CCITT Group 4, or you can save your file as "Photoshop PDF". In my old 7.0 version this is close to CCITT Group 4, with no options to choose, but newer versions of Photoshop let you select the compression algorithm.

If you reimport your PDF into Photoshop (a single page should just open without a dialog box asking for the resolution), what are the image size parameters? If it is a bitmap, my only guess is that you inadvertently increased the image dimensions to far larger than letter/A4.
Post by daphnis (Copyright Reviewer)

Please upload a sample for us to examine.
Post by nysavoyard

In Photoshop > Image Size, it's 8.5" x 11.697". I wanted to post a screenshot, but I can't even do that! :roll: When I clicked "Img" in the post editor, all it did was insert [img] [img].

By the way, I tried your suggestion and opened the reduced-size PDF in Photoshop, then tried to save it as a Photoshop PDF. All I got was a 5 MB file of the title page (p. 1).

As far as algorithms go, that's out of my depth... and the same goes for that other thing you mentioned, CCITT 4 coding...

I could try batch converting via Graphic Converter. My old Photoshop Elements, which I was more familiar with, had a batch conversion option. I can see if CS3 can do it as well.

Or, last resort, couldn't I just upload the 84MB file...if someone at your end would be kind enough to compress it properly?

As for Acrobat Pro 8, it asked me whether I'd like to make a PDF of all the scanned pages. I clicked yes. Then there was an option to reduce file size under the Documents menu.

Anyway, thanks!
Post by daphnis (Copyright Reviewer)

Please upload a few images to a hosting site like mediafire (http://www.mediafire.com/)
Post by nysavoyard

I did, but I'm guessing I have to sign up for that? And Photoshop's batch conversion 'Help' is beyond me...

I'm literally losing my "Patience" here... :(

What's the point? I told you what the image size was...

All right, I'll do it tomorrow... I've wasted too much time on this today... rats!
Post by daphnis (Copyright Reviewer)

mediafire is free to use. We cannot help you unless we know and can see what you're looking at.
Post by nysavoyard

Thanks; sometimes it's better just to leave things and come back to them later. As it turned out, I finally stumbled on a way to reduce the file size to 30.8 MB: I ran the already-reduced PDF through Acrobat Pro again and (I think) cut the resolution down to 150 dpi. In effect, I was reducing the reduction. It looks pretty good, too. I guess I can share this PDF file on the forum via MediaFire. Let's see if that works.

http://www.mediafire.com/?sty5yojpbza1c10


I'm assuming that anyone who clicks the link above will be able to download the file from MediaFire... please correct me if that's not so.

Aha, seems to work for me.

Before accepting this for IMSLP, please note that I erased the library stamp from certain pages (4 or 5 in total), but those edited pages are not included in this file. If you like, I can resubmit the file with the amended pages swapped in.
Post by kalliwoda (active poster, Berlin, Germany)

No need to worry about the compression algorithms; your Mac applies them automatically, and so does Photoshop (at least in the older versions).

Your PDF opens as greyscale at 150 dpi and also shows strange halos around the objects. From your last post and a look at your PDF, I think I see the cause of your problems: even though you have Photoshop and Acrobat Pro, you are using another program (the Epson scanner software?) to save your scans as JPEG with lossy compression. I just tried the scan button on my Epson (rather than starting the scanner from Photoshop with the plugin), and it was preset to use Apple's "Image Capture", which is really frustrating to use because of its limited options.
Another point: once you have a PDF file, don't try to shrink it further by manipulating it in Acrobat Pro. As was recently discussed in another thread, this decreases file size only if you reduce the resolution; changes like "display in b/w" or erasing material like library stamps leave the original PDF content intact and superimpose patches on it.

What you have to do is make these changes in Photoshop (or any other image-editing software) and then compile the pages to PDF. Your Epson scanner software should also have come with Photoshop plugins that let you start the scanner from inside Photoshop for further image manipulation (under "Import" in the File menu).

As a test, try scanning one page at 600 dpi lineart (if your plugin does not work, save as TIFF and reopen in Photoshop), then crop if desired and print to PDF (or save as Photoshop PDF). The file size should be about 200 KB, with a hugely improved resolution.
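
Lineart mode stores one bit per pixel: each grey value is compared with a split point (threshold) and eight pixels are packed into each byte, before any CCITT compression is applied. A toy Python sketch of that idea on a made-up row of grey values (0 = black, 255 = white); the threshold of 128 and the choice of 1 for black are assumptions for illustration, since real file formats fix their own conventions.

```python
def threshold_row(grays, split=128):
    """Map 8-bit grey values (0 = black, 255 = white) to bits: 1 = black, 0 = white."""
    return [1 if g < split else 0 for g in grays]


def pack_bits(bits):
    """Pack bits into bytes, most significant bit first, padding the last byte with 0."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        chunk = chunk + [0] * (8 - len(chunk))  # pad a short final group
        byte = 0
        for b in chunk:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)


row = [12, 30, 240, 250, 90, 200, 255, 5, 130, 60]  # ten grey pixels, 1 byte each
packed = pack_bits(threshold_row(row))
print(len(row), "bytes as greyscale ->", len(packed), "bytes as lineart")
# prints: 10 bytes as greyscale -> 2 bytes as lineart
```

The 8:1 reduction comes purely from the bit depth; Group 4 compression then shrinks the 1-bit data much further, because large runs of white paper compress extremely well.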

Everyone here knows how frustrating optimizing your setup can be, but once you have done the troubleshooting all your future scans will profit :)
Last edited by kalliwoda on Fri May 13, 2011 3:20 pm, edited 3 times in total.

Post by daphnis (Copyright Reviewer)

OK, as Kalliwoda points out, and as I warned earlier, first make sure your scanning settings are correct. These images are in an 8-bit color space (aka 256 shades of gray) and were scanned at too low a resolution, 150 dpi. First things first: always use your scanner's optical/hardware settings to get 300 dpi (no less) and a black-and-white/1-bit/line art color space. Until Photoshop tells you your images are 300 dpi and 1-bit, do not proceed.
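
To see why the color space matters so much, compare the raw (uncompressed) size of one page. A minimal Python sketch, assuming a US letter page (8.5" x 11") and rows padded to a whole byte, as image formats commonly do:

```python
def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size of one page; each pixel row is padded to a whole byte."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_row = (width_px * bits_per_pixel + 7) // 8
    return bytes_per_row * height_px


letter = (8.5, 11)
print(raw_page_bytes(*letter, 150, 8))  # 8-bit grey at 150 dpi -> 2103750
print(raw_page_bytes(*letter, 300, 1))  # 1-bit at 300 dpi -> 1052700
```

A 1-bit page at 300 dpi holds four times as many pixels as the 150 dpi greyscale page, yet needs about half the raw storage before any compression is applied.
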
I don't know how your Epson scanner works, but the best approach (if the software allows it) is to set up a scanning profile for each individual score. Create a profile with the above settings and the desired scan area. From then on, use that profile and have the software save each physical scan as a separate image; don't use multi-page TIFF or PDF output. You want to be able to manipulate every individual image.

Once you have that, download and install ScanTailor (http://scantailor.sourceforge.net/?q=en/node/3). Put all your TIFFs into a single folder, run ScanTailor, start a new project, and load them all. Take some time to get to know the program; the stock settings are fine. The 'play' buttons on the left side next to each section are somewhat confusing. Once the images are loaded, all you really need to run are 'split pages' (making sure to manually force one page), 'select content' (then adjust the purple boxes to contain all the printed elements), and 'output'. The resulting images are despeckled, deskewed, and upsampled to 600 dpi (if you so choose), and are ready to be compiled into a PDF.
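
One practical snag with one-image-per-scan output: if the scanner software numbers files without leading zeros (hypothetical names like page2.tif and page10.tif), any tool that sorts file names alphabetically will put page10 before page2. A small Python sketch of a natural sort, which compares the digit runs in each name as numbers:

```python
import re


def natural_key(name):
    """Split a file name into text and number chunks so 'page10' sorts after 'page2'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


files = ["page10.tif", "page2.tif", "page1.tif"]
print(sorted(files))                   # ['page1.tif', 'page10.tif', 'page2.tif']
print(sorted(files, key=natural_key))  # ['page1.tif', 'page2.tif', 'page10.tif']
```

Numbering the scans with leading zeros from the start (page001.tif, page002.tif, ...) avoids the problem entirely, since alphabetical and page order then agree.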

Feel free to adjust your scanner settings and post another few images for us to check.

Good luck