
Consolidating PDFs into Optimized PDF


farns


Good morning,

 

I am using a modified version of the script I found here:

http://forums.printable.com/showthread.php?t=37&highlight=insert+pdf

 

to do exactly what I need for my project. It works wonderfully, and takes a number of small PDFs and consolidates them into one file that is much easier to deal with in production.

 

However, we're starting to get a larger number of these files, and I've noticed that the composed PDF is many times larger than the parts going into it. For example, yesterday's run consisted of 40 variable data PDFs (about 2600 12x18 pages), totaling just over 2 GB. The output PDF after running through the above script was 21 GB, more than 10x inflation.

 

If I manually consolidate the files, by opening the first one, and then adding the other 40 or so after that, and save that file, I see a dialog box that tells me it's consolidating backgrounds, removing duplicate fonts, etc.

 

The resulting file from this manual process weighs in at only 1.17 GB, even smaller than the sum of the parts going into it. I assume that is because there are items that exist in duplicate among the separate files and can be consolidated across the entire group.

 

Is there a setting in composition that I can't find that will allow this optimization to take place? I'm really struggling that I can't find anything to help me keep these file sizes down. What am I missing?

 

Farns


Also... I'm doing this as PDF to an Indigo 7600. I know nothing about PPML and other such formats. I used to make VDX for a NexPress, and loved that format, but I don't know if there are other formats that FP can make, that may be more efficient for the Indigo. I'm open to any ideas there as well.

Farns

I don't know anything about optimized settings for PDFs, but as far as PPML vs. PDF goes, this is what we found out. It's been a while since we used PPML files for our HP 7600, but we had color issues between a PPML and a PDF. We would output PDFs for proofs so we and the client could cycle through them on screen and as hard copy proofs, then write PPML files for the final print run, and there were color differences between the two files. Like I said, it's been a while, about 2 years, since we have looked into the PPML files. It might be better now, but we can't take that chance, and we like to be able to cycle through PDFs to see the versioning; you can't open PPML files that way.


That is good to know about the PPML format. I too like to be able to look at the print files that I need to send, and the preview engine on the 7600 is junk. PDF has always been my favorite, so I'm hoping we can get this figured out. I have been searching through everything I can get my hands on and I'm not getting anywhere. But as with most issues, I'm not sure I know the right keywords to search documentation for. Frequently, I find that it's right there in the docs, but under keywords I would have never suspected.

My guess would be that since you're pulling in the PDFs as graphics into FusionPro, they are viewed as such and cannot be compressed as they could be in Acrobat. You could try making a JLYT or PPML file out of FusionPro but I think you'll run into the same size issue.

 

I think the best way to reduce file size would be to bypass FP altogether for this step and just combine the PDFs in Acrobat, using "File > Create > Combine Files into a Single PDF" and dropping all of the PDFs you wish to combine into that window.

 

Just a note about the Indigo: you can preview the imported files on screen, or you can create a PDF of the job for better resolution/viewing by right-clicking on a job and clicking "composition preview".


I am bringing them into a variable text frame via that script I linked at the top of the post, so it actually makes sense that this somehow keeps Acrobat from properly optimizing them. When I get today's batch of data I will try the other formats to see what sizes I come up with.

 

The reason bypassing FP and doing this in Acrobat is no good is that we're talking about hundreds of files per day, so it has to be automated, and it has to be accurate. Handling the files individually is not an option.

 

The composition preview is not to my liking either, as it's just a low-res raster preview, unless there's a setting to retain vector quality that I do not know about.

 

I'll report back on the different output types later this morning, this one really has me curious now.

 

Are there other methods for bringing PDFs into a document besides the way PTI teaches in the original post?


I am bringing them into a variable text frame via that script I linked at the top of the post, so it actually makes sense that this somehow keeps Acrobat from properly optimizing them. When I get today's batch of data I will try the other formats to see what sizes I come up with.

 

Even though you're bringing them into a text frame, they are still being pulled in as graphics using a graphic tag.

 

The reason bypassing FP and doing this in Acrobat is no good is that we're talking about hundreds of files per day, so it has to be automated, and it has to be accurate. Handling the files individually is not an option.

 

Another option would be to set up a hot folder scenario and use a tool such as pdftk to combine the PDFs from the command line using this syntax:

 

pdftk old1.pdf old2.pdf old3.pdf cat output new.pdf

 

Then, you could use Watchdog to run the script on a specific folder every 24 hours.
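
As a rough sketch of that hot-folder idea, a small Python script could collect the PDFs in a folder and hand them to pdftk in one call. The function names, the folder layout, and the sort-by-filename page order are assumptions for illustration only; the one piece of pdftk syntax relied on is `cat` followed by `output` and the destination file. It also assumes pdftk is installed and on the PATH.

```python
import subprocess
from pathlib import Path


def build_pdftk_cat_command(inputs, output):
    """Return the argument list for: pdftk in1.pdf in2.pdf ... cat output out.pdf"""
    if not inputs:
        raise ValueError("need at least one input PDF")
    return ["pdftk", *[str(p) for p in inputs], "cat", "output", str(output)]


def list_pdfs(folder):
    """Return the PDFs in `folder`, sorted by name so the page order is predictable."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def merge_folder(folder, output):
    """Concatenate every PDF in `folder` into `output` by running pdftk."""
    command = build_pdftk_cat_command(list_pdfs(folder), output)
    # check=True raises CalledProcessError if pdftk exits with an error.
    subprocess.run(command, check=True)
```

Calling something like `merge_folder("hotfolder", "combined/new.pdf")` on a schedule would do the daily merge. Writing the output outside the hot folder keeps the combined file from being picked up as an input on the next run.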

 

The composition preview is not to my liking either, as it's just a low-res raster preview, unless there's a setting to retain vector quality that I do not know about.

I hear ya on that, but sometimes I find it easier to look at in Acrobat than in the tiny window on screen (despite the rasterization).

 

Are there other methods for bringing PDFs into a document besides the way PTI teaches in the original post?

Yes, there are other ways to import PDFs into a document (using a graphic frame for example), but ultimately, you'll still be importing a graphic into your template which will result in a large file size.


My guess would be that since you're pulling in the PDFs as graphics into FusionPro, they are viewed as such and cannot be compressed as they could be in Acrobat.

That's not accurate. The PDF Library we use for composition is the same one that Acrobat uses. However, you are correct that there can be different kinds of compression applied to PDF documents. But I can also only speculate as to what's going on without seeing the files.


Well, I was able to make files in a couple of different formats... VDX was the same size and JLYT was 2x the size, both of which were within my expectations. I tried PPML and another one I was unfamiliar with; both crashed my machine, but I don't think I wanted to pursue those anyhow, since I want to be able to look at the PDFs as mentioned above.

 

I will look at the pdftk option you mentioned. If there's a way to script that with variable filenames, perhaps it will be an adequate replacement for FP on this particular process.

 

Dan, I'd love to show you the files, but these are ones I have to work on in an encrypted volume that's locked inside our datacenter. One of those kinda jobs :-) I realize this will hinder my ability to get some help on this as well.

 

On most days, this isn't a problem. Under 1,000 orders, it's usually no sweat. But when we have a 5,000 or 10,000 order day, that's when it becomes a challenge. I was mainly concerned that there might be some optimization settings I was not aware of that would allow these PDFs to stay a proper size, but it sounds like that isn't the case.

 

Dan, I'm just using the script I linked to in the first post, with some very minor modifications to help me get through a variable-named path to the files needing to be consolidated, plus the basic composition settings I've used my whole life, which I've never really studied in great depth. So I was hoping there was something easy I was just overlooking.


Dan, I'd love to show you the files, but these are ones I have to work on in an encrypted volume that's locked inside our datacenter. One of those kinda jobs :-) I realize this will hinder my ability to get some help on this as well.

Can you send the files to Support? Or create some dummy files (without the "secret" content) that reproduce the problem and either post those here or send them to Support?

On most days, this isn't a problem. Under 1,000 orders, it's usually no sweat. But when we have a 5,000 or 10,000 order day, that's when it becomes a challenge. I was mainly concerned that there might be some optimization settings I was not aware of that would allow these PDFs to stay a proper size, but it sounds like that isn't the case.

Hold on. You mean you're adding 10,000 pages from external PDFs? Oh, actually, I see that in your initial post, you said that "yesterday's run consisted of 40 variable data PDFs (about 2600 12x18 pages)."

 

I can't say that I've ever tried to add anywhere near that number of pages. The "Inserting Multi-page PDF variable graphics" example job here on the forum was not put together with that kind of volume in mind. Also, each of the pages of the PDFs you're bringing in is handled as a graphic (an inline graphic in a text frame in the case of this example, but a graphic nevertheless), and FusionPro itself was not designed with the idea of optimizing an output file with tens of thousands of graphics in mind. It can indeed generate an output file, and that output file should print correctly, but it's never occurred to us here at PTI to try to optimize FusionPro for what you're trying to do.

 

I guess my question to you is: Why do you need to collect an entire day's "run" into one single output file? If you're going to examine your workflow to optimize it, I would start there before looking at other PDF manipulation tool kits.

Dan, I'm just using the script I linked to in the first post, with some very minor modifications to help me get through a variable-named path to the files needing to be consolidated, plus the basic composition settings I've used my whole life, which I've never really studied in great depth. So I was hoping there was something easy I was just overlooking.

Well, maybe the setting to change is the "Output to multiple files" box on the Output tab. Or, don't try to compose your entire day's jobs in a single composition run in the first place.

 

Also, if you have FusionPro VDP Producer API (FP Server), as your profile says, then there are other, more optimal ways that you could build up an output file from all those other PDF files, without having to do it all with JavaScript rules and inline graphics on inserted pages like in the example here on the forum. With the Producer API, you have the ability to modify the layout (format) of the job (in the DIF file), and effectively have a template with variable layout, not just variable content.

 

So if you really do need to have FusionPro generate a single output file with tens of thousands of pages from other PDFs, it might be better to write some code that calls the DIF Control API or the Producer API to generate a DIF file with a graphic frame on a Body page for each PDF resource page. Although, again, FusionPro isn't really designed to optimize an output file with tens of thousands of unique graphics.

 

For that matter, you might even be able to accomplish what you need with something like JDF, by building up a job ticket to reference all the other PDFs. But that's outside the scope of this forum.

 

At any rate, it sounds like you're trying to do this trick with FusionPro to work around some other issue in your workflow. Why is it so much easier to deal with one giant daily file in production? That's the question you should be asking.


Dan,

 

Sorry for the lateness of my reply. I've been buried with other production projects the past few days, so I'm just now picking up the development stick again.

 

Lemme try to get all your questions answered:

 

1. I can certainly try to mock up a dummy file and send it to Support, but I think at this point I am going to have better luck solving this particular problem outside of FusionPro. Somebody suggested pdftk, and I gave that a quick try yesterday and had awesome results. So I'm gonna pursue that path for a bit, and then return to this if I need to.

 

2. Sorry if I was not clear before on page counts. No, not 10,000 pages... on a big day we get 10,000 orders, but they are spread across several product types. I do not combine them all together; I combine all of one type together: Brochure, Flier, etc. So my biggest component yesterday was 2600 PDF pages. There are a number of reasons for this consolidation: partly the way we get the files from our client, partly improved speed in our finishing processes, and partly ease of accountability. This project is unique in many ways compared to the work I've done my entire career. But you're right, I may be asking too much of FusionPro Creator. I need to update my profile, I guess. I thought I had... I had a very old version of Producer (DL-1000 v3) at my last employment. I'm at a new shop now that is driven by a different VDP platform, but it has some gaps. FusionPro was filling those gaps nicely. We've talked with Terry about getting the API, and we may still in the future, but we don't have it or have immediate plans to invest yet.
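
For what it's worth, the per-product consolidation described above could be scripted along these lines. This is only a sketch under assumptions: the naming convention (product type before the first underscore, e.g. `Brochure_0001.pdf`), the function names, and the dated output filename are all hypothetical, and the only pdftk syntax used is `cat` followed by `output`.

```python
from collections import defaultdict
from pathlib import Path


def product_type(pdf_path):
    """Hypothetical convention: the text before the first underscore is the product type."""
    return Path(pdf_path).stem.split("_", 1)[0].lower()


def group_by_product(pdf_paths):
    """Map each product type to its PDFs, sorted by name within the group."""
    groups = defaultdict(list)
    for path in sorted(pdf_paths, key=str):
        groups[product_type(path)].append(str(path))
    return dict(groups)


def pdftk_commands(pdf_paths, run_date):
    """Build one pdftk command per product type, writing e.g. flier_<run_date>.pdf."""
    return {
        product: ["pdftk", *files, "cat", "output", f"{product}_{run_date}.pdf"]
        for product, files in group_by_product(pdf_paths).items()
    }
```

Each command list could then be passed to `subprocess.run(command, check=True)`, giving one consolidated file per product type per day instead of a single file for everything.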

 

At any rate, this discussion has been very valuable, it has helped me understand 2 or 3 parts of the puzzle a little bit better. I'm gonna pursue the PDFtk option this week and revisit this if needed, but I think for now we can call this one closed. I appreciate everybody's input!

 

Farns

