
Does FusionPro have a size limit for external data files?


Ryan_B


I have an external data file that I'm using to build a table for end-of-year transaction statements. It's a little over 1.5 GB. Every time I try to compose the template, I receive the following error:

FusionPro has encountered a fatal error and must abort.

Please shut down Acrobat and start again.

If the problem persists, please call PTI Technical Support.

Does FusionPro have a size limit for external data files?

Does anyone know of a solution for this?


There's no hard-and-fast limit programmed into FusionPro, but everything does have its limit. The difference is that, unlike the primary data file, which FusionPro can read in portions as it goes, an external data file has to be read into memory in its entirety to allow random access.
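By "random access," I mean the kind of keyed lookup a rule typically does against an ExternalDataFileEx object. Your rule will differ, but the pattern is something like this (the file, field, and key names here are just placeholders):

var xdf = new ExternalDataFileEx("table.txt"); // tab-delimited by default
var key = Field("Account"); // whatever primary field holds the account number
var result = "";
// Scan the XDF for every transaction row belonging to this account.
for (var rec = 1; rec <= xdf.recordCount; rec++)
{
    if (xdf.GetFieldValue(rec, "MdbId") == key)
        result += xdf.GetFieldValue(rec, "Amount") + "\n"; // placeholder field
}
return result;

To make any record's rows reachable like that, the whole file has to be in memory.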

 

It's hard to make a suggestion about how to make the job work without knowing more specifics.

Link to comment
Share on other sites

I've collected my job files to show you. However, even compressed, the folder is almost 400 MB. Max file size for a zipped attachment is 42.92 MB. How can I get you what you're looking for? Is there part of the collection in particular that you're trying to take a look at?

I've collected my job files to show you. However, even compressed, the folder is almost 400 MB. Max file size for a zipped attachment is 42.92 MB. How can I get you what you're looking for? Is there part of the collection in particular that you're trying to take a look at?

Do you have access to Dropbox or some other cloud storage where you can post the files? What I mainly want to see is the external data file itself, and which fields you're using from it, so I can hopefully figure out what about it can be reduced, such as duplicated or unused entries.

 

Does that 400 MB include the external data file? If so, that suggests there's something about it which can be reduced.


The external data file is created by a console application I wrote, using a stored procedure that returns only what I need for the table in my template. Every row in the external data file is a separate transaction that needs to be output to the table. So, unfortunately, it's already trimmed down as much as it can be.

Does that 400 MB include the external data file? If so, that suggests there's something about it which can be reduced.

Yes, it does. Compressing just the external data file still produces almost a 350 MB zip.

 

I've placed the external data file out on our file sharing site here:

http://docs.henrywurst.com/it/table.txt

 

I can share the template with you there as well if you think it'd help, but unfortunately I cannot share the primary data file, for privacy reasons (addresses, phone numbers, etc.).


Okay, thanks. I downloaded the file.

 

I'm trying to open it to see whether there's any way it can be reduced or broken up into more manageable chunks, but frankly, I'm having a hard time even doing that. Notepad2 hangs trying to open it, and Notepad++ simply says it's too big. I was finally able to open it in Visual Studio, which is a development (programming) application, but it took a while. (And using that much RAM is quite a burden on my system, so my whole computer runs pretty slowly with the file open.)

 

So I think that the problem isn't just that the text file is too big for FusionPro to read as an XDF. It's that it's really too big for a text file, period.

 

I have seen many external data files with a lot of data in them, and I've even seen jobs that use four different XDFs. But the largest XDF I'd seen before this one was a couple hundred megabytes, for about 60,000 records (lines). Your file is several times that size, with 18 million (!!!) records of data.

 

Therefore, I think the answer to your question is that the practical limit on the size of an external data file is somewhere around 500 megabytes, or half a gigabyte.

 

Here's what I suggest: don't dump all 18 million lines of data from the database into a single 1.5 GB tab-delimited data file and try to use it as an XDF. Instead, break it up into several data files covering ranges of account numbers (presumably the MdbId field is the key/account field). Since the data file has account numbers up to about 116 million, I would try breaking it into 12 files, one for each range of 10 million account numbers. So you would have a file named something like "0-10-mil-table.txt" for account numbers 1 through 10,000,000, another named "10-20-mil-table.txt" for accounts 10,000,001 through 20,000,000, and so on.
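If it's easier to post-process the existing dump than to change your console application, a quick script along these lines would do the splitting. This is just a sketch, not FusionPro code; it assumes Node.js is available, that the file has a header row of field names, and that MdbId is the first tab-delimited column, so adjust it to match your actual layout:

// Split table.txt into one file per range of 10 million account numbers.
var fs = require("fs");
var readline = require("readline");

var ACCOUNTS_PER_FILE = 10000000; // ten million, matching the rule below
var KEY_COLUMN = 0;               // assumed position of the MdbId column

var outputs = {};                 // one write stream per account range
var header = null;

var rl = readline.createInterface({ input: fs.createReadStream("table.txt") });

rl.on("line", function(line)
{
    if (header === null) { header = line; return; } // remember the header row
    var key = Number(line.split("\t")[KEY_COLUMN]);
    var bucket = Math.floor((key - 1) / ACCOUNTS_PER_FILE) + 1;
    var name = ((bucket - 1) * 10) + "-" + (bucket * 10) + "-mil-table.txt";
    if (!outputs[name])
    {
        outputs[name] = fs.createWriteStream(name);
        outputs[name].write(header + "\n"); // each split file keeps the header
    }
    outputs[name].write(line + "\n");
});

rl.on("close", function()
{
    for (var name in outputs)
        outputs[name].end();
});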

 

Then you'll need to modify the FusionPro rule just a bit to access a particular data file based on the account number key it's looking up. Something like this:

var account = Field("Account"); // whatever data field you're keying off of
var accountsPerFile = 10000000; // ten million
var accountFileNum = Int((account - 1) / accountsPerFile) + 1;
var accountFNum10 = accountFileNum * 10;
var XDF_name = (accountFNum10 - 10) + "-" + accountFNum10 + "-mil-table.txt";
var XDF = new ExternalDataFileEx(XDF_name);
// Your existing code to access the XDF here...

The other thing I recommend is chunking the FusionPro output into multiple files, if you're not already doing that. You might also consider running separate compositions for ranges of account numbers.
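The simplest way to chunk is the output-chunking setting on the Output tab of the composition settings, but you can also do it from an OnRecordStart callback rule. A rough sketch, with an arbitrary chunk size and file naming:

// OnRecordStart callback: start a new output file every 50,000 records.
var chunkSize = 50000; // arbitrary; pick whatever your workflow can handle
if ((FusionPro.Composition.inputRecordNumber - 1) % chunkSize == 0)
{
    var chunkNum = Int((FusionPro.Composition.inputRecordNumber - 1) / chunkSize) + 1;
    FusionPro.Composition.OpenNewOutputFile("statements-" + chunkNum + ".pdf"); // or whatever your output format is
}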

Link to comment
Share on other sites

Another thing you might want to try is the new multi-line record feature in FusionPro 10. Actually, this seems like exactly the kind of job for which that feature was designed.

 

Presumably your primary data file is shorter, with one record (line) for each account number, and each of those primary records/accounts maps to multiple records in the XDF. (We call this a "one-to-many" mapping in computer speak.)

 

As noted in the FusionPro 10 release notes, and detailed in the FP 10 User Guide, with the new multi-line record support, you can put all of the data into one file and specify that a new primary record should start when the account number changes. Then, in your rule, instead of looking up matching lines in an XDF, you iterate through the sub-records for the current primary record by calling FusionPro.GetMultiLineRecords(), which returns an ExternalDataFileEx object with the same methods you're using now.
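A rough sketch of what the table rule could look like with that approach; the column widths and field names (Date, Description, Amount) are placeholders for whatever your real transaction columns are, and the rule needs "Treat returned strings as tagged text" turned on:

// Build the transaction table from the sub-records of the current account.
var xdf = FusionPro.GetMultiLineRecords(); // only this record's lines are loaded

var table = new FPTable;
table.AddColumns(7200, 21600, 7200); // widths in hundredths of a point (7200 = 1 inch)
table.AddRows(xdf.recordCount);

for (var rec = 1; rec <= xdf.recordCount; rec++)
{
    var row = table.Rows[rec - 1];
    row.Cells[0].Content = xdf.GetFieldValue(rec, "Date");
    row.Cells[1].Content = xdf.GetFieldValue(rec, "Description");
    row.Cells[2].Content = xdf.GetFieldValue(rec, "Amount");
}

return table.MakeTags();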

 

Using multi-line records like this allows FusionPro to read into memory only the part of the data file that applies to the current record, instead of the entire file, so you shouldn't run into any data file size limitations. This is what I recommend.
