
Preserving Literal Text in a Plain Text Resource


dennis.wolfers


I need to bring in approximately 50,000 plain text resources (one per record), preserving leading space characters, blank lines, etc. so that the formatting remains intact when using a mono-space font (Courier). TaggedTextFromRaw() looks like the logical solution, but it creates entities for the elements of tags so that checking 'Treat returned strings as tagged text' results in the tags being returned as literal text rather than being interpreted as tags.

 

This is my code:

return TaggedTextFromRaw(Resource("Raw Bill").content);

 

Screen capture from preview and source text file attached.

 

What am I missing?

Sample Page without font command lines.txt

ScreenShot2015-04-23at12_46_49PM.png.284c2d27637877f8ae8efeef21ab9a3c.png

Edited by dennis.wolfers

To handle the markup properly, check the "Treat returned strings as tagged text" box in the Rule Editor dialog. Or, remove the call to TaggedTextFromRaw().

 

But I don't think either of these is going to give you monospace text in the way you want, where each space "character" is the same width. FusionPro doesn't really do that.

 

The way to get things to line up in a tabular format, like in your plain text file, in FusionPro is to either use tabs (<t> tags) or a table. In this case, the plain text file probably needs to be parsed to re-generate it as a table.


Thank you for your quick response!

 

I do have the "Treat returned strings as tagged text" box checked. I'll attach a capture of the output with it un-checked.

 

There is absolutely no way to convert the input files into tables, or any other automated formatting (as you'll see if you look at the text file). Remember that there are tens of thousands of these files, each unique!

 

I have a very convoluted workaround, but I'm looking for an elegant solution.

 

Is there any way to fine-tune the TaggedTextFromRaw functionality so that it doesn't entity-ize the tags themselves?

 

Dennis

ScreenShot2015-04-23at1_40_35PM.png.b40a27f0dae2f77b37cba169947abaa2.png


Since you're going to be using a mono-spaced font, you can somewhat fake this by converting all of the random entities that come over when you import the file into FP to non-breaking spaces (&nbsp;). I also converted all of the regular spaces to non-breaking spaces to make them line up. Here's an example ("Treat returned strings as tagged text" in the Rule Editor must be checked):

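// Load the plain text file as a resource.
var page = CreateResource('./Sample\ Page\ without\ font\ command\ lines.txt');
// Replace every entity and every whitespace character with a non-breaking space.
return page.content.replace(/&[^;]*;|\s/g, '&nbsp;');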

 

Attached PDF of output.

text-import-example.pdf


I do have the "Treat returned strings as tagged text" box checked. I'll attach a capture of the output with it un-checked.

Okay, well, if the file is a Plain Text File Resource, then the contents are automatically turned into tagged markup. That's really the only difference from a Tagged File Resource. So you're double-escaping the contents with entities by calling TaggedTextFromRaw. At any rate, simply using tagged markup is not going to do what you want.
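
To make the double-escaping concrete, here is a rough sketch in plain JavaScript. The escapeToTagged function is a hypothetical stand-in, not FusionPro's actual conversion; the set of characters it escapes is an assumption chosen only to show what happens when already-escaped text is escaped a second time.

```javascript
// Hypothetical stand-in for the escaping applied to plain text.
// The real conversion in FusionPro may cover more characters than this.
function escapeToTagged(raw) {
  return raw
    .replace(/&/g, '&amp;')   // must run first so the entities added below are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/ /g, '&#32;');
}

var rawLine = '  Total <due>';
var once = escapeToTagged(rawLine);   // what a single conversion would produce
var twice = escapeToTagged(once);     // what a second, redundant conversion does to it

console.log(once);   // &#32;&#32;Total&#32;&lt;due&gt;
console.log(twice);  // &amp;#32;&amp;#32;Total&amp;#32;&amp;lt;due&amp;gt;
```

After the second pass, every ampersand that started an entity has itself become "&amp;", so the entities show up as literal text instead of being interpreted.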

Is there any way to fine-tune the TaggedTextFromRaw functionality so that it doesn't entity-ize the tags themselves?

No, the "tags themselves" are the result of that "entity-izing" having already been done to the Plain Text File Resource.

There is absolutely no way to convert the input files into tables, or any other automated formatting (as you'll see if you look at the text file).

Well, I don't agree that "There is absolutely no way." There actually is a way to convert those files to table markup, but it's something that has to be coded into a rule. Although Step's solution is a lot simpler than what I was thinking of.

Remember that there are tens of thousands of these files, each unique!

Okay, but if you come up with a rule to convert them, then it doesn't really matter how many there are. This is why we have computers, to write programs to automate repetitive tasks.

I have a very convoluted workaround, but I'm looking for an elegant solution.

Step's solution is pretty elegant, although I have one quibble with it: Converting "all of the random entities that come over" can sweep up other non-space characters which may have been converted to entities, such as ampersands, quotes, and angle brackets. To guard against that, I would change the last line to this:

return page.content.replace(/&#\d+;|\s/g, '&nbsp;');

But this is pretty clever by Step.
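
As a quick illustration of the quibble, the sketch below runs a broad "any entity" pattern and a narrower "numeric entities only" pattern over the same sample string in plain JavaScript. The sample string and the exact form of the narrow pattern are assumptions made for illustration; real file content may contain other entities.

```javascript
// Sample of already-escaped content; the entities shown are assumed for illustration.
var sample = 'A&#32;&amp;&#32;B 10';

// Broad pattern: every entity and every whitespace character becomes &nbsp;
var broad = sample.replace(/&[^;]*;|\s/g, '&nbsp;');

// Narrow pattern (assumed form): only numeric entities and whitespace become &nbsp;
var narrow = sample.replace(/&#\d+;|\s/g, '&nbsp;');

console.log(broad);   // A&nbsp;&nbsp;&nbsp;B&nbsp;10
console.log(narrow);  // A&nbsp;&amp;&nbsp;B&nbsp;10
```

The broad pattern turns the escaped ampersand into a space, while the narrow one leaves "&amp;" alone.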


Okay, well, if the file is a Plain Text File Resource, then the contents are automatically turned into tagged markup.

Ahh, I did not know that. I couldn't understand why the (what looked to me like) tabs were being converted to entities so I was just trying to get rid of all of them; but that makes much more sense now.

 

Step's solution is pretty elegant, although I have one quibble with it: Converting "all of the random entities that come over" can sweep up other non-space characters which may have been converted to entities, such as ampersands, quotes, and angle brackets. To guard against that, I would change the last line to this:

return page.content.replace(/&#\d+;|\s/g, '&nbsp;');

But this is pretty clever by Step.

Thanks, Dan. And after learning how Plain Text File Resources are handled that solution seems much safer.

 

As I said in my original post, this solution only works because of the monospaced font and wouldn't work with a font like Helvetica. I am curious to see something like this formatted into a table, though. I racked my brain trying to figure out how to tackle that one, but came up short when trying to work out how to delimit cells and keep the formatting without editing the original text files.


I just read the most recent posts. I really appreciate the thought you both have put into this quirky problem. As it turns out, I came to the same conclusions that you've suggested. Though I really need to learn to use regex, this is the code I came up with that gets the result I need:

 

var bill = ReplaceSubstring(Resource("Raw Bill").content, "", "*");

bill = ReplaceSubstring(bill, " ", "*");

bill = ReplaceSubstring(bill, " ", "*");

 

return bill;

 

P.S. I don't see documentation of entity #32; what is it?

P.P.S. I see that the browser converted my entities into space characters, making my code into nonsense, but I was replacing #19, #32, and regular spaces with #160 (non-breaking space) characters.
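
For readers following along, the replacements described in the P.P.S. can be sketched in plain JavaScript roughly as follows. The entity spellings ("&#19;", "&#32;", "&#160;") are reconstructed from that description and are an assumption, and replaceAllLiteral is a hypothetical stand-in for a literal substring replacement, not FusionPro's ReplaceSubstring itself.

```javascript
// Hypothetical helper: literal (non-regex) replacement of every occurrence.
function replaceAllLiteral(text, find, replacement) {
  return text.split(find).join(replacement);
}

// Replace the two numeric entities and regular spaces with non-breaking space entities.
function toNonBreaking(content) {
  var out = replaceAllLiteral(content, '&#19;', '&#160;');
  out = replaceAllLiteral(out, '&#32;', '&#160;');
  return replaceAllLiteral(out, ' ', '&#160;');
}

console.log(toNonBreaking('a&#32;b c&#19;d'));  // a&#160;b&#160;c&#160;d
```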

Edited by dennis.wolfers

var bill = ReplaceSubstring(Resource("Raw Bill").content, "", " ");

bill = ReplaceSubstring(bill, " ", " ");

bill = ReplaceSubstring(bill, " ", " ");

 

return bill;

That doesn't seem like it would do anything. I think the vBulletin forum software is being "helpful" here, and converting entities in your post back to spaces. The way I've gotten around this is to change the point size of one of the characters in the entity, such as the # (pound sign). (You can see this in the markup of this post if you Quote it.)

P.S. I don't see documentation of entity #32; what is it?

The entity "&#32;" is a numeric entity, calling out the character with ASCII code decimal 32, which is the space character. It could also be represented in hexadecimal as "&#x20;".
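
A small plain-JavaScript sketch shows that the decimal and hexadecimal forms name the same character. The decodeNumericEntities function is a hypothetical helper written for this illustration (it is not a FusionPro function) and it only handles a lowercase "x" prefix.

```javascript
// Hypothetical helper: turn decimal (&#32;) and hexadecimal (&#x20;) numeric
// entities back into the characters they name.
function decodeNumericEntities(text) {
  return text.replace(/&#(x[0-9a-fA-F]+|\d+);/g, function (match, code) {
    var value = code.charAt(0) === 'x'
      ? parseInt(code.slice(1), 16)   // hexadecimal form, e.g. x20
      : parseInt(code, 10);           // decimal form, e.g. 32
    return String.fromCharCode(value);
  });
}

console.log(decodeNumericEntities('&#32;') === ' ');                       // true
console.log(decodeNumericEntities('&#x20;') === ' ');                      // true
console.log(decodeNumericEntities('&#32;') === decodeNumericEntities('&#x20;'));  // true
```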
