TaggedTextFromRaw turns extra spaces into DC3 character

Developer · February 2, 2016

Hello fellow forum users!

I am having trouble with the TaggedTextFromRaw function turning extra spaces into the device control 3 character. The device control 3 character is also the HTML entity &-#19; (hyphen added, otherwise I cannot type the entity in the forum post even with the code or html tag).

The issue only happens when there are 2 or more adjacent spaces in a text string.

Is this a bug with the TaggedTextFromRaw function? Are any workarounds available to return spaces instead of the DC3 character entity?

Thank you in advance if anyone can help with this!

Screen shot and example code attached.

DC3.txt

Edited February 2, 2016 by Developer

Dan Korn · February 2, 2016

What actual problem is this causing you?

To FusionPro, and its tagged markup parser, the character 19 entity is not a "DC3" character, it's a special character that denotes a non-collapsing space. That's how the tagged markup parser knows that you really do want multiple spaces to be typeset; otherwise they would be collapsed to a single space as with other HTML-like parsers.

The bottom line is that the text should be typeset correctly in the output. Is the output correct? If so, why do you consider this to be a bug? If not, exactly how is it wrong?

Developer · February 2, 2016

Hi Dan, thank you for your reply! Please allow me to explain the situation I encountered. Basically, regular expressions do not recognize the character 19 entity as a space.

My objective was to format the first line of an address block for a mailing. There were 3 fields separated by spaces on the first line: FirstName, MiddleInitial, and LastName. When MiddleInitial was blank, a regular expression would delete the extra space between FirstName and LastName. However, the regular expression did not recognize the character 19 entity as a space, and therefore did not remove the extra space.

My original code looked like this:

myNameA = TaggedTextFromRaw(Field("FirstName") + ' ' + Field("MiddleInitial") + ' ' + Field("LastName"));
myNameB = '<uppercase>' + myNameA.replace(/  /g, " ") + '</uppercase>';
return myNameB;

Thankfully, in the time since I posted, I found multiple workarounds to the problem. Here is my new code:

myNameArray = [Field("FirstName"),Field("MiddleInitial"),Field("LastName")];
myNameSpaced = myNameArray.filter(Boolean).join(" ");
myName = '<uppercase>' + TaggedTextFromRaw(myNameSpaced) + '</uppercase>';
return myName;

Another workaround is to use an if-then statement to check if MiddleInitial is blank, and then use 1 or 2 spaces accordingly. A third workaround is to change the order of operations by taking 3 separate TaggedTextFromRaw functions and joining them with spaces, then running the result through the regular expression.

When I researched the character 19 entity, it appeared to be a device control character (http://unicode-table.com/en/0013/), so I thought this feature was a bug. FusionPro does render the entity in my document as a space. Thank you for explaining how FusionPro handles the character 19 entity!

Also, I did not realize that consecutive spaces would collapse into a single space when the "Treat returned strings as tagged text" option is set. That is definitely good to know! I will use the character 19 entity where multiple spaces are required in tagged text rules going forward.

Dan Korn · February 2, 2016

My objective was to format the first line of an address block for a mailing. There were 3 fields separated by spaces on the first line: FirstName, MiddleInitial, and LastName. When MiddleInitial was blank, a regular expression would delete the extra space between FirstName and LastName. However, the regular expression did not recognize the character 19 entity as a space, and therefore did not remove the extra space.

My original code looked like this:
myNameA = TaggedTextFromRaw(Field("FirstName") + ' ' + Field("MiddleInitial") + ' ' + Field("LastName"));
myNameB = '<uppercase>' + myNameA.replace(/  /g, " ") + '</uppercase>';
return myNameB;

It's not so much that JavaScript regular expressions don't recognize the entity "&#19;", it's that JavaScript doesn't recognize any HTML/XML-like entities at all. To JavaScript, they're quite literally just an ampersand, a pound sign, some digits, and a semicolon. Only FusionPro's tagged markup parser knows that a particular sequence of characters like that should be considered as a space. JavaScript also won't recognize &nbsp; as a space, nor &quot; as a double-quote, etc. The JavaScript engine literally has no knowledge of tagged markup. (Well, it does have some deprecated HTML helper functions such as String.bold, but it doesn't know how to parse the tags it generates.)
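To make that concrete, here is a minimal sketch in plain JavaScript. The sample string is a hypothetical stand-in, not the actual output of TaggedTextFromRaw; it only shows that an entity is ordinary characters as far as the regular expression engine is concerned.

```javascript
// Hypothetical sample: text containing a numeric entity, as JavaScript sees it.
// This literal is an assumption for illustration, not real TaggedTextFromRaw output.
var tagged = "John&#19;Smith";

// To JavaScript the entity is five ordinary characters: & # 1 9 ;
var entityLength = "&#19;".length;

// The whitespace class \s finds nothing to match in the string.
var hasWhitespace = /\s/.test(tagged);

// A pattern looking for two literal spaces leaves the string unchanged.
var afterReplace = tagged.replace(/  /g, " ");
```

Here entityLength is 5, hasWhitespace is false, and afterReplace is identical to tagged, which is why the original replace call had no effect.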

Thankfully, in the time since I posted, I found multiple workarounds to the problem. Here is my new code:
myNameArray = [Field("FirstName"),Field("MiddleInitial"),Field("LastName")];
myNameSpaced = myNameArray.filter(Boolean).join(" ");
myName = '<uppercase>' + TaggedTextFromRaw(myNameSpaced) + '</uppercase>';
return myName;

Yes, that's a much better approach. In general, it's best to deal with "raw" data as long as you can, and convert to tagged markup as late as possible, because it's hard to deal with the tags otherwise, as you have found out.

Another workaround is to use an if-then statement to check if MiddleInitial is blank, and then use 1 or 2 spaces accordingly.

Sure, although there's an even simpler way to accomplish what you want, without having to check the "Treat returned strings as tagged markup" box:

var myNameArray = [Field("FirstName"),Field("MiddleInitial"),Field("LastName")];
var myNameSpaced = myNameArray.filter(String).join(" ");
var myName = myNameSpaced.toUpperCase();
return myName;

Or, more succinctly:

 return [Field("FirstName"),Field("MiddleInitial"),Field("LastName")].filter(String).join(" ").toUpperCase();

Or:

 return ToUpper([Field("FirstName"),Field("MiddleInitial"),Field("LastName")].filter(String).join(" "));
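If you want to try the filter-and-join idea outside of FusionPro, here is a rough sketch. The Field function below is only a stand-in I made up so the code runs on its own, and the record values are hypothetical; inside a FusionPro rule you would use the real Field function instead.

```javascript
// Stand-in for FusionPro's Field() so the sketch runs outside FusionPro.
// The record and its values are hypothetical.
var record = { FirstName: "John", MiddleInitial: "", LastName: "Smith" };
function Field(name) { return record[name]; }

function nameLine() {
  return [Field("FirstName"), Field("MiddleInitial"), Field("LastName")]
    .filter(String)   // String("") is "", which is falsy, so blank fields are dropped
    .join(" ")        // exactly one space between the remaining fields
    .toUpperCase();
}

var withoutMiddle = nameLine();   // "JOHN SMITH"
record.MiddleInitial = "Q";
var withMiddle = nameLine();      // "JOHN Q SMITH"
```

Because blank fields are removed before the join, there is never a doubled space to clean up afterward, so no regular expression is needed.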

A third workaround is to change the order of operations by taking 3 separate TaggedTextFromRaw functions and joining them with spaces, then running the result through the regular expression.

Well, again, be careful with this approach, because the tagged markup that TaggedTextFromRaw returns could still contain something that you don't expect, which could mess up your regular expression parsing.

When I researched the character 19 entity, it appeared to be a device control character (http://unicode-table.com/en/0013/), so I thought this feature was a bug.

All bugs are actually features!

Technically, you are correct that Unicode defines code point 19 as "Device Control Three." But Unicode only really includes "control characters" at code points less than 32 like this for completeness; they're holdovers from ASCII, from the old DOS and mainframe days. But there's no "character" there for a font to display as a glyph. Other than a few well-known control characters, such as 9 (horizontal tab), 10 (line feed), 13 (carriage return), and 32 (space), most control characters in the range zero to 32 don't really have any meaning for text or typesetting these days. At any rate, you could think of FusionPro as actually utilizing Device Control Three, and that its internal meaning for the particular "device" of FusionPro is that it's a non-collapsible space.
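As a quick aside, this is how the raw character (as opposed to the entity text) looks from plain JavaScript. It is just a sketch using standard string functions; nothing here is FusionPro-specific.

```javascript
// The raw control character at code point 19.
var dc3 = String.fromCharCode(19);
var codePoint = dc3.charCodeAt(0);

// 19 decimal is 13 hexadecimal, which is why the Unicode table URL ends in 0013.
var hex = codePoint.toString(16);

// JavaScript's \s class does not include this control character...
var dc3IsWhitespace = /\s/.test(dc3);

// ...but it does include the well-known ones, such as horizontal tab (9).
var tabIsWhitespace = /\s/.test(String.fromCharCode(9));
```

So even the raw character is not whitespace to a JavaScript regular expression; treating it as a space is purely FusionPro's interpretation.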

Actually, to be more technical, FusionPro treats character 19 as an "en space", which is nominally half the width of an "em space" (the width of a capital M in the current font), although exactly how FusionPro typesets spaces between words is a bit more complicated than that. It also has special meanings for characters 18 (thin space) and 20 (em space). This is documented in the FusionPro Tags Reference Guide, at the start of the table of entities in the section titled "The Entity Definition File," which is on page 71 in my version.

One other bit of information: You might ask, why doesn't TaggedTextFromRaw simply generate &nbsp; entities for consecutive spaces? Those would also be preserved by the tagged markup parser. The answer is that there's a somewhat subtle difference between how FusionPro's typesetter handles "regular" spaces (character 32) and other space characters such as "en" spaces (character 19). Specifically, the difference is in how these characters are handled at the beginning of a line. The en spaces (19s) are collapsed at the start of a line only, but preserved in the middle of a line, as spaces are usually handled by typesetters. (Think of it as how, even if you use two spaces after a period, you still don't get a space at the start of a line that starts a new sentence.) If you use all character 32s, then you'll see each of those take up horizontal space, even at the start of a line, which can "break" left-justified text. The latter way of typesetting with the character 32s is perfectly valid, but it's not what people usually want.

FusionPro does render the entity in my document as a space. Thank you for explaining how FusionPro handles the character 19 entity!

Sure.

Also, I did not realize that consecutive spaces would collapse into a single space when the "Treat returned strings as tagged text" option is set. That is definitely good to know! I will use the character 19 entity where multiple spaces are required in tagged text rules going forward.

There is another way you can represent multiple spaces, with the tag <space count=X>, where X is the number of spaces you want to represent (it defaults to one if the count attribute is not present). This is to help external XML parsers and generators, such as the one in .NET for people writing out tagged markup data from things like web-to-print applications utilizing FusionPro Server.

Edited February 2, 2016 by Dan Korn
Add explanation of the difference between 19 and 32 characters.

Developer · April 14, 2016

Thank you for the thorough response! I read it the same day it was posted, but forgot to reply with a thank you until now. I love the forum, I learn so much here!

The approach you outlined that does not require "Treat returned strings as tagged markup" to be checked works great! This is now my standard way of uppercasing lines in an address block.

That FusionPro uses the lower ASCII character space for its own functions makes sense to me now, especially since they would be going unused otherwise.

Following your advice, I searched for the table of entities in the FusionPro Tags Reference Guide, but couldn't find it. I also searched the other two help guides and couldn't find it. I do remember seeing this table in a previous version of FusionPro documentation, perhaps it was removed? I did find a note to look at the "entity.def" file in the installation directory and found the following definitions there. Looks like character 19 is an "En" space. Good to know!

    18          // thin space  1/6 of em space
    19          // en space    1/2 of em space
    20          // em space

Thanks again!
