
UTF8 problems


brew


I have a customer who is supplying a UTF-8-encoded .txt file.

The problem is that characters outside the ASCII range (accented letters, curly quotes and so on) are not coming through as expected. For instance, the a with an acute accent (á) is represented differently than it is in ISO-8859-1.
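A quick way to see what is going on (a Node.js sketch, not FusionPro code; misreadAsLatin1 is a hypothetical name): in UTF-8, "á" is stored as two bytes, and an application that assumes a single-byte encoding shows each of those bytes as a separate character.

```javascript
// Sketch: reproduce "mojibake" by decoding UTF-8 bytes as Latin-1 (ISO-8859-1).
// Node.js only; Buffer is not available inside FusionPro rules.
function misreadAsLatin1(text) {
  const utf8Bytes = Buffer.from(text, "utf8"); // "á" is stored as bytes c3 a1
  return utf8Bytes.toString("latin1");         // each byte becomes its own character
}

console.log(Buffer.from("á", "utf8")); // <Buffer c3 a1>
console.log(misreadAsLatin1("á"));     // Ã¡
console.log(misreadAsLatin1("ä"));     // Ã¤
```

Latin-1 reproduces the "Ã¡" pattern exactly. Sequences such as "â€™" come from Windows-1252, which, unlike Latin-1, assigns printable characters like € and ™ to bytes 0x80 to 0x9F.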

 

Is there a function or setting that could at least convert those characters down to something FusionPro displays correctly?

I've written a simple function in my JavaScript globals that just does a replace for each broken character: I copy and paste what FusionPro displays, then use my local character map to paste in the character it should be.

Note: In my txtpad editor and other plain-text editors, the characters look fine, so I know the problem is not with the file.

Help? I really don't want to play cat and mouse, adding a custom remapping every time I find another broken character.

Thanks.


Dan, I'm working on a simpler sample file, but I wanted to see if this sparks anything in the meantime. Here is the "workaround function" I keep adding to whenever I proof new data and get more garbled characters:

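function fixtxt(thetxt) {
  // Each call replaces UTF-8 text that was misread as Windows-1252
  // ("mojibake") with the character it was meant to be.
  var fixedstuff = thetxt;
  fixedstuff = ReplaceSubstring(fixedstuff, "Ã¡", "á");
  fixedstuff = ReplaceSubstring(fixedstuff, "â€™", "’");
  fixedstuff = ReplaceSubstring(fixedstuff, "â€¢", "•");
  fixedstuff = ReplaceSubstring(fixedstuff, "â€œ", "“");
  fixedstuff = ReplaceSubstring(fixedstuff, "â€˜", "‘");
  fixedstuff = ReplaceSubstring(fixedstuff, "Ã¤", "ä");
  fixedstuff = ReplaceSubstring(fixedstuff, "â„¢", "™");
  fixedstuff = ReplaceSubstring(fixedstuff, "â€”", "—"); // third char is a right double quote
  // "â€" on its own is the garbled right double quote (its third byte,
  // 0x9D, has no Windows-1252 character). It must run last, because it
  // is also the start of every other "â€" sequence above.
  fixedstuff = ReplaceSubstring(fixedstuff, "â€", "”");
  return fixedstuff;
}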
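Rather than growing a table one character at a time, the garbling can in principle be reversed in one step: map each character back to the Windows-1252 byte it came from, then decode those bytes as UTF-8. This is a sketch under two assumptions: the text was misread as Windows-1252 (which matches sequences like "â€™"), and the engine provides the standard escape() and decodeURIComponent() functions. fixMojibake and CP1252_TO_BYTE are hypothetical names; the code is written in ES5 style, but whether it runs inside FusionPro's rule engine is untested.

```javascript
// Hypothetical helper: undo UTF-8 text that was misread as Windows-1252.
// Assumption: every garbled character stands for exactly one Windows-1252 byte.

// Windows-1252 assigns printable characters to most bytes 0x80-0x9F;
// this table maps them back to their byte values.
// (0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined and therefore absent.)
var CP1252_TO_BYTE = {
  "\u20AC": 0x80, "\u201A": 0x82, "\u0192": 0x83, "\u201E": 0x84,
  "\u2026": 0x85, "\u2020": 0x86, "\u2021": 0x87, "\u02C6": 0x88,
  "\u2030": 0x89, "\u0160": 0x8A, "\u2039": 0x8B, "\u0152": 0x8C,
  "\u017D": 0x8E, "\u2018": 0x91, "\u2019": 0x92, "\u201C": 0x93,
  "\u201D": 0x94, "\u2022": 0x95, "\u2013": 0x96, "\u2014": 0x97,
  "\u02DC": 0x98, "\u2122": 0x99, "\u0161": 0x9A, "\u203A": 0x9B,
  "\u0153": 0x9C, "\u017E": 0x9E, "\u0178": 0x9F
};

function fixMojibake(garbled) {
  var bytes = ""; // one character per byte, each with a char code 0x00-0xFF
  for (var i = 0; i < garbled.length; i++) {
    var ch = garbled.charAt(i);
    if (Object.prototype.hasOwnProperty.call(CP1252_TO_BYTE, ch)) {
      bytes += String.fromCharCode(CP1252_TO_BYTE[ch]);
    } else if (ch.charCodeAt(0) <= 0xFF) {
      bytes += ch; // in the Latin-1 range the char code equals the byte value
    } else {
      return garbled; // cannot have come from a single byte: leave text alone
    }
  }
  try {
    // escape() writes each char 0x00-0xFF as %XX (safe ASCII stays as-is);
    // decodeURIComponent() then reads the %XX runs as UTF-8 bytes.
    return decodeURIComponent(escape(bytes));
  } catch (e) {
    return garbled; // not valid UTF-8, so the text was probably not garbled
  }
}
```

A string that mixes correctly decoded accented characters with garbled ones fails the UTF-8 check and comes back unchanged, as does a bare "â€" whose 0x9D byte was lost, so a short ReplaceSubstring table may still be needed for those cases.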


FusionPro 5.8 and newer can support a Unicode input data file, but the appropriate Unicode byte-order mark (BOM) must be present at the beginning of the file to denote which encoding it uses. For UTF-8, the file must begin with the UTF-8 BOM, which is the byte sequence 0xEF, 0xBB, 0xBF.

See: http://en.wikipedia.org/wiki/Byte_order_mark#UTF-8

If the file does not have the initial BOM, then there's no way for FusionPro, or any other application for that matter, to know it's supposed to treat it as a UTF-8 encoded file.
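To check whether a data file really starts with those three bytes, look at the raw bytes rather than at an editor window. A minimal Node.js sketch (hasUtf8Bom is a hypothetical name):

```javascript
// Check whether a byte buffer starts with the UTF-8 BOM (0xEF, 0xBB, 0xBF).
function hasUtf8Bom(bytes) {
  return bytes.length >= 3 &&
    bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF;
}

// The BOM is the character U+FEFF encoded as UTF-8.
console.log(Buffer.from("\uFEFF", "utf8"));                      // <Buffer ef bb bf>
console.log(hasUtf8Bom(Buffer.from("\uFEFFName,City", "utf8"))); // true
console.log(hasUtf8Bom(Buffer.from("Name,City", "utf8")));       // false
```

For a file on disk, hasUtf8Bom(require("fs").readFileSync("data.csv")) does the same check, where data.csv stands for your own file.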


6 months later...

Dan, we are using FusionPro Desktop 6.1 and have trouble importing UTF-8-encoded data files. We use comma-delimited text files for variable data, and if I include the BOM, it is treated as part of the name of the first column in the header.
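That symptom is what happens when the BOM is read as text instead of as an encoding marker: decoded as UTF-8 the three bytes become the invisible character U+FEFF (decoded as Windows-1252 they become "ï»¿"), and either way they stick to the first header field. A Node.js sketch of the symptom and of the usual workaround on the reading side, stripping a leading U+FEFF (parseHeader is a hypothetical name; this does not change how FusionPro itself reads the file):

```javascript
// Sketch: why a BOM can end up in the first column name.
// Buffer#toString("utf8") keeps a leading BOM as the character U+FEFF.
const raw = Buffer.concat([
  Buffer.from([0xEF, 0xBB, 0xBF]),
  Buffer.from("Name,City\n", "utf8"),
]);
const text = raw.toString("utf8");

const naiveHeader = text.split("\n")[0].split(",");
console.log(naiveHeader[0].length); // 5: U+FEFF followed by "Name"

// Workaround when parsing: drop a leading U+FEFF before splitting.
function parseHeader(csvText) {
  return csvText.replace(/^\uFEFF/, "").split(/\r?\n/)[0].split(",");
}
console.log(parseHeader(text)); // [ 'Name', 'City' ]
```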

 

I noticed that in the .DEF file, under DataDefDict>Assembler>DataSource, there is an "Encoding" parameter set to "default". Could changing it help with this?

Any advice would be appreciated.


3 weeks later...

Hi Dan, I am having similar problems.

When you say "appropriate Unicode byte-order mark (BOM) must be present at the beginning of the file", where exactly does that go? In the OnRecordStart rule, in the CSV file, or somewhere else?

I'm using FP Desktop 7.2, and French accents are not coming through. It works on a PC but not on a Mac.



The BOM needs to be part of the data file itself: the first three bytes of the CSV. It won't be visible in most text editors.
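For example, a minimal Node.js sketch that prepends the BOM to a CSV that is already UTF-8-encoded but lacks it. withUtf8Bom and addBomToFile are hypothetical names, and this only makes sense for files whose content really is UTF-8; write to a new file so the original is kept.

```javascript
const fs = require("fs");

const UTF8_BOM = Buffer.from([0xEF, 0xBB, 0xBF]);

// Return the bytes with a UTF-8 BOM in front, unless one is already there.
function withUtf8Bom(bytes) {
  const hasBom = bytes.length >= 3 &&
    bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF;
  return hasBom ? bytes : Buffer.concat([UTF8_BOM, bytes]);
}

// Read inPath, add the BOM if missing, and write the result to outPath.
function addBomToFile(inPath, outPath) {
  fs.writeFileSync(outPath, withUtf8Bom(fs.readFileSync(inPath)));
}
```

Usage, with placeholder file names: addBomToFile("customer-data.csv", "customer-data-bom.csv").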


Someone needs to attach a file for me to look at. That's the only way I can help further.


Archived

This topic is now archived and is closed to further replies.
