Jump to content

The CSV automatically converts to Scientific Notation


Recommended Posts

The data contains multiple zeroes that the CSV converts to Scientific Notation. Is there a way to prevent the CSV from doing that, or is there a rule I can set up to have it automatically convert within my resources?

 

 

Highlight the column, right-click, format cells, select "text."

 

Highlight column again, format cells, custom, enter #0000000000< or how many leading zeroes you have.

 

Save as: .csv. Tell it YES on everything after closing.

 

Is this a "code" the customer is supplying or just a sequence number?

Link to comment
Share on other sites

actually, the numbers occur after, but it would be good to keep any zeroes present in the beginning.

 

It's a membership number, and the zeroes are variable (some have 5, some have 7, etc).

 

I figure I could just go in and manually format to numbers so that it reads true. But, what I'm looking for is to reduce the number of steps the data has to go through so that there's less room for error.

Link to comment
Share on other sites

I'm not sure the best way to do that. Here's an example of what it would look like in the .csv

 

Row 1: MEMBER_ID_1

 

row 2: 3182010000

row 3: 701242000000

 

etc. The numbers vary between 10 and 14 digits. The zeroes also change.

 

But within the FusionPro doc, it winds up just being 318201E+4 or whatever.

Link to comment
Share on other sites

 

But within the FusionPro doc, it winds up just being 318201E+4 or whatever.

 

 

Highlight column, right-click, format cells, custom, Just put the "#" symbol in up to 14 times and save.

 

Re-link to the data files and refresh your preview.

 

I would do a few different values to see if there is any "spacing" issues throughout the run. Maybe someone will chime-in to solve via-a-rule as my way is a band-aid.

Link to comment
Share on other sites

I haven't experienced this when bringing csv files into FusionPro. But I have when I open the file in Excel first then save it. If this is part of your process, try using the csv file directly from your customer to test whether or not FusionPro is the program that is actually converting the numbers. If you have to open the csv file in Excel before using it in FusionPro, change the file's extension to ".txt" then use the "Open" in Excel to "import" the data. When you work through the steps, designate the number fields as "text". This should keep Excel from converting them to scientific notations.
Link to comment
Share on other sites

Thanks for the input. I downloaded directly from their FTP, so someone must have opened it on their end. In any case, I think (hope) future files should be okay. But that adds another step to my end if I have to check every time to make sure.

 

Like I said, I don't really have a problem with going in manually and changing it, but those extra steps could cause problems down the line. That's why I was wondering if there was a JS function.

Link to comment
Share on other sites

The real problem here is in using Excel to manage the data. If you're expecting strings of digits and Excel is munging them, that's yet another in the long list of reasons not to use Excel as a database.

 

That said, you can try something like this to parse the scientific notation:

function ParseScientificNotation(num)
{
   var arr = num.match(/(-?(?:0|[1-9]\d*)(?:\.\d*)?)(?:[eE]([+\-]?\d+))?/);
   return (arr.length > 2) ? arr[1] * Math.pow(10, arr[2]) : num;
}

Adapted from: http://stackoverflow.com/questions/638565/parsing-scientific-notation-sensibly

 

CAVEAT: This function has not been rigorously tested! Use at your own risk.

Link to comment
Share on other sites

Is there a better database system that you can recommend? I'm not usually on that side of the fulfillment equation.

Well, just about any database is a better database than Excel, since Excel isn't really a database at all, it's a spreadsheet application. Now, Excel has unofficially become the "World's Most Popular Database" because people use it that way, but that doesn't change the fact that it can damage your data.

 

Instead of preserving the data the way you enter it like a real database, or even a simple plain text editor, Excel wants to format your data for its own display purposes. This is great if you're actually using it as a spreadsheet, but if you really want your data to be preserved, it's simply the wrong application, because the formatting it does can actually cause you to lose data, many times in subtle ways. You've already discovered one of those ways; others include dropping delimiters for empty columns, allowing embedded newlines (a big no-no for delimited text files), munging text which it thinks represents dates, and poor support for Unicode, all of which can be problematic for editing data files intended for use by FusionPro. And once data is lost or otherwise modified, you can use functions like the one I posted, or tricks in Excel itself like Brad Mather and rpaterick suggested, to try to recover it, but you're never going to be able to do that 100 percent accurately.

 

In addition, the CSV file format introduces many potential ambiguities because commas in the data have to be escaped by enclosing the entire field in quotes, which requires quotes to be escaped, which all ends up being very confusing and ambiguous. FusionPro's delimited text parser has logic to deal with these escapes in the way that Excel generates them, and this works great as far as I know, but I still recommend using tab-delimited data to remove as much chance of ambiguity as possible. (The non-printing tab character is generally not valid as data anyway.)

 

So, your best results are going to come from using tab-delimited data as the input to FusionPro. Exactly how such a tab-delimited data file for FusionPro should be created is a bit beyond the scope of what I can recommend. That said, my impression is that most high-end enterprise customers with large data sets dump data from an actual database system to generate a tab-delimited file, and that this is a generally standard practice for data-centric tools such as FusionPro Server throughout many industries. Examples of commonly-used databases include SQL Server, Oracle, Microsoft Access, and MySQL, just to name a few.

 

Alternatively, if you can't or don't want to use a database application, you can always maintain the tab-delimited file directly and edit it in any plain-text editor without opening it in Excel. A Google search for "tab delimited file editor" turns up several free alternatives as well. It's all just text in the end, and the fewer filters you run it through which try to "helpfully" change the format of the data, the better.

 

(I should add that FusionPro can also take data directly from databases and database-like sources via ODBC, but that's on Windows only, and doesn't currently support Unicode, and it won't always help if the data has already been converted by a tool such as Excel anyway. And of course, FusionPro accepts tagged markup input files, but those are rarely used these days and have their own bit of complexity in terms of escaping certain characters, so I don't really recommend that for most users.)

 

I also realize that many of you here on the forum are print service providers who simply take data from clients, and you don't manage it at all, but you're still expected to make sense of it and even cleanse it. All I can tell you is to echo my warnings about using Excel to your customers.

 

Anyone else want to share how they manage their data? Maybe I'll set up a poll.

Link to comment
Share on other sites

Thanks for the input. I downloaded directly from their FTP, so someone must have opened it on their end.

 

So if you were to request it from the customer to not have to open it, that may solve the issue as Dan just pointed out in his excel explanation.

 

I have requested to customers in the past to submit the file so I do not have to open it. When I have to open it, I'm liable for their data, especially when $ is involved in the campaign(bank loans, approval loans, etc...).

 

When you open the excel file from FTP, do you see the 318201E+4? Your customer has to be seeing this as well, I would imagine?:confused:

Link to comment
Share on other sites

Thanks, Dan. That was very thorough!

 

I will have to investigate database options, but that gives me a great start.

 

In regards to the client awareness: we are still in the testing phases of the development, so I believe what we were given was a sample of past data. I guess during that time it was opened and re-opened before it got passed to the FTP for me to download. It has been made aware to our main contact with the client, and will be brought up during the meetings leading up to bring the project live.

 

Consequently, our own dummy file worked fine; I put it in a folder marked "DO NOT OPEN" as an extra precaution.

Link to comment
Share on other sites

Dan, great post about Excel. I try to convince customer service and customers whenever I can to avoid it. Of course it's pretty futile until a problem arises. One of the biggest is when a customer had sorted the list in excel but didn't select all the columns. Usually it takes a demo to show why their mailing went out with the wrong information with the wrong record. Tab delimited is what I always recommend to receive files and what we always use for input into fusionpro.
Link to comment
Share on other sites

Archived

This topic is now archived and is closed to further replies.

×
×
  • Create New...