Raige All American 4386 Posts
I have an app that people are constantly cutting and pasting into from Word. How do you guys deal with non-standard characters such as Microsoft Word's double quotes? Is there a specific way you capture those and convert them to normal quotes?
This would also apply to any other non-standard characters people most often cut and paste. Thanks! 4/11/2007 8:28:46 AM
synchrony7 All American 4462 Posts
I'm not sure exactly what you're asking. Do you mean programmatically, and if so, in what language? Are these Unicode characters or just 8-bit characters outside the standard 0-127 ASCII range? 4/11/2007 9:35:04 AM
Raige All American 4386 Posts
They are out of the normal range. You know the typical squares you get when you copy text from MS Word and paste it into an HTML page? Those are the characters I'm talking about.
I've solved the problem temporarily by using FCKEditor in all text fields, but I was wondering if there was a simpler solution.
[Edited on April 11, 2007 at 9:41 AM. Reason : ! damn i can't spell] 4/11/2007 9:41:40 AM
synchrony7 All American 4462 Posts
You can enable HTML forms to display Unicode characters (Arabic and Asian characters typed on other keyboard layouts, for example), and these special characters simply come from a different character set. This page explains it: http://www.cs.tut.fi/~jkorpela/www/windows-chars.html 4/11/2007 9:51:32 AM
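To confirm that the "squares" are real characters from another character set rather than garbage, a small diagnostic can list every non-ASCII character in a pasted string along with its Unicode name. This is a minimal Python sketch; the function name `describe_non_ascii` and the sample string are made up for illustration.

```python
import unicodedata


def describe_non_ascii(text):
    """Return (char, code point, Unicode name) for each non-ASCII character in text."""
    found = []
    for ch in text:
        if ord(ch) > 127:  # anything outside 7-bit ASCII
            name = unicodedata.name(ch, "<unnamed>")
            found.append((ch, f"U+{ord(ch):04X}", name))
    return found


# Hypothetical sample of text pasted from Word: smart quotes, an en dash, an ellipsis.
pasted = "\u201cSmart quotes\u201d \u2013 pasted from Word\u2026"
for ch, code_point, name in describe_non_ascii(pasted):
    print(code_point, name)
# U+201C LEFT DOUBLE QUOTATION MARK
# U+201D RIGHT DOUBLE QUOTATION MARK
# U+2013 EN DASH
# U+2026 HORIZONTAL ELLIPSIS
```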
Raige All American 4386 Posts
The idea is that I want to REMOVE non-standard characters (e.g., MS Word-formatted quotes) and replace them with standard versions. 4/11/2007 11:34:33 AM
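If replacing is the goal, one common approach is an explicit lookup table from Word's "smart" punctuation to plain ASCII. A minimal Python sketch follows; the table covers only a handful of characters, and the names `WORD_TO_ASCII` and `replace_word_punctuation` are hypothetical, so extend the table with whatever your users actually paste.

```python
# Hypothetical table: common Word AutoFormat characters -> plain ASCII stand-ins.
WORD_TO_ASCII = {
    "\u2018": "'",    # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",    # RIGHT SINGLE QUOTATION MARK (Word's curly apostrophe)
    "\u201C": '"',    # LEFT DOUBLE QUOTATION MARK
    "\u201D": '"',    # RIGHT DOUBLE QUOTATION MARK
    "\u2013": "-",    # EN DASH
    "\u2014": "--",   # EM DASH
    "\u2026": "...",  # HORIZONTAL ELLIPSIS
    "\u00A0": " ",    # NO-BREAK SPACE
}

# str.maketrans accepts a dict mapping single characters to replacement strings.
_TABLE = str.maketrans(WORD_TO_ASCII)


def replace_word_punctuation(text):
    """Swap the characters listed in WORD_TO_ASCII; leave everything else untouched."""
    return text.translate(_TABLE)


print(replace_word_punctuation("\u201cIt\u2019s done\u201d \u2014 really\u2026"))
# "It's done" -- really...
```

Characters that are not in the table pass through unchanged, so anything unexpected still needs its own handling.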
Raige All American 4386 Posts
BTW, very useful link. 4/11/2007 11:36:49 AM
philihp All American 8349 Posts
You will have to define what the "normal" range is and what "non-standard" characters are... and saying "non-standard characters are anything out of the normal range" doesn't count. 4/11/2007 11:59:02 AM
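One way to follow that advice is to write the definition down as an explicit allowed set and treat everything else according to a stated policy. The sketch below assumes "normal" means Python's `string.printable` (printable ASCII plus whitespace) and that anything else is replaced with `?` and reported; both choices, and the name `enforce_allowed`, are illustrative assumptions rather than a recommendation.

```python
import string

# Assumed policy: "normal" = string.printable, i.e. ASCII digits, letters,
# punctuation and string.whitespace (space, \t, \n, \r, \x0b, \x0c).
ALLOWED = set(string.printable)


def enforce_allowed(text, replacement="?"):
    """Replace every character outside ALLOWED; return (cleaned, rejected chars)."""
    cleaned = []
    rejected = []
    for ch in text:
        if ch in ALLOWED:
            cleaned.append(ch)
        else:
            cleaned.append(replacement)
            rejected.append(ch)
    return "".join(cleaned), rejected


cleaned, rejected = enforce_allowed("caf\u00e9 \u201cok\u201d")
print(cleaned)   # caf? ?ok?
print(rejected)  # ['é', '“', '”']
```

Note that the accented `é` is rejected along with the curly quotes, which is exactly the point being made: whether letters like that count as "normal" has to be decided up front.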
scud All American 10804 Posts
You are about to open a Pandora's box that you probably aren't ready for; there is no quick answer to your question. The only real answer is that you have to understand the intricacies of the codepages involved.
Here are some okay starting points: http://en.wikipedia.org/wiki/Codepage http://en.wikipedia.org/wiki/Windows-1252 http://en.wikipedia.org/wiki/ISO/IEC_8859-1
I'm going to guess that you're pasting into a Java application and running into problems converting Windows-1252 into UCS-2. 4/11/2007 11:05:31 PM
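To illustrate why the codepage matters: Windows-1252 puts Word's curly quotes and dashes at byte values in the 0x80 to 0x9F range, positions that ISO-8859-1 (as Python's `latin-1` codec decodes it) treats as C1 control characters. The sketch below is Python rather than Java, so treat it as an analogy for the Java case; the function names are hypothetical. It decodes the same bytes both ways and shows how to undo the wrong decoding.

```python
def decode_windows_text(raw: bytes) -> str:
    """Decode bytes from a Windows-1252 source, such as text copied out of Word."""
    # cp1252 leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined; errors="replace"
    # turns those into U+FFFD instead of raising UnicodeDecodeError.
    return raw.decode("cp1252", errors="replace")


def repair_latin1_mojibake(text: str) -> str:
    """Fix text that was decoded as ISO-8859-1 when the bytes were really Windows-1252.

    Assumes every character is in U+0000..U+00FF: latin-1 maps each byte to the
    code point with the same value, so encoding recovers the original bytes.
    """
    return decode_windows_text(text.encode("latin-1"))


raw = b"\x93Hello\x94 \x96 world"  # curly quotes and an en dash in Windows-1252

wrong = raw.decode("latin-1")
print(repr(wrong))                    # '\x93Hello\x94 \x96 world' (C1 controls, not quotes)
print(decode_windows_text(raw))       # “Hello” – world
print(repair_latin1_mojibake(wrong))  # “Hello” – world
```

The key step is decoding with the codepage the bytes were actually written in; in Java the analogous move is naming the charset explicitly when turning bytes into a String instead of relying on the platform default.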
Raige All American 4386 Posts
^ That's what I thought. The only method I think is 99% surefire is using an FCKEditor text box for every single manual input field. I think this is overkill, but the people using the tool want to cut and paste everything.
I appreciate the insight. 4/12/2007 9:43:24 AM
mysteryegg Veteran 163 Posts
Bill Gates is your problem! No, but seriously, Word's auto-formatting is what's introducing the characters you don't want. It's unfortunate... if you can't just stop people from typing in Word... I'd be interested in seeing your solution. 4/15/2007 6:07:42 PM