I'm working on a legacy application, and I'm trying to work out the difference between applications compiled with "Use Multi-Byte Character Set" and "Not Set" under the Character Set option.
I understand that compiling with Multi-Byte Character Set defines _MBCS, which allows multi-byte character set code pages to be used. Not Set doesn't define _MBCS, in which case only single-byte character set code pages are allowed.
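To make "multi-byte aware" concrete, here is a minimal portable sketch (not the CRT's implementation) that counts characters rather than bytes in a code page 932 (Shift-JIS) string. It assumes CP932's lead-byte ranges 0x81-0x9F and 0xE0-0xFC. The names cp932_is_lead_byte and cp932_char_count are hypothetical. This kind of lead-byte walking is what tchar.h's generic routines such as _tcsinc and _tcsclen switch to (via _mbsinc and _mbslen) when _MBCS is defined, instead of treating each byte as a character.

```c
#include <stddef.h>

/* Code page 932 (Shift-JIS) lead bytes: 0x81-0x9F and 0xE0-0xFC.
   A lead byte plus the byte after it form one double-byte character. */
static int cp932_is_lead_byte(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Count characters (not bytes) in a NUL-terminated CP932 string.
   A lead byte at the very end of a truncated string counts as one character. */
size_t cp932_char_count(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;
    while (*p != '\0') {
        if (cp932_is_lead_byte(*p) && p[1] != '\0')
            p += 2;   /* lead byte + trail byte */
        else
            p += 1;   /* single-byte character (ASCII or half-width katakana) */
        count++;
    }
    return count;
}
```

With the Shift-JIS bytes of "Jap text: テスト" from the research below, strlen reports 16 bytes while cp932_char_count reports 13 characters. The two bytes 0x83 0x5C (ソ) count as one character even though the trail byte has the same value as '\', which is why byte-wise path handling can break on Shift-JIS strings.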
If Not Set is used, I'm assuming we can only use the single-byte character set code pages found on this page: http://msdn.microsoft.com/en-gb/goglobal/bb964654.aspx
Therefore, am I correct in thinking that if Not Set is used, the application won't be able to encode, write or read Far Eastern languages, since they are defined in double-byte character set code pages (and of course Unicode)?
Following on from this, if Multi-Byte Character Set is defined, are both single- and multi-byte character set code pages available, or only multi-byte character set code pages? I'm guessing it must be both, for European languages to be supported.
Thanks,
Andy
Further reading
The answers on these pages didn't answer my question, but they helped my understanding: About the "Character set" option in Visual Studio 2010
Research
So, working through my research... with the locale set to Japanese:
Effect on hard-coded strings

    const char *foo = "Jap text: テスト";
    const wchar_t *bar = L"Jap text: テスト";

Compiling with Unicode
*foo = 4a 61 70 20 74 65 78 74 3a 20 83 65 83 58 83 67 == Shift-JIS (code page 932)
*bar = 4a 00 61 00 70 00 20 00 74 00 65 00 78 00 74 00 3a 00 20 00 c6 30 b9 30 c8 30 == UTF-16 or UCS-2
Compiling with Multi-Byte Character Set
*foo = 4a 61 70 20 74 65 78 74 3a 20 83 65 83 58 83 67 == Shift-JIS (code page 932)
*bar = 4a 00 61 00 70 00 20 00 74 00 65 00 78 00 74 00 3a 00 20 00 c6 30 b9 30 c8 30 == UTF-16 or UCS-2
Compiling with Not Set
*foo = 4a 61 70 20 74 65 78 74 3a 20 83 65 83 58 83 67 == Shift-JIS (code page 932)
*bar = 4a 00 61 00 70 00 20 00 74 00 65 00 78 00 74 00 3a 00 20 00 c6 30 b9 30 c8 30 == UTF-16 or UCS-2
Conclusion: the character set setting doesn't have any effect on hard-coded strings. The char strings above appear to use the code page of the current locale (Shift-JIS here), and the wchar_t strings appear to use either UCS-2 or UTF-16.
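Dumps like the ones above can be produced with a small helper. This is a minimal sketch with a hypothetical function name, hex_dump, that formats raw bytes in the same "4a 61 70" layout. One caveat when reproducing the wchar_t results elsewhere: wchar_t is 2 bytes (UTF-16) with MSVC but usually 4 bytes with GCC or Clang on Unix, so the example data below uses a uint16_t array for UTF-16 and assumes a little-endian machine, where the code unit 0x30C6 (テ) is stored as c6 30.

```c
#include <stddef.h>
#include <stdio.h>

/* Format len bytes starting at data as lowercase hex pairs separated by
   single spaces, e.g. "4a 61 70". out must have room for 3 * len bytes
   (at least 1 byte when len is 0). Returns out. */
char *hex_dump(const void *data, size_t len, char *out)
{
    const unsigned char *p = data;
    char *w = out;
    for (size_t i = 0; i < len; i++) {
        int last = (i + 1 == len);
        /* every byte except the last is followed by a space */
        sprintf(w, last ? "%02x" : "%02x ", (unsigned)p[i]);
        w += last ? 2 : 3;
    }
    *w = '\0';
    return out;
}
```

For example, dumping the five bytes of "Jap\x83\x65" gives "4a 61 70 83 65", and dumping the three UTF-16 code units of テスト on a little-endian machine gives "c6 30 b9 30 c8 30", matching the *bar dumps above.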
Using encoded strings in the W/A versions of the Win32 APIs
So, using the following code:

    const char *foo = "c:\\temp\\テスト\\テa.txt";
    const wchar_t *bar = L"c:\\temp\\テスト\\テw.txt";
    HANDLE hA = CreateFileA(foo, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    HANDLE hW = CreateFileW(bar, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hA != INVALID_HANDLE_VALUE) CloseHandle(hA);
    if (hW != INVALID_HANDLE_VALUE) CloseHandle(hW);

Compiling with Unicode
Result: both files were created.
Compiling with Multi-Byte Character Set
Result: both files were created.
Compiling with Not Set
Result: both files were created.
Conclusion: both the A and W versions of the API expect the same encoding regardless of the character set chosen. From this, perhaps I can assume that the Character Set option just switches which version of the API is used. The A version expects strings in the encoding of the current code page, and the W version expects UTF-16 or UCS-2.
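That assumption matches the pattern the Windows headers use: CreateFile and TEXT() are macros that expand to the W function and L"..." literals when UNICODE is defined ("Use Unicode Character Set"), and to the A function and plain literals otherwise ("Use Multi-Byte Character Set" and "Not Set" alike). Below is a minimal portable sketch of that pattern. open_file_a, open_file_w, open_file and MY_TEXT are hypothetical stand-ins for CreateFileA, CreateFileW, CreateFile and TEXT, so it compiles without windows.h.

```c
#include <wchar.h>

/* Hypothetical stand-ins for CreateFileA/CreateFileW; each one only
   reports which variant was called. */
static const char *open_file_a(const char *path)    { (void)path; return "A"; }
static const char *open_file_w(const wchar_t *path) { (void)path; return "W"; }

/* Same shape as the Windows headers: only UNICODE decides which variant
   the generic name and the literal macro resolve to. _MBCS plays no part. */
#ifdef UNICODE
#define MY_TEXT(s) L##s
#define open_file open_file_w
#else
#define MY_TEXT(s) s
#define open_file open_file_a
#endif
```

Built without UNICODE, open_file(MY_TEXT("c:\\temp\\a.txt")) expands to open_file_a("c:\\temp\\a.txt"). Both explicit variants stay callable in either build, which is why calling CreateFileA and CreateFileW directly behaved the same under all three settings.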
Opening files using the A and W Win32 APIs
So, using the following code:

    char fileA[MAX_PATH] = {0};
    OPENFILENAMEA ofnA = {0};
    ofnA.lStructSize = sizeof(ofnA);
    ofnA.hwndOwner = NULL;
    ofnA.lpstrFile = fileA;
    ofnA.nMaxFile = MAX_PATH;
    ofnA.lpstrFilter = "All\0*.*\0Text\0*.txt\0";
    ofnA.nFilterIndex = 1;
    ofnA.lpstrFileTitle = NULL;
    ofnA.nMaxFileTitle = 0;
    ofnA.lpstrInitialDir = NULL;
    ofnA.Flags = OFN_PATHMUSTEXIST | OFN_FILEMUSTEXIST;

    wchar_t fileW[MAX_PATH] = {0};
    OPENFILENAMEW ofnW = {0};
    ofnW.lStructSize = sizeof(ofnW);
    ofnW.hwndOwner = NULL;
    ofnW.lpstrFile = fileW;
    ofnW.nMaxFile = MAX_PATH;
    ofnW.lpstrFilter = L"All\0*.*\0Text\0*.txt\0";
    ofnW.nFilterIndex = 1;
    ofnW.lpstrFileTitle = NULL;
    ofnW.nMaxFileTitle = 0;
    ofnW.lpstrInitialDir = NULL;
    ofnW.Flags = OFN_PATHMUSTEXIST | OFN_FILEMUSTEXIST;

    GetOpenFileNameA(&ofnA);
    GetOpenFileNameW(&ofnW);
and selecting, in the A and W dialogs respectively:
- C:\Temp\テスト\テopena.txt
- C:\Temp\テスト\テopenw.txt
yields:
When compiled with Unicode
*fileA = 43 3a 5c 54 65 6d 70 5c 83 65 83 58 83 67 5c 83 65 6f 70 65 6e 61 2e 74 78 74 == Shift-JIS (code page 932)
*fileW = 43 00 3a 00 5c 00 54 00 65 00 6d 00 70 00 5c 00 c6 30 b9 30 c8 30 5c 00 c6 30 6f 00 70 00 65 00 6e 00 77 00 2e 00 74 00 78 00 74 00 == UTF-16 or UCS-2
When compiled with Multi-Byte Character Set
*fileA = 43 3a 5c 54 65 6d 70 5c 83 65 83 58 83 67 5c 83 65 6f 70 65 6e 61 2e 74 78 74 == Shift-JIS (code page 932)
*fileW = 43 00 3a 00 5c 00 54 00 65 00 6d 00 70 00 5c 00 c6 30 b9 30 c8 30 5c 00 c6 30 6f 00 70 00 65 00 6e 00 77 00 2e 00 74 00 78 00 74 00 == UTF-16 or UCS-2
When compiled with Not Set
*fileA = 43 3a 5c 54 65 6d 70 5c 83 65 83 58 83 67 5c 83 65 6f 70 65 6e 61 2e 74 78 74 == Shift-JIS (code page 932)
*fileW = 43 00 3a 00 5c 00 54 00 65 00 6d 00 70 00 5c 00 c6 30 b9 30 c8 30 5c 00 c6 30 6f 00 70 00 65 00 6e 00 77 00 2e 00 74 00 78 00 74 00 == UTF-16 or UCS-2
Conclusion: again, the character set setting has no bearing on the behaviour of the Win32 API. The A version seems to return strings in the encoding of the active code page, and the W version returns UTF-16 or UCS-2. I can see this explained a bit in this great answer: https://stackoverflow.com/a/3299860/187100.
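On "UTF-16 or UCS-2": for these strings the two cannot be told apart, because テ, ス and ト (U+30C6, U+30B9, U+30C8) are in the Basic Multilingual Plane and take one 16-bit unit each. They only differ for characters above U+FFFF, which UTF-16 encodes as a surrogate pair (a high unit in D800-DBFF followed by a low unit in DC00-DFFF) and UCS-2 cannot represent at all. A minimal decoder sketch shows the difference. utf16_decode is a hypothetical name, and the sketch passes unpaired surrogates through unchanged as a simplification.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode n UTF-16 code units into Unicode code points. A valid surrogate
   pair becomes one code point above U+FFFF; every other unit (including an
   unpaired surrogate) is copied through as-is. out needs room for n entries.
   Returns the number of code points written. */
size_t utf16_decode(const uint16_t *in, size_t n, uint32_t *out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t u = in[i];
        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < n &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            /* combine high and low halves into a supplementary code point */
            u = 0x10000 + ((u - 0xD800) << 10) + (uint32_t)(in[i + 1] - 0xDC00);
            i++;
        }
        out[count++] = u;
    }
    return count;
}
```

Decoding the fileW units c6 30 b9 30 c8 30 gives three code points whether you read them as UTF-16 or UCS-2, whereas the pair D83D DE00 decodes to the single code point U+1F600 only under UTF-16 rules.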
Ultimate conclusion
Hans appears to be correct when he says the define doesn't have any magic to it, beyond switching the Win32 APIs between the W and A versions. Therefore, I can't see any difference between Not Set and Multi-Byte Character Set.
No, that's not the way it works. The only thing that happens is that the macro gets defined; it doesn't otherwise have any magic effect on the compiler. It is very rare to actually write code that uses #ifdef _MBCS to test this macro.
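For illustration, this is roughly what code that does test _MBCS looks like: the macro only selects which code gets compiled. This is a hypothetical sketch, not how the CRT implements _mbslen. my_tcsclen is an invented name, and the _MBCS branch uses the standard mblen(), which follows the multi-byte encoding of the current C locale (LC_CTYPE).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical "character count" in the spirit of tchar.h's _tcsclen.
   With _MBCS defined, walk the string one multi-byte character at a time;
   without it (an SBCS or "Not Set" build), every byte is one character. */
static size_t my_tcsclen(const char *s)
{
#ifdef _MBCS
    size_t count = 0;
    mblen(NULL, 0);                     /* reset any shift state */
    while (*s != '\0') {
        int n = mblen(s, MB_CUR_MAX);   /* byte length of the next character */
        if (n <= 0)
            n = 1;                      /* invalid sequence: count one byte */
        s += n;
        count++;
    }
    return count;
#else
    return strlen(s);
#endif
}
```

Built without _MBCS, my_tcsclen("Jap\x83\x65") returns 5. Built with it, under a locale whose encoding is Shift-JIS, the same bytes would count as 4 characters. Nothing else in the program changes.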
You leave it up to a helper function to make the conversion, such as WideCharToMultiByte(), OLE2A() or wcstombs(). These conversion functions always consider multi-byte encodings, guided by the code page. _MBCS is a historical accident, relevant 25+ years ago when multi-byte encodings were not yet common. Using a non-Unicode encoding is a historical artifact these days as well.
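A minimal sketch of the helper-function approach, using the standard wcstombs() mentioned above. wide_to_locale_mb is a hypothetical name. It follows the size-then-convert pattern: a NULL destination makes wcstombs() return the required byte count, which POSIX and the Microsoft CRT both document. The output encoding is whatever the current C locale's LC_CTYPE specifies, so in a process whose locale uses code page 932 the result would be Shift-JIS.

```c
#include <stdlib.h>
#include <wchar.h>

/* Convert a wide string to a newly allocated multi-byte string in the
   encoding of the current C locale. Returns NULL if some character cannot
   be represented in that encoding or if allocation fails.
   The caller frees the result with free(). */
char *wide_to_locale_mb(const wchar_t *src)
{
    size_t needed = wcstombs(NULL, src, 0);  /* bytes required, excluding NUL */
    if (needed == (size_t)-1)
        return NULL;                         /* unrepresentable character */
    char *dst = malloc(needed + 1);
    if (dst == NULL)
        return NULL;
    wcstombs(dst, src, needed + 1);          /* room for the terminating NUL too */
    return dst;
}
```

In the default "C" locale a plain ASCII path such as L"C:\\temp\\file.txt" converts unchanged. A character the locale cannot represent makes wcstombs() return (size_t)-1, which the sketch reports as NULL rather than silently producing a wrong string.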