C++ - Visual Studio Character Sets: 'Not set' vs 'Multi byte character set'

I've been working with a legacy application, and I'm trying to work out the difference between applications compiled with Multi byte character set and with Not Set under the Character Set option.

I understand that compiling with Multi byte character set defines _MBCS, which allows multi-byte character set code pages to be used, and that using Not Set doesn't define _MBCS, in which case only single-byte character set code pages are allowed.

In the case where Not Set is used, I'm assuming that we can only use the single-byte character set code pages found on this page: http://msdn.microsoft.com/en-gb/goglobal/bb964654.aspx

Therefore, am I correct in thinking that if Not Set is used, the application won't be able to encode and write, or read, Far Eastern languages, since they are defined in double-byte character set code pages (and of course Unicode)?

Following on from this, if Multi byte character set is defined, are both single- and multi-byte character set code pages available, or only the multi-byte ones? I'm guessing it must be both, for European languages to be supported.

Thanks,

Andy

Further reading

The answers on this page didn't answer my question, but they helped my understanding: About the "Character Set" option in Visual Studio 2010

Research

So, as some working research, with my locale set to Japanese:

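Effect on hard-coded strings

    // Narrow literal: stored in the compiler's execution character set
    const char *foo = "Jap text: テスト";
    // Wide literal: stored as wchar_t code units
    const wchar_t *bar = L"Jap text: テスト";

Compiling with Unicode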

*foo = 4a 61 70 20 74 65 78 74 3a 20 83 65 83 58 83 67 == shift-jis (code page 932)
*bar = 4a 00 61 00 70 00 20 00 74 00 65 00 78 00 74 00 3a 00 20 00 c6 30 b9 30 c8 30 == utf-16 or ucs-2

Compiling with Multi byte character set

*foo = 4a 61 70 20 74 65 78 74 3a 20 83 65 83 58 83 67 == shift-jis (code page 932)
*bar = 4a 00 61 00 70 00 20 00 74 00 65 00 78 00 74 00 3a 00 20 00 c6 30 b9 30 c8 30 == utf-16 or ucs-2

Compiling with Not Set

*foo = 4a 61 70 20 74 65 78 74 3a 20 83 65 83 58 83 67 == shift-jis (code page 932)
*bar = 4a 00 61 00 70 00 20 00 74 00 65 00 78 00 74 00 3a 00 20 00 c6 30 b9 30 c8 30 == utf-16 or ucs-2

Conclusion: The Character Set option doesn't have any effect on hard-coded strings. Defining chars as above seems to use the code page defined by the locale, and wchar_t seems to use either UCS-2 or UTF-16.
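The byte listings above can be reproduced with a small dump helper. This is only a sketch: `dump_narrow` and `dump_utf16le` are hypothetical names, and to keep it independent of the source file encoding and of the platform's `wchar_t` size it uses explicit escapes and `char16_t` rather than the literals from the experiment.

```cpp
#include <cstdio>
#include <string>

// Append one byte as two lowercase hex digits, space separated.
static void append_hex(std::string& out, unsigned value) {
    char buf[3];
    std::snprintf(buf, sizeof buf, "%02x", value & 0xFFu);
    if (!out.empty()) out += ' ';
    out += buf;
}

// Dump the raw bytes of a narrow string (whatever encoding it holds).
std::string dump_narrow(const std::string& s) {
    std::string out;
    for (unsigned char c : s) append_hex(out, c);
    return out;
}

// Dump UTF-16 code units in little-endian byte order (low byte first),
// which is how they sit in memory on Windows.
std::string dump_utf16le(const std::u16string& s) {
    std::string out;
    for (char16_t u : s) {
        append_hex(out, u & 0xFF);
        append_hex(out, u >> 8);
    }
    return out;
}
```

Feeding it the Shift-JIS bytes for テスト written as escapes ("\x83\x65\x83\x58\x83\x67") and the same text as u"\u30C6\u30B9\u30C8" gives the two tails seen in the listings: 83 65 83 58 83 67 and c6 30 b9 30 c8 30.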

Using encoded strings in the W/A versions of the Win32 APIs

So, using the following code:

    const char *foo = "C:\\Temp\\テスト\\テa.txt";
    const wchar_t *bar = L"C:\\Temp\\テスト\\テw.txt";

    // The A version takes the narrow string, the W version takes the wide string
    CreateFileA(foo, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    CreateFileW(bar, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);

Compiling with Unicode

Result: both files are created

Compiling with Multi byte character set

Result: both files are created

Compiling with Not Set

Result: both files are created

Conclusion: Both the A and W versions of the API expect the same encoding regardless of the character set chosen. From this, perhaps we can assume that all the Character Set option does is switch between the versions of the API. The A version always expects strings in the encoding of the current code page, and the W version always expects UTF-16 or UCS-2.
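The switching itself is done by the preprocessor: the Windows headers map each generic name to its A or W variant depending on whether UNICODE is defined. Below is a minimal sketch of that mechanism which does not include windows.h. `DemoOpenA`, `DemoOpenW`, `DemoOpen`, `DEMO_TEXT` and `selected_variant` are hypothetical stand-ins invented for illustration, not real API names.

```cpp
#include <string>

// Hypothetical stand-ins for a Win32 A/W function pair.
std::string DemoOpenA(const char*)    { return "A"; }
std::string DemoOpenW(const wchar_t*) { return "W"; }

// Same pattern the Windows headers use for generic names:
// the macro picks a variant, nothing else changes.
#ifdef UNICODE
#  define DemoOpen     DemoOpenW
#  define DEMO_TEXT(s) L##s
#else
#  define DemoOpen     DemoOpenA
#  define DEMO_TEXT(s) s
#endif

// Code written against the generic name compiles either way.
std::string selected_variant() {
    return DemoOpen(DEMO_TEXT("C:\\Temp\\demo.txt"));
}
```

Calling `DemoOpenA` or `DemoOpenW` by name bypasses the macro entirely, which matches the observation that the explicit A and W calls behave identically under all three settings.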

Opening files using the W and A Win32 APIs

So, using the following code:

    // Narrow (A) version of the Open dialog
    char fileA[MAX_PATH] = {0};
    OPENFILENAMEA ofnA = {0};
    ofnA.lStructSize = sizeof(ofnA);
    ofnA.hwndOwner = NULL;
    ofnA.lpstrFile = fileA;
    ofnA.nMaxFile = MAX_PATH;
    ofnA.lpstrFilter = "All\0*.*\0Text\0*.TXT\0";
    ofnA.nFilterIndex = 1;
    ofnA.lpstrFileTitle = NULL;
    ofnA.nMaxFileTitle = 0;
    ofnA.lpstrInitialDir = NULL;
    ofnA.Flags = OFN_PATHMUSTEXIST | OFN_FILEMUSTEXIST;

    // Wide (W) version of the Open dialog
    wchar_t fileW[MAX_PATH] = {0};
    OPENFILENAMEW ofnW = {0};
    ofnW.lStructSize = sizeof(ofnW);
    ofnW.hwndOwner = NULL;
    ofnW.lpstrFile = fileW;
    ofnW.nMaxFile = MAX_PATH;
    ofnW.lpstrFilter = L"All\0*.*\0Text\0*.TXT\0";
    ofnW.nFilterIndex = 1;
    ofnW.lpstrFileTitle = NULL;
    ofnW.nMaxFileTitle = 0;
    ofnW.lpstrInitialDir = NULL;
    ofnW.Flags = OFN_PATHMUSTEXIST | OFN_FILEMUSTEXIST;

    GetOpenFileNameA(&ofnA);
    GetOpenFileNameW(&ofnW);

and selecting either:

C:\Temp\テスト\テopena.txt
C:\Temp\テスト\テopenw.txt

yields:

When compiled with Unicode

*filea = 43 3a 5c 54 65 6d 70 5c 83 65 83 58 83 67 5c 83 65 6f 70 65 6e 61 2e 74 78 74 == shift-jis (code page 932)
*filew = 43 00 3a 00 5c 00 54 00 65 00 6d 00 70 00 5c 00 c6 30 b9 30 c8 30 5c 00 c6 30 6f 00 70 00 65 00 6e 00 77 00 2e 00 74 00 78 00 74 00 == utf-16 or ucs-2

When compiled with Multi byte character set

*filea = 43 3a 5c 54 65 6d 70 5c 83 65 83 58 83 67 5c 83 65 6f 70 65 6e 61 2e 74 78 74 == shift-jis (code page 932)
*filew = 43 00 3a 00 5c 00 54 00 65 00 6d 00 70 00 5c 00 c6 30 b9 30 c8 30 5c 00 c6 30 6f 00 70 00 65 00 6e 00 77 00 2e 00 74 00 78 00 74 00 == utf-16 or ucs-2

When compiled with Not Set

*filea = 43 3a 5c 54 65 6d 70 5c 83 65 83 58 83 67 5c 83 65 6f 70 65 6e 61 2e 74 78 74 == shift-jis (code page 932)
*filew = 43 00 3a 00 5c 00 54 00 65 00 6d 00 70 00 5c 00 c6 30 b9 30 c8 30 5c 00 c6 30 6f 00 70 00 65 00 6e 00 77 00 2e 00 74 00 78 00 74 00 == utf-16 or ucs-2

Conclusion: Again, the Character Set setting doesn't have any bearing on the behaviour of the Win32 API. The A version always seems to return a string in the encoding of the active code page, and the W one always returns UTF-16 or UCS-2. I can actually see this explained a bit in this great answer: https://stackoverflow.com/a/3299860/187100.

Ultimate conclusion

Hans appears to be correct when he says that the define doesn't really have any magic to it, beyond changing the Win32 APIs to use either W or A. Therefore, I can't really see any difference between Not Set and Multi byte character set.

No, that's not really the way it works. The only thing that happens is that the macro gets defined; it doesn't otherwise have a magic effect on the compiler. It is very rare to actually write code that uses #ifdef _MBCS to test this macro.

You almost always leave it up to a helper function to make the conversion, such as WideCharToMultiByte(), OLE2A() or wcstombs(). These conversion functions always consider multi-byte encodings, as guided by the code page. _MBCS is a historical accident; it was relevant 25+ years ago, when multi-byte encodings were not common yet. Using a non-Unicode encoding is a historical artifact these days as well.
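As a rough illustration of leaving the conversion to a helper, here is a sketch built on the standard library's wcstombs(), which converts according to the current C locale. WideCharToMultiByte() plays the same role on Windows with an explicit code page argument, but it is not used here so that the example stays portable. The wrapper name `narrow` is a hypothetical helper, not part of any library.

```cpp
#include <cstdlib>
#include <cwchar>
#include <string>
#include <vector>

// Convert a wide string to the multi-byte encoding of the current
// C locale. Returns false if some character cannot be represented
// in that encoding; `out` is left untouched in that case.
bool narrow(const wchar_t* wide, std::string& out) {
    // Worst case: every wide character needs MB_CUR_MAX bytes.
    std::size_t cap = std::wcslen(wide) * MB_CUR_MAX + 1;
    std::vector<char> buf(cap);
    std::size_t n = std::wcstombs(buf.data(), wide, cap);
    if (n == static_cast<std::size_t>(-1)) return false;
    out.assign(buf.data(), n);
    return true;
}
```

Nothing in the helper tests _MBCS: whether the output is single-byte or multi-byte depends only on the locale in effect at run time, for example after a call to setlocale().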
