C++ - Convert UTF8 encoded byte buffer to wstring?
Does the C++ Standard Template Library (STL) provide a method to convert a UTF8 encoded byte buffer to a wstring?
For example:
const unsigned char* szbuf = (const unsigned char*) "d\xc3\xa9j\xc3\xa0 vu";
std::wstring str = method(szbuf); // should assign "déjà vu" to str
I want to avoid having to implement my own UTF8 conversion code, like this:
const unsigned char* pch = szbuf;
while (*pch != 0)
{
    if ((*pch & 0x80) == 0)
    {
        str += *pch++;
    }
    else if ((*pch & 0xe0) == 0xc0 && (pch[1] & 0xc0) == 0x80)
    {
        wchar_t ch = (((*pch & 0x1f) >> 2) << 8) + ((*pch & 0x03) << 6) + (pch[1] & 0x3f);
        str += ch;
        pch += 2;
    }
    else if (...)
    {
        // other cases omitted
    }
}
Edit: Thanks for the comments and the answer. This code fragment performs the desired conversion:
// needs #include <locale> and #include <codecvt>
std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> convert;
str = convert.from_bytes((const char*)szbuf);
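For anyone who wants to compile the fragment as-is, here is a minimal sketch that wraps it in a function. The name utf8_to_wstring is my own (hypothetical), not something the standard library provides. Two things worth knowing: from_bytes throws std::range_error on malformed input when the converter is constructed without an error string, and both std::wstring_convert and std::codecvt_utf8 were deprecated in C++17, so newer compilers may warn.

```cpp
#include <codecvt>    // std::codecvt_utf8 (deprecated since C++17)
#include <locale>     // std::wstring_convert (deprecated since C++17)
#include <stdexcept>  // std::range_error, thrown by from_bytes on bad input
#include <string>

// Hypothetical helper: decode a NUL-terminated UTF-8 buffer into a wide string.
// Throws std::range_error if the conversion fails.
std::wstring utf8_to_wstring(const unsigned char* buf) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> convert;
    return convert.from_bytes(reinterpret_cast<const char*>(buf));
}
```

Calling utf8_to_wstring(szbuf) with the buffer from the question should yield the 7-character wide string "déjà vu".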
In C++11 you can use std::codecvt_utf8. If you don't have that, you may be able to persuade iconv to do what you want; unfortunately, that's not ubiquitous either, not all implementations of it support UTF-8, and I'm not aware of any way to find out the appropriate thing to pass to iconv_open to do a conversion to or from wchar_t.
If you don't have either of those things, your best bet is a third-party library such as ICU. Surprisingly, Boost does not appear to have anything for the purpose, although I coulda missed it.
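If none of those options is practical, the hand-rolled loop from the question can be completed. What follows is only a sketch under stated assumptions, not a vetted implementation: decode_utf8 is a hypothetical name, and the code assumes wchar_t is wide enough to hold any code point (32 bits, as on typical Linux and macOS toolchains; on Windows, where wchar_t is 16 bits, code points above U+FFFF would need surrogate pairs, which this does not produce). It rejects overlong encodings, surrogates, and truncated sequences by throwing std::invalid_argument.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (not a standard function): decode NUL-terminated UTF-8.
// Assumption: wchar_t can represent every code point up to U+10FFFF.
std::wstring decode_utf8(const unsigned char* p) {
    std::wstring out;
    while (*p != 0) {
        unsigned long cp;  // code point being assembled
        int extra;         // number of continuation bytes expected
        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        ++p;
        for (int i = 0; i < extra; ++i, ++p) {
            // A NUL terminator fails this check too, so we never read past the end.
            if ((*p & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (*p & 0x3F);
        }
        // Reject overlong encodings, UTF-16 surrogates, and out-of-range values.
        static const unsigned long min_cp[] = {0x0, 0x80, 0x800, 0x10000};
        if (cp < min_cp[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid UTF-8 code point");
        out += static_cast<wchar_t>(cp);
    }
    return out;
}
```

The two-byte branch computes the same value as the shift expression in the question, just written as a single shift-and-or per continuation byte so the three- and four-byte cases share the loop.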