class template
<codecvt>
std::codecvt_utf8_utf16
template < class Elem, unsigned long MaxCode = 0x10ffffUL, codecvt_mode Mode = (codecvt_mode)0 >
class codecvt_utf8_utf16 : public codecvt <Elem, char, mbstate_t>
Convert between UTF-8 and UTF-16
Converts between multibyte sequences encoded in UTF-8 and UTF-16.
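The facet uses Elem as its internal character type (encoded as UTF-16), and char as its external character type (encoded as UTF-8). Therefore:
- Conversions in (member in) translate from UTF-8 to UTF-16.
- Conversions out (member out) translate from UTF-16 to UTF-8.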
Template parameters
- Elem
- The internal character type, aliased as member intern_type. This shall be a wide character type: wchar_t, char16_t or char32_t.
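If Elem is 32 bits wide (char32_t, or wchar_t on some platforms), conversions in still produce UTF-16: each wide character stores a single UTF-16 code unit (as a 32-bit value), so a code point above U+FFFF occupies two elements (a surrogate pair).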
The external character type in this facet is always char.
- MaxCode
- The largest code point that will be translated without reporting a conversion error.
- Mode
- Bitmask value of type codecvt_mode, combining any of the following flags. Because the external encoding of this facet is UTF-8, endianness does not affect how its multibyte sequences are read or written:
label | value | description |
consume_header | 4 | An optional initial header sequence (BOM) is consumed when converting in. |
generate_header | 2 | An initial header sequence (BOM) is generated when converting out. |
little_endian | 1 | Requests little-endian multibyte output (as opposed to the default big-endian). It has no effect on the UTF-8 sequences of this facet. |
Member types
The following aliases are member types of codecvt_utf8_utf16, inherited from codecvt:
member type | definition | notes |
intern_type | The first template parameter (Elem) | The internal character type (encoded as UTF-16). |
extern_type | char | The external character type (encoded as UTF-8). |
state_type | mbstate_t | Conversion state type (see mbstate_t). |
result | codecvt_base::result | Enum type with the result of a conversion operation (see codecvt_base::result). |
Public member functions inherited from codecvt
- (constructor)
- codecvt constructor (public member function)
Conversion functions:
- in
- Translate in characters (public member function)
- out
- Translate out characters (public member function)
- unshift
- Unshift translation state (public member function)
Character encoding properties:
- always_noconv
- Return noconv characteristics (public member function)
- encoding
- Return encoding width (public member function)
- length
- Return length of translated sequence (public member function)
- max_length
- Return max length of one character (public member function)
Virtual protected member functions
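The class defines its functionality through its virtual protected member functions:
- do_always_noconv
- Return noconv characteristics (virtual protected member function)
- do_encoding
- Return encoding width (virtual protected member function)
- do_in
- Translate in characters (virtual protected member function)
- do_length
- Return length of translated sequence (virtual protected member function)
- do_max_length
- Return max length of one character (virtual protected member function)
- do_out
- Translate out characters (virtual protected member function)
- do_unshift
- Unshift translation state (virtual protected member function)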
Example
// codecvt_utf8_utf16 example
#include <cstddef>
#include <iostream>
#include <locale>
#include <string>
#include <codecvt>

int main ()
{
  std::wstring_convert<std::codecvt_utf8_utf16<char16_t>,char16_t> conversion;
  std::string mbs = conversion.to_bytes( u"\u4f60\u597d" );  // ni hao (你好)

  // print out hex value of each byte:
  std::cout << std::hex;
  for (std::size_t i=0; i<mbs.length(); ++i)
    std::cout << int(static_cast<unsigned char>(mbs[i])) << ' ';
  std::cout << '\n';
  return 0;
}
Output:
e4 bd a0 e5 a5 bd