How to read a Unicode-encoded TXT file and display the Chinese strings it contains

2013-11-03



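// Reading a TXT file: when its encoding is UNICODE, the Chinese text does not display
wstring s;
wstring x;
wifstream input;
vector<wstring> vec;

wchar_t out;

input.open(filename, ios::in);

if (input.fail())
{
    cout << "Failed to open file!" << endl;
}

while (!input.eof()) // read the whole file into the vec container
{
    getline(input, s); // the garbled text appears here
    vec.push_back(s);
}

x = vec[0];

for (int i = 0; i < x.size(); i++)
{
    out = x[i];
    wcout << out; // completely blank: the Chinese does not come out. English works, but with spaces in between.
}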


//************************
Output:
w o s h i y i g e b i n g

Original file content:
woshiyigebing

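[Solution]
for (int i = 0; i < x.size(); i++)
{
    out = x[i];
    wcout << out; // completely blank: the Chinese does not come out. English works, but with spaces in between.
}

You don't need the loop; wcout << x; prints the whole string at once.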
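
A likely cause of the spaced-out output: with the default locale, wifstream widens every single byte into its own wchar_t. In a UTF-16LE file (what Notepad calls "Unicode") the letter "w" is stored as the two bytes 77 00, so each English letter is followed by a NUL that shows up as a blank, and each Chinese character is split into two meaningless halves. Below is a minimal sketch of one workaround, assuming the file really is UTF-16LE: read the raw bytes in binary mode and join each byte pair yourself. The names read_all_bytes, decode_utf16le and split_lines are made up for this illustration and are not part of any library.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: read a whole file as raw bytes. Binary mode means
// no newline translation and no character conversion takes place.
std::string read_all_bytes(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper: turn UTF-16LE bytes into UTF-16 code units.
// Assumption: the input is little-endian; a leading FF FE BOM is skipped.
// A trailing odd byte, if any, is ignored.
std::u16string decode_utf16le(const std::string& bytes) {
    std::size_t pos = 0;
    if (bytes.size() >= 2 &&
        static_cast<unsigned char>(bytes[0]) == 0xFF &&
        static_cast<unsigned char>(bytes[1]) == 0xFE) {
        pos = 2;  // skip the byte order mark
    }
    std::u16string out;
    for (; pos + 1 < bytes.size(); pos += 2) {
        unsigned lo = static_cast<unsigned char>(bytes[pos]);
        unsigned hi = static_cast<unsigned char>(bytes[pos + 1]);
        out.push_back(static_cast<char16_t>(lo | (hi << 8)));  // low byte first
    }
    return out;
}

// Hypothetical helper: split decoded text on '\n' and drop carriage returns,
// so both Windows (\r\n) and Unix (\n) line endings work.
std::vector<std::u16string> split_lines(const std::u16string& text) {
    std::vector<std::u16string> lines;
    std::u16string current;
    for (char16_t c : text) {
        if (c == u'\n') {
            lines.push_back(current);
            current.clear();
        } else if (c != u'\r') {
            current.push_back(c);
        }
    }
    if (!current.empty()) lines.push_back(current);
    return lines;
}
```

Typical use would be split_lines(decode_utf16le(read_all_bytes(filename))). On Windows wchar_t is also 16 bits wide, so the code units can be copied into a std::wstring unchanged; wcout usually still needs a locale that can represent Chinese before it will print them.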
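[Solution]

#include <stdio.h>
#include <wchar.h>
#include <Windows.h>
// The Windows console displays text in GBK by default (on a Simplified Chinese system), which is why the output is garbled.

// Converts one Unicode code unit to its GBK bytes.
// The caller owns the returned buffer and must release it with delete[].
char* ToGBK(unsigned int ucode /* Unicode code, passed as a four-byte integer */) {
    char* Unicode_char = new char[5];
    // wsprintfA is named explicitly so this also compiles when UNICODE is defined.
    wsprintfA(Unicode_char, "%wc", (wchar_t)ucode);
    return Unicode_char; // returns the GBK bytes
}

int main() {
    /*
    0x4E00 is the Unicode code of the Chinese character "一";
    0xD2BB is the GBK code of "一".

    0x963F is the Unicode code of the Chinese character "阿";
    0xB0A2 is the GBK code of "阿".
    */
    char* ch = ToGBK(0x4E00);
    printf("%s", ch); // never pass data as the format string
    unsigned char low = *ch; // read the bytes stored for the character
    unsigned char high = *(ch + 1);
    printf("%02X %02X ", low, high); // GBK bytes
    delete[] ch; // allocated with new[], so release with delete[]
    ch = NULL;
    return 0;
}

The original poster can take a look at the ToGBK function; it should solve your problem. I ran into a similar problem before, so I dug this function out to share. Leave a comment if you have questions.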
[Solution]
In Visual C++ 2005, fopen supports Unicode file streams. A flag specifying the desired encoding may be passed to fopen when opening a new file or overwriting an existing file, like this:

fopen("newfile.txt", "rt+, ccs=<encoding>");

Allowed values of the encoding include UNICODE, UTF-8, and UTF-16LE. If the file already exists and is opened for reading or appending, the Byte Order Mark (BOM) is used to determine the correct encoding. It is not necessary to specify the encoding with a flag. In fact, the flag will be ignored if it conflicts with the type of the file as indicated by the BOM. The flag is only used when no BOM is present or if the file is a new file. The following table summarizes the modes used for the various flags given to fopen and the Byte Order Marks used in the file.

Encodings Used Based on Flag and BOM
Flag     | No BOM (or new file) | BOM: UTF-8 | BOM: UTF-16
---------+----------------------+------------+------------
UNICODE  | ANSI                 | UTF-8      | UTF-16LE
UTF-8    | UTF-8                | UTF-8      | UTF-16LE
UTF-16LE | UTF-16LE             | UTF-8      | UTF-16LE
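
The BOM check described above can also be done by hand when you read the file yourself in binary mode. This is only a rough sketch of the idea, not how the CRT implements it; BomEncoding and detect_bom are hypothetical names chosen for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch.
enum class BomEncoding { None, Utf8, Utf16LE, Utf16BE };

// Hypothetical helper: inspect the first bytes of a file and report which
// byte order mark, if any, they form.
//   EF BB BF -> UTF-8
//   FF FE    -> UTF-16 little-endian
//   FE FF    -> UTF-16 big-endian
// Note: a UTF-32LE BOM (FF FE 00 00) also starts with FF FE; this sketch
// does not try to tell the two apart.
BomEncoding detect_bom(const std::string& head) {
    auto byte_at = [&head](std::size_t i) {
        return static_cast<unsigned char>(head[i]);
    };
    if (head.size() >= 3 &&
        byte_at(0) == 0xEF && byte_at(1) == 0xBB && byte_at(2) == 0xBF) {
        return BomEncoding::Utf8;
    }
    if (head.size() >= 2 && byte_at(0) == 0xFF && byte_at(1) == 0xFE) {
        return BomEncoding::Utf16LE;
    }
    if (head.size() >= 2 && byte_at(0) == 0xFE && byte_at(1) == 0xFF) {
        return BomEncoding::Utf16BE;
    }
    return BomEncoding::None;  // no BOM: the encoding has to be assumed or passed in
}
```

Passing the first few bytes of the file is enough. When the result is BomEncoding::None, the caller has to fall back to some default, which is the role the ccs flag plays in the table.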
