Chinese text extracted with a regex comes out garbled: how to fix it

2012-04-11


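Chinese text extracted with a regex comes out garbled
The web page I am extracting from is encoded in UTF-8. How do I get the extracted text to come out as proper Chinese?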

[Solution]
Regex has nothing to do with encoding. UTF-8 and Chinese text both work fine.
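[Solution]

Was the page source already garbled at the point where you fetched it?

The regex itself will definitely not produce "garbled" text.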
[Solution]
[\u4e00-\u9fa5]+

This pattern extracts Chinese characters.

Post your code.
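To show the pattern on its own, with no network or encoding step involved, here is a minimal sketch that runs `[\u4e00-\u9fa5]+` against a string that is already correctly decoded. The class name `ChineseExtractor`, the method `ExtractChinese` and the sample string are made up for illustration. The range covers the common CJK ideographs only, so Chinese punctuation and rarer extension characters are not matched.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ChineseExtractor
{
    // Hypothetical helper: returns every run of characters in U+4E00..U+9FA5.
    public static List<string> ExtractChinese(string text)
    {
        var runs = new List<string>();
        foreach (Match m in Regex.Matches(text, @"[\u4e00-\u9fa5]+"))
        {
            runs.Add(m.Value);
        }
        return runs;
    }

    public static void Main()
    {
        // The sample text is an assumption, not taken from the thread.
        foreach (string run in ExtractChinese("Hello 世界, 你好 123"))
        {
            Console.WriteLine(run);
        }
    }
}
```

If this works on an in-memory string but fails on downloaded HTML, the string was most likely damaged before the regex ever saw it.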
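[Solution]

Discussion

The page looks fine in the browser, but what I extract is garbled. The digits come out correctly; only the Chinese is garbled.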
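"Digits fine, Chinese garbled" is the usual sign that the bytes were decoded with the wrong encoding. ASCII characters have the same byte values in UTF-8 and in most legacy encodings, so they survive, while multi-byte Chinese characters do not. The sketch below reproduces the symptom under one assumption: the wrong decoder is ISO-8859-1, chosen only because it is available everywhere. The thread does not say which encoding was actually involved, and the names `MojibakeDemo`, `RoundTrip` and `ContainsChinese` are hypothetical.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class MojibakeDemo
{
    static readonly Regex Chinese = new Regex(@"[\u4e00-\u9fa5]+");

    // Encodes text as UTF-8 bytes, then decodes those bytes with `decoder`.
    public static string RoundTrip(string text, Encoding decoder)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return decoder.GetString(utf8Bytes);
    }

    // True if the text still contains at least one character in U+4E00..U+9FA5.
    public static bool ContainsChinese(string text)
    {
        return Chinese.IsMatch(text);
    }

    public static void Main()
    {
        // Assumption: ISO-8859-1 stands in for "whatever wrong encoding was used".
        Encoding wrongDecoder = Encoding.GetEncoding("iso-8859-1");

        string right = RoundTrip("价格123", Encoding.UTF8);
        string wrong = RoundTrip("价格123", wrongDecoder);

        Console.WriteLine(ContainsChinese(right));   // decoded correctly
        Console.WriteLine(ContainsChinese(wrong));   // Chinese bytes were misread
        Console.WriteLine(wrong.EndsWith("123"));    // ASCII digits survive either way
    }
}
```

The regex is the same in both cases; only the decoding step differs.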

[Solution]

Discussion

http://topic.csdn.net/u/20120225/22/b5912ce0-ed81-4932-8bb3-a456708d69d4.html

It is this page. I wrote my code following reply #5 there, and what I extract is garbled.

[Solution]
Strange, when I fetch that page nothing comes out garbled??

This simply extracts all the Chinese text on it:

C# code

using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

class Program
{
    /// <summary>
    /// Gets the full source of a web page.
    /// </summary>
    /// <param name="Url">Address of the page to download.</param>
    /// <returns>The page source, or "Error" if the request fails.</returns>
    public static string _GetHtml(string Url)
    {
        Stream MyInStream = null;
        StringBuilder Html = new StringBuilder();
        try
        {
            HttpWebRequest MyRequest = (HttpWebRequest)WebRequest.Create(Url);
            HttpWebResponse MyResponse = (HttpWebResponse)MyRequest.GetResponse();
            // Raw bytes of the response body.
            MyInStream = MyResponse.GetResponseStream();
            // Decode those bytes as UTF-8; this has to match the page's real encoding.
            Encoding encode = System.Text.Encoding.UTF8;
            StreamReader sr = new StreamReader(MyInStream, encode);
            // Read the decoded text in 256-character chunks.
            Char[] read = new Char[256];
            int count = sr.Read(read, 0, 256);
            while (count > 0)
            {
                Html.Append(read, 0, count);
                count = sr.Read(read, 0, 256);
            }
        }
        catch (Exception)
        {
            return "Error";
        }
        finally
        {
            if (MyInStream != null)
            {
                MyInStream.Close();
            }
        }
        return Html.ToString();
    }

    static void Main(string[] args)
    {
        string htmlStr = _GetHtml("http://topic.csdn.net/u/20120225/22/b5912ce0-ed81-4932-8bb3-a456708d69d4.html");
        // Match every run of Chinese characters in the decoded source.
        Regex re = new Regex(@"[\u4e00-\u9fa5]+", RegexOptions.None);
        MatchCollection mc = re.Matches(htmlStr);
        foreach (Match ma in mc)
        {
            Console.WriteLine(ma.Value);
        }
        Console.ReadLine();
    }
}
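The code above hard-codes UTF-8, which is fine for the page in question but garbles pages served in another encoding. One possible refinement is to pick the decoder from the charset the server declares. The helper below is a hypothetical sketch: `CharsetHelper` and `ResolveEncoding` are invented names, and falling back to UTF-8 is an assumption rather than a rule.

```csharp
using System;
using System.Text;

static class CharsetHelper
{
    // Hypothetical helper: maps a declared charset name to an Encoding,
    // falling back to UTF-8 when the name is missing or not supported.
    public static Encoding ResolveEncoding(string charset)
    {
        if (string.IsNullOrWhiteSpace(charset))
        {
            return Encoding.UTF8;
        }
        try
        {
            // Some servers wrap the value in quotes, e.g. charset="utf-8".
            return Encoding.GetEncoding(charset.Trim().Trim('"', '\''));
        }
        catch (ArgumentException)
        {
            // Unknown or unsupported name: the UTF-8 fallback is only a guess.
            return Encoding.UTF8;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ResolveEncoding("utf-8").WebName);
        Console.WriteLine(ResolveEncoding("no-such-charset").WebName);
        Console.WriteLine(ResolveEncoding(null).WebName);
    }
}
```

In `_GetHtml` this could be used as `Encoding encode = CharsetHelper.ResolveEncoding(MyResponse.CharacterSet);`. Treat the result as a starting point only: `CharacterSet` reflects the response header, which may be missing or may disagree with the `<meta>` charset inside the page.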
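[Solution]
Discussion

Quote:
Your code for fetching the page source... you almost certainly got these two lines wrong:

Encoding encode = System.Text.Encoding.UTF8;
StreamReader sr = new StreamReader(MyInStream, encode);


I never wrote those two lines and I don't understand how to use them. What is MyInStream?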
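For anyone with the same question: in the `_GetHtml` code earlier in the thread, `MyInStream` is the raw byte stream returned by `GetResponseStream()`, and the `StreamReader` turns those bytes into characters using the `Encoding` it is given. The sketch below shows the same two lines in isolation. As an assumption for illustration, a `MemoryStream` stands in for the network stream so the example runs offline; the names `StreamDecodingDemo` and `ReadAll` and the sample markup are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

static class StreamDecodingDemo
{
    // Reads all text from a byte stream, decoding it with the given encoding.
    public static string ReadAll(Stream input, Encoding encoding)
    {
        using (var reader = new StreamReader(input, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Pretend these bytes arrived from a server that sends UTF-8.
        byte[] pageBytes = Encoding.UTF8.GetBytes("<p>中文 2012</p>");

        // The MemoryStream plays the role of MyInStream.
        using (var stream = new MemoryStream(pageBytes))
        {
            Console.WriteLine(ReadAll(stream, Encoding.UTF8));
        }
    }
}
```

If the encoding passed to the reader does not match the encoding of the bytes, the Chinese is already garbled in the returned string, before any regex is applied.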
