
C# regex extraction: please help! How do I solve this?

2012-03-18 


HTML code
<div class="pagebox"><span class="pagebox_pre_nolink">上一页</span><span class="pagebox_num_nonce">1</span><span class="pagebox_num"><a target="_self" href="102641554-2.html" class="page">2</a></span><span class="pagebox_num"><a target="_self" href="102641554-3.html" class="page">3</a></span><span class="pagebox_num"><a target="_self" href="102641554-4.html" class="page">4</a></span><span class="pagebox_num"><a target="_self" href="102641554-5.html" class="page">5</a></span><span class="pagebox_next"><a href="102641554-2.html">下一页</a></span></div>



Desired output:
102641554-2.html 2
102641554-3.html 3
102641554-4.html 4
102641554-5.html 5

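In other words, for each tag with class="page" I need both the href attribute value and the link text. The source data also contains other tags, so the class="page" condition is required as well.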

[Solution]
href="([^"]+)"[^>]*>(\d+)</a>



Matches (group 1 = href value, group 2 = link text):
1: 102641554-2.html
2: 2
1: 102641554-3.html
2: 3
1: 102641554-4.html
2: 4
1: 102641554-5.html
2: 5
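A minimal sketch of running the first answer's pattern from C#, where group 1 holds the href value and group 2 the link text. The class name `FirstAnswerDemo` and the method `Extract` are hypothetical, and the input is a shortened copy of the question's HTML.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class FirstAnswerDemo
{
    // Pattern from the first answer: group 1 = href value, group 2 = digit-only link text.
    static readonly Regex PagePattern = new Regex(@"href=""([^""]+)""[^>]*>(\d+)</a>");

    // Hypothetical helper: returns one (href, text) pair per match.
    public static List<(string Href, string Text)> Extract(string html)
    {
        var results = new List<(string Href, string Text)>();
        foreach (Match m in PagePattern.Matches(html))
            results.Add((m.Groups[1].Value, m.Groups[2].Value));
        return results;
    }

    static void Main()
    {
        // Shortened copy of the question's HTML: two numbered links plus the "next page" link.
        string html = @"<span class=""pagebox_num""><a target=""_self"" href=""102641554-2.html"" class=""page"">2</a></span>"
                    + @"<span class=""pagebox_num""><a target=""_self"" href=""102641554-3.html"" class=""page"">3</a></span>"
                    + @"<span class=""pagebox_next""><a href=""102641554-2.html"">下一页</a></span>";

        // Prints "102641554-2.html 2" and "102641554-3.html 3"; the 下一页 link has no digits, so it is skipped.
        foreach (var (href, text) in Extract(html))
            Console.WriteLine("{0} {1}", href, text);
    }
}
```

Note that this pattern never tests class="page". It keeps links whose text consists only of digits, which is why the "next page" link (下一页) drops out. The second answer's pattern checks the class explicitly.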

[Solution]
C# code
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string str = @"<div class=""pagebox""><span class=""pagebox_pre_nolink"">上一页</span><span class=""pagebox_num_nonce"">1</span><span class=""pagebox_num""><a target=""_self"" href=""102641554-2.html"" class=""page"">2</a></span><span class=""pagebox_num""><a target=""_self"" href=""102641554-3.html"" class=""page"">3</a></span><span class=""pagebox_num""><a target=""_self"" href=""102641554-4.html"" class=""page"">4</a></span><span class=""pagebox_num""><a target=""_self"" href=""102641554-5.html"" class=""page"">5</a></span><span class=""pagebox_next""><a href=""102641554-2.html"">下一页</a></span></div>";

        // (?is): ignore case, and let . match newlines.
        // url  = href value, optionally wrapped in ' or " (the same delimiter must close it, via \1).
        // text = link text; only <a> tags with class="page" somewhere after href match.
        Regex reg = new Regex(@"(?is)<a[^>]*?href=(['""\s]?)(?<url>[^'""\s]+)\1[^>]*?class=""page""[^>]*?>(?<text>.*?)</a>");

        foreach (Match m in reg.Matches(str))
            Console.WriteLine("{0} {1}", m.Groups["url"].Value, m.Groups["text"].Value);
    }
}
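If the markup is guaranteed to be well-formed XML, as this particular fragment happens to be, the same extraction can be done without a regex using LINQ to XML (`System.Xml.Linq`). This is a swapped-in technique, not taken from either answer, and `XmlAlternativeDemo` and `Extract` are hypothetical names. Real-world HTML is often not valid XML, in which case `XElement.Parse` throws an `XmlException`.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class XmlAlternativeDemo
{
    // Assumes the fragment is well-formed XML (true for this sample, not for HTML in general).
    public static (string Href, string Text)[] Extract(string fragment)
    {
        XElement root = XElement.Parse(fragment);
        return root.Descendants("a")
                   // Exact string match: class="page active" would NOT be kept.
                   .Where(a => (string)a.Attribute("class") == "page")
                   .Select(a => (Href: (string)a.Attribute("href"), Text: a.Value))
                   .ToArray();
    }

    static void Main()
    {
        // Shortened copy of the question's fragment, still wrapped in its <div> root.
        string fragment = @"<div class=""pagebox""><span class=""pagebox_pre_nolink"">上一页</span>"
                        + @"<span class=""pagebox_num""><a target=""_self"" href=""102641554-2.html"" class=""page"">2</a></span>"
                        + @"<span class=""pagebox_next""><a href=""102641554-2.html"">下一页</a></span></div>";

        // The "next page" link has no class attribute, so only the numbered link is printed.
        foreach (var (href, text) in Extract(fragment))
            Console.WriteLine("{0} {1}", href, text);
    }
}
```

The design choice here is to filter on the parsed attribute rather than on attribute order, so class="page" may appear before or after href.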
