
Help needed: reading data out of a web page with regular expressions

2012-02-22 

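I need help with a regular-expression problem when reading a web page.
I want to read content out of a page's HTML source and store it in a database. Below is one fragment of the page that I am interested in: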
<table   class= "os "   width= "100% ">
  <tr>
  <th   align= "left "   colspan=2>
  (总称)WA   est   2007   ( <a   style= "color:   #FFFFFF "   href= "/osOview/Foll/spId/1/sitel/0 "> Foll </a> ,   Asiall)  
  <br/>
  <div   class= "smallText "   style= "font-weight:   normal;   color:   #FFFFFF "> Highest   (average)   os   and   prties   listed   for   selected   bes </div>
  </th>
  </tr>
  <tr   bgcolor= "#FFFFFF ">
  <td   valign= "top "   width= "15% ">
  <a   href= "/osOview/产品1-产品2/ev/97305898/site/0 ">
  产品1- <br> 产品2 </a> <br/>
  上市时间   24/06/07   17:30 </td>
  <td   style= "padding-bottom:4px; ">
  <table   class= "osInc "   style= "margin-bottom:10px; ">
  <tr>
  <td   width=150> <a   href= "/osDetail/Foll-Asiall-WA_est_2007/evId/97305898/bTypeId/43/scId/2/sitel/0 "> 价格 </a> : <br>   (2)   &nbsp; </td>
  <td   width=100> Home&nbsp;(27%) <br> 价格1 </td> <td   width=15> &nbsp; </td>
<td   width=100> Draw&nbsp;(29%) <br> 价格2 </td> <td   width=15> &nbsp; </td>
<td   width=100> Away&nbsp;(44%) <br> 价格3 </td> <td   width=15> &nbsp; </td>
</tr>
  <tr>
  <td   width=150> <a   href= "/osDetail/Foll-Asiall-WA_est_2007/evId/97305898/bTypeId/48/scId/2/sitel/0 "> 颜色 </a> : <br>   (4)   &nbsp; </td>
  <td   width=100> Home&nbsp;0&nbsp; <br> 颜色1 </td> <td   width=15> &nbsp; </td>
<td   width=100> Away&nbsp; <br> 颜色2 </td> <td   width=15> &nbsp; </td>
</tr>
  <tr>
  <td   width=150> <a   href= "/oddsDetail/Football-Asia-WAFF_West_Asian_Football_Championship_2007/eventId/97305898/betTypeId/47/scopeId/2/site/0 "> 款式 </a> : <br>   (4)   &nbsp; </td>
  <td   width=100> Over&nbsp;2&nbsp; <br> 款式1 </td> <td   width=15> &nbsp; </td>
<td   width=100> Under&nbsp; <br> 款式2 </td> <td   width=15> &nbsp; </td>
</tr>
  </table>
<table   class= "os "   width= "100% ">
  <tr>
  <th   align= "left "   colspan=2>
  (总称)WA   est   2008   ( <a   style= "color:   #FFFFFF "   href= "/osOview/Foll/spId/1/sitel/0 "> Foll </a> ,   Asiall)  
  <br/>
  <div   class= "smallText "   style= "font-weight:   normal;   color:   #FFFFFF "> Highest   (average)   os   and   prties   listed   for   selected   bes </div>
  </th>
  </tr>
  <tr   bgcolor= "#FFFFFF ">
  <td   valign= "top "   width= "15% ">


  <a   href= "/osOview/产品1-产品2/ev/97305898/site/0 ">
  产品1- <br> 产品2 </a> <br/>
  上市时间   24/06/07   17:30 </td>
  <td   style= "padding-bottom:4px; ">
  <table   class= "osInc "   style= "margin-bottom:10px; ">
  <tr>
  <td   width=150> <a   href= "/osDetail/Foll-Asiall-WA_est_20078/evId/97305898/bTypeId/43/scId/2/sitel/0 "> 价格 </a> : <br>   (2)   &nbsp; </td>
  <td   width=100> Home&nbsp;(27%) <br> 价格1 </td> <td   width=15> &nbsp; </td>
<td   width=100> Draw&nbsp;(29%) <br> 价格2 </td> <td   width=15> &nbsp; </td>
<td   width=100> Away&nbsp;(44%) <br> 价格3 </td> <td   width=15> &nbsp; </td>
</tr>
  <tr>
  <td   width=150> <a   href= "/osDetail/Foll-Asiall-WA_est_2007/evId/97305898/bTypeId/48/scId/2/sitel/0 "> 颜色 </a> : <br>   (4)   &nbsp; </td>
  <td   width=100> Home&nbsp;0&nbsp; <br> 颜色1 </td> <td   width=15> &nbsp; </td>
<td   width=100> Away&nbsp; <br> 颜色2 </td> <td   width=15> &nbsp; </td>
</tr>
  <tr>
  <td   width=150> <a   href= "/oddsDetail/Football-Asia-WAFF_West_Asian_Football_Championship_2007/eventId/97305898/betTypeId/47/scopeId/2/site/0 "> 款式 </a> : <br>   (4)   &nbsp; </td>
  <td   width=100> Over&nbsp;2&nbsp; <br> 款式1 </td> <td   width=15> &nbsp; </td>
<td   width=100> Under&nbsp; <br> 款式2 </td> <td   width=15> &nbsp; </td>
</tr>
  </table>
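Using regular expressions, I want to read the (总称) name together with 产品1, 产品2, 价格, 价格1, 价格2, 颜色, 颜色1, 颜色2, 款式, 款式1 and 款式2. These are the placeholder strings that appear in the HTML above (总称 = overall title, 产品 = product, 价格 = price, 颜色 = color, 款式 = style). The values should be written to a database table with these columns:
总称    产品1    产品2    价格    价格1    价格2    颜色    颜色1    颜色2    款式    款式1    款式2
I have only just started learning VB and am new to regular expressions, so I am really stuck. Any help would be much appreciated. Waiting online.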

[Solution]
Can VB use regular expressions?
Here is an ASP version for you:
Public Function GetNewsBody(NewsUrl)
    ' GetHtml, RegReplace, RegExpTest, SaveRemoteFile, GetBodyRegex, SaveFileFolder
    ' and DeleteLink are defined elsewhere in the script and are not shown in this reply.
    Dim BodyHtml, re, RemoteFile, RemoteFileUrl, SaveFileName

    BodyHtml = GetHtml(NewsUrl)                       ' fetch the page source
    BodyHtml = RegReplace(BodyHtml, "\n", "")         ' remove line breaks
    BodyHtml = RegExpTest(GetBodyRegex, BodyHtml, 0)  ' extract the article body

    BodyHtml = RegReplace(BodyHtml, "<script.*?>.*?</script>", "")  ' strip scripts
    BodyHtml = RegReplace(BodyHtml, "(<div.*?>)(.+)", "$2")         ' drop the first DIV opening tag
    BodyHtml = RegReplace(BodyHtml, "(.+)(</div>)", "$1")           ' drop the last DIV closing tag
    BodyHtml = RegReplace(BodyHtml, "<form.*?>", "")                ' drop form opening tags
    BodyHtml = RegReplace(BodyHtml, "</form>", "")                  ' drop form closing tags

    ' Download the images and rewrite their paths
    Set re = New RegExp
    re.IgnoreCase = True
    re.Global = True
    ' In the pattern below, SubMatches(4) is the full file name and SubMatches(5) is the extension
    re.Pattern = "((http):(?:\/\/){1}(?:(?:\w)+[.])+(net|com|cn|org|cc|tv|[0-9]{1,4})(\S*\/)((?:\S)+[.]{1}(gif|jpg|jpeg|png|bmp)))"
    Set RemoteFile = re.Execute(BodyHtml)

    For Each RemoteFileUrl In RemoteFile
        SaveFileName = RemoteFileUrl.SubMatches(4)
        ' Save the image locally
        Call SaveRemoteFile(SaveFileFolder & SaveFileName, RemoteFileUrl.Value)
        ' Point the page at the local copy
        BodyHtml = Replace(BodyHtml, RemoteFileUrl.Value, SaveFileFolder & SaveFileName)
    Next

    ' Remove links
    If DeleteLink = True Then
        BodyHtml = RegReplace(BodyHtml, "<(/)?a.*?>", "")
    End If

    GetNewsBody = BodyHtml
End Function
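
The function above calls RegReplace and RegExpTest without showing them. A minimal sketch of what they might look like follows; the bodies are assumptions inferred only from how the two helpers are called (text, pattern, replacement for RegReplace; pattern, text, match index for RegExpTest), not the original author's code. The same VBScript.RegExp object is also available from VB6, either through CreateObject("VBScript.RegExp") or through a project reference to Microsoft VBScript Regular Expressions 5.5.

```vb
' Hypothetical helper: replace every match of pat in text with replacement.
Function RegReplace(ByVal text, ByVal pat, ByVal replacement)
    Dim re
    Set re = New RegExp
    re.IgnoreCase = True
    re.Global = True
    re.Pattern = pat
    RegReplace = re.Replace(text, replacement)
End Function

' Hypothetical helper: return the match at position index (0-based),
' or an empty string when there are not that many matches.
Function RegExpTest(ByVal pat, ByVal text, ByVal index)
    Dim re, matches
    Set re = New RegExp
    re.IgnoreCase = True
    re.Global = True
    re.Pattern = pat
    Set matches = re.Execute(text)
    If matches.Count > index Then
        RegExpTest = matches(index).Value
    Else
        RegExpTest = ""
    End If
End Function
```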
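[Solution]
You need to extract so many items that writing a regular expression for each one is a lot of work. Below is a VB function I wrote some time ago that strips most HTML tags. I hope it is useful to you, and I hope you will award me the points.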

' Requires a project reference to "Microsoft VBScript Regular Expressions 5.5".
Private Function htmTotxt(ByVal s As String) As String
    Dim objRegExp As RegExp
    Set objRegExp = New RegExp
    objRegExp.IgnoreCase = True   ' case-insensitive matching
    objRegExp.Global = True       ' replace every match, not only the first

    objRegExp.Pattern = "<!--[\s\S]*?-->"                ' HTML comments
    s = objRegExp.Replace(s, "")
    objRegExp.Pattern = "<(style)[^>]*>[\s\S]*?<\/\1>"   ' style sheets
    s = objRegExp.Replace(s, "")
    objRegExp.Pattern = "<(script)[^>]*>[\s\S]*?<\/\1>"  ' scripts
    s = objRegExp.Replace(s, "")
    objRegExp.Pattern = "<br[^>]*>"                      ' <br> becomes a line break
    s = objRegExp.Replace(s, vbCrLf)
    objRegExp.Pattern = "<(title)[^>]*>[^<]*<\/\1>"      ' page title
    s = objRegExp.Replace(s, "")
    objRegExp.Pattern = "<[^>]*>"                        ' all remaining HTML tags
    s = objRegExp.Replace(s, "")

    ' Decode common entities; &amp; is decoded last so nothing is decoded twice
    s = Replace(s, "&quot;", Chr(34))   ' double quote
    s = Replace(s, "&#39;", Chr(39))    ' single quote
    s = Replace(s, "&lt;", "<")         ' less-than sign
    s = Replace(s, "&gt;", ">")         ' greater-than sign
    s = Replace(s, "&#40;", "(")        ' left parenthesis
    s = Replace(s, "&#41;", ")")        ' right parenthesis
    s = Replace(s, "&#42;", "*")        ' asterisk
    s = Replace(s, "&#45;", "-")        ' hyphen (used in SQL comment markers)
    s = Replace(s, "&#59;", ";")        ' semicolon
    s = Replace(s, "&middot;", "·")     ' middle dot
    s = Replace(s, "&amp;", "&")        ' ampersand

    objRegExp.Pattern = "&[^;]{2,4};"        ' drop any other short entity, e.g. &nbsp;
    s = objRegExp.Replace(s, "")
    objRegExp.Pattern = "\n[ \f\r\t\v]*"     ' whitespace following a line break
    s = objRegExp.Replace(s, vbCrLf)
    objRegExp.Pattern = "[\r\n]+"            ' collapse runs of line breaks
    s = objRegExp.Replace(s, vbCrLf)
    objRegExp.Pattern = "^\s*"               ' leading whitespace
    s = objRegExp.Replace(s, "")
    Set objRegExp = Nothing

    htmTotxt = s
End Function
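
For the specific fields the question asks for, a more direct extraction could look like the VB6 sketch below. It is untested against the real site and rests on several assumptions: a project reference to Microsoft VBScript Regular Expressions 5.5; every record starts at a `<table class="os">` tag, as in the sample; the product names are the only link text split by a `<br>`; and only the first two values of each group are wanted (the sample's 价格 row also has a third). ExtractRecords and MakeRe are illustrative names, not part of the thread.

```vb
' Build a case-insensitive, global RegExp for the given pattern.
Private Function MakeRe(ByVal pat As String) As RegExp
    Dim re As RegExp
    Set re = New RegExp
    re.IgnoreCase = True
    re.Global = True
    re.Pattern = pat
    Set MakeRe = re
End Function

' Return one tab-delimited string per <table class="os"> block:
' title, product 1, product 2, then label/value 1/value 2 for each option row.
Public Function ExtractRecords(ByVal html As String) As Collection
    Dim result As New Collection
    Dim reBlock As RegExp, reTitle As RegExp, reProd As RegExp
    Dim reRow As RegExp, reCell As RegExp
    Dim block As Match, row As Match
    Dim m As MatchCollection, cells As MatchCollection
    Dim fields As String, i As Long

    ' A block runs from one outer table tag to the next one (or to the end of the text).
    Set reBlock = MakeRe("<table\s+class\s*=\s*""\s*os\s*""[\s\S]*?" & _
                         "(?=<table\s+class\s*=\s*""\s*os\s*""|$)")
    ' Header text up to the "(" that precedes the first link.
    Set reTitle = MakeRe("<th[^>]*>\s*([\s\S]*?)\s*\(\s*<a")
    ' Link whose text is "name1- <br> name2".
    Set reProd = MakeRe("<a[^>]*>\s*([^<]*?)-?\s*<br\s*/?>\s*([^<]*?)\s*</a>")
    ' Option row: first cell holds a link with the label, the rest holds the values.
    Set reRow = MakeRe("<tr[^>]*>\s*<td[^>]*>\s*<a[^>]*>\s*([^<]*?)\s*</a>([\s\S]*?)</tr>")
    ' Value cell: the wanted text is whatever follows the <br>.
    Set reCell = MakeRe("<td[^>]*>[^<]*<br\s*/?>\s*([^<]*?)\s*</td>")

    For Each block In reBlock.Execute(html)
        fields = ""
        Set m = reTitle.Execute(block.Value)
        If m.Count > 0 Then fields = m(0).SubMatches(0)

        Set m = reProd.Execute(block.Value)
        If m.Count > 0 Then
            fields = fields & vbTab & m(0).SubMatches(0) & vbTab & m(0).SubMatches(1)
        Else
            fields = fields & vbTab & vbTab   ' keep the columns aligned
        End If

        For Each row In reRow.Execute(block.Value)
            fields = fields & vbTab & row.SubMatches(0)
            Set cells = reCell.Execute(row.SubMatches(1))
            For i = 0 To 1                    ' first two values only (assumption)
                If i < cells.Count Then
                    fields = fields & vbTab & cells(i).SubMatches(0)
                Else
                    fields = fields & vbTab
                End If
            Next i
        Next row

        result.Add fields
    Next block

    Set ExtractRecords = result
End Function
```

Each item in the returned Collection follows the column order given in the question, so Split(item, vbTab) yields the values for one database row. The patterns tolerate the extra spaces around attribute values seen in the pasted HTML.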
