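Urgent! I already have the page source in a String variable. How do I extract the hyperlinks from it?
Without using the WebBrowser control, how can I extract all the hyperlinks from the source string? I had this working a while ago, but somehow deleted it, and after two days of searching Baidu and Google I can't find it again. All I found is the snippet below, which uses the WebBrowser control. What I want to know is how to convert the String holding my source and assign it directly to theHTML. I remember this being possible, but I can't work out how to write it any more. Help me!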
Dim theHTML As New HTMLDocument
Set theHTML = wb.document
' wb = ActiveX WebBrowser
Dim collLink As IHTMLElementCollection
' Get all links
Set collLink = theHTML.All.tags("a")
For i = 0 To collLink.length - 1
Debug.Print "Link " & CStr(i + 1) & ": " & collLink(i) & vbNewLine
Next
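[Solution]
You already have the source as a string, so you can simply process it with a regular expression object.
See the example below. If it doesn't quite fit your pages, read up on regular expressions and adjust the pattern yourself: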
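' Requires a reference to "Microsoft VBScript Regular Expressions 5.5"
Sub GetURL(ByVal s As String)
    Dim re As RegExp
    Dim mh As Match
    Dim mhs As MatchCollection
    Set re = New RegExp
    re.Global = True
    re.IgnoreCase = True
    ' Capture absolute http/https URLs in double-quoted href attributes
    re.Pattern = "href\s*=\s*""(https?://[^""]+)"""
    If re.Test(s) = False Then Exit Sub
    Set mhs = re.Execute(s)
    For Each mh In mhs
        ' SubMatches(0) is the first capture group, i.e. the URL itself
        Debug.Print mh.SubMatches(0)
    Next
End Sub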
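GetURL only prints absolute URLs from double-quoted href attributes. If the links need to be collected rather than printed, something along these lines might work. This is a minimal sketch, not a full HTML parser: the name ExtractLinks and the looser pattern (single or double quotes, relative URLs allowed) are assumptions for illustration, and it uses late binding so no project reference is needed.

```vb
' Hypothetical helper: returns every quoted href value found in html.
Function ExtractLinks(ByVal html As String) As Collection
    Dim re As Object
    Dim m As Object
    Dim result As New Collection

    ' Late-bound VBScript regular expression object
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.IgnoreCase = True
    ' Assumed pattern: href = "value" or href = 'value', value has no spaces
    re.Pattern = "href\s*=\s*[""']([^""'>\s]+)[""']"

    For Each m In re.Execute(html)
        result.Add m.SubMatches(0)
    Next
    Set ExtractLinks = result
End Function

' Example call: print each link found in a small test string
Sub DemoExtractLinks()
    Dim link As Variant
    For Each link In ExtractLinks("<a href='http://example.com/'>x</a>")
        Debug.Print link
    Next
End Sub
```

Unquoted href values are not matched by this pattern; parsing the string into a document object is likely the more robust route when the markup is messy.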
[Solution]
Dim theHTML As HTMLDocument
' strSource is a placeholder for the String that holds your page source.
' wb must already have a document loaded (for example about:blank).
wb.document.body.innerHTML = strSource
Set theHTML = wb.document
' wb = ActiveX WebBrowser
Dim collLink As IHTMLElementCollection
' Get all links
Set collLink = theHTML.All.tags("a")
For i = 0 To collLink.length - 1
    Debug.Print "Link " & CStr(i + 1) & ": " & collLink(i) & vbNewLine
Next
[Solution]
' Requires a reference to "Microsoft HTML Object Library"
Private Sub Form_Load()
    Dim a As New HTMLDocument
    a.body.innerHTML = "<a href=""http://www.baidu.com"">aa</a>"
    ' a.write "<a href=""http://www.baidu.com"">aa</a>"
    MsgBox a.links(0).href
End Sub
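Applied to the original question, the same idea can list every link in a source string with no WebBrowser control at all. The following is a sketch under assumptions: the name PrintLinks is made up for illustration, and it creates the document late-bound through the "htmlfile" ProgID so that no project reference is needed.

```vb
' Hypothetical helper: parses html in an in-memory document and prints its links.
Sub PrintLinks(ByVal html As String)
    Dim doc As Object
    Dim links As Object
    Dim i As Long

    ' "htmlfile" creates an MSHTML document object without any visible control
    Set doc = CreateObject("htmlfile")
    doc.write html
    doc.Close

    ' links holds the <a> and <area> elements that have an href attribute
    Set links = doc.links
    For i = 0 To links.length - 1
        Debug.Print "Link " & CStr(i + 1) & ": " & links.Item(i).href
    Next
End Sub
```

Because the document has no real address, relative links may come back prefixed with about: rather than the site's URL, so they would need to be combined with the page's base address afterwards.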