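# Q: I want to extract one piece of data from all the content (including HTML tags) inside a single block element of a web page... but the result contains every matching item from the whole page...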
# crawl library ----
library(bitops)
library(XML)
library(RCurl)
# only for windows -----
signatures <- system.file("CurlSSL", cainfo = "cacert.pem", package = "RCurl")
home <- "http://w...content-available-to-author-only...e.com/aussteller/messen/index.php?OK=1&sortierid=0&maxPerPage=20&i_cockpitkeyfindwo=2&i_cockpitkeyfindart=1&currPage=1"
home <- getURL(home, cainfo = signatures)
home <- htmlParse(home)
# The page contains 20 blocks in total
block <- getNodeSet(home, "//div[@class='shm']")
# length(block)  # 20 blocks
doc <- block[[1]]  # take the first block
# doc
# <div class="shm">
# <div class="listdates">
# <div class="date">08Jan-17Jan2016</div>
# </div>
# <div class="search_result_list_box">
# <div class="city">London, United Kingdom</div>
# <div class="firma"><a href="show.php?id=352&timer=m1452657261&tmid=&currPage=1&maxPerPage=20&params=timer%3Dm1452657261%26amp%3Btimer%3Dm1452657261%26amp%3Bi_cockpitkeyfindwo%3D2%26amp%3Bi_cockpitkeyfindart%3D1%26amp%3Bsortierid%3D0%26amp%3Btimer%3Dm1452657261%26amp%3BmaxPerPage%3D20%26amp%3BshowPrintlist%3D0%26amp%3BmaxPerPage%3D20">London Boat Show</a></div>
# </div>
# <div class="search_result_box_right">
# <div class="branchen"><strong>Business sectors:</strong> Boats</div>
# </div>
# <div class="fixfloat"></div>
# </div>
# Goal: extract only the date of block[[1]]
date <- xpathSApply( doc, "//div[@class='date']" , xmlValue)
# Result: the dates of every block on the page are returned
# [1] "08Jan-17Jan2016" "08Jan-17Jan2016" "09Jan-17Jan2016" "07Jan-15Jan2017" "06Jan-14Jan2018"
# [6] "09Jan-17Jan2016" "09Jan-17Jan2016" "09Jan-17Jan2016" "09Jan-17Jan2016" "10Jan-13Jan2016"
# [11] "10Jan-13Jan2016" "10Jan-13Jan2016" "10Jan-13Jan2016" "11Jan-13Jan2016" "11Jan-13Jan2016"
# [16] "11Jan-13Jan2016" "11Jan-14Jan2016" "11Jan-14Jan2016" "11Jan-14Jan2016" "11Jan-14Jan2016"
# [21] "11Jan-14Jan2016" "11Jan-24Jan2016" "09Jan-22Jan2017"
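# A likely cause, with a minimal sketch of the fix (assumes the XML package).
# In XPath, a path that starts with "//" is evaluated from the document root,
# even when the context passed to xpathSApply() is a single node such as
# block[[1]]: that node still belongs to the whole parsed page. Prefixing the
# path with "." (".//div[@class='date']") makes the search start at the
# context node, so only that block's descendants are matched.
# The HTML below is a hypothetical two-block excerpt modeled on the block
# printed above; the second block's city is a placeholder, not real data.
library(XML)

sample_html <- '<html><body>
<div class="shm">
  <div class="listdates"><div class="date">08Jan-17Jan2016</div></div>
  <div class="search_result_list_box"><div class="city">London, United Kingdom</div></div>
</div>
<div class="shm">
  <div class="listdates"><div class="date">09Jan-17Jan2016</div></div>
  <div class="search_result_list_box"><div class="city">Placeholder City</div></div>
</div>
</body></html>'

sample_page   <- htmlParse(sample_html, asText = TRUE)
sample_blocks <- getNodeSet(sample_page, "//div[@class='shm']")
first_block   <- sample_blocks[[1]]

# Absolute path: searches the whole document, so both dates come back.
all_dates <- xpathSApply(first_block, "//div[@class='date']", xmlValue)

# Relative path: searches only inside first_block, so one date comes back.
first_date <- xpathSApply(first_block, ".//div[@class='date']", xmlValue)

# The same relative queries applied to every block give one row per block.
per_block <- data.frame(
  date = sapply(sample_blocks, function(b) xpathSApply(b, ".//div[@class='date']", xmlValue)),
  city = sapply(sample_blocks, function(b) xpathSApply(b, ".//div[@class='city']", xmlValue)),
  stringsAsFactors = FALSE
)

print(all_dates)   # both dates
print(first_date)  # "08Jan-17Jan2016"
print(per_block)

# On the real page the corresponding call would be:
# date <- xpathSApply(doc, ".//div[@class='date']", xmlValue)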