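Python's libxml2 library supports XPath, but it is not bundled with Python by default and has to be installed separately.
The Win32 build of libxml2 can be downloaded from:
http://xmlsoft.org/sources/win32/python/
My Python version is 2.5, so I downloaded and installed libxml2-python-2.6.30.win32-py2.5.exe.
The installer puts libxml2 into the default Python 2.5 directory (I installed ActivePython-2.5.2.2-win32-x86.msi, whose default install path is C:\Python25).
Another way to install is the easy_install tool, which is somewhat like yum on Linux. Note that this route installs lxml, a Python binding built on libxml2 and libxslt, rather than the plain libxml2 bindings.
See: http://codespeak.net/lxml/installation.html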
Get the easy_install tool and run the following as super-user (or administrator):
easy_install lxml
-
On MS Windows, the above will install the binary builds that we provide. If there is no binary build of the latest release yet, please search PyPI for the last release that has them and pass that version to easy_install like this:
easy_install lxml==2.2.2
-
On Linux (and most other well-behaved operating systems), easy_install will manage to build the source distribution as long as libxml2 and libxslt are properly installed, including development packages, i.e. header files, etc. Use your package management tool to look for packages like libxml2-dev or libxslt-devel if the build fails, and make sure they are installed.
-
On MacOS-X, use the following to build the source distribution, and make sure you have a working Internet connection, as this will download libxml2 and libxslt in order to build them:
STATIC_DEPS=true easy_install lxml
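Attachment: setuptools-0.6c11.win32-py2.5.exe, which provides easy_install (packaged as setuptools-0.6c11.win32-py2.5.rar). Note: this installer is for Python 2.5 only.
After extracting the archive, just run the installer.
Then, at the command line, switch to C:\Python25\Lib\site-packages and run easy_install lxml==2.2.2 to install lxml, which is built on libxml2.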
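Before moving on, it can help to confirm that the install actually took effect. The snippet below is a minimal sketch, not part of the original post: `is_importable` is a hypothetical helper name, and the module names `lxml.etree` and `libxml2` are the import names these two packages are normally installed under. It only tries to import each module and reports the result, and it is written to run on both Python 2 and Python 3.

```python
def is_importable(module_name):
    """Return True if module_name can be imported in this interpreter."""
    try:
        __import__(module_name)
        return True
    except ImportError:
        return False


# Check both install routes described above: the lxml package (easy_install)
# and the plain libxml2 bindings (the Win32 installer).
for name in ("lxml.etree", "libxml2"):
    if is_importable(name):
        print("%s: available" % name)
    else:
        print("%s: not found" % name)
```

If both lines say "not found", the package most likely went into a different Python installation than the one running the script.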
Once it is installed, you can test it with the program below. Let's see how powerful XPath is!
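#coding:utf-8
import codecs
import sys
from lxml import etree

# Wrap stdout so unicode text can be printed; without this, print may raise UnicodeEncodeError.
# Use the console's own encoding when it is known, otherwise fall back to UTF-8.
sys.stdout = codecs.getwriter(sys.stdout.encoding or 'utf-8')(sys.stdout)

# A unicode literal, so lxml does not have to guess the encoding of the non-ASCII text.
html = u'''<div>
<div>redice</div>
<div id="email">redice@163.com</div>
<div name="address">中国</div>
<div>http://www.redicecn.com</div>
</div>'''
tree = etree.HTML(html)

# Get the email: it is in the div whose id is "email".
nodes = tree.xpath("//div[@id='email']")
print nodes[0].text

# Get the address: it is in the div whose name is "address".
nodes = tree.xpath("//div[@name='address']")
print nodes[0].text

# Get the blog URL: it is the second sibling div after the email div.
nodes = tree.xpath("//div[@id='email']/following-sibling::div[2]")
print nodes[0].text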
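To make explicit what `following-sibling::div[2]` selects, here is a rough sketch that spells the sibling step out by hand. It deliberately uses the standard library's `xml.etree.ElementTree` instead of lxml, since ElementTree supports only a limited XPath subset: attribute predicates work, but the `following-sibling` axis does not. The helper `following_sibling` is a hypothetical name introduced for illustration, and the block assumes Python 3, unlike the rest of this post.

```python
import xml.etree.ElementTree as ET

html = '''<div>
<div>redice</div>
<div id="email">redice@163.com</div>
<div name="address">中国</div>
<div>http://www.redicecn.com</div>
</div>'''

# The snippet is well-formed XML, so the stdlib parser can read it directly.
root = ET.fromstring(html)


def following_sibling(parent, node, tag, n):
    """Return the n-th (1-based) element with the given tag that comes
    after node among parent's children, or None if there is none."""
    children = list(parent)
    later = [c for c in children[children.index(node) + 1:] if c.tag == tag]
    if 0 < n <= len(later):
        return later[n - 1]
    return None


# Attribute predicate: the child div whose id is "email".
email = root.find("./div[@id='email']")

# Hand-written equivalent of following-sibling::div[2], starting from the email div.
blog = following_sibling(root, email, "div", 2)

print(email.text)
print(blog.text)
```

The first following sibling of the email div is the address div, so the second one is the div holding the blog URL, which is the same node the lxml query returns.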