Can't get node with libxml2 and xpath
I have the following code, which is supposed to find all link elements in an HTML page:
string AParser::cleanHTMLDocument(const string& aDoc) {
    vector<xmlNodePtr> nodesToRemove;
    xmlDocPtr doc = xmlParseDoc((xmlChar *)aDoc.c_str());
    xmlXPathContextPtr context = xmlXPathNewContext(doc);
    xmlXPathObjectPtr result = xmlXPathEvalExpression(
        (const xmlChar *)string("//link").c_str(), context);
    if (xmlXPathNodeSetIsEmpty(result->nodesetval)) {
        xmlXPathFreeObject(result);
        xmlXPathFreeContext(context);
        xmlFreeDoc(doc);
        LOG(WARNING) << "XPath is invalid, bailing out.";
        return string();
    }
    const int size = result->nodesetval->nodeNr;
    for (int i = size - 1; i >= 0; i--) {
        LOG(DEBUG) << result->nodesetval->nodeTab[i]->name;
    }
}
But for some reason, xmlXPathNodeSetIsEmpty always evaluates to true. Am I missing something here?
Update: the input document
<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.1//EN' 'http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd'>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<meta name="generator" content="HTML Tidy for Linux (vers 7 December 2008), see www.w3.org"/>
<title>The Republic, by Plato</title>
<link href="0.css" type="text/css" rel="stylesheet"/>
<link href="1.css" type="text/css" rel="stylesheet"/>
<link href="pgepub.css" type="text/css" rel="stylesheet"/>
<meta content="EpubMaker 0.3.20a6 by Marcello Perathoner &lt;webmaster@gutenberg.org&gt;" name="generator"/>
</head>
<body>
<div xml:space="preserve" class="pgmonospaced pgheader"><br/>The Project Gutenberg EBook of The Republic, by Plato<br/><br/>This eBook is for the use of anyone anywhere at no cost and with<br/>almost no restrictions whatsoever. You may copy it, give it away or<br/>re-use it under the terms of the Project Gutenberg License included<br/>with this eBook or online at www.gutenberg.org<br/><br/><br/>Title: The Republic<br/><br/>Author: Plato<br/><br/>Translator: B. Jowett<br/><br/>Release Date: August 27, 2008 [EBook #1497]<br/>Last Updated: November 5, 2012<br/><br/>Language: English<br/><br/><br/>*** START OF THIS PROJECT GUTENBERG EBOOK THE REPUBLIC ***<br/><br/><br/><br/><br/>Produced by Sue Asscher, and David Widger<br/><br/><br/><br/><br/><br/></div>
<p><br/>
<br/></p>
<h1 id="pgepubid00000">THE REPUBLIC</h1>
<p><br/></p>
<h2>By Plato</h2>
<p><br/></p>
<h3 id="pgepubid00001">Translated by Benjamin Jowett</h3>
<p><br/>
<br/>
<br/>
Note: See also "The Republic" by Plato, Jowett, etext #150<br/>
<br/></p>
<hr/>
<p><br/>
<br/></p>
<h2 id="pgepubid00002">Contents</h2>
<table summary="">
<tbody><tr>
<td>
<p class="toc"><a href="@public@vhost@g@gutenberg@html@files@1497@1497-h@1497-h-0.htm.html#link2H_INTR" class="pginternal">INTRODUCTION AND ANALYSIS.</a></p>
<br/>
<p class="toc"><a class="c1 pginternal" href="@public@vhost@g@gutenberg@html@files@1497@1497-h@1497-h-7.htm.html#link2H_4_0002">THE REPUBLIC.</a></p>
<p class="toc"><a href="@public@vhost@g@gutenberg@html@files@1497@1497-h@1497-h-7.htm.html#link2H_4_0003" class="pginternal">PERSONS OF THE DIALOGUE.</a></p>
<p class="toc"><a href="@public@vhost@g@gutenberg@html@files@1497@1497-h@1497-h-7.htm.html#link2H_4_0004" class="pginternal">BOOK I.</a></p>
<p class="toc"><a href="@public@vhost@g@gutenberg@html@files@1497@1497-h@1497-h-8.htm.html#link2H_4_0005" class="pginternal">BOOK II.</a></p>
<p class="toc"><a href="@public@vhost@g@gutenberg@html@files@1497@1497-h@1497-h-9.htm.html#link2H_4_0006" class="pginternal">BOOK III.</a></p>
<p class="toc"><a href="@public@vhost@g@gutenberg@html@files@1497@1497-h@1497-h-10.htm.html#link2H_4_0007" class="pginternal">BOOK IV.</a></p>
<p class="toc"><a href="@public@vhost@g@gutenberg@html@files@1497@1497-h@1497-h-11.htm.html#link2H_4_0008" class="pginternal">BOOK V.</a></p>
<p class="toc"><a href="@public@vhost@g@gutenberg@html@files@1497@1497-h@1497-h-12.htm.html#link2H_4_0009" class="pginternal">BOOK VI.</a></p>
<p class="toc"><a href="@public@vhost@g@gutenberg@html@files@1497@1497-h@1497-h-14.htm.html#link2H_4_0010" class="pginternal">BOOK VII.</a></p>
<p class="toc"><a href="@public@vhost@g@gutenberg@html@files@1497@1497-h@1497-h-14.htm.html#link2H_4_0011" class="pginternal">BOOK VIII.</a></p>
<p class="toc"><a href="@public@vhost@g@gutenberg@html@files@1497@1497-h@1497-h-16.htm.html#link2H_4_0012" class="pginternal">BOOK IX.</a></p>
<p class="toc"><a href="@public@vhost@g@gutenberg@html@files@1497@1497-h@1497-h-16.htm.html#link2H_4_0013" class="pginternal">BOOK X.</a></p>
</td>
</tr>
</tbody></table>
<p><br/>
<br/></p>
<hr/>
<p><br/>
<br/>
<a id="link2H_INTR"><!-- H2 anchor --></a></p>
<h2 id="pgepubid00003">INTRODUCTION AND ANALYSIS.</h2>
<p>The Republic of Plato is the longest of his works with the exception of the Laws, and is certainly the greatest of them. There are nearer approaches to modern metaphysics in the Philebus and in the Sophist; the Politicus or Statesman is more ideal; the form and institutions of the State are more clearly drawn out in the Laws; as works of art, the Symposium and the Protagoras are of higher excellence. But no other Dialogue of Plato has the same largeness of view and the same perfection of style; no other shows an equal knowledge of the world, or contains more of those thoughts which are new as well as old, and not of one age only but of all. Nowhere in Plato is there a deeper irony or a greater wealth of humour or imagery, or more dramatic power. Nor in any other of his writings is the attempt made to interweave life and speculation, or to connect politics with philosophy. The Republic is the centre around which the other Dialogues may be grouped; here philosophy reaches the highest point (cp, especially in Books V, VI, VII) to which ancient thinkers ever attained. Plato among the Greeks, like Bacon among the moderns, was the first who conceived a method of knowledge, although neither of them always distinguished the bare outline or form from the substance of truth; and both of them had to be content with an abstraction of science which was not yet realized. He was the greatest metaphysical genius whom the world has seen; and in him, more than in any other ancient thinker, the germs of future knowledge are contained. The sciences of logic and psychology, which have supplied so many instruments of thought to after-ages, are based upon the analyses of Socrates and Plato. 
The principles of definition, the law of contradiction, the fallacy of arguing in a circle, the distinction between the essence and accidents of a thing or notion, between means and ends, between causes and conditions; also the division of the mind into the rational, concupiscent, and irascible elements, or of pleasures and desires into necessary and unnecessary—these and other great forms of thought are all of them to be found in the Republic, and were probably first invented by Plato. The greatest of all logical truths, and the one of which writers on philosophy are most apt to lose sight, the difference between words and things, has been most strenuously insisted on by him (cp. Rep.; Polit.; Cratyl), although he has not always avoided the confusion of them in his own writings (e.g. Rep.). But he does not bind up truth in logical formulae,—logic is still veiled in metaphysics; and the science which he imagines to 'contemplate all truth and all existence' is very unlike the doctrine of the syllogism which Aristotle claims to have discovered (Soph. Elenchi).</p>
</body></html>
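The document you are querying uses an XML namespace: the root element declares xmlns="http://www.w3.org/1999/xhtml", so every element, including link, belongs to that namespace. An unprefixed name in XPath such as //link only matches elements in no namespace, which is why the node set is empty. You must either ignore the namespace or register it and use it.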
To ignore the namespace, query all nodes and compare the local name (the name without the namespace) in a predicate, for example //*[local-name(.) = 'link'].
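To register a namespace, call xmlXPathRegisterNs, then prefix every node name in that namespace with [ns-prefix]: in the expression. The prefix is local to the XPath context and does not have to match anything in the document. For example: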
xmlXPathContextPtr context = xmlXPathNewContext(doc);
// String literals must use double quotes and be cast to xmlChar *.
xmlXPathRegisterNs(context, BAD_CAST "xhtml",
                   BAD_CAST "http://www.w3.org/1999/xhtml");
xmlXPathObjectPtr result = xmlXPathEvalExpression(
    (const xmlChar *)"//xhtml:link", context);
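As an alternative to the solution Jens posted, you can parse the document with libxml2's HTML parser, which does not assign namespaces to elements, so the original //link expression works unchanged. All you have to do is replace the xmlParseDoc call with the corresponding HTML parser function (libxml2 provides htmlParseDoc in libxml/HTMLparser.h for this; it takes an additional encoding argument).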