C++ Xerces Parser Load HTML and Search for HTML Elements

Keywords: HTML, search, element, load, Xerces, C++      Updated: 2023-10-16

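I am trying to load HTML with the Xerces C++ DOMDocument parser and search for specific HTML elements. I am having trouble finding good examples of how to do this. Everything I have found seems to only parse XML. Can anyone help? Thanks.

Take a look at this: http://xerces.apache.org/xerces-c/program-dom-3.html

There is also an example there that uses DOMDocument: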

// Create a small document tree

{
    XMLCh tempStr[100];
    XMLString::transcode("Range", tempStr, 99);
    // getDOMImplementation() takes only the feature string
    DOMImplementation* impl = DOMImplementationRegistry::getDOMImplementation(tempStr);
    XMLString::transcode("root", tempStr, 99);
    DOMDocument*   doc = impl->createDocument(0, tempStr, 0);
    DOMElement*   root = doc->getDocumentElement();
    XMLString::transcode("FirstElement", tempStr, 99);
    DOMElement*   e1 = doc->createElement(tempStr);
    root->appendChild(e1);
    XMLString::transcode("SecondElement", tempStr, 99);
    DOMElement*   e2 = doc->createElement(tempStr);
    root->appendChild(e2);
    XMLString::transcode("aTextNode", tempStr, 99);
    DOMText*       textNode = doc->createTextNode(tempStr);
    e1->appendChild(textNode);
    // optionally, call release() to release the resource associated with the range after done
    DOMRange* range = doc->createRange();
    range->release();
    // removeChild() returns a DOMNode*; the removed node is orphaned, optionally call release() to release associated resource
    DOMNode* removedNode = root->removeChild(e2);
    removedNode->release();
    // no need to release this returned object which is owned by implementation
    XMLString::transcode("*", tempStr, 99);
    DOMNodeList*    nodeList = doc->getElementsByTagName(tempStr);
    // done with the document, must call release() to release the entire document resources
    doc->release();
}

…Bye.

EDIT:

But how do I load HTML into a DOMDocument and search it for HTML elements? That is what I am trying to figure out.

XercesDOMParser parser;
parser.loadGrammar("grammar.dtd", Grammar::DTDGrammarType);
parser.setValidationScheme(XercesDOMParser::Val_Always);
Handler handler;
parser.
parser.parse("xmlfile.xml");