Data gets added to curl's retrieved content

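I'm using CURL with C++ to fetch a website's source code. A callback function puts the content into a string, but I end up with extra data (0s and a few newlines).

Here is my code (not all of it, because the project is fairly large).

This is the function that receives the content and puts it into the string: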

size_t writefunc(void *ptr, size_t size, size_t nmemb, string pContent)
{
    pContent += (char *)ptr;
    return size*nmemb;
}

Here is how I initialize the CURL object:

string Content;
CURL *pCURL = curl_easy_init();
if(!pCURL)
{
    cout << "Couldn't create a curl object" << endl;
    return 0;
}
curl_easy_setopt(pCURL, CURLOPT_WRITEFUNCTION, writefunc);
curl_easy_setopt(pCURL, CURLOPT_FOLLOWLOCATION, true);
curl_easy_setopt(pCURL, CURLOPT_COOKIEJAR, "cookie_file.txt");
curl_easy_setopt(pCURL, CURLOPT_WRITEDATA, &Content);
curl_easy_setopt(pCURL, CURLOPT_POST, true);

Tested on OS X El Capitan 10.11.4 with Xcode 7.3. It works.

--

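Note: if you need an SSL connection, just add #define USE_SSL, and change the verification options (CURLOPT_SSL_VERIFYPEER and CURLOPT_SSL_VERIFYHOST) if you need to make sure the peer or host has a valid certificate.

You also don't need many of the options I set in the code below.

Edit: I see a problem in your code. You are making a POST request. What you really want is a GET request, because you want to retrieve the page's source.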

//
//  main.cpp
//  TestCurl
//
//  Created by Brandon T on 2016-04-21.
//  Copyright © 2016 XIO. All rights reserved.
//
#include <iostream>
#include <string>
#include <curl/curl.h>

size_t writefunc(void *contents, size_t size, size_t nmemb, void *userp)
{
    std::string *page_source = static_cast<std::string *>(userp);
    if (page_source)
    {
        page_source->append(static_cast<char *>(contents), size * nmemb);
    }
    return size * nmemb;
}
int main(int argc, const char * argv[])
{
    std::string page_url = "http://stackoverflow.com/questions/36757217/data-gets-added-to-curls-retrieved-content?noredirect=1#comment61096734_36757217";
    std::string user_agent = "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36";
    std::string page_source;

    CURL *curl_handle = curl_easy_init();
    if (curl_handle)
    {
        curl_easy_setopt(curl_handle, CURLOPT_FAILONERROR, 1L);
        curl_easy_setopt(curl_handle, CURLOPT_USERAGENT, user_agent.c_str());
        curl_easy_setopt(curl_handle, CURLOPT_WRITEFUNCTION, writefunc);
        curl_easy_setopt(curl_handle, CURLOPT_WRITEDATA, &page_source);
        curl_easy_setopt(curl_handle, CURLOPT_AUTOREFERER, 1L);
        #ifdef USE_SSL
        curl_easy_setopt(curl_handle, CURLOPT_USE_SSL, CURLUSESSL_TRY);
        curl_easy_setopt(curl_handle, CURLOPT_SSL_VERIFYPEER, 0L);  // set to 1L to verify the peer's certificate
        curl_easy_setopt(curl_handle, CURLOPT_SSL_VERIFYHOST, 0L);  // set to 2L to verify the certificate matches the host name
        #endif
        curl_easy_setopt(curl_handle, CURLOPT_COOKIEJAR, "cookies.txt");
        curl_easy_setopt(curl_handle, CURLOPT_COOKIEFILE, "cookies.txt");
        curl_easy_setopt(curl_handle, CURLOPT_VERBOSE, 1L);
        curl_easy_setopt(curl_handle, CURLOPT_URL, page_url.c_str());
        curl_easy_setopt(curl_handle, CURLOPT_UPLOAD, 0L);
        curl_easy_setopt(curl_handle, CURLOPT_POST, 0L);
        CURLcode res = curl_easy_perform(curl_handle);
        if (res != CURLE_OK)
        {
            std::string error_message = curl_easy_strerror(res);
            curl_easy_cleanup(curl_handle);
            std::cerr << error_message;
            return 0;
        }
        curl_easy_cleanup(curl_handle);
        std::cout << page_source;
    }

    return 0;
}

Result:

*   Trying 104.16.35.249...
* Connected to stackoverflow.com (104.16.35.249) port 80 (#0)
> GET /questions/36757217/data-gets-added-to-curls-retrieved-content?noredirect=1 HTTP/1.1
Host: stackoverflow.com
User-Agent: Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36
Accept: */*
Cookie: __cfduid=ddbf5d3e848c27dcbc1fded421106e2311461286187; prov=b57e8199-4ea1-4ad3-a9bb-cad71f707835
< HTTP/1.1 200 OK
< Date: Fri, 22 Apr 2016 00:59:16 GMT
< Content-Type: text/html; charset=utf-8
< Transfer-Encoding: chunked
< Connection: keep-alive
< Cache-Control: public, max-age=60
< Expires: Fri, 22 Apr 2016 01:00:16 GMT
< Last-Modified: Fri, 22 Apr 2016 00:59:16 GMT
< Vary: *
< X-Frame-Options: SAMEORIGIN
< X-Request-Guid: 36d372e7-2da8-4c7f-ab22-ecd8cb96fa39
< Server: cloudflare-nginx
< CF-RAY: 297521d23dac016a-ORD
< 
* Connection #0 to host stackoverflow.com left intact
<!DOCTYPE html>
<html itemscope itemtype="http://schema.org/QAPage">
<head>

Followed by the rest of this page's source.