Using boost::asio stackless coroutines to download several files via HTTP

Last updated: 2023-10-16

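I translated the example from Programming in Lua by Roberto Ierusalimschy, which uses coroutines to download several files via HTTP, into C++ using boost::asio and stackful coroutines. Here is the code: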

#include <chrono>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>
#include <boost/asio.hpp>
#include <boost/asio/spawn.hpp>
using namespace std;
using namespace boost::asio;
io_service ioService;
void download(const string& host, const string& file, yield_context& yield)
{
  clog << "Downloading " << host << file << " ..." << endl;
  size_t fileSize = 0;
  boost::system::error_code ec;
  ip::tcp::resolver resolver(ioService);
  ip::tcp::resolver::query query(host, "80");
  auto it = resolver.async_resolve(query, yield[ec]);
  ip::tcp::socket socket(ioService);
  socket.async_connect(*it, yield[ec]);
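  ostringstream req;
  req << "GET " << file << " HTTP/1.0\r\n\r\n";
  // Keep the request in a named string so the buffer stays valid, and
  // write asynchronously so other coroutines can run in the meantime.
  const string request = req.str();
  async_write(socket, buffer(request), yield[ec]);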
  while (true)
  {
    char data[8192];
    size_t bytesRead = socket.async_read_some(buffer(data), yield[ec]);
    if (0 == bytesRead) break;
    fileSize += bytesRead;
  }
  socket.shutdown(ip::tcp::socket::shutdown_both);
  socket.close();
  clog << file << " size: " << fileSize << endl;
}
int main()
{
  auto timeBegin = chrono::high_resolution_clock::now();
  vector<pair<string, string>> resources =
  {
    {"www.w3.org", "/TR/html401/html40.txt"},
    {"www.w3.org", "/TR/2002/REC-xhtml1-20020801/xhtml1.pdf"},
    {"www.w3.org", "/TR/REC-html32.html"},
    {"www.w3.org", "/TR/2000/REC-DOM-Level-2-Core-20001113/DOM2-Core.txt"},
  };
  for(const auto& res : resources)
  {
    spawn(ioService, [&res](yield_context yield)
    {
      download(res.first, res.second, yield);
    });
  }
  ioService.run();
  auto timeEnd = chrono::high_resolution_clock::now();
  clog << "Time: " << chrono::duration_cast<chrono::milliseconds>(
            timeEnd - timeBegin).count() << endl;
  return 0;
}

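Now I am trying to convert the code to use the stackless coroutines from boost::asio, but the documentation is not enough for me to grok how to organize the code to make that work. Can someone provide a solution?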

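Here is a solution based on the stackless coroutines provided by Boost. Given that they are essentially a hack, I would not call this solution particularly elegant. It could probably be done better with C++20, but I think that is outside the scope of this question.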

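#include <algorithm>
#include <chrono>
#include <functional>
#include <iostream>
#include <iterator>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>
#include <boost/asio.hpp>
#include <boost/asio/yield.hpp>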
using boost::asio::async_write;
using boost::asio::buffer;
using boost::asio::error::eof;
using boost::system::error_code;
using std::placeholders::_1;
using std::placeholders::_2;
/**
 * Stackless coroutine for downloading a file from a host.
 *
 * The lifetime of the object is limited to one () call. After that,
 * the object is copied and the old object is discarded. For this
 * reason, the socket_ and resolver_ members, as well as the request
 * and read buffers, are stored as shared_ptrs, so that they live as
 * long as there is a live copy. An alternative solution would be to
 * manage these objects outside of the coroutine and to pass them
 * here by reference.
 */
class downloader : boost::asio::coroutine {
  using socket_t = boost::asio::ip::tcp::socket;
  using resolver_t = boost::asio::ip::tcp::resolver;
public:
  downloader(boost::asio::io_service &service, const std::string &host,
             const std::string &file)
      : socket_{std::make_shared<socket_t>(service)},
        resolver_{std::make_shared<resolver_t>(service)},
        request_{std::make_shared<std::string>()},
        data_{std::make_shared<std::vector<char>>(8192)}, file_{file},
        host_{host} {}
  void operator()(error_code ec = error_code(), std::size_t length = 0,
                  const resolver_t::results_type &results = {}) {
    // Check if the last yield resulted in an error. EOF is expected:
    // it marks the end of the HTTP/1.0 response.
    if (ec && ec != eof) {
      throw boost::system::system_error{ec};
    }
    // Jump to after the previous yield.
    reenter(this) {
      yield {
        resolver_t::query query{host_, "80"};
        // Use bind to skip the length parameter not provided by async_resolve
        auto result_func = std::bind(&downloader::operator(), this, _1, 0, _2);
        resolver_->async_resolve(query, result_func);
      }
      yield socket_->async_connect(*results, *this);
      yield {
        std::ostringstream req;
        req << "GET " << file_ << " HTTP/1.0\r\n\r\n";
        // The buffer must stay valid until the write completes, so the
        // request text is kept in shared storage, not in a temporary.
        *request_ = req.str();
        async_write(*socket_, buffer(*request_), *this);
      }
      while (true) {
        // The read buffer is a shared member for the same reason: a local
        // array would be gone by the time the read completes.
        yield socket_->async_read_some(buffer(*data_), *this);
        if (length == 0) {
          break;
        }
        fileSize_ += length;
      }
      std::cout << file_ << " size: " << fileSize_ << std::endl;
      socket_->shutdown(socket_t::shutdown_both);
      socket_->close();
    }
    // Uncomment this to show progress and to demonstrate interleaving
    // std::cout << file_ << " size: " << fileSize_ << std::endl;
  }
private:
  std::shared_ptr<socket_t> socket_;
  std::shared_ptr<resolver_t> resolver_;
  std::shared_ptr<std::string> request_;
  std::shared_ptr<std::vector<char>> data_;
  const std::string file_;
  const std::string host_;
  size_t fileSize_{};
};
int main() {
  auto timeBegin = std::chrono::high_resolution_clock::now();
  try {
    boost::asio::io_service service;
    std::vector<std::pair<std::string, std::string>> resources = {
        {"www.w3.org", "/TR/html401/html40.txt"},
        {"www.w3.org", "/TR/2002/REC-xhtml1-20020801/xhtml1.pdf"},
        {"www.w3.org", "/TR/REC-html32.html"},
        {"www.w3.org", "/TR/2000/REC-DOM-Level-2-Core-20001113/DOM2-Core.txt"},
    };
    std::vector<downloader> downloaders{};
    std::transform(resources.begin(), resources.end(),
                   std::back_inserter(downloaders), [&](auto &x) {
                     return downloader{service, x.first, x.second};
                   });
    std::for_each(downloaders.begin(), downloaders.end(),
                  [](auto &dl) { dl(); });
    service.run();
  } catch (std::exception &e) {
    std::cerr << "exception: " << e.what() << "\n";
  }
  auto timeEnd = std::chrono::high_resolution_clock::now();
  std::cout << "Time: "
            << std::chrono::duration_cast<std::chrono::milliseconds>(timeEnd -
                                                                     timeBegin)
                   .count()
            << std::endl;
  return 0;
}

Compiled with Boost 1.72 using g++ -lboost_coroutine -lpthread test.cpp. Sample output:

$ ./a.out 
/TR/REC-html32.html size: 606
/TR/html401/html40.txt size: 629
/TR/2002/REC-xhtml1-20020801/xhtml1.pdf size: 115777
/TR/2000/REC-DOM-Level-2-Core-20001113/DOM2-Core.txt size: 229699
Time: 1644

The log line at the end of operator() can be uncommented to demonstrate how the downloads interleave.