Fastest way of reading bytes in D2

Updated: 2023-10-16

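I want to read single bytes from a file into a D2 application as fast as possible. The application consumes its input one byte at a time, so reading larger blocks of data is not an option for the reader interface.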

To that end I wrote a few trivial implementations in C++, Java and D2: https://github.com/gizmomogwai/performance.

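As you can see, I tried plain reads, a buffer in the application code, and memory-mapped files. For my use case the memory-mapped solution works best, but the strange thing is that D2 is slower than Java. I had hoped D2 would land between C++ and Java (the C++ code is compiled with -O3 -g, the D2 code with -O -release).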

So please tell me what I am doing wrong here and how to speed up the D2 implementation.
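For readers who have not opened the repository: the memory-mapped variant mentioned above could look roughly like the following in C++. This is only a sketch under assumptions, not the code from the linked repository. The names `MmapByteReader` and `countBytesMmap` are made up for illustration, and it relies on POSIX `open`/`fstat`/`mmap`, so it will not build on Windows as written.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical illustration: hands out one byte at a time from a read-only
// memory mapping of the whole file. POSIX only.
class MmapByteReader {
public:
  explicit MmapByteReader(const char* path) {
    int fd = open(path, O_RDONLY);
    assert(fd != -1);
    struct stat st;
    int rc = fstat(fd, &st);
    assert(rc == 0);
    (void)rc;
    fSize = static_cast<size_t>(st.st_size);
    if (fSize > 0) {  // mmap rejects a zero-length mapping
      void* p = mmap(nullptr, fSize, PROT_READ, MAP_PRIVATE, fd, 0);
      assert(p != MAP_FAILED);
      fBegin = static_cast<const unsigned char*>(p);
    }
    close(fd);  // the mapping stays valid after the descriptor is closed
    fPtr = fBegin;
    fEnd = fBegin + fSize;
  }
  ~MmapByteReader() {
    if (fBegin) munmap(const_cast<unsigned char*>(fBegin), fSize);
  }
  MmapByteReader(const MmapByteReader&) = delete;
  MmapByteReader& operator=(const MmapByteReader&) = delete;

  // Returns the next byte (0..255), or -1 at end of file.
  int read() { return fPtr == fEnd ? -1 : *fPtr++; }

private:
  const unsigned char* fBegin = nullptr;
  const unsigned char* fPtr = nullptr;
  const unsigned char* fEnd = nullptr;
  size_t fSize = 0;
};

// Counts the bytes of a file by pulling them one at a time.
size_t countBytesMmap(const char* path) {
  MmapByteReader r(path);
  size_t n = 0;
  while (r.read() != -1) ++n;
  return n;
}
```

The per-byte cost here is one pointer comparison and one load, which is why this variant tends to do well when the consumer really does want single bytes.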

To give you an idea of the use case, here is a C++ implementation:

#include <cassert>
#include <cstdio>
#include <string>

class StdioFileReader {
private:
  FILE* fFile;
  static const size_t BUFFER_SIZE = 1024;
  unsigned char fBuffer[BUFFER_SIZE];
  unsigned char* fBufferPtr;
  unsigned char* fBufferEnd;
public:
  StdioFileReader(std::string s) : fFile(fopen(s.c_str(), "rb")), fBufferPtr(fBuffer), fBufferEnd(fBuffer) {
    assert(fFile);
  }
  ~StdioFileReader() {
    fclose(fFile);
  }
  int read() {
    bool finished = fBufferPtr == fBufferEnd;
    if (finished) {
      finished = fillBuffer();
      if (finished) {
        return -1;
      }
    }
    return *fBufferPtr++;
  }
private:
  bool fillBuffer() {
    size_t l = fread(fBuffer, 1, BUFFER_SIZE, fFile);
    fBufferPtr = fBuffer;
    fBufferEnd = fBufferPtr+l;
    return l == 0;
  }
};
size_t readBytes() {
  size_t res = 0;
  for (int i=0; i<10; i++) {
    StdioFileReader r("/tmp/shop_with_ids.pb");
    int read = r.read();
    while (read != -1) {
      ++res;
      read = r.read();
    }
  }
  return res;
}

It is much faster than the "same" solution in D:

struct FileReader {
  private FILE* fFile;
  private static const BUFFER_SIZE = 8192;
  private ubyte fBuffer[BUFFER_SIZE];
  private ubyte* fBufferPtr;
  private ubyte* fBufferEnd;
  public this(string fn) {
    fFile = std.c.stdio.fopen("/tmp/shop_with_ids.pb", "rb");
    fBufferPtr = fBuffer.ptr;
    fBufferEnd = fBuffer.ptr;
  }
  public int read(ubyte* targetBuffer) {
    auto finished = fBufferPtr == fBufferEnd;
    if (finished) {
      finished = fillBuffer();
      if (finished) {
        return 0;
      }
    }
    *targetBuffer = *fBufferPtr++;
    return 1;
  }
  private bool fillBuffer() {
    fBufferPtr = fBuffer.ptr;
    auto l = std.c.stdio.fread(fBufferPtr, 1, BUFFER_SIZE, fFile);
    fBufferEnd = fBufferPtr + l;
    return l == 0;
  }
}
size_t readBytes() {
  size_t count = 0;
  for (int i=0; i<10; i++) {
    auto reader = FileReader("/tmp/shop_with_ids.pb");
    ubyte buffer[1];
    ubyte* p = buffer.ptr;
    auto c = reader.read(p);
    while (1 == c) {
      ++count;
      c = reader.read(p);
    }
  }
  return count;
}

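It is very likely because of fread. Nothing guarantees that it does the same thing in D as in C. You are quite possibly using a completely different CRT (unless you are using the Digital Mars C++ compiler?).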

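That means the library may be doing extra work, such as synchronization, that slows things down. The only way to find out is to force D to use the same library as C, by telling the linker to link against the same library.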
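As an illustration of the kind of hidden synchronization meant here (not a diagnosis of the code in the question, which calls fread once per block): on POSIX systems every `getc` call locks the stream, while `getc_unlocked` skips the lock and leaves locking to the caller. A small C++ sketch, with the function names `countLocked` and `countUnlocked` invented for the example:

```cpp
#include <cassert>
#include <cstddef>
#include <stdio.h>  // POSIX: flockfile, funlockfile, getc_unlocked

// Counts bytes with getc(), which acquires the stream lock on every call.
size_t countLocked(const char* path) {
  FILE* f = fopen(path, "rb");
  assert(f);
  size_t n = 0;
  while (getc(f) != EOF) ++n;
  fclose(f);
  return n;
}

// Same loop, but the lock is taken once and each byte is fetched with
// getc_unlocked(), so there is no per-call synchronization.
size_t countUnlocked(const char* path) {
  FILE* f = fopen(path, "rb");
  assert(f);
  size_t n = 0;
  flockfile(f);
  while (getc_unlocked(f) != EOF) ++n;
  funlockfile(f);
  fclose(f);
  return n;
}
```

Both functions return the same count; any difference in running time comes from the C library, not from the calling code, which is exactly the effect that makes cross-runtime comparisons unreliable.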

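Until both programs use the same library, you are comparing apples to oranges. If that is not possible, call the operating system directly from both programs and compare those results; that way the underlying call is guaranteed to be the same for both.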
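A sketch of what "calling the operating system directly" could look like on the C++ side, assuming a POSIX system. It is the same buffered reader as in the question, with fread replaced by the `read` system call. The names `PosixFileReader` and `countBytesPosix` are hypothetical, and error handling (including retrying on EINTR) is deliberately left out. The D version would call the same system call through its POSIX bindings.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical illustration: byte-at-a-time reader that refills its buffer
// with the POSIX read() system call instead of going through stdio.
class PosixFileReader {
public:
  explicit PosixFileReader(const char* path) : fFd(open(path, O_RDONLY)) {
    assert(fFd != -1);
  }
  ~PosixFileReader() { close(fFd); }
  PosixFileReader(const PosixFileReader&) = delete;
  PosixFileReader& operator=(const PosixFileReader&) = delete;

  // Returns the next byte (0..255), or -1 at end of file or on error.
  int read() {
    if (fPtr == fEnd && !fillBuffer()) return -1;
    return *fPtr++;
  }

private:
  // Refills the buffer; returns false when no more data is available.
  bool fillBuffer() {
    ssize_t n = ::read(fFd, fBuffer, BUFFER_SIZE);
    if (n <= 0) return false;  // 0 means EOF, -1 means error (not retried)
    fPtr = fBuffer;
    fEnd = fBuffer + n;
    return true;
  }

  static const size_t BUFFER_SIZE = 8192;
  int fFd;
  unsigned char fBuffer[BUFFER_SIZE];
  unsigned char* fPtr = fBuffer;
  unsigned char* fEnd = fBuffer;
};

// Counts the bytes of a file by pulling them one at a time.
size_t countBytesPosix(const char* path) {
  PosixFileReader r(path);
  size_t n = 0;
  while (r.read() != -1) ++n;
  return n;
}
```

With this variant the only library code between the loop and the kernel is the thin system-call wrapper, so a remaining speed difference between the two languages would point at the compilers rather than at the C runtime.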

What happens if you use the std.stdio module instead:

import std.stdio;
struct FileReader {
  private File fFile;
  private enum BUFFER_SIZE = 8192;//why not enum?
  private ubyte[BUFFER_SIZE] fBuffer=void;//avoid (costly) initialization to 0
  private ubyte[] buff;
  public this(string fn) {
    fFile = File(fn, "rb");//use the constructor argument instead of a hard-coded path
  }
  /+
  public ~this(){//you really should have been doing this if you used std.c.stdio.fopen
                 //but it's unnecessary for std.stdio's File (it's ref counted)
    fFile.close();
  }
  +/
  public int read(out ubyte targetBuffer) {
    auto finished = buff.length==0;
    if (finished) {
      finished = fillBuffer();
      if (finished) {
        return 0;
      }
    }
    targetBuffer = buff[0];
    buff = buff[1..$];
    return 1;
  }
  private bool fillBuffer() {//returns true when there is nothing left to read
    if(!fFile.isOpen())return true;
    buff = fFile.rawRead(fBuffer[]);//rawRead returns an empty slice at end of file
    return buff.length==0;
  }
}
size_t readBytes() {
  size_t count = 0;
  for (int i=0; i<10; i++) {
    auto reader = FileReader("/tmp/shop_with_ids.pb");
    ubyte buffer;
    auto c = reader.read(buffer);
    while (1 == c) {
      ++count;
      c = reader.read(buffer);
    }
  }
  return count;
}

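If you want a real speed comparison, compile with -release -O -inline: this turns off debugging checks (mainly array out-of-bounds checks), enables optimization, and inlines whatever it can. Use equivalent settings for the C++ solution, of course.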