Reading bits from a file

Keywords: reading files. Last updated: 2023-10-16

For example, I can read 4 bytes from a file using

ifstream r(filename, ios::binary | ios::in);
uint32_t readHere;
r.read((char*)&readHere, 4);

But how can I read 4.5 bytes, that is, 4 bytes and 4 bits?

What I came up with is

ifstream r(filename, ios::binary | ios::in);
uint64_t readHere = 0;
r.read((char*)&readHere, 5);              // reading 5 bytes
uint64_t tmp = readHere & 0xFF;           // intended: extract the 5th byte
tmp = tmp >> 4;                           // get the first half of the bits
readHere = ((readHere >> 8) << 8) | tmp;  // remove the 5th byte, then add the 4 bits

But I'm not sure how I should take the half byte, whether it should be the first or the last 4 bits. Is there a better way to retrieve it?

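Whether in a file or in memory, the smallest unit you can read or write is a char (which is one byte on common systems (*)). You can walk through longer elements byte by byte, and at that level endianness matters.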

uint32_t u = 0xaabbccdd;
unsigned char *p = reinterpret_cast<unsigned char *>(&u);  // static_cast cannot convert between unrelated pointer types
unsigned char c = p[0];    // c is 0xdd on a little endian system and 0xaa on a big endian one

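But once you are inside a byte, all you can do is use bitwise AND and shifts to extract the low-order or high-order bits. There is no endianness at this level, unless you decide to adopt a convention.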
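As a minimal sketch of that AND-and-shift step, the helpers below split a byte into its two 4-bit halves and put them back together. The names high_nibble, low_nibble and make_byte are made up for this illustration and do not come from any library.

```cpp
#include <cassert>
#include <cstdint>

// Upper four bits of a byte, shifted down into the range 0..15.
inline std::uint8_t high_nibble(std::uint8_t byte)
{
    return static_cast<std::uint8_t>(byte >> 4);
}

// Lower four bits of a byte.
inline std::uint8_t low_nibble(std::uint8_t byte)
{
    return static_cast<std::uint8_t>(byte & 0x0f);
}

// Rebuild a byte from two nibbles; both inputs must fit in four bits.
inline std::uint8_t make_byte(std::uint8_t high, std::uint8_t low)
{
    assert(high <= 0x0f && low <= 0x0f);
    return static_cast<std::uint8_t>((high << 4) | low);
}
```

For the byte 0xAB, high_nibble gives 0x0A and low_nibble gives 0x0B. Which half counts as "first" is purely a convention of the file format.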

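By the way, if you read from a network interface, or even from a serial line where bits are transmitted individually, you can still only read whole bytes; there is no way to read just 4 bits in one read and the other 4 bits in the next.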

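(*) Old systems (CDC machines in the 80s) used to have 6 bits per character, but C++ did not exist at that time and I'm not sure whether a C compiler existed for them.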

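It's not clear whether this is a file format you control or someone else's. Either way, let's assume you have some integer data type that can hold a 36-bit unsigned value: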

typedef uint64_t u36;

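Now, regardless of whether your system is big-endian or little-endian, you can write the value to a binary stream in a predictable order by working one byte at a time. Let's use big-endian, because it is slightly easier to picture how the bits combine to form the value.

You can use simple shifting and masking into a small buffer. The only thing to decide is where to put the leftover half byte. But if you follow the pattern of shifting each successive byte by another 8 bits, the remainder naturally lands in the high-order bits of the last byte.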

ostream & write_u36( ostream & s, u36 val )
{
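    // Explicit casts: brace-initializing char from a 64-bit value is a narrowing conversion.
    char bytes[5] = {
        static_cast<char>((val >> 28) & 0xff),
        static_cast<char>((val >> 20) & 0xff),
        static_cast<char>((val >> 12) & 0xff),
        static_cast<char>((val >> 4 ) & 0xff),
        static_cast<char>((val << 4 ) & 0xf0)   // last 4 bits go in the high half
    };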
    return s.write( bytes, 5 );
}
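To see which bytes those shifts produce, here is a small sketch that packs into an array instead of a stream. The name pack_u36 is hypothetical, and the layout (big-endian, last 4 bits in the high half of the fifth byte, low half left as zero padding) is the assumption made above rather than any standard format.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using u36 = std::uint64_t;

// Pack one 36-bit value into 5 bytes, most significant byte first.
// The low nibble of the last byte is zero padding.
std::array<unsigned char, 5> pack_u36(u36 val)
{
    assert((val >> 36) == 0);  // the value must fit in 36 bits
    return {{
        static_cast<unsigned char>((val >> 28) & 0xff),
        static_cast<unsigned char>((val >> 20) & 0xff),
        static_cast<unsigned char>((val >> 12) & 0xff),
        static_cast<unsigned char>((val >> 4) & 0xff),
        static_cast<unsigned char>((val << 4) & 0xf0)
    }};
}
```

Under these assumptions, pack_u36(0x123456789) yields the bytes 12 34 56 78 90 (hex): the nine hex digits are laid out left to right, and the final digit occupies the high half of the fifth byte.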

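But write_u36 on its own isn't how you would actually write a series of these numbers. You would have to hold back the 5th byte until you were finished, or pack the start of the next value into it. Or you could always write two values at a time: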

ostream & write_u36_pair( ostream & s, u36 a, u36 b )
{
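    // Explicit casts: brace-initializing char from a 64-bit value is a narrowing conversion.
    char bytes[9] = {
        static_cast<char>((a >> 28) & 0xff),
        static_cast<char>((a >> 20) & 0xff),
        static_cast<char>((a >> 12) & 0xff),
        static_cast<char>((a >> 4 ) & 0xff),
        static_cast<char>(((a << 4 ) & 0xf0) | ((b >> 32) & 0x0f)),  // shared byte
        static_cast<char>((b >> 24) & 0xff),
        static_cast<char>((b >> 16) & 0xff),
        static_cast<char>((b >> 8) & 0xff),
        static_cast<char>(b & 0xff)
    };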
    return s.write( bytes, 9 );
}

So now you can probably see how to read the values back and deserialize them into integers. The simplest way is to read two at a time.

istream & read_u36_pair( istream & s, u36 & a, u36 & b )
{
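    unsigned char bytes[9];  // unsigned, so bytes >= 0x80 are not sign-extended when widened
    if( s.read( reinterpret_cast<char *>(bytes), 9 ) )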
    {
        a = (u36)bytes[0] << 28
          | (u36)bytes[1] << 20
          | (u36)bytes[2] << 12
          | (u36)bytes[3] << 4
          | (u36)bytes[4] >> 4;
        b = ((u36)bytes[4] & 0x0f) << 32
          | (u36)bytes[5] << 24
          | (u36)bytes[6] << 16
          | (u36)bytes[7] << 8
          | (u36)bytes[8];
    }
    return s;
}
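A quick way to convince yourself that the pair layout round-trips is to pack and unpack in memory. The sketch below does that with hypothetical helpers pack_u36_pair and unpack_u36_pair; it uses unsigned char throughout, since widening a plain char that happens to be signed would sign-extend any byte of 0x80 or above and corrupt the result.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using u36 = std::uint64_t;
using PairBytes = std::array<unsigned char, 9>;

// Pack two 36-bit values big-endian into 9 bytes; byte 4 holds the last
// 4 bits of `a` (high half) and the first 4 bits of `b` (low half).
PairBytes pack_u36_pair(u36 a, u36 b)
{
    assert((a >> 36) == 0 && (b >> 36) == 0);
    return {{
        static_cast<unsigned char>((a >> 28) & 0xff),
        static_cast<unsigned char>((a >> 20) & 0xff),
        static_cast<unsigned char>((a >> 12) & 0xff),
        static_cast<unsigned char>((a >> 4) & 0xff),
        static_cast<unsigned char>(((a << 4) & 0xf0) | ((b >> 32) & 0x0f)),
        static_cast<unsigned char>((b >> 24) & 0xff),
        static_cast<unsigned char>((b >> 16) & 0xff),
        static_cast<unsigned char>((b >> 8) & 0xff),
        static_cast<unsigned char>(b & 0xff)
    }};
}

// Inverse of pack_u36_pair.
void unpack_u36_pair(const PairBytes& bytes, u36& a, u36& b)
{
    a = (static_cast<u36>(bytes[0]) << 28)
      | (static_cast<u36>(bytes[1]) << 20)
      | (static_cast<u36>(bytes[2]) << 12)
      | (static_cast<u36>(bytes[3]) << 4)
      | (static_cast<u36>(bytes[4]) >> 4);
    b = (static_cast<u36>(bytes[4] & 0x0f) << 32)
      | (static_cast<u36>(bytes[5]) << 24)
      | (static_cast<u36>(bytes[6]) << 16)
      | (static_cast<u36>(bytes[7]) << 8)
      |  static_cast<u36>(bytes[8]);
}
```

With a = 0x123456789 and b = 0xABCDEF012, the shared byte at index 4 comes out as 0x9A, and unpacking returns both original values.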

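If you want to read one value at a time, you need to keep track of some state, so that you know how many bytes to read (5 or 4) and which shift operations to apply. Something naive like this: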

struct u36deser {
    unsigned char bytes[5];  // unsigned, so bytes >= 0x80 are not sign-extended when widened
    int which = 0;           // 0: next value starts on a byte boundary, 1: it starts mid-byte
};
istream & read_u36( istream & s, u36deser & state, u36 & val )
{
    if( state.which == 0 && s.read( reinterpret_cast<char *>(state.bytes), 5 ) )
    {
        val = (u36)state.bytes[0] << 28
            | (u36)state.bytes[1] << 20
            | (u36)state.bytes[2] << 12
            | (u36)state.bytes[3] << 4
            | (u36)state.bytes[4] >> 4;
        state.which = 1;
    }
    else if( state.which == 1 && s.read( reinterpret_cast<char *>(state.bytes), 4 ) )
    {
        val = ((u36)state.bytes[4] & 0x0f) << 32  // byte left over from previous call
            | (u36)state.bytes[0] << 24
            | (u36)state.bytes[1] << 16
            | (u36)state.bytes[2] << 8
            | (u36)state.bytes[3];
        state.which = 0;
    }
    return s;
}

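All of this is purely hypothetical, which seems to be the point of your question anyway. There are many other ways to serialize bits, some of which are not at all obvious.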
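One of those other approaches, sketched here under the same big-endian, most-significant-bit-first assumption, is a general bit reader that buffers one byte and hands out however many bits are requested. BitReader and decode_pair are hypothetical names for this illustration, not part of the standard library.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: pulls an arbitrary number of bits from a stream,
// most significant bit first, keeping unused bits of the current byte.
class BitReader {
public:
    explicit BitReader(std::istream& in) : in_(in) {}

    // Reads `count` bits (1..64) into `out`. Returns false if the stream
    // ends first; bits consumed by a failed call are not restored.
    bool read(unsigned count, std::uint64_t& out)
    {
        assert(count >= 1 && count <= 64);
        std::uint64_t value = 0;
        while (count > 0) {
            if (avail_ == 0) {  // refill the one-byte buffer
                const int c = in_.get();
                if (c == std::istream::traits_type::eof()) return false;
                cur_ = static_cast<unsigned char>(c);
                avail_ = 8;
            }
            const unsigned take = count < avail_ ? count : avail_;
            const unsigned shift = avail_ - take;  // unused bits stay below
            const std::uint64_t chunk = (cur_ >> shift) & ((1u << take) - 1u);
            value = (value << take) | chunk;
            avail_ -= take;
            count -= take;
        }
        out = value;
        return true;
    }

private:
    std::istream& in_;
    unsigned char cur_ = 0;  // byte currently being consumed
    unsigned avail_ = 0;     // number of unread low bits left in cur_
};

// Usage sketch: decode two consecutive 36-bit values from a byte string.
inline bool decode_pair(const std::string& data, std::uint64_t& a, std::uint64_t& b)
{
    std::istringstream in(data);
    BitReader bits(in);
    return bits.read(36, a) && bits.read(36, b);
}
```

The trade-off compared with the fixed 5-byte and 4-byte reads is generality against speed: this version handles any field width, but it touches the stream one byte at a time.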