C++ How to create byte[] array from file (I don't mean reading file byte by byte)?

Keywords: byte, read, file, file creation, byte array, C++      Updated: 2023-10-16

I have a problem that I can neither solve myself nor find an answer to. I have a file that contains a string like this:

01000000 d08c9ddf0115d1118c7a00c04

I want to read the file so that I end up with the same data I would otherwise write by hand like this:

char fromFile[] =
"\x01\x00\x00\x00\xd0\x8c\x9d\xdf\x011\x5d\x11\x18\xc7\xa0\x0c\x04";

I would appreciate your help.

I want to do this in C++ (preferably VC++).

Thanks!

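#include <cstddef>   // size_t
#include <cstdint>   // uint8_t
#include <iomanip>   // std::setw, std::setfill
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

int t194(void)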
{
   // imagine you have n pair of char, for simplicity, 
   // here n is 3 (you should recognize them)
   char pair1[] = "01"; // note:
   char pair2[] = "8c"; //   initialize with 3 char c-style strings
   char pair3[] = "c7"; //
   {
      // let us put these into a ram based stream, with spaces
      std::stringstream ss;
      ss << pair1 << " " << pair2 << " " << pair3;
      // each pair can now be extracted into
      // pre-declared int vars
      int i1 = 0;
      int i2 = 0;
      int i3 = 0;
      // use formatted extractor to convert
      ss >> i1 >> i2 >> i3;
      // show what happened (for debug only)
      std::cout << "Confirm1:"  << std::endl;
      std::cout << "i1: " << i1 << std::endl;
      std::cout << "i2: " << i2 << std::endl;
      std::cout << "i3: " << i3 << std::endl << std::endl;
      // output is:
      // Confirm1:
      // i1: 1
      // i2: 8
      // i3: 0
      // Shucks, not correct.
      // We know the default radix is base 10
      // I hope you can see that the input radix is wrong,
      // because c is not a decimal digit, 
      // the i2 and i3 conversions stop at the 'c'
   }
   // pre-declare
   int i1 = 0;
   int i2 = 0;
   int i3 = 0;
   {
      // so we try again, with radix info added
      std::stringstream ss;
      ss << pair1 << " " << pair2 << " " << pair3; 
      // strings are already in hex, so we use them as is
      ss >> std::hex  // change radix to 16
         >> i1 >> i2 >> i3;
      // now show what happened
      std::cout << "Confirm2:"  << std::endl;
      std::cout << "i1: " << i1 << std::endl;
      std::cout << "i2: " << i2 << std::endl;
      std::cout << "i3: " << i3 << std::endl << std::endl;
      // output now:
      // i1: 1
      // i2: 140
      // i3: 199
      // not what you expected?  Though correct, 
      // now we can see we have the wrong radix for output
      // add output radix to cout stream
      std::cout << std::hex  // add radix info here!
                << "i1: " << i1 << std::endl
                // Note: only need to do once for std::cout
                << "i2: " << i2 << std::endl
                << "i3: " << i3 << std::endl << std::endl
                << std::dec;
      // output now looks correct, and easily comparable to input
      // i1: 1
      // i2: 8c
      // i3: c7
      // So: What next?
      //     read the entire string of hex input into a single string
      //     separate this into pairs of chars (perhaps using    
      //       string::substr())
      //     put space separated pairs into stringstream ss
      //     extract hex values until ss.eof()
      //     probably should add error checks
      // and, of course, figure out how to use a loop for these steps
      //
      // alternative to consider:
      //   read 1 char at a time, build a pairing, convert, repeat 
   }

   //
   // Eventually, you should get far enough to discover that the
   // extracts I have done are integers, but you want to pack them
   // into an array of binary bytes.
   //
   // You can go back, and recode to extract bytes (either 
   // unsigned char or uint8_t), which you might find interesting.
   //
   // Or ... because your input is hex, and the largest 2 char
   // value will be 0xff, and this fits into a single byte, you
   // can simply static_cast them (I use unsigned char)
   unsigned char bin[] = {static_cast<unsigned char>(i1),
                          static_cast<unsigned char>(i2),
                          static_cast<unsigned char>(i3) };
   // Now confirm by casting these back to ints to cout
   std::cout << "Confirm4:  "
             << std::hex << std::setw(2) << std::setfill('0')
             << static_cast<int>(bin[0]) << " "
             << static_cast<int>(bin[1]) << " "
             << static_cast<int>(bin[2]) << std::endl;
   // you also might consider a vector (and i prefer uint8_t)
   // because push_back operations does a lot of hidden work for you
   std::vector<uint8_t>  bytes;
   bytes.push_back(static_cast<uint8_t>(i1));
   bytes.push_back(static_cast<uint8_t>(i2));
   bytes.push_back(static_cast<uint8_t>(i3));
   // confirm
   std::cout << "\nConfirm5:  ";
   for (size_t i=0; i<bytes.size(); ++i)
      std::cout << std::hex << std::setw(2) << std::setfill(' ')
                << static_cast<int>(bytes[i]) << " ";
   std::cout << std::endl;

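Note: sending bytes or chars to cout (or ss) can be confusing and does not always give the result you expect. My background is embedded software, and I have surprisingly little experience with making stream I/O of bytes work. This tends to bias my work when dealing with stream I/O.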

   // other considerations:
   //
   // you might read 1 char at a time.  this can simplify
   //    your loop, possibly easier to debug
   //    ... would you have to detect and remove eoln?  i.e.  '\n'
   //    ... how would you handle a bad input 
   //        such as not hex char, odd char count in a line
   //
   // I would probably prefer to use getline(),
   //    it will read until eoln, and discard the '\n'
   //    then in each string, loop char by char, creating char pairs, etc.
   //
   // Converting a vector<uint8_t> to char bytes[] can be an easier
   //    effort in some ways.  A vector<> guarantees that all the values
   //    contained are 'packed' back-to-back, and contiguous in
   //    memory, just right for binary stream output
   //
   //    vector.size() tells how many chars have been pushed
   //
   // NOTE: the formatted 'insert' operator ('<<') can not
   //       transfer binary data to a stream.  You must use
   //       stream::write() for binary output.
   //
   std::stringstream ssOut;
   // possible approach:
   // 1 step reinterpret_cast 
   // - a binary block output requires "const char*"
   // (bytes is the std::vector<uint8_t> filled above)
   const char* myBuff = reinterpret_cast<const char*>(&bytes.front());
   ssOut.write(myBuff, bytes.size()); 
   // block write puts binary info into stream
   // confirm
   std::cout << "\nConfirm6:  ";
   std::string s = ssOut.str();  // string with binary data
   for (size_t i=0; i<s.size(); ++i)
   {
      // because binary data is _not_ signed data, 
      // we need to 'cancel' the sign bit
      unsigned char ukar = static_cast<unsigned char>(s[i]); 
      // because formatted output would interpret some chars 
      //   (like null, or \n), we cast to int
      int  intVal = static_cast<int>(ukar);  
      // cast does not generate code
      // now the formatted 'insert' operator 
      // converts and displays what we want
      std::cout << std::hex << std::setw(2) << std::setfill('0') 
                << intVal << " ";
   }
   std::cout << std::endl;
   //
   //
   return (0);
} // int t194(void)
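
The "What next?" steps listed in the comments above (read the whole hex string, split it into pairs, put the space-separated pairs into a stringstream, extract with std::hex until the stream runs out) could be sketched as a loop roughly like this. The name hexToBytes is hypothetical, and ignoring whitespace and treating a trailing lone digit such as "4" as 0x04 are assumptions made for illustration, not something the answer above specifies.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: converts text such as "01000000 d08c" into bytes.
// Assumptions: whitespace is ignored; a trailing lone digit "4" means 0x04.
std::vector<std::uint8_t> hexToBytes(const std::string& text)
{
    // 1. keep only the non-whitespace characters
    std::string digits;
    for (char c : text)
        if (!std::isspace(static_cast<unsigned char>(c)))
            digits += c;

    // 2. put space-separated pairs into a RAM-based stream
    std::stringstream ss;
    for (std::size_t i = 0; i < digits.size(); i += 2)
        ss << digits.substr(i, 2) << ' ';

    // 3. extract each pair as a base-16 integer and pack it into a byte;
    //    extraction simply stops at the first pair that is not valid hex
    std::vector<std::uint8_t> bytes;
    int value = 0;
    ss >> std::hex;
    while (ss >> value)
        bytes.push_back(static_cast<std::uint8_t>(value));
    return bytes;
}
```

Calling hexToBytes on the string from the question should give a vector whose data() pointer can be used wherever a byte array is needed. Real code would probably want to report bad input instead of silently stopping.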

The following snippet should be helpful!

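#include <cstdlib>   // strtol
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

std::ifstream input( "filePath", std::ios::binary );
// read the whole file into a string (std::vector<char> has no substr())
std::string hex((
        std::istreambuf_iterator<char>(input)), 
        (std::istreambuf_iterator<char>()));
std::vector<char> bytes;
// note: any whitespace in the file has to be removed from hex first
for (std::string::size_type i = 0; i < hex.size(); i += 2) {
    std::string byteString = hex.substr(i, 2);
    char byte = (char) strtol(byteString.c_str(), NULL, 16);
    bytes.push_back(byte);
}
char* byteArr = bytes.data();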

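The way I understand your question, you just want the binary representation of the digits, i.e. with the ASCII (or EBCDIC) part stripped. The output array will be half the length of the input array.

Here is some crude pseudocode.

For each input character c: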

if (isdigit(c)) c -= '0';
else if (isxdigit(c)) c = tolower(c) - 'a' + 0xa;  // tolower covers both 'A'-'F' and 'a'-'f'

Then, depending on the index of c in the input array:

if (index % 2 == 0) output[outputindex] = (c << 4) & 0xf0;  // even index: high nibble
else output[outputindex++] |= c & 0x0f;                     // odd index: low nibble
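
The pseudocode above could be turned into something runnable along these lines. This is only a sketch: the names nibbleValue and packNibbles are hypothetical, and skipping non-hex characters such as spaces is an assumption that the pseudocode does not state.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: value (0..15) of one hex digit, or -1 if c is not one.
int nibbleValue(char c)
{
    const unsigned char uc = static_cast<unsigned char>(c);
    if (std::isdigit(uc))  return c - '0';
    if (std::isxdigit(uc)) return std::tolower(uc) - 'a' + 0xa;
    return -1;
}

// Packs two hex digits per output byte: even index -> high nibble,
// odd index -> low nibble.
// Assumption: characters that are not hex digits are skipped.
std::vector<unsigned char> packNibbles(const std::string& text)
{
    std::vector<unsigned char> output;
    std::size_t index = 0;               // counts hex digits seen so far
    for (char c : text) {
        const int v = nibbleValue(c);
        if (v < 0) continue;             // not a hex digit
        if (index % 2 == 0)
            output.push_back(static_cast<unsigned char>(v << 4));
        else
            output.back() |= static_cast<unsigned char>(v);
        ++index;
    }
    return output;
}
```

Note that with an odd number of digits this scheme leaves the last digit in the high nibble (a lone "4" becomes 0x40), so the input may need a leading or trailing zero depending on what is intended.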

Here is a function that takes the string from the description and outputs a string in which every pair of digits is preceded by \x.

#include <iostream>
#include <algorithm>
#include <string>
std::string convertHex(const std::string& str)
{
    std::string retVal;
    std::string hexPrefix = "\\x";  // escaped backslash: a literal "\x"
    if (!str.empty())
    {
        std::string::const_iterator it = str.begin();
        do
        {
            if (std::distance(it, str.end()) == 1)
            {
                 retVal += hexPrefix + "0";
                 retVal += *(it);
                 ++it;
            }
            else
            {
                retVal += hexPrefix + std::string(it, it+2);
                it += 2;
            }
        } while (it != str.end());
    }
    return retVal;
}
using namespace std;
int main()
{
     cout << convertHex("01000000d08c9ddf0115d1118c7a00c04") << endl;
     cout << convertHex("015d");
}
Output:

\x01\x00\x00\x00\xd0\x8c\x9d\xdf\x01\x15\xd1\x11\x8c\x7a\x00\xc0\x04
\x01\x5d

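It is basically just a do-while loop. The string is built from each pair of characters encountered. If only one character is left (meaning a single digit), a "0" is added in front of that digit.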

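I think I would use a proxy class to read and write the data. Unfortunately, the code for the manipulators involved is a bit verbose (to put it mildly).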

#include <vector>
#include <algorithm>
#include <iterator>
#include <iostream>
#include <iomanip>
#include <string>
#include <sstream>
struct byte {
    unsigned char ch;
    friend std::istream &operator>>(std::istream &is, byte &b) {
        std::string temp;
        if (is >> std::setw(2) >> std::setprecision(2) >> temp) 
            b.ch = std::stoi(temp, 0, 16);       
        return is;
    }
    friend std::ostream &operator<<(std::ostream &os, byte const &b) {
        return os << "\\x" << std::setw(2) << std::setfill('0') << std::setprecision(2) << std::hex << (int)b.ch;
    }
};
int main() {
    std::istringstream input("01000000d08c9ddf115d1118c7a00c04");
    std::ostringstream result;
    std::istream_iterator<byte> in(input), end;
    std::ostream_iterator<byte> out(result);
    std::copy(in, end, out);
    std::cout << result.str();
}

I really don't like the verbose manipulators, but other than that it looks fairly clean.

You could try using fscanf in a loop:

unsigned int b;    // %x stores into an unsigned int, not an unsigned char
fscanf(pFile, "%2x", &b);

Edit:

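#define MAX_LINE_SIZE 128
FILE* pFile = fopen(...);
char fromFile[MAX_LINE_SIZE] = {0};
unsigned int b = 0;   // %x requires a pointer to unsigned int
int currentIndex = 0;
// check the index first so nothing is read once the buffer is full
while (currentIndex < MAX_LINE_SIZE && fscanf (pFile, "%2x", &b) == 1)
    fromFile[currentIndex++] = (char) b;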
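
For completeness, here is a self-contained sketch of the fscanf loop that can be tried without an input file. The helper names readHexBytes and demoReadHexBytes are hypothetical, and the use of std::tmpfile to stand in for the real file is an assumption made only so the example is runnable; in practice the FILE* would come from fopen on the actual path.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: reads up to maxBytes two-digit hex values from fp.
// "%2x" skips leading whitespace, consumes at most two hex digits,
// and stores the result into an unsigned int.
std::vector<unsigned char> readHexBytes(std::FILE* fp, std::size_t maxBytes)
{
    std::vector<unsigned char> bytes;
    unsigned int value = 0;
    while (bytes.size() < maxBytes && std::fscanf(fp, "%2x", &value) == 1)
        bytes.push_back(static_cast<unsigned char>(value));
    return bytes;
}

// Hypothetical demo: writes the sample text from the question to a
// temporary file, rewinds it, and reads it back as bytes.
std::vector<unsigned char> demoReadHexBytes()
{
    std::FILE* fp = std::tmpfile();
    if (!fp) return {};                  // no temporary file available
    std::fputs("01000000 d08c9ddf0115d1118c7a00c04", fp);
    std::rewind(fp);
    std::vector<unsigned char> bytes = readHexBytes(fp, 128);
    std::fclose(fp);
    return bytes;
}
```

Because "%2x" reads at most two digits, the trailing lone "4" in the sample is stored as 0x04, and the space in the middle of the line is skipped automatically.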