How to parse an unsigned char array into numerical data

Updated: 2023-10-16

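My setup is as follows:

I have a source sending UDP packets to my receiving computer.
The receiving computer receives the UDP packets into an unsigned char *message.

I can print the packets byte by byte using: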

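// message_len is assumed to be the number of bytes received (e.g. from
// recvfrom()); sizeof(message) would only give the size of the pointer.
for (size_t i = 0; i < message_len; i++) {
    printf("0x%02x\n", message[i]);
}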

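That's where I'm at! Now I'd like to start parsing the bytes I've received from the network into short, int, long and string values.

I've written a series of functions such as: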

short unsignedShortToInt(const unsigned char c[]) {
    // c[0] is the low byte and c[1] the high byte (little-endian input).
    short i = 0;
    i |= c[1] & 0xff;
    i <<= 8;
    i |= c[0] & 0xff;
    return i;
}

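to parse the bytes and convert them into ints, longs and shorts. I can use sprintf() to create strings from the byte array.

My question is: what is the best way to get sub-ranges out of my large UDP packet? The packet is over 100 characters long, so I'd like a simple way to pass message[0:6] or message[20:22] to these conversion utility functions.

Possible options:

I could use strcpy() to create a temporary array for each function call, but that seems a bit messy.
I could turn the entire packet into a string and use std::string::substr. This looks nice, but I'm concerned that converting the unsigned chars to signed chars (part of the string conversion process) might cause some errors (maybe this concern is unfounded?).
Maybe another way?

So I ask you, Stack Overflow, to recommend a clean, concise way to accomplish this task!

Thanks!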

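Why not use proper serialization, e.g. MsgPack?

You'll need a scheme for telling messages apart. For example, you could make them self-describing, like this: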

struct my_message {
  string protocol;
  string data;
};

and dispatch the decoding based on the protocol.
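As a rough illustration of what dispatching on the protocol field might look like, here is a minimal sketch. The `decoder` alias and the `dispatch` function are hypothetical names chosen for this example, and the decoders simply return a string to keep things short; a real decoder would produce whatever type the protocol calls for.

```cpp
#include <functional>
#include <map>
#include <string>

// Self-describing message, mirroring the struct above.
struct my_message {
    std::string protocol;
    std::string data;
};

// Hypothetical decoder signature: takes the raw payload, returns a result.
using decoder = std::function<std::string(const std::string&)>;

// Look up the decoder registered for msg.protocol and run it on the payload.
inline std::string dispatch(const std::map<std::string, decoder>& decoders,
                            const my_message& msg)
{
    auto it = decoders.find(msg.protocol);
    if (it == decoders.end())
        return "unknown protocol: " + msg.protocol;
    return it->second(msg.data);
}
```

Keeping the decoders in a map means a new message type only needs a new entry, and unknown protocols are rejected in one place.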

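You're better off using a tested serialization library than discovering that your system is vulnerable to buffer overflow attacks and crashes.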

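I think you have two problems to solve. First, you need to make sure the integer data is correctly aligned in memory after you extract it from the character buffer. Second, you need to make sure the extracted integer data has the correct byte order.

The alignment problem can be solved by overlaying a union containing the integral data type on a character array of the right size. The network byte order problem can be solved using the standard ntohs() and ntohl() functions. This only works if the sending software also uses the standard byte order produced by the inverses of these functions (htons() and htonl()).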
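For comparison, a value can also be assembled byte by byte with shifts, which avoids both the alignment question and any dependence on the host's byte order. This is only a sketch, assuming the sender writes integers in network (big-endian) order; `read_be16` and `read_be32` are hypothetical helper names, not library functions.

```cpp
#include <cstdint>

// Hypothetical helper: read a 16-bit big-endian (network order) value.
// Works at any address because it only ever reads single bytes.
inline std::uint16_t read_be16(const unsigned char* p)
{
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Hypothetical helper: read a 32-bit big-endian (network order) value.
inline std::uint32_t read_be32(const unsigned char* p)
{
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8)  |
            static_cast<std::uint32_t>(p[3]);
}
```

A field at a known offset could then be read with something like `read_be16(message + 20)`, without copying a sub-array first.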

See: http://www.beej.us/guide/bgnet/output/html/multipage/htonsman.html

Here are a few untested functions you may find useful. I think they should do what you're after.

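#include <netinet/in.h> // ntohs(), ntohl()
#include <cstddef>      // size_t
#include <cstdint>      // uint16_t, uint32_t
#include <cstring>      // std::strlen
#include <string>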
/**
 * General routine to extract aligned integral types
 * from the UDP packet.
 *
 * @param data Pointer into the UDP packet data
 * @param type Integral type to extract
 *
 * @return data pointer advanced to next position after extracted integral.
 */
template<typename Type>
unsigned char const* extract(unsigned char const* data, Type& type)
{
    // This union will ensure the integral data type is correctly aligned
    union tx_t
    {
        unsigned char cdata[sizeof(Type)];
        Type tdata;
    } tx;
    for(size_t i(0); i < sizeof(Type); ++i)
        tx.cdata[i] = data[i];
    type = tx.tdata;
    return data + sizeof(Type);
}
/**
 * If strings are null terminated in the buffer then this could be used to extract them.
 *
 * @param data Pointer into the UDP packet data
 * @param s std::string type to extract
 *
 * @return data pointer advanced to next position after extracted std::string.
 */
unsigned char const* extract(unsigned char const* data, std::string& s)
{
    s.assign((char const*)data, std::strlen((char const*)data));
    return data + s.size() + 1; // also skip the terminating NUL
}
/**
 *  Function to parse entire UDP packet
 *
 * @param data The entire UDP packet data
 */
void read_data(unsigned char const* const data)
{
    uint16_t i1;
    std::string s1;
    uint32_t i2;
    std::string s2;
    unsigned char const* p = data;
    p = extract(p, i1); // p contains next position to read
    i1 = ntohs(i1);
    p = extract(p, s1);
    p = extract(p, i2);
    i2 = ntohl(i2);
    p = extract(p, s2);
}
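The functions above trust the packet to be at least as long as the layout implies. A hedged variant, assuming the received length is known (for example from the return value of recvfrom()), could carry that length along and refuse to read past the end. The `reader` type and its members are hypothetical names for illustration, and it uses std::memcpy rather than a union to obtain an aligned copy.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical bounds-checked cursor over a received packet.
struct reader {
    const unsigned char* p;   // current read position
    std::size_t remaining;    // bytes left in the packet

    // Copy sizeof(T) bytes into an aligned T; false if the packet is too short.
    template<typename T>
    bool extract(T& out)
    {
        static_assert(std::is_trivially_copyable<T>::value, "raw copy only");
        if (remaining < sizeof(T)) return false;
        std::memcpy(&out, p, sizeof(T));
        p += sizeof(T);
        remaining -= sizeof(T);
        return true;
    }

    // Read a NUL-terminated string; false if no terminator lies within bounds.
    bool extract(std::string& s)
    {
        const void* nul = std::memchr(p, 0, remaining);
        if (!nul) return false;
        std::size_t len = static_cast<std::size_t>(
            static_cast<const unsigned char*>(nul) - p);
        s.assign(reinterpret_cast<const char*>(p), len);
        p += len + 1;          // skip the terminator as well
        remaining -= len + 1;
        return true;
    }
};
```

Multi-byte integers read this way are still in the sender's byte order, so they would need ntohs() or ntohl() afterwards, just as in read_data() above.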

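Hope that helps.

EDIT:

I have edited the example to include strings. It depends very much on how the strings are stored in the stream. This example assumes the strings are null-terminated C strings.

EDIT2:

Whoops, changed the code to accept unsigned chars.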

If the array is only about 100 characters long, just create a char buffer[100] and a queue so that you don't miss processing any messages.
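One way the buffer-plus-queue idea might look is sketched below. It assumes each datagram is copied out of the fixed receive buffer into a queue as soon as it arrives, so the buffer can be reused right away; `packet` and `enqueue_packet` are hypothetical names for this example.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// One received datagram, stored as raw bytes.
using packet = std::vector<unsigned char>;

// Hypothetical helper: copy the first len bytes of the receive buffer into
// the queue, leaving the buffer free for the next datagram.
inline void enqueue_packet(std::queue<packet>& pending,
                           const unsigned char* buffer, std::size_t len)
{
    pending.emplace(buffer, buffer + len);
}
```

The parsing code can then take packets from the front of the queue at its own pace.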

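Next, you can index into the buffer as you described; if you know the structure of the message, then you know the index points.

Then you can use a union of the types, such as: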

union myType {
    char buf[4];
    int x;
};

to get an integer value from the chars, if that is what you need.