Extract domain between two words

Keywords: words, between, two, extract. Updated: 2023-10-16

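I have some lines in a log file that look like this:

11-test.domain1.com Logged ...

37-user1.users.domain2.org Logged ...

48-me.server.domain3.net Logged ...

How can I extract each domain without its subdomains? It is the text between "-" and "Logged".

I have the following code in C++ (Linux), but it does not extract well. If you have an example, a function that returns the extracted string would be ideal.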

       regex_t    preg;
       regmatch_t mtch[1];
       size_t     rm, nmatch;
       char tempstr[1024] = "";
       int start;
       rm=regcomp(&preg, "-[^<]+Logged", REG_EXTENDED);
       nmatch = 1;
       while(regexec(&preg, buffer+start, nmatch, mtch, 0)==0) /* Found a match */
               {
                 strncpy(host, buffer+start+mtch[0].rm_so+3, mtch[0].rm_eo-mtch[0].rm_so-7);
                 printf("%s\n", tempstr);
                 start +=mtch[0].rm_eo;
                 memset(host, '\0', strlen(host));
               }
       regfree(&preg);

Thanks!

P.S.: No, I can't use Perl, because this part lives inside a larger C program written by someone else.

Edit:

I replaced the code with the following:

   const char *p1 = strstr(buffer, "-")+1;
   const char *p2 = strstr(p1, " Logged");
   size_t len = p2-p1;
   char *res = (char*)malloc(sizeof(char)*(len+1));
   strncpy(res, p1, len);
   res[len] = '\0';

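This extracts the whole host name nicely, subdomains included. How can I get just domain.com or domain.net out of abc.def.domain.com?

Is strtok a good choice? How can I work out where the last dot is?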
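On the last-dot question, here is a minimal C++ sketch that searches backwards for the last two dots. It assumes the part you want is always the final two labels, so it would get a host under `co.uk` wrong. The name `base_domain` is made up for illustration and does not come from the code above.

```cpp
#include <string>

// Hypothetical helper (not from the original post): returns the last two
// dot-separated labels of `host`, e.g. "domain.com" for "abc.def.domain.com".
// Assumption: the wanted domain is always exactly two labels long.
std::string base_domain(const std::string& host)
{
    // Position of the last dot, i.e. the one in front of "com".
    const std::string::size_type last = host.rfind('.');
    if (last == std::string::npos || last == 0)
        return host;                  // no usable dot: nothing to strip

    // Search backwards again, starting just before the last dot.
    const std::string::size_type prev = host.rfind('.', last - 1);
    if (prev == std::string::npos)
        return host;                  // only one dot: already "domain.com"

    return host.substr(prev + 1);     // text after the second-to-last dot
}
```

In plain C, `strrchr` finds the last dot in the same way. `strtok` would also work, but it writes into the buffer it tokenizes, so it needs a scratch copy of the string.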

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
#include <boost/regex.hpp>

int main()
{
    // Backslashes are doubled inside a C++ string literal. The lazy ".+?"
    // keeps the whitespace before "Logged" out of the named capture.
    boost::regex re(".+-(?<domain>.+?)\\s*Logged");
    std::string examples[] =
    {
        "11-test.domain1.com Logged ...",
        "37-user1.users.domain2.org Logged ..."
    };
    std::vector<std::string> vec(examples, examples + sizeof(examples) / sizeof(*examples));
    std::for_each(vec.begin(), vec.end(), [&re](const std::string& s)
    {
        boost::smatch match;
        if (boost::regex_search(s, match, re))
        {
            // Prints the full host name found between '-' and "Logged".
            std::cout << match["domain"] << std::endl;
        }
    });
}

http://liveworkspace.org/code/1983494e6e9e884b7e539690ebf98eb5 Something like this, using boost::regex. I don't know about PCRE.
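If Boost is not available, the same idea can probably be done with `std::regex` from C++11. The sketch below is a variation, not the code from the link: it skips the subdomain labels inside the pattern, so the capture holds only the last two labels. The function name `domain_from_line` is hypothetical, and the pattern assumes the wanted domain is always two labels long.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the last two labels of the host found between
// '-' and "Logged", or an empty string if the line does not match.
std::string domain_from_line(const std::string& line)
{
    // -                      the dash after the leading number
    // (?:[^.\s]+\.)*         any number of subdomain labels, not captured
    // ([^.\s]+\.[^.\s]+)     the last two labels, captured as group 1
    // \s*Logged              optional whitespace, then the closing word
    static const std::regex re(R"(-(?:[^.\s]+\.)*([^.\s]+\.[^.\s]+)\s*Logged)");

    std::smatch m;
    if (std::regex_search(line, m, re))
        return m[1].str();
    return std::string();
}
```

Calling `domain_from_line("37-user1.users.domain2.org Logged ...")` should give `domain2.org`.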

Is the format standard? It looks like it is. Is there a split function available?

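Edit: Here is some logic. Loop over each line you want to parse. Use a find function to locate the index of the first string, "-". Next, find the index of the second string, "Logged", and subtract the first index from it to get the length. Now you have the full host name.

Once you have the full host name, split it on "." into an object of your choice (I used an array). With the array in hand, pick the indexes of the values to reassemble (join) so that only the domain is captured.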
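Since the question is about C++, the steps above could be sketched like this. It is only an illustration: `split` and `domain_between` are made-up names, and the code assumes the wanted domain is always the last two labels.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split `s` on `sep` into its pieces.
std::vector<std::string> split(const std::string& s, char sep)
{
    std::vector<std::string> parts;
    std::string::size_type begin = 0;
    for (;;) {
        const std::string::size_type end = s.find(sep, begin);
        if (end == std::string::npos) {
            parts.push_back(s.substr(begin));   // last piece
            return parts;
        }
        parts.push_back(s.substr(begin, end - begin));
        begin = end + 1;
    }
}

// Hypothetical helper: take the text between `first` and `second`, split it
// on '.', and rejoin the last two labels. Returns "" if a marker is missing.
std::string domain_between(const std::string& line,
                           const std::string& first,
                           const std::string& second)
{
    const std::string::size_type start = line.find(first);
    if (start == std::string::npos) return std::string();
    const std::string::size_type from = start + first.size();
    const std::string::size_type to = line.find(second, from);
    if (to == std::string::npos) return std::string();

    // Full host name, with the whitespace before `second` trimmed off.
    std::string host = line.substr(from, to - from);
    const std::string::size_type last = host.find_last_not_of(" \t");
    host.erase(last == std::string::npos ? 0 : last + 1);

    const std::vector<std::string> labels = split(host, '.');
    if (labels.size() <= 2) return host;        // nothing to strip
    return labels[labels.size() - 2] + "." + labels[labels.size() - 1];
}
```

For example, `domain_between("48-me.server.domain3.net Logged ...", "-", "Logged")` should return `domain3.net`.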

Note: written in C#.

Main method, which defines the first and second values:

    static void Main(string[] args)
    {
        string firstValue = "-";
        string secondValue = "Logged";
        List<string> domains = new List<string>
        {
            "11-test.domain1.com Logged",
            "37-user1.users.domain2.org Logged",
            "48-me.server.domain3.net Logged"
        };
        foreach (string dns in domains)
        {
            Debug.WriteLine(Utility.GetStringBetweenFirstAndSecond(dns, firstValue, secondValue));
        }
    }

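Method that parses the string:

    // Declared static because Main calls it as Utility.GetStringBetweenFirstAndSecond.
    public static string GetStringBetweenFirstAndSecond(string str, string firstStringToFind, string secondStringToFind)
    {
        string domain = string.Empty;
        if (string.IsNullOrEmpty(str))
        {
            // throw an exception, return gracefully, whatever you decide
        }
        else
        {
            // This can all be done in one line, but I broke it apart so it can be better understood.
            // Returns the first occurrence.
            // int start = str.IndexOf(firstStringToFind) + 1;
            // int end = str.IndexOf(secondStringToFind);
            // domain = str.Substring(start, end - start);
            // i.e. definitely less clear, but doesn't create unnecessary objects
            domain = str.Substring((str.IndexOf(firstStringToFind) + 1), str.IndexOf(secondStringToFind) - (str.IndexOf(firstStringToFind) + 1));
            // Trim the space left in front of "Logged" so the last label is clean.
            string[] dArray = domain.Trim().Split('.');
            if (dArray.Length > 0)
            {
                if (dArray.Length > 2)
                {
                    domain = string.Format("{0}.{1}", dArray[dArray.Length - 2], dArray[dArray.Length - 1]);
                }
            }
        }
        return domain;
    }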