Using a custom comparator with std::set

Keywords: set, custom comparator, std. Last updated: 2023-10-16

I'm trying to build a list of words read from a file, ordered by length. To do that, I'm trying to use a std::set with a custom comparator.

class Longer {
 public:
  bool operator() (const string& a, const string& b)
    { return a.size() > b.size();}
};
set<string, Longer> make_dictionary (const string& ifile){
  // produces a set of the words in 'ifile', ordered by length
  ifstream ifs {ifile};
  if (!ifs) throw runtime_error ("couldn't open file for reading");
  string word;
  set<string, Longer> words;
  while (ifs >> word){
    strip(word);
    tolower(word);
    words.insert(word);
  }
  remove_plurals(words);
  if (ifs.eof()){
    return words;
  }
  else
    throw runtime_error ("input failed");
}

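From this, I expected a list of all the words in the file, ordered by length. Instead, I got a very short list containing exactly one word for each length that occurs in the input: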

polynomially-decidable
complexity-theoretic
linearly-decidable
lexicographically
alternating-time
finite-variable
newenvironment
documentclass
binoppenalty
investigate
usepackage
corollary
latexsym
article
remark
logic
12pt
box
on
a

Any idea what's going on here?

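With your comparator, words of equal length are equivalent, and a set cannot hold duplicate equivalent entries.

To keep multiple words of the same length, modify the comparator so that it also performs a lexicographic comparison when the lengths are equal.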
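
A minimal sketch of such a comparator, assuming the goal is longest-first order with alphabetical order among words of equal length. The names LongerThenAlphabetical and sorted_words are hypothetical, chosen only for illustration:

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical comparator: longer strings first; strings of equal length
// fall back to ordinary lexicographic order, so distinct words of the
// same length are no longer treated as equivalent.
struct LongerThenAlphabetical {
    bool operator()(const std::string& a, const std::string& b) const {
        if (a.size() != b.size())
            return a.size() > b.size();  // primary criterion: length, descending
        return a < b;                    // tie-break: contents, ascending
    }
};

// Illustrative helper: inserts the words into a set and returns them
// in the order the set stores them.
std::vector<std::string> sorted_words(const std::vector<std::string>& input) {
    std::set<std::string, LongerThenAlphabetical> words(input.begin(), input.end());
    return std::vector<std::string>(words.begin(), words.end());
}
```

Called with "logic", "box", "latex", "article" and a second "box", this helper should keep both five-letter words and drop only the true duplicate.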

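Your comparator compares only by length, which means that strings of the same size but different contents are treated as equivalent by std::set. (std::set treats a and b as equivalent if neither a < b nor b < a is true, where < is your custom comparator function.)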
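
The equivalence rule can be checked directly. The sketch below reuses the length-only comparator from the question (with const added to operator()); the helpers equivalent and kept_after_inserting are hypothetical names introduced for illustration:

```cpp
#include <cstddef>
#include <set>
#include <string>

// The length-only comparator from the question (const added so it can be
// called on a const comparator object).
struct Longer {
    bool operator()(const std::string& a, const std::string& b) const {
        return a.size() > b.size();
    }
};

// Two keys are equivalent when neither is ordered before the other.
bool equivalent(const std::string& a, const std::string& b) {
    Longer comp;
    return !comp(a, b) && !comp(b, a);
}

// Illustrative helper: inserts two words and reports how many the set kept.
std::size_t kept_after_inserting(const std::string& first,
                                 const std::string& second) {
    std::set<std::string, Longer> words;
    words.insert(first);
    words.insert(second);  // rejected if equivalent to an existing element
    return words.size();
}
```

Under this comparator "logic" and "latex" are equivalent, so the second insert is a no-op, which matches the one-word-per-length output in the question.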

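This means your comparator should also take the string contents into account to avoid this situation. The keyword here is lexicographical comparison, which means applying several comparison criteria in order. The first criterion is the string length, and the second is the string itself. A simple way to write a lexicographical comparison is to use std::tuple, whose overloaded operator< compares the components lexicographically.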

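To make the "reversed" length ordering you wrote with operator> fit the usual operator< form, take the negated size: first rewrite a.size() > b.size() as -a.size() < -b.size() (with the sizes converted to a signed type, since size() is unsigned), then bundle that value with the string itself into a tuple, and finally compare the tuples with <:

class Longer {
public:
    bool operator() (const string& a, const string& b) const
    {
        // size() is unsigned: cast to a signed type before negating
        const auto la = -static_cast<std::ptrdiff_t>(a.size());
        const auto lb = -static_cast<std::ptrdiff_t>(b.size());
        return std::make_tuple(la, a)
             < std::make_tuple(lb, b);
        //                     ^^  ^
        //                  first  second
        //              criterion  criterion
    }
};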
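
One detail worth checking when negating sizes: std::string::size() returns an unsigned type, so negating it directly wraps around instead of going negative. For non-empty strings the wrapped values still sort longest-first, but an empty string maps to 0 and ends up first. The sketch below compares the two variants; NegatedUnsigned, NegatedSigned and first_word are hypothetical names used only for illustration:

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <tuple>

// Negates the unsigned size directly; 0 - size() is the same value as
// -size() in unsigned arithmetic (it wraps around modulo 2^N).
struct NegatedUnsigned {
    bool operator()(const std::string& a, const std::string& b) const {
        return std::make_tuple(std::size_t{0} - a.size(), a)
             < std::make_tuple(std::size_t{0} - b.size(), b);
    }
};

// Converts to a signed type first, so longer strings get smaller keys
// and the empty string gets the largest key (0).
struct NegatedSigned {
    bool operator()(const std::string& a, const std::string& b) const {
        const auto la = -static_cast<std::ptrdiff_t>(a.size());
        const auto lb = -static_cast<std::ptrdiff_t>(b.size());
        return std::make_tuple(la, a) < std::make_tuple(lb, b);
    }
};

// Illustrative helper: builds a set of "box", "" and "a" with the given
// comparator and returns whichever element the set orders first.
template <class Compare>
std::string first_word() {
    std::set<std::string, Compare> words;
    words.insert("box");
    words.insert("");
    words.insert("a");
    return *words.begin();
}
```

With the unsigned variant the empty string comes first; with the signed variant "box" does. This only matters if empty strings can reach the set, for example after a stripping step removes every character of a word.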