Custom support for __attribute__((format))

Keywords: format attribute, custom support. Updated: 2023-10-16

Both GCC and Clang support compile-time checking of variadic functions such as printf. These compilers accept syntax such as:

extern void dprintf(int dlevel, const char *format, ...)
  __attribute__((format(printf, 2, 3)));  /* 2=format 3=params */

On OS X, the Cocoa framework also uses an extension of this for NSString:

#define NS_FORMAT_FUNCTION(F,A) __attribute__((format(__NSString__, F, A)))

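At our company, we have a custom C++ framework with a bunch of classes, such as BaseString, all derived from BaseObject. BaseString has a few variadic methods similar to sprintf, but with some extensions. For example, "%S" expects an argument of type BaseString*, and "%@" expects an argument of type BaseObject*.

I would like to check the arguments at compile time in our projects, but because of the extensions, __attribute__((format(printf))) produces many false-positive warnings.

Is there a way to customize the support for __attribute__((format)) in one of the two compilers? If this requires patching the compiler source, can it be done in a reasonable amount of time?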

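With a recent version of GCC (I suggest 4.7 or later, though you could try GCC 4.6), you can add your own variable and function attributes through a GCC plugin (using the PLUGIN_ATTRIBUTES hook) or a MELT extension. MELT is a domain-specific language for extending GCC (implemented as a [meta-]plugin).

If you use a plugin (e.g. MELT), you don't need to recompile GCC's source code. But you do need a plugin-enabled GCC (check with gcc -v).

In 2020, MELT is no longer maintained (for lack of funding); however, you can write your own GCC plugin for GCC 10, in C++, to do such checks.

Note: some Linux distributions do not enable plugins in their gcc; please complain to your distribution vendor. Others provide a package for GCC plugin development, e.g. gcc-4.7-plugin-dev for Debian or Ubuntu.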

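It is doable, but certainly not easy; part of the problem is that BaseString and BaseObject are user-defined types, so the format specifiers have to be defined dynamically. Fortunately gcc at least supports this, but it would still require patching the compiler.

The magic is in the handle_format_attribute function in gcc/c-family/c-format.c, which calls initialization functions for format specifiers that refer to user-defined types. A good example to base your support on is the gcc_gfc format type, because it defines a format specifier %L for locus *: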

/* This will require a "locus" at runtime.  */
{ "L",   0, STD_C89, { T89_V,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "", "R", NULL },

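Obviously, though, you would want to base your format_char_info array on print_char_table, since that defines the standard printf specifiers; gcc_gfc is substantially cut down in comparison.

The patch that added gcc_gfc is http://gcc.gnu.org/ml/fortran/2005-07/msg00018.html; it should be fairly obvious from that patch how and where you would need to make your additions.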

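A year and a half after asking this question, I came up with a completely different approach to the real problem: is there any way to statically check the types of custom variadic formatting statements?

For completeness, and because it may help others, here is the solution I finally implemented. It has two advantages over the original question:

Relatively simple: implemented in less than a day;
Compiler independent: can check C++ code on any platform (Windows, Android, OSX, ...).

A Perl script parses the source code, finds the formatting strings and decodes the percent modifiers inside them. It then wraps every argument in a call to a template identity function, CheckFormat<>. Example: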

str->appendFormat("%hhu items (%.2f %%) from %S processed", 
    nbItems, 
    nbItems * 100. / totalItems, 
    subject);

becomes:

str->appendFormat("%hhu items (%.2f %%) from %S processed", 
    CheckFormat<CFL::u, CFM::hh>(nbItems  ), 
    CheckFormat<CFL::f, CFM::_>(nbItems * 100. / totalItems  ), 
    CheckFormat<CFL::S, CFM::_, const BaseString*>(subject  ));

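The enumerations CFL and CFM and the template function CheckFormat must be defined in a common header file like this (this is an extract; there are around 24 overloads).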

enum class CFL
{
    c, d, i=d, star=i, u, o=u, x=u, X=u, f, F=f, e=f, E=f, g=f, G=f, p, s, S, P=S, at
};
enum class CFM
{
    hh, h, l, z, ll, L=ll, _
};
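// Generic case: deliberately ill-formed when instantiated, so any (letter, modifier, type)
// combination without a specialization below fails to compile.
template<CFL letter, CFM modifier, typename T> inline T CheckFormat(T value) { CFL test= value; (void)test; return value; }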
template<> inline const BaseString* CheckFormat<CFL::S, CFM::_, const BaseString*>(const BaseString* value) { return value; }
template<> inline const BaseObject* CheckFormat<CFL::at, CFM::_, const BaseObject*>(const BaseObject* value) { return value; }
template<> inline const char* CheckFormat<CFL::s, CFM::_, const char*>(const char* value) { return value; }
template<> inline const void* CheckFormat<CFL::p, CFM::_, const void*>(const void* value) { return value; }
template<> inline char CheckFormat<CFL::c, CFM::_, char>(char value) { return value; }
template<> inline double CheckFormat<CFL::f, CFM::_, double>(double value) { return value; }
template<> inline float CheckFormat<CFL::f, CFM::_, float>(float value) { return value; }
template<> inline int CheckFormat<CFL::d, CFM::_, int>(int value) { return value; }
...

After a compilation error, it is easy to recover the original form by replacing the regular expression CheckFormat<[^<]*>\((.*?) \) with its capture.

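In C++11, this can be solved by replacing __attribute__((format)) with a clever combination of constexpr, decltype, and variadic parameter packs. Pass the format string to a constexpr function that extracts all the % specifiers at compile time, then verify that the n-th specifier matches the decltype of the (n+1)-st argument.

Here is a sketch of the solution...

If you have: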

int x = 3;
Foo foo;
my_printf("%d %Qn", x, foo);

You will need a variadic macro wrapper for my_printf, to get something like this:

#define my_printf(fmt, ...) \
{ \
    static_assert(FmtValidator<decltype(makeTypeHolder(__VA_ARGS__))>::check(fmt), \
        "one or more format specifiers do not match their arguments"); \
    my_printf_impl(fmt, ## __VA_ARGS__); \
}

You need to write FmtValidator and makeTypeHolder().

makeTypeHolder looks like this:

    template<typename... Ts> struct TypeHolder {};
    template<typename... Ts>
    TypeHolder<Ts...> makeTypeHolder(const Ts&... args)
    {
        return TypeHolder<Ts...>();
    }

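Its purpose is to create a type that is uniquely determined by the types of the arguments passed to my_printf(). FmtValidator then needs to verify that those types are consistent with the % specifiers found in fmt.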

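Next, FmtValidator<T>::check() needs to be written to extract the % specifiers at compile time (i.e. as a constexpr function). This requires some compile-time recursion, like this: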

    template<typename... Ts>
    struct FmtValidator;
    // recursion base case: no arguments left, so only "%%" may still appear in fmt
    template<>
    struct FmtValidator<TypeHolder<>>
    {
        static constexpr bool check(const char* fmt)
        {
            return *fmt == '\0' ? true :
                    *fmt != '%' ? check(fmt + 1) :
                    fmt[1] == '%' ? check(fmt + 2) : false;
        }
    };
    // recursion
    template<typename T, typename... Ts>
    struct FmtValidator<TypeHolder<T, Ts...>>
    {
        static constexpr bool check(const char* fmt)
        {
            // find the first % specifier in fmt, validate it against T,
            // and then recursively dispatch with Ts... and the remainder of fmt
            ...
        }
    };

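To validate a single type against a single % specifier, you can do something like this:

    // primary template; specialize it for each supported argument type
    template<typename T>
    struct specmatch;
    // strmatches is assumed to be a constexpr helper you write yourself,
    // comparing the specifier text [c, cend) with the given string
    template<>
    struct specmatch<int>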
    {
        static constexpr bool match(const char* c, const char* cend)
        {
            return strmatches(c, cend, "d") ||
                    strmatches(c, cend, "i");
        }
    };
    // add other specmatch specializations for float, const char*, etc.

You are then free to write your own validators for your own custom types.
