python + libclang; iterating back and forth: binding field comments to the field

Keywords: field, comment, binding, iteration, python+libclang      Updated: 2023-10-16

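It took me a while to find a suitable way to bind the fields of a C++ struct to their comments using libclang 3.9.1 and python 3.5.2.

So far I have this setup up and running. Say I have the file Foo.h: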

typedef int arbType;
struct Foo {
    //First bar comment
    //Second bar comment
    int Bar; //Third bar comment - after bar
    /* First line baz comment - before baz
       Second line baz comment - before baz
    */
    arbType Baz; //Third line baz comment - after baz
};

My python code extracts only the inline comments:

#bind_comments.py
import clang.cindex

def get_cur_comments(cursor):
    comment = ''
    print ('\nGetting comment for:', cursor.spelling.decode())
    parent_cur = cursor.lexical_parent
    token_iter = parent_cur.get_tokens()
    for token in token_iter:
        if token.cursor == cursor:
            # Skip ahead to the first punctuation token (the ';' for these simple fields)
            while token.kind.name != 'PUNCTUATION':
                token = next(token_iter)
            # The token right after it is the inline comment, if there is one
            token = next(token_iter)
            if token.kind.name == 'COMMENT':
                comment = token.spelling.decode().strip('/')
    return comment

def main():
    index = clang.cindex.Index.create()
    tu = index.parse(b'Foo.h', [b'-x', b'c++'])
    tu_iter = tu.cursor.get_children()
    next(tu_iter)                # skip the typedef
    root_cursor = next(tu_iter)  # struct Foo
    for cur in root_cursor.type.get_fields():
        print(get_cur_comments(cur))

if __name__ == '__main__':
    main()

Output:

C:\>bind_comments.py
Getting comment for: Bar
'Third bar comment - after bar'
Getting comment for: Baz
'Third line baz comment - after baz'

Now, my questions, in descending order of importance:

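  1. How do I bind the comments that come before a field? I looked at many "peek" solutions in python, hoping to find out, while iterating over the tokens, whether the next one is the cursor (field) I am interested in, but I found nothing that I could implement properly in my case. Just to show you how serious I am, here are some of the solutions I looked into:

    • SO Q: How to look ahead one element (peek) in a Python generator
    • Code recipe: Look ahead one item during iteration
    • Just another code recipe: Lookahead iterator
  2. Conceptual flaw: I don't know yet how to tell the difference between: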

    struct Foo {
        int Bar; // This comment belongs to bar
        // As well as this one
        // While this comment already belongs to baz
        int Baz;
    };
    
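  3. Performance issue: note that for each field I iterate over the entire token list of its struct. If the struct is big and I have many of them, I guess this will cost me. I would like to find some shortcut. I thought about saving the tokens in a global list, but what if a field is itself a declaration of another struct/class? Add its parent's tokens to the list? This is starting to get messy.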
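One possible way to approach all three questions at once is a single pass that assigns each comment by source line. The sketch below is only an illustration under stated assumptions: `Tok` and `bind_comments` are hypothetical names, plain tuples stand in for libclang tokens so the logic can run without clang installed, and each field is assumed to be declared on a single line. With real tokens the same values would come from `token.kind.name`, `token.spelling` and `token.extent.start.line`, and the field lines from `cursor.location.line`.

```python
from collections import namedtuple

# Hypothetical stand-in for a libclang token: kind name, text, starting line.
Tok = namedtuple("Tok", "kind spelling line")

def bind_comments(tokens, field_lines):
    """Attach comments to fields in one pass over the struct's tokens.

    tokens      -- all tokens of the struct, in source order
    field_lines -- {field name: line on which the field is declared}
    """
    result = {name: {"before": [], "after": []} for name in field_lines}
    by_line = {line: name for name, line in field_lines.items()}
    pending = []           # comments seen since the last code token
    last_code_line = None  # line of the most recent non-comment token

    for tok in tokens:
        if tok.kind != "COMMENT":
            name = by_line.get(tok.line)
            if name is not None:
                # Comments collected above this line lead the field.
                result[name]["before"].extend(pending)
            pending = []
            last_code_line = tok.line
        elif tok.line == last_code_line and tok.line in by_line:
            # Comment follows code on the field's own line: trailing comment.
            result[by_line[tok.line]]["after"].append(tok.spelling)
        else:
            pending.append(tok.spelling)
    return result

# Tokens laid out like struct Foo in Foo.h (lines 2 to 10 of that file).
tokens = [
    Tok("KEYWORD", "struct", 2), Tok("IDENTIFIER", "Foo", 2), Tok("PUNCTUATION", "{", 2),
    Tok("COMMENT", "//First bar comment", 3),
    Tok("COMMENT", "//Second bar comment", 4),
    Tok("KEYWORD", "int", 5), Tok("IDENTIFIER", "Bar", 5), Tok("PUNCTUATION", ";", 5),
    Tok("COMMENT", "//Third bar comment - after bar", 5),
    Tok("COMMENT", "/* baz block comment */", 6),
    Tok("IDENTIFIER", "arbType", 9), Tok("IDENTIFIER", "Baz", 9), Tok("PUNCTUATION", ";", 9),
    Tok("COMMENT", "//Third line baz comment - after baz", 9),
    Tok("PUNCTUATION", "}", 10),
]
bound = bind_comments(tokens, {"Bar": 5, "Baz": 9})
```

Note that this sketch sidesteps the conceptual flaw rather than solving it: every comment that sits on a line of its own is given to the next field, so "As well as this one" in the example above would go to Baz. Telling the two apart needs a convention of your own, for instance a blank line between the groups.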

Just a helper for those who don't know libclang yet:

>>> print(root_cursor.spelling.decode())
Foo
>>> root_cursor.type.get_fields()
<list_iterator object at 0x0177B770>
>>> list(root_cursor.type.get_fields())
[<clang.cindex.Cursor object at 0x0173B940>, <clang.cindex.Cursor object at 0x017443A0>]
>>> for cur in root_cursor.type.get_fields():
...   print (cur.spelling.decode())
...
Bar
Baz
>>> root_cursor.get_tokens()
<generator object TokenGroup.get_tokens at 0x01771180>

libclang directly supports extracting javadoc-style comments through the Cursor properties brief_comment and raw_comment.

With a little tweaking of your input code:

import clang.cindex
from clang.cindex import CursorKind

# The header from the question, with its comments rewritten as doc comments
s = '''
typedef int arbType;
struct Foo {
/// Brief comment about bar
///
/// Extra Text about bar
int Bar;
/** Brief comment about baz
*
* Extra Text about baz
*/
arbType Baz;
/// Brief only comment
int blah;
};
'''

idx = clang.cindex.Index.create()
tu = idx.parse('tmp.cpp', args=['-std=c++11'], unsaved_files=[('tmp.cpp', s)], options=0)
for c in tu.cursor.walk_preorder():
    if c.kind == CursorKind.FIELD_DECL:
        print(c.brief_comment)
        print(c.raw_comment)
        print()

Produces:

Brief comment about bar
/// Brief comment about bar
///
/// Extra Text about bar
Brief comment about baz
/** Brief comment about baz
*
* Extra Text about baz
*/
Brief only comment
/// Brief only comment
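
As the output shows, raw_comment keeps the comment markers. If only the text is wanted, something along these lines could strip them. This is a rough sketch: `clean_raw_comment` is a hypothetical helper, not a libclang function, and it only handles the `///`, `//!`, `/** */` and `/*! */` styles.

```python
import re

def clean_raw_comment(raw):
    """Strip comment markers from a raw doc comment, keeping the text lines."""
    text = raw.strip()
    if text.startswith("/*"):
        # Block comment: drop the opening /** (or /*!, /*) and the closing */
        text = re.sub(r"^/\*+!?", "", text)
        text = re.sub(r"\*+/$", "", text)
        # Then drop the optional leading '*' on each line
        lines = [re.sub(r"^\s*\*?\s?", "", ln) for ln in text.splitlines()]
    else:
        # Line comments: drop the leading //, /// or //! on each line
        lines = [re.sub(r"^\s*//+!?\s?", "", ln) for ln in text.splitlines()]
    return "\n".join(ln.rstrip() for ln in lines).strip()

# The raw comments printed above, fed back in as plain strings.
bar_text = clean_raw_comment("/// Brief comment about bar\n///\n/// Extra Text about bar")
baz_text = clean_raw_comment("/** Brief comment about baz\n*\n* Extra Text about baz\n*/")
```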