LLVM: How to keep track of the data types of a Value* at runtime for an untyped language?

Keywords: typed language, Value, data type, type tracking, runtime, LLVM. Last updated: 2023-10-16

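I am implementing an untyped programming language and use LLVM to generate the backend code. To keep track of the current type of a particular variable, I use a struct, StructTy_struct_datatype_t, defined as: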

PointerTy_8 = PointerType::get(IntegerType::get(TheContext, 8), 0);
StructTy_struct_datatype_t = StructType::create(TheContext, "struct.datatype_t");
std::vector<Type *> StructTy_struct_datatype_t_fields;
StructTy_struct_datatype_t_fields.push_back(IntegerType::get(TheContext, 32)); // i32 type tag
StructTy_struct_datatype_t_fields.push_back(PointerTy_8);                      // i8* pointer to the value
// Without a body the struct stays opaque; the dumps further down show { i32, i8* }
StructTy_struct_datatype_t->setBody(StructTy_struct_datatype_t_fields);
// which represents the struct
typedef struct datatype_t {
    int type; // holds an integer that tells me the type (1 = int, 2 = float, ...)
    void* v;  // holds a pointer to the actual value
} datatype_t;
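To make the role of the two fields concrete, here is a minimal plain C++ sketch of how such a tagged value might be filled in and read back. The names Tag, make_int, make_float, get_int and get_float are hypothetical, and the 64-bit integer and double payloads are assumptions; the question only fixes the tag values 1 = int and 2 = float.

```cpp
#include <cstdint>

// Hypothetical tag values, following "1 = int, 2 = float" from the struct comment.
enum Tag : int32_t { TAG_INT = 1, TAG_FLOAT = 2 };

// Same layout as the question's datatype_t ({ i32, i8* } in IR).
struct datatype_t {
    int32_t type;  // tag that says how to interpret v
    void* v;       // pointer to the actual value
};

// Wrap caller-owned storage; the tag records what v points to.
datatype_t make_int(int64_t* storage) { return {TAG_INT, storage}; }
datatype_t make_float(double* storage) { return {TAG_FLOAT, storage}; }

// Checked accessors: they only dereference v when the tag matches.
bool get_int(const datatype_t& d, int64_t& out) {
    if (d.type != TAG_INT) return false;
    out = *static_cast<const int64_t*>(d.v);
    return true;
}

bool get_float(const datatype_t& d, double& out) {
    if (d.type != TAG_FLOAT) return false;
    out = *static_cast<const double*>(d.v);
    return true;
}
```

The point of the sketch is that v is meaningless on its own: every read has to look at type first, and that check can only happen when the program runs.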

Then, suppose I have a function like this:

def function_add(a, b) {
    return a + b;
}

I want this function to accept:

function_add(1, 1);           // returns 2 (int)
function_add(1.0, 1.0);       // returns 2.0 (float)
function_add("str1", "str2"); // returns "str1str2" (string)

The code that handles binary operations, i.e. a + b, is as follows:

Value* L = lhs_codegen_elements.back();
Value* R = rhs_codegen_elements.back();
if (!L || !R) {
    logError("L or R are undefined");
    return codegen;
}
AllocaInst* lptr_datatype = (AllocaInst*)((LoadInst*)L)->getPointerOperand();
AllocaInst* rptr_datatype = (AllocaInst*)((LoadInst*)R)->getPointerOperand();
ConstantInt* const_int32_0 = ConstantInt::get(TheContext, APInt(32, StringRef("0"), 10));
ConstantInt* const_int32_1 = ConstantInt::get(TheContext, APInt(32, StringRef("1"), 10));
GetElementPtrInst* lptr_type =
    GetElementPtrInst::Create(StructTy_struct_datatype_t, lptr_datatype, {const_int32_0, const_int32_0}, "type");
GetElementPtrInst* rptr_type =
    GetElementPtrInst::Create(StructTy_struct_datatype_t, rptr_datatype, {const_int32_0, const_int32_0}, "type");
GetElementPtrInst* lptr_v =
    GetElementPtrInst::Create(StructTy_struct_datatype_t, lptr_datatype, {const_int32_0, const_int32_1}, "v");
GetElementPtrInst* rptr_v =
    GetElementPtrInst::Create(StructTy_struct_datatype_t, rptr_datatype, {const_int32_0, const_int32_1}, "v");
LoadInst* lload_inst_type = load_inst_codegen(TYPE_INT, lptr_type);
LoadInst* rload_inst_type = load_inst_codegen(TYPE_INT, rptr_type);
LoadInst* lload_inst_v = load_inst_codegen(TYPE_VOID_POINTER, lptr_v);
LoadInst* rload_inst_v = load_inst_codegen(TYPE_VOID_POINTER, rptr_v);
CmpInst* cond1 =
    new ICmpInst(ICmpInst::ICMP_EQ, lload_inst_type, ConstantInt::get(TheContext, APInt(32, TYPE_DOUBLE)));
// Assuming bb is the current BasicBlock*, its parent is the enclosing Function
// (dyn_cast<Function>(bb) would always yield nullptr).
Function* function_bb = bb->getParent();
BasicBlock* label_if_then_double = BasicBlock::Create(TheContext, "if.then.double", function_bb);
BasicBlock* label_if_then_long = BasicBlock::Create(TheContext, "if.then.long", function_bb);
// label_if_else was never declared; the long block is assumed to be the false target.
BranchInst* branch_inst = BranchInst::Create(label_if_then_double, label_if_then_long, cond1, bb);
L->dump(); // %load_inst = load %struct.datatype_t, %struct.datatype_t* %alloca_datatype_v, align 8
R->dump(); // %load_inst = load %struct.datatype_t, %struct.datatype_t* %alloca_datatype_v1, align 8
L->getType()->dump(); // %struct.datatype_t = type { i32, i8* }
R->getType()->dump(); // %struct.datatype_t = type { i32, i8* }
lload_inst_type->dump(); //   %load_inst = load i32, i32* %type, align 4
rload_inst_type->dump(); //   %load_inst = load i32, i32* %type, align 4
lload_inst_v->dump(); //   %load_inst = load i8*, i8** %v, align 8
rload_inst_v->dump(); //   %load_inst = load i8*, i8** %v, align 8
if (op == '+') {
    // issue: how to take the decision without knowing the type lload_inst_v holds
    BinaryOperator::Create(Instruction::FAdd, lload_inst_v, rload_inst_v, "add", label_if_then_double);
    // or
    BinaryOperator::Create(Instruction::Add, lload_inst_v, rload_inst_v, "add", label_if_then_long);
}

So the problem is that I need to know which types lload_inst_type and rload_inst_type hold, so that I can switch the LLVM API call between BinaryOperator::Create(Instruction::FAdd, ...) for floats and BinaryOperator::Create(Instruction::Add, ...) for ints, for example.

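However, I have just realized that I cannot find out the values held by the AllocaInst and LoadInst while generating the backend code (at least I don't know how to).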

How can I keep track of the data type of a Value* at runtime?
Have I chosen the wrong strategy for implementing an untyped language?

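If your source language is untyped, you have to hide that from LLVM, because LLVM IR is typed. You have to design a way to keep track of types at runtime, perhaps some kind of tagged object system built on an enumeration of type tags. Function calls then have to examine the types passed in at runtime and select the appropriate function to call.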

LLVM does not provide any of this functionality; it has to be the responsibility of your language runtime's type system.
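As a rough illustration of such a tagged object system, the sketch below shows one way a runtime helper for + could inspect the tags and pick the operation. It is plain C++ with no LLVM dependency, and everything in it is an assumption made for illustration: the names Tag, box_int, box_float, box_string and rt_add, the extra TAG_ERROR and TAG_STRING values, the 64-bit integer payload, and the decision to reject mixed operand types.

```cpp
#include <cstdint>
#include <string>

// Hypothetical tags: the question only fixes 1 = int and 2 = float.
enum Tag : int32_t { TAG_ERROR = 0, TAG_INT = 1, TAG_FLOAT = 2, TAG_STRING = 3 };

// Same layout as the question's datatype_t ({ i32, i8* } in IR).
struct datatype_t {
    int32_t type;
    void* v;
};

// Hypothetical boxing helpers. They never free the payload, to keep the sketch
// short; a real runtime needs garbage collection or reference counting.
datatype_t box_int(int64_t x) { return {TAG_INT, new int64_t(x)}; }
datatype_t box_float(double x) { return {TAG_FLOAT, new double(x)}; }
datatype_t box_string(const std::string& s) { return {TAG_STRING, new std::string(s)}; }

// Runtime entry point that generated code could call for every `a + b`.
// extern "C" keeps the symbol name unmangled so it is easy to declare in IR.
extern "C" void rt_add(const datatype_t* a, const datatype_t* b, datatype_t* out) {
    datatype_t result{TAG_ERROR, nullptr};  // stays an error for mixed or unknown tags
    if (a->type == b->type) {
        switch (a->type) {
        case TAG_INT:     // integer addition
            result = box_int(*static_cast<int64_t*>(a->v) + *static_cast<int64_t*>(b->v));
            break;
        case TAG_FLOAT:   // floating-point addition
            result = box_float(*static_cast<double*>(a->v) + *static_cast<double*>(b->v));
            break;
        case TAG_STRING:  // string concatenation
            result = box_string(*static_cast<std::string*>(a->v) +
                                *static_cast<std::string*>(b->v));
            break;
        default:
            break;
        }
    }
    *out = result;
}
```

With a helper like this, the code generator no longer has to choose between FAdd and Add: for every + it can emit a call to rt_add with pointers to the two operands and a result slot, and the choice is made when the program runs. Emitting the tag comparison and the branches directly in IR is the same idea, just inlined.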