Design Pattern For Making An Assembler

Keywords: design patterns, assembler. Last updated: 2023-10-16

I'm writing an 8051 assembler.

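Ahead of everything else is a tokenizer, which reads the next token, sets error flags, recognizes EOF, and so on.
Then comes the compiler's main loop, which reads the next token and checks for a valid mnemonic: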

mnemonic = NextToken();
if (mnemonic.Error)
{
    // throw some error
}
else if (mnemonic.Text == "ADD")
{
    ...
}
else if (mnemonic.Text == "ADDC")
{
    ...
}

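And so on for the other mnemonics. Worse still is the code inside each case, which checks for valid arguments and then converts them to compiled code. Right now it looks like this: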

if (mnemonic.Text == "MOV")
{
    arg1 = NextToken();
    if (arg1.Error) { /* throw error */ break; } // break leaves the main loop
    arg2 = NextToken();
    if (arg2.Error) { /* throw error */ break; }
    if (arg1.Text == "A")
    {
        if (arg2.Text == "B")
            output << 0x1234; // Example compiled code
        else if (arg2.Text == "@B")
            output << 0x5678; // Example compiled code
        else
        { /* throw "Invalid parameters" */ }
    }
    else if (arg1.Text == "B")
    {
        if (arg2.Text == "A")
            output << 0x9ABC; // Example compiled code
        else if (arg2.Text == "@A")
            output << 0x0DEF; // Example compiled code
        else
        { /* throw "Invalid parameters" */ }
    }
}

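For every mnemonic I have to check for valid arguments and then generate the correct compiled code. As a result, very similar argument-checking code is repeated for every mnemonic in every case.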

So is there a design pattern for improving this code?
Or simply an easier way to implement it?

Edit: I accepted plinth's answer, thanks to him. If you have any other ideas about this, I'd be happy to learn. Thanks, all.

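I've written a number of assemblers over the years using hand parsing, and frankly, you're probably better off using a grammar language and a parser generator.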

Here's why: a typical line of assembly will probably look something like this:

[label:] [instruction|directive][newline]

and an instruction will be:

plain-mnemonic|mnemonic-withargs

and a directive will be:

plain-directive|directive-withargs

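and so on. With a parser generator like Gold, you should be able to write a grammar for the 8051 in a few hours. The advantage over hand parsing is that you can support reasonably complex expressions in your assembly code, such as: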

.define kMagicNumber 0xdeadbeef
CMPA #(2 * kMagicNumber + 1)

which can be a real pain to do by hand.
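For a sense of what "by hand" involves, here is a minimal hand-written recursive-descent evaluator for operands like `2 * kMagicNumber + 1`. This is a sketch and not what Gold generates. It assumes only `+ - * /`, unary minus, parentheses, decimal or C-style `0x` literals, and a symbol table already filled in by `.define`. The names `ExprParser` and `EvalExpr` are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical evaluator for assembler constant expressions.
// Grammar, lowest to highest precedence:
//   expr   := term   (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := number | identifier | '(' expr ')' | '-' factor
class ExprParser {
public:
    ExprParser(std::string text, std::map<std::string, long long> symbols)
        : s_(std::move(text)), syms_(std::move(symbols)) {}

    long long Parse() {
        long long v = Expr();
        SkipSpace();
        if (pos_ != s_.size()) throw std::runtime_error("trailing characters");
        return v;
    }

private:
    std::string s_;
    std::map<std::string, long long> syms_;
    size_t pos_ = 0;

    void SkipSpace() {
        while (pos_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[pos_]))) ++pos_;
    }
    bool Accept(char c) {
        SkipSpace();
        if (pos_ < s_.size() && s_[pos_] == c) { ++pos_; return true; }
        return false;
    }
    long long Expr() {
        long long v = Term();
        for (;;) {
            if (Accept('+')) v += Term();
            else if (Accept('-')) v -= Term();
            else return v;
        }
    }
    long long Term() {
        long long v = Factor();
        for (;;) {
            if (Accept('*')) {
                v *= Factor();
            } else if (Accept('/')) {
                long long d = Factor();
                if (d == 0) throw std::runtime_error("division by zero");
                v /= d;
            } else {
                return v;
            }
        }
    }
    long long Factor() {
        if (Accept('(')) {
            long long v = Expr();
            if (!Accept(')')) throw std::runtime_error("expected ')'");
            return v;
        }
        if (Accept('-')) return -Factor();
        SkipSpace();
        if (pos_ >= s_.size()) throw std::runtime_error("unexpected end of expression");
        unsigned char c = static_cast<unsigned char>(s_[pos_]);
        if (std::isdigit(c)) {
            size_t used = 0;
            // Base 0 accepts "42" and "0x2A" (and, C-style, treats a leading 0 as octal).
            long long v = std::stoll(s_.substr(pos_), &used, 0);
            pos_ += used;
            return v;
        }
        if (std::isalpha(c) || c == '_') {
            size_t start = pos_;
            while (pos_ < s_.size() &&
                   (std::isalnum(static_cast<unsigned char>(s_[pos_])) || s_[pos_] == '_'))
                ++pos_;
            auto it = syms_.find(s_.substr(start, pos_ - start));
            if (it == syms_.end()) throw std::runtime_error("undefined symbol");
            return it->second;
        }
        throw std::runtime_error("unexpected character");
    }
};

long long EvalExpr(const std::string& text, const std::map<std::string, long long>& symbols) {
    return ExprParser(text, symbols).Parse();
}
```

A grammar for a parser generator describes the same three precedence levels declaratively. That description is the part you would otherwise have to hand-code and debug yourself.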

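If you want to do it by hand, make a table of all the mnemonics that also records the addressing modes each one allows and, for each addressing mode, how many bytes that variant takes and what its opcode is. Something like this: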

#include <string.h>

typedef enum {
    Implied = 1, Direct = 2, Extended = 4, Indexed = 8 /* etc */
} AddressingMode;

/* For a 4-char mnemonic, this struct will be 5 bytes. A typical small processor
 * has on the order of 100 instructions, so this table comes in at ~500 bytes when all
 * is said and done.
 * A binary search over it takes, worst case, 8 compares on the mnemonic.
 * I claim that I/O will take way more time than the lookup.
 * You will also need a table and/or a routine that, given a mnemonic and an addressing mode,
 * gives you the actual opcode.
 */
typedef struct {
    char Mnemonic[4];      /* padded with '\0'; not terminated when 4 chars long */
    char AddressingModes;  /* bitwise OR of the AddressingMode flags allowed */
} InstructionInfo;

/* order them by mnemonic */
static InstructionInfo instrs[] = {
    { {'A', 'D', 'D', '\0'}, Direct|Extended|Indexed },
    { {'A', 'D', 'D', 'A'},  Direct|Extended|Indexed },
    { {'S', 'U', 'B', '\0'}, Direct|Extended|Indexed },
    { {'S', 'U', 'B', 'A'},  Direct|Extended|Indexed }
}; /* etc */
static int nInstrs = sizeof(instrs) / sizeof(instrs[0]);

/* Binary search for mnemonic; assumes it is at most 4 chars long. */
InstructionInfo *GetInstruction(const char *mnemonic) {
    int lo = 0, hi = nInstrs - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        int cmp = strncmp(mnemonic, instrs[mid].Mnemonic, 4);
        if (cmp == 0) return &instrs[mid];
        if (cmp < 0) hi = mid - 1; else lo = mid + 1;
    }
    return NULL; /* unknown mnemonic */
}

int InstructionSize(AddressingMode mode)
{
    switch (mode) {
    case Implied: return 1;
    /* etc */
    default: return -1; /* mode not handled yet */
    }
}

You then have a list of every instruction, and each entry in turn records all the addressing modes that instruction supports.

The parser then becomes something like this:

/* ReadLine, GetLabel, GetMnemonic, IsOpcode, etc. are helpers you write yourself. */
char *line = ReadLine();
int nextStart = 0;
int labelLen;
char *label = GetLabel(line, &labelLen, nextStart, &nextStart); // may be empty
int mnemonicLen;
char *mnemonic = GetMnemonic(line, &mnemonicLen, nextStart, &nextStart); // may be empty
if (IsOpcode(mnemonic, mnemonicLen)) {
    AddressingModeInfo info = GetAddressingModeInfo(line, nextStart, &nextStart);
    if (IsValidInstruction(mnemonic, info)) {
        GenerateCode(mnemonic, info);
    }
    else throw BadInstructionException(mnemonic, info);
}
else if (IsDirective(mnemonic, mnemonicLen)) { /* etc. */ }
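The `GetLabel`/`GetMnemonic` steps above amount to splitting each line along the `[label:] [instruction|directive]` shape. Here is a hypothetical C++ version of that split, with `SourceLine` and `SplitLine` as illustrative names. It is deliberately naive. It assumes `;` starts a comment, as in common 8051 assemblers. It also assumes that a colon only ever ends a label and that operands never contain commas (so character literals such as `#','` are not handled).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical result of splitting "[label:] [mnemonic [operand, ...]]".
struct SourceLine {
    std::string label;                  // empty if the line has no label
    std::string op;                     // mnemonic or directive, may be empty
    std::vector<std::string> operands;  // trimmed, comma-separated
};

static std::string Trim(const std::string& s) {
    size_t b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return "";
    size_t e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

SourceLine SplitLine(std::string text) {
    SourceLine out;
    // Assumption: ';' starts a comment that runs to the end of the line.
    text = Trim(text.substr(0, text.find(';')));
    // Optional "label:" prefix.
    size_t colon = text.find(':');
    if (colon != std::string::npos) {
        out.label = Trim(text.substr(0, colon));
        text = Trim(text.substr(colon + 1));
    }
    // The mnemonic or directive is the first whitespace-delimited word.
    size_t sp = text.find_first_of(" \t");
    out.op = text.substr(0, sp);
    if (sp == std::string::npos) return out;
    // Everything after it is a comma-separated operand list.
    std::string rest = text.substr(sp + 1);
    size_t start = 0;
    for (;;) {
        size_t comma = rest.find(',', start);
        out.operands.push_back(Trim(rest.substr(start, comma - start)));
        if (comma == std::string::npos) break;
        start = comma + 1;
    }
    return out;
}
```

Each operand string can then be classified into an addressing mode and checked against the instruction table.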

Yes. Most assemblers use a data table to describe the instructions: mnemonic, opcode, operand forms, and so on.
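As a rough illustration, here is how the question's `if`/`else` ladder might collapse into such a table. Each row pairs a mnemonic and two operand patterns with a base opcode. Register operands are folded into the opcode's low bits, and immediate or direct operands become a trailing byte. The opcodes are MCS-51 encodings as I understand them, so check them against the datasheet before relying on them. The pattern strings, `Classify`, and `Encode` are hypothetical, and only two-operand forms are covered.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// One row of a hypothetical instruction table.
// Patterns: "A", "Rn" (R0-R7), "@Ri" (@R0/@R1), "#" (8-bit immediate), "dir" (direct address).
struct Encoding {
    const char* mnemonic;
    const char* op1;
    const char* op2;
    uint8_t opcode;  // for Rn / @Ri rows, the register number is added to this
};

// A few MCS-51 encodings; verify against the datasheet.
static const Encoding kTable[] = {
    {"ADD",  "A",   "#",   0x24}, {"ADD",  "A",   "dir", 0x25},
    {"ADD",  "A",   "@Ri", 0x26}, {"ADD",  "A",   "Rn",  0x28},
    {"ADDC", "A",   "#",   0x34}, {"ADDC", "A",   "dir", 0x35},
    {"ADDC", "A",   "@Ri", 0x36}, {"ADDC", "A",   "Rn",  0x38},
    {"MOV",  "A",   "#",   0x74}, {"MOV",  "A",   "dir", 0xE5},
    {"MOV",  "A",   "@Ri", 0xE6}, {"MOV",  "A",   "Rn",  0xE8},
    {"MOV",  "dir", "A",   0xF5}, {"MOV",  "Rn",  "A",   0xF8},
};

// Classifies one operand into a pattern name plus its numeric part ("?" if unknown).
static std::string Classify(const std::string& s, int& value) {
    value = 0;
    if (s == "A") return "A";
    if (s.size() == 2 && s[0] == 'R' && s[1] >= '0' && s[1] <= '7') { value = s[1] - '0'; return "Rn"; }
    if (s == "@R0" || s == "@R1") { value = s[2] - '0'; return "@Ri"; }
    try {
        bool imm = !s.empty() && s[0] == '#';
        std::string num = imm ? s.substr(1) : s;
        size_t used = 0;
        value = std::stoi(num, &used, 0);  // decimal or 0x-prefixed hex
        if (used == num.size() && value >= 0 && value <= 0xFF) return imm ? "#" : "dir";
    } catch (...) {
        // not a number: fall through to "?"
    }
    return "?";
}

// Finds the row matching mnemonic + operand patterns and emits its bytes.
std::optional<std::vector<uint8_t>> Encode(const std::string& mnemonic,
                                           const std::string& a, const std::string& b) {
    int va = 0, vb = 0;
    std::string pa = Classify(a, va), pb = Classify(b, vb);
    for (const Encoding& e : kTable) {
        if (mnemonic != e.mnemonic || pa != e.op1 || pb != e.op2) continue;
        std::vector<uint8_t> bytes{e.opcode};
        if (pa == "Rn" || pa == "@Ri") bytes[0] = static_cast<uint8_t>(bytes[0] + va);
        if (pb == "Rn" || pb == "@Ri") bytes[0] = static_cast<uint8_t>(bytes[0] + vb);
        if (pa == "#" || pa == "dir") bytes.push_back(static_cast<uint8_t>(va));
        if (pb == "#" || pb == "dir") bytes.push_back(static_cast<uint8_t>(vb));
        return bytes;
    }
    return std::nullopt;  // "Invalid parameters"
}
```

Adding an instruction form now means adding a row instead of another nested `if`. The validation logic lives in one place.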

I suggest looking at the source code for as, though I had some trouble finding it. Look here. (Thanks to Hossein.)

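I think you should look into the Visitor pattern. It may not make the code simpler, but it will reduce coupling and increase reusability. SableCC, a Java framework for building compilers, uses it extensively.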
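Here is a minimal C++ sketch of the Visitor idea applied to an assembler. The parsed program is a list of nodes; this sketch has only instructions and an `.org`-style directive. Each pass is a separate visitor, so adding a pass (sizing, listing, code emission) does not touch the node classes. All class names here are hypothetical, and this is not SableCC's API.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct InstructionNode;
struct OrgDirectiveNode;

// One Visit overload per node type.
struct Visitor {
    virtual ~Visitor() = default;
    virtual void Visit(const InstructionNode& n) = 0;
    virtual void Visit(const OrgDirectiveNode& n) = 0;
};

struct Node {
    virtual ~Node() = default;
    virtual void Accept(Visitor& v) const = 0;  // double-dispatch entry point
};

struct InstructionNode : Node {
    std::string mnemonic;
    int size;  // bytes, assumed already known from the instruction table
    InstructionNode(std::string m, int s) : mnemonic(std::move(m)), size(s) {}
    void Accept(Visitor& v) const override { v.Visit(*this); }
};

struct OrgDirectiveNode : Node {
    uint16_t address;
    explicit OrgDirectiveNode(uint16_t a) : address(a) {}
    void Accept(Visitor& v) const override { v.Visit(*this); }
};

// One pass: track the location counter.
struct LocationCounterVisitor : Visitor {
    uint16_t pc = 0;
    void Visit(const InstructionNode& n) override { pc += n.size; }
    void Visit(const OrgDirectiveNode& n) override { pc = n.address; }
};

// Another pass: a textual listing, added without changing any node class.
struct ListingVisitor : Visitor {
    std::string out;
    void Visit(const InstructionNode& n) override { out += n.mnemonic + "\n"; }
    void Visit(const OrgDirectiveNode& n) override {
        out += ".org " + std::to_string(n.address) + "\n";
    }
};
```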

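When I was playing with a microcode emulator tool, I turned everything into descendants of an Instruction class. Below Instruction were category classes such as Arithmetic_Instruction and Branch_Instruction. I used the Factory pattern to create the instances.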
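Here is a rough C++ sketch of that arrangement. It has an abstract `Instruction` base, the two category subclasses named in the answer, and a simple factory that maps a mnemonic to the right subclass through a registry instead of an `if`/`else` chain. The `Category` method, the `Make` helper, and which 8051 mnemonics go in which category are my own hypothetical choices.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

class Instruction {
public:
    explicit Instruction(std::string m) : mnemonic_(std::move(m)) {}
    virtual ~Instruction() = default;
    virtual std::string Category() const = 0;
    const std::string& Mnemonic() const { return mnemonic_; }
private:
    std::string mnemonic_;
};

class Arithmetic_Instruction : public Instruction {
public:
    using Instruction::Instruction;
    std::string Category() const override { return "arithmetic"; }
};

class Branch_Instruction : public Instruction {
public:
    using Instruction::Instruction;
    std::string Category() const override { return "branch"; }
};

// Creates a T for the given mnemonic; one instantiation per subclass.
template <typename T>
std::unique_ptr<Instruction> Make(const std::string& m) { return std::make_unique<T>(m); }

// Factory: the registry replaces one branch per mnemonic.
std::unique_ptr<Instruction> MakeInstruction(const std::string& mnemonic) {
    using Maker = std::unique_ptr<Instruction> (*)(const std::string&);
    static const std::map<std::string, Maker> registry = {
        {"ADD",  &Make<Arithmetic_Instruction>},
        {"ADDC", &Make<Arithmetic_Instruction>},
        {"SUBB", &Make<Arithmetic_Instruction>},
        {"SJMP", &Make<Branch_Instruction>},
        {"LJMP", &Make<Branch_Instruction>},
        {"JZ",   &Make<Branch_Instruction>},
    };
    auto it = registry.find(mnemonic);
    if (it == registry.end()) return nullptr;  // unknown mnemonic
    return it->second(mnemonic);
}
```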

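Your best bet may be to get hold of the assembly language's syntax specification. Write a lexer that converts the source into tokens (please don't use an if-else-if-else ladder). Then emit code based on the semantics.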
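One way to follow the "no if-else ladder" advice is to classify each character through a lookup table once. The lexer then dispatches on that class with a single `switch` and consumes a run of same-class characters. Here is a minimal C++ sketch. It assumes a toy token set (identifiers, numbers, single-character punctuation, newlines) and `;` comments. The names `TokenKind`, `Token`, `CharClass`, and `Lex` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

enum class TokenKind { Ident, Number, Punct, Newline };

struct Token {
    TokenKind kind;
    std::string text;
};

// Character classes, looked up by table instead of chained comparisons.
enum CharClass : unsigned char { Other, Space, Alpha, Digit, Punct, Newline, Comment };

static std::array<CharClass, 256> MakeClassTable() {
    std::array<CharClass, 256> t{};  // zero-initialized, i.e. Other
    for (int c = 'A'; c <= 'Z'; ++c) t[c] = Alpha;
    for (int c = 'a'; c <= 'z'; ++c) t[c] = Alpha;
    t['_'] = Alpha;
    for (int c = '0'; c <= '9'; ++c) t[c] = Digit;
    t[' '] = t['\t'] = t['\r'] = Space;
    for (unsigned char c : std::string("#@,:+-*/()")) t[c] = Punct;
    t['\n'] = Newline;
    t[';'] = Comment;
    return t;
}

std::vector<Token> Lex(const std::string& src) {
    static const std::array<CharClass, 256> cls = MakeClassTable();
    auto classOf = [&](size_t i) { return cls[static_cast<unsigned char>(src[i])]; };
    std::vector<Token> out;
    size_t i = 0;
    while (i < src.size()) {
        size_t start = i;
        switch (classOf(i)) {
        case Space: ++i; break;
        case Comment: while (i < src.size() && src[i] != '\n') ++i; break;  // skip to end of line
        case Newline: ++i; out.push_back({TokenKind::Newline, "\n"}); break;
        case Punct: ++i; out.push_back({TokenKind::Punct, src.substr(start, 1)}); break;
        case Alpha:
            while (i < src.size() && (classOf(i) == Alpha || classOf(i) == Digit)) ++i;
            out.push_back({TokenKind::Ident, src.substr(start, i - start)});
            break;
        case Digit:
            // Numbers may contain letters (0x1F, 30H), so take the whole alphanumeric run.
            while (i < src.size() && (classOf(i) == Alpha || classOf(i) == Digit)) ++i;
            out.push_back({TokenKind::Number, src.substr(start, i - start)});
            break;
        default:
            throw std::runtime_error("unexpected character");
        }
    }
    return out;
}
```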

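Long ago, assemblers made at least two passes. The first pass resolved constants and built the skeleton code, including the symbol table. The second pass produced the more concrete or absolute values.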
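Here is a minimal C++ sketch of the two-pass idea. Pass one only advances a location counter by each instruction's size and records label addresses. Pass two emits bytes and can now resolve forward references. To stay short, it assumes a toy instruction set of just `NOP` (1 byte, 0x00) and `LJMP addr16` (3 bytes: 0x02, then the high and low address bytes), which are the 8051 encodings as I understand them. The `Line` struct and `Assemble` function are hypothetical, and the lines are assumed to be parsed already.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// One pre-parsed source line.
struct Line {
    std::string label;     // may be empty
    std::string mnemonic;  // "NOP" or "LJMP" in this toy
    std::string target;    // LJMP operand; may be a forward reference
};

static int SizeOf(const std::string& mnemonic) {
    if (mnemonic == "NOP") return 1;
    if (mnemonic == "LJMP") return 3;
    throw std::runtime_error("unknown mnemonic: " + mnemonic);
}

std::vector<uint8_t> Assemble(const std::vector<Line>& program, uint16_t origin = 0) {
    // Pass 1: give every label an address; no bytes are emitted yet.
    std::map<std::string, uint16_t> symbols;
    uint16_t pc = origin;
    for (const Line& l : program) {
        if (!l.label.empty() && !symbols.emplace(l.label, pc).second)
            throw std::runtime_error("duplicate label: " + l.label);
        pc += SizeOf(l.mnemonic);
    }
    // Pass 2: every label now has an address, so operands can be resolved.
    std::vector<uint8_t> code;
    for (const Line& l : program) {
        if (l.mnemonic == "NOP") {
            code.push_back(0x00);
        } else {  // LJMP (pass 1 already rejected anything else)
            auto it = symbols.find(l.target);
            if (it == symbols.end()) throw std::runtime_error("undefined label: " + l.target);
            code.push_back(0x02);
            code.push_back(static_cast<uint8_t>(it->second >> 8));    // high byte
            code.push_back(static_cast<uint8_t>(it->second & 0xFF));  // low byte
        }
    }
    return code;
}
```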

Have you read the Dragon Book recently?

Have you looked at the "Command Dispatcher" pattern?

http://en.wikipedia.org/wiki/Command_pattern

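The general idea is to create an object that handles each instruction (command), plus a lookup table that maps each instruction to its handler class. Every command class shares a common interface (for example, command.Execute(*args)), which should give you a cleaner, more flexible design than your current giant switch statement.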
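Here is a minimal C++ sketch of that suggestion. It assumes each handler receives the operand strings and appends bytes to an output buffer. The lookup table replaces the question's `if (mnemonic.Text == ...)` chain, and each command validates its own operands. The `Command` interface, the handler classes, and the `Dispatcher` are illustrative and hypothetical. So are the two encodings used: 8051 `NOP` = 0x00 and `MOV A,#data` = 0x74, as I understand them.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

using Bytes = std::vector<uint8_t>;
using Args = std::vector<std::string>;

// Common interface every instruction handler implements.
class Command {
public:
    virtual ~Command() = default;
    virtual void Execute(const Args& args, Bytes& out) const = 0;
};

class NopCommand : public Command {
public:
    void Execute(const Args& args, Bytes& out) const override {
        if (!args.empty()) throw std::invalid_argument("NOP takes no operands");
        out.push_back(0x00);
    }
};

// Handles only the MOV A,#data form to keep the sketch short.
class MovCommand : public Command {
public:
    void Execute(const Args& args, Bytes& out) const override {
        if (args.size() != 2 || args[0] != "A" || args[1].size() < 2 || args[1][0] != '#')
            throw std::invalid_argument("Invalid parameters");
        size_t used = 0;
        int value = std::stoi(args[1].substr(1), &used, 0);  // decimal or 0x hex
        if (used != args[1].size() - 1 || value < 0 || value > 0xFF)
            throw std::invalid_argument("Invalid parameters");
        out.push_back(0x74);
        out.push_back(static_cast<uint8_t>(value));
    }
};

// The dispatcher: one table lookup instead of one branch per mnemonic.
class Dispatcher {
public:
    Dispatcher() {
        table_["NOP"] = std::make_unique<NopCommand>();
        table_["MOV"] = std::make_unique<MovCommand>();
    }
    void Assemble(const std::string& mnemonic, const Args& args, Bytes& out) const {
        auto it = table_.find(mnemonic);
        if (it == table_.end()) throw std::invalid_argument("Unknown mnemonic: " + mnemonic);
        it->second->Execute(args, out);
    }
private:
    std::map<std::string, std::unique_ptr<Command>> table_;
};
```

Adding a mnemonic means writing one more `Command` subclass and one table entry. The main loop itself never changes.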