Designing a business message parser / rewriting from scratch

Keywords: rewrite, message, business, from scratch. Updated: 2023-10-16

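I'm in charge of a key application in my project. It parses business messages (a legacy standard), processes them, and stores some results in a database, where other applications pick them up. After more than a year of work (I have other applications to look after as well), the application is finally stable. I've introduced a strict TDD policy and have 20% unit test coverage (thank you, Michael Feathers, for your book!), most of it in the critical parts. I also have some white-box FitNesse tests that cover whole business scenarios. I feel that I can't refactor this application any further, and that it is now safe for me to play with it. It is designed so badly that I want to rewrite it. The application itself is roughly 20k lines of challenging legacy C/C++ code. There are other dependencies as well, but I've managed to decouple most of them.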

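All I have is the Sun C++ compiler, cppunitlite, STLPort and Boost. Please don't suggest other technologies (no XML, Java, etc.), as they are not an option in my organization. I'd like to do this in modern C++ (perhaps playing with metaprogramming...), with TDD from start to finish.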

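I need to parse about 30 types of messages. Each consists of 3-10 lines, and most of them are very similar. This is the root of all evil -> lots of code duplication. Each message has a class describing how it should be parsed. Take a look at the main inheritance tree: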

                             MSG_A                     MSG_B
                            /     \                  /     \
                    MSG_A_NEW   MSG_A_CNL      MSG_B_NEW   MSG_B_CNL

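Both trees go much deeper. There is very little difference between MSG_A_NEW and MSG_B_NEW. They should be handled by a single class that can be injected with some small customizations.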

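My initial plan is to have one generic message class that gets customized. Some entity (a builder...?) will look at the message and initialize the proper object that is able to parse it. Another entity will be able to work out which kind of line it is looking at, and the builder will use that information. I plan to write several parsers, each responsible for parsing just one specific kind of line. That would let me reuse them when parsing different messages.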
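One way to picture the "one parser per line kind" part of that plan is sketched below. It is only an illustration: the `TYPE:` and `DATE:` prefixes, the `LineKind` enum, and all class names are invented for the example, since the real legacy format is not shown in the post.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical line kinds; the real legacy standard is not shown here.
enum LineKind { LINE_TYPE, LINE_DATE, LINE_UNKNOWN };

// Parsed fields are collected as simple name/value pairs.
typedef std::map<std::string, std::string> Fields;

// One small parser per kind of line, reusable by every message type.
class LineParser {
public:
    virtual ~LineParser() {}
    // Precondition: the line has already been identified as this kind.
    virtual void parse(const std::string& line, Fields& out) const = 0;
};

class TypeLineParser : public LineParser {
public:
    void parse(const std::string& line, Fields& out) const {
        out["type"] = line.substr(5);  // text after the "TYPE:" prefix
    }
};

class DateLineParser : public LineParser {
public:
    void parse(const std::string& line, Fields& out) const {
        out["date"] = line.substr(5);  // text after the "DATE:" prefix
    }
};

// The "other entity" from the plan: decides which kind of line this is.
// The prefixes are assumptions made up for this sketch.
inline LineKind identifyLine(const std::string& line) {
    if (line.compare(0, 5, "TYPE:") == 0) return LINE_TYPE;
    if (line.compare(0, 5, "DATE:") == 0) return LINE_DATE;
    return LINE_UNKNOWN;
}

// Maps line kinds to parsers. It does not own the parsers it is given.
class LineParserRegistry {
public:
    void add(LineKind kind, const LineParser* parser) {
        assert(parser != 0);
        parsers_[kind] = parser;
    }

    // Returns false when no parser is registered for this kind of line.
    bool parseLine(const std::string& line, Fields& out) const {
        std::map<LineKind, const LineParser*>::const_iterator it =
            parsers_.find(identifyLine(line));
        if (it == parsers_.end()) return false;
        it->second->parse(line, out);
        return true;
    }

private:
    std::map<LineKind, const LineParser*> parsers_;
};
```

A builder for a given message type would then register only the line parsers that type needs, so the same `DateLineParser` can serve every message that carries a date line.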

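I'm struggling to solve a few challenges in an elegant and extensible way. Each type of message:

- has a minimum and maximum number of lines
- has some mandatory lines
- has some optional lines
- requires certain lines to be in certain places (e.g. the date cannot come before the message type), so order matters

I need to be able to validate the format of a message.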
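The four constraints above could be captured as data rather than as one class per message. The following is a rough sketch under stated assumptions: `LineRule`, `MessageSpec`, and the line kind names are hypothetical, and the input is assumed to be the already-identified kind of each line rather than the raw text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One expected line in a message layout.
struct LineRule {
    std::string kind;  // e.g. "TYPE" or "DATE" (hypothetical names)
    bool required;     // false means the line is optional
    LineRule(const std::string& k, bool r) : kind(k), required(r) {}
};

// Declarative description of one message type. Rules are listed in the
// order in which the lines must appear.
class MessageSpec {
public:
    MessageSpec(std::size_t minLines, std::size_t maxLines)
        : minLines_(minLines), maxLines_(maxLines) {
        assert(minLines <= maxLines);
    }

    MessageSpec& expect(const std::string& kind, bool required) {
        rules_.push_back(LineRule(kind, required));
        return *this;
    }

    // 'kinds' holds the kind of each line, in message order.
    // Returns an empty string when the layout is valid, else a reason.
    std::string validate(const std::vector<std::string>& kinds) const {
        if (kinds.size() < minLines_) return "too few lines";
        if (kinds.size() > maxLines_) return "too many lines";
        std::size_t rule = 0;
        for (std::size_t i = 0; i < kinds.size(); ++i) {
            // Skip optional rules until one matches the current line.
            while (rule < rules_.size() && rules_[rule].kind != kinds[i]) {
                if (rules_[rule].required)
                    return "missing or misplaced line: " + rules_[rule].kind;
                ++rule;
            }
            if (rule == rules_.size()) return "unexpected line: " + kinds[i];
            ++rule;  // this rule has been consumed by the current line
        }
        // Any required rule that is left over was never seen.
        for (; rule < rules_.size(); ++rule)
            if (rules_[rule].required)
                return "missing line: " + rules_[rule].kind;
        return "";
    }

private:
    std::size_t minLines_;
    std::size_t maxLines_;
    std::vector<LineRule> rules_;
};
```

With this shape, adding a new message type means writing a new chain of `expect` calls instead of a new class, which may help with the duplication described above. It does not handle repeated or interchangeable lines; those would need a richer rule type.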

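I'm not sure I've explained the design challenge well enough here. My design experience is very limited. I've been fixing bugs for a while now, and at last I get to do something interesting :)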

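Do you have any high-level advice? Which design patterns can you identify in this description? The main design constraints are maintainability and extensibility, with performance at the bottom of the list (we have other bottlenecks anyway...).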

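I would suggest that you not derive the specific message-handling classes from a base class that contains the common code, like this: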


      CommonHandler
            ^                                   ^
            |                                   |  = inheritance
       MsgAHandler
        ^       ^
        |       |
ANewHandler     ACnlHandler

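This approach does not give you much reusability: for example, if you want to handle some kind of message that needs to perform tasks from both A_NEW and A_CNL, you will quickly end up with multiple inheritance.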

Instead, I would go for one class that contains the common code and calls an interface to customize that common code. Something like this:


BasicHandler <>--- IMsgHandler    ------------
             1  1  ^  ^   ^  ^    *           |            ^
                   |  |   |  |                |            |   = inheritance
         MsgAHandler  |   |  ANewHandler    1 |
             ACnlHandler  HandlerContainer <>-/           <>-  = containment

The HandlerContainer class can be used to group the behaviour of other handlers together.

This pattern is called 'Composite', if I'm not mistaken. And to create the correct instances of the handlers, you will of course need some kind of factory.
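A minimal sketch of the diagram above, using the class names from it. The `handle` signature, the log vector, and the placeholder step strings are assumptions added for illustration, and the factory that would create and wire up the handlers is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Customization point called by the common code.
class IMsgHandler {
public:
    virtual ~IMsgHandler() {}
    virtual void handle(const std::string& msg,
                        std::vector<std::string>& log) = 0;
};

// Leaf handlers. The bodies are placeholders that only record a step.
class ANewHandler : public IMsgHandler {
public:
    void handle(const std::string&, std::vector<std::string>& log) {
        log.push_back("A_NEW step");
    }
};

class ACnlHandler : public IMsgHandler {
public:
    void handle(const std::string&, std::vector<std::string>& log) {
        log.push_back("A_CNL step");
    }
};

// Composite: groups other handlers and runs them in the order added.
// It does not own the handlers it is given.
class HandlerContainer : public IMsgHandler {
public:
    void add(IMsgHandler* handler) {
        assert(handler != 0);
        children_.push_back(handler);
    }
    void handle(const std::string& msg, std::vector<std::string>& log) {
        for (std::size_t i = 0; i < children_.size(); ++i)
            children_[i]->handle(msg, log);
    }
private:
    std::vector<IMsgHandler*> children_;
};

// Holds the common code and delegates the variable part to IMsgHandler.
class BasicHandler {
public:
    explicit BasicHandler(IMsgHandler& custom) : custom_(custom) {}

    std::vector<std::string> process(const std::string& msg) {
        std::vector<std::string> log;
        log.push_back("common pre-processing");
        custom_.handle(msg, log);
        log.push_back("common post-processing");
        return log;
    }
private:
    IMsgHandler& custom_;
};
```

A message that needs the behaviour of both A_NEW and A_CNL is then handled by putting both leaf handlers into one `HandlerContainer`, with no multiple inheritance involved.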

Good luck!

That does sound like a fun challenge. :-)

Your "initial plan" sounds like a good one: factor out all of the similar processing between all of the messages and put the code for them in a base message class. The changing items can become virtual functions (such as CheckForRequiredLines or VerifyLineOrder, perhaps), possibly with default implementations for the most common case. Then derive other classes for the specific message types.
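As a rough sketch of that idea: the shared flow lives in a non-virtual function of the base class, and the two hooks named above are virtual with defaults. The class names, the `TYPE` and `REF` line kinds, and the default rules are all invented for the example, and lines are represented by their kind names only.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// For brevity each line is represented by its kind name only.
typedef std::vector<std::string> Lines;

class BaseMessage {
public:
    virtual ~BaseMessage() {}

    // Shared processing, identical for every message type.
    bool validate(const Lines& lines) const {
        if (lines.empty()) return false;
        return CheckForRequiredLines(lines) && VerifyLineOrder(lines);
    }

protected:
    // Default for the most common case (an assumption of this sketch):
    // every message needs a TYPE line.
    virtual bool CheckForRequiredLines(const Lines& lines) const {
        return contains(lines, "TYPE");
    }

    // Default: the TYPE line must come first.
    virtual bool VerifyLineOrder(const Lines& lines) const {
        assert(!lines.empty());  // guaranteed by validate()
        return lines[0] == "TYPE";
    }

    static bool contains(const Lines& lines, const std::string& kind) {
        return std::find(lines.begin(), lines.end(), kind) != lines.end();
    }
};

// A specific message type overrides only what differs from the default.
class CancelMessage : public BaseMessage {
protected:
    bool CheckForRequiredLines(const Lines& lines) const {
        return BaseMessage::CheckForRequiredLines(lines) &&
               contains(lines, "REF");
    }
};
```

Here `CancelMessage` inherits the default ordering rule and only tightens the required-line check, which is the kind of small customization the question asks for.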

It's hard to give generic advice for a design problem like this. It seems to me that your main parser function corresponds to the Factory Method pattern, but that's the only one I can easily identify. (I'm not too familiar with the names of design patterns -- I use many of them, but I only learned that they have names a couple years ago.)

You probably are already aware of this, but just in case... You should pick up/borrow the Gang of Four design patterns book for help in identifying and applying appropriate patterns. This is the canonical reference, and it contains cross-references and tables to help you decide what patterns might fit your application. It might be difficult for people here to identify specific patterns that might help you, based just on that description.

I would suggest looking at the libraries provided by Boost, for example Tuple or mpl::vector. These libraries allow you to create a list of unrelated types and then operate over them. The very rough idea is that you have sequences of types for each message kind:

Seq1 -> MSG_A_NEW, MSG_A_CNL
Seq2 -> MSG_B_NEW, MSG_B_CNL

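Once you know the message type, you can use the appropriate tuple with a function template that applies the first tuple type to the data, then the next entry in the tuple, and so on.

This does assume that the layout of the data stream is known at compile time, but it has the advantage that you pay no runtime overhead for the data structure.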
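To make the idea concrete, here is a small sketch that walks a type sequence at compile time. It uses `std::tuple` from C++11 as a stand-in for `boost::tuple` so that it stays self-contained, which is a substitution and not what the answer literally proposes. It also interprets each tuple entry as a line parser type; `TypeLine`, `DateLine`, `RefLine`, the prefixes, and the two layouts are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical line parsers: plain types with a static parse() that
// reports whether the line has the expected (invented) prefix.
struct TypeLine {
    static bool parse(const std::string& line) {
        return line.compare(0, 5, "TYPE:") == 0;
    }
};
struct DateLine {
    static bool parse(const std::string& line) {
        return line.compare(0, 5, "DATE:") == 0;
    }
};
struct RefLine {
    static bool parse(const std::string& line) {
        return line.compare(0, 4, "REF:") == 0;
    }
};

// Compile-time layouts: the sequence of line types for a message kind.
// The tuples are used purely as type lists and are never instantiated.
typedef std::tuple<TypeLine, DateLine> NewLayout;
typedef std::tuple<TypeLine, RefLine, DateLine> CancelLayout;

// Applies element I of the layout to line I, then recurses to I + 1.
template <typename Layout, std::size_t I = 0,
          bool Done = (I == std::tuple_size<Layout>::value)>
struct ApplyLayout {
    static bool run(const std::vector<std::string>& lines) {
        typedef typename std::tuple_element<I, Layout>::type Parser;
        assert(I < lines.size());  // checked by parseMessage()
        return Parser::parse(lines[I]) &&
               ApplyLayout<Layout, I + 1>::run(lines);
    }
};

// End of the recursion: every element has been applied.
template <typename Layout, std::size_t I>
struct ApplyLayout<Layout, I, true> {
    static bool run(const std::vector<std::string>&) { return true; }
};

// Entry point: the layout is chosen once the message type is known.
template <typename Layout>
bool parseMessage(const std::vector<std::string>& lines) {
    if (lines.size() != std::tuple_size<Layout>::value) return false;
    return ApplyLayout<Layout>::run(lines);
}
```

Because the layout is fixed at compile time, this sketch cannot express optional lines; that matches the caveat above about the layout having to be known in advance.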