What is the intention of ODR?

Keywords: intention, ODR. Updated: 2023-10-16

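I do understand what the ODR says, but I don't understand what it is trying to achieve.

I see two consequences of violating it: the user gets a syntax error, which is totally fine, or there may be some fatal errors, and again the user is the only one to blame.

As an example of violating the ODR and getting a fatal error, I imagine something like this: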

a.cpp

#include <iostream>

struct A
{
    int a;
    double b;
};
void f(A a)
{
    std::cout << a.a << " " << a.b << std::endl;
}

main.cpp

struct A
{
    int a;
    int b;
};
void f(A a);
int main()
{
    A a = {5, 6};
    f(a);
    return 0;
}

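If this example has nothing to do with the ODR, please correct me.

So, is the ODR trying to prohibit the user from doing such harmful things? I don't think so.

Is it trying to set some rules for compiler writers, so that they avoid the potential harm of a violation? Probably not, since most compilers don't check for ODR violations.

What else?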

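The ODR dictates which C++ programs are well-formed. An ODR violation means your program is ill-formed, and the standard does not dictate what the program will do, whether it should compile, and so on. Most ODR violations are marked "no diagnostic required" to make the compiler writer's job easier.

This permits the C++ compiler to make certain simplifying assumptions about the code you feed it, such as that ::A is the same struct type everywhere, without having to check at each point of use.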
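
As a rough illustration of why that assumption matters, the sketch below puts the question's two definitions of A side by side. The namespaces a_cpp and main_cpp are hypothetical stand-ins for the two translation units; they make the structs distinct types, so this single file is itself well-formed. The only point is that the two views of A do not agree on size.

```cpp
// Stand-ins for the two translation units in the question (hypothetical
// namespaces; in the real program both structs are named ::A).
namespace a_cpp {
struct A { int a; double b; };  // what f() was compiled against
}
namespace main_cpp {
struct A { int a; int b; };     // what main() was compiled against
}

// The caller and the callee disagree about how many bytes an A occupies,
// so passing one by value across the boundary cannot work reliably.
static_assert(sizeof(a_cpp::A) >= sizeof(int) + sizeof(double),
              "f() expects room for an int and a double");
static_assert(sizeof(a_cpp::A) != sizeof(main_cpp::A),
              "the two definitions differ in size");
```

Because both checks are static_assert, a compiler verifies them without running anything. Within one translation unit the mismatch is easy to see; across two units nothing forces the compiler or linker to notice it.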

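The compiler is free to take your code and compile it to `format c:`. Or anything else. It is free to detect ODR violations, use them to prove that a branch of code cannot run, and eliminate the paths that lead there.

When a function expects to receive one of these structs, and you redeclare it as something different, which struct does the function receive, and how? Remember, C++ is statically typed, so if you pass a struct by value, the function must know its structure. Because C++ is type-safe, allowing ODR violations would break that type safety.

Most importantly, what would be the benefit of not having the ODR? I can think of hundreds of things that would be harder without it, with nothing to gain. There is literally no flexibility to be had from being able to stomp on previously declared types within the same namespace. In the very best case, it would just mean that multiple inclusion no longer requires header guards, which is a very minimal gain at best.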
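
For reference, a minimal sketch of the header-guard idiom mentioned above. The header name point.h, the macro POINT_H, and the Point struct are all hypothetical, and the double inclusion is simulated by pasting the header's contents twice into one file.

```cpp
// --- contents of a hypothetical header, point.h (first inclusion) ---
#ifndef POINT_H
#define POINT_H
struct Point { int x; int y; };
#endif  // POINT_H

// --- the same header reached again through another #include ---
#ifndef POINT_H   // POINT_H is already defined, so this copy is skipped
#define POINT_H
struct Point { int x; int y; };
#endif  // POINT_H

// Without the guard, the second struct Point would be a redefinition
// within a single translation unit, which compilers do diagnose.
constexpr int sum(Point p) { return p.x + p.y; }
```

The guard only protects a single translation unit against seeing the definition twice; keeping the definition identical across units is what sharing the header is for.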

To my knowledge, the purpose of the rule is to prevent an object from being defined differently in different translation units.

// a.cpp
#include <iostream>
class SharedClass {
    int a, b, c;
    bool d;
    int e, f, g;
public:
    // ...
};
void a(const SharedClass& sc) {
    std::cout << "sc.a: " << sc.getA() << '\n'
              << "sc.e: " << sc.getE() << '\n'
              << "sc.c: " << sc.getC() << std::endl;
}
// -----
// b.cpp
class SharedClass {
    int b, e, g, a;
    bool d;
    int c, f;
public:
    // ...
};
void b(SharedClass& sc) {
    sc.setA(sc.getA() - 13);
    sc.setG(sc.getG() * 2);
    sc.setD(true);
}
// -----
// main.cpp
int main() {
    SharedClass sc;
    /* Assume that the compiler doesn't get confused & have a heart attack,
     *  and uses the definition in "a.cpp".
     * Assume that by the definition in "a.cpp", this instance has:
     *   a = 3
     *   b = 5
     *   c = 1
     *   d = false
     *   e = 42
     *   f = -129
     *   g = 8
     */
    // ...
    a(sc); // Outputs sc.a, sc.e, and sc.c.
    b(sc); // Supposedly modifies sc.a, sc.g, and sc.d.
    a(sc); // Does NOT do what you think it does.
}

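Given this program, you might think that SharedClass behaves identically in a.cpp and b.cpp, since it has the same fields with the same names. Note, however, that the fields are in a different order. Because of this, each translation unit sees the class like this (assuming 4-byte ints and 4-byte alignment):

If the compiler uses hidden alignment members: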

// a.cpp
Class layout:
0x00: int  {a}
0x04: int  {b}
0x08: int  {c}
0x0C: bool {d}
0x0D: [alignment member, 3 bytes]
0x10: int  {e}
0x14: int  {f}
0x18: int  {g}
Size: 28 bytes.
// b.cpp
Class layout:
0x00: int  {b}
0x04: int  {e}
0x08: int  {g}
0x0C: int  {a}
0x10: bool {d}
0x11: [alignment member, 3 bytes]
0x14: int  {c}
0x18: int  {f}
Size: 28 bytes.
// main.cpp
One of the above, up to the compiler.
Alternatively, may be seen as undefined.

If the compiler groups fields of the same size together, ordered from largest to smallest:

// a.cpp
Class layout:
0x00: int  {a}
0x04: int  {b}
0x08: int  {c}
0x0C: int  {e}
0x10: int  {f}
0x14: int  {g}
0x18: bool {d}
Size: 25 bytes.
// b.cpp
Class layout:
0x00: int  {b}
0x04: int  {e}
0x08: int  {g}
0x0C: int  {a}
0x10: int  {c}
0x14: int  {f}
0x18: bool {d}
Size: 25 bytes.
// main.cpp
One of the above, up to the compiler.
Alternatively, may be seen as undefined.

Note that although the class has the same size in both definitions, its members are in a completely different order.

Field comparison (with alignment member):
a.cpp field     b.cpp field
a               b
b               e
c               g
d & {align}     a
e               d & {align}
f               c
g               f
Field comparison (with hidden reordering):
a.cpp field     b.cpp field
a               b
b               e
c               g
e               a
f               c
g               f
d               d

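Thus, from the perspective of a(), b() actually changes sc.c, sc.d, and sc.e (which of b()'s three writes lands on which of those fields depends on how the class is laid out), completely changing the output of the second call. [Note that this can even come up in innocuous situations where you would never expect it, such as when a.cpp and b.cpp have the same definition of SharedClass but specify different alignments. This would change the size of the alignment member, again giving the class different memory layouts in different translation units.]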
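
The overlap can be checked mechanically with offsetof. The sketch below is only an approximation of the example: the members are made public so that offsetof applies, and the namespaces a_cpp and b_cpp are hypothetical stand-ins for the two translation units. The actual offsets are up to the ABI; the comments describe what a typical ABI with declaration-order layout does.

```cpp
#include <cstddef>  // offsetof, std::size_t

// Same member names, different declaration order, as in a.cpp and b.cpp.
namespace a_cpp {
struct SharedClass { int a, b, c; bool d; int e, f, g; };
}
namespace b_cpp {
struct SharedClass { int b, e, g, a; bool d; int c, f; };
}

// Where b.cpp believes `a` lives, and where a.cpp believes `d` lives.
// Both members follow three ints, so on typical ABIs these coincide:
// a write to `a` made by b() is seen by a() as a change to `d`.
constexpr std::size_t a_as_seen_by_b = offsetof(b_cpp::SharedClass, a);
constexpr std::size_t d_as_seen_by_a = offsetof(a_cpp::SharedClass, d);

// Guaranteed for standard-layout classes: the first member is at offset 0,
// so the two units cannot agree on where `a` is.
static_assert(offsetof(a_cpp::SharedClass, a) == 0, "a comes first in a.cpp");
static_assert(a_as_seen_by_b != 0, "a does not come first in b.cpp");
```

Comparing a_as_seen_by_b with d_as_seen_by_a on your own toolchain shows whether the "alignment member" table above matches your platform.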

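Now, that is what can happen when the same fields are laid out differently in different translation units. Imagine what would happen if the class had entirely different fields in different units.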

// c.cpp
#include <string>
#include <utility>
// Assume alignment of 4.
// Assume std::string stores a pointer to string memory, size_t (as long long), and pointer
//  to allocator in its body, and is thus 16 (on 32-bit) or 24 (on 64-bit) bytes.
// (Note that this is likely not the ACTUAL size of std::string, but I'm just using it for an
//  example.)
class SharedClass {
    char c;
    std::string str;
    short s;
    unsigned long long ull;
    float f;
public:
    // ...
};
void c(SharedClass& sc, std::string str) {
    sc.setStr(std::move(str));
}

In this file, our SharedClass would look something like this:

Class layout (32-bit, alignment member):
0x00: char                {c}
0x01: [alignment member, 3 bytes]
0x04: string              {str}
0x14: short               {s}
0x16: [alignment member, 2 bytes]
0x18: unsigned long long  {ull}
0x20: float               {f}
Size: 36 bytes.
Class layout (64-bit, alignment member):
0x00: char                {c}
0x01: [alignment member, 3 bytes]
0x04: string              {str}
0x1C: short               {s}
0x1E: [alignment member, 2 bytes]
0x20: unsigned long long  {ull}
0x28: float               {f}
Size: 44 bytes.
Class layout (32-bit, reordered):
0x00: string              {str}
0x10: unsigned long long  {ull}
0x18: float               {f}
0x1C: short               {s}
0x1E: char                {c}
Size: 31 bytes.
Class layout (64-bit, reordered):
0x00: string              {str}
0x18: unsigned long long  {ull}
0x20: float               {f}
0x24: short               {s}
0x26: char                {c}
Size: 39 bytes.

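Not only does this SharedClass have different fields, it is an entirely different size. Trying to treat every translation unit as if it had the same SharedClass can and will break something, and silently reconciling the definitions with one another is impossible. Just imagine the chaos if we called a(), b(), and c() on the same instance of SharedClass, or even what would happen if we tried to create an instance of SharedClass. With three different definitions, and the compiler having no idea which one is the actual definition, things can and will go badly.

This completely breaks inter-unit operability: either all code that uses a class must live in the same translation unit, or every unit must share the exact same definition of the class. For this reason, the ODR requires that a class be defined only once per unit and have the same definition across all units, which guarantees that it always has the same definition and prevents this entire problem.

Similarly, consider this simple function, func().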

// z.cpp
#include <cmath>
int func(int x, int y) {
    return static_cast<int>(std::round(std::pow((2 * x) - (3 * y), x + y) - (x / y)));
}
// -----
// y.cpp
int func(int x, int y) { return x + y; }
// -----
// x.cpp
int func(int x, int y); // Declaration only; z.cpp and y.cpp both define it.
int q = func(9, 11);
// Compiler has a heart attack, call 911.
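The compiler cannot tell which version of func() you mean, and will in fact treat them as the same function. This, naturally, breaks things. It gets even worse when one version has side effects (such as changing global state or leaking memory) and the other does not.

In this case, the ODR is intended to guarantee that any given function has the same definition in all translation units, rather than different definitions in different units. This would be fairly easy to change (by treating all functions as inline for the purposes of the ODR, but otherwise treating them as inline only if explicitly or implicitly declared as such), but doing so could cause trouble in unforeseen ways.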
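
A minimal sketch of the two arrangements that keep a function ODR-clean. The split into a header and a source file is indicated only by comments, sum() is a hypothetical helper name, and func() is given a simple body purely for illustration.

```cpp
// --- hypothetical shared header, included by every translation unit ---

// Non-inline function: every unit sees only this declaration.
int func(int x, int y);

// Inline function (constexpr functions are implicitly inline): the
// definition itself may appear in every unit, provided each copy is
// token-for-token identical, which including one header ensures.
constexpr int sum(int x, int y) { return x + y; }

// --- exactly one source file in the whole program ---

// The single definition of the non-inline function.
int func(int x, int y) { return sum(x, y) * 2; }
```

With this layout, x.cpp from the example above would include the header and call func(9, 11) without ambiguity, because only one definition of func() exists anywhere in the program.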

Now, consider a simpler case: global variables.

// i.cpp
int global_int;
namespace Globals {
    int ns_int = -5;
}
// -----
// j.cpp
int global_int;
namespace Globals {
    int ns_int = 5;
}

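In this case, each translation unit defines the variables global_int and Globals::ns_int, which means the program has two distinct variables with exactly the same name. This cannot end well at link time, where the linker treats every instance of a symbol as referring to the same entity. Globals::ns_int has more problems than global_int, because two different initialisation values are hard-coded into the files; assuming the linker doesn't just explode, the program is guaranteed to have undefined behaviour.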
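
One conventional way to avoid this, sketched under the assumption that the variables are meant to be shared: declare them with extern in a header and define each one in exactly one source file. The file names in the comments and the helper check_globals() are hypothetical.

```cpp
#include <cassert>

// --- hypothetical header globals.h: declarations only, no storage ---
extern int global_int;
namespace Globals {
extern int ns_int;
}

// --- hypothetical globals.cpp: the single definition of each variable ---
int global_int = 0;
namespace Globals {
int ns_int = -5;
}

// --- any other unit that includes globals.h ---
// Both names refer to the one object defined above, so every unit
// observes the same initial value.
void check_globals() {
    assert(global_int == 0);
    assert(Globals::ns_int == -5);
}
```

From C++17 onward, declaring the variables `inline` in the header is another option; it works like an inline function, so every copy of the definition must be identical.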

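The ODR varies in complexity depending on the entity involved. Some things may have only one definition in the entire program, while others may have multiple definitions as long as they are all exactly the same and there is only one per translation unit. Whatever the case, the intent is that every unit sees the entity in exactly the same way.

The main reason for this, though, is convenience. Not only is it easier for the compiler to assume that the ODR has been followed to the letter in every translation unit, it is also faster and less CPU-, memory-, and disk-intensive. Without the ODR, the compiler would have to compare every translation unit to make sure that every shared type and inline function definition is the same, and that every global variable and non-inline function is defined in only one translation unit. That would naturally require it to load every unit from disk whenever it compiled any unit, using a lot of system resources that are not actually needed if the programmer follows good programming practices. In light of this, forcing programmers to follow the ODR lets the compiler assume that everything is fine, which makes its job (and the programmer's working and/or goofing off while waiting for the compiler) much easier. [Compared to this, making sure the ODR is followed within a single unit is child's play.]

To put it simply, the One Definition Rule guarantees:

That entities that should be defined only once in the program are defined just once.
That entities that can be defined in multiple translation units (classes, inline functions, function templates) have equivalent definitions that result in equivalent compiled code. The equivalence must be perfect, so that any one of the definitions can be used at run time: the many definitions are indistinguishable.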
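
A small sketch of the second category: entities whose definitions may be repeated in several translation units, as long as every copy is identical. All names here (Pair, twice, total) are hypothetical, and the code is meant to be read as the contents of one shared header.

```cpp
// --- hypothetical shared header, included by several translation units ---

// Class type: each unit that includes the header gets the same definition.
struct Pair { int first; int second; };

// Function template: defined in the header so every unit can instantiate it.
template <typename T>
constexpr T twice(T value) { return value + value; }

// Inline function (constexpr implies inline): one identical copy per unit.
constexpr int total(Pair p) { return p.first + p.second; }
```

Since every unit obtains these definitions from the same header, the copies cannot drift apart, which is the "perfect equivalence" the rule asks for.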