Statistical simulation in C++

Keywords: simulation, statistics, C++. Last updated: 2023-10-16

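I am trying to solve a problem from Stanford's free online CS106B course. The text of the problem is given below. I have written a function, but I am not sure the logic is correct (this is not one of those programs where you know when you have the right answer). Please see the problem and my code below. I would appreciate any feedback or suggestions.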

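Problem: Consider an election with 1000 voters and a one percent spread between two candidates, i.e. 50.5% of the votes go to one candidate and 49.5% to the other. A voting machine makes an error 8% of the time, recording a vote for the opposite candidate. Is this error rate high enough to invalidate the election result? With a little statistics knowledge it is not hard to calculate the exact probability of an invalid result, but it is even easier to simulate the process. Generate a sequence of 505 votes for candidate A and 495 votes for candidate B, where each vote has an 8% chance of being inverted when recorded. Does the vote total produce a victory for B despite the original intention of the voters? That result represents one trial in the simulation. If you repeat this trial many times and track the results, the ratio:

(number of trials where the election result was invalid) / (total number of trials)

provides an estimate of the percentage chance of an invalid election result.

Write a program that prompts the user to enter the voting simulation parameters, then performs 500 simulation trials and reports the ratio calculated above. A sample run of the program is shown here: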

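Enter number of voters: 10000
Enter percentage spread between candidates: .005
Enter voting error percentage: .15
Chance of an invalid election result after 500 trials = 13.4%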

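The program should take care to validate that the user's chosen simulation parameters are in range (percentages must be between 0 and 1.0, and the number of voters should be positive) and reprompt for valid input when necessary. Note that because of the randomness in the simulation, the results are expected to vary from run to run.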
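As an illustration of the reprompting requirement, here is a minimal sketch in standard C++ that does not depend on the Stanford library. The helper name readDoubleBetween and its stream parameters are hypothetical, chosen so the function can be exercised without a console. A similar loop with an int and a check for value > 0 would cover the voter count.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: keeps prompting until a line holds one number in [lo, hi].
double readDoubleBetween(std::istream& in, std::ostream& out,
                         const std::string& prompt, double lo, double hi) {
    assert(lo <= hi);
    std::string line;
    while (true) {
        out << prompt;
        if (!std::getline(in, line)) {
            throw std::runtime_error("input ended before a valid value was read");
        }
        std::istringstream parser(line);
        double value = 0.0;
        char extra = 0;
        // Accept the line only if it parses as a number, has no trailing
        // characters, and falls inside the allowed range.
        if ((parser >> value) && !(parser >> extra) && value >= lo && value <= hi) {
            return value;
        }
        out << "Please enter a value between " << lo << " and " << hi << ".\n";
    }
}
```

A call such as readDoubleBetween(std::cin, std::cout, "Enter voting error percentage: ", 0.0, 1.0) would reject input like "abc" or "2" and ask again.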

Code (P.S. I used the Stanford CPP library):

#include <iostream>
#include "console.h"
#include "gwindow.h" // for GWindow
#include "simpio.h"  // for getInteger, getDoubleBetween
#include "vector.h"  // for Vector
#include "queue.h"   // for queues
#include "random.h"  // for randomChance
using namespace std;

/* FUNCTION PROTOTYPES */
void ElectionSimulation();

/* MAIN METHOD */
int main() {
    ElectionSimulation();
    return 0;
}

/* FUNCTION DEFINITIONS */
void ElectionSimulation() {
    int numVoters =
        getInteger("Enter number of voters: ",
                   "You must enter a positive integer, try again");
    int numSimulations =
        getInteger("Enter the number of election simulations: ",
                   "You must enter a positive integer, try again");
    double voterSpread =
        getDoubleBetween("Enter spread between candidates, e.g. for 10% "
                         "enter 0.1 etc: ", 0.0, 1.0);
    double votingError =
        getDoubleBetween("Enter vote recording error chance, e.g. for "
                         "15% enter 0.15 etc: ", 0.0, 1.0);

    // Determine the correct number of votes for each candidate
    // given the spread and numVoters
    int correctVotesLower = numVoters * (0.5 - 0.5 * voterSpread);
    int correctVotesHigher = numVoters * (0.5 + 0.5 * voterSpread);
    int invalidElections = 0;

    // Run simulations
    for (int i = 0; i < numSimulations; i++) {
        // Before every simulation, set the correct number
        // of votes for each candidate
        int votesLower = correctVotesLower;
        int votesHigher = correctVotesHigher;

        // Redistribute votes due to vote recording error
        for (int j = 0; j < correctVotesLower; j++) {
            if (randomChance(votingError)) {
                votesLower--;
                votesHigher++;
            }
        }
        for (int k = 0; k < correctVotesHigher; k++) {
            if (randomChance(votingError)) {
                votesLower++;
                votesHigher--;
            }
        }

        if (votesLower > votesHigher) {
            invalidElections++;
        }
    }
    cout << "After " << numSimulations
         << " simulations, elections were invalid "
         << (double)invalidElections * 100.0 / (double)numSimulations
         << " percent of times" << endl;
}
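For readers without the Stanford library, here is a minimal sketch of the same experiment using only the standard <random> header. It is not the code above: instead of flipping each vote with randomChance, it draws the number of flipped votes in each group from std::binomial_distribution, which should be statistically equivalent. The function name estimateInvalidRate and its parameters are made up for this illustration.

```cpp
#include <cassert>
#include <random>

// Returns the fraction of trials in which the candidate with fewer true
// votes ends up with more recorded votes.
double estimateInvalidRate(int votesLower, int votesHigher, double errorRate,
                           int numTrials, std::mt19937& rng) {
    assert(votesLower >= 0 && votesHigher >= 0 && numTrials > 0);
    assert(errorRate >= 0.0 && errorRate <= 1.0);
    // Number of votes in each group that the machine records for the other side.
    std::binomial_distribution<int> flippedLower(votesLower, errorRate);
    std::binomial_distribution<int> flippedHigher(votesHigher, errorRate);

    int invalid = 0;
    for (int t = 0; t < numTrials; ++t) {
        int fromLower = flippedLower(rng);
        int fromHigher = flippedHigher(rng);
        int recordedLower = votesLower - fromLower + fromHigher;
        int recordedHigher = votesHigher - fromHigher + fromLower;
        if (recordedLower > recordedHigher) {
            ++invalid;
        }
    }
    return static_cast<double>(invalid) / numTrials;
}
```

Called with 4975 and 5025 true votes, an error rate of 0.15, and a seeded std::mt19937, it should return a fraction near 0.3, varying with the seed and the number of trials.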

In particular, if I enter the following parameters (as given in the problem text):

numVoters = 10000;
numSimulations = 500;
voterSpread = 0.005;
votingError = 0.15;

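I get invalid elections about 30% of the time. That seems a bit high. The problem text says that with these parameters I should get about 13.4% (varying slightly from run to run because of the randomness). I think something is wrong with my logic, but I cannot see where.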

I believe your program is correct.

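If people vote for candidate A with probability 0.5025, and the voting machine misrecords a vote with probability 0.15, then the machine records a vote for candidate A with probability 0.5025*(1-0.15) + (1-0.5025)*0.15 = 0.50175. When I plug this into a binomial distribution to find the probability that A gets fewer than 5000 of the 10000 votes, I get a probability of about 0.36.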
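The arithmetic in this estimate can be reproduced with a short sketch. The function names are hypothetical, and the binomial tail is computed here with a normal approximation (with continuity correction) rather than an exact binomial CDF, so it is a stand-in for whatever tool produced the 0.36 figure.

```cpp
#include <cassert>
#include <cmath>

// Probability that one vote is recorded for A, given A's true share
// and the chance that the machine flips a vote.
double recordedShare(double trueShare, double errorRate) {
    assert(trueShare >= 0.0 && trueShare <= 1.0);
    assert(errorRate >= 0.0 && errorRate <= 1.0);
    return trueShare * (1.0 - errorRate) + (1.0 - trueShare) * errorRate;
}

// Standard normal CDF via the complementary error function.
double normalCdf(double z) {
    return 0.5 * std::erfc(-z / std::sqrt(2.0));
}

// Normal approximation to P(X < k) for X ~ Binomial(n, q).
double approxBinomialBelow(int n, double q, int k) {
    assert(n > 0 && q > 0.0 && q < 1.0);
    double mean = n * q;
    double sd = std::sqrt(n * q * (1.0 - q));
    return normalCdf((k - 0.5 - mean) / sd);  // -0.5 is the continuity correction
}
```

With these definitions, recordedShare(0.5025, 0.15) gives 0.50175, and approxBinomialBelow(10000, 0.50175, 5000) should come out close to 0.36.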

This is only a rough estimate, not an exact calculation, but it suggests that your 30% may not be too high.
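One likely reason the estimate is rough: the binomial model treats all 10000 recorded votes as independent random draws, which gives a standard deviation of about 50 votes. In the problem as stated, the true counts are fixed at 5025 and 4975 and only the recording errors are random, so the standard deviation is closer to sqrt(10000 * 0.15 * 0.85), about 35.7. The sketch below applies a normal approximation under that fixed-count reading. The function name and the approach are illustrative assumptions, not part of the original calculation.

```cpp
#include <cassert>
#include <cmath>

// Approximate probability that the candidate with fewer true votes wins,
// when the true counts are fixed and each vote flips independently.
double approxInvalidProbability(int votesLower, int votesHigher, double errorRate) {
    assert(votesLower >= 0 && votesHigher >= votesLower);
    assert(votesLower + votesHigher > 0);
    assert(errorRate > 0.0 && errorRate < 1.0);

    // net = (flips among the leader's votes) - (flips among the trailer's votes).
    // The trailing candidate wins when 2 * net > votesHigher - votesLower.
    double meanNet = (votesHigher - votesLower) * errorRate;
    double sdNet = std::sqrt((votesHigher + votesLower) * errorRate * (1.0 - errorRate));
    int threshold = (votesHigher - votesLower) / 2;  // win requires net > threshold

    double z = (threshold + 0.5 - meanNet) / sdNet;  // +0.5 is the continuity correction
    return 0.5 * std::erfc(z / std::sqrt(2.0));      // upper tail of the standard normal
}
```

For 4975 and 5025 true votes with an error rate of 0.15, this should give a value a little above 0.30, in line with the simulated result in the question.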

(Update: To be sure, I also wrote a quick Python program that solves the problem with a different technique, and it also gives about 30%.)

Update 2: I woke up this morning with an idea for computing the exact probability, and I had to try it out. So here is a way to find it with scipy:

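import scipy.stats as ss

numVoters = 10000
voterSpread = 0.005
votingError = 0.15
correctVotersLower = int(numVoters*(0.5 - 0.5*voterSpread))
correctVotersHigher = int(numVoters*(0.5 + 0.5*voterSpread))
votersDifference = correctVotersHigher - correctVotersLower
# The trailing candidate wins when (errors on the leader's votes) minus
# (errors on the trailer's votes) exceeds votersDifference / 2.
# sf(k) is P(X > k), so the integer threshold is the floor of that half-difference.
minHighErrors = votersDifference // 2
lowerErrorDist = ss.binom(correctVotersLower, votingError)
higherErrorDist = ss.binom(correctVotersHigher, votingError)
# Sum over every possible number x of errors among the trailer's votes.
print(sum(higherErrorDist.sf(x + minHighErrors) * lowerErrorDist.pmf(x)
          for x in range(correctVotersLower + 1)))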

The probability I get is about 0.305598.
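Since the question is about C++, the same exact sum can be sketched without scipy. This is an assumed port, not code from the answer: binomialPmf and exactInvalidProbability are hypothetical names, and the pmf is evaluated in log space with std::lgamma to avoid overflow in the binomial coefficients.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// P(X = k) for k = 0..n with X ~ Binomial(n, p), computed via logarithms.
std::vector<double> binomialPmf(int n, double p) {
    assert(n >= 0 && p > 0.0 && p < 1.0);
    std::vector<double> pmf(n + 1);
    for (int k = 0; k <= n; ++k) {
        double logProb = std::lgamma(n + 1.0) - std::lgamma(k + 1.0)
                       - std::lgamma(n - k + 1.0)
                       + k * std::log(p) + (n - k) * std::log1p(-p);
        pmf[k] = std::exp(logProb);
    }
    return pmf;
}

// Probability that the candidate with fewer true votes wins the recorded count.
double exactInvalidProbability(int votesLower, int votesHigher, double errorRate) {
    assert(votesLower >= 0 && votesHigher >= votesLower);
    std::vector<double> lowerPmf = binomialPmf(votesLower, errorRate);
    std::vector<double> higherPmf = binomialPmf(votesHigher, errorRate);

    // higherSf[k] = P(errors among the leader's votes > k), summed from the top down.
    std::vector<double> higherSf(votesHigher + 1, 0.0);
    for (int k = votesHigher - 1; k >= 0; --k) {
        higherSf[k] = higherSf[k + 1] + higherPmf[k + 1];
    }

    // The trailer wins when leaderErrors - trailerErrors > threshold.
    int threshold = (votesHigher - votesLower) / 2;
    double total = 0.0;
    for (int x = 0; x <= votesLower; ++x) {
        // x + threshold never exceeds votesHigher, so the index is in range.
        total += lowerPmf[x] * higherSf[x + threshold];
    }
    return total;
}
```

Calling exactInvalidProbability(4975, 5025, 0.15) should land close to the figure quoted above, up to floating-point rounding.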