Winsock recv giving gibberish mixed with useful html

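I'm trying to get the HTML source of a web page (www.chemguide.co.uk, which isn't a particularly long page) using Winsock in C++. Most of the data comes through fine, but at certain points in the output one particular character (it looks like _ in the console and like Ì here) is repeated, in groups of 8 I think, along with some other strange characters.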

Also, some parts of the document seem to be printed after the end of the page (the closing </HTML> tag). Here is the code:

// Portprog.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#include <winsock2.h>
#include <sys/types.h>
#include <stdio.h>
#include <iostream>
#include <string>
#include <fstream>

#pragma comment(lib, "ws2_32.lib") //Winsock library
int getHTML(std::string *result)
{
    WSADATA wsa;
    SOCKET s;
    SOCKADDR_IN server;
    using std::string;
    using std::cout;
    using std::endl;
    cout << "Initialising Winsock...";
    if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0)
    {
        cout << "Failed. Error Code: " << WSAGetLastError();
        return 1;
    }
    cout << "Winsock initialised." << endl;
    if ((s = socket(AF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET)
    {
        cout << "Could not create socket: " << WSAGetLastError() << endl;
        return 1;
    }
    cout << "Socket created." << endl;
    server.sin_addr.s_addr = inet_addr("217.27.240.124");
    server.sin_family = AF_INET;
    server.sin_port = htons(80); //host to network endian short
    //Connect to remote server
    if (connect(s, (SOCKADDR *)&server, sizeof(server)) < 0)
    {
        cout << "Connection failed." << endl;
        return 1;
    }
    cout << "Connected." << endl;
    //Send some data
    string srequest = "GET / HTTP/1.1\r\n";
    srequest += "Host: chemguide.co.uk\r\n";
    srequest += "Connection: close\r\n";
    srequest += "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\n";
    srequest += "\r\n";
    char crequest[10000];
    int requestSize = srequest.length() + 1;
    strncpy_s(crequest, srequest.c_str(), requestSize);
    if (send(s, crequest, requestSize, 0) < 0)
    {
        cout << "Data could not be sent." << endl;
        return 1;
    }
    cout << "Data sent." << endl;
    //Receive a reply from the server
    std::string server_reply = "";
    int recv_length;
    char buffer[1000];
    int i = 0;
    do
    {
        i = recv_length = recv(s, buffer, sizeof(buffer), 0);
        server_reply += buffer;
    } while (i != 0);
    cout << "Reply received." << endl;
    *result = server_reply;
    closesocket(s);
    WSACleanup();
    return 0;
}
int main(int argc, char *argv[])
{
    std::string response = "";
    getHTML(&response);
    std::cout << response << std::endl;
    std::ofstream file("output.txt");
    file << response;
    file.close();
    return 0;
}

Here is the output:

HTTP/1.1 200 OK
Date: Mon, 03 Aug 2015 00:22:17 GMT
Server: Apache/2.2.11
Last-Modified: Mon, 13 Apr 2015 11:56:25 GMT
ETag: "99190a-1ec2-51399cdaacc40"
Accept-Ranges: bytes
Content-Length: 7874
Connection: close
Content-Type: text/html


<html>
<head>
<title>chemguide:  helping you to understand Chemistry - Main Menu</title>
<meta name="description"
content="Main menu of a site aimed to help advanced level chemistry students to understand chemistry" />
<meta name="keywords"
content="chemistry, A'level, a level, a'level, a-level, advanced level, advanced, help, understand, understanding, guide, guidebook" />

</head>
<body bgcolor="#ffffcc" link="blue" vlink="teal" alink="red">
<a name="top"></a>
<center>
<table align="center" border="0" width="480" cellspacing="10">
<tr>
<td colspan="2" bgcolor="#ccffcc" height="50" align="center" valign="middle">
<font color="#006600" size="7" face="Helvetica, Arial"><b>chemguide</b></font></td>
</tr>

<tr>
<td colspan="2">
<font colorÌÌÌÌÌÌÌÌè="#006600" size="6" face="Helvetica, Arial"><p align="center"><b>Helping you to understand Chemistry</b></p></font>
<font color="#000000" size="5" face="Helvetica, Arial">
<p align="center"><b>MAIN MENU</b></p>
</font>
<pre>
</pre>
<table align="center" cellpadding="10" border="1">
<tr valign="top"><td bgcolor="#cccccc"> <font color="#ff0000" face="Helvetica, Arial" size="2"><b>New!  </b></a></font><font color="#000000" face="Helvetica, Arial" size="2">stry" />
<meta name="keywords"
content="chemistry, A'level, a level, a'level, a-level, advanced level, advanced, help, understand, understanding, guide, guidebook" />

</head>
<body bgcolor="#ffffcc" link="blue" vlink="teal" alink="red">
<a name="top"></a>
<center>
<table align="center" border="0" width="480" cellspacing="10">
<tr>
<td colspan="2" bgcolor="#ccffcc" height="50" align="center" valign="middle">
<font color="#006600" size="7" face="Helvetica, Arial"><b>chemguide</b></font></td>
</tr>

<tr>
<td colspan="2">
<font colorÌÌÌÌÌÌÌÌÌI have just come across a really good site of short chemistry revision videos.  You can find more about it at the top of the <a href="links.html#top"></font>links</a> page.</td></tr>
</table>
<pre>
</pre>
<table align="center" cellpadding="10" border="1">

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="keywordsearch.html#top"><b>Keyword searching</b></a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">I have removed the Google search box because it was giving problems.  Follow this link to find out how you can still search Chemguide using keywords.</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="igcse/index.html"><b>Edexcel Chemistry book</b></a></font></td><td><font color="#ff0000" face="Helvetica, Arial" size="2"><b>Support pages for my Edexcel International GCSE Chemistry book. This will soon be retitled as Edexcel International GCSE Chemistry, Edexcel Certificate inÌÌÌÌÌÌÌÌè Chemistry.</b></font></td></tr>
<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="http://www.chemguideforcie.co.uk/index.html"><b>CIE syllabus support</b></a></font></td><td><font color="#ff0000" face="Helvetica, Arial" size="2"><b>Support pages for CIE (Cambridge International) A level students and teachers.</b></font></td></tr>
<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="atommze="2">I have removed the Google search box because it was giving problems.  Follow this link to find out how you can still search Chemguide using keywords.</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="igcse/index.html"><b>Edexcel Chemistry book</b></a></font></td><td><font color="#ff0000" face="Helvetica, Arial" size="2"><b>Support pages for my Edexcel International GCSE Chemistry book. This will soon be retitled as Edexcel International GCSE Chemistry, Edexcel Certificate inÌÌÌÌÌÌÌÌÌenu.html#top">Atomic Structure and Bonding</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Covers basic atomic properties (electronic structures, ionisation energies, electron affinities, atomic and ionic radii, and the atomic hydrogen emission spectrum), bonding (including intermolecular bonding) and structures (ionic, molecular, giant covalent and metallic).</font></td></tr>
<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="inorgmenu.html#top">Inorganic Chemistry</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Includes essential ideas about redox reactions, and covers the trends in Period 3 and Groups 1, 2, 4 and 7 of the Periodic Table.  Plus: lengthy sections on the chemistry of some important complex ions, and of common transition metals.  Extraction and uses of aluminium, copper, iron, titanium and tungsten.</font></td></tr>
<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" sÌÌÌÌÌÌÌÌèize="2"><a href="physmenu.html#top">Physical Chemistry</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Covers simple kinetic theory, ideal and real gases, chemical energetics, rates of reaction including catalysis, an introduction to chemical equilibria, redox equilibria, acid-base equilibria (pH, buffer solutions, indicators, etc), solubility products, and phase equilibria (including Raoult's Law and the use of various phase diagetica, Arial" size="2"><a href="inorgmenu.html#top">Inorganic Chemistry</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Includes essential ideas about redox reactions, and covers the trends in Period 3 and Groups 1, 2, 4 and 7 of the Periodic Table.  Plus: lengthy sections on the chemistry of some important complex ions, and of common transition metals.  Extraction and uses of aluminium, copper, iron, titanium and tungsten.</font></td></tr>
<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" sÌÌÌÌÌÌÌÌÌrams).</font></td></tr>
<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="analysismenu.html#top">Instrumental analysis</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Explains how you can analyse substances using machines - mass spectrometry,  infra-red spectroscopy, NMR, UV-visible absorption spectrometry and chromatography.</font></td></tr>
<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="orgmenu.html#top">Basic Organic Chemistry</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Includes help on bonding, naming and isomerism, and a discussion of organic acids and bases.</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="orgpropsmenu.html#top">Properties of organic compounds</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Covers the physical and chemical properties of compounds on UK A ÌÌÌÌÌÌÌÌèlevel chemistry syllabuses, and includes a limited amount of biochemistry.</font></td></tr>
<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="mechmenu.html#top">Organic Reaction Mechanisms</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Covers all the mechanisms required by the current UK A level chemistry syllabuses.</font></td></tr>
<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="about.html#top">About this site</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Includes a contact address if you have found any difficulties with the site.</font></td></tr>
<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="qandclist.html#top">Questions and comments</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">A selection of questions that I have been asked lots of times about Chemguide together with a few general commenÌÌÌÌÌÌÌÌèts.  There are also a number of chemistry questions that I have been asked and which I haven't been able to find good answers for!</font></td></tr>
<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="book.html#top">Chemistry Calculations</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">A description of the author's book on calculations at UK A level chemistry standard.</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="suggestions.html#top">Textbook suggestions</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Suggestions for textbooks and revision guides covering the UK AS and A level chemistry syllabuses, with links to Amazon.co.uk if you want to follow them up.</font></td></tr>
<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="syllabushave been asked lots of times about Chemguide together with a few general commenÌÌÌÌÌÌÌ̘es.html#top">Download syllabuses</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">For UK students and international students using UK exams (e.g. Cambridge International).  Download a copy of your current syllabus from your examiners.</font></td></tr>
<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="links.html">Links</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">A random collection of links to sites that I have found interesting or useful.  You will find it is a fairly quirky collection - that's deliberate.</font></td></tr>
</table>
<pre>
</pre>
<hr />
<p><font color="#000000" size="2" face="Helvetica, Arial"> &copy; Jim Clark 2009 (last modified September 2013)</font></p>
</td>
</tr>
</table></center>
</BODY>
</HTML>
tr>
<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="syllabushave been asked lots of times about Chemguide together with a few general commenÌÌÌÌÌÌÌÌ6es.html#top">Download syllabuses</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">For UK students and international students using UK exams (e.g. Cambridge International).  Download a copy of your current syllabus from your examiners.</font></td></tr>
<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="links.html">Links</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">A random collection of links to sites that I have found interesting or useful.  You will find it is a fairly quirky collection - that's deliberate.</font></td></tr>
</table>
<pre>
</pre>
<hr />
<p><font color="#000000" size="2" face="Helvetica, Arial"> &copy; Jim Clark 2009 (last modified September 2013)</font></p>
</td>
</tr>
</table></center>
</BODY>
</HTML>
tr>
<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="syllabushave been asked lots of times about Chemguide together with a few general commenÌÌÌÌÌÌÌÌ

I'm using Visual Studio 2013. This is my stdafx.h file:

// stdafx.h : include file for standard system include files,
// or project specific include files that are used frequently, but
// are changed infrequently
//
#pragma once
#define _WINSOCK_DEPRECATED_NO_WARNINGS
//#define _CRT_SECURE_NO_WARNINGS
#include "targetver.h"
#include <stdio.h>
#include <tchar.h>

// TODO: reference additional headers your program requires here

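The problem is that you treat the data you read as a string, but you seem to have forgotten that C-style strings in C++ are terminated by the special character '\0'.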

So you need to read one character less than the size of the buffer, and then terminate what you read by adding the terminator at the end:

i = recv_length = recv(s, buffer, sizeof(buffer) - 1, 0);
if (i >= 0)
    buffer[i] = '\0';

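The reason you get gibberish is that when you append the buffer to the string server_reply, operator+= looks for this terminator to find the end of the string being appended. If there is no terminator, operator+= keeps going until it happens to find a byte equal to the terminator character, which may even lie beyond the bounds of buffer. Not terminating the string leads to undefined behavior. The same stale bytes explain the text after </HTML>: when recv returns fewer bytes than the previous call, the tail of the earlier chunk is still in buffer and gets appended again, and your loop appends buffer once more even after recv has returned 0.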
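To see this mechanism without actually invoking undefined behavior, the sketch below fakes a single short recv: three bytes land in a 16-byte buffer whose other bytes hold 0xCC. That is the value a Visual C++ debug build typically fills uninitialized stack memory with, and Windows-1252 displays it as Ì, which is very likely where the runs of Ì in the output come from. A '\0' is deliberately planted in the last byte so `+=` stops inside the array; the names `AppendDemo` and `demonstrateAppend` are hypothetical.

```cpp
#include <cstring>
#include <string>

// Result of appending the same buffer in two different ways.
struct AppendDemo {
    std::string viaOperator;   // what server_reply += buffer produces
    std::string viaLength;     // what server_reply.append(buffer, n) produces
};

AppendDemo demonstrateAppend()
{
    char buffer[16];
    std::memset(buffer, 0xCC, sizeof(buffer));  // junk bytes, like a debug-build stack
    buffer[sizeof(buffer) - 1] = '\0';          // the first zero byte += will find
    const int n = 3;                            // pretend recv returned 3
    std::memcpy(buffer, "abc", n);              // no terminator written after the data

    AppendDemo d;
    d.viaOperator += buffer;                    // scans for '\0': "abc" plus 12 junk bytes
    d.viaLength.append(buffer, n);              // copies exactly n bytes: "abc"
    return d;
}
```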


Also, you don't check for errors when receiving. What do you think happens if recv returns SOCKET_ERROR (which is not zero)? You end up in an infinite loop.
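A sketch of a receive loop that does check for errors, written as a hypothetical helper `receiveAll`. It stops on SOCKET_ERROR, stops on 0 (the server closed the connection, which the `Connection: close` request header asks for), and uses `std::string::append(buffer, n)` so no terminator is needed at all. The POSIX branch under `#else` is an assumption added only so the function also compiles outside Windows; in the original program only the Winsock branch matters.

```cpp
#ifdef _WIN32
#include <winsock2.h>
#pragma comment(lib, "ws2_32.lib")
#else
// Assumption: POSIX fallback so the same function builds outside Windows.
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
using SOCKET = int;
const int SOCKET_ERROR = -1;
#endif
#include <string>

// Reads from s until the peer closes the connection.
// Returns true on an orderly close, false if recv reported an error.
bool receiveAll(SOCKET s, std::string& reply)
{
    char buffer[1000];
    for (;;) {
        int n = static_cast<int>(recv(s, buffer, static_cast<int>(sizeof(buffer)), 0));
        if (n == SOCKET_ERROR)
            return false;            // error: report it instead of looping forever
        if (n == 0)
            return true;             // server closed the connection: all data read
        reply.append(buffer, n);     // append exactly the n bytes received
    }
}
```

In getHTML, the do/while loop could then become something like `if (!receiveAll(s, server_reply)) cout << "recv failed: " << WSAGetLastError() << endl;`.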