How to intercept API method calls in a 64bit process?

Keywords: API, method call, process. Updated: 2023-10-16

Background

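I am working on a legacy product that injects a DLL into a process and successfully intercepts the method calls that process tries to make into arbitrary DLLs, in particular the gdi32.dll library. Unfortunately, it does not work when it is injected into a 64-bit application. This has become a hot topic, and it is time to upgrade its functionality. Also unfortunately, the source has no comments (typical >:-<), and from the looks of it, whoever wrote it was quite familiar with the x86 instruction set. I have not done assembly work in years, and when I did, it was Motorola assembly.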

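After searching the internet, I came across this article by an Intel employee. If our source code did not predate the article by about 7 years, I would say this is exactly where our Mr. NoComments developer learned to do API method interception. That is how similar the procedures are. The article is also summarized in a nice PDF (Intercepting System API Calls), which is linked from the same site.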

Question

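I would like to really understand the example given in the Intel article so that I can build a solid solution for the 64-bit scenario. It is well documented and easier for me to follow. Below is an excerpt of the InterceptAPI() routine. I have added my own comments, marked with "//#" (the original comments use the standard "//"), in which I explain what I think I know and what I do not: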

BOOL InterceptAPI(HMODULE hLocalModule, const char* c_szDllName,
    const char* c_szApiName, DWORD dwReplaced, DWORD dwTrampoline, int offset) 
{ 
    //# Just a foreword.  One of the bigger mysteries of this routine to me is
    //# this magical number 5 and the offset variable.  Now I'm assuming, that
    //# there are 5 bytes at the beginning of every method that are basically 
    //# there to set up some sort of pre-method-jump context switch, since its
    //# about to leave the current method and jump to another.  So I'm guessing
    //# that for all scenarios, the minimum number of bytes is 5, but for some
    //# there may be more than 5 bytes so that's what the "offset" variable is
    //# for. In the aforementioned article, the author writes "One additional 
    //# complication exists, in that the sixth byte of the original code may be
    //# part of the previous instruction. In that case, the function overwrites
    //# part of the previous instruction and then crashes."  So some method
    //# starting code contains multi-byte opcodes while others don't apparently.
    //# And if you don't know the instruction set well enough, I'm guessing
    //# you'll just have to figure it out by trial and error.
    int i; 
    DWORD dwOldProtect;
    //# Fetching the address of the method that we want to capture and reroute
    //# Example: c_szDllName="gdi32",   c_szApiName="SelectObject"
    DWORD dwAddressToIntercept = (DWORD)GetProcAddress( 
        GetModuleHandle((char*)c_szDllName), (char*)c_szApiName); 

    //# Storing address of method we are about to intercept in another variable
    BYTE *pbTargetCode = (BYTE *) dwAddressToIntercept;
    //# Storing address of method we are going to use to take the place of the 
    //# intercepted method in another variable.
    BYTE *pbReplaced = (BYTE *) dwReplaced; 
    //# "Trampoline" appears to be a "Microsoft Detours" term, but its basically
    //# a pointer so that we can get to the original "implementation" of the method
    //# we are intercepting.  Most of the time your replacement function will
    //# want to call the original function so this is pretty important.  What its
    //# pointing to must already be pre allocated by the caller.  The author of
    //# the aforementioned article states "Prepare a dummy function that has the
    //# same declaration that will be used as the trampoline. Make sure the dummy
    //# function is more than 10 bytes long." I believe I'd prefer allocating this
    //# memory within this function itself just to make using this InterceptAPI()
    //# method easier, but this is the implementation as it stands.
    BYTE *pbTrampoline = (BYTE *) dwTrampoline; 

    // Change the protection of the trampoline region 
    // so that we can overwrite the first 5 + offset bytes.
    //# This is voodoo magic to me, but I'm guessing you just can't hop on the
    //# stack and start changing execute instructions without ringing some
    //# alarms, so this makes sure the alarms don't ring. Here we are allowing
    //# permissions so we can change the bytes at the beginning of our
    //# trampoline method.
    VirtualProtect((void *) dwTrampoline, 5+offset, PAGE_WRITECOPY, &dwOldProtect); 
    //# More voodoo magic to me, but this appears to be a way to copy over extra
    //# opcodes that may be needed.  Some opcodes are multi byte I believe so this
    //# is where you can make sure you don't miss them.
    for (i=0;i<offset;i++) 
        *pbTrampoline++ = *pbTargetCode++; 
    //# Resetting the pbTargetCode pointer since it was modified it in the above
    //# for loop.
    pbTargetCode = (BYTE *) dwAddressToIntercept; 

    // Insert unconditional jump in the trampoline.
    //# This is pretty understandable.  0xE9 the x86 JMP command.  I looked
    //# this up in Intel's documentation and it can be followed by a 16-bit
    //# offset or a 32-bit offset. The 16-bit version is not supported in 64-bit
    //# architecture but lets just hope they are all 32-bit and that this does
    //# indeed do what it is intended in 64-bit scenarios
    *pbTrampoline++ = 0xE9;        // jump rel32 
    //# So basically here it looks like we are following up our jump command with
    //# the address its supposed to jump too.  This is a relative offset, that's why
    //# we are subtracting pbTargetCode and pbTrampoline.  Also, since JMP opcodes
    //# jump relative to the address AFTER the jump address, that's why we are
    //# adding 4 to pbTrampoline.  Also, offset is added to pbTargetCode because we
    //# advanced the pointers in the for loop above an "offset" number of bytes.
    *((signed int *)(pbTrampoline)) = (pbTargetCode+offset) - (pbTrampoline + 4); 
    //# Not quite sure why we are changing the permissions on the trampoline function
    //# again, but looks like we are making it executable here.  Maybe this is the
    //# last thing we have to do before it is actually callable and usable.
    VirtualProtect((void *) dwTrampoline, 5+offset, PAGE_EXECUTE, &dwOldProtect); 

    // Overwrite the first 5 bytes of the target function 
    //# It seems we are now setting permissions so we can modify the original
    //# intercepted routine.  It is still pointing to its original code so we
    //# need to eventually redirect it.
    VirtualProtect((void *) dwAddressToIntercept, 5, PAGE_WRITECOPY, &dwOldProtect); 
    //# This will now instruct the original method to instead jump to the next
    //# address it sees on the stack.
    *pbTargetCode++ = 0xE9;        // jump rel32
    //# this is the address we want our original intercepted method to jump to.
    //# Where its jumping to will have the code of our replacement method.
    //# The "+ 4" is because the jump occurs relative to the address of the
    //# NEXT instruction after the 4byte address.
    *((signed int *)(pbTargetCode)) = pbReplaced - (pbTargetCode +4); 
    //# Changing the permissions of our original intercepted routine back to execute
    //# permissions so it can be called by other methods.
    VirtualProtect((void *) dwAddressToIntercept, 5, PAGE_EXECUTE, &dwOldProtect); 

    // Flush the instruction cache to make sure  
    // the modified code is executed.
    //# I guess this is just to make sure that if any instructions from the old
    //# state of the methods we changed, have wound up in cache, that it gets
    //# purged out of there before it gets used.
    FlushInstructionCache(GetCurrentProcess(), NULL, NULL); 
    return TRUE; 
}

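I think I have a good understanding of what is going on in this code. So the million-dollar question is: what about this does not work in a 64-bit process? My first thought was, "Oh, addresses should be 8 bytes now, so that must be what is wrong." But I believe the JMP instruction still takes only a relative 32-bit offset, so the opcode should still be valid with a 32-bit offset in a 64-bit process. Other than that, the only thing I can think of is that our magical 5 bytes at the beginning of a method are actually some other magical number. Does anyone have better insight?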

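Note: I know there are other solutions out there, such as "Microsoft Detours" and "EasyHook". The former is too expensive, and I am currently exploring the latter, but so far it has been disappointing. So I would like to keep the discussion specifically on this topic. I find it interesting, and it is also the best fit for my problem. So please, no "Hey, I don't know anything about this post, but try {insert third-party solution here}".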

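Answer

Since the proposed code appears to target a Microsoft platform, I suggest you just use Detours. With Detours, your trampolines will work on both 32-bit and 64-bit systems.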

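There are a number of things that do not work in your example.

1) You are calling VirtualProtect with PAGE_WRITECOPY, which will fail. You want to call VirtualProtect with PAGE_EXECUTE_READWRITE.

2) If your "shim" is more than 2 GB away from the DLL you are trying to hook, your patch jump will not work, because you are using the E9 form of the jmp instruction, whose 32-bit displacement is signed.

3) When you restore the protection with VirtualProtect, you are setting PAGE_EXECUTE rather than PAGE_EXECUTE_READ. In practice, you should really reuse the old protection value that the first VirtualProtect call returned, so that the page goes back exactly as it was.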
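Point 2 can be checked before patching. The sketch below is an illustration only, not code from the Intel article: `rel32_reachable` and `encode_jmp_abs64` are hypothetical helper names, and addresses are passed as plain integers so the arithmetic can be followed without a live process. It tests whether a 5-byte E9 jump can span the distance and, when it cannot, shows one commonly used 64-bit alternative: `FF 25 00 00 00 00` (`jmp qword ptr [rip+0]`) followed by the 8-byte absolute target, 14 bytes in total.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define JMP_REL32_SIZE 5   /* E9 + 4-byte signed displacement */
#define JMP_ABS64_SIZE 14  /* FF 25 00000000 + 8-byte absolute address */

/* Returns 1 if an E9 jump placed at address `from` can reach `to`. */
static int rel32_reachable(uint64_t from, uint64_t to)
{
    /* The displacement is measured from the end of the 5-byte instruction. */
    int64_t disp = (int64_t)(to - (from + JMP_REL32_SIZE));
    return disp >= INT32_MIN && disp <= INT32_MAX;
}

/* Writes "jmp qword ptr [rip+0]" followed by the absolute target address.
 * Hypothetical helper: buf must have room for JMP_ABS64_SIZE bytes. */
static size_t encode_jmp_abs64(uint8_t *buf, uint64_t to)
{
    assert(buf != NULL);
    buf[0] = 0xFF;
    buf[1] = 0x25;
    buf[2] = buf[3] = buf[4] = buf[5] = 0x00;   /* disp32 = 0 */
    for (int i = 0; i < 8; i++)                  /* little-endian address */
        buf[6 + i] = (uint8_t)(to >> (8 * i));
    return JMP_ABS64_SIZE;
}
```

The longer form overwrites 14 bytes of the target rather than 5, so the trampoline has to preserve at least that many bytes of whole instructions. Allocating the shim's memory within 2 GB of the target avoids the problem and keeps the 5-byte patch.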

By the way, the "magic number 5" is the size of the E9 jump instruction: E9 as one byte, followed by a DWORD holding the offset.
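To make the 5 bytes concrete, here is a minimal sketch of how such a jump could be encoded. `encode_jmp_rel32` is a hypothetical helper, not part of the posted code, and it takes the patch location and destination as integers rather than live pointers. The displacement is relative to the address of the instruction that follows the jump, which is why 5 is subtracted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define JMP_REL32_SIZE 5  /* 1 opcode byte (0xE9) + 4 displacement bytes */

/* Encodes "jmp rel32" into buf as if buf were located at address `from`.
 * Returns 1 on success, 0 if `to` is outside the signed 32-bit range. */
static int encode_jmp_rel32(uint8_t *buf, uint64_t from, uint64_t to)
{
    assert(buf != NULL);
    int64_t disp = (int64_t)(to - (from + JMP_REL32_SIZE));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return 0;

    uint32_t rel = (uint32_t)(int32_t)disp;
    buf[0] = 0xE9;
    for (int i = 0; i < 4; i++)               /* little-endian displacement */
        buf[1 + i] = (uint8_t)(rel >> (8 * i));
    return 1;
}
```

For a jump placed at 0x1000 that should land at 0x2000, the displacement is 0x2000 - 0x1005 = 0xFFB, so the bytes written are E9 FB 0F 00 00.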

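The trampoline exists so that you can call back into the original API from inside your own code (i.e., if you are shimming CreateFileW, you cannot call CreateFileW from inside the shim, or you would end up calling your shim again!).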
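The layout of a trampoline can be sketched as follows. This is an illustration under stated assumptions rather than the article's implementation: `build_trampoline` is a hypothetical name, the code works on byte buffers plus the addresses they would occupy, and `stolen` plays the role of `offset` in the posted routine, that is, the number of bytes of whole instructions copied from the start of the target (at least 5, so the patch jump fits).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fills `tramp` (at least stolen + 5 bytes) with:
 *   [stolen bytes copied from the target][E9 rel32 back to target + stolen]
 * Returns 1 on success, 0 if stolen < 5 or the jump back is out of range. */
static int build_trampoline(uint8_t *tramp, uint64_t tramp_addr,
                            const uint8_t *target_code, uint64_t target_addr,
                            size_t stolen)
{
    assert(tramp != NULL && target_code != NULL);
    if (stolen < 5)
        return 0;

    /* 1. Preserve the instructions that the patch jump will overwrite. */
    memcpy(tramp, target_code, stolen);

    /* 2. Jump back to the first untouched instruction of the target. */
    uint64_t jmp_at = tramp_addr + stolen;
    int64_t disp = (int64_t)((target_addr + stolen) - (jmp_at + 5));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return 0;

    uint32_t rel = (uint32_t)(int32_t)disp;
    tramp[stolen] = 0xE9;
    for (int i = 0; i < 4; i++)
        tramp[stolen + 1 + i] = (uint8_t)(rel >> (8 * i));
    return 1;
}
```

One caveat with this simple copy: it only works when the copied instructions are position independent. A relative jump or call, or a RIP-relative operand on x64, would need its displacement adjusted after being moved.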

The call to FlushInstructionCache has no effect on x86/x64. You should remove it.
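One further observation, added here as a reading of the posted code rather than something stated in the list above: InterceptAPI takes its addresses as DWORD parameters and casts the result of GetProcAddress to DWORD. A DWORD is 32 bits wide even in a 64-bit process, so the upper half of every pointer is discarded before any patching happens. The sketch below uses `uint32_t` as a stand-in for DWORD, and `roundtrip_through_dword` is a hypothetical name, so that the effect can be seen without the Windows headers.

```c
#include <assert.h>
#include <stdint.h>

/* uint32_t stands in for the Win32 DWORD type, which is always 32 bits. */
static_assert(sizeof(uint32_t) < sizeof(uint64_t),
              "a 32-bit value cannot hold a 64-bit address");

/* Mimics what "(DWORD)GetProcAddress(...)" followed by "(BYTE *)dwAddress"
 * does to an address in a 64-bit process. */
static uint64_t roundtrip_through_dword(uint64_t address)
{
    uint32_t as_dword = (uint32_t)address;  /* upper 32 bits are lost here */
    return (uint64_t)as_dword;
}
```

An address such as 0x00007FFB12345678 comes back as 0x12345678, which no longer points at the function. Using a pointer-sized type for these parameters (for example DWORD_PTR, uintptr_t, or plain pointer types) would avoid the truncation.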