Articles posted by blast

Pierce Your Own Shield with Your Own Spear? Fuzzing NScript with ChakraCore (2)

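A long time ago I built a multi-process fuzzing program. The final implementation turned out to be very simple: two programs talk to each other over a socket, much like two QQ clients sending each other messages. The server and the client agree on a set of messages and exchange them.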

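The code is simple, and a quick search turns up plenty of examples, such as http://blog.csdn.net/yaopeng_2005/article/details/6696105, which can be used directly. On the NScript side, I need to pass the code under test to the rs function, and the whole pipeline is allowed to block until that hand-off is complete. So NScript, acting as the client, only needs to keep waiting for fuzz code from the server, take the code out of the buffer, and pass it to rs. Note that the sample code below names the roles the other way round: callServer.cpp (binds UDP port 27015 and receives) is the NScript side, and callClient.cpp (sends) is the ChakraCore side.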

nscript.png

callClient.cpp

// callClient.cpp : Defines the entry point for the console application.
// ChakraCore side: sends "<index>,<path of generated fuzz file>" over UDP.
//

#include "stdafx.h"
#include <stdio.h>
#include <string.h>
#include <Winsock2.h>

#pragma comment(lib, "ws2_32.lib")

int main()
{
    WORD wVersionRequested = MAKEWORD(1, 1);
    WSADATA wsaData;
    int err = WSAStartup(wVersionRequested, &wsaData);

    if (err != 0)
    {
        return 1;
    }

    if (LOBYTE(wsaData.wVersion) != 1 || HIBYTE(wsaData.wVersion) != 1)
    {
        WSACleanup();
        return 1;
    }

    SOCKADDR_IN local;
    local.sin_addr.S_un.S_addr = inet_addr("127.0.0.1");
    local.sin_family = AF_INET;
    local.sin_port = htons(27015);
    int len = sizeof(SOCKADDR);

    for (int index = 0; ; index++)
    {
        SOCKET sockClient = socket(AF_INET, SOCK_DGRAM, 0);
        if (sockClient == INVALID_SOCKET)
        {
            break;
        }

        char sendBuf[MAX_PATH] = { 0 };
        sprintf(sendBuf, "%3d,", index);

        // The client (ChakraCore) sends a file name rather than the code itself,
        // because a UDP datagram is limited to about 64 KB.
        strcat(sendBuf, "c:\\test\\wow.js");
        sendto(sockClient, sendBuf, (int)strlen(sendBuf) + 1, 0, (SOCKADDR*)&local, len);

        /*char recvBuf[50];
        recvfrom(sockClient,recvBuf,50,0,(SOCKADDR*)&local,&len);
        printf("my reply is : %s\n",recvBuf);
        printf("%s\n",inet_ntoa(local.sin_addr));
        */

        closesocket(sockClient);
        Sleep(2000);
    }

    // Winsock is shut down once, after the loop, not on every iteration.
    WSACleanup();
    return 0;
}

callServer.cpp

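// callServer.cpp : Defines the entry point for the console application.
// NScript side: receives "<index>,<path>" datagrams on UDP port 27015.
//

#include "stdafx.h"
#include <stdio.h>
#include <Winsock2.h>

#pragma comment(lib, "ws2_32.lib")

int main()
{
    WORD wVersionRequested = MAKEWORD(1, 1);
    WSADATA wsaData;
    int err = WSAStartup(wVersionRequested, &wsaData);

    if (err != 0) {
        return 1;
    }

    if (LOBYTE(wsaData.wVersion) != 1 || HIBYTE(wsaData.wVersion) != 1) {
        WSACleanup();
        return 1;
    }

    SOCKET sockSrv = socket(AF_INET, SOCK_DGRAM, 0);
    if (sockSrv == INVALID_SOCKET) {
        WSACleanup();
        return 1;
    }

    SOCKADDR_IN from;
    SOCKADDR_IN local;
    local.sin_addr.S_un.S_addr = htonl(INADDR_ANY);
    local.sin_family = AF_INET;
    local.sin_port = htons(27015);

    if (bind(sockSrv, (SOCKADDR*)&local, sizeof(local)) == SOCKET_ERROR) {
        closesocket(sockSrv);
        WSACleanup();
        return 1;
    }

    while (1)
    {
        char recvBuf[MAX_PATH];
        int fromLen = sizeof(from);   // in/out parameter: reset before every call

        // The server (NScript) receives a buffer containing the file name of the generated fuzz case.
        int n = recvfrom(sockSrv, recvBuf, sizeof(recvBuf) - 1, 0, (SOCKADDR*)&from, &fromLen);
        if (n == SOCKET_ERROR) {
            if (WSAGetLastError() == WSAEMSGSIZE) continue;   // oversized datagram: skip it
            break;
        }
        recvBuf[n] = '\0';            // do not rely on the sender's terminator

        printf("%s\n", recvBuf);
        printf("%s\n", inet_ntoa(from.sin_addr));   // address of the sender

        /*char sendBuf[50];
        sprintf(sendBuf, "Welcome %s to here!", inet_ntoa(from.sin_addr));
        sendto(sockSrv, sendBuf, strlen(sendBuf) + 1, 0, (SOCKADDR*)&from, fromLen);
        */
        // recvfrom blocks until the next datagram arrives, so no Sleep is needed.
    }

    closesocket(sockSrv);
    WSACleanup();
    return 0;
}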
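callClient.cpp sends a datagram made of the text "%3d," followed by a path. The NScript side therefore has to split the message before it can load the file and call rs. Below is a minimal sketch in portable C++ that assumes exactly that layout. BuildMessage, ParseMessage and FuzzMessage are hypothetical names and do not appear in either program:

```cpp
#include <cassert>
#include <cstdio>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical representation of one datagram: "%3d,<path>".
struct FuzzMessage {
    int index;
    std::string path;
};

// Builds the same text that callClient.cpp builds with sprintf + strcat.
std::string BuildMessage(int index, const std::string& path) {
    char prefix[16];
    std::snprintf(prefix, sizeof(prefix), "%3d,", index);
    return prefix + path;
}

// Splits a received datagram back into index and path.
// Returns std::nullopt if there is no comma or the index is not a number.
std::optional<FuzzMessage> ParseMessage(const std::string& msg) {
    std::size_t comma = msg.find(',');
    if (comma == std::string::npos) return std::nullopt;
    std::string head = msg.substr(0, comma);
    try {
        std::size_t used = 0;
        int index = std::stoi(head, &used);  // skips the leading padding spaces
        if (head.find_first_not_of(' ', used) != std::string::npos) return std::nullopt;
        return FuzzMessage{index, msg.substr(comma + 1)};
    } catch (const std::logic_error&) {      // invalid_argument or out_of_range
        return std::nullopt;
    }
}
```

Splitting at the first comma is safe here because the index never contains one, even when the path does. A receiver would call ParseMessage on each recvBuf, read the file at path, and pass its contents to rs.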

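Handling an exception on the client (NScript) side is even easier. SetUnhandledExceptionFilter installs a top-level exception filter (TopLevelExceptionFilter), which is a function declared as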

LONG WINAPI UnhandledExceptionFilter(__in struct _EXCEPTION_POINTERS* pExceptionInfo)

Inside that function, pExceptionInfo provides the exception information, and we can record it.

So the implementation is fairly simple:
1. On the NScript side, call SetUnhandledExceptionFilter() on entry to WinMain:

int WinMain(...)
{
    ...
    LPTOP_LEVEL_EXCEPTION_FILTER originalFilter = SetUnhandledExceptionFilter(MyUnhandledExceptionFilter);
    ...
}

2. Inside MyUnhandledExceptionFilter, collect the exception information and raise an alert:

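LONG WINAPI MyUnhandledExceptionFilter(__in struct _EXCEPTION_POINTERS* pExcepInfo)
{
    DWORD dwCrashPid = GetCurrentProcessId();
    DWORD dwCrashTid = GetCurrentThreadId();

    // ExceptionRecord is a pointer (PEXCEPTION_RECORD), so it is dereferenced with ->
    DWORD dwCode = pExcepInfo->ExceptionRecord->ExceptionCode;
    PVOID pAddress = pExcepInfo->ExceptionRecord->ExceptionAddress;

    // alert! record pid, tid, code and address, and notify the fuzzing controller
    ...

    // let the process terminate once the crash has been recorded
    return EXCEPTION_EXECUTE_HANDLER;
}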

3. On the NScript side, loop and wait, and call __rs whenever data arrives.

4. On the ChakraCore side:

JsValueRef __stdcall WScriptJsrt::EchoCallback(JsValueRef callee, bool isConstructCall, JsValueRef *arguments, unsigned short argumentCount, void *callbackState)
{
    for (unsigned int i = 1; i < argumentCount; i++)
    {
        ……
                if (i > 1)
                {
                    wprintf(_u(" "));
                }

                ////////////////////////////////////////////////////////
                // SEND CODE HERE
                ////////////////////////////////////////////////////////

                wprintf(_u("%ls"), str.GetWideString());
            }
        }

        ……
    }
……
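A crash log is easier to read when the raw exception code has a name. The "alert!" placeholder in step 2 could use a small helper like the sketch below. It assumes the filter passes along the pid, tid, exception code and address it collected. The constants are the standard Windows status values. ExceptionName and FormatCrashLine are hypothetical helpers:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Standard Windows exception/status codes (values from the Windows SDK headers).
constexpr std::uint32_t kAccessViolation    = 0xC0000005; // EXCEPTION_ACCESS_VIOLATION
constexpr std::uint32_t kIllegalInstruction = 0xC000001D; // EXCEPTION_ILLEGAL_INSTRUCTION
constexpr std::uint32_t kIntDivideByZero    = 0xC0000094; // EXCEPTION_INT_DIVIDE_BY_ZERO
constexpr std::uint32_t kStackOverflow      = 0xC00000FD; // EXCEPTION_STACK_OVERFLOW
constexpr std::uint32_t kHeapCorruption     = 0xC0000374; // STATUS_HEAP_CORRUPTION
constexpr std::uint32_t kBreakpoint         = 0x80000003; // EXCEPTION_BREAKPOINT

// Hypothetical helper: short name for a crash log line.
std::string ExceptionName(std::uint32_t code) {
    switch (code) {
    case kAccessViolation:    return "ACCESS_VIOLATION";
    case kIllegalInstruction: return "ILLEGAL_INSTRUCTION";
    case kIntDivideByZero:    return "INT_DIVIDE_BY_ZERO";
    case kStackOverflow:      return "STACK_OVERFLOW";
    case kHeapCorruption:     return "HEAP_CORRUPTION";
    case kBreakpoint:         return "BREAKPOINT";
    default:                  return "UNKNOWN";
    }
}

// Hypothetical helper: one log line from the values MyUnhandledExceptionFilter collects.
std::string FormatCrashLine(std::uint32_t pid, std::uint32_t tid,
                            std::uint32_t code, std::uint64_t address) {
    char buf[160];
    std::snprintf(buf, sizeof(buf), "pid=%lu tid=%lu code=0x%08lX (%s) addr=0x%016llX",
                  static_cast<unsigned long>(pid), static_cast<unsigned long>(tid),
                  static_cast<unsigned long>(code), ExceptionName(code).c_str(),
                  static_cast<unsigned long long>(address));
    return buf;
}
```

Inside the filter, the call would be FormatCrashLine(dwCrashPid, dwCrashTid, dwCode, (std::uint64_t)(uintptr_t)pAddress). Access violations and heap corruption are usually the crashes worth triaging first.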

Fuzzing NScript with ChakraCore

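NScript is a subset of JScript. In an earlier article (http://www.nul.pw/2017/06/12/237.html) I described how to use NScript. I also mentioned the final stripped-down version of Mozilla's Funfuzz (http://www.nul.pw/2017/06/12/235.html), whose chaotic code is a headache to read, and the spectacular sight of ChakraCore running Funfuzz.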

NScript is in fact a very small subset, so it cannot support the huge range of ambitious syntax that Funfuzz uses, as the figure below shows:

nscript.png

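In fact, NScript aside, even ChakraCore, Microsoft's pride and joy, struggles to support all of it. Fortunately, although the Funfuzz source is written with wild abandon, the test cases it finally generates are remarkably "human-friendly".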

Please credit the source when reposting: http://nul.pw/
Author: blast

Still, by and large NScript behaves much like JScript:

Code test:
input:function p(){log(myVar)}; function q(){var myVar=2; p(); log(myVar);}; var myVar=1; log(myVar); q();
evaluation result: 1
evaluation result: 1
evaluation result: 2

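I wanted to capture Funfuzz's output (the generated fuzz statements) without modifying Funfuzz's code, because I really didn't dare touch that pile of variables. So I picked the easier target: ChakraCore's code. ChakraCore's print works much like console.log, so Funfuzz can print each statement and we can record the code at that point.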

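Once we know where print is implemented, we can forward each printed piece of code to NScript, and fuzz ChakraCore and NScript at the same time. I know what you're thinking, but the wsprintf family is not a good hook point. print is not its only caller, because a lot of ChakraCore code emits output through similar functions. So we need to pin down the exact print implementation.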

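print is a very common word, so a text search is useless: it turns up a pile of functions with print in their names. ChakraCore also has no blocking functions such as alert, which makes it hard to know where to start debugging. Fortunately, the Math object has plenty of functions with distinctive names, such as the arctangent function atan. Searching for "atan(" leads to Js::Math::Atan, where we set a breakpoint at the entry of Atan. If you don't want to step through a pile of conversions, put the breakpoint on the single-parameter overload, Atan(double x). Our test code is print(Math.atan(4)). After running it, execution stops at the breakpoint in Atan: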

>   ChakraCore.dll!Js::Math::Atan(double x) Line 263    C++
    ChakraCore.dll!Js::Math::Atan(Js::RecyclableObject * function, Js::CallInfo callInfo, ...) Line 246 C++
    ChakraCore.dll!Js::LocalCallFunction(Js::RecyclableObject * function, void * (Js::RecyclableObject *, Js::CallInfo, ...) * entryPoint, Js::Arguments args, bool doStackProbe) Line 1313 C++
    ChakraCore.dll!Js::JavascriptFunction::CallFunction<1>(Js::RecyclableObject * function, void * (Js::RecyclableObject *, Js::CallInfo, ...) * entryPoint, Js::Arguments args) Line 1329  C++
    ChakraCore.dll!Js::InterpreterStackFrame::OP_CallCommon<Js::OpLayoutDynamicProfile<Js::OpLayoutT_CallIWithICIndex<Js::LayoutSizePolicy<0> > > >(const Js::OpLayoutDynamicProfile<Js::OpLayoutT_CallIWithICIndex<Js::LayoutSizePolicy<0> > > * playout, Js::RecyclableObject * function, unsigned int flags, const Js::AuxArray<unsigned int> * spreadIndices) Line 3902 C++
    ChakraCore.dll!Js::InterpreterStackFrame::OP_ProfileCallCommon<Js::OpLayoutDynamicProfile<Js::OpLayoutT_CallIWithICIndex<Js::LayoutSizePolicy<0> > > >(const Js::OpLayoutDynamicProfile<Js::OpLayoutT_CallIWithICIndex<Js::LayoutSizePolicy<0> > > * playout, Js::RecyclableObject * function, unsigned int flags, unsigned short profileId, unsigned int inlineCacheIndex, const Js::AuxArray<unsigned int> * spreadIndices) Line 3927 C++
    ChakraCore.dll!Js::InterpreterStackFrame::OP_ProfiledCallIWithICIndex<Js::OpLayoutT_CallIWithICIndex<Js::LayoutSizePolicy<0> > >(const Js::OpLayoutDynamicProfile<Js::OpLayoutT_CallIWithICIndex<Js::LayoutSizePolicy<0> > > * playout, unsigned int flags) Line 456    C++
    ChakraCore.dll!Js::InterpreterStackFrame::ProcessProfiled() Line 86 C++
    ChakraCore.dll!Js::InterpreterStackFrame::Process() Line 3452   C++
    ChakraCore.dll!Js::InterpreterStackFrame::InterpreterHelper(Js::ScriptFunction * function, Js::ArgumentReader args, void * returnAddress, void * addressOfReturnAddress, const bool isAsmJs) Line 2039  C++
    ChakraCore.dll!Js::InterpreterStackFrame::InterpreterThunk(Js::JavascriptCallStackLayout * layout) Line 1776    C++

Our real target is of course not Atan but the print that wraps it. Stepping out and following execution, we see ChakraCore fetch the next op and arrive here:

>   ChakraCore.dll!Js::InterpreterStackFrame::OP_CallCommon<Js::OpLayoutDynamicProfile<Js::OpLayoutT_CallIWithICIndex<Js::LayoutSizePolicy<0> > > >(const Js::OpLayoutDynamicProfile<Js::OpLayoutT_CallIWithICIndex<Js::LayoutSizePolicy<0> > > * playout, Js::RecyclableObject * function, unsigned int flags, const Js::AuxArray<unsigned int> * spreadIndices) Line 3888 C++
    ChakraCore.dll!Js::InterpreterStackFrame::OP_ProfileCallCommon<Js::OpLayoutDynamicProfile<Js::OpLayoutT_CallIWithICIndex<Js::LayoutSizePolicy<0> > > >(const Js::OpLayoutDynamicProfile<Js::OpLayoutT_CallIWithICIndex<Js::LayoutSizePolicy<0> > > * playout, Js::RecyclableObject * function, unsigned int flags, unsigned short profileId, unsigned int inlineCacheIndex, const Js::AuxArray<unsigned int> * spreadIndices) Line 3927 C++
    ChakraCore.dll!Js::InterpreterStackFrame::OP_ProfiledCallIWithICIndex<Js::OpLayoutT_CallIWithICIndex<Js::LayoutSizePolicy<0> > >(const Js::OpLayoutDynamicProfile<Js::OpLayoutT_CallIWithICIndex<Js::LayoutSizePolicy<0> > > * playout, unsigned int flags) Line 456    C++
    ChakraCore.dll!Js::InterpreterStackFrame::ProcessProfiled() Line 86 C++
    ChakraCore.dll!Js::InterpreterStackFrame::Process() Line 3452   C++
    ChakraCore.dll!Js::InterpreterStackFrame::InterpreterHelper(Js::ScriptFunction * function, Js::ArgumentReader args, void * returnAddress, void * addressOfReturnAddress, const bool isAsmJs) Line 2039  C++
    ChakraCore.dll!Js::InterpreterStackFrame::InterpreterThunk(Js::JavascriptCallStackLayout * layout) Line 1776    C++
    [External Code] 

There we can see the call:

JavascriptFunction::CallFunction<true>(function, function->GetEntryPoint(), args);

The function here is the one that actually implements print. ChakraCore implements print as a JSRT extension (see my translated article: http://tem.pw/?x=entry:entry170620-222216). Either way, we have found where print lives:

-       function    0x0210c540 {signature=0x00000000 callbackState=0x00000000 nativeMethod=0x00251bc0 {ch.exe!WScriptJsrt::EchoCallback(void *, bool, void * *, unsigned short, void *)} ...}   Js::RecyclableObject *
-       [Js::JavascriptExternalFunction]    {signature=0x00000000 callbackState=0x00000000 nativeMethod=0x00251bc0 {ch.exe!WScriptJsrt::EchoCallback(void *, bool, void * *, unsigned short, void *)} ...}  Js::JavascriptExternalFunction
+       Js::RuntimeFunction {functionNameId=0x0210a720 }    Js::RuntimeFunction
        signature   0x00000000  void *
        callbackState   0x00000000  void *
        nativeMethod    0x00251bc0 {ch.exe!WScriptJsrt::EchoCallback(void *, bool, void * *, unsigned short, void *)}   void * (Js::RecyclableObject *, Js::CallInfo, void * *) *
+       wrappedMethod   0x00251bc0 {ch.exe!WScriptJsrt::EchoCallback(void *, bool, void * *, unsigned short, void *)} {signature=...}   Js::JavascriptExternalFunction *
        stdCallNativeMethod 0x00251bc0 {ch.exe!WScriptJsrt::EchoCallback(void *, bool, void * *, unsigned short, void *)}   void * (void *, bool, void * *, unsigned short, void *) *
        initMethod  0x00000000  HRESULT (void *) *
        oneBit  1   unsigned int
        typeSlots   0   unsigned int
        hasAccessors    0   unsigned int
        unused  0   unsigned int
        prototypeTypeId -1  int
        flags   0   unsigned __int64
+       FinalizableObject   {...}   FinalizableObject
-       type    0x020f2c80 {typeId=TypeIds_Function (26) flags=TypeFlagMask_None (0 '\0') javascriptLibrary=0x02110000 {...} ...}   Js::Type *
        typeId  TypeIds_Function (26)   Js::TypeId
        flags   TypeFlagMask_None (0 '\0')  TypeFlagMask
+       javascriptLibrary   0x02110000 {cacheForCopyOnAccessArraySegments=0x02122000 {cache=0x02122000 {0x00000000 <NULL>, 0x00000000 <NULL>, ...} ...} ...}    Js::JavascriptLibrary *
-       prototype   0x020f2540 {constructorCache=0x10a921b0 {ChakraCore.dll!Js::ConstructorCache Js::ConstructorCache::DefaultInstance} {...} ...}  Js::RecyclableObject *
-       [Js::JavascriptFunction]    {constructorCache=0x10a921b0 {ChakraCore.dll!Js::ConstructorCache Js::ConstructorCache::DefaultInstance} {...} ...} Js::JavascriptFunction
+       Js::DynamicObject   {auxSlots=0x00000000 {???} objectArray=0x00000000 <NULL> arrayFlags=None (0) ...}   Js::DynamicObject
+       constructorCache    0x10a921b0 {ChakraCore.dll!Js::ConstructorCache Js::ConstructorCache::DefaultInstance} {guard={value=...} ...}  Js::ConstructorCache *
+       functionInfo    0x10a93dd8 {ChakraCore.dll!Js::FunctionInfo Js::JavascriptFunction::EntryInfo::PrototypeEntryPoint} {...}   Js::FunctionInfo *
+       FinalizableObject   {...}   FinalizableObject
+       type    0x020f2520 {typeId=TypeIds_Function (26) flags=TypeFlagMask_None (0 '\0') javascriptLibrary=0x02110000 {...} ...}   Js::Type *
        entryPoint  0x0fe56270 {ChakraCore.dll!Js::JavascriptExternalFunction::StdCallExternalFunctionThunk(Js::RecyclableObject *, Js::CallInfo, ...)} void * (Js::RecyclableObject *, Js::CallInfo, ...) *
+       propertyCache   0x00000000 <NULL>   Js::TypePropertyCache *

Set a breakpoint on WScriptJsrt::EchoCallback and run to it. The stack looks like this:

>   ch.exe!WScriptJsrt::EchoCallback(void * callee, bool isConstructCall, void * * arguments, unsigned short argumentCount, void * callbackState) Line 105  C++
    ChakraCore.dll!Js::JavascriptExternalFunction::StdCallExternalFunctionThunk(Js::RecyclableObject * function, Js::CallInfo callInfo, ...) Line 275   C++
    ChakraCore.dll!Js::LocalCallFunction(Js::RecyclableObject * function, void * (Js::RecyclableObject *, Js::CallInfo, ...) * entryPoint, Js::Arguments args, bool doStackProbe) Line 1313 C++
    ChakraCore.dll!Js::JavascriptFunction::CallFunction<1>(Js::RecyclableObject * function, void * (Js::RecyclableObject *, Js::CallInfo, ...) * entryPoint, Js::Arguments args) Line 1329  C++
    ChakraCore.dll!Js::InterpreterStackFrame::OP_CallCommon<Js::OpLayoutDynamicProfile<Js::OpLayoutT_CallIWithICIndex<Js::LayoutSizePolicy<0> > > >(const Js::OpLayoutDynamicProfile<Js::OpLayoutT_CallIWithICIndex<Js::LayoutSizePolicy<0> > > * playout, Js::RecyclableObject * function, unsigned int flags, const Js::AuxArray<unsigned int> * spreadIndices) Line 3891 C++
    ChakraCore.dll!Js::InterpreterStackFrame::OP_ProfileCallCommon<Js::OpLayoutDynamicProfile<Js::OpLayoutT_CallIWithICIndex<Js::LayoutSizePolicy<0> > > >(const Js::OpLayoutDynamicProfile<Js::OpLayoutT_CallIWithICIndex<Js::LayoutSizePolicy<0> > > * playout, Js::RecyclableObject * function, unsigned int flags, unsigned short profileId, unsigned int inlineCacheIndex, const Js::AuxArray<unsigned int> * spreadIndices) Line 3927 C++
    ChakraCore.dll!Js::InterpreterStackFrame::OP_ProfiledCallIWithICIndex<Js::OpLayoutT_CallIWithICIndex<Js::LayoutSizePolicy<0> > >(const Js::OpLayoutDynamicProfile<Js::OpLayoutT_CallIWithICIndex<Js::LayoutSizePolicy<0> > > * playout, unsigned int flags) Line 456    C++
    ChakraCore.dll!Js::InterpreterStackFrame::ProcessProfiled() Line 86 C++
    ChakraCore.dll!Js::InterpreterStackFrame::Process() Line 3452   C++
    ChakraCore.dll!Js::InterpreterStackFrame::InterpreterHelper(Js::ScriptFunction * function, Js::ArgumentReader args, void * returnAddress, void * addressOfReturnAddress, const bool isAsmJs) Line 2039  C++
    ChakraCore.dll!Js::InterpreterStackFrame::InterpreterThunk(Js::JavascriptCallStackLayout * layout) Line 1776    C++
    [External Code] 

Inspecting AutoString str:

-       str {length=18 data=0x00486730 "1.3258176636680325" data_wide=0x00000000 <NULL> ...}    AutoString
        length  18  unsigned int
+       data    0x00486730 "1.3258176636680325" char *
+       data_wide   0x00000000 <NULL>   wchar_t *
        errorCode   JsNoError (0)   _JsErrorCode
        dontFree    false   bool

The value 1.3258176636680325 is indeed the result of Atan(4).

So all we have to do is modify this function to communicate with NScript, and we get the effect we want.
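A datagram is limited to about 64 KB, which is why callClient.cpp sends a file name. One way to fill in "SEND CODE HERE" is therefore to write each printed statement to its own file and send only "<index>,<path>". The sketch below works under those assumptions. StageFuzzCase is a hypothetical name. It takes the narrow text that the AutoString watch above shows in str's data field. The socket call itself is left out so that the code stays portable:

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper for the "SEND CODE HERE" spot: writes one printed
// statement to its own file and returns the "%3d,<path>" datagram text,
// or an empty string if the file could not be written.
// A separate file per index means NScript can still be reading case N
// while ChakraCore writes case N + 1.
std::string StageFuzzCase(int index, const std::string& code, const std::string& dir) {
    std::string path = dir + "/case_" + std::to_string(index) + ".js";
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return "";
    out << code;
    out.close();
    if (!out) return "";  // close() sets failbit if flushing failed
    char prefix[16];
    std::snprintf(prefix, sizeof(prefix), "%3d,", index);
    return prefix + path;
}
```

The returned string is exactly what the existing sendto call would transmit. The NScript side can delete each file after rs has consumed it.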

NScript: "Signed" (Symbolized) 64-bit and "Unsigned" (Unsymbolized) 32-bit (?)

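There was recently a rather absurd question on an academic proficiency exam. Those exams have always been a formality, and it turns out that a big company like Microsoft does something similar. Microsoft's public symbols help Microsoft track down problems, but they help security researchers do the same. While looking at NScript in MSE recently, I found something amusing.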

Ps0c-fyhfxph1878995.jpg

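I don't know whether too many bugs were reported against the 32-bit build or something else happened, but Microsoft's public symbol server simply has no symbols for the 32-bit DLL. I tried downloading them directly and tried the x86 Windows Defender that ships in a virtual machine; either way the result was the same: none. Yet the 64-bit symbols are available, which is quite puzzling.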

(32bit)1.1.13804.0 From Windows Defender
E:\Program Files (x86)\Debugging Tools for Windows>symchk E:\Users\BlastTS\Desktop\wLoadMpEngine\Debug\mpengine.dll
SYMCHK: mpengine.dll         FAILED  - mpengine.pdb mismatched or not found

SYMCHK: FAILED files = 1
SYMCHK: PASSED + IGNORED files = 0

(64,win7)1.1.13701.0 From Windows Defender
E:\Program Files (x86)\Debugging Tools for Windows>symchk E:\msmpeng\x64-w7\mpengine.dll

SYMCHK: FAILED files = 0
SYMCHK: PASSED + IGNORED files = 1

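If you suspect this is just a version issue, there is a counterexample: Windows 10 64-bit 1.1.13103.0 has no symbols either. So I have no idea what Microsoft is up to. Could it be a release versus pre-release difference? But all of these binaries were pushed through Microsoft's own official channels.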

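The background: I recently read taviso's example of loading the MSE engine on Linux and evaluating scripts with it. I wanted to load the DLL directly on Windows the normal way and feed it data for fuzzing. The scary part is the lack of symbols. I have to locate the function I want to hook by hand and patch it so that it jumps to my function.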
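Patching a function so that it "jumps to my function" usually means overwriting its first bytes with a jump. The sketch below shows the two common encodings and assumes you already know the function's address:

- the 5-byte relative JMP (E9 rel32) used on x86;
- a 12-byte mov rax, imm64 / jmp rax stub for x64, for targets that may be more than 2 GB away.

Two steps are not shown: making the page writable (VirtualProtect) and relocating the overwritten instructions into a trampoline.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// x86: 5-byte relative JMP (E9 rel32). The displacement is counted from the
// end of the instruction, i.e. from src + 5, and must fit in a signed 32-bit value.
std::array<std::uint8_t, 5> MakeRel32Jmp(std::uint64_t src, std::uint64_t dst) {
    std::int64_t rel = static_cast<std::int64_t>(dst) - static_cast<std::int64_t>(src + 5);
    assert(rel >= INT32_MIN && rel <= INT32_MAX);
    std::uint32_t rel32 = static_cast<std::uint32_t>(static_cast<std::int32_t>(rel));
    std::array<std::uint8_t, 5> code{0xE9};
    for (int i = 0; i < 4; ++i)  // little-endian immediate
        code[1 + i] = static_cast<std::uint8_t>(rel32 >> (8 * i));
    return code;
}

// x64: mov rax, imm64 (48 B8 + 8 bytes) followed by jmp rax (FF E0). 12 bytes,
// reaches any address, clobbers RAX (a volatile register in the x64 calling convention).
std::array<std::uint8_t, 12> MakeAbsJmpX64(std::uint64_t dst) {
    std::array<std::uint8_t, 12> code{0x48, 0xB8};
    for (int i = 0; i < 8; ++i)
        code[2 + i] = static_cast<std::uint8_t>(dst >> (8 * i));
    code[10] = 0xFF;
    code[11] = 0xE0;
    return code;
}
```

The relative form is shorter and overwrites fewer instructions, but it only works when the hook lies within about 2 GB of the patched function. That is always the case in a 32-bit process.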

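taviso's approach is similar to grinder's: hook strtod to do the logging. When the incoming string starts with __log:, it is treated as our log message; otherwise the normal strtod logic runs. This strtod is called by parseFloat (all the screenshots that follow are of x64 code):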

out.jpg
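The logging idea can be sketched in portable C++ as follows. This illustrates the scheme described above and is not taviso's code. HookedStrtod and g_fuzzLog are hypothetical names. std::strtod stands in for the engine's own statically linked strtod, which a real hook would reach through a trampoline:

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Collected log lines (hypothetical sink; a real harness might print them or write them to a file).
std::vector<std::string> g_fuzzLog;

// Replacement for strtod following the __log: scheme: strings that start with
// "__log:" are captured as log output, and everything else goes to the real strtod
// (std::strtod stands in for the engine's own copy here), so parseFloat keeps working.
double HookedStrtod(const char* str, char** endptr) {
    static const char kPrefix[] = "__log:";
    const std::size_t kLen = sizeof(kPrefix) - 1;
    if (std::strncmp(str, kPrefix, kLen) == 0) {
        g_fuzzLog.emplace_back(str + kLen);
        if (endptr) *endptr = const_cast<char*>(str);  // report that no number was parsed
        return 0.0;
    }
    return std::strtod(str, endptr);
}
```

On the script side, a fuzz statement would then be logged with something like parseFloat("__log:" + code).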

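Why show the figure above? Because strtod can be located by comparing with the x64 code, even without symbols for the 32-bit build. Its code doesn't change between builds: with Microsoft's build options, strtod is compiled statically into the DLL rather than called from the MSVC runtime library. By comparing with x64's JsDelegateObject_Global::parseFloat, we can easily find strtod and hook it from our own code. After an afternoon of work, we can call NScript's interface directly and get correct output: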

out.jpg

I will clean up the related code and publish it later.
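Here the comparison against the x64 build was done by hand. Once a distinctive byte sequence in the 32-bit strtod has been identified, a wildcard signature scan is a common way to find it again in later engine versions; this is not necessarily the author's method. The pattern uses ?? for bytes that change between builds, such as addresses. x86 and x64 code differ, so the signature has to come from the 32-bit binary itself. In the sketch below, the pattern syntax and function names are assumptions:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// One pattern byte; std::nullopt means "any byte" (written as ?? in the pattern text).
using PatternByte = std::optional<std::uint8_t>;

// Parses "48 8B ?? 24" into pattern bytes.
std::vector<PatternByte> ParsePattern(const std::string& text) {
    std::vector<PatternByte> out;
    std::istringstream in(text);
    std::string tok;
    while (in >> tok) {
        if (tok == "??") out.push_back(std::nullopt);
        else out.push_back(static_cast<std::uint8_t>(std::stoul(tok, nullptr, 16)));
    }
    return out;
}

// Returns the offset of the first match of `pattern` in `image`, if any.
std::optional<std::size_t> FindPattern(const std::vector<std::uint8_t>& image,
                                       const std::vector<PatternByte>& pattern) {
    if (pattern.empty() || pattern.size() > image.size()) return std::nullopt;
    for (std::size_t i = 0; i + pattern.size() <= image.size(); ++i) {
        bool ok = true;
        for (std::size_t j = 0; j < pattern.size() && ok; ++j)
            ok = !pattern[j] || *pattern[j] == image[i + j];
        if (ok) return i;
    }
    return std::nullopt;
}
```

In practice the image would be the mapped mpengine.dll. The offset found is added to the module base to get the address to hook.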

A Stripped-Down Mozilla funfuzzer, Free to Take and Play With

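Mozilla's framework is truly "exciting": the code is extremely tightly coupled, with dependencies everywhere. You may just want to test your JS engine, and this may be your first time with funfuzzer. In that case you will probably go mad after reading through the pile of requirements and the vague official documentation.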

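Honestly, if you open-source something, make it usable with one click; why make it so complicated? Fortunately, the fuzzing code and the framework are, in a sense, decoupled. Extracting funfuzzer's fuzzing code is easy, and the script is very simple: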

extract-funfuzzer.bat

echo //by blast @nul.pw> output.js
copy /b output.js+.\jsfunfuzz\preamble.js output.js

copy /b output.js+.\jsfunfuzz\detect-engine.js output.js
copy /b output.js+.\jsfunfuzz\avoid-known-bugs.js output.js
copy /b output.js+.\jsfunfuzz\error-reporting.js output.js

copy /b output.js+.\shared\random.js output.js
copy /b output.js+.\shared\mersenne-twister.js output.js
copy /b output.js+.\shared\testing-functions.js output.js

copy /b output.js+.\jsfunfuzz\built-in-constructors.js output.js

copy /b output.js+.\jsfunfuzz\mess-tokens.js output.js
copy /b output.js+.\jsfunfuzz\mess-grammar.js output.js

copy /b output.js+.\jsfunfuzz\gen-asm.js output.js
copy /b output.js+.\jsfunfuzz\gen-math.js output.js
copy /b output.js+.\jsfunfuzz\gen-grammar.js output.js
copy /b output.js+.\jsfunfuzz\gen-proxy.js output.js
copy /b output.js+.\jsfunfuzz\gen-recursion.js output.js
copy /b output.js+.\jsfunfuzz\gen-regex.js output.js
copy /b output.js+.\jsfunfuzz\gen-stomp-on-registers.js output.js
copy /b output.js+.\jsfunfuzz\gen-type-aware-code.js output.js

copy /b output.js+.\jsfunfuzz\test-asm.js output.js
copy /b output.js+.\jsfunfuzz\test-math.js output.js
copy /b output.js+.\jsfunfuzz\test-regex.js output.js
copy /b output.js+.\jsfunfuzz\test-consistency.js output.js
copy /b output.js+.\jsfunfuzz\test-misc.js output.js

copy /b output.js+.\jsfunfuzz\driver.js output.js

copy /b output.js+.\jsfunfuzz\run-reduction-marker.js output.js

copy /b output.js+.\jsfunfuzz\run-in-sandbox.js output.js
copy /b output.js+.\jsfunfuzz\run.js output.js
copy /b output.js+.\jsfunfuzz\tail.js output.js

Find the corresponding files and copy them all out, then run the result directly in V8 or ChakraCore. The SpiderMonkey shell? No, no: too few users, I don't feel like running it.
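If you would rather not depend on cmd, the same concatenation takes only a few lines of C++. The sketch below assumes the file list from extract-funfuzzer.bat. ConcatFiles is a hypothetical helper. It differs from copy /b in two ways. It adds a newline after each file, so a file without a trailing newline cannot run into the next one. It also fails if any input is missing, so a typo in the list does not silently produce a partial fuzzer:

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper mirroring extract-funfuzzer.bat: writes `header`, then each
// file in `parts` in order, into `output`. Returns false if any input is missing.
bool ConcatFiles(const std::string& output, const std::string& header,
                 const std::vector<std::string>& parts) {
    std::ofstream out(output, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out << header << '\n';
    for (const std::string& part : parts) {
        std::ifstream in(part, std::ios::binary);
        if (!in) return false;
        // Streaming an empty file through rdbuf() would set failbit on `out`, so skip it.
        if (in.peek() != std::ifstream::traits_type::eof())
            out << in.rdbuf();
        out << '\n';  // keep the next file from starting on this file's last line
    }
    return static_cast<bool>(out);
}
```

The order of `parts` matters exactly as it does in the batch file: preamble.js first, tail.js last.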

Here is the spectacular sight of ChakraCore running it:

ch.jpg

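Of course, reproducing and tracing the crashes is still a hassle. I am now planning an online fuzzing interface, and I will open-source some tools and a simple framework later. Then you can fuzz and trace by hand instead of clicking restart on the debuggee in VS over and over :).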

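Note: grinder works too. However, IE and Edge support ES6 rather poorly in some respects, so you will see all kinds of strange errors. If you are interested, you can add some backward-compatibility handling.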

Reading ChakraCore Together - 01 - Bytecode Generation

Author: blast.
First published on nul.pw; please keep this line when reposting.

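ChakraCore is the core of Chakra, the JavaScript engine of Microsoft's Edge browser, open-sourced by Microsoft. It is mainly used by Microsoft Edge and by Windows applications written in HTML/CSS/JavaScript.

ChakraCore supports just-in-time (JIT) compilation of JavaScript on x86/x64/ARM, garbage collection, and a large number of the latest JavaScript features. ChakraCore also supports the JavaScript Runtime (JSRT) APIs, which let users easily embed ChakraCore in their applications.

ChakraCore is a complete, standalone JavaScript virtual machine. It can be embedded in derivative products that need scripting, such as NoSQL databases, productivity tools, and game engines. ChakraCore currently supports only Windows, but Microsoft has said it will add cross-platform support, as it did for the .NET open-source projects.
via http://www.oschina.net/p/chakracore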

This article uses the code at #2720 as an example. The test code is:

var a;
const b = 4;

a = 3;

print(a+b);

Skipping the earlier frames, let's start where compilation begins. ChakraCore "officially" starts loading the script in Js::ScriptContext::LoadScript.

>   ChakraCore.dll!Js::ScriptContext::LoadScript(const unsigned char * script, unsigned int cb, const SRCINFO * pSrcInfo, CompileScriptException * pse, Js::Utf8SourceInfo * * ppSourceInfo, const wchar_t * rootDisplayName, LoadScriptFlag loadScriptFlag, void * scriptSource) Line 1948    C++
    ChakraCore.dll!RunScriptCore::__l2::<lambda>(Js::ScriptContext * scriptContext, TTD::TTDJsRTActionResultAutoRecorder & _actionEntryPopper) Line 2952  C++
    ChakraCore.dll!ContextAPINoScriptWrapper::__l2::<lambda>(Js::ScriptContext * scriptContext) Line 294  C++
    ChakraCore.dll!ContextAPINoScriptWrapper_Core<_JsErrorCode <lambda>(Js::ScriptContext *)>(ContextAPINoScriptWrapper::__l2::_JsErrorCode <lambda>(Js::ScriptContext *) fn, bool allowInObjectBeforeCollectCallback, bool scriptExceptionAllowed) Line 254   C++
    ChakraCore.dll!ContextAPINoScriptWrapper<_JsErrorCode <lambda>(Js::ScriptContext *, TTD::TTDJsRTActionResultAutoRecorder &)>(RunScriptCore::__l2::_JsErrorCode <lambda>(Js::ScriptContext *, TTD::TTDJsRTActionResultAutoRecorder &) fn, bool allowInObjectBeforeCollectCallback, bool scriptExceptionAllowed) Line 291   C++
    ChakraCore.dll!RunScriptCore(void * scriptSource, const unsigned char * script, unsigned int cb, LoadScriptFlag loadScriptFlag, unsigned long sourceContext, const wchar_t * sourceUrl, bool parseOnly, _JsParseScriptAttributes parseAttributes, bool isSourceModule, void * * result) Line 2901  C++
    ChakraCore.dll!CompileRun(void * scriptVal, unsigned long sourceContext, void * sourceUrl, _JsParseScriptAttributes parseAttributes, void * * result, bool parseOnly) Line 4304    C++
    ChakraCore.dll!JsRun(void * scriptVal, unsigned long sourceContext, void * sourceUrl, _JsParseScriptAttributes parseAttributes, void * * result) Line 4326 C++
    ch.exe!ChakraRTInterface::JsRun(void * script, unsigned long sourceContext, void * sourceUrl, _JsParseScriptAttributes parseAttributes, void * * result) Line 378    C++
    ch.exe!RunScript(const char * fileName, const char * fileContents, void(__stdcall*)(void *) fileContentsFinalizeCallback, void * bufferValue, char * fullPath) Line 450    C++
    ch.exe!ExecuteTest(const char * fileName) Line 744 C++
    ch.exe!ExecuteTestWithMemoryCheck(char * fileName) Line 786    C++
    ch.exe!StaticThreadProc(void * lpParam) Line 887   C++
    ch.exe!invoke_thread_procedure(unsigned int(__stdcall*)(void *) procedure, void * const context) Line 92    C++
    ch.exe!thread_start<unsigned int (__stdcall*)(void *)>(void * const parameter) Line 115    C++

The function signature and starting location:

ChakraCore.dll!Js::ScriptContext::LoadScript(const unsigned char * script, unsigned int cb, const SRCINFO * pSrcInfo, CompileScriptException * pse, Js::Utf8SourceInfo * * ppSourceInfo, const wchar_t * rootDisplayName, LoadScriptFlag loadScriptFlag, void * scriptSource) Line 1948

pro1.GIF

From this function on, the script is parsed for real. In LoadScript, we step into the call to ParseScript.

        ParseNodePtr parseTree = ParseScript(&parser, script, cb, pSrcInfo,
            pse, ppSourceInfo, rootDisplayName, loadScriptFlag,
            &sourceIndex, scriptSource);

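Many of the arguments in this call are passed down from the caller. Inside this function, ChakraCore saves that information into a newly allocated block of heap memory.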

            if(*ppSourceInfo == nullptr)
            {
#ifndef NTBUILD
                if (loadScriptFlag & LoadScriptFlag_ExternalArrayBuffer)
                {
                    *ppSourceInfo = Utf8SourceInfo::NewWithNoCopy(this,
                        script, (int)length, cb, pSrcInfo, isLibraryCode,
                        scriptSource);
                }

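In Utf8SourceInfo::NewWithNoCopy, ChakraCore allocates a new source-info object that records the incoming source buffer and returns it. As the NoCopy name suggests, the buffer itself is referenced rather than copied. After checking some switches, the normal path parses the script here: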

    ParseNodePtr parseTree;
    if((loadScriptFlag & LoadScriptFlag_Utf8Source) == LoadScriptFlag_Utf8Source)
    {
        hr = parser->ParseUtf8Source(&parseTree, script, cb, grfscr, pse,
            &sourceContextInfo->nextLocalFunctionId, sourceContextInfo);
    }

parser is declared as Parser*. ParseUtf8Source calls ParseSourceInternal:

HRESULT Parser::ParseUtf8Source(__out ParseNodePtr* parseTree, LPCUTF8 pSrc, size_t length, ULONG grfsrc, CompileScriptException *pse,
    Js::LocalFunctionId * nextFunctionId, SourceContextInfo * sourceContextInfo)
{
    m_functionBody = nullptr;
    m_parseType = ParseType_Upfront;
    return ParseSourceInternal( parseTree, pSrc, 0, length, 0, true, grfsrc, pse, nextFunctionId, 0, sourceContextInfo);
}

2 Parser::PrepareScanner - Initializing the HashTable and the Scanner

pro2.GIF

void Parser::PrepareScanner(bool fromExternal)
{
    // NOTE: HashTbl and Scanner are currently allocated from the CRT heap. If we want to allocate them from the
    // parser arena, then we also need to change the way the HashTbl allocates PID's from its underlying
    // allocator (which also currently uses the CRT heap). This is not trivial, because we still need to support
    // heap allocation for the colorizer interface.

    // create the hash table and init PID members
    if (nullptr == (m_phtbl = HashTbl::Create(HASH_TABLE_SIZE)))
        Error(ERRnoMemory);
    InitPids();

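HASH_TABLE_SIZE is 256; it must be a power of two, as the Assert in Init below requires. Create allocates the table directly on the CRT heap. The HeapNewNoThrow wrapper ultimately just calls new.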

HashTbl * HashTbl::Create(uint cidHash)
{
    HashTbl * phtbl;

    if (nullptr == (phtbl = HeapNewNoThrow(HashTbl)))
        return nullptr;
    if (!phtbl->Init(cidHash))
    {
        delete phtbl;  // invokes overrided operator delete
        return nullptr;
    }

    return phtbl;
}

Now let's look at Init.

BOOL HashTbl::Init(uint cidHash)
{
    // cidHash must be a power of two
    Assert(cidHash > 0 && 0 == (cidHash & (cidHash - 1)));

……

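Init first checks that the value passed in is greater than 0 and a power of two.
It then computes the size actually needed, cidHash * sizeof(Ident *), which on 32-bit is simply cidHash * 4. The result goes into cbTemp, and the multiplication is checked for integer overflow. This size is the space for the hash table's bucket array.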

uint cbTemp;
if (FAILED(UIntMult(cidHash, sizeof(Ident *), &cbTemp)) || cbTemp > LONG_MAX)
    return FALSE;

Then it allocates memory of that size from the member m_noReleaseAllocator, an allocator whose memory is never released individually, and zeroes it.

    cb = cbTemp;
    if (nullptr == (m_prgpidName = (Ident **)m_noReleaseAllocator.Alloc(cb)))
        return FALSE;
    memset(m_prgpidName, 0, cb);
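The checks in Init can be restated in portable C++ as shown below. This is a simplified sketch, not ChakraCore's class. BucketArray is a hypothetical name, and new (std::nothrow) stands in for m_noReleaseAllocator. The power-of-two requirement lets a hash be mapped to a bucket with a mask instead of a division:

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <cstring>
#include <memory>
#include <new>

struct Ident;  // only pointers to identifiers are stored in the bucket array

// Simplified restatement of HashTbl::Init's checks (hypothetical class).
class BucketArray {
public:
    bool Init(std::uint32_t cidHash) {
        // cidHash must be a power of two so that hash & (cidHash - 1) selects a bucket.
        if (cidHash == 0 || (cidHash & (cidHash - 1)) != 0) return false;
        // cidHash * sizeof(Ident*), rejected if it overflows a 32-bit uint (what UIntMult
        // reports) or exceeds LONG_MAX, as in the original.
        std::uint64_t cb = static_cast<std::uint64_t>(cidHash) * sizeof(Ident*);
        if (cb > UINT32_MAX || cb > static_cast<std::uint64_t>(LONG_MAX)) return false;
        buckets_.reset(new (std::nothrow) Ident*[cidHash]);
        if (!buckets_) return false;
        std::memset(buckets_.get(), 0, static_cast<std::size_t>(cb));  // every chain starts empty
        mask_ = cidHash - 1;
        return true;
    }
    std::uint32_t BucketFor(std::uint32_t hash) const { return hash & mask_; }

private:
    std::unique_ptr<Ident*[]> buckets_;
    std::uint32_t mask_ = 0;
};
```

With 256 buckets, BucketFor keeps only the low 8 bits of the hash. For example, 0x1234 lands in bucket 0x34.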

Back in PrepareScanner, InitPids is called.
Figure: m_rpid

InitPids calls CaseSensitiveComputeHash to hash each of the commonly used names and stores the results. The algorithm is:

ULONG CaseSensitiveComputeHash(LPCOLESTR prgch, LPCOLESTR end)
{
    ULONG luHash = 0;

    while (prgch < end)
    {
        luHash = 17 * luHash + *(char16 *)prgch++;
    }
    return luHash;
}
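Below is a portable restatement of the hash. It assumes ULONG is 32 bits, as it is on Windows, so unsigned wraparound behaves the same; the function name is hypothetical. As a worked example, "ab" hashes to 17 * 97 + 98 = 1747, and "var" hashes to 17 * (17 * 118 + 97) + 114 = 35865:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Portable restatement of CaseSensitiveComputeHash (hypothetical name):
// h = 17 * h + c over UTF-16 code units, wrapping modulo 2^32 like a 32-bit ULONG.
std::uint32_t CaseSensitiveHash(const std::u16string& s) {
    std::uint32_t h = 0;
    for (char16_t c : s) h = 17u * h + c;
    return h;
}
```

Because every code unit feeds the result directly, "Var" and "var" hash differently. That is what makes the hash case-sensitive.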

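It then calls PidHashNameLenWithHash. If the name is already in the hash table, the existing pid (pointer to identifier) is returned. The number of entries also determines whether the buckets need to be resized.
Otherwise, an identifier record is allocated for the current name. Its size depends on the name's length Len in characters: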

(long)((Len + 1) * sizeof(OLECHAR) + sizeof(*pid))

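Take Len == 9 as an example: the size is 10 * sizeof(OLECHAR) + sizeof(Ident). On Windows, OLECHAR is a 2-byte WCHAR, so the characters plus the terminator take 20 bytes, in addition to the size of the Ident record itself.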

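It allocates again with (IdentPtr)m_noReleaseAllocator.Alloc(cb), which yields a pointer to the new identifier, and links that pointer into the hash chain (IdentPtr is a linked-list node). It then increments the entry count and fills in the newly inserted pid: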

    /* Insert the identifier into the hash list */
    *ppid = pid;

    // Increment the number of entries in the table.
    m_luCount++;

    /* Fill in the identifier record */
    pid->m_pidNext = nullptr;                        // next node in the chain
    pid->m_tk = tkLim;                               // token type
    pid->m_grfid = fidNil;                           // flags
    pid->m_luHash = luHash;                          // hash value
    pid->m_cch = cch;                                // length of the name in characters
    pid->m_pidRefStack = nullptr;                    // reference stack
    pid->m_propertyId = Js::Constants::NoProperty;
    pid->assignmentState = NotAssigned;
    pid->isUsedInLdElem = false;

    HashTbl::CopyString(pid->m_sz, prgch, end);      // copy the original characters in

This repeats until every entry of wellKnownPropertyPids has been filled in.
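The lookup-or-insert step can be modelled as a chained hash table. This is a simplified model, not HashTbl itself. IdentRec and IdentTable are hypothetical, and bucket resizing is omitted. As with *ppid = pid above, a new record is linked at the end of its chain:

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified identifier record: one per distinct name, chained per
// bucket through `next` (like m_pidNext).
struct IdentRec {
    std::u16string name;
    std::uint32_t hash;
    IdentRec* next;
};

class IdentTable {
public:
    // bucketCount must be a power of two (see HashTbl::Init).
    explicit IdentTable(std::size_t bucketCount) : heads_(bucketCount, nullptr) {}

    // Returns the existing record for `name`, or appends a new one to its chain.
    IdentRec* LookupOrInsert(const std::u16string& name) {
        std::uint32_t h = Hash(name);
        IdentRec** link = &heads_[h & (heads_.size() - 1)];
        for (; *link != nullptr; link = &(*link)->next)
            if ((*link)->hash == h && (*link)->name == name) return *link;
        // *link is now the null link at the end of the chain (compare *ppid = pid).
        owned_.push_back(std::make_unique<IdentRec>(IdentRec{name, h, nullptr}));
        *link = owned_.back().get();
        return *link;
    }
    std::size_t Count() const { return owned_.size(); }

private:
    static std::uint32_t Hash(const std::u16string& s) {
        std::uint32_t h = 0;
        for (char16_t c : s) h = 17u * h + c;  // same recurrence as CaseSensitiveComputeHash
        return h;
    }
    std::vector<IdentRec*> heads_;
    std::vector<std::unique_ptr<IdentRec>> owned_;
};
```

Comparing the stored hash before the full string is a cheap filter. The string comparison only runs for genuine hash matches.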

Back in void Parser::PrepareScanner(bool fromExternal).
Now the scanner is created.

    // create the scanner
    if (nullptr == (m_pscan = Scanner_t::Create(this, m_phtbl, &m_token, m_scriptContext)))
        Error(ERRnoMemory);

The scanner is also allocated on the CRT heap. What actually gets called is:

    static Scanner * Create(Parser* parser, HashTbl *phtbl, Token *ptoken, Js::ScriptContext *scriptContext)
    {
        return HeapNewNoThrow(Scanner, parser, phtbl, ptoken, scriptContext);
    }

This is also a wrapper around new.
Back in the caller: because we are loading from an external source, this branch is taken.

if (fromExternal)
    m_pscan->FromExternalSource();

The branch sets the switches needed to scan code that comes from an external source.

    // If we get UTF8 source buffer, turn off doAllowThreeByteSurrogates but allow invalid WCHARs without replacing them with replacement 'g_chUnknown'.
void FromExternalSource() { m_decodeOptions = (utf8::DecodeOptions)(m_decodeOptions & ~utf8::doAllowThreeByteSurrogates | utf8::doAllowInvalidWCHARs); }
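The expression in FromExternalSource relies on & binding more tightly than |. It clears doAllowThreeByteSurrogates, sets doAllowInvalidWCHARs, and leaves every other bit alone. The sketch below writes the implied parentheses out explicitly. It uses hypothetical flag values; the real utf8::DecodeOptions enumerators may differ:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag values for illustration; the real utf8::DecodeOptions values may differ.
enum DecodeOptions : std::uint32_t {
    doDefault                  = 0x0,
    doAllowThreeByteSurrogates = 0x1,
    doAllowInvalidWCHARs       = 0x2,
    doSomeOtherFlag            = 0x4,  // hypothetical, shows that unrelated bits survive
};

// The FromExternalSource() expression with its implied parentheses written out:
// opts & ~doAllowThreeByteSurrogates | doAllowInvalidWCHARs
// == (opts & ~doAllowThreeByteSurrogates) | doAllowInvalidWCHARs
DecodeOptions ForExternalSource(DecodeOptions opts) {
    return static_cast<DecodeOptions>((opts & ~doAllowThreeByteSurrogates) | doAllowInvalidWCHARs);
}
```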

That completes PrepareScanner.

3 Parser::Parse - Building the AST

pro2.GIF

Parser::Parse is defined as follows:

ParseNodePtr Parser::Parse(LPCUTF8 pszSrc, size_t offset, size_t length, charcount_t charOffset, ULONG grfscr, ULONG lineNumber, Js::LocalFunctionId * nextFunctionId, CompileScriptException *pse)

This function is the entry point for turning the source text into an abstract syntax tree.

bool isDeferred = (grfscr & fscrDeferredFnc) != 0;
bool isModuleSource = (grfscr & fscrIsModuleCode) != 0;

m_grfscr = grfscr;                  // flags
m_length = length;                  // length
m_originalLength = length;          // original length; so far it equals m_length
m_nextFunctionId = nextFunctionId;  // points to a function-id counter that is currently 0

PhaseDeferred_Upfront

pro4.GIF

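My example contains no functions, so the parser starts from the global code (glo).
First, setText initializes the scanner: the code is handed to the scanner, and processing begins.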

template<typename EncodingPolicy>
tokens Scanner<EncodingPolicy>::ScanCore(bool identifyKwds)

ScanCore is where ChakraCore really starts "analyzing". It reads the input script character by character. In my example

var a;
const b = 4;

a = 3;

print(a+b);

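after scanning the first character, v, the scanner keeps going until it has the complete word var, which it records as a token (tkVAR == 57). The pointer then moves to the a in "var a".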

case 'v':
    if (identifyKwds)
    {
        switch (p[0]) {
        case 'a':
            if (p[1] == 'r' && !IsIdContinueNext(p+2, last)) {
                p += 2;
                token = tkVAR;
                goto LReserved;
            }
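The keyword check above accepts var only when the following character cannot continue an identifier, so "variable" remains an identifier. In addition, each Scan() call resumes where the previous token ended. The toy scanner below shows both ideas. Its token kinds and class are hypothetical and much simpler than ChakraCore's Scanner:

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>

// Toy token kinds (hypothetical, not ChakraCore's tk* values).
enum class Tok { Eof, Var, Const, Let, Id, IntCon, Punct };

// A toy scanner: each Scan() resumes where the previous token ended, like ScanCore.
class ToyScanner {
public:
    explicit ToyScanner(std::string src) : src_(std::move(src)) {}

    Tok Scan() {
        while (pos_ < src_.size() && std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
        if (pos_ >= src_.size()) return Tok::Eof;
        std::size_t start = pos_;
        if (IsIdStart(src_[pos_])) {
            while (pos_ < src_.size() && IsIdContinue(src_[pos_])) ++pos_;
            text_ = src_.substr(start, pos_ - start);
            // Keywords match only as whole identifiers, which is what the
            // !IsIdContinueNext(p+2, last) test guarantees: "variable" is not "var".
            if (text_ == "var") return Tok::Var;
            if (text_ == "const") return Tok::Const;
            if (text_ == "let") return Tok::Let;
            return Tok::Id;
        }
        if (std::isdigit(static_cast<unsigned char>(src_[pos_]))) {
            while (pos_ < src_.size() && std::isdigit(static_cast<unsigned char>(src_[pos_]))) ++pos_;
            text_ = src_.substr(start, pos_ - start);
            return Tok::IntCon;
        }
        text_ = src_.substr(pos_++, 1);  // any other single character
        return Tok::Punct;
    }
    const std::string& Text() const { return text_; }

private:
    static bool IsIdStart(char c) {
        return std::isalpha(static_cast<unsigned char>(c)) || c == '_' || c == '$';
    }
    static bool IsIdContinue(char c) {
        return IsIdStart(c) || std::isdigit(static_cast<unsigned char>(c));
    }
    std::string src_;
    std::size_t pos_ = 0;
    std::string text_;
};
```

Scanning "var a;\nconst b = 4;" yields Var, Id(a), ";", Const, Id(b), "=", IntCon(4), ";" and then Eof, one token per call.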

Once it has this first token, the parser pauses briefly. Control returns to the outer level, which creates and initializes the main knopProg node.

Next, the parser creates a block scope for const and let. In ES6, const and let follow the same (block) scoping rules:

template<bool buildAST>
ParseNodePtr Parser::StartParseBlock(PnodeBlockType blockType, ScopeType scopeType, ParseNodePtr pnodeLabel, LabelId* pLabelId)

StartParseBlock creates a scope for the global code and calls PushStmt, which initializes the AST.

pro2.GIF
Back in Parser::Parse, the parser processes the statement list. Here the statements are the various declarations:

    // Process a sequence of statements/declarations
    ParseStmtList<true>(
        &pnodeProg->sxFnc.pnodeBody,
        &lastNodeRef,
        SM_OnGlobalCode,
        !(m_grfscr & fscrDeferredFncExpression) /* isSourceElementList */);

This function contains a loop that calls

template<bool buildAST>
ParseNodePtr Parser::ParseStatement()

to handle each declaration and add it to the AST.

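ParseStatement calls ScanCore, the function described above, to find the tokens. Each time ParseStatement finishes with a token and calls ScanCore again, scanning resumes right after the previous token. That is why the first token had to be scanned beforehand.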

ParseStatement also shows that Chakra handles const and let in the same branch:

case tkCONST:
case tkLET:
    ichMin = m_pscan->IchMinTok();

    m_pscan->Scan();
    pnode = ParseVariableDeclaration<buildAST>(tok, ichMin);
    goto LNeedTerminator;

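After scanning a token, the parser immediately scans the next meaningful element. For example, let must be followed by a variable identifier, so the parser scans that identifier.
In ParseVariableDeclaration, the parser creates a node for the identifier it just scanned (CreateBlockScopedDeclNode). Our declaration is a const, so the branch taken is: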

        else if (declarationType == tkCONST)
        {
            pnodeThis = CreateBlockScopedDeclNode(pid, knopConstDecl);
            CHAKRATEL_LANGSTATS_INC_LANGFEATURECOUNT(Const, m_scriptContext);
        }

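Next, the parser sees that the declaration has an initializer. If the token allows initialization, it calls ScanCore to scan the value and ParseExpr to parse it.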

            pnodeInit = ParseExpr<buildAST>(koplCma, nullptr, fAllowIn, FALSE, pNameHint, &nameHintLength, &nameHintOffset);

ParseExpr actually delegates to ParseTerm, which parses the value according to the type of the next token (here tkIntCon, an integer constant).

 case tkIntCon:
        if (IsStrictMode() && m_pscan->IsOctOrLeadingZeroOnLastTKNumber())
        {
            Error(ERRES5NoOctal);
        }

        if (buildAST)
        {
            pnode = CreateIntNodeWithScanner(m_token.GetLong());
        }
        fCanAssign = FALSE;
        m_pscan->Scan();
        break;

m_token.GetLong() shows that the token's value has already been decoded. Here it returns u.lw, which is the 4 in b = 4.

pro5.GIF

CreateIntNodeWithScanner is implemented as follows:

ParseNodePtr Parser::CreateIntNodeWithScanner(int32 lw)
{
    Assert(!this->m_deferringAST);
    ParseNodePtr pnode = CreateNodeWithScanner<knopInt>();
    pnode->sxInt.lw = lw;
    return pnode;
}

In other words, the 4 becomes a separate node of type knopInt.

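Next, the parser calls ParsePostfixOperators to handle any postfix operators that follow the term, such as calls, member access, and indexing. Our example does not trigger any of them.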

pnode = ParsePostfixOperators<buildAST>(pnode, fAllowCall, fInNew, isAsyncExpr, &fCanAssign, &term, pfIsDotOrIndex);

Back in ParseVariableDeclaration, this knopInt is not added to the list. In the end, only the const declaration is added to the AST.

void Parser::AddToNodeListEscapedUse(ParseNode ** ppnodeList, ParseNode *** pppnodeLast,
                           ParseNode * pnodeAdd)
{
    AddToNodeList(ppnodeList, pppnodeLast, pnodeAdd);
    pnodeAdd->SetIsInList();
}

Scanning a semicolon ends the statement:

    switch (m_token.tk)
    {
    case tkSColon:
        m_pscan->Scan();
        if (pnode!= nullptr) pnode->grfpn |= PNodeFlags::fpnExplicitSemicolon;
        break;

Once this is done, the AST contains two declaration nodes, var and const.

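One thing not yet covered is AddToNodeList itself. When needed, it creates a binary knopList node and links the new item through sxBin.pnode2. To walk the AST in the Visual Studio debugger, start from the list head and follow sxBin.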

void Parser::AddToNodeList(ParseNode ** ppnodeList, ParseNode *** pppnodeLast,
                           ParseNode * pnodeAdd)
{
    Assert(!this->m_deferringAST);
    if (nullptr == *pppnodeLast)
    {
        // should be an empty list
        Assert(nullptr == *ppnodeList);

        *ppnodeList = pnodeAdd;
        *pppnodeLast = ppnodeList;
    }
    else
    {
        //
        AssertNodeMem(*ppnodeList);
        AssertNodeMem(**pppnodeLast);

        ParseNode *pnodeT = CreateBinNode(knopList, **pppnodeLast, pnodeAdd);
        **pppnodeLast = pnodeT;
        *pppnodeLast = &pnodeT->sxBin.pnode2;
    }
}
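The list-building trick is easier to see in a self-contained model. The sketch below reuses the logic of AddToNodeList on a hypothetical, stripped-down Node struct (not Chakra's ParseNode) and builds the var/const pair discussed above. The knopInt for 4 stays attached as the const node's initializer instead of entering the list.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical stand-in for ParseNode: only what is needed to mimic
// how Parser::AddToNodeList chains statements through knopList nodes.
enum class Nop { VarDecl, ConstDecl, Int, List };

struct Node {
    Nop nop;
    int lw = 0;                 // integer value, like sxInt.lw
    Node* pnodeInit = nullptr;  // declaration initializer, like sxVar.pnodeInit
    Node* pnode1 = nullptr;     // left child of a List node, like sxBin.pnode1
    Node* pnode2 = nullptr;     // right child of a List node, like sxBin.pnode2
};

// Same logic as AddToNodeList: *pppnodeLast points at the slot holding the
// last element; appending replaces that slot with List(last, new).
void AddToNodeList(Node** ppnodeList, Node*** pppnodeLast, Node* pnodeAdd, std::deque<Node>& arena) {
    if (*pppnodeLast == nullptr) {
        *ppnodeList = pnodeAdd;        // first element: the list is just this node
        *pppnodeLast = ppnodeList;
    } else {
        arena.push_back(Node{Nop::List});
        Node* list = &arena.back();    // deque::push_back keeps existing element addresses valid
        list->pnode1 = **pppnodeLast;  // old last element moves to pnode1
        list->pnode2 = pnodeAdd;       // new element goes to pnode2
        **pppnodeLast = list;          // splice the List node into the old slot
        *pppnodeLast = &list->pnode2;  // the new element's slot is now "last"
    }
}

// Walk the chain: while the node is a List, visit pnode1 and follow pnode2.
std::vector<Nop> WalkList(const Node* pnode) {
    std::vector<Nop> out;
    while (pnode != nullptr && pnode->nop == Nop::List) {
        out.push_back(pnode->pnode1->nop);
        pnode = pnode->pnode2;
    }
    if (pnode != nullptr) out.push_back(pnode->nop);  // the final element
    return out;
}
```

Appending a third statement turns the current tail into another List node. That is why the tree below nests one level deeper with each statement.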

[Figure: pro6.GIF]

The final AST looks roughly like this:

pppnodeList
|
sxBin
|--pnode1 (knopVarDecl) --> knopNone
|--pnode2 (knopList)
    |
    sxBin
    |--pnode1 (knopConstDecl) --> knopNone
    |--pnode2 (knopList)
        |
        sxBin
        |--knopAsg
        |   |
        |   sxBin
        |   |--pnode1 (knopName)
        |   |--pnode2 (knopInt) -- sxInt: 3
        |--knopCall
            |
            sxBin
            |--pnode1 (knopName)
            |--pnode2 (knopAdd)
                |
                sxBin
                |--pnode1 (knopName)
                |--pnode2 (knopName)
                    |
                    sxBin
                    |--pnode1
                    |--pnode2 (func)
+endNode

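Back in Js::ScriptContext::ParseScript, the program updates bookkeeping such as the number of bytes read and backs up the source. It then returns to Js::ScriptContext::LoadScript, which passes the parse tree on to GenerateRootFunction: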

        ParseNodePtr parseTree = ParseScript(&parser, script, cb, pSrcInfo,
            pse, ppSourceInfo, rootDisplayName, loadScriptFlag,
            &sourceIndex, scriptSource);

        if (parseTree != nullptr)
        {
            pFunction = GenerateRootFunction(parseTree, sourceIndex, &parser, (*ppSourceInfo)->GetParseFlags(), pse, rootDisplayName);
        }


GenerateRootFunction


Now on to another function, GenerateRootFunction. With the AST already built, this function generates the bytecode so that execution can then jump into the root function.

When the steps above return, execution reaches GenerateRootFunction, which is declared as:

JavascriptFunction* ScriptContext::GenerateRootFunction(ParseNodePtr parseTree, uint sourceIndex, Parser* parser, uint32 grfscr, CompileScriptException * pse, const char16 *rootDisplayName)

GenerateRootFunction in turn goes through GenerateByteCode, which calls ByteCodeGenerator::Generate:

HRESULT GenerateByteCode(__in ParseNode *pnode, __in uint32 grfscr, __in Js::ScriptContext* scriptContext, __inout Js::ParseableFunctionInfo ** ppRootFunc,
                         __in uint sourceIndex, __in bool forceNoNative, __in Parser* parser, __in CompileScriptException *pse, Js::ScopeInfo* parentScopeInfo,
                        Js::ScriptFunction ** functionRef)
{
    HRESULT hr = S_OK;
    ByteCodeGenerator byteCodeGenerator(scriptContext, parentScopeInfo);
    BEGIN_TRANSLATE_EXCEPTION_TO_HRESULT_NESTED
    {
        // Main code.
        ByteCodeGenerator::Generate(pnode, grfscr, &byteCodeGenerator, ppRootFunc, sourceIndex, forceNoNative, parser, functionRef);
    }
    END_TRANSLATE_EXCEPTION_TO_HRESULT(hr);

    if (FAILED(hr))
    {
        hr = pse->ProcessError(nullptr, hr, nullptr);
    }

    return hr;
}

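ByteCodeGenerator::Generate is clearly the entry point that drives bytecode generation. Its first few lines fetch the various contexts (script context, thread context, source info) and make sure they are all set up correctly for the function being generated: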

void ByteCodeGenerator::Generate(__in ParseNode *pnode, uint32 grfscr, __in ByteCodeGenerator* byteCodeGenerator,
    __inout Js::ParseableFunctionInfo ** ppRootFunc, __in uint sourceIndex,
    __in bool forceNoNative, __in Parser* parser, Js::ScriptFunction **functionRef)
{
    Js::ScriptContext * scriptContext = byteCodeGenerator->scriptContext;

#ifdef PROFILE_EXEC
    scriptContext->ProfileBegin(Js::ByteCodePhase);
#endif
    JS_ETW_INTERNAL(EventWriteJSCRIPT_BYTECODEGEN_START(scriptContext, 0));

    ThreadContext * threadContext = scriptContext->GetThreadContext();
    Js::Utf8SourceInfo * utf8SourceInfo = scriptContext->GetSource(sourceIndex);
    byteCodeGenerator->m_utf8SourceInfo = utf8SourceInfo;

    // For dynamic code, just provide a small number since that source info should have very few functions
    // For static code, the nextLocalFunctionId is a good guess of the initial size of the array to minimize reallocs
    SourceContextInfo * sourceContextInfo = utf8SourceInfo->GetSrcInfo()->sourceContextInfo;
    utf8SourceInfo->EnsureInitialized((grfscr & fscrDynamicCode) ? 4 : (sourceContextInfo->nextLocalFunctionId - pnode->sxFnc.functionId));
    sourceContextInfo->EnsureInitialized();

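Next, the program creates an arena allocator named "ByteCode" and initializes the generator with it. It then calls Visit for a first pass over pnode, passing Bind and AssignRegisters as the callbacks: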

ArenaAllocator localAlloc(_u("ByteCode"), threadContext->GetPageAllocator(), Js::Throw::OutOfMemory);
byteCodeGenerator->parser = parser;
byteCodeGenerator->SetCurrentSourceIndex(sourceIndex);
byteCodeGenerator->Begin(&localAlloc, grfscr, *ppRootFunc);
byteCodeGenerator->functionRef = functionRef;
Visit(pnode, byteCodeGenerator, Bind, AssignRegisters);

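Visit dispatches on the type of pnode. In my example pnode is a knopProg, which represents the top-level function wrapping the whole script. The pass tries to find every construct inside it that needs special treatment, including eval and with. The strategy is simple: traverse the nodes, pick out anything suspicious, and handle it. Afterwards it also processes all predefined values, such as constants (literals and the like).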

Once this Visit pass finishes, essentially all the "troublemakers" have been found and dealt with, and the program moves on to EmitProgram:
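Here is a rough model of the kind of search described for this pass. It uses a hypothetical miniature AST (AstNode, Kind and ScopeFlags are invented names, not Chakra types). A single pre-order walk flags a `with` statement or a direct call to `eval`. This sketch only reflects the article's description; it does not reproduce what Bind and AssignRegisters actually do.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical miniature AST (not Chakra's ParseNode).
enum class Kind { Prog, Call, Name, With, Other };

struct AstNode {
    Kind kind;
    std::string name;               // identifier text for Kind::Name
    std::vector<AstNode> children;  // for Kind::Call, children[0] is the callee
};

struct ScopeFlags {
    bool callsEval = false;
    bool hasWith = false;
};

// One pre-order walk that flags constructs which make name lookup dynamic.
void FindDynamicScopeConstructs(const AstNode& n, ScopeFlags& flags) {
    if (n.kind == Kind::With) flags.hasWith = true;
    if (n.kind == Kind::Call && !n.children.empty() &&
        n.children[0].kind == Kind::Name && n.children[0].name == "eval") {
        flags.callsEval = true;  // a direct call to eval(...)
    }
    for (const AstNode& child : n.children) FindDynamicScopeConstructs(child, flags);
}
```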

byteCodeGenerator->forceNoNative = forceNoNative;
byteCodeGenerator->EmitProgram(pnode);

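In ByteCodeGenerator::EmitProgram, the program estimates a suitable initial size for the temporary bytecode buffer from the AST size (maxAstSize / AstBytecodeRatioEstimate) and initializes the writer's data: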

void ByteCodeGenerator::EmitProgram(ParseNode *pnodeProg)
{
    // Indicate that the binding phase is over.
    this->isBinding = false;
    this->trackEnvDepth = true;
    AssignPropertyIds(pnodeProg->sxFnc.funcInfo->byteCodeFunction);

    int32 initSize = this->maxAstSize / AstBytecodeRatioEstimate;

    // Use the temp allocator in bytecode write temp buffer.
    m_writer.InitData(this->alloc, initSize);

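For my example, the program then enters EmitScopeList. According to the code comment, parentScopeInfo is used when emitting a deferred nested function. Our top-level script has none, so the else branch runs: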

if (this->parentScopeInfo)
{
    // Scope stack is already set up the way we want it, so don't visit the global scope.
    // Start emitting with the nested scope (i.e., the deferred function).
    this->EmitScopeList(pnodeProg->sxProg.pnodeScopes);
}
else
{
    this->EmitScopeList(pnodeProg);
}

EmitScopeList walks the nodes and emits different bytecode depending on each node's type:

void ByteCodeGenerator::EmitScopeList(ParseNode *pnode, ParseNode *breakOnBodyScopeNode)
{
    while (pnode)
    {
        ……
        case knopProg:
            if (pnode->sxFnc.funcInfo)
            {
                FuncInfo* funcInfo = pnode->sxFnc.funcInfo;
                Scope* paramScope = funcInfo->GetParamScope();

                if (!funcInfo->IsBodyAndParamScopeMerged())
                {
                    funcInfo->SetCurrentChildScope(paramScope);
                }
                else
                {
                    funcInfo->SetCurrentChildScope(funcInfo->GetBodyScope());
                }
                this->StartEmitFunction(pnode);

                PushFuncInfo(_u("StartEmitFunction"), funcInfo);

                if (!funcInfo->IsBodyAndParamScopeMerged())
                {
                    this->EmitScopeList(pnode->sxFnc.pnodeBodyScope->sxBlock.pnodeScopes);
                }
                else
                {
                    this->EmitScopeList(pnode->sxFnc.pnodeScopes);
                }

                this->EmitOneFunction(pnode);
                this->EndEmitFunction(pnode);


                Assert(pnode->sxFnc.pnodeBody == nullptr || funcInfo->isReused || funcInfo->GetCurrentChildScope() == funcInfo->GetBodyScope());
                funcInfo->SetCurrentChildScope(nullptr);
            }
            pnode = pnode->sxFnc.pnodeNext;
            break;

Bytecode generation begins in EmitOneFunction. Let's go through it from the top:

void ByteCodeGenerator::EmitOneFunction(ParseNode *pnode)
{
    Assert(pnode && (pnode->nop == knopProg || pnode->nop == knopFncDecl));
    FuncInfo *funcInfo = pnode->sxFnc.funcInfo;
    Assert(funcInfo != nullptr);

    if (funcInfo->IsFakeGlobalFunction(this->flags))
    {
        return;
    }

    Js::ParseableFunctionInfo* deferParseFunction = funcInfo->byteCodeFunction;
    deferParseFunction->SetGrfscr(deferParseFunction->GetGrfscr() | (this->flags & ~fscrDeferredFncExpression));
    deferParseFunction->SetSourceInfo(this->GetCurrentSourceIndex(),
        funcInfo->root,
        !!(this->flags & fscrEvalCode),
        ((this->flags & fscrDynamicCode) && !(this->flags & fscrEvalCode)));

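    // Set the declared parameter count (0 in this example).
    deferParseFunction->SetInParamsCount(funcInfo->inArgsCount);

    // Compute the reported parameter count. There are no default arguments here,
    // so the else branch runs and the count is 0.
    if (pnode->sxFnc.HasDefaultArguments())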
    {
        deferParseFunction->SetReportedInParamsCount(pnode->sxFnc.firstDefaultArg + 1);
    }
    else
    {
        deferParseFunction->SetReportedInParamsCount(funcInfo->inArgsCount);
    }
…………

Get a reference to the parsed function body:

Js::FunctionBody* byteCodeFunction = funcInfo->GetParsedFunctionBody();
// We've now done a full parse of this function, so we no longer need to remember the extents
// and attributes of the top-level nested functions. (The above code has run for all of those,
// so they have pointers to the stub sub-trees they need.)
byteCodeFunction->SetDeferredStubs(nullptr);

try
{
    if (!funcInfo->IsGlobalFunction())
    {
……

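After the necessary memory allocation and other setup, the program initializes the ByteCodeWriter (m_writer). Begin pairs with End and Reset:
- Begin starts the setup.
- End writes everything out and resolves everything that needs a second pass, such as jumps and function calls.
- Reset abandons the work.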

    m_writer.Begin(byteCodeFunction, alloc, this->DoJitLoopBodies(funcInfo), funcInfo->hasLoop, this->IsInDebugMode());
    this->PushFuncInfo(_u("EmitOneFunction"), funcInfo);

    this->inPrologue = true;

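The comments on ByteCodeWriter::Begin give a rough idea of what it does:
1. Begin sets up the instance to generate bytecode for the given JavascriptFunction.
2. The bytecode is only committed once End() is called; alternatively, Reset() discards it.
3. A ByteCodeWriter may be used many times, but each use produces a single bytecode stream for a single function.
In practice, Begin is just this initialization function: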

void ByteCodeWriter::Begin(FunctionBody* functionWrite, ArenaAllocator* alloc, bool doJitLoopBodies, bool hasLoop, bool inDebugMode)
{
    Assert(!isInUse);
    AssertMsg(m_functionWrite == nullptr, "Cannot nest Begin() calls");
    AssertMsg(functionWrite != nullptr, "Must have valid function to write");
    AssertMsg(functionWrite->GetByteCode() == nullptr, "Function should not already have a byte-code body");
    AssertMsg(functionWrite->GetLocalsCount() > 0, "Must always have R0 for return-value");

    DebugOnly(isInUse = true);
    m_functionWrite = functionWrite;
    m_doJitLoopBodies = doJitLoopBodies;
    m_doInterruptProbe = functionWrite->GetScriptContext()->GetThreadContext()->DoInterruptProbe(functionWrite);
    m_hasLoop = hasLoop;
    m_isInDebugMode = inDebugMode;
}
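A toy model of this Begin/End/Reset contract may help. MiniWriter, its opcodes and its 1-byte absolute jump targets are invented for the sketch; the real ByteCodeWriter is far more involved. What the model does show is why jump targets can only be patched in End(): a forward jump is written before the position of its label is known.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical writer following the Begin/End/Reset contract quoted from the
// ByteCodeWriter comments. Opcodes, 1-byte absolute targets and names are invented.
class MiniWriter {
public:
    void Begin() {
        if (inUse_) throw std::logic_error("Cannot nest Begin() calls");
        Clear();
        inUse_ = true;
    }
    int DefineLabel() {
        labels_.push_back(-1);  // position unknown until MarkLabel()
        return static_cast<int>(labels_.size()) - 1;
    }
    void MarkLabel(int label) { labels_[label] = static_cast<int>(buf_.size()); }
    void Op(std::uint8_t op) { buf_.push_back(op); }
    // A jump whose target may not be known yet: write a placeholder byte.
    void Br(std::uint8_t op, int label) {
        buf_.push_back(op);
        fixups_.push_back({static_cast<int>(buf_.size()), label});
        buf_.push_back(0);
    }
    // End(): patch every recorded jump, then hand over the finished stream.
    std::vector<std::uint8_t> End() {
        for (const Fixup& f : fixups_) {
            if (labels_[f.label] < 0) throw std::logic_error("label never marked");
            buf_[f.offset] = static_cast<std::uint8_t>(labels_[f.label]);
        }
        std::vector<std::uint8_t> out = buf_;
        Clear();
        inUse_ = false;
        return out;
    }
    // Reset(): discard everything written since Begin().
    void Reset() {
        Clear();
        inUse_ = false;
    }

private:
    struct Fixup { int offset; int label; };
    void Clear() { buf_.clear(); fixups_.clear(); labels_.clear(); }

    bool inUse_ = false;
    std::vector<std::uint8_t> buf_;
    std::vector<Fixup> fixups_;
    std::vector<int> labels_;
};
```

As an example, emitting Br(0x10, L), Op(0x01), MarkLabel(L), Op(0x02) and then calling End() yields the bytes 10 03 01 02. The placeholder at offset 1 is only filled in once the label's position (3) is known.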

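After entering the prologue:
1. Check whether the function is a class constructor, which needs special handling.
2. Get the corresponding Scope.
3. Commit all constants, including our 4 and 3, via LoadAllConstants(funcInfo).
Besides integers, the constants include strings, string template call sites, and floating-point (double) values.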

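Chakra calls ToVar to encode integers as tagged values (a tagging scheme, not encryption). If the value does not overflow the tagged range, ToVar shifts it left by one bit and sets the tag bit: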

    Var intConst = JavascriptNumber::ToVar((int32)val, scriptContext);

    this->RecordConstant(location, intConst);

The implementation:

return reinterpret_cast<Var>((nValue << VarTag_Shift) | AtomTag_IntPtr);

Here VarTag_Shift == 1 and AtomTag_IntPtr == 1.

For example, 4 encodes to (4 << 1) | 1 = 9.
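The encoding can be checked with a small round-trip sketch. It assumes VarTag_Shift == 1 and AtomTag_IntPtr == 1 as stated above, and models Vars as plain uint32_t words rather than pointers. The 31-bit range check is this sketch's stand-in for "no overflow": one bit goes to the tag, so only 31-bit signed values survive the shift. ToTaggedInt, IsTaggedInt and FromTaggedInt are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Assumed constants, taken from the values quoted in the article.
constexpr int kVarTagShift = 1;
constexpr std::uint32_t kAtomTagIntPtr = 1;

// Values outside the 31-bit signed range would need a heap number instead;
// the sketch signals that with std::nullopt.
std::optional<std::uint32_t> ToTaggedInt(std::int32_t value) {
    constexpr std::int32_t kMin = -(1 << 30);
    constexpr std::int32_t kMax = (1 << 30) - 1;
    if (value < kMin || value > kMax) return std::nullopt;
    return (static_cast<std::uint32_t>(value) << kVarTagShift) | kAtomTagIntPtr;
}

bool IsTaggedInt(std::uint32_t var) { return (var & kAtomTagIntPtr) != 0; }

std::int32_t FromTaggedInt(std::uint32_t var) {
    // Arithmetic right shift drops the tag bit and keeps the sign
    // (guaranteed since C++20, and what mainstream compilers do anyway).
    return static_cast<std::int32_t>(var) >> kVarTagShift;
}
```

With these constants, the two constants from the example become 9 (for 4) and 7 (for 3). The low bit being set is what marks them as tagged integers rather than pointers.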

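4. Load named function objects, e.g. x = function f() {}, and store them in a slot.

…… (much omitted)

5. ByteCodeGenerator::DefineFunctions
Regardless of textual order, bytecode defining every function declared in the scope is emitted before the scope's code runs.
This is done through DefineCachedFunctions/DefineUncachedFunctions.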
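The ordering effect of this step can be sketched with a hypothetical statement list (Stmt and EmitOrder are invented names, not Chakra code). Function declarations are defined first, whatever their position in the source. Only then are the remaining statements emitted in textual order, which is why a function can be called on a line above its declaration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical statement record; only "is this a function declaration?" matters here.
struct Stmt {
    bool isFuncDecl;
    std::string text;
};

// Pass 1 defines every function declared in the scope; pass 2 emits the
// other statements in source order.
std::vector<std::string> EmitOrder(const std::vector<Stmt>& body) {
    std::vector<std::string> order;
    for (const Stmt& s : body)
        if (s.isFuncDecl) order.push_back("define " + s.text);
    for (const Stmt& s : body)
        if (!s.isFuncDecl) order.push_back(s.text);
    return order;
}
```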

    this->inPrologue = false;

    if (funcInfo->IsGlobalFunction())
    {
        EmitGlobalBody(funcInfo);
    }
    else
    {
        EmitFunctionBody(funcInfo);
    }

In EmitGlobalBody:

1. Walk the nodes (while (pnode = pnode->sxBin.pnode2)), doing a simple check on each one. In our example, every node ends up in EmitTopLevelStatement:

EmitTopLevelStatement(stmt, funcInfo, false);

EmitTopLevelStatement then calls Emit on each statement in turn:

Emit(stmt, this, funcInfo, fReturnValue, false/*isConstructorCall*/, nullptr/*bindPnode*/, true/*isTopLevel*/);


stmt:
-       stmt    0x030d7150 {nop=knopVarDecl (82 'R') grfpn=32 ichMin=0 ...} ParseNode *
        nop knopVarDecl (82 'R')    OpCode

funcInfo:
-       funcInfo    0x030d8028 {inlineCacheCount=0 rootObjectLoadInlineCacheCount=0 rootObjectLoadMethodInlineCacheCount=...}   FuncInfo *
+       name    0x1090b324 L"glo"   const wchar_t *

Emit handles var, const and let in the same case:

    case knopVarDecl:
    case knopConstDecl:
    case knopLetDecl:
    {
        // Emit initialization code
        ParseNodePtr initNode = pnode->sxVar.pnodeInit;
……

Note that for our var a, sym holds the variable's name, and pnodeNext is NULL, so nothing further is chained after it in the AST:

-       pnode->sxVar    {pnodeNext=0x00000000 <NULL> pid=0x011c3c14 {m_pidNext=0x00000000 <NULL> m_pidRefStack=0x00000000 <NULL> ...} ...}  PnVar
+       pnodeNext   0x00000000 <NULL>   ParseNode *
+       sym 0x030d7198 {name={string=0x011c3c2e L"a" len=1 } pid=0x011c3c14 {m_pidNext=0x00000000 <NULL> m_pidRefStack=...} ...}    Symbol *

Now look at how const is handled. Because const has an initializer (const b = 4;), the following line reads it:

ParseNodePtr initNode = pnode->sxVar.pnodeInit;

Here sxVar.pnodeInit is non-null, so the initialization path is taken:

    if (initNode != nullptr || pnode->nop == knopLetDecl)
    {
        Symbol *sym = pnode->sxVar.sym;
        Js::RegSlot rhsLocation;

        byteCodeGenerator->StartStatement(pnode);

        if (initNode != nullptr)
        {
            Emit(initNode, byteCodeGenerator, funcInfo, false);

The function emits initNode, the initializer of const b. Its details below show that sxInt is 4, our initial value:

-       initNode    0x030d7250 {nop=knopInt (2 '\x2') grfpn=0 ichMin=18 ...}    ParseNode *
+       sxInt   {lw=4 } PnInt

After that, the function emits the assignment for this node (= 4, to b):
EmitAssignment(nullptr, pnode, rhsLocation, byteCodeGenerator, funcInfo);

Once again, all three declaration kinds share a single case:

case knopVarDecl:
case knopLetDecl:
case knopConstDecl:
{
    Symbol *sym = lhs->sxVar.sym;
    Assert(sym != nullptr);
    byteCodeGenerator->EmitPropStore(rhsLocation, sym, nullptr, funcInfo, lhs->nop == knopLetDecl, lhs->nop == knopConstDecl);
    break;
}

Emit then returns.

void ByteCodeGenerator::EmitGlobalBody(FuncInfo *funcInfo) 

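After the global object is committed, all Emit operations are complete. Everything so far is only preprocessing; no bytecode has been generated yet.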

The encoding functions that actually produce bytecode are finally invoked from EmitOneFunction, around its BeginEmitBlock call:

    ::BeginEmitBlock(pnode->sxFnc.pnodeScopes, this, funcInfo);

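Here we see bytecode being written to the buffer for the first time; the call stack at that moment is shown below. ByteCodeWriter::Data::EncodeT looks up the encoding that matches its template index (the <0> in the stack). It then calls Js::ByteCodeWriter::Data::Write to write it into the buffer and advances the write pointer.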


>dd 0x012486d8
0x012486D8  00000000 00000000 00000000 00000000  
0x012486E8  00000000 00000000 00000000 00000000  
0x012486F8  00000000 00000000 00000000 00000000  
0x01248708  00000000 00000000 00000000 00000000  
>dd 0x012486d8
0x012486D8  0000004c 00000000 00000000 00000000  
0x012486E8  00000000 00000000 00000000 00000000  
0x012486F8  00000000 00000000 00000000 00000000  
0x01248708  00000000 00000000 00000000 00000000  

These dumps show the buffer before and after the 1-byte opcode (0x4c) is written.

>   ChakraCore.dll!Js::ByteCodeWriter::DataChunk::WriteUnsafe(const void * data, unsigned int byteSize) line 60    C++
    ChakraCore.dll!Js::ByteCodeWriter::Data::Write(const void * data, unsigned int byteSize) line 3340 C++
    ChakraCore.dll!Js::ByteCodeWriter::Data::EncodeOpCode<0>(unsigned short op, Js::ByteCodeWriter * writer) line 3259 C++
    ChakraCore.dll!Js::ByteCodeWriter::Data::EncodeT<0>(Js::OpCode op, Js::ByteCodeWriter * writer) line 3308  C++
    ChakraCore.dll!Js::ByteCodeWriter::Data::EncodeT<0>(Js::OpCode op, const void * rawData, int byteSize, Js::ByteCodeWriter * writer) line 3321  C++
    ChakraCore.dll!Js::ByteCodeWriter::TryWriteElementRootU<Js::LayoutSizePolicy<0> >(Js::OpCode op, unsigned int index) line 1833 C++
    ChakraCore.dll!Js::ByteCodeWriter::ElementRootU(Js::OpCode op, unsigned int index) line 1844   C++
    ChakraCore.dll!ByteCodeGenerator::EnsureNoRedeclarations::__l25::<lambda>(ParseNode * pnode) line 5901 C++
    ChakraCore.dll!ByteCodeGenerator::IterateBlockScopedVariables<void <lambda>(ParseNode *) >(ParseNode * pnodeBlock, ByteCodeGenerator::EnsureNoRedeclarations::__l25::void <lambda>(ParseNode *) fn) line 418   C++
    ChakraCore.dll!ByteCodeGenerator::EnsureNoRedeclarations(ParseNode * pnodeBlock, FuncInfo * funcInfo) line 5904    C++
    ChakraCore.dll!ByteCodeGenerator::EmitOneFunction(ParseNode * pnode) line 3251 C++
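The innermost frames of this stack, Data::Write down to DataChunk::WriteUnsafe, boil down to "copy the bytes at the current position, then move the write pointer". Here is a minimal sketch of that idea. MiniChunk is a hypothetical fixed-size buffer; the real writer manages a chain of chunks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical fixed-size chunk: copy at the current offset, then advance it.
class MiniChunk {
public:
    explicit MiniChunk(std::size_t size) : buf_(size, 0) {}

    bool Write(const void* data, std::size_t byteSize) {
        if (byteSize > buf_.size() - cur_) return false;  // a real writer would move on to a new chunk
        std::memcpy(buf_.data() + cur_, data, byteSize);
        cur_ += byteSize;                                   // advance the write pointer
        return true;
    }
    std::size_t Offset() const { return cur_; }
    const std::vector<std::uint8_t>& Bytes() const { return buf_; }

private:
    std::vector<std::uint8_t> buf_;
    std::size_t cur_ = 0;
};
```

Writing the single opcode byte 0x4c into a zeroed chunk gives the first dword 0000004c seen in the second dump on a little-endian machine, and leaves the write offset at 1.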

BinaryWriter::End

To be continued.