Swift, C, and LLVM Compiler Optimization

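In very few and basic words, LLVM is a compiler framework that supports many different programming languages through a layered front-end/back-end architecture. The front end parses and analyzes the source code to produce an intermediate representation; the back end converts that intermediate representation into machine code optimized for a specific processor architecture.

Of course LLVM is much more than this simple description, but LLVM itself is not my main concern in this post. What I really care about here is how the modern Swift and Clang compilers, both built on the LLVM architecture, compile and above all optimize the source code we developers write every day, and in particular how they react to some of the old-style tips and tricks we sometimes spend time applying in our source code.

In this post I will test a super simple but tedious function, implemented in Swift and C with different programming techniques, in an attempt to push the compiler toward some classic optimization patterns.

I will report runtime performance results for the compiled code and share some optimization details from the generated assembly code.

Suppose you have already allocated a large buffer, on the heap or on the stack, we don't care which; say an array of integers. Our basic function simply has to assign a specific value to every position of that array. That's it!

So in C our function could be something as basic as this: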

  void testLoop(int64_t *buffer, int64_t tot) {
      for (int64_t i = 0; i < tot; i++) {
          *buffer++ = 1;
      }
  }

In Swift the equivalent looks very similar:

  func testLoop(_ a: inout [Int], _ tot: Int) {
      for i in 0..<tot {
          a[i] = 1
      }
  }

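Note that I use 64-bit integers here because that is the size of Swift's Int on 64-bit architectures, and I want to be able to support very large buffers and very long loops.

One basic trick a good developer might apply in a super simple scenario like this is to try to reduce the overhead of the loop instructions themselves.

For example, instead of a loop with n iterations and a single assignment per iteration, we can cut the number of iterations by an order of magnitude and put 10 assignments inside the loop body.

In C: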

  void testLoop(int64_t *buffer, int64_t tot) {
      /* tot is assumed to be a multiple of 10; any remainder is left untouched */
      int64_t t = tot / 10;
      for (int64_t i = 0; i < t; i++) {
          *buffer++ = 1;
          *buffer++ = 1;
          *buffer++ = 1;
          *buffer++ = 1;
          *buffer++ = 1;
          *buffer++ = 1;
          *buffer++ = 1;
          *buffer++ = 1;
          *buffer++ = 1;
          *buffer++ = 1;
      }
  }

In Swift:

  func testLoop(_ a: inout [Int], _ tot: Int) {
      let t = tot / 10
      // Step by 10 so that each iteration fills its own block of 10 elements
      for i in stride(from: 0, to: t * 10, by: 10) {
          a[i] = 1
          a[i + 1] = 1
          a[i + 2] = 1
          a[i + 3] = 1
          a[i + 4] = 1
          a[i + 5] = 1
          a[i + 6] = 1
          a[i + 7] = 1
          a[i + 8] = 1
          a[i + 9] = 1
      }
  }
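Both unrolled versions above write only the first (tot / 10) * 10 elements, so they quietly assume that tot is a multiple of 10. A minimal sketch of how the leftover elements could be handled in C follows. The name testLoopUnrolled is hypothetical and not part of the original test project.

```c
#include <stdint.h>

/* Hypothetical variant of the unrolled loop: the same 10 stores per
   iteration, followed by a short tail loop for the tot % 10 elements
   that the unrolled part does not reach. */
void testLoopUnrolled(int64_t *buffer, int64_t tot) {
    int64_t t = tot / 10;
    for (int64_t i = 0; i < t; i++) {
        *buffer++ = 1; *buffer++ = 1; *buffer++ = 1; *buffer++ = 1; *buffer++ = 1;
        *buffer++ = 1; *buffer++ = 1; *buffer++ = 1; *buffer++ = 1; *buffer++ = 1;
    }
    /* Tail: indices t * 10 up to tot - 1. */
    for (int64_t i = t * 10; i < tot; i++) {
        *buffer++ = 1;
    }
}
```

With tot = 23 this fills 20 elements in the unrolled loop and the remaining 3 in the tail loop.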

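With the xcrun swiftc version of the compiler you can set the desired optimization level directly with the -O family of options: compile without any optimization, optimize while reducing the size of the object code, or optimize and remove the runtime safety checks.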

  xcrun swiftc main.swift -o mainNoO
  xcrun swiftc -Onone main.swift -o mainNone
  xcrun swiftc -Osize main.swift -o mainSize
  xcrun swiftc -Ounchecked main.swift -o mainUnchecked

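The swift build command does not take these -O options directly; you can only choose between a debug build, without optimization, and a release build, with optimization.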

  swift build -c debug
  swift build -c release

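Swift has very good interoperability with C and also allows C-style pointers to be used directly in Swift. The following implementation obtains a C-style pointer to the memory managed by the Swift array and uses traditional pointer arithmetic and stores, in Swift, to try to optimize our basic function: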

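  func testLoop(_ a: inout [Int64], _ tot: Int) {
      let size = MemoryLayout<Int64>.size
      // Caution: a pointer produced by this implicit array-to-pointer conversion
      // is only guaranteed to be valid for the duration of the initializer call.
      if var p: UnsafeMutableRawPointer = UnsafeMutableRawPointer(mutating: a) {
          for _ in 0..<tot {
              p.storeBytes(of: 1, as: Int64.self)
              p += size
          }
      }
  }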

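Once again, to force a reduction of the for-loop overhead, we can perform 10 pointer stores inside the loop body and so reduce the total number of iterations by an order of magnitude: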

  func testLoop(_ a: inout [Int64], _ tot: Int) {
      let size = MemoryLayout<Int64>.size
      if var p: UnsafeMutableRawPointer = UnsafeMutableRawPointer(mutating: a) {
          let t = tot / 10
          for _ in 0..<t {
              p.storeBytes(of: 1, as: Int64.self)
              p += size
              p.storeBytes(of: 1, as: Int64.self)
              p += size
              p.storeBytes(of: 1, as: Int64.self)
              p += size
              p.storeBytes(of: 1, as: Int64.self)
              p += size
              p.storeBytes(of: 1, as: Int64.self)
              p += size
              p.storeBytes(of: 1, as: Int64.self)
              p += size
              p.storeBytes(of: 1, as: Int64.self)
              p += size
              p.storeBytes(of: 1, as: Int64.self)
              p += size
              p.storeBytes(of: 1, as: Int64.self)
              p += size
              p.storeBytes(of: 1, as: Int64.self)
              p += size
          }
      }
  }

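As mentioned, Swift has great native support for mixing Swift and C code. In another story I have already shown how simple it is to support C and C++ code directly within a single Swift Package Manager project.

Swift++ == Swift => C => C++ => STL

A Swift wrapper calling a C wrapper calling a C++ wrapper that uses the STL, with Swift Package Manager (Linux)

medium.com

Here I simply included the two C versions of the function discussed above in my sample Swift Package Manager test project, called them directly, and benchmarked them against all the Swift implementations above.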

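The following optional swiftc arguments generate the Swift and LLVM intermediate files, which let you study how the code is optimized through the successive front-end and back-end steps.

Swift Abstract Syntax Tree (AST)

  swiftc -dump-ast main.swift

Swift Intermediate Language (SIL)

  swiftc -emit-sil main.swift

LLVM Intermediate Representation (LLVM IR)

  swiftc -emit-ir main.swift

Assembly language

  swiftc -emit-assembly main.swift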

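To compare all the possible options thoroughly, I also created a separate Xcode macOS console C project in which I tested and benchmarked only the two C implementations of the sample function discussed above.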
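For readers who want to reproduce this kind of measurement, here is a rough sketch of a timing helper in C. It is not the harness used in the test project: the name timeFill is hypothetical, and clock() measures CPU time rather than wall-clock time, which is an assumption about what is good enough for a single-threaded fill.

```c
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* The basic function from the article. */
void testLoop(int64_t *buffer, int64_t tot) {
    for (int64_t i = 0; i < tot; i++) {
        *buffer++ = 1;
    }
}

/* Hypothetical helper: runs fn once over a freshly allocated buffer of tot
   elements and returns the CPU time spent in seconds, or -1.0 if tot is not
   positive or the allocation fails. */
double timeFill(void (*fn)(int64_t *, int64_t), int64_t tot) {
    if (tot <= 0) return -1.0;
    int64_t *buffer = malloc((size_t)tot * sizeof *buffer);
    if (buffer == NULL) return -1.0;

    clock_t start = clock();
    fn(buffer, tot);
    clock_t end = clock();

    /* Read one element back to discourage the optimizer from treating
       the fill as dead code. */
    volatile int64_t sink = buffer[tot - 1];
    (void)sink;

    free(buffer);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling timeFill(testLoop, 1000000000) would match the loop size used in this post, but keep in mind that a buffer of 1 billion 64-bit integers needs about 8 GB of memory.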

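Surprise, surprise. Well, at least it surprised me: in the end all the different Swift and C implementations have very comparable performance results.

One very important point here: the results suggest that the simpler your Swift code is, the better the optimization you get from the compiler!

See the table below and compare the execution of all the Swift and C functions, always tested with a loop of 1 billion iterations.

The following pseudocode was copied from Hopper (https://www.hopperapp.com), from the compiled and optimized Swift and C binaries.

Simple Swift implementation (one assignment per loop iteration):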

void _$S28SwiftCompilerPerformanceTest9testLoop1ys5Int64VSayADGz_SitF(int arg0, int arg1) {
rbx = arg1;
r14 = arg0;
if (rbx < 0x0) goto loc_100003093;
loc_100002eea:
if (CPU_FLAGS & E) goto loc_100003030;
loc_100002ef0:
r15 = *r14;
if (rbx - 0x1 >= *(r15 + **_$Ss27_ContiguousArrayStorageBaseC16countAndCapacitys01_B4BodyVvpWvd)) goto loc_100003097;
loc_100002f0b:
if (swift_isUniquelyReferencedOrPinned_nonNull_native(r15) == 0x0) {
swift_bridgeObjectRetain(r15);
r12 = _ $ Ss20_ArrayBufferProtocolPss5RangeVySiG7IndicesRtzrlE7copyingxx_tcfCs01_aB0Vys5Int64VG_Tg5Tf4gd_n(r15);
swift_bridgeObjectRelease(r15);
rdi = * r14;
* r14 = r12;
swift_bridgeObjectRelease(rdi);
}
rax = * r14;
if (rbx <= 0x3) {
rcx = 0x0;
rax = rax + rcx * 0x8 + 0x20;
rdx = rbx - rcx;
do {
*rax = 0x1;
rax = rax + 0x8;
rdx = rdx - 0x1;
} while (rdx != 0x0);
}
else {
rcx = rbx&0xfffffffffffffffc;
rsi = rcx-0x4 >> 0x2;
rdx = rsi + 0x1&0x7;
if (rcx < 0x20) {
rsi = 0x0;
if (rdx != 0x0) {
rsi = rax + rsi * 0x8 + 0x30;
rdx = -rdx;
xmm0 = internal_movaps(xmm0, *(int128_t *)0x100007880);
do {
*(int128_t *)(rsi-0x10)= native_movups(*(int128_t *)(rsi-0x10),xmm0);
*(int128_t *)rsi = native_movups(*(int128_t *)rsi,xmm0);
rsi = rsi + 0x20;
rdx = rdx + 0x1;
} while(rdx!= 0x0);
}
}
else {
rdi = rdx - 0x1 - rsi;
rsi = 0x0;
xmm0 = internal_movaps(xmm0, *(int128_t *)0x100007880);
do {
*(int128_t *)(rax + rsi * 0x8 + 0x20)= native_movups(*(int128_t *)(rax + rsi * 0x8 + 0x20),xmm0);
*(int128_t *)(rax + rsi * 0x8 + 0x30)= native_movups(*(int128_t *)(rax + rsi * 0x8 + 0x30),xmm0);
*(int128_t *)(rax + rsi * 0x8 + 0x40)= native_movups(*(int128_t *)(rax + rsi * 0x8 + 0x40),xmm0);
*(int128_t *)(rax + rsi * 0x8 + 0x50)= native_movups(*(int128_t *)(rax + rsi * 0x8 + 0x50),xmm0);
*(int128_t *)(rax + rsi * 0x8 + 0x60)= native_movups(*(int128_t *)(rax + rsi * 0x8 + 0x60),xmm0);
*(int128_t *)(rax + rsi * 0x8 + 0x70)= native_movups(*(int128_t *)(rax + rsi * 0x8 + 0x70),xmm0);
*(int128_t *)(rax + rsi * 0x8 + 0x80)= native_movups(*(int128_t *)(rax + rsi * 0x8 + 0x80),xmm0);
*(int128_t *)(rax + rsi * 0x8 + 0x90)= native_movups(*(int128_t *)(rax + rsi * 0x8 + 0x90),xmm0);
*(int128_t *)(rax + rsi * 0x8 + 0xa0)= native_movups(*(int128_t *)(rax + rsi * 0x8 + 0xa0),xmm0);
*(int128_t *)(rax + rsi * 0x8 + 0xb0)= native_movups(*(int128_t *)(rax + rsi * 0x8 + 0xb0),xmm0);
*(int128_t *)(rax + rsi * 0x8 + 0xc0)= native_movups(*(int128_t *)(rax + rsi * 0x8 + 0xc0),xmm0);
*(int128_t *)(rax + rsi * 0x8 + 0xd0)= native_movups(*(int128_t *)(rax + rsi * 0x8 + 0xd0),xmm0);
*(int128_t *)(rax + rsi * 0x8 + 0xe0)= native_movups(*(int128_t *)(rax + rsi * 0x8 + 0xe0),xmm0);
*(int128_t *)(rax + rsi * 0x8 + 0xf0)= native_movups(*(int128_t *)(rax + rsi * 0x8 + 0xf0),xmm0);
*(int128_t *)(rax + rsi * 0x8 + 0x100)= native_movups(*(int128_t *)(rax + rsi * 0x8 + 0x100),xmm0);
*(int128_t *)(rax + rsi * 0x8 + 0x110)= native_movups(*(int128_t *)(rax + rsi * 0x8 + 0x110),xmm0);
rsi = rsi + 0x20;
rdi = rdi + 0x8;
} while(rdi!= 0x0);
if (rdx != 0x0) {
rsi = rax + rsi * 0x8 + 0x30;
rdx = -rdx;
xmm0 = internal_movaps(xmm0, *(int128_t *)0x100007880);
do {
*(int128_t *)(rsi-0x10)= native_movups(*(int128_t *)(rsi-0x10),xmm0);
*(int128_t *)rsi = native_movups(*(int128_t *)rsi,xmm0);
rsi = rsi + 0x20;
rdx = rdx + 0x1;
} while(rdx!= 0x0);
}
}
if (rcx != rbx) {
rax = rax + rcx * 0x8 + 0x20;
rdx = rbx - rcx;
do {
* rax = 0x1;
rax = rax + 0x8;
rdx = rdx-0x1;
} while(rdx!= 0x0);
}
}
goto loc_100003030;
loc_100003030:
if (rbx >= 0xffffffffffffffff) {
rax = SAR((rbx >> 0x3f) + rbx, 0x1);
rcx = *r14;
if (rax >= *(rcx + **_$Ss27_ContiguousArrayStorageBaseC16countAndCapacitys01_B4BodyVvpWvd)) {
asm { ud2 };
loc_100003091();
}
}
else {
asm { ud2 };
loc_100003091();
}
return;
loc_100003097:
asm { ud2 };
sub_100003099();
return;
loc_100003093:
asm { ud2 };
loc_100003095();
return;
}
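Reading the listing above, the optimizer appears to have turned the single-assignment loop into wide stores: each movups writes 16 bytes, that is two Int64 values, and a scalar loop handles whatever part of the count is not a multiple of 4. Below is a rough C sketch of that shape, under the assumption that I am reading the pseudocode correctly. The name fillVectorShaped is hypothetical, and a memcpy of a 16-byte pattern stands in for the SSE store.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration of the loop shape seen in the disassembly:
   a main loop writing 4 elements per pass as two 16-byte stores, then a
   scalar tail for the remaining 0 to 3 elements. */
void fillVectorShaped(int64_t *buffer, int64_t tot) {
    /* 16 bytes of data, playing the role of the constant loaded into xmm0. */
    static const int64_t pattern[2] = {1, 1};

    /* Largest multiple of 4 not above tot (compare rcx = rbx & 0xfffffffffffffffc). */
    int64_t vectorCount = tot > 0 ? (tot & ~(int64_t)3) : 0;

    int64_t i = 0;
    for (; i < vectorCount; i += 4) {
        memcpy(buffer + i, pattern, sizeof pattern);      /* elements i, i + 1 */
        memcpy(buffer + i + 2, pattern, sizeof pattern);  /* elements i + 2, i + 3 */
    }
    for (; i < tot; i++) {
        buffer[i] = 1;                                    /* scalar tail */
    }
}
```

The real generated code goes further and unrolls the wide-store loop several times, but the split between a vector body and a scalar remainder is the same idea.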

Second Swift implementation (10 assignments per loop iteration):

void _$S28SwiftCompilerPerformanceTest9testLoop2ys5Int64VSayADGz_SitFTm(int arg0, int arg1, int arg2) {
rbx = arg2;
r14 = arg1;
r15 = arg0;
r12 = HIQWORD(r14 * 0x6666666666666667);
if (r14 < 0xfffffffffffffff7) goto loc_100003511;
loc_1000030f7:
if (r14 <= 0x9) goto loc_1000032a2;
loc_100003105:
if (swift_isUniquelyReferencedOrPinned_nonNull_native(*r15) == 0x0) {
r13 = * r15;
swift_bridgeObjectRetain(r13);
var_30 = _ $ Ss20_ArrayBufferProtocolPss5RangeVySiG7IndicesRtzrlE7copyingxx_tcfCs01_aB0Vys5Int64VG_Tg5Tf4gd_n(r13);
swift_bridgeObjectRelease(r13);
rdi = * r15;
* r15 = var_30;
swift_bridgeObjectRelease(rdi);
}
if (swift_isUniquelyReferencedOrPinned_nonNull_native(*r15) == 0x0) {
r13 = * r15;
swift_bridgeObjectRetain(r13);
var_30 = _ $ Ss20_ArrayBufferProtocolPss5RangeVySiG7IndicesRtzrlE7copyingxx_tcfCs01_aB0Vys5Int64VG_Tg5Tf4gd_n(r13);
swift_bridgeObjectRelease(r13);
rdi = * r15;
* r15 = var_30;
swift_bridgeObjectRelease(rdi);
}
if (swift_isUniquelyReferencedOrPinned_nonNull_native(*r15) == 0x0) {
r13 = * r15;
swift_bridgeObjectRetain(r13);
var_30 = _ $ Ss20_ArrayBufferProtocolPss5RangeVySiG7IndicesRtzrlE7copyingxx_tcfCs01_aB0Vys5Int64VG_Tg5Tf4gd_n(r13);
swift_bridgeObjectRelease(r13);
rdi = * r15;
* r15 = var_30;
swift_bridgeObjectRelease(rdi);
}
if (swift_isUniquelyReferencedOrPinned_nonNull_native(*r15) == 0x0) {
r13 = * r15;
swift_bridgeObjectRetain(r13);
var_30 = _ $ Ss20_ArrayBufferProtocolPss5RangeVySiG7IndicesRtzrlE7copyingxx_tcfCs01_aB0Vys5Int64VG_Tg5Tf4gd_n(r13);
swift_bridgeObjectRelease(r13);
rdi = * r15;
* r15 = var_30;
swift_bridgeObjectRelease(rdi);
}
if (swift_isUniquelyReferencedOrPinned_nonNull_native(*r15) == 0x0) {
r13 = * r15;
swift_bridgeObjectRetain(r13);
var_30 = _ $ Ss20_ArrayBufferProtocolPss5RangeVySiG7IndicesRtzrlE7copyingxx_tcfCs01_aB0Vys5Int64VG_Tg5Tf4gd_n(r13);
swift_bridgeObjectRelease(r13);
rdi = * r15;
* r15 = var_30;
swift_bridgeObjectRelease(rdi);
}
if (swift_isUniquelyReferencedOrPinned_nonNull_native(*r15) == 0x0) {
r13 = * r15;
swift_bridgeObjectRetain(r13);
var_30 = _ $ Ss20_ArrayBufferProtocolPss5RangeVySiG7IndicesRtzrlE7copyingxx_tcfCs01_aB0Vys5Int64VG_Tg5Tf4gd_n(r13);
swift_bridgeObjectRelease(r13);
rdi = * r15;
* r15 = var_30;
swift_bridgeObjectRelease(rdi);
}
if (swift_isUniquelyReferencedOrPinned_nonNull_native(*r15) == 0x0) {
r13 = * r15;
swift_bridgeObjectRetain(r13);
var_30 = _ $ Ss20_ArrayBufferProtocolPss5RangeVySiG7IndicesRtzrlE7copyingxx_tcfCs01_aB0Vys5Int64VG_Tg5Tf4gd_n(r13);
swift_bridgeObjectRelease(r13);
rdi = * r15;
* r15 = var_30;
swift_bridgeObjectRelease(rdi);
}
if (swift_isUniquelyReferencedOrPinned_nonNull_native(*r15) == 0x0) {
r13 = * r15;
swift_bridgeObjectRetain(r13);
var_30 = _ $ Ss20_ArrayBufferProtocolPss5RangeVySiG7IndicesRtzrlE7copyingxx_tcfCs01_aB0Vys5Int64VG_Tg5Tf4gd_n(r13);
swift_bridgeObjectRelease(r13);
rdi = * r15;
* r15 = var_30;
swift_bridgeObjectRelease(rdi);
}
if (swift_isUniquelyReferencedOrPinned_nonNull_native(*r15) == 0x0) {
r13 = * r15;
swift_bridgeObjectRetain(r13);
var_30 = _ $ Ss20_ArrayBufferProtocolPss5RangeVySiG7IndicesRtzrlE7copyingxx_tcfCs01_aB0Vys5Int64VG_Tg5Tf4gd_n(r13);
swift_bridgeObjectRelease(r13);
rdi = * r15;
* r15 = var_30;
swift_bridgeObjectRelease(rdi);
}
r13 = r12 >> 0x3f;
r12 = SAR(r12,0x2);
if (swift_isUniquelyReferencedOrPinned_nonNull_native(*r15) == 0x0) {
swift_bridgeObjectRetain(* r15);
var_38 = _ $ Ss20_ArrayBufferProtocolPss5RangeVySiG7IndicesRtzrlE7copyingxx_tcfCs01_aB0Vys5Int64VG_Tg5Tf4gd_n(rdi);
swift_bridgeObjectRelease(rdi);
rdi = * r15;
* r15 = var_38;
swift_bridgeObjectRelease(rdi);
}
r12 = r12 + r13;
rax = * r15;
rcx = * _ $ Ss27_ContiguousArrayStorageBaseC16countAndCapacitys01_B4BodyVvpWvd;
rcx = * rcx;
rdx = 0x1;
rsi = 0x0;
goto loc_1000031d0;
loc_1000031d0:
if (rsi >= *(rax + rcx)) goto loc_1000034e5;
loc_1000031da:
*(rax + rsi * 0x8 + 0x20) = rbx;
rdi = rsi + 0x1;
if (rdi >= *(rax + rcx)) goto loc_1000034e9;
loc_1000031ed:
*(rax + rsi * 0x8 + 0x28) = rbx;
rdi = rdi + 0x1;
if (rdi >= *(rax + rcx)) goto loc_1000034ed;
loc_1000031ff:
*(rax + rsi * 0x8 + 0x30) = rbx;
rdi = rdi + 0x1;
if (rdi >= *(rax + rcx)) goto loc_1000034f1;
loc_100003211:
*(rax + rsi * 0x8 + 0x38) = rbx;
rdi = rdi + 0x1;
if (rdi >= *(rax + rcx)) goto loc_1000034f5;
loc_100003223:
*(rax + rsi * 0x8 + 0x40) = rbx;
rdi = rdi + 0x1;
if (rdi >= *(rax + rcx)) goto loc_1000034f9;
loc_100003235:
*(rax + rsi * 0x8 + 0x48) = rbx;
rdi = rdi + 0x1;
if (rdi >= *(rax + rcx)) goto loc_1000034fd;
loc_100003247:
*(rax + rsi * 0x8 + 0x50) = rbx;
rdi = rdi + 0x1;
if (rdi >= *(rax + rcx)) goto loc_100003501;
loc_100003259:
*(rax + rsi * 0x8 + 0x58) = rbx;
rdi = rdi + 0x1;
if (rdi >= *(rax + rcx)) goto loc_100003505;
loc_10000326b:
*(rax + rsi * 0x8 + 0x60) = rbx;
rdi = rdi + 0x1;
if (rdi >= *(rax + rcx)) goto loc_100003509;
loc_10000327d:
*(rax + rsi * 0x8 + 0x68) = rbx;
if (r12 == rdx) goto loc_1000032a2;
loc_100003287:
COND = !OVERFLOW(rsi);
rdx = rdx + 0x1;
rsi = rdi + 0x1;
if (COND) goto loc_1000031d0;
loc_10000329e:
asm { ud2 };
loc_1000032a0();
return;
loc_1000032a2:
if (r14 >= 0xffffffffffffffff) {
rax = SAR((r14 >> 0x3f) + r14, 0x1);
rcx = *r15;
if (rax >= *(rcx + **_$Ss27_ContiguousArrayStorageBaseC16countAndCapacitys01_B4BodyVvpWvd)) {
asm { ud2 };
loc_10000350f();
}
}
else {
asm { ud2 };
loc_10000350f();
}
return;
loc_100003509:
asm { ud2 };
loc_10000350b();
return;
loc_100003505:
asm { ud2 };
loc_100003507();
return;
loc_100003501:
asm { ud2 };
loc_100003503();
return;
loc_1000034fd:
asm { ud2 };
loc_1000034ff();
return;
loc_1000034f9:
asm { ud2 };
loc_1000034fb();
return;
loc_1000034f5:
asm { ud2 };
loc_1000034f7();
return;
loc_1000034f1:
asm { ud2 };
loc_1000034f3();
return;
loc_1000034ed:
asm { ud2 };
loc_1000034ef();
return;
loc_1000034e9:
asm { ud2 };
loc_1000034eb();
return;
loc_1000034e5:
asm { ud2 };
loc_1000034e7();
return;
loc_100003511:
asm { ud2 };
sub_100003513();
return;
}
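In this second listing every one of the 10 stores is preceded by a comparison against the array count and a jump to a ud2 trap, and the stores themselves stay scalar, which seems to be why this hand-unrolled version gains nothing. A minimal C sketch of what such checked stores amount to is shown below. The names checkedStore and fillChecked10 are hypothetical, abort() stands in for the trap, and the 10 stores are written as an inner loop for brevity.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bounds-checked array store:
   trap when the index is outside [0, count), otherwise write. */
static void checkedStore(int64_t *storage, int64_t count, int64_t index, int64_t value) {
    if (index < 0 || index >= count) {
        abort();  /* plays the role of the ud2 trap in the listing */
    }
    storage[index] = value;
}

/* Fills the first (tot / 10) * 10 elements, checking every single store
   against count, the way the listing does. */
void fillChecked10(int64_t *storage, int64_t count, int64_t tot) {
    int64_t t = tot / 10;
    for (int64_t block = 0; block < t; block++) {
        int64_t base = block * 10;
        for (int64_t k = 0; k < 10; k++) {
            checkedStore(storage, count, base + k, 1);
        }
    }
}
```

Ten separate checks per iteration leave the optimizer with ten possible early exits inside the loop body, which is a much harder shape to vectorize than the single-assignment loop.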

Third Swift implementation, using C pointers in Swift (one assignment per loop iteration):

void _$S28SwiftCompilerPerformanceTest9testLoop4ys5Int64VSayADGz_SitF(int arg0, int arg1) {
rsi = arg1;
rdi = arg0;
rcx = * rdi;
rax = rcx + 0x20;
if (rsi >= 0x0) {
if (!CPU_FLAGS & E) {
if (rsi <= 0x3) {
r9 = 0x0;
rcx = rsi - r9;
do {
*rax = 0x4;
rax = rax + 0x8;
rcx = rcx - 0x1;
} while (rcx != 0x0);
}
else {
r8 = rsi&0x3;
r9 = rsi-r8;
rax = rax + r9 * 0x8;
rcx = rcx + 0x30;
xmm0 = internal_movaps(xmm0,*(int128_t *)0x100007860);
rdx = r9;
do {
*(int128_t *)(rcx-0x10)= native_movups(*(int128_t *)(rcx-0x10),xmm0);
*(int128_t *)rcx = native_movups(*(int128_t *)rcx,xmm0);
rcx = rcx + 0x20;
rdx = rdx-0x4;
} while(rdx!= 0x0);
if (r8 != 0x0) {
rcx = rsi - r9;
do {
* rax = 0x4;
rax = rax + 0x8;
rcx = rcx-0x1;
} while(rcx!= 0x0);
}
}
rcx = * rdi;
}
if (rsi >= 0xffffffffffffffff) {
rax = SAR((rsi >> 0x3f) + rsi, 0x1);
if (rax >= *(rcx + **_$Ss27_ContiguousArrayStorageBaseC16countAndCapacitys01_B4BodyVvpWvd)) {
asm { ud2 };
loc_1000035bf();
}
}
else {
asm { ud2 };
loc_1000035bf();
}
}
else {
asm { ud2 };
sub_1000035c3();
}
return;
}

Fourth Swift implementation, using C pointers in Swift (10 assignments per loop iteration):

int _$S28SwiftCompilerPerformanceTest9testLoop5ys5Int64VSayADGz_SitF(int arg0, int arg1) {
rsi = arg1;
rdi = arg0;
rdx = HIQWORD(rsi * 0x6666666666666667);
if (rsi >= 0xfffffffffffffff7) {
rax = *rdi;
if (rsi >= 0xa) {
rdx = (SAR(rdx, 0x2)) + (rdx >> 0x3f);
rax = rax + 0x20;
r8 = rdx & 0x3;
if (rdx >= 0x4) {
rcx = r8 - rdx;
xmm0 = internal_movaps(xmm0, *(int128_t *)0x100007870);
do {
*(int128_t *)rax = native_movups(*(int128_t *)rax,xmm0);
*(int128_t *)(rax + 0x10)= native_movups(*(int128_t *)(rax + 0x10),xmm0);
*(int128_t *)(rax + 0x20)= native_movups(*(int128_t *)(rax + 0x20),xmm0);
*(int128_t *)(rax + 0x30)= native_movups(*(int128_t *)(rax + 0x30),xmm0);
*(int128_t *)(rax + 0x40)= native_movups(*(int128_t *)(rax + 0x40),xmm0);
*(int128_t *)(rax + 0x50)= native_movups(*(int128_t *)(rax + 0x50),xmm0);
*(int128_t *)(rax + 0x60)= native_movups(*(int128_t *)(rax + 0x60),xmm0);
*(int128_t *)(rax + 0x70)= native_movups(*(int128_t *)(rax + 0x70),xmm0);
*(int128_t *)(rax + 0x80)= internal_movups(*(int128_t *)(rax + 0x80),xmm0);
*(int128_t *)(rax + 0x90)= native_movups(*(int128_t *)(rax + 0x90),xmm0);
*(int128_t *)(rax + 0xa0)= native_movups(*(int128_t *)(rax + 0xa0),xmm0);
*(int128_t *)(rax + 0xb0)= internal_movups(*(int128_t *)(rax + 0xb0),xmm0);
*(int128_t *)(rax + 0xc0)= internal_movups(*(int128_t *)(rax + 0xc0),xmm0);
*(int128_t *)(rax + 0xd0)= native_movups(*(int128_t *)(rax + 0xd0),xmm0);
*(int128_t *)(rax + 0xe0)= internal_movups(*(int128_t *)(rax + 0xe0),xmm0);
*(int128_t *)(rax + 0xf0)= native_movups(*(int128_t *)(rax + 0xf0),xmm0);
*(int128_t *)(rax + 0x100)= native_movups(*(int128_t *)(rax + 0x100),xmm0);
*(int128_t *)(rax + 0x110)= native_movups(*(int128_t *)(rax + 0x110),xmm0);
*(int128_t *)(rax + 0x120)= native_movups(*(int128_t *)(rax + 0x120),xmm0);
*(int128_t *)(rax + 0x130)= native_movups(*(int128_t *)(rax + 0x130),xmm0);
rax = rax + 0x140;
rcx = rcx + 0x4;
} while(rcx!= 0x0);
}
if (r8 != 0x0) {
r8 = -r8;
xmm0 = internal_movaps(xmm0, *(int128_t *)0x100007870);
do {
*(int128_t *)rax = native_movups(*(int128_t *)rax,xmm0);
*(int128_t *)(rax + 0x10)= native_movups(*(int128_t *)(rax + 0x10),xmm0);
*(int128_t *)(rax + 0x20)= native_movups(*(int128_t *)(rax + 0x20),xmm0);
*(int128_t *)(rax + 0x30)= native_movups(*(int128_t *)(rax + 0x30),xmm0);
*(int128_t *)(rax + 0x40)= native_movups(*(int128_t *)(rax + 0x40),xmm0);
rax = rax + 0x50;
r8 = r8 + 0x1;
} while(r8!= 0x0);
}
rax = * rdi;
}
if (rsi >= 0xffffffffffffffff) {
rcx = SAR((rsi >> 0x3f) + rsi, 0x1);
if (rcx < *(rax + **_$Ss27_ContiguousArrayStorageBaseC16countAndCapacitys01_B4BodyVvpWvd)) {
rax = *(rax + rcx * 0x8 + 0x20);
}
else {
asm { ud2 };
rax = loc_10000371b();
}
}
else {
asm { ud2 };
rax = loc_10000371b();
}
}
else {
asm { ud2 };
rax = sub_10000371f();
}
return rax;
}
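One detail worth decoding in the two unrolled listings is how tot / 10 is computed. There is no division instruction: the compiler multiplies by the constant 0x6666666666666667, keeps the high 64 bits of the 128-bit product (HIQWORD), shifts them right by 2, and adds the sign bit. The C sketch below reproduces that arithmetic; divideBy10 is a hypothetical name, and the code assumes a GCC or Clang style compiler that provides __int128 and performs arithmetic right shifts on signed values.

```c
#include <stdint.h>

/* Hypothetical reconstruction of the division by 10 seen in the listings. */
int64_t divideBy10(int64_t n) {
    /* Full 128-bit signed product of n and the magic constant. */
    __int128 product = (__int128)n * (__int128)0x6666666666666667LL;

    /* HIQWORD(n * 0x6666666666666667): the upper 64 bits of the product. */
    int64_t high = (int64_t)(product >> 64);

    /* SAR(high, 0x2) + (high >> 0x3f): arithmetic shift by 2, then add 1
       when high is negative so that the result rounds toward zero. */
    return (high >> 2) + (int64_t)((uint64_t)high >> 63);
}
```

For example, divideBy10(25) gives 2 and divideBy10(-25) gives -2, matching C's truncating integer division.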

C function implementations:

int _test6() {
rax = rsi;
if (rax > 0x0) {
rax = memset_pattern16(rdi, 0x100007820, rax << 0x3);
}
return rax;
}

void _test7() {
if (rsi >= 0xa) {
memset_pattern16(rdi,0x100007830,(HIQWORD(rsi * 0xcccccccccccccccccd)>> 0x3 <> 0x3 << 0x4)* 0x4);
}
return;
}
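For the two C functions the compiler went one step further and replaced the whole loop with a single call to memset_pattern16, a macOS libc routine that fills a buffer by repeating a 16-byte pattern. The sketch below is a portable stand-in for what such a call does, not Apple's implementation. The names fillPattern16 and testLoopAsPatternFill are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical portable equivalent of a 16-byte pattern fill:
   repeat pattern16 over len bytes, truncating the last copy if needed. */
void fillPattern16(void *dest, const void *pattern16, size_t len) {
    unsigned char *out = dest;
    while (len >= 16) {
        memcpy(out, pattern16, 16);
        out += 16;
        len -= 16;
    }
    if (len > 0) {
        memcpy(out, pattern16, len);  /* partial pattern for the tail */
    }
}

/* The basic C loop expressed the way the optimized binary does it:
   one pattern fill of tot * 8 bytes (compare rax << 0x3 in _test6). */
void testLoopAsPatternFill(int64_t *buffer, int64_t tot) {
    static const int64_t ones[2] = {1, 1};  /* two Int64 values = 16 bytes */
    if (tot > 0) {
        fillPattern16(buffer, ones, (size_t)tot * sizeof(int64_t));
    }
}
```

Seen this way, the hand-unrolled C version buys nothing: both C loops collapse into the same library call, with only the byte count computed differently.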

JacopoMangiavacchi/SwiftCompilerPerformanceTest

SwiftCompilerPerformanceTest - Some tests on Swift / LLVM compiler code optimization

github.com