Exploring Ways to Remove BCF Based on the OLLVM Source Code

Preface

OLLVM is a very common obfuscation project. It has four main passes: Substitution, Flattening, Split, and BogusControlFlow.

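Of these, bogus control flow (BogusControlFlow, BCF) is relatively easy to remove. I have previously used Angr to analyze a program's control flow, find the basic blocks that are never executed, and NOP them out. Today we'll explore other ways of removing it.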

Source Code Analysis

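As I understand it, BCF works in two steps. It first generates junk instructions (unreachable basic blocks), then uses an always-true expression built from opaque predicates to insert that junk code into the program and obscure the control flow.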

Junk Instructions

This part isn't very important, since we're going to remove it all anyway.

// Create the bogus block
Twine * var3 = new Twine("alteredBB");
BasicBlock *alteredBB = createAlteredBasicBlock(originalBB, *var3, &F);
// -------------------
// Fill alteredBB with junk instructions
for (BasicBlock::iterator i = alteredBB->begin(), e = alteredBB->end(); i != e; ++i)
{
    if (i->isBinaryOp())
    {
        unsigned opcode = i->getOpcode();
        BinaryOperator *op, *op1 = NULL;
        Twine *var = new Twine("_");
        // Integer binary operators
        if (opcode == Instruction::Add || opcode == Instruction::Sub ||
            opcode == Instruction::Mul || opcode == Instruction::UDiv ||
            opcode == Instruction::SDiv || opcode == Instruction::URem ||
            opcode == Instruction::SRem || opcode == Instruction::Shl ||
            opcode == Instruction::LShr || opcode == Instruction::AShr ||
            opcode == Instruction::And || opcode == Instruction::Or ||
            opcode == Instruction::Xor)
        {
            // ...
        }
        // Floating-point binary operators
        if (opcode == Instruction::FAdd || opcode == Instruction::FSub ||
            opcode == Instruction::FMul || opcode == Instruction::FDiv ||
            opcode == Instruction::FRem)
        {
            // ...
        }
        // Integer comparisons
        if (opcode == Instruction::ICmp)
        {
            // ...
        }
        // Floating-point comparisons
        if (opcode == Instruction::FCmp)
        {
            // ...
        }
    }
}
return alteredBB;

Always-True Jumps

// Create an always-true FCmp instruction and put originalBB on the True edge
// (which means alteredBB is never executed)
Twine * var4 = new Twine("condition");
FCmpInst * condition = new FCmpInst(*basicBlock, FCmpInst::FCMP_TRUE , LHS, RHS, *var4);

BranchInst::Create(originalBB, alteredBB, (Value *)condition, basicBlock);

// Then make alteredBB jump unconditionally to originalBB to obscure the control flow
BranchInst::Create(originalBB, alteredBB);

// -------------------
// Create a second always-true FCmp instruction. Split originalBB into two parts and
// point the True and False edges at the normal program flow and at alteredBB respectively
BasicBlock::iterator i = originalBB->end();
Twine * var5 = new Twine("originalBBpart2");
BasicBlock * originalBBpart2 = originalBB->splitBasicBlock(--i , *var5);

originalBB->getTerminator()->eraseFromParent();

Twine * var6 = new Twine("condition2");
FCmpInst * condition2 = new FCmpInst(*originalBB, CmpInst::FCMP_TRUE , LHS, RHS, *var6);

BranchInst::Create(originalBBpart2, alteredBB, (Value *)condition2, originalBB);

// -------------------
// Next comes the part that hides the always-true expression behind opaque predicates
// Create the opaque predicate variables x and y
Twine *varX = new Twine("x");
Twine *varY = new Twine("y");
Value *x1 = ConstantInt::get(Type::getInt32Ty(M.getContext()), 0, false);
Value *y1 = ConstantInt::get(Type::getInt32Ty(M.getContext()), 0, false);

// Walk every basic block of every function in the module, looking for always-true branches
for (Module::iterator mi = M.begin(), me = M.end(); mi != me; ++mi)
{
    for (Function::iterator fi = mi->begin(), fe = mi->end(); fi != fe; ++fi)
    {
        // fi->setName("");
        TerminatorInst *tbb = fi->getTerminator();
        if (tbb->getOpcode() == Instruction::Br)
        {
            // ...
            // Found an always-true condition: record it in toDelete and its branch in toEdit
            if (cond->getPredicate() == FCmpInst::FCMP_TRUE)
            {
                toDelete.push_back(cond);
                toEdit.push_back(tbb);
            }
        }
    }
}

// Process the branches collected in toEdit, turning each one into
// if y < 10 || x*(x+1) % 2 == 0
// (note that the code below emits Sub, i.e. x*(x-1), which is just as always even)
for (std::vector<Instruction *>::iterator i = toEdit.begin(); i != toEdit.end(); ++i)
{
    // Load the opaque variables
    opX = new LoadInst((Value *)x, "", (*i));
    opY = new LoadInst((Value *)y, "", (*i));

    // op = x - 1
    op = BinaryOperator::Create(Instruction::Sub, (Value *)opX,
                                ConstantInt::get(Type::getInt32Ty(M.getContext()), 1,
                                                 false),
                                "", (*i));
    // op1 = x * (x - 1)
    op1 = BinaryOperator::Create(Instruction::Mul, (Value *)opX, op, "", (*i));
    // op = op1 % 2
    op = BinaryOperator::Create(Instruction::URem, op1,
                                ConstantInt::get(Type::getInt32Ty(M.getContext()), 2,
                                                 false),
                                "", (*i));
    // condition = (op == 0)
    condition = new ICmpInst((*i), ICmpInst::ICMP_EQ, op,
                             ConstantInt::get(Type::getInt32Ty(M.getContext()), 0,
                                              false));
    // condition2 = (y < 10), signed
    condition2 = new ICmpInst((*i), ICmpInst::ICMP_SLT, opY,
                              ConstantInt::get(Type::getInt32Ty(M.getContext()), 10,
                                               false));
    op1 = BinaryOperator::Create(Instruction::Or, (Value *)condition,
                                 (Value *)condition2, "", (*i));

    // Rebuild the branch with the new condition and drop the old one
    BranchInst::Create(((BranchInst *)*i)->getSuccessor(0),
                       ((BranchInst *)*i)->getSuccessor(1), (Value *)op1,
                       ((BranchInst *)*i)->getParent());
    (*i)->eraseFromParent();
}
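The predicate rests on a parity argument: one of two consecutive integers is even, so their product is even, and truncating the product to 32 bits does not change its lowest bit. The sketch below mirrors the emitted instructions (Sub, Mul, URem, ICmp EQ, ICmp SLT, Or) in plain Python to check this on a handful of values. The helper names and the sample values are my own and are not part of OLLVM.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    """Interpret the low 32 bits of value as a signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def parity_term(x):
    """(x * (x - 1)) urem 2 == 0, computed with 32-bit wraparound."""
    x &= MASK32
    product = (x * ((x - 1) & MASK32)) & MASK32
    return product % 2 == 0


def opaque_predicate(x, y):
    """The full branch condition: parity term OR (y slt 10)."""
    return parity_term(x) or to_signed32(y) < 10


# Small values, negative values, and the 32-bit boundary cases.
samples = list(range(-1000, 1000)) + [0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]
always_true = all(
    opaque_predicate(x, y) for x in samples for y in (0, 9, 10, 0x7FFFFFFF)
)
print(always_true)
```

Since the parity term alone is already true for every x, the y < 10 half of the expression never decides the outcome. It only adds a second opaque variable for the analyst to deal with.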

Removing BCF

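From the analysis above, we have found the crux of the problem: the always-true expression y<10||x*(x+1)%2==0. Because the decompiler cannot predict the values of the opaque variables, it does not dare remove the bogus control flow on its own.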

So besides using Angr to find unreachable basic blocks, we can also give the decompiler enough information to let it remove BCF automatically.
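Once the predicate is known to be constant, finding the bogus blocks becomes a plain reachability question. The toy example below models the block layout that the source analysis describes (the block names follow the OLLVM source; the dictionary format and the "entry" block are hypothetical) and compares reachability with and without the never-taken False edges.

```python
from collections import deque

# Each block maps to a list of (successor, is_false_edge_of_opaque_branch).
cfg = {
    "entry":           [("originalBB", False), ("alteredBB", True)],
    "originalBB":      [("originalBBpart2", False), ("alteredBB", True)],
    "alteredBB":       [("originalBB", False)],  # unconditional jump back
    "originalBBpart2": [],
}


def reachable(cfg, start, prune_dead_edges):
    """Breadth-first search, optionally skipping edges that are never taken."""
    seen = {start}
    queue = deque([start])
    while queue:
        block = queue.popleft()
        for successor, is_false_edge in cfg[block]:
            if prune_dead_edges and is_false_edge:
                continue
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen


all_blocks = reachable(cfg, "entry", prune_dead_edges=False)
live_blocks = reachable(cfg, "entry", prune_dead_edges=True)
dead_blocks = sorted(all_blocks - live_blocks)
print(dead_blocks)  # ['alteredBB']
```

A decompiler that can prove the condition constant effectively performs the pruned search for us, which is what the approaches below rely on.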

IDAPython

[Figure: fake_jmp]

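Since we know this conditional jump is always taken, we can patch the binary directly, replacing mov eax, [rbp+var_xxx] with a load of a concrete value such as mov eax, 0. IDA can then evaluate the condition itself and remove the BCF automatically.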

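import ida_bytes
import ida_idaapi
import ida_segment
import ida_xref


def do_patch(ea):
    """Replace `mov r32, [mem]` at ea with `mov r32, 0`, padded with NOPs."""
    size = ida_bytes.get_item_size(ea)
    # Only the unprefixed opcode 0x8B (mov r32, r/m32) is handled, and the
    # instruction must be long enough to hold a 5-byte `mov r32, imm32`.
    if ida_bytes.get_bytes(ea, 1) != b"\x8B" or size < 5:
        print(f'[!] Unsupported instruction at {hex(ea)}')
        return

    # Bits 5..3 of the ModRM byte select the destination register.
    reg = (ida_bytes.get_bytes(ea + 1, 1)[0] & 0b00111000) >> 3
    # Pad to the original instruction length so that the next instruction
    # is not overwritten (the load may be 6 or 7 bytes long).
    new_bytes = bytes([0xB8 + reg]) + b'\x00' * 4 + b'\x90' * (size - 5)
    ida_bytes.patch_bytes(ea, new_bytes)
    print(f'[+] Patched at {hex(ea)}: {new_bytes.hex()}')


seg = ida_segment.get_segm_by_name('.bss')
if seg:
    # Walk every dword in .bss and patch each data reference to it.
    for addr in range(seg.start_ea, seg.end_ea, 4):
        ref = ida_xref.get_first_dref_to(addr)
        while ref != ida_idaapi.BADADDR:
            do_patch(ref)
            ref = ida_xref.get_next_dref_to(addr, ref)
else:
    print('[!] .bss segment not found')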
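The byte arithmetic behind do_patch can be tried outside IDA. A mov r32, imm32 is always 5 bytes, while the load it replaces is typically 6 bytes (RIP-relative) or 7 bytes (absolute address with a SIB byte), so the number of trailing NOPs depends on the encoding. The following is a minimal sketch of that calculation. build_patch is a hypothetical helper, and the instruction bytes in the usage lines are made-up examples rather than bytes from a real sample.

```python
def build_patch(insn: bytes) -> bytes:
    """Return bytes for `mov r32, 0` plus NOP padding, same length as insn.

    insn must be an unprefixed `mov r32, [mem]` (opcode 0x8B) that is at
    least 5 bytes long; anything else raises ValueError.
    """
    if len(insn) < 5 or insn[0] != 0x8B:
        raise ValueError("not a patchable 'mov r32, [mem]' load")
    modrm = insn[1]
    if modrm >> 6 == 0b11:
        raise ValueError("register-to-register move, nothing to patch")
    reg = (modrm >> 3) & 0b111          # destination register number
    patch = bytes([0xB8 + reg]) + (0).to_bytes(4, "little")
    return patch + b"\x90" * (len(insn) - len(patch))


# mov eax, ds:0x604010 (7 bytes, absolute address)
print(build_patch(bytes.fromhex("8b042510406000")).hex())  # b8000000009090
# mov ecx, [rip+0x2f2a] (6 bytes, RIP-relative)
print(build_patch(bytes.fromhex("8b0d2a2f0000")).hex())    # b90000000090
```

Loads into r8d through r15d carry a REX prefix and do not start with 0x8B, so this sketch rejects them just as the script reports them as unsupported.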

Modifying the .bss Data and Segment Attributes

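We can also write the data into .bss directly and mark the segment read-only. With that, IDA can likewise recognize the always-true expression on its own and remove the bogus control flow.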

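import ida_bytes
import ida_segment

seg = ida_segment.get_segm_by_name('.bss')

if seg:
    # Give every dword in .bss a concrete value of 0.
    for ea in range(seg.start_ea, seg.end_ea, 4):
        ida_bytes.patch_bytes(ea, int(0).to_bytes(4, 'little'))

    # Mark the segment read-only (SEGPERM_READ == 4) so that the decompiler
    # can treat its contents as constants, then commit the change.
    seg.perm = ida_segment.SEGPERM_READ
    ida_segment.update_segm(seg)
else:
    print('[!] .bss segment not found')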
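The permission value written to seg.perm is a bit mask. To my knowledge the IDA SDK defines execute as 1, write as 2, and read as 4, so a value of 4 means read-only. The small helper below (describe_perm is a hypothetical name, and the constants are restated locally so the snippet runs without IDA) decodes such a value, which can be handy for checking a segment before and after the change.

```python
# Restated locally; assumed to match the SEGPERM_* constants in ida_segment.
SEGPERM_EXEC = 1
SEGPERM_WRITE = 2
SEGPERM_READ = 4


def describe_perm(perm):
    """Render a segment permission mask in the familiar rwx notation."""
    flags = (("r", SEGPERM_READ), ("w", SEGPERM_WRITE), ("x", SEGPERM_EXEC))
    return "".join(char if perm & bit else "-" for char, bit in flags)


print(describe_perm(SEGPERM_READ | SEGPERM_WRITE))  # rw-  (a typical .bss)
print(describe_perm(4))                             # r--  (after the script)
```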