Detection of Software Vulnerabilities: Dynamic Analysis

2022-05-27

1. Testing for functionality vs testing for security

A majority of security defects and vulnerabilities in software are not directly related to functionality
- i.e. Side-channel effect in the hardware 侧信道
- information obtained from the impl. rather than weaknesses in the code
Traditional testing checks functionalities for sensible inputs and corner conditions
Security testing also requires looking for the wrong, unwanted behavior for uncommon inputs
users will complain about functional problems, but hackers will not complain about security problems

To test a software system, we need:

test suite测试套件: a collection of input data
test oracle测试准则: decides if a test succeeded or led to an error
- A test oracle consists of a long list, which for every individual test case, specifies what should happen
  – A simple test oracle: just looking if the application does not crash

2. Code Coverage 代码覆盖

2.1. Statement Coverage 语句覆盖

involves the execution of all the executable statements at least once
- 涉及所有可执行语句的至少一次执行

2.2. Decision Coverage 判定覆盖

reports the true or false outcomes of each Boolean expression (tough to achieve 100%)
- 报告每个布尔表达式的真或假结果
- (decision outcomes exercised / total decision outcomes) * 100

2.3. Branch Coverage 分支覆盖

tests every outcome from the code to ensure that every branch is executed at least once
- 测试代码中的每个结果，以确保每个分支至少被执行一次
- (executed branches / total branches)*100

2.4. Condition Coverage 条件覆盖

reveals how the variables in the conditional statement are evaluated (logical operands)
- 揭示了条件语句中的变量是如何被评估的（逻辑操作数）
- (executed operands / total operands)*100

Code coverage criteria to measure the test suite quality
- Statement, decision, branch and condition coverage
Statement coverage does not imply branch coverage
Other coverage criteria exists, e.g., modified condition/ decision coverage (MCDC), which is used to test avionics embedded software航空电子嵌入式软件

3. Dynamic Detection

Dynamic detection techniques execute a program and monitor the execution to detect vulnerabilities

How should one monitor an execution such that vulnerabilities are detected?
How many and what program executions (i.e., for what input values) should one monitor?

3.1. Monitoring

In cases where a dynamic analysis is approximative, it can also generate false positives or false negatives, even though it operates on a concrete execution trace
- 在动态分析是近似的情况下，它也可能产生假阳性或假阴性，尽管它是在具体的执行轨迹上操作的。
For structured output generation vulnerabilities, the main challenge is that the intended structure of the generated output is often implicit. There exists no explicit specification that can be monitored.
- 对于结构化输出的漏洞，主要的挑战是生成的输出的预期结构往往是隐性的。不存在可以被监控的明确规范。
- For example, a monitor can use a fine-grained dynamic taint analysis to track the flow of untrusted input strings § flag a violation when untrusted input has an impact on the parse tree of the generated output
  - 例如，监控器可以使用细粒度的动态污点分析来跟踪不受信任的输入字符串的流向，当不受信任的输入对生成的输出的解析树产生影响时，就会标记出违规行为
Assertions, pre-conditions, and post-conditions can be compiled into the code to provide a monitor for API vulnerabilities at testing time, even if the cost of these compiled-in run-time checks can be too high to use them in production code
- 可以将断言、前条件和后条件编译到代码中，以便在测试时提供对API漏洞的监控，即使这些编译的运行时检查的成本太高，无法在生产代码中使用它们。
Monitoring for race conditions is hard, but some approaches for monitoring data races on shared memory cells exist. E.g., by monitoring whether all shared memory accesses follow a consistent locking discipline

3.1.1. Using BAs to check the program

A never claim can be used to define system behavior … It is most commonly used to specify behavior that should never happen. The claim is defined as a series of propositions, or boolean expressions, on the system state that must become true in the sequence specified for the behavior of interest to be matched.
never claim可以用来定义系统行为…它最常被用来指定永远不应该发生的行为。Claim被定义为一系列命题或布尔表达式，这些命题或布尔表达式在系统状态上必须按照指定的顺序成为真，才能与感兴趣的行为匹配。

线性时序逻辑转换Büchi自动机: convert LTL to never claim
Theory: check product of model and never claim for accepting state
ESBMC
- run never claim program as a monitor thread concurrently with other program thread(s)
  - no distinction between monitor thread and other threads

3.1.2. Ensuring soundness of monitor thread

atomic_begin();
assume(trans_count <= trans_seen + 1); 
trans_seen = trans_count;
switch(state) { 
  case T0_init:
    if(choice == 0) 
    { 
      assume((1)); 
      state = T0_init;
    } 
    else if (choice == 1) 
    { 
      assume((!cexpr_1 && cexpr_0)); 
      state = accept_S2; 
    } 
    else assume(0); 
    break;
  case accept_S2: 
    if(choice == 0) 
    { 
      assume((!cexpr_1)); 
      state = accept_S2; 
    } 
    else assume(0); 
    break;
}
atomic_end();

Problem: BMC forces program execution to eventually end, but BA are defined over infinite traces

Solution:
- pretend final state extends infinitely
- re-run monitor thread after program termination, with enough loop iterations to pass through each state twice
- if an accepting state is visited at least twice while stuttering, BA accepts extended trace

4. Fuzzing

Challenge: generate executions of the program along paths that will lead to the discovery of new vulnerabilities
- how to systematically select appropriate inputs for a program under test

Fuzzing

Basic idea: generate random inputs and check whether an application crashes
Original fuzzing: generate long inputs and check whether the system crashes
Why would inputs ideally be very long?
- To make it likely that buffer overruns cross segment boundaries so that the OS triggers a fault
- 所以一大区别是传统的需要很长，但结构性的不需要？
Types
- very long string
- blank strings
- min/max values of integers
- zero and negative values
- unique values, characters or keywords likely to trigger bugs

4.1. Pros & cons

Pros: Minimal effort:
- the test cases are automatically generated, and test oracle is is merely looking for crashes
- Fuzzing of a C/C++ binary can quickly give a good picture of the robustness of the code
Cons:
- Fuzzers do not find all bugs
- Crashes may be hard to analyze, but a crash is a true positive that something is wrong
- For programs that take complex inputs, more work will be needed to get reasonable code coverage and hit unusual test cases 对于接受复杂输入的程序，需要做更多的工作来获得合理的代码覆盖率并击中不寻常的测试案例
  - Leads to various studies on “smarter” fuzzers

4.2. Black-box fuzzing

The generation of values depends on the program input/output behaviour, and not on its internal structure
值的产生取决于程序的输入/输出行为，而不是其内部结构

Random testing （original fuzzing）
1. input values are randomly sampled from the appropriate value domain
Grammar-based fuzzing
1. a model of the expected format of input values is taken into account during the generation of input values
Mutation-based fuzzing
1. the fuzzer is provided with typical input values; it generates new input values by performing small mutations on the provided input

Use a model checker to produce an input that triggers the property violation

应用

4.2.1. AFL

AFL (American Fuzzy Lop) takes an evolutionary approach to learn mutations based on measuring code coverage

basic idea: if a mutation of the input triggers a new path through the code, then it is an exciting mutation; otherwise, the mutation is discarded
- 如果输入的突变触发了代码的新路径，那么它就是一个令人兴奋的突变；否则，突变就会被丢弃
Produce random mutations of the input and observe their effect on code coverage, AFL can learn what interesting inputs are
- 产生输入的随机突变，观察它们对代码覆盖率的影响，AFL可以了解什么是有趣的输入
- 也就是所谓的Coverage-Guided
Code instrumented to observe execution paths: – if source code is available, then use modified compiler; otherwise, run code in an emulator

4.3. White-box fuzzing

The internal structure of the program is analysed to assist in the generation of appropriate input values
对程序的内部结构进行分析，以协助生成适当的输入值

The primary systematic white-box fuzzing technique is a dynamic symbolic execution
- Executes a program with concrete input values and builds at the same time a path condition
  - An expression that specifies the constraints on those input values that have to be fulfilled to take this specific execution path
- Solve input values that do not satisfy the path condition of the current execution
  - the fuzzer can make sure that these input values will drive the program to a different execution path, thus improving coverage 模糊器可以确保这些输入值将驱动程序进入不同的执行路径，从而提高覆盖率

Step

Combine fuzz testing with dynamic test generation
Run the code with some initial input
Collect constraints on input with symbolic execution
Generate new constraints
Solve constraints with c**onstraint solver **
Synthesize合成 new inputs
Leverages Directed Automated Random Testing

4.4. BMC for Coverage Test Generation

Translate the program to an intermediate representation (IR)
Add goals indicating the coverage
1. location, branch, decision, condition and path
Symbolically execute IR to produce an SSA program
Translate the resulting SSA program into a logical formula
Solve the formula iteratively to cover different goals
Interpret the solution to figure out the input conditions
Spit吐出 those input conditions out as a test case

example

esbmc main.c lib/lib.c --error-label GOALX -I lib/