Detection of Software Vulnerabilities: Static Analysis

2022-05-05

1. Verification vs Validation

Verification: “Are we building the product right?”
- The software should conform to its specification
Validation: “Are we building the right product?”
- The software should do what the user requires
Verification and validation must be applied at each stage in the software process

2. Static and Dynamic Verification

static verification

aka Software inspections
Code analysis can prove the absence of errors but might subject to incorrect results

dynamic verification

aka Software testing
The system is executed with test data
Operational behaviour is observed

2.1. The V-model of development

3. Detection of Vulnerabilities

Detect the presence of vulnerabilities in the code during the development, testing, and maintenance
Trade-off between soundness and completeness
- Achieving soundness requires reasoning about all executions of a program
  - This can be done by static checking of the program code while making suitable abstractions of the executions
- Achieving completeness can be done by performing actual, concrete executions of a program that are witnesses to any vulnerability reported
  - The analysis technique has to come up with concrete inputs for the program that triggers a vulnerability
  - A typical dynamic approach is software testing: the tester writes test cases with concrete inputs and specific checks for the outputs
Detection tools can use a hybrid混合 combination of static and dynamic analysis techniques to achieve a good trade-off between soundness and completeness
Dynamic verification should be used in conjunction with static verification to provide full code coverage

4. difference among static analysis, testing / simulation, and debugging

Simulation
- Checks only some of the system executions, thus may miss errors
Static analysis (BMC)
- Exhaustively explores all executions
- Report errors as traces
- May produce incorrect results

4.1. Avoiding state space explosion

Bounded Model Checking (BMC)
- Breadth-first search (BFS) approach 广度优先
Symbolic Execution
- Depth-first search (DFS) approach 深度优先

4.2. V&V vs debugging

V & V is concerned with establishing the absence or existence of defects in a program, resp.
Debugging is concerned with two main tasks
- Locating
- Repairing these errors
Debugging involves
- Formulating a hypothesis about program behaviour
- Test these hypotheses to find the system error

5. Concurrency verification

Concurrency errors

progress errors: deadlock, starvation, …
- typically caused by wrong synchronization
safety errors: assertion violation
- typically caused by data races (i.e., unsynchronized access to shared data)

Concurrent programming styles

communication via message passing
- “truly” parallel distributed systems “真正的 “并行分布式系统
communication via shared memory
- multi-threaded programs

Round-robin scheduling

context: segment of a run of an active thread $t_i$
context switch: change of active thread from $t_i$ to $t_k$
- global state is passed on to $t_k$
- context switch back to $t_i$ resumes at old local state (incl. pc)
round: formed of one context of each thread
round robin schedule: same order of threads in each round
can simulate all schedules by round robin schedules

更多可参考：操作系统备忘录–处理机

5.1. BMC of Multi-threaded Software

Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving

5.2. Lazy exploration of interleavings

Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving, combines:

symbolic model checking: on each individual interleaving
explicit state model checking: explore all interleavings

Reachability Tree

每一层相当于多走一步单步，例如左子树，在上了m1的锁之后，跳到reader线程是无法进入cs的，所以会被blocked.

Exploring Reachability Tree

Use a reachability tree (RT) to describe reachable states of a multi-threaded program
Each node in the RT is a tuple $\left (A_i,C_i,s_i, \left \langle l_i^j,G_i^j \right \rangle ^n_{j=i} \right )_i$
- $A_i$ represents the currently active thread 当前活跃线程
- $C_i$ represents the context switch number 上下文切换序号
- $s_i$ represents the current state 当前状态
- $l_i^j$ represents the current location of thread $j$ 命令instruction位置
- $G_i^j$ represents the control flow guards accumulated in thread $j$ along the path from $l_0^j$ to $l_i^j$ 积累的守护条件

Expansion Rules of the RT

main algorithm

Initialize the stack with the initial node $v_0$ and the initial path $\pi_0 = \left \langle v_0 \right \rangle$
If the stack is empty, terminate with “no error”
Pop the current node $v$ and current path $\pi$ off the stack and compute the set $v’$ of successors后续 of $v$ using rules R1-R8
If $v’$ is empty, derive the VC $\phi^{\pi}_k$ for $\pi$ and call the SMT solver on it. If $\phi^{\pi}_k$ is satisfiable, terminate with “error”; otherwise, goto step 2
If $v’$ is not empty, then for each node $v \in v’$, add $v$ to $\pi$, and push node and extended path on the stack. goto step 3.

can suffer performance degradation:

in particular for correct programs where we need to invoke the SMT solver once for each possible execution path

5.3. Schedule Recording

Idea: systematically encode all possible interleavings into one formula

explore reachability tree in same way as lazy approach
- but call SMT solver only once
add a schedule guard $ts_i$ for each context switch block $i$ (0< $ts_i$ < #threads)
- record in which order the scheduler has executed the program
- SMT solver determines the order in which threads are simulated
add scheduler guards only to effective statements
- record effective context switches (ECS) (assignments and assertions)
- ECS block: sequence of program statements that are executed with no intervening ECS

number of threads and context switches grows very large quickly, and easily “blow-up” the solver

there is a clear trade-off between usage of time and memory resources

6. Sequentialization 序列化

Building verification tools for full-fledged成熟的 concurrent languages is difficult and expensive
- but scalable verification techniques exist for sequential languages
  - Abstraction techniques
  - SAT/SMT techniques

Sequentialization:

convert concurrent programs into sequential programs such that reachability is preserved
replace control non-determinism by data non-determinism
source-to-source transformation: T1 ∥ T2 ↝ T1’ ; T2’

6.1. KISS

Keep It Simple and Sequential

Under-approximation (subset of interleavings)
- 也就是说不会cover所有情况
Thread creation → function call
- Thread被转换为function
- at context-switches / when a thread is terminated, either:
  - the active thread is terminated / the thread that has called it is resumed (if any)
  - or a not yet scheduled thread is started (by calling its main function)