软件安全备忘录：Fundamentals

2022-01-31

1. Main Goal

Verified trustworthy software systems 经过验证的值得信赖的软件系统

Reliability
Availability
Safty
Resilience 复原能力
- in timely response to events

2. Standard Notions of Security 安全的标准概念

Safety vs Security

Safety: any input => desired output
Secruity: bad input => bad output
- wrong input does not cause failure or crashes

most sofeware system do not have precise准确 explict明确 security objectives

3. Software Security Problems

Why are there security vulnerabilities?

Limited number of courses in computer security
– Programming textbooks do not emphasize security
– Limited number of security audits
– Programmers are focused on implementing features
– Security is expensive and takes time

cve

Common Vulnerabilities and Exposures 公共漏洞和暴露
公共漏洞和暴露又称通用漏洞披露、常见漏洞与披露，是一个与信息安全有关的数据库，收集各种信息安全弱点及漏洞并给予编号以便于公众查阅。此数据库现由美国非营利组织MITRE所属的National Cybersecurity FFRDC所营运维护

3.1. Critical Software Vulnerabilitie

Null pointer dereference
- dereference 解引用
  - “*”（即星号），是一个单项操作符（即有一个操作数）
  - 它对一个指针变量进行操作，并返回一个指针地址的值

Double free

Unchecked Return Value to NULL Pointer Dereference

Division by zero
Missing free
Use after free
APIs rule based checking
Race Condition Vulnerability 竞争条件漏洞
Structured output generation vulnerabilitie
- SQL Injection
XSS
XXE
- XML External Entity (XXE)
- XXE occurs when XML input (incl. an external entity) is processed by a weakly configured XML parser
- XXE might lead to the disclosure披露 of confidential data

4. testing and verification techniques – two approaches

4.1. proof by induction 数学归纳法

base step
inductice step

4.2. Temporal Logic Model Checking 时态逻辑模型检查

2 branches of verification

deductive 演绎式
Model checking
- 可以automatic
Verification procedure is algorithmic rather than deductive in nature
- 验证程序是算法性的，而不是演绎性的
The assertions written as formulas in propositional temporal logice

Model checking

an automatic verification technique for finite state concurrent systems并发系统
- 模型检查是一种有限状态并发系统的自动验证技术
  4.2.1. Advantages
No proofs(Algorithmic rather than Deductive)
Fast (compared to other rigorous methods such as theorem proving) 与其他严格的方法相比，如定理证明
Diagnostic counterexamples 诊断性反例
No problem with partial specifications 部分规格
Logics can easily express many concurrency properties 逻辑学可以很容易地表达许多并发性属性

4.2.2. LTL - Linear Time Logic

Atomic Propositions 原子命题
Temporal operators 时间运算符

X即为next $◯$
F相当于eventually $◊$
G相当于Always $□$

4.2.3. Model Checking Problem

example

之前有道题是用if then来formalize的，这样会在concurrent的情况下出现bug，事实上经过z3验证确实会satisfy counterexample

formalization

M
- Let M be a state-transition graph
f
- Let ƒ be an assertion or specification in temporal logic
s
- Find all states s of M such that M, s satisfies ƒ

state-transition graph

Complexity

singly exponential in size of specification 与规格大小呈单指数关系
linear in size of state-transition graph 与状态转换图的大小呈线性关系

4.3. Bounded Model Checking(BMC) 有界模型检查

Basic idea: check negation of given property up to given depth

counterexample: 反例

有界模型检测的主要过程是:先把要验证的系统或模型构造为有限状态自动机(finite state machine,简称FSM),通过 FSM 状态间的转移来模拟系统或模型运行;要验证的规范说明用时序逻辑 LTL(linear-time temporal logic)进行说明(如 G(p),F(p));设定边界上界 K;FSM 状态间的转移关系和 LTL 逻辑规范否定的 NNF(negation normal form)公式通过逻辑与构成 BMC 转换公式;把 BMC 转换公式编码成 SAT 实例,通过 SAT 工具求解.若有解,则找到反例（_即原系统存在bug？_）;反之,若不可满足,则表明要验证的系统或模型运行到 K 阶段时,是安全的、没有错误的

K 阶段相当于SMT的重复k次？

4.3.1. 扩展到SMT

SMT decides the satisfiability of first-order logic formulae using the combination of different background theories

4.3.2. Steps for Software BMC

example 1

example 2

When getPassword is called, the location of the code to run when getPassword completes (the if statement) is put into a special location in memory called the stack. The top of the stack is pointed to by the internal stack pointer register (sp for short). When getPassword is called, the top of the stack contains the address of the if statement. The first thing that happens after getPassword is called is that sp is decreased by 4 to make room for the 4-character array buf (which is also stored on the stack). Next, the call to gets writes input characters into memory starting with the location pointed to by sp. When getPassword is finished, sp is increased by 4, reclaiming the memory occupied by buf, and then the instruction pointer register (ip for short) is loaded with the location pointed to by sp. We wish to determine whether it is possible to set ip to a value that we choose instead of the location of the if statement.
当getPassword被调用时，当getPassword完成时要运行的代码的位置（if语句）被放入内存中的一个特殊位置，称为堆栈。堆栈的顶部由内部堆栈指针寄存器（简称sp）来指出。当getPassword被调用时，堆栈的顶部包含if语句的地址。调用getPassword后发生的第一件事是sp减少4，以便为4个字符的数组buf（它也存储在堆栈中）腾出空间。接下来，对get的调用从sp指向的位置开始将输入字符写入内存。当getPassword完成后，sp增加4，回收buf占用的内存，然后指令指针寄存器（简称ip）被加载到sp指向的位置

We use a couple of encoding tricks here. First is something called static single assignment (SSA). SSA represents the same variable at different times by using different names for the different times. For example, the stack pointer is represented initially by sp0, and then later by sp1, and then finally by sp2. As the variable changes, the relationship between its previous value and new value is captured with a formula that relates them. For example, the expression sp1 = BVSUB(8,sp0,0bin100) tells us that the new value of sp is equal to the old value minus 4 (the first parameter of 8 is used to specify the bit-width of the operation).
我们在这里使用了几个编码技巧。首先是称为静态单一赋值（SSA）的东西。SSA通过在不同的时间使用不同的名字来表示同一个变量。例如，堆栈指针最初用sp0表示，后来用sp1，最后用sp2。随着变量的变化，它的前值和新值之间的关系用一个公式来捕捉，将它们联系起来。例如，表达式sp1 = BVSUB(8,sp0,0bin100)告诉我们sp的新值等于旧值减去4（第一个参数8用来指定操作的位宽）。

Another trick is loop unrolling. We have modelled the call to gets as a series of 5 writes to memory. This corresponds to an input of 5 characters, meaning that a loop that reads a single character would get executed 5 times. How do we know how many times to unroll the loop? In general, we don’t, but in practice we can start small and try more and more loop unrollings until we find a satisfying assignment or give up. For this example, if we use 4 or fewer unrollings, the solver returns unsat. However, with 5 characters, the result is sat. In particular, with an input of 5 characters, the last character of the input will become the new value of ip.
另一个技巧是循环解卷。我们把对gets的调用模拟成一系列对内存的5次写入。这对应于5个字符的输入，意味着一个读取单个字符的循环将被执行5次。我们怎么知道要展开多少次循环呢？一般来说，我们不知道，但在实践中，我们可以从小处开始，尝试越来越多的循环展开，直到找到一个满意的赋值或放弃。对于这个例子，如果我们使用4个或更少的unrollings，求解器会返回unsat。然而，如果使用5个字符，结果是sat。特别是，在输入5个字符的情况下，输入的最后一个字符将成为ip的新值

4.4. Context-Bounded Model Checking 基于上下文的模型检测

Idea: iteratively generate all possible interleavings交织 and call the BMC procedure on each interleaving
combines:
- symbolic model checking: on each individual interleaving
- explicit state model checking: explore all interleavings
- symbolic state hashing
- monotonic partial order reduction that combines dynamic POR with symbolic state space exploration

4.5. Lazy Exploration of the Reachability Tree

4.6. Predicate Abstraction

Example

It abstracts data by only keeping track of certain predicates to represent the data
Conservative approach reduces the state space, but generates spurious counter-examples

4.7. Combine Simulation and Verification

Improve coverage and reduce verification time by combining static and dynamic verification

simulation就当是csmith

4.8. BMC的克里普克结构(Kripke structure)

BMC 的克里普克结构(Kripke structure)为一四元组 M＝(S,I,T,ℓ),其中

$S$ 为 BMC 中 FSM 产生的所有状态集合;
$I⊆S$,I 为初时状态的集合;
$T⊆S×S$,T 为状态间转移关系的集合;、
$ℓ:S→P(A)$,ℓ 为标注状态的函数.

5. 参考

Problem Solving for the 21st Century - BKM14.pdf

1. Main Goal

2. Standard Notions of Security 安全的标准概念

3. Software Security Problems

3.1. Critical Software Vulnerabilitie

4. testing and verification techniques – two approaches

4.1. proof by induction 数学归纳法

4.2. Temporal Logic Model Checking 时态逻辑模型检查

4.2.1. Advantages

4.2.2. LTL - Linear Time Logic

4.2.3. Model Checking Problem

4.3. Bounded Model Checking(BMC) 有界模型检查

4.3.1. 扩展到SMT

4.3.2. Steps for Software BMC

4.4. Context-Bounded Model Checking 基于上下文的模型检测

4.5. Lazy Exploration of the Reachability Tree

4.6. Predicate Abstraction

4.7. Combine Simulation and Verification

4.8. BMC的克里普克结构(Kripke structure)

5. 参考