• 主页
  • 相册
  • 随笔
  • 目录
  • 存档
Total 244
Search AboutMe

  • 主页
  • 相册
  • 随笔
  • 目录
  • 存档

GOTO Language Syntax Grammar

2022-04-22

1. Goto Language

1.1. Background

  • why we need it
    • main goal: replacement of swithc and for while to if goto
    • add checking Verificiation Condition

Generally speaking, it’s a simplified from original source program, with the replacement of some expression.

It compiles programs given in a C and C++ into GOTO-programs and serialises into a binary file. This goto-binary can be later deserialised and read in with the option “–binary”.

Noted that this representation makes it very easy to interpret the program, keeping just two pieces of state: the current position (which basic block and which line), and the values of all variables (in real software, this would also include parts of the heap and the call stack). Execution proceeds as follows: as long as there are still instructions to execute in the current basic block, run the next instruction and move the current position to right after that instruction. At the end of a basic block, take one of the available outgoing edges to another basic block.

We first generate the symbol table. Then the GOTO converter will use this symbol table to produce the GOTO program. Noted that GOTO programs
are the language-independent and context-free IR in ESBMC.

2. post-processing

2.1. goto checking

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
void goto_checkt::div_by_zero_check(
const expr2tc &expr,
const guardt &guard,
const locationt &loc)
{
if(disable_div_by_zero_check)
return;

assert(is_div2t(expr) || is_modulus2t(expr));

// add divison by zero subgoal
expr2tc side_2;
if(is_div2t(expr))
side_2 = to_div2t(expr).side_2;
else
side_2 = to_modulus2t(expr).side_2;

expr2tc zero = gen_zero(side_2->type);
assert(!is_nil_expr(zero));

add_guarded_claim(
notequal2tc(side_2, zero),
"division by zero",
"division-by-zero",
loc,
guard);
}

Division by zero.

1
2
3
4
5
6
7
8
9
10
11
  signed int x;
x=1;
IF !(x == 1) THEN GOTO 1
ASSERT 0 // division by zero
x=x % 0;
GOTO 2
1: ASSERT 0 // division by zero
x=x / 0;
2: dead x;
RETURN: NONDET(signed int)
END_FUNCTION // main
1
2
3
4
5
6
7
8
9
  signed int x;
x=1;
IF !(x == 1) THEN GOTO 1
x=x % 0;
GOTO 2
1: x=x / 0;
2: dead x;
RETURN: NONDET(signed int)
END_FUNCTION // main

In ESBMC, the options associated with the goto checking can be listed as follows.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
--no-bounds-check                do not do array bounds check
--no-div-by-zero-check do not do division by zero check
--no-pointer-check do not do pointer check
--no-align-check do not check pointer alignment
--no-pointer-relation-check do not check pointer relations
--nan-check check floating-point for NaN
--memory-leak-check enable memory leak check
--overflow-check enable arithmetic over- and underflow check
--struct-fields-check enable over-sized read checks for struct
fields
--deadlock-check enable global and local deadlock check with
mutex
--data-races-check enable data races check
--lock-order-check enable for lock acquisition ordering check
--atomicity-check enable atomicity check at visible
assignments
--stack-limit bits (=-1) check if stack limit is respected
--error-label label check if label is unreachable
--force-malloc-success do not check for malloc/new failure

3. GOTO grammar

4. Expression

  • From the aspect of goto program, an expression (expr2t) is formed by two properties:
    • expr_type (type2t): represents the type of an expression.
    • expr_ids (expr_ids): used for distinguishing different classes of expression
1
2
expr  ::=   expr_type + expr_ids  
| expr
  • expr_type

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    expr_type   ::=
    | bool_id
    | empty_id
    | symbol_id
    | struct_id
    | union_id
    | code_id
    | array_id
    | pointer_id
    | unsignedbv_id
    | signedbv_id
    | fixedbv_id
    | floatbv_id
    | string_id
    | cpp_name_id
    | end_type_id
  • expr_ids

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    expr_ids     ::=
    | constant_int_id
    | constant_fixedbv_id
    | constant_floatbv_id
    | constant_bool_id
    | constant_string_id
    | constant_struct_id
    | constant_union_id
    | constant_array_id
    | constant_array_of_id
    ...
    | code_block_id
    | code_assign_id
    | code_init_id
    ...
    |   end_expr_id

5. Goto Instruction

In ESBMC, a goto instruction (goto_programt::instructiont) is defined by three properties– The meaning of an instruction depends on the instruction_type (goto_program_instruction_typet) field, while different kinds of instructions make use of the fields guard (expr2tc) and code (expr2tc) for different purposes.

  • In other words, these are the properties that need to be considered when initialising a goto-instruction data structure, any other property could be set default or generated afterwards.
  • expr2tc is basically a reference pointer container for expr2t. In this situation, there is no need to distinguish these two.

The syntax grammar can be roughly listed as follows.

1
2
3
4
Instruction ::= 
| (instruction_type)
| (instruction_type, guard)
| (instruction_type, code)
  • instruction_type: an enum value describing the action performed by this instruction.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    instruction_type    ::=
    | NO_INSTRUCTION_TYPE
    | GOTO
    | ASSUME
    | ASSERT
    | OTHER
    | SKIP
    | LOCATION
    | END_FUNCTION
    | ATOMIC_BEGIN
    | ATOMIC_END
    | RETURN
    | ASSIGN
    | DECL
    | DEAD
    | FUNCTION_CALL
    | THROW
    | CATCH
    | THROW_DECL
    | THROW_DECL_END
  • code: represents the code expression whose id is a subset of expr_ids:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    code.ids     ::=
    | code_block_id
    | code_assign_id
    | code_init_id
    | code_decl_id
    | code_dead_id
    | code_printf_id
    | code_expression_id
    | code_return_id
    | code_skip_id
    | code_free_id
    | code_goto_id
    | code_asm_id
    | code_function_call_id
    | code_comma_id
    | code_cpp_del_array_id
    | code_cpp_delete_id
    | code_cpp_catch_id
    | code_cpp_throw_id
    | code_cpp_throw_decl_id
    | code_cpp_throw_decl_end_id
  • guard: an (arbitrarily complex) expression (usually an exprt) of Boolean type, whose id is a subset of expr_ids.

    1
    2
    3
    4
    5
    guard.id  ::=  
    | constant_bool_id
    | constant_int_id
    | constant_floatbv_id
    | constant_fixedbv_id

The detailed syntax grammar can be listed as follows. For simplification, we use the id of a expression directly to represent that expression, and * is denoted to represent a list of expressions/ instrucitons.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
Instruction ::=
| NO_INSTRUCTION_TYPE
| GOTO
| GOTO + guard
| ASSUME + guard
| ASSERT + guard
| OTHER + code_expression_id
| OTHER + code_free_id
| OTHER + code_printf_id
| OTHER + code_asm_id
| OTHER + code_cpp_del_array_id
| OTHER + code_cpp_delete_id
| SKIP
| LOCATION
| ATOMIC_BEGIN + Instruction* + ATOMIC_END
| RETURN
| RETURN + code_return_id
| ASSIGN + code_assign_id
| DECL + code_decl_id
| DEAD + code_dead_id
| FUNCTION_CALL + code_function_call_id
| THROW + code_cpp_throw_id*
| CATCH + code_cpp_catch_id*
| THROW_DECL + code_cpp_throw_decl_id*
| THROW_DECL_END + code_cpp_throw_decl_end_id*
  • NO_INSTRUCTION_TYPE: Instruction will be set to NO_INSTRUCTION_TYPE if it is not explicitly defined.
  • GOTO: GOTO targets if and only if guard is true. The guard will be set to TRUE if it is not explicitly given.
  • ASSUME: This thread of execution waits for guard to evaluate to TRUE, which performs a non-failing guarded self loop.
  • ASSERT: An assertion is TRUE / safe if guard is TRUE in all possible executions, otherwise it is FALSE / unsafe. This instructions is used to express properties to be verified
  • OTHER: Represents an expression that gets evaluated, but does not have any other effect on execution, i.e. doesn’t contain a call or assignment.
  • SKIP: Just advance the PC.
  • LOCATION: Semantically like SKIP.
  • ATOMIC_BEGIN, ATOMIC_END: Marks/ Ends a block without interleavings. When a thread executes ATOMIC_BEGIN, no thread other will be able to execute any Instruction until the same thread executes ATOMIC_END.
  • RETURN: Set the value returned by code (which shall be either nil or an instance of code_return_id) and then jump to the end of the function.
  • ASSIGN: Update the left-hand side of code (an instance of code_assign_id) to the value of the right-hand side.
  • DECL: Introduces a symbol denoted by the field code (an instance of code_decl_id). Semantically, the life-time of which is bounded by a corresponding DEAD instruction.
  • DEAD: Ends the life of the symbol denoted by the field code.
  • FUNCTION_CALL: Invoke the function denoted by field code (an instance of code_function_call_id).
  • THROW: Throw an exception. Throw $exception_1, …, exception_N$ where the list of exceptions is extracted from the code field
  • CATCH: Catch an exception.
  • THROW_DECL: List of throws that a function can throw.
  • THROW_DECL_END: End of throw declaration.

6. Goto Program

An instance of goto program (goto_programt) is effectively a list of instructions, and must end with END_FUNCTION.

  • END_FUNCTION: Must occur as the last instruction of the list and nowhere else.

7. GOTO Conversion

Fig 1. GOTO Conversion

Each symbol in the symbol table with function type (that is, the symbol’s type field contains a code_typet) will be converted to a corresponding GOTO program. The conversion happens in two phases:

  1. goto_convertt::convert turns each codet in the symbol table into corresponding GOTO instructions
  2. goto_convertt::finish_gotos and others (i.e. goto_convertt::optimize_guarded_gotos) populate the GOTO and CATCH instructions’ targets members, pointing to their possible successors. DEAD instructions are also added when GOTO instructions branch out of one or more lexical blocks

codet represents statements, which is a subset of expression( inherit from exprt).

  • exprt/ typet:
  • Statement:
  • Operands:

The grammar of a (By default.)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
            statment        operand(s) 
------------------------------------------------
codet ::=
| block
| decl op0
| decl op0, op1
| decl-block
| expression op0
| assign op0, op1
| init op0, op1
| assert op0
| assume
| function_call
| label op0
| switch_case op0, op1
| for op0, op1, op2, op3
| while op0, op1
| dowhile op0, op1
| switch
| break
| return
| continue
| goto
| skip
| non-deterministic-goto
| ifthenelse
| atomic_begin
| atomic_end
| cpp_delete
| cpp_delete[]
| cpp-catch
| throw_decl
| throw_decl_end
| dead

7.1. Stage 1: Conversion from statement to goto-instruction

  • Note that did not cover the case that atomicity-check
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
Function                  Input (codet)                 Output (codet/instructiont)
------------------------------------------------------------------------------------
convert_block (block) (codet*)
convert_decl (decl, op0) DECL
(decl, op0, op1) DECL ASSIGN
convert_decl_block (decl-block) (codet*)
convert_expression (expression, op0) (ifthenelse)
(expression, op0) OTHER
convert_assign (assign, op0, op1) ASSIGN
convert_init (init, op0, op1) (assign)
convert_assert (assert, op0) ASSERT
convert_assume (assume, op0) ASSUME
convert_function_call (function_call) (codet)
convert_label (label, op0) (codet)
convert_switch_case (switch_case, op0, op1) (codet)
convert_for (for, op0, op1, op2, op3) NO_INSTRUCTION_TYPE SKIP GOTO
(for, op0, op1, op2, op3) NO_INSTRUCTION_TYPE SKIP SKIP GOTO
convert_while (while, op0, op1) SKIP GOTO
convert_dowhile (dowhile, op0, op1) NO_INSTRUCTION_TYPE SKIP GOTO
convert_switch (switch) NO_INSTRUCTION_TYPE SKIP GOTO*
convert_break (break) GOTO
...
  • block: By goto_convertt::convert_blockConvert each expression (exprt) , which is stored as operand (op), to codet. Each new codet then converts to goto-instruction later.

    1
    2
    3
    4
    5
    {        // block begin
    int x; // op[0] => codet(decl)
    x=1; // op[1] => codet(assign)
    ...
    } // block end
  • decl: takes one or two operands and always expects symbol as first operand (op0).

    • if one, then generates a DECL instruction only.
    • if two, first the statement breaks up into decl and assignment, then generates a DECL instruction and a ASSIGN instruction.
      1
      2
      int x; // one operand, will be converted to DECL
      int x = 1; // two operand, will be spilted to two sub statement and converted to DECL and ASSIGN
  • decl-block: Semantically like block.

  • expression: takes one operand

    • if
    • else
  • assign: takes two operands, denoted as lhs(op0) and rhs(op1)

    • d
    1
    2
    3
    4
    5
    x = func()      // function_call 
    x = new class() // cpp_new
    x = new int[10] // cpp_new[]
    x = y // other cases
    ...

7.2. Stage 2: target generation and optimisation

7.3. Example

Fig 2. C Program Being Converted Into GOTO Code
source: cprover: goto-conversion

8. Referrence

The syntax grammar are referred to:

  • goto_*.h/ goto_*.cpp
  • symex_*.h/ symex_*.cpp
  • irep2_*.h/irep2_*.cpp
  • cprover: goto-programs
  • cprover: util

During this process, instruction can be very similar comparing to original expression, for instance, the code expression \lstinline{assert(x==1)} compares to its GOTO instrucitons form \lstinline{ASSERT x == 1}.

  • sec
  • Security
  • ESBMC
ESBMC Symbol Table
ESBMC-An Efficient SMT-based Software Model Checker
  1. 1. 1. Goto Language
    1. 1.1. 1.1. Background
  2. 2. 2. post-processing
    1. 2.1. 2.1. goto checking
  3. 3. 3. GOTO grammar
  4. 4. 4. Expression
  5. 5. 5. Goto Instruction
  6. 6. 6. Goto Program
  7. 7. 7. GOTO Conversion
    1. 7.1. 7.1. Stage 1: Conversion from statement to goto-instruction
    2. 7.2. 7.2. Stage 2: target generation and optimisation
    3. 7.3. 7.3. Example
  8. 8. 8. Referrence
© 2024 何决云 载入天数...