Writing a Compiler (66): Variable Declarations Inside Blocks

Today's goal

Add blocks {...} and support variable declarations inside them.

// variable declarations inside a block
extern int puts(char *str);

int main()
{
    int a = 10, b = 20;
    printf("1: a = %d b = %d\n",a,b);
    {
        int a = 55;
        printf("2: a = %d b = %d\n",a,b);
    }
    int c = 30;
    printf("3: a = %d b = %d c = %d\n",a,b,c);
    a = 100;
    printf("4: a = %d b = %d c = %d\n",a,b,c);
}

I'll also do my best to support variable declarations inside the blocks of while statements and the like.

initialize

Add two variables for managing block information.

  # Constructor
  def initialize(fname)
    @fname = fname                        # source file name
    @asmfname = fname.sub(/\.myc$/,'.s')  # assembly output file name
    @regs32 = ["edi", "esi","edx","ecx","r8d","r9d"]  # 32-bit registers
    @regs64 = ["rdi", "rsi","rdx","rcx","r8", "r9" ]  # 64-bit registers
    @lex = Lexer.new(@fname)              # lexer
    @funcname = nil                       # name of the function being processed
    @labelcnt = nil                       # number of auto-generated labels (per function)
    @literalcnt = 0                       # number of string literals
    @literaltable = []                    # list of string literals
    @functions = Hash.new                 # functions
    @lvars = nil                          # local variables
    @lvarsize = nil                       # size of the area reserved on the stack
    @breaklabel = nil                     # label that break jumps to
    @codebuffer = []                      # code buffer
    @numuseregs = 0                       # number of registers in use for a function call
    @numblock = nil                       # number of blocks
    @blocks = nil                         # nested blocks (["B2#","B1#",""])
  end

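Each block gets a name: B1#, B2#, and so on. Nested blocks are tracked in an Array called blocks, written like ["B2#","B1#",""]. Blocks are listed from the innermost on the left, and the "" at the far right is the function's own block, which stays unnamed.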
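To make the naming scheme concrete, here is a tiny stand-alone model of the counter and the Array. It is only a sketch: I'm assuming the function-level state starts as numblock = 0 and blocks = [""], and the enter/leave lambdas are names made up for this example, not part of the compiler.

```ruby
# Hypothetical stand-alone model of @numblock / @blocks (not the compiler's code).
numblock = 0
blocks = [""]                      # "" is the unnamed function-level block (assumed start state)

# Entering a block: bump the counter and put the new name at the front.
enter = lambda do
  numblock += 1
  blocks.unshift "B#{numblock}#"
end

# Leaving a block: drop the innermost name again.
leave = lambda { blocks.shift }

enter.call                         # outer {
enter.call                         #   inner {
nested = blocks.dup
leave.call                         #   }
leave.call                         # }
enter.call                         # a later sibling block gets a fresh name
sibling = blocks.dup
leave.call

p nested    # ["B2#", "B1#", ""]
p sibling   # ["B3#", ""]
```

Because the counter never goes back down, two sibling blocks never share a name, so their variables cannot collide in the table.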

Helper methods

Add three methods.

  # Look up variable info
  def get_var(var)
    # Is it declared in any of the nested blocks?
    @blocks.each do |blk|
      v = @lvars[blk + var]
      if v then return v end
    end
    # Not found anywhere
    return nil
  end

  # Check a variable
  def check_var(var)
    # Is it already declared in the current block?
    return @lvars[@blocks[0] + var]
  end

  # Register variable info
  def set_var(var,info)
    @lvars[@blocks[0] + var] = info
  end

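Variables inside a block are registered in lvars under a decorated name, e.g. a => B1#a. For that reason, from now on all access to the variable table lvars goes through these methods.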
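Here is a small sketch of how the three helpers behave for the test program's a and b. The get_var, check_var and set_var bodies mirror the methods above. The ScopeTable class, enter_block and leave_block are hypothetical wrappers added only so the example runs on its own.

```ruby
# Minimal model of the variable table. ScopeTable, enter_block and
# leave_block are hypothetical; the three helpers follow the article.
class ScopeTable
  attr_reader :lvars

  def initialize
    @lvars = {}
    @numblock = 0
    @blocks = [""]                 # assumed function-level start state
  end

  def enter_block
    @numblock += 1
    @blocks.unshift "B#{@numblock}#"
  end

  def leave_block
    @blocks.shift
  end

  # Search from the innermost block outwards.
  def get_var(var)
    @blocks.each do |blk|
      v = @lvars[blk + var]
      return v if v
    end
    nil
  end

  # Only looks at the current (innermost) block.
  def check_var(var)
    @lvars[@blocks[0] + var]
  end

  def set_var(var, info)
    @lvars[@blocks[0] + var] = info
  end
end

t = ScopeTable.new
t.set_var "a", ["int", 4]
t.set_var "b", ["int", 8]
t.enter_block
inner_check = t.check_var "a"      # nil: no "a" declared in B1# yet
t.set_var "a", ["int", 12]         # stored under the key "B1#a"
inner_a = t.get_var "a"            # the inner a shadows the outer one
inner_b = t.get_var "b"            # falls through to the function-level b
t.leave_block
outer_a = t.get_var "a"            # back to the function-level a
p t.lvars.keys                     # ["a", "b", "B1#a"]
```

The point of having check_var look only at the current block is that shadowing an outer variable is allowed, while declaring the same name twice in one block is still caught.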

statement

Add block handling.

    elsif kind == TK::SYMBOL && str == "{" then
      # block handling
      @numblock += 1
      @blocks.unshift "B#{@numblock}#"
      kind, str = block
      @blocks.shift
      kind, str = @lex.gettoken
      return kind, str

The blocks information is updated right before and right after the call to the block method.

    elsif kind == TK::RESERVE && str == "if" then
      # if statement handling
      @labelcnt += 1
      else_label = ".LBB_" + @funcname + "_" + @labelcnt.to_s
      @labelcnt += 1
      exit_label = ".LBB_" + @funcname + "_" + @labelcnt.to_s
      kind, str = @lex.gettoken
      if kind != TK::SYMBOL || str != "(" then perror end
      kind, str = @lex.gettoken
      kind, str = expr kind, str
      if kind != TK::SYMBOL || str != ")" then perror end
      codegen "  jz   " + else_label
      kind, str = @lex.gettoken
      if kind == TK::SYMBOL && str == "{" then
        @numblock += 1
        @blocks.unshift "B#{@numblock}#"
        kind, str = block
        @blocks.shift
        kind, str = @lex.gettoken
      else
        kind, str = statement kind, str
      end
      if kind == TK::RESERVE && str == "else" then
        codegen "  jmp  " + exit_label
        codegen else_label + ":"
        kind, str = @lex.gettoken
        if kind == TK::SYMBOL && str == "{" then
          @numblock += 1
          @blocks.unshift "B#{@numblock}#"
          kind, str = block
          @blocks.shift
          kind, str = @lex.gettoken
        else
          kind, str = statement kind, str
        end
      else
        codegen else_label + ":"
      end
      codegen exit_label + ":"
      return kind, str
    elsif kind == TK::RESERVE && str == "while" then
      # while statement handling
      @labelcnt += 1
      cond_label = ".LBB_" + @funcname + "_" + @labelcnt.to_s
      @labelcnt += 1
      exit_label = ".LBB_" + @funcname + "_" + @labelcnt.to_s
      breaklabelsave = @breaklabel
      @breaklabel = exit_label
      codegen cond_label + ":"
      kind, str = @lex.gettoken
      if kind != TK::SYMBOL || str != "(" then perror end
      kind, str = @lex.gettoken
      kind, str = expr kind, str
      if kind != TK::SYMBOL || str != ")" then perror end
      codegen "  jz   " + exit_label
      kind, str = @lex.gettoken
      if kind == TK::SYMBOL && str == "{" then
        @numblock += 1
        @blocks.unshift "B#{@numblock}#"
        kind, str = block
        @blocks.shift
        kind, str = @lex.gettoken
      else
        kind, str = statement kind, str
      end
      codegen "  jmp  " + cond_label
      codegen exit_label + ":"
      @breaklabel = breaklabelsave
      return kind, str
    elsif kind == TK::RESERVE && str == "for" then
      # for statement handling
      @labelcnt += 1
      cond_label = ".LBB_" + @funcname + "_" + @labelcnt.to_s
      @labelcnt += 1
      cont_label = ".LBB_" + @funcname + "_" + @labelcnt.to_s
      @labelcnt += 1
      body_label = ".LBB_" + @funcname + "_" + @labelcnt.to_s
      @labelcnt += 1
      exit_label = ".LBB_" + @funcname + "_" + @labelcnt.to_s
      breaklabelsave = @breaklabel
      @breaklabel = exit_label
      @numblock += 1
      @blocks.unshift "B#{@numblock}#"
      kind, str = @lex.gettoken
      if kind != TK::SYMBOL || str != "(" then perror end
      kind, str = @lex.gettoken
      if kind != TK::SYMBOL || str != ";" then
        kind, str = expr kind, str
      end
      if kind != TK::SYMBOL || str != ";" then perror end
      codegen cond_label + ":"
      kind, str = @lex.gettoken
      if kind != TK::SYMBOL || str != ";" then
        kind, str = expr kind, str
        codegen "  jz   " + exit_label
      end
      codegen "  jmp  " + body_label
      if kind != TK::SYMBOL || str != ";" then perror end
      codegen cont_label + ":"
      kind, str = @lex.gettoken
      if kind != TK::SYMBOL || str != ")" then
        kind, str = expr kind, str
      end
      if kind != TK::SYMBOL || str != ")" then perror end
      codegen "  jmp  " + cond_label
      codegen body_label + ":"
      kind, str = @lex.gettoken
      if kind == TK::SYMBOL && str == "{" then
        kind, str = block
        kind, str = @lex.gettoken
      else
        kind, str = statement kind, str
      end
      codegen "  jmp  " + cont_label
      codegen exit_label + ":"
      @blocks.shift
      @breaklabel = breaklabelsave
      return kind, str

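The blocks of if, while, and for statements get the same treatment. For the for statement, the block name is pushed before the header is parsed and popped after the body, so the header and the body share one block.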
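The same push, parse, pop sequence shows up several times in statement, so one option would be to fold it into a helper that takes a Ruby block. This is just a sketch of that idea, not what the compiler does: BlockStack and with_new_block are hypothetical names, and the start state of [""] is assumed.

```ruby
# Hypothetical helper, not part of the compiler: wraps the repeated
# "push a block name, do the work, pop it again" sequence.
class BlockStack
  attr_reader :blocks

  def initialize
    @numblock = 0
    @blocks = [""]                 # assumed function-level start state
  end

  # Push a fresh block name, run the caller's code, and always pop again.
  # The method returns whatever the given block returns.
  def with_new_block
    @numblock += 1
    @blocks.unshift "B#{@numblock}#"
    yield
  ensure
    @blocks.shift
  end
end

bs = BlockStack.new
seen = []
bs.with_new_block do
  seen << bs.blocks.dup
  bs.with_new_block { seen << bs.blocks.dup }
end
result = bs.with_new_block { seen << bs.blocks.dup; :done }

p seen        # [["B1#", ""], ["B2#", "B1#", ""], ["B3#", ""]]
p bs.blocks   # [""]
```

With something like this, a call site could shrink to a single line that wraps the call to block, and the ensure clause keeps the stack balanced even if an exception unwinds through it.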

    elsif kind == TK::TYPE then
      # variable declaration handling
      basetype = str
      loop do
        type = basetype
        kind, str = @lex.gettoken
        if kind == TK::SYMBOL && str == "*" then
          type += str
          kind, str = @lex.gettoken
        end
        if kind != TK::ID then perror end
        print "var "+str+"\n" if $opt_d
        @lvarsize += sizeof(type)
        if check_var str then
          perror "redefinition variable \"" + str +"\""
        end
        set_var str, [type,@lvarsize]
        skind, sstr = @lex.gettoken
        if skind == TK::SYMBOL && sstr == "=" then
          kind, str = expr2 kind, str, skind, sstr
        else
          kind, str = skind, sstr
        end
        if kind != TK::SYMBOL || str != "," then break end
      end
      if kind != TK::SYMBOL || str != ";" then
        perror "expected ';' after variables"
      end

Changed to go through check_var and set_var.
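To see how the declaration code and the block names combine, here is a stand-alone sketch that replays the declarations of the test program. The sizes hash (int = 4, char* = 8) and the declare lambda are assumptions made for this example. The only parts taken from the code above are bumping the size and then registering [type, offset] under the decorated name.

```ruby
# Hypothetical stand-alone model of declaring variables with stack offsets.
sizes = { "int" => 4, "char*" => 8 }   # assumed sizes
lvars = {}
lvarsize = 0
blocks = [""]                          # assumed function-level start state
numblock = 0

declare = lambda do |name, type|
  key = blocks[0] + name               # decorate with the current block name
  # Same role as check_var: only the current block is checked.
  raise "redefinition variable \"#{name}\"" if lvars[key]
  lvarsize += sizes.fetch(type)
  lvars[key] = [type, lvarsize]        # same role as set_var
end

declare.call("a", "int")
declare.call("b", "int")
numblock += 1
blocks.unshift "B#{numblock}#"         # {
declare.call("a", "int")               #   shadows the outer a
blocks.shift                           # }
declare.call("c", "int")

p lvars

# Declaring c a second time in the same block is rejected.
begin
  declare.call("c", "int")
  redefinition = nil
rescue RuntimeError => e
  redefinition = e.message
end
puts redefinition
```

Every declaration gets its own stack slot, including the ones in inner blocks, so the frame only grows. Slots are not reused after a block ends.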

function

The parameter-handling part.

    # parameter handling
    kind, str = @lex.gettoken
    loop do
      if kind == TK::SYMBOL && str == ")" then break end
      if kind == TK::TYPE then
        if str == "extern" then perror "invalid 'extern'" end
        type = str
        kind, str = @lex.gettoken
        if kind == TK::SYMBOL && str == "*" then
          type += str
          kind, str = @lex.gettoken
        end
        paratype << type
        if kind != TK::ID then perror "wrong parameter name" end
        print "para "+str+"\n" if $opt_d
        size = sizeof type
        @lvarsize += size
        parametersize << size
        if check_var str then perror "redefinition parameter \"" + str +"\"" end
        set_var str, [type,@lvarsize]
      else
        perror
      end
      kind, str = @lex.gettoken
      if kind == TK::SYMBOL && str == "," then
        kind, str = @lex.gettoken
      end
    end

This part was also changed to use check_var and set_var.

Code generation

Finally, this part needs fixing too.

  # Code generation for assignment
  def codegen_assign(el)
    if el.size != 3 then perror end
    type_r = codegen_el [el[2]]
    if el[0].kind_of?(Array) then perror end
    if el[0].kind != TK::ID then perror end
    v = get_var el[0].str
    if v == nil then
      perror "undeclared variable \"" + el[0].str + "\""
    end
    type_l = v[0]
    if type_r == "void*" && is_pointer_type?(type_l) then
      type_r = type_l    # implicit type conversion
    end
    if type_l != type_r then perror end
    if type_l == "char*" then
      codegen "  mov  qword ptr [rbp - " + v[1].to_s + "], rax"
    else
      codegen "  mov  dword ptr [rbp - " + v[1].to_s + "], eax"
    end
    return type_l
  end

  # Code generation for an expression (left operand of a binary operation)
  def codegen_elf(operand)
    type = "int"
    if operand.kind_of?(Array) then
      if !operand[0].kind_of?(Array) && operand[0].kind == TK::ID && operand[1].str == "()" then
        type = codegen_func operand
      else
        type = codegen_el operand
      end
    elsif operand.kind == TK::NUMBER then
      codegen "  mov  eax, " + operand.str
    elsif operand.kind == TK::ID then
      v = get_var operand.str
      if v == nil then
        perror "undeclared variable \"" + operand.str + "\""
      end
      type = v[0]
      if type == "char*"
        codegen "  mov  rax, qword ptr [rbp - " + v[1].to_s + "]"
      else
        codegen "  mov  eax, dword ptr [rbp - " + v[1].to_s + "]"
      end
    elsif operand.kind == TK::STRING then
      type = "char*"
      label = addliteral operand.str
      codegen "  lea  rax, "+label
    else
      perror
    end
    return type
  end

  # Code generation for an expression (right operand of a binary operation)
  def codegen_els(op, operand, type_l)
    if op.str == "+" then
      ostr = "add "
    elsif op.str == "-" then
      ostr = "sub "
    elsif op.str == "*" then
      ostr = "imul"
    elsif op.str == "/" then
      ostr = "idiv"
    elsif op.str == "%" then
      ostr = "idiv"
    elsif op.str == "==" then
      ostr = "cmp "
    elsif op.str == "!=" then
      ostr = "cmp "
    elsif op.str == "<" || op.str == ">" || op.str == "<=" || op.str == ">=" then
      ostr = "cmp "
    else
      perror "unknown operator \"" + op.str + "\""
    end

    # Evaluate the right operand
    type_r = "int"
    if operand.kind_of?(Array) then
      if operand[0].size == 2 && operand[0].kind == TK::ID && operand[1].str == "()" then
        codegen "  sub  rsp, 8"
        codegen "  push rax"
        type_r = codegen_func operand
        codegen "  mov  r10d, eax"
        codegen "  pop  rax"
        codegen "  add  rsp, 8"
      else
        codegen "  sub  rsp, 8"
        codegen "  push rax"
        type_r = codegen_el operand
        codegen "  mov  r10d, eax"
        codegen "  pop  rax"
        codegen "  add  rsp, 8"
      end
      str = "r10d"
    elsif operand.kind == TK::ID then
      v = get_var operand.str
      if v == nil then
        perror "undeclared variable \"" + operand.str + "\""
      end
      type_r = v[0]
      if type_r == "char*"
        str = "qword ptr [rbp - " + v[1].to_s + "]"
      else
        str = "dword ptr [rbp - " + v[1].to_s + "]"
      end
    elsif operand.kind == TK::NUMBER then
      str = operand.str
    elsif operand.kind == TK::STRING then
      type_r = "char*"
      label = addliteral operand.str
      codegen "  lea  r10, "+label
      str = "r10"
    else
      perror
    end

    # Type check
    if type_l != type_r then
      if type_l == "char*" && type_r == "int" then
        if op.str != "+" && op.str != "-" then
          perror "mismatched types to binary operation"
        end
      elsif type_l == "int" && type_r == "char*" then
        if op.str != "+" && op.str != "-" then
          perror "mismatched types to binary operation"
        end
      else
        perror "mismatched types to binary operation"
      end
    elsif type_l == "char*" then
      perror "mismatched types to binary operation"
    end

    # Compute with the left and right operands
    if op.str == "==" then
      codegen "  " + ostr + " eax, " + str
      codegen "  sete al"
      codegen "  and  eax, 1"
    elsif op.str == "!=" then
      codegen "  " + ostr + " eax, " + str
      codegen "  setne al"
      codegen "  and  eax, 1"
    elsif op.str == "<" then
      codegen "  " + ostr + " eax, " + str
      codegen "  setl al"
      codegen "  and  eax, 1"
    elsif op.str == ">" then
      codegen "  " + ostr + " eax, " + str
      codegen "  setg al"
      codegen "  and  eax, 1"
    elsif op.str == "<=" then
      codegen "  " + ostr + " eax, " + str
      codegen "  setle al"
      codegen "  and  eax, 1"
    elsif op.str == ">=" then
      codegen "  " + ostr + " eax, " + str
      codegen "  setge al"
      codegen "  and  eax, 1"
    elsif op.str == "*" || op.str == "/" || op.str == "%" then
      if str != "r10d" then codegen "  mov  r10d, " + str end
      codegen "  mov  r11, rdx"
      if op.str == "/" || op.str == "%" then
        codegen "  cdq"
      end
      codegen "  " + ostr + " r10d"
      if op.str == "%" then
        codegen "  mov  eax, edx"
      end
      codegen "  mov  rdx, r11"
    else
      if type_l == "char*" && type_r == "int" then
        if str == op.str then
          codegen "  " + ostr + " rax, " + str
        elsif str == "r10d" then
          codegen "  movsx r10, r10d"
          codegen "  " + ostr + " rax, r10"
        else
          codegen "  mov  r10d, " + str
          codegen "  movsx r10, r10d"
          codegen "  " + ostr + " rax, r10"
        end
      elsif type_l == "int" && type_r == "char*" then
        codegen "  movsx rax, eax"
        codegen "  " + ostr + " rax, " + str
        type_l = "char*"
      else
        codegen "  " + ostr + " eax, " + str
      end
    end
    return type_l
  end

Changed to get_var. There were a lot of places to change this time. I hope I didn't miss any.
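As a rough illustration of what the get_var change means for the generated code, the sketch below resolves a name against the table from the debug output and builds the load instruction. The load_var function and its parameters are hypothetical. The instruction strings follow the format used in codegen_elf.

```ruby
# Hypothetical sketch: resolve a variable through the block stack and
# build the load instruction in the same format codegen_elf emits.
def load_var(lvars, blocks, name)
  # Same innermost-first search as get_var
  key = blocks.map { |blk| blk + name }.find { |k| lvars.key?(k) }
  raise "undeclared variable \"#{name}\"" unless key
  type, offset = lvars[key]
  if type == "char*"
    "  mov  rax, qword ptr [rbp - #{offset}]"
  else
    "  mov  eax, dword ptr [rbp - #{offset}]"
  end
end

# Variable table as printed by the -d debug output of the test run
lvars = {
  "a" => ["int", 4], "b" => ["int", 8],
  "B1#a" => ["int", 12], "c" => ["int", 16]
}

inside  = load_var(lvars, ["B1#", ""], "a")   # inside the inner block
shared  = load_var(lvars, ["B1#", ""], "b")   # b only exists at function level
outside = load_var(lvars, [""], "a")          # after the inner block has closed
puts inside, shared, outside
```

The same source name a ends up at two different stack offsets depending on which blocks are active when the expression is compiled.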

Test run

OK, here we go.

~/myc$ myc -d m31.myc
para str
var a
[a, =, 10]
[[a, =, 10]]
var b
[b, =, 20]
[[b, =, 20]]
[1: a = %d b = %d\n]
[1: a = %d b = %d\n]
[a]
[a]
[b]
[b]
[[printf, (), [1: a = %d b = %d\n], [a], [b]]]
[[printf, (), [1: a = %d b = %d\n], [a], [b]]]
var a
[a, =, 55]
[[a, =, 55]]
[2: a = %d b = %d\n]
[2: a = %d b = %d\n]
[a]
[a]
[b]
[b]
[[printf, (), [2: a = %d b = %d\n], [a], [b]]]
[[printf, (), [2: a = %d b = %d\n], [a], [b]]]
var c
[c, =, 30]
[[c, =, 30]]
[3: a = %d b = %d c = %d\n]
[3: a = %d b = %d c = %d\n]
[a]
[a]
[b]
[b]
[c]
[c]
[[printf, (), [3: a = %d b = %d c = %d\n], [a], [b], [c]]]
[[printf, (), [3: a = %d b = %d c = %d\n], [a], [b], [c]]]
[a, =, 100]
[[a, =, 100]]
[4: a = %d b = %d c = %d\n]
[4: a = %d b = %d c = %d\n]
[a]
[a]
[b]
[b]
[c]
[c]
[[printf, (), [4: a = %d b = %d c = %d\n], [a], [b], [c]]]
[[printf, (), [4: a = %d b = %d c = %d\n], [a], [b], [c]]]
{"a"=>["int", 4], "b"=>["int", 8], "B1#a"=>["int", 12], "c"=>["int", 16]}
{"puts"=>["int", ["char*"]], "main"=>["int", []]}
~/myc$ ./m31
1: a = 10 b = 20
2: a = 55 b = 20
3: a = 10 b = 20 c = 30
4: a = 100 b = 20 c = 30
~/myc$

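Oh, it works. The debug output is, as usual, hard for anyone but me to read, but the inner-block variable a shows up as B1#a, just as designed. For the number of changes I made, the testing is pretty thin. I really need to go through these carefully, one by one.
I was hoping to get as far as declaring the loop variable in the for statement header this time, but I ran out of steam partway. I'll work on that next time.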