1. Implementing Continuation based language in GCC

        Information Engineering,
    University of the Ryukyus
    Shinji KONO, Kento YOGI
  2. A Continuation based Language

        Suitable for writing application
    Verification aware
    Compatibility with C
    No function call
    No environment on heap (unlike scheme)
    Satiable for Embedded System
    Now Available in http://sourceforge.jp/projects/cbc/

    Hands out in http://www.ie.u-ryukyu.ac.jp/~kono/paper.html

  3. Practical Programming Language

    I don't say OCAML is not practcal, but

    We'd like to use continuation in C, or

    Existing application written in C.

    setjmp/try-catch is close, but too expesive to use.

        So we designed Continuation based C
  4. Continuation based C

    code

       struct interface1  { int i; };
    struct interface2 { int o; };
    __code f(struct interface1 a) {
    struct interface2 b; b.o=a.i;
    goto g(b);
    }
  5. What's this?

    Just a segment of code

    Architecture Independent Assembler

    State Machine like programming language

    Programming Language lower than C

    Hardware Description Language

    Specification Language

        Do we have to write everything in code segments?
    It is possible, but too cruel... for human...
  6. Intermix with C

       void *env;
    __code (*exit)(int);
    __code h(char *s) {
    printf(s);
    goto (*exit)(0),env;
    }
    int main() {
    env = __environment; exit = __return;
    goto h("hello World\n");
    }
  7. Goto with Environment

    __envrionment is a actually a frame pointer.

    __return is a actually a address of part of function.

    goto f(1),env; changes frame pointer and go.

       Implementation is easy
    Wanna return back to the middle of the function?
       Insert extra function call for that.
  8. Is this a Continuation?

    If everything is written in CbC, there are no environment.

    All states are kept in interface or global variable.

    Continuation it self is an address.

    A kind of light weight continuation.

        vs. scheme continuation (entire stack image)
  9. What's good?

        Thread Scheduler 
    Context Switch
    Synchronization Primitives
    I/O wait semantics
    Cell Simulator written in CbC
  10. Scheduler example

    CbC can write thread schedule by itself.

        goto scheduler(self, task_list);
    code scheduler(Thread self,TaskPtr list)
    {
    TaskPtr t = list;
    TaskPtr e;
    list = list->next;
    goto list->thread->next(list->thread,list);
    }
    State enumeration can be written in itself.
        goto record_and_search_state(self, task_list);
  11. Self Verification

    Slower than SPIN, but faster than Java PathFinder.

    State enumeration of Dining Philosophers .


    process states/sec CbC states/sec SPIN sec JPF
    5 38,984 / 0.66 94 / 0.008 3.98
    6 159,299 / 3.79 212 / 0.01 7.33
    7 845,529 / 27.59 494 / 0.03 26.29
    8 3915,727 / 199.80 1,172 / 0.04 123.16

    Advantage.

        We can modify unit of interleaving (by hand)
  12. Micro C implementation

    We have our own full set C compiler

    Very few optimization

    Suitable for experiments

    Not so fast

    It is difficult to make good compiler

    If we have GCC quality compiler implementation...

  13. GNU CC implementation

    GNU C version 4.2.3 (i386-apple-darwin9.2.2)

    Idea is very old (2002).

        Forcing C tail call elimination for all code segment. 
    gcc before version 4.x is difficult to handle.
        limited capability of Tail Call Elimination (TCE).
    gcc version 4.x is Ok. Implementation is done by B4 (Kento YOGI).
        (a very good student)

  14. GCC structure

    Syntax tree and RTL is architecture indepent.

    RTL: pseudo machine code

    code

    We implement CbC in Parser and RTL generation part.

  15. Tail Call Elimination

        int f() { return g(); }
    _f: // Normal code
    subl $12, %esp
    call _g
    addl $12, %esp
    ret
    _f: // Tail Call Eliminated code
    jmp _g
  16. To force tail call elimination

        __code is a type
    restriction on void function
    Unlike C-- or Scheme, special syntax is necessary for
    code segment.

        no need for TCE check.
    Special treatment for code segment is possible
    different Binary API
    Architecture independent (in theory...)
  17. To force tail call elimination

    GCC expand_call() (generate rtx from syntax tree)

        4463 lines   18527  145469 calls.c
    CbC expand_cbc_goto() for code segment
         783 lines    2935   23651 cbc-goto.h       
    Actually code segment implementation is easier.
    It is free from complicated binary API (including
    varargs. )

  18. Parallel Assignment

    Consider the next code,

        __code carg4(struct arg args0,struct arg args1,
    int i, int j,int k,int l)
    {
    goto carg5(args1,args0,j,k,l,i);
    }
    GCC simply give up TCE in this case.

    Check overrap and saving in stack is necessary.

  19. Not yet done

    On going.
        goto with environment , __return+, __environment
    Investigating
        PowerPC, SPU is not working now.
    TCE it is not working well in this architecture
    unknown insn
    Theoretically architecture independent, but...

    Version dependency. GCC changes drastically even in
    case of minor change. (4.2.1 to 4.2.3), currently 4.3.0
    is not supported.

  20. Bench Mark (function call version)

        f0(int i) {
    int k,j;
    k = 3+i; j = g0(i+3);
    return k+4+j;
    }
    g0(int i) {
    return h0(i+4)+i;
    }
    h0(int i) {
    return i+4;
    }
    Artificial bench mark with heavy sub routing call overhead.

  21. CPS transformed source (1/1)

        typedef char *stack;
    struct cont_interface {
    // General Return Continuation
    __code (*ret)();
    };
    __code f(int i,stack sp) {
    int k,j;
    k = 3+i;
    goto f_g0(i,k,sp);
    }
    struct f_g0_interface {
    // Specialized Return Continuation
    __code (*ret)();
    int i_,k_,j_;
    };
  22. CPS transformed source (2/1)

        __code f_g1(int j,stack sp);
    __code f_g0(int i,int k,stack sp) { // Caller
    struct f_g0_interface *c =
    (struct f_g0_interface *)(
    sp -= sizeof(struct f_g0_interface));
    c->ret = f_g1;
    c->k_ = k;
    c->i_ = i;
    goto g(i+3,sp);
    }
    __code f_g1(int j,stack sp) { // Continuation
    struct f_g0_interface *c =
    (struct f_g0_interface *)sp;
    int k = c->k_;
    sp+=sizeof(struct f_g0_interface);
    c = (struct f_g0_interface *)sp;
    goto (c->ret)(k+4+j,sp);
    }
  23. CPS transformed source (3/1)

        __code g_h1(int j,stack sp);
    __code g(int i,stack sp) { // Caller
    struct f_g0_interface *c =
    (struct f_g0_interface *)(
    sp -= sizeof(struct f_g0_interface));
    c->ret = g_h1;
    c->i_ = i;
    goto h(i+3,sp);
    }
    __code g_h1(int j,stack sp) {
    // Continuation
    struct f_g0_interface *c =
    (struct f_g0_interface *)sp;
    int i = c->i_;
    sp+=sizeof(struct f_g0_interface);
    c = (struct f_g0_interface *)sp;
    goto (c->ret)(j+i,sp);
    }
  24. CPS transformed source (4/1)

    This is awfully long...

        __code h(int i,stack sp) {
    struct f_g0_interface *c =
    (struct f_g0_interface *)sp;
    goto (c->ret)(i+4,sp);
    }
    struct main_continuation {
    // General Return Continuation
    __code (*ret)();
    __code (*main_ret)();
    void *env;
    };
    __code main_return(int i,stack sp) {
    if (loop-->0)
    goto f(233,sp);
    printf("#0103:%d ",i);
    goto (( (struct main_continuation *)sp)->main_ret)(0),
    ((struct main_continuation *)sp)->env;
    }
  25. CPS transformed source (1/2)

        __code f2_1(int i,char *sp) {
    int k,j;
    k = 3+i;
    goto g2_1(k,i+3,sp);
    }
    __code g2_1(int k,int i,char *sp) {
    goto h2_11(k,i+4,sp);
    }
    __code f2_0_1(int k,int j,char *sp);
    __code h2_1_1(int i,int k,int j,char *sp) {
    goto f2_0_1(k,i+j,sp);
    }
  26. CPS transformed source (2/2)

    Slightly short.

        __code h2_11(int i,int k,char *sp) {
    goto h2_1_1(i,k,i+4,sp);
    }
    __code f2_0_1(int k,int j,char *sp) {
    goto (( (struct cont_interface *)
    sp)->ret)(k+4+j,sp);
    }
    __code main_return2_1(int i,stack sp) {
    if (loop-->0)
    goto f2_1(233,sp);
    printf("#0165:%d ",i);
    goto (( (struct main_continuation *)sp)->main_ret)(0),
    ((struct main_continuation *)sp)->env;
    }
  27. Bench mark result


    compiler ./conv1 1
    function call
    ./conv1 2 (CPS 1) ./conv1 3 (CPS 2)
    icro-C 8.97 2.19 2.73
    CC 4.87 3.08 3.65
    CC (+omit) 4.20 2.25 2.76
    CC (+fast) 3.44 1.76 2.34

    Micro-C, GCC bench mark (in sec)

    Don't think it is general result. This is an artificial bench mark.

  28. Function call is legacy

        Have to follow classical ABI
    Have to save many registers
    for possible future use
    Code segment is too fine grain

        It is not for human
  29. Conclusion

    Now we are going to have practical language with very fast continuation.

    Available in https://sourceforge.jp/projects/cbc/

  30. On going project

        Implementing goto with environment in GCC
    CbC based Many Core Kernel
    (Cerium Engine on PS3)
    CbC Self Model Checker
    C2CbC converter
  31. Cerium Engine on PS3

        CbC based Many Core Kernel
    • like SPURS
    • Scene Graph based
      State Pattern

  32. Cerium Engine on PS3

    Sequential Implementation in CwC

    Generate code segment

    Execute each code segment on each Core

  33. Implementing goto in gcc

    Is it difficult?

    No, I think it is a matter of time...