Implementing Continuation based language in GCC

    Information Engineering,

    University of the Ryukyus

    Shinji KONO, Kento YOGI

A Continuation based Language

    Suitable for writing application

    Verification aware

    Compatibility with C

    No function call

    No environment on heap (unlike scheme)

    Satiable for Embedded System

Now Available in http://sourceforge.jp/projects/cbc/

Hands out in http://www.ie.u-ryukyu.ac.jp/~kono/paper.html

Practical Programming Language
I don't say OCAML is not practcal, but

We'd like to use continuation in C, or

Existing application written in C.

setjmp/try-catch is close, but too expesive to use.
```
    So we designed Continuation based C
```

Continuation based C

   struct interface1  { int i; };

   struct interface2  { int o; };

   __code f(struct interface1 a) { 

       struct interface2 b; b.o=a.i;  

       goto g(b); 

   }

What's this?

Just a segment of code

Architecture Independent Assembler

State Machine like programming language

Programming Language lower than C

Hardware Description Language

Specification Language
```
    Do we have to write everything in code segments?

    It is possible, but too cruel... for human...
```

Intermix with C

   void *env;

   __code (*exit)(int);

   __code h(char *s) { 

        printf(s);

        goto (*exit)(0),env;

   }

   int main() {

        env  = __environment; exit = __return;

        goto h("hello World\n");

   }

Goto with Environment

__envrionment is a actually a frame pointer.

__return is a actually a address of part of function.

goto f(1),env; changes frame pointer and go.
```
   Implementation is easy
```
Wanna return back to the middle of the function?
```
   Insert extra function call for that.
```
Is this a Continuation?

If everything is written in CbC, there are no environment.

All states are kept in interface or global variable.

Continuation it self is an address.

A kind of light weight continuation.
```
    vs. scheme continuation (entire stack image)
```

What's good?

    Thread Scheduler 

    Context Switch

    Synchronization Primitives

    I/O wait semantics

    Cell Simulator written in CbC

Scheduler example

CbC can write thread schedule by itself.

    goto scheduler(self, task_list);

    code scheduler(Thread self,TaskPtr list)

    {

        TaskPtr t = list;

        TaskPtr e;

        list = list->next;

        goto list->thread->next(list->thread,list);

    }

State enumeration can be written in itself.

    goto record_and_search_state(self, task_list);

Self Verification

Slower than SPIN, but faster than Java PathFinder.

State enumeration of Dining Philosophers .

process states/sec CbC states/sec SPIN sec JPF

5 38,984 / 0.66 94 / 0.008 3.98

6 159,299 / 3.79 212 / 0.01 7.33

7 845,529 / 27.59 494 / 0.03 26.29

8 3915,727 / 199.80 1,172 / 0.04 123.16

Advantage.

    We can modify unit of interleaving (by hand)

Micro C implementation

We have our own full set C compiler

Very few optimization

Suitable for experiments

Not so fast

It is difficult to make good compiler

If we have GCC quality compiler implementation...
GNU CC implementation
GNU C version 4.2.3 (i386-apple-darwin9.2.2)

Idea is very old (2002).
```
    Forcing C tail call elimination for all code segment. 
```
gcc before version 4.x is difficult to handle.
```
    limited capability of Tail Call Elimination (TCE).
```
gcc version 4.x is Ok. Implementation is done by B4 (Kento YOGI).
```
    (a very good student)

  
```
GCC structure

Syntax tree and RTL is architecture indepent.

RTL: pseudo machine code

We implement CbC in Parser and RTL generation part.

Tail Call Elimination

    int f() { return g(); }

    _f:  // Normal code

            subl    $12, %esp

            call    _g

            addl    $12, %esp

            ret

    _f:  // Tail Call Eliminated code

            jmp     _g

To force tail call elimination

    __code is a type

    restriction on void function

Unlike C-- or Scheme, special syntax is necessary for
code segment.

    no need for TCE check.

    Special treatment for code segment is possible

    different Binary API

    Architecture independent (in theory...)

To force tail call elimination

GCC expand_call() (generate rtx from syntax tree)
```
    4463 lines   18527  145469 calls.c
```
CbC expand_cbc_goto() for code segment
```
     783 lines    2935   23651 cbc-goto.h       
```
Actually code segment implementation is easier.
It is free from complicated binary API (including
varargs. )

Parallel Assignment

Consider the next code,

    __code carg4(struct arg args0,struct arg args1,

         int i, int j,int k,int l)

    {

        goto carg5(args1,args0,j,k,l,i);

    }

GCC simply give up TCE in this case.

Check overrap and saving in stack is necessary.

Not yet done
On going.
```
    goto with environment , __return+, __environment
```
Investigating
```
    PowerPC, SPU is not working now.

    TCE it is not working well in this architecture

    unknown insn
```
Theoretically architecture independent, but...

Version dependency. GCC changes drastically even in
case of minor change. (4.2.1 to 4.2.3), currently 4.3.0
is not supported.

Bench Mark (function call version)

    f0(int i) {

        int k,j;

        k = 3+i; j = g0(i+3);

        return k+4+j;

    }

    g0(int i) {

        return h0(i+4)+i;

    }

    h0(int i) {

        return i+4;

    }

Artificial bench mark with heavy sub routing call overhead.

CPS transformed source (1/1)

    typedef char *stack;

    struct cont_interface { 

      // General Return Continuation

        __code (*ret)();

    };

    __code f(int i,stack sp) {

        int k,j;

        k = 3+i;

        goto f_g0(i,k,sp);

    }

    struct f_g0_interface {  

       // Specialized Return Continuation

        __code (*ret)();

        int i_,k_,j_;

    };

CPS transformed source (2/1)

    __code f_g1(int j,stack sp);

    __code f_g0(int i,int k,stack sp) { // Caller

        struct f_g0_interface *c = 

            (struct f_g0_interface *)(

         sp -= sizeof(struct f_g0_interface));

        c->ret = f_g1;

        c->k_ = k;

        c->i_ = i;

        goto g(i+3,sp);

    }

    __code f_g1(int j,stack sp) {  // Continuation 

        struct f_g0_interface *c = 

          (struct f_g0_interface *)sp;

        int k = c->k_;

        sp+=sizeof(struct f_g0_interface);

        c = (struct f_g0_interface *)sp;

        goto (c->ret)(k+4+j,sp);

    }

CPS transformed source (3/1)

    __code g_h1(int j,stack sp);

    __code g(int i,stack sp) { // Caller

        struct f_g0_interface *c = 

            (struct f_g0_interface *)(

           sp -= sizeof(struct f_g0_interface));

        c->ret = g_h1;

        c->i_ = i;

        goto h(i+3,sp);

    }

    __code g_h1(int j,stack sp) {  

        // Continuation 

        struct f_g0_interface *c = 

          (struct f_g0_interface *)sp;

        int i = c->i_;

        sp+=sizeof(struct f_g0_interface);

        c = (struct f_g0_interface *)sp;

        goto (c->ret)(j+i,sp);

    }

CPS transformed source (4/1)

This is awfully long...

    __code h(int i,stack sp) {

        struct f_g0_interface *c = 

          (struct f_g0_interface *)sp;

        goto (c->ret)(i+4,sp);

    }

    struct main_continuation { 

        // General Return Continuation

        __code (*ret)();

        __code (*main_ret)();

        void *env;

    };

    __code main_return(int i,stack sp) {

        if (loop-->0)

            goto f(233,sp);

        printf("#0103:%d
",i);

        goto (( (struct main_continuation *)sp)->main_ret)(0),

               ((struct main_continuation *)sp)->env;

    }

CPS transformed source (1/2)

    __code f2_1(int i,char *sp) {

        int k,j;

        k = 3+i;

        goto g2_1(k,i+3,sp);

    }

    __code g2_1(int k,int i,char *sp) {

        goto h2_11(k,i+4,sp);

    }

    __code f2_0_1(int k,int j,char *sp);

    __code h2_1_1(int i,int k,int j,char *sp) {

        goto f2_0_1(k,i+j,sp);

    }

CPS transformed source (2/2)

Slightly short.

    __code h2_11(int i,int k,char *sp) {

        goto h2_1_1(i,k,i+4,sp);

    }

    __code f2_0_1(int k,int j,char *sp) {

        goto (( (struct cont_interface *)

           sp)->ret)(k+4+j,sp);

    }

    __code main_return2_1(int i,stack sp) {

        if (loop-->0)

            goto f2_1(233,sp);

        printf("#0165:%d
",i);

        goto (( (struct main_continuation *)sp)->main_ret)(0),

               ((struct main_continuation *)sp)->env;

    }

Bench mark result

compiler ./conv1 1
function call
./conv1 2 (CPS 1) ./conv1 3 (CPS 2)

icro-C 8.97 2.19 2.73

CC 4.87 3.08 3.65

CC (+omit) 4.20 2.25 2.76

CC (+fast) 3.44 1.76 2.34

Micro-C, GCC bench mark (in sec)

Don't think it is general result. This is an artificial bench mark.

Function call is legacy

    Have to follow classical ABI

    Have to save many registers 

       for possible future use

Code segment is too fine grain

    It is not for human

Conclusion

Now we are going to have practical language with very fast continuation.

Available in https://sourceforge.jp/projects/cbc/

On going project

    Implementing goto with environment in GCC

    CbC based Many Core Kernel

         (Cerium Engine on PS3)

    CbC Self Model Checker

    C2CbC converter

Cerium Engine on PS3

    CbC based Many Core Kernel

	like SPURS

    Scene Graph based

    State Pattern

Cerium Engine on PS3

Sequential Implementation in CwC

Generate code segment

Execute each code segment on each Core
Implementing goto in gcc
Is it difficult?

No, I think it is a matter of time...

process	states/sec CbC	states/sec SPIN	sec JPF
5	38,984 / 0.66	94 / 0.008	3.98
6	159,299 / 3.79	212 / 0.01	7.33
7	845,529 / 27.59	494 / 0.03	26.29
8	3915,727 / 199.80	1,172 / 0.04	123.16

compiler	./conv1 1 function call	./conv1 2 (CPS 1)	./conv1 3 (CPS 2)
icro-C	8.97	2.19	2.73
CC	4.87	3.08	3.65
CC (+omit)	4.20	2.25	2.76
CC (+fast)	3.44	1.76	2.34

Implementing Continuation based language in GCC

Implementing Continuation based language in GCC

A Continuation based Language

Practical Programming Language

Continuation based C

What's this?

Intermix with C

Goto with Environment

Is this a Continuation?

What's good?

Scheduler example

Self Verification

Micro C implementation

GNU CC implementation

GCC structure

Tail Call Elimination

To force tail call elimination

To force tail call elimination

Parallel Assignment

Not yet done

Bench Mark (function call version)

CPS transformed source (1/1)

CPS transformed source (2/1)

CPS transformed source (3/1)

CPS transformed source (4/1)

CPS transformed source (1/2)

CPS transformed source (2/2)

Bench mark result

Function call is legacy

Conclusion

On going project

Cerium Engine on PS3

Cerium Engine on PS3

Implementing goto in gcc