How to use `adtpp`

Overview

adtpp provides support for algebraic data types in C. Type information is put in a file with a .adt extension and adtpp processes this to produce a C header file that contains C type definitions, macros etc to be used in C code. The following command takes example.adt and produces example.h:

adtpp example.adt

If you prefer to read code with comments, you can take a look at example.adt and example.c that uses it. There is also a paper about the system that gives more background, motivation and discussion of implementation, performance etc. Here we first describe how simple algebraic data types are defined and used, then discuss polmorphic (generic) types, higher order code (pointers to functions). We finish with a quick reference for adtpp features.

Simple ADTs

ADTs are defined using data declarations such as the following:

data point {
    Point(double, double);
}
data color {
    Red();
    Blue();
    Green();
}
data tree {
    Empty();
    Node(long, tree, tree);
}
data quad_roots {
    Noroot();                /* no root */
    Allroot();               // everything is a root
    Oneroot(double);         // one root
    Tworoot(double, double); // two roots
}

The first defines the type point that has a single data constructor, Point with two arguments, both of type double. Identifiers are case sensitive and here we use upper case for data constructors by convention - any valid C identifier can be used for both types and data constructors. 'data' is a keyword for parsing by adtpp and white-space is not significant. The arguments of data constructors in such definitions should be valid types. Those not defined in the file (for example, double) are assumed to be defined elsewhere in C. Parsing compound C type expressions such as pointers, structs, arrays etc is not supported; if this is desired typedef or #define can be used in C code to give a single identifier name to the type. As the other data declarations above show, types can be defined recursively, there can be multiple data constructors in a type, those that have no arguments must still be followed by parentheses and C-style comments are supported.

ADT values are created/constructed by using the data constructors like C functions calls (the C compiler checks the argument and result types as it would with other function calls). For data constructors with arguments, this dynamically allocates memory, using ADT_MALLOC (defined as malloc by default). For each type mytype, a function free_mytype is generated that frees the memory allocated by the top level data constructor (using ADT_FREE; it does not recursively process arguments of the data constructor). The code can also easily be used with different memory management libraries, including garbage collection (which we recommend). For example:

#include "example.h" // generated by "adtpp example.adt"
...
    point origin;                // defines variable origin of type point
    tree t1, t2, t3;
    origin = Point(0.0, 0.0);    // creates a point and assigns it to origin
    t1 = Empty();
    t2 = Node(1, t1, Empty());
    t3 = Node(2, t2, t2);
...
    free_tree(t3);              // free memory for t3 only

ADT values can be tested and deconstructed (to give access to the arguments of data constructors) using several programming constructs based on if-then-else and switch. For each data constructor MyDC there is an if_MyDC(...) and else_if_MyDC(...) macro. There is also an else() and end_if() common to all ADTs. Arguments of if_MyDC(...) must be an expression of the correct type plus fresh variables for each of the arguments of MyDC (the body of the if_MyDC(...) is a new scope; variables with the same identifier elsewhere are unaffected). Arguments of else_if_MyDC(...) are the same, except the first argument is omitted - the whole if-then-else construct branches on the data constructors of the single value given in the if_MyDC(...). For example, in the following code, else_if_Node(val,tl,tr) is implicitly testing t3 and introduces the three variables, whose scope ends at the final end_if(). However, if_Node(tl,val,tll,tlr) introduces another variable called val, whose scope ends at the else().

if_Empty(t3)
    printf("Tree is empty\n");
else_if_Node(val, tl, tr)    // implicitly testing t3
    printf("The root is %ld\n", val);
    if_Node(tl, val, tll, tlr) // fresh variable val
        printf("The left sub-tree root is ");
        printf("%ld\n", val);
    else()
        printf("Left sub-tree is empty\n);
    end_if()
end_if()

Braces are not required around compound statements (such as the consecutive printf statements), semicolons may be used after the various macros but are not required but the end_if() is required (the macros use unbalanced braces and the C compiler will issue a possibly cryptic syntax error message otherwise). The different data constructors used in one if-then-else construct must be of the same type (otherwise the C compiler will issue a possibly cryptic type error message). There are also if_MyDC_ptr(...) and else_if_MyDC_ptr(...) macros that bind the fresh variable to pointers to the data constructor arguments; this is the only mechanism for in-situ update of ADT values. For example, the root of a tree can be incremented as follows:

if_Node_ptr(t3, valp, tlp, trp)
    *valp++;
else()
    printf("No root to increment\n");
end_if()

For each type mytype there is a switch_mytype(v) macro (where v must have this type) generated and there is an end_switch() macro. For each data constructor MyDC there is are case_MyDC(...) and case_MyDC_ptr(...) macros, with fresh variables for each of the arguments of MyDC, with scope limited to that case. There is also a default() case that can be used. For example, the sum of the elements in a tree can be computed using the following function:

long sum_tree(tree t) {
    switch_tree(t)
    case_Empty()
        return 0;
    case_Node(val, tl, tr)
        return val + sum_tree(tl) + sum_tree(tr);
    end_switch()
}

Polymorphic (generic) types and functions

Polymorphic types such as "list of t", where t is a type variable that can be instantiated to a type allows re-use of type definitions. Similarly, functions that operate on any list facilitate code reuse. In adtpp, polymorphism is supported but each instance of a polymorphic type and function used in the program must be declared and given a distinct identifier in order for errors to be detected using the simple type system of C. For the same reason, data constructors of different instances of polymorphic types must have different names. Type parameters are allowed in data declarations surrounded by "angle brackets" to define polymorphic types and their instances are declared using type declarations (examples below). Type parameters must be instantiated by ADTs or C types that are the size of a pointer (adt_int is a predefined integer type of the correct size, the same as intptr_t; using list<int> instead of list<adt_int> may not work and there is no checking for this).

data pair<t1, t2> { Pair(t1, t2); }
data list<t> {
    Nil();
    Cons(t, list<t>);
}

type points = list<point>;
type colors = list<color>;
type ints = list<adt_int>;
type polygon = pair<color, points>;
type polygons = list<polygon>;
type pairs<t1, t2> = list<pair<t1, t2>>;
type polygons1 = pairs<color, points>;

Note that type declarations can introduce new polymorphic types such as pairs. Also, distinct identifiers can be given to equivalent types: both polygons and polygons1 are equivalent to list<pair<color,list<point>>>. With adtpp these types can be used interchangably. For instances of polymorphic types, adtpp essentially generates a copy of the polymorphic definition with data constructor names having an underscore followed by the type instance name appended. For example, ints has data constructors Nil_ints and Cons_ints and is defined as follows and macros etc are generated as if this was defined explicitly:

data ints {      // implicitly generated by adtpp
    Nil_ints();
    Cons_ints(adt_int, ints);
}

Data constructors from polymorphic type definitions can be used to define polymorphic C functions. For example, functions that compute the length of a list and concatenate two lists can be defined as follows:

int length(list xs) {
    int len = 0;
    while (1) {
        if_Cons(xs, head, tail)
            len++;
            xs = tail;
        else()
            return len;
        end_if()
   }
}

list concat(list xs, list ys) {
    if_Cons(xs, head, tail)
        return Cons(head, concat(tail, ys));
    else()
        return ys;
    end_if()
}

Polymorphic functions must be declared in the .adt file using function declarations, giving the type of the arguments and result, and every distinct type instance that is used must be given a name with an instance declaration. Having instances with different names allows the C compiler to perform appropriate type checking; adtpp generates function prototypes and definitions for the instances (which call the polymorphic code). For example, the following declarations allow us to use a function count_ints that takes an argument of type ints and returns an int without duplicating the length code or using error-prone and potentially unsafe explicit casts.

function<t> int length(list<t>);
instance count_ints = length<adt_int>;

function<t> list<t> concat(list<t>, list<t>);
instance join = concat<point>;
instance concat_col = concat<color>;

Multiple type parameters

With types and code that have multiple type parameters, we sometimes have generic types that differ only in the order of the parameters. For example, we maye have a polymorphic function swap that takes an argument of type pair<t1,t2> and returns a result of type pair<t2,t1>. Type variables in adtpp are implemented by using multiple distinct generic types, adt_1, adt_2, etc. (they have no supported operations). Polymorphic types are simulated by instantiating type parameters by these generic types, in order. For example, in the polymorphic type pair, the arguments of the Pair data constructor are of type adt_1 and adt_2, respectively. These generic types can be used explicitly in adtpp declarations to give distinct names to polymorhic types that differ only in their parameters. For example:

function<t1, t2> pair<t2, t1> swap(pair<t1, t2>);
instance swap_polygon = swap<points, color>;

type pair_swp = pair<adt_2, adt_1>;     // type returned by swap
type polygon_swp = pair<points, color>; // type returned by swap_polygon

Higher order types and functions (pointers to functions)

Functions can be included in atdpp data types and allowed as arguments and results of declared functions, using the func keyword: t_res func(t1, t2) means a function taking two arguments of type t1 and t2, respectively, and returning a result of type t_res. For example, we can declare an analogue of the Haskell polymorphic zipWith function (which takes a two-argument function f plus two lists and applies f to elements of the two lists pair-wise and returns the list of results).

function<t1,t2,t3>
    list<t3> zipWith(t3 func(t1,t2), list<t1>, list<t2>);
instance mk_polygons = zipWith<color, points, polygon>;

type pointss = list<points>; // needed for mk_polygons
type list_2 = list<adt_2>;   // needed for zipWith
type list_3 = list<adt_3>;   // needed for zipWith

Quick reference

Command to generate example.h from example.adt:

adtpp example.adt

data, type, function and instance definitions/declarations in example.adt:

// data types predefined by adtpp:
// numeric: adt_int, adt_uint, adt_char, adt_float
// character pointer: adt_string
// generic types for polymorphism: adt_1, adt_2, adt_3, ...
// Any other identifier used as a type and not defined in example.adt
// is assumed to be defined in C before example.h is included.
// All types defined by adtpp can be used to instantiate polymorphic types
// safely; other types used to instantiate polymorphic types *must* be
// the size of a pointer.

data tree {                   // defines algebraic data type tree
    Empty();                  // data constructor with no arguments
    Node(long, tree, tree);   // data constructor with three arguments
}

data list<t> {                // polymorphic list
    Nil();
    Cons(t, list<t>);
}

// An instance of list<t> (note list<int> should not be used -
// arguments of polymorphic types should be guaranteed to be the same size
// as a pointer to void but adtpp cannot check this)
type ints = list<adt_int>;
// data ints {                // implicitly generated by adtpp
//     Nil_ints();            // type name appended to constructor names
//     Cons_ints(adt_int, ints);
// }

data pair<t1, t2> {            // polymorphic pair
    Pair(<t1>, <t2>);
// data pair {                // implicitly generated by adtpp
//     Pair(adt_1, adt_2);    // parameter(s) replaced by adt_1, adt_2 etc
// }

type pair_swp = pair<adt_2, adt_1>;     // explicit use of adt_1 etc

// Higher order type instance: a list of (int to int) functions
type fns = list<int func(int)>;

function<t> int length(list<t>); // Declares a polymorphic function
instance count_ints = length<adt_int>; // An instance of length

Using the generated code in example.c:

// define identifier myCtype using typedef if you want to use this
// type in ADT definitions - adtpp can't parse complex C types
typedef struct mystruct{int f1[5]; char f2[8]} myCtype;

// redefine malloc/free primitives if desired
#define ADT_MALLOC(s) my_malloc(s)
#define ADT_FREE(s) my_free(s)

// include generated example.h (this includes <stdint.h> and
// <stdlib.h>)
#include "example.h"

// Assuming an ADT named myADT and data constructor MyDC(t1, t2) are
// defined in example.adt, the following functions can be used:
// MyDC(val1, val2)          // create a value (allocates space)
// free_myADT(myADTval)      // free space of top level data constructor
// Also, the following macros can be used to structure code:
// if_MyDC(myADTval, var1, var2)
// if_MyDC_ptr(myADTval, ptr1, ptr2)
// else_if_MyDC(var1, var2)
// else_if_MyDC_ptr(ptr1, ptr2)
// else()
// end_if()
// switch_myADT(myADTval)
// case_MyDC(var1, var2)
// case_MyDC_ptr(ptr1, ptr2)
// default()
// end_switch()

// Example code using ADT tree:
long sum_tree(tree t) {
    switch_tree(t)
    case_Empty()
        return 0;
    case_Node(val, tl, tr)
        return val + sum_tree(tl) + sum_tree(tr);
    end_switch()
}

// Example polymorphic code: return length of a list of anything
// Assuming length has been declared using a function declaration,
// and count_ints is declared as an instance, example.h will contain
// function prototypes for both functions plus a definition of
// count_ints that calls length
int length(list xs) {
    int len = 0;
    while (1) {
        if_Cons(xs, head, tail)
            len++;
            xs = tail;
        else()
            return len;
        end_if()
   }
}

// Return sum of elements in a list of integers (note constructor names
// have _ints appended).
int sumlist(ints xs) {
    if_Cons_ints(xs, head, tail)
        return head + sumlist(tail)
    else()
        return 0;
    end_if()
}

How to use adtpp