adtpp adtpp provides support for algebraic data types
in C. Type information is put in a file with a .adt
extension and adtpp processes this to produce a C header
file that contains C type definitions, macros etc to be used in C
code. The following command takes example.adt and produces
example.h:
adtpp example.adt
If you prefer to read code with comments, you can take a look at example.adt and
example.c that uses it. There is also a
paper about the system that gives more background, motivation and
discussion of implementation, performance etc. Here we first describe how
simple algebraic data types are defined and used, then discuss polmorphic
(generic) types, higher order code (pointers to functions). We finish
with a quick reference for adtpp
features.
ADTs are defined using data declarations such as the
following:
data point {
Point(double, double);
}
data color {
Red();
Blue();
Green();
}
data tree {
Empty();
Node(long, tree, tree);
}
data quad_roots {
Noroot(); /* no root */
Allroot(); // everything is a root
Oneroot(double); // one root
Tworoot(double, double); // two roots
}
The first defines the type point that has a single
data constructor, Point with two arguments, both of
type double. Identifiers are case sensitive and here
we use upper case for data constructors by convention - any valid C
identifier can be used for both types and data constructors. 'data'
is a keyword for parsing by adtpp and white-space is not
significant. The arguments of data constructors in such definitions
should be valid types. Those not defined in the file (for example,
double) are assumed to be defined elsewhere in C. Parsing
compound C type expressions such as pointers, structs, arrays etc is not
supported; if this is desired typedef or #define
can be used in C code to give a single identifier name to the type.
As the other data declarations above show, types can be
defined recursively, there can be multiple data constructors in a type,
those that have no arguments must still be followed by parentheses and
C-style comments are supported.
ADT values are created/constructed by using the data constructors like
C functions calls (the C compiler checks the argument and result types as
it would with other function calls). For data constructors with arguments,
this dynamically allocates memory, using ADT_MALLOC (defined
as malloc by default). For each type mytype,
a function free_mytype is generated that frees the memory
allocated by the top level data constructor (using ADT_FREE;
it does not recursively process arguments of the data constructor). The
code can also easily be used with different memory management libraries,
including garbage collection (which we recommend). For example:
#include "example.h" // generated by "adtpp example.adt"
...
point origin; // defines variable origin of type point
tree t1, t2, t3;
origin = Point(0.0, 0.0); // creates a point and assigns it to origin
t1 = Empty();
t2 = Node(1, t1, Empty());
t3 = Node(2, t2, t2);
...
free_tree(t3); // free memory for t3 only
ADT values can be tested and deconstructed (to give access to the
arguments of data constructors) using several programming constructs
based on if-then-else and switch. For each data
constructor MyDC there is an if_MyDC(...)
and else_if_MyDC(...) macro. There is also an
else() and end_if() common to all ADTs.
Arguments of if_MyDC(...) must be an expression of
the correct type plus fresh variables for each of the arguments of
MyDC (the body of the if_MyDC(...) is a new
scope; variables with the same identifier elsewhere are unaffected).
Arguments of else_if_MyDC(...) are the same, except
the first argument is omitted - the whole if-then-else
construct branches on the data constructors of the single value given
in the if_MyDC(...). For example, in the following code,
else_if_Node(val,tl,tr) is implicitly testing t3
and introduces the three variables, whose scope ends at the final
end_if(). However, if_Node(tl,val,tll,tlr)
introduces another variable called val, whose scope ends
at the else().
if_Empty(t3)
printf("Tree is empty\n");
else_if_Node(val, tl, tr) // implicitly testing t3
printf("The root is %ld\n", val);
if_Node(tl, val, tll, tlr) // fresh variable val
printf("The left sub-tree root is ");
printf("%ld\n", val);
else()
printf("Left sub-tree is empty\n);
end_if()
end_if()
Braces are not required around compound statements (such as the
consecutive printf statements), semicolons may be used after
the various macros but are not required but the end_if()
is required (the macros use unbalanced braces and the C compiler will
issue a possibly cryptic syntax error message otherwise). The different
data constructors used in one if-then-else construct must be
of the same type (otherwise the C compiler will issue a possibly cryptic
type error message). There are also if_MyDC_ptr(...) and
else_if_MyDC_ptr(...) macros that bind the fresh variable to
pointers to the data constructor arguments; this is the only mechanism
for in-situ update of ADT values. For example, the root of a tree can
be incremented as follows:
if_Node_ptr(t3, valp, tlp, trp)
*valp++;
else()
printf("No root to increment\n");
end_if()
For each type mytype there is a
switch_mytype(v) macro (where v must have
this type) generated and there is an end_switch()
macro. For each data constructor MyDC there is are
case_MyDC(...) and case_MyDC_ptr(...) macros,
with fresh variables for each of the arguments of MyDC,
with scope limited to that case. There is also a default()
case that can be used. For example, the sum of the elements in a tree
can be computed using the following function:
long sum_tree(tree t) {
switch_tree(t)
case_Empty()
return 0;
case_Node(val, tl, tr)
return val + sum_tree(tl) + sum_tree(tr);
end_switch()
}
Polymorphic types such as "list of t", where t is a type variable
that can be instantiated to a type allows re-use of type definitions.
Similarly, functions that operate on any list facilitate code reuse.
In adtpp, polymorphism is supported but each instance of
a polymorphic type and function used in the program
must be declared and given a distinct
identifier in order for errors to be detected using the simple type
system of C. For the same reason, data constructors of different
instances of polymorphic types must have different names. Type
parameters are allowed in data declarations surrounded
by "angle brackets" to define polymorphic types and their instances
are declared using type declarations (examples below).
Type parameters must be instantiated by ADTs or C types that
are the size of a pointer (adt_int is a predefined integer
type of the correct size, the same as intptr_t; using
list<int> instead of list<adt_int>
may not work and there is no checking for this).
data pair<t1, t2> { Pair(t1, t2); }
data list<t> {
Nil();
Cons(t, list<t>);
}
type points = list<point>;
type colors = list<color>;
type ints = list<adt_int>;
type polygon = pair<color, points>;
type polygons = list<polygon>;
type pairs<t1, t2> = list<pair<t1, t2>>;
type polygons1 = pairs<color, points>;
Note that type declarations can introduce
new polymorphic types such as pairs. Also,
distinct identifiers can be given to equivalent types: both
polygons and polygons1 are equivalent to
list<pair<color,list<point>>>. With
adtpp these types can be used interchangably. For instances
of polymorphic types, adtpp essentially generates a copy
of the polymorphic definition with data constructor names having an
underscore followed by the type instance name appended. For example,
ints has data constructors Nil_ints and
Cons_ints and is defined as follows and macros etc are
generated as if this was defined explicitly:
data ints { // implicitly generated by adtpp
Nil_ints();
Cons_ints(adt_int, ints);
}
Data constructors from polymorphic type definitions can be used to define polymorphic C functions. For example, functions that compute the length of a list and concatenate two lists can be defined as follows:
int length(list xs) {
int len = 0;
while (1) {
if_Cons(xs, head, tail)
len++;
xs = tail;
else()
return len;
end_if()
}
}
list concat(list xs, list ys) {
if_Cons(xs, head, tail)
return Cons(head, concat(tail, ys));
else()
return ys;
end_if()
}
Polymorphic functions must be declared in the .adt file using
function declarations, giving the type of the arguments and
result, and every distinct type instance that is used must
be given a name with an instance declaration. Having
instances with different names allows the C compiler to perform
appropriate type checking; adtpp generates function
prototypes and definitions for the instances (which call the polymorphic
code). For example, the following declarations allow us to use a
function count_ints that takes an argument of type
ints and returns an int without duplicating
the length code or using error-prone and potentially unsafe
explicit casts.
function<t> int length(list<t>); instance count_ints = length<adt_int>; function<t> list<t> concat(list<t>, list<t>); instance join = concat<point>; instance concat_col = concat<color>;
With types and code that have multiple type parameters, we sometimes
have generic types that differ only in the order of the parameters.
For example, we maye have a polymorphic function swap
that takes an argument of type pair<t1,t2> and returns a result
of type pair<t2,t1>. Type variables in adtpp are
implemented by using multiple distinct generic types, adt_1,
adt_2, etc. (they have no supported operations). Polymorphic
types are simulated by instantiating type parameters by these generic
types, in order. For example, in the polymorphic type pair,
the arguments of the Pair data constructor are of type
adt_1 and adt_2, respectively. These generic
types can be used explicitly in adtpp declarations to
give distinct names to polymorhic types that differ only in their
parameters. For example:
function<t1, t2> pair<t2, t1> swap(pair<t1, t2>); instance swap_polygon = swap<points, color>; type pair_swp = pair<adt_2, adt_1>; // type returned by swap type polygon_swp = pair<points, color>; // type returned by swap_polygon
Functions can be included in atdpp data types and allowed
as arguments and results of declared functions, using the
func keyword: t_res func(t1, t2) means a
function taking two arguments of type t1 and
t2, respectively, and returning a result of type
t_res. For example, we can declare an analogue of the
Haskell polymorphic zipWith function (which takes a
two-argument function f plus two lists and applies
f
to elements of the two lists pair-wise and returns the list of results).
function<t1,t2,t3>
list<t3> zipWith(t3 func(t1,t2), list<t1>, list<t2>);
instance mk_polygons = zipWith<color, points, polygon>;
type pointss = list<points>; // needed for mk_polygons
type list_2 = list<adt_2>; // needed for zipWith
type list_3 = list<adt_3>; // needed for zipWith
Command to generate example.h from example.adt:
adtpp example.adt
data, type, function
and instance definitions/declarations in
example.adt:
// data types predefined by adtpp:
// numeric: adt_int, adt_uint, adt_char, adt_float
// character pointer: adt_string
// generic types for polymorphism: adt_1, adt_2, adt_3, ...
// Any other identifier used as a type and not defined in example.adt
// is assumed to be defined in C before example.h is included.
// All types defined by adtpp can be used to instantiate polymorphic types
// safely; other types used to instantiate polymorphic types *must* be
// the size of a pointer.
data tree { // defines algebraic data type tree
Empty(); // data constructor with no arguments
Node(long, tree, tree); // data constructor with three arguments
}
data list<t> { // polymorphic list
Nil();
Cons(t, list<t>);
}
// An instance of list<t> (note list<int> should not be used -
// arguments of polymorphic types should be guaranteed to be the same size
// as a pointer to void but adtpp cannot check this)
type ints = list<adt_int>;
// data ints { // implicitly generated by adtpp
// Nil_ints(); // type name appended to constructor names
// Cons_ints(adt_int, ints);
// }
data pair<t1, t2> { // polymorphic pair
Pair(<t1>, <t2>);
// data pair { // implicitly generated by adtpp
// Pair(adt_1, adt_2); // parameter(s) replaced by adt_1, adt_2 etc
// }
type pair_swp = pair<adt_2, adt_1>; // explicit use of adt_1 etc
// Higher order type instance: a list of (int to int) functions
type fns = list<int func(int)>;
function<t> int length(list<t>); // Declares a polymorphic function
instance count_ints = length<adt_int>; // An instance of length
Using the generated code in example.c:
// define identifier myCtype using typedef if you want to use this
// type in ADT definitions - adtpp can't parse complex C types
typedef struct mystruct{int f1[5]; char f2[8]} myCtype;
// redefine malloc/free primitives if desired
#define ADT_MALLOC(s) my_malloc(s)
#define ADT_FREE(s) my_free(s)
// include generated example.h (this includes <stdint.h> and
// <stdlib.h>)
#include "example.h"
// Assuming an ADT named myADT and data constructor MyDC(t1, t2) are
// defined in example.adt, the following functions can be used:
// MyDC(val1, val2) // create a value (allocates space)
// free_myADT(myADTval) // free space of top level data constructor
// Also, the following macros can be used to structure code:
// if_MyDC(myADTval, var1, var2)
// if_MyDC_ptr(myADTval, ptr1, ptr2)
// else_if_MyDC(var1, var2)
// else_if_MyDC_ptr(ptr1, ptr2)
// else()
// end_if()
// switch_myADT(myADTval)
// case_MyDC(var1, var2)
// case_MyDC_ptr(ptr1, ptr2)
// default()
// end_switch()
// Example code using ADT tree:
long sum_tree(tree t) {
switch_tree(t)
case_Empty()
return 0;
case_Node(val, tl, tr)
return val + sum_tree(tl) + sum_tree(tr);
end_switch()
}
// Example polymorphic code: return length of a list of anything
// Assuming length has been declared using a function declaration,
// and count_ints is declared as an instance, example.h will contain
// function prototypes for both functions plus a definition of
// count_ints that calls length
int length(list xs) {
int len = 0;
while (1) {
if_Cons(xs, head, tail)
len++;
xs = tail;
else()
return len;
end_if()
}
}
// Return sum of elements in a list of integers (note constructor names
// have _ints appended).
int sumlist(ints xs) {
if_Cons_ints(xs, head, tail)
return head + sumlist(tail)
else()
return 0;
end_if()
}