adtpp
adtpp
provides support for algebraic data types
in C. Type information is put in a file with a .adt
extension and adtpp
processes this to produce a C header
file that contains C type definitions, macros etc to be used in C
code. The following command takes example.adt
and produces
example.h
:
adtpp example.adt
If you prefer to read code with comments, you can take a look at example.adt and
example.c that uses it. There is also a
paper about the system that gives more background, motivation and
discussion of implementation, performance etc. Here we first describe how
simple algebraic data types are defined and used, then discuss polmorphic
(generic) types, higher order code (pointers to functions). We finish
with a quick reference for adtpp
features.
ADTs are defined using data
declarations such as the
following:
data point { Point(double, double); } data color { Red(); Blue(); Green(); } data tree { Empty(); Node(long, tree, tree); } data quad_roots { Noroot(); /* no root */ Allroot(); // everything is a root Oneroot(double); // one root Tworoot(double, double); // two roots }
The first defines the type point
that has a single
data constructor, Point
with two arguments, both of
type double
. Identifiers are case sensitive and here
we use upper case for data constructors by convention - any valid C
identifier can be used for both types and data constructors. 'data'
is a keyword for parsing by adtpp
and white-space is not
significant. The arguments of data constructors in such definitions
should be valid types. Those not defined in the file (for example,
double
) are assumed to be defined elsewhere in C. Parsing
compound C type expressions such as pointers, structs, arrays etc is not
supported; if this is desired typedef
or #define
can be used in C code to give a single identifier name to the type.
As the other data
declarations above show, types can be
defined recursively, there can be multiple data constructors in a type,
those that have no arguments must still be followed by parentheses and
C-style comments are supported.
ADT values are created/constructed by using the data constructors like
C functions calls (the C compiler checks the argument and result types as
it would with other function calls). For data constructors with arguments,
this dynamically allocates memory, using ADT_MALLOC
(defined
as malloc
by default). For each type mytype
,
a function free_mytype
is generated that frees the memory
allocated by the top level data constructor (using ADT_FREE
;
it does not recursively process arguments of the data constructor). The
code can also easily be used with different memory management libraries,
including garbage collection (which we recommend). For example:
#include "example.h" // generated by "adtpp example.adt" ... point origin; // defines variable origin of type point tree t1, t2, t3; origin = Point(0.0, 0.0); // creates a point and assigns it to origin t1 = Empty(); t2 = Node(1, t1, Empty()); t3 = Node(2, t2, t2); ... free_tree(t3); // free memory for t3 only
ADT values can be tested and deconstructed (to give access to the
arguments of data constructors) using several programming constructs
based on if-then-else
and switch
. For each data
constructor MyDC
there is an if_MyDC(...)
and else_if_MyDC(...)
macro. There is also an
else()
and end_if()
common to all ADTs.
Arguments of if_MyDC(...)
must be an expression of
the correct type plus fresh variables for each of the arguments of
MyDC
(the body of the if_MyDC(...)
is a new
scope; variables with the same identifier elsewhere are unaffected).
Arguments of else_if_MyDC(...)
are the same, except
the first argument is omitted - the whole if-then-else
construct branches on the data constructors of the single value given
in the if_MyDC(...)
. For example, in the following code,
else_if_Node(val,tl,tr)
is implicitly testing t3
and introduces the three variables, whose scope ends at the final
end_if()
. However, if_Node(tl,val,tll,tlr)
introduces another variable called val
, whose scope ends
at the else()
.
if_Empty(t3) printf("Tree is empty\n"); else_if_Node(val, tl, tr) // implicitly testing t3 printf("The root is %ld\n", val); if_Node(tl, val, tll, tlr) // fresh variable val printf("The left sub-tree root is "); printf("%ld\n", val); else() printf("Left sub-tree is empty\n); end_if() end_if()
Braces are not required around compound statements (such as the
consecutive printf
statements), semicolons may be used after
the various macros but are not required but the end_if()
is required (the macros use unbalanced braces and the C compiler will
issue a possibly cryptic syntax error message otherwise). The different
data constructors used in one if-then-else
construct must be
of the same type (otherwise the C compiler will issue a possibly cryptic
type error message). There are also if_MyDC_ptr(...)
and
else_if_MyDC_ptr(...)
macros that bind the fresh variable to
pointers to the data constructor arguments; this is the only mechanism
for in-situ update of ADT values. For example, the root of a tree can
be incremented as follows:
if_Node_ptr(t3, valp, tlp, trp) *valp++; else() printf("No root to increment\n"); end_if()
For each type mytype
there is a
switch_mytype(v)
macro (where v
must have
this type) generated and there is an end_switch()
macro. For each data constructor MyDC
there is are
case_MyDC(...)
and case_MyDC_ptr(...)
macros,
with fresh variables for each of the arguments of MyDC
,
with scope limited to that case. There is also a default()
case that can be used. For example, the sum of the elements in a tree
can be computed using the following function:
long sum_tree(tree t) { switch_tree(t) case_Empty() return 0; case_Node(val, tl, tr) return val + sum_tree(tl) + sum_tree(tr); end_switch() }
Polymorphic types such as "list of t", where t is a type variable
that can be instantiated to a type allows re-use of type definitions.
Similarly, functions that operate on any list facilitate code reuse.
In adtpp
, polymorphism is supported but each instance of
a polymorphic type and function used in the program
must be declared and given a distinct
identifier in order for errors to be detected using the simple type
system of C. For the same reason, data constructors of different
instances of polymorphic types must have different names. Type
parameters are allowed in data
declarations surrounded
by "angle brackets" to define polymorphic types and their instances
are declared using type
declarations (examples below).
Type parameters must be instantiated by ADTs or C types that
are the size of a pointer (adt_int
is a predefined integer
type of the correct size, the same as intptr_t
; using
list<int>
instead of list<adt_int>
may not work and there is no checking for this).
data pair<t1, t2> { Pair(t1, t2); } data list<t> { Nil(); Cons(t, list<t>); } type points = list<point>; type colors = list<color>; type ints = list<adt_int>; type polygon = pair<color, points>; type polygons = list<polygon>; type pairs<t1, t2> = list<pair<t1, t2>>; type polygons1 = pairs<color, points>;
Note that type
declarations can introduce
new polymorphic types such as pairs
. Also,
distinct identifiers can be given to equivalent types: both
polygons
and polygons1
are equivalent to
list<pair<color,list<point>>>
. With
adtpp
these types can be used interchangably. For instances
of polymorphic types, adtpp
essentially generates a copy
of the polymorphic definition with data constructor names having an
underscore followed by the type instance name appended. For example,
ints
has data constructors Nil_ints
and
Cons_ints
and is defined as follows and macros etc are
generated as if this was defined explicitly:
data ints { // implicitly generated by adtpp Nil_ints(); Cons_ints(adt_int, ints); }
Data constructors from polymorphic type definitions can be used to define polymorphic C functions. For example, functions that compute the length of a list and concatenate two lists can be defined as follows:
int length(list xs) { int len = 0; while (1) { if_Cons(xs, head, tail) len++; xs = tail; else() return len; end_if() } } list concat(list xs, list ys) { if_Cons(xs, head, tail) return Cons(head, concat(tail, ys)); else() return ys; end_if() }
Polymorphic functions must be declared in the .adt
file using
function
declarations, giving the type of the arguments and
result, and every distinct type instance that is used must
be given a name with an instance
declaration. Having
instances with different names allows the C compiler to perform
appropriate type checking; adtpp
generates function
prototypes and definitions for the instances (which call the polymorphic
code). For example, the following declarations allow us to use a
function count_ints
that takes an argument of type
ints
and returns an int
without duplicating
the length
code or using error-prone and potentially unsafe
explicit casts.
function<t> int length(list<t>); instance count_ints = length<adt_int>; function<t> list<t> concat(list<t>, list<t>); instance join = concat<point>; instance concat_col = concat<color>;
With types and code that have multiple type parameters, we sometimes
have generic types that differ only in the order of the parameters.
For example, we maye have a polymorphic function swap
that takes an argument of type pair<t1,t2> and returns a result
of type pair<t2,t1>. Type variables in adtpp
are
implemented by using multiple distinct generic types, adt_1
,
adt_2
, etc. (they have no supported operations). Polymorphic
types are simulated by instantiating type parameters by these generic
types, in order. For example, in the polymorphic type pair
,
the arguments of the Pair
data constructor are of type
adt_1
and adt_2
, respectively. These generic
types can be used explicitly in adtpp
declarations to
give distinct names to polymorhic types that differ only in their
parameters. For example:
function<t1, t2> pair<t2, t1> swap(pair<t1, t2>); instance swap_polygon = swap<points, color>; type pair_swp = pair<adt_2, adt_1>; // type returned by swap type polygon_swp = pair<points, color>; // type returned by swap_polygon
Functions can be included in atdpp
data types and allowed
as arguments and results of declared functions, using the
func
keyword: t_res func(t1, t2)
means a
function taking two arguments of type t1
and
t2
, respectively, and returning a result of type
t_res
. For example, we can declare an analogue of the
Haskell polymorphic zipWith
function (which takes a
two-argument function f
plus two lists and applies
f
to elements of the two lists pair-wise and returns the list of results).
function<t1,t2,t3> list<t3> zipWith(t3 func(t1,t2), list<t1>, list<t2>); instance mk_polygons = zipWith<color, points, polygon>; type pointss = list<points>; // needed for mk_polygons type list_2 = list<adt_2>; // needed for zipWith type list_3 = list<adt_3>; // needed for zipWith
Command to generate example.h
from example.adt
:
adtpp example.adt
data
, type
, function
and instance
definitions/declarations in
example.adt
:
// data types predefined by adtpp: // numeric: adt_int, adt_uint, adt_char, adt_float // character pointer: adt_string // generic types for polymorphism: adt_1, adt_2, adt_3, ... // Any other identifier used as a type and not defined in example.adt // is assumed to be defined in C before example.h is included. // All types defined by adtpp can be used to instantiate polymorphic types // safely; other types used to instantiate polymorphic types *must* be // the size of a pointer. data tree { // defines algebraic data type tree Empty(); // data constructor with no arguments Node(long, tree, tree); // data constructor with three arguments } data list<t> { // polymorphic list Nil(); Cons(t, list<t>); } // An instance of list<t> (note list<int> should not be used - // arguments of polymorphic types should be guaranteed to be the same size // as a pointer to void but adtpp cannot check this) type ints = list<adt_int>; // data ints { // implicitly generated by adtpp // Nil_ints(); // type name appended to constructor names // Cons_ints(adt_int, ints); // } data pair<t1, t2> { // polymorphic pair Pair(<t1>, <t2>); // data pair { // implicitly generated by adtpp // Pair(adt_1, adt_2); // parameter(s) replaced by adt_1, adt_2 etc // } type pair_swp = pair<adt_2, adt_1>; // explicit use of adt_1 etc // Higher order type instance: a list of (int to int) functions type fns = list<int func(int)>; function<t> int length(list<t>); // Declares a polymorphic function instance count_ints = length<adt_int>; // An instance of length
Using the generated code in example.c
:
// define identifier myCtype using typedef if you want to use this // type in ADT definitions - adtpp can't parse complex C types typedef struct mystruct{int f1[5]; char f2[8]} myCtype; // redefine malloc/free primitives if desired #define ADT_MALLOC(s) my_malloc(s) #define ADT_FREE(s) my_free(s) // include generated example.h (this includes <stdint.h> and // <stdlib.h>) #include "example.h" // Assuming an ADT named myADT and data constructor MyDC(t1, t2) are // defined in example.adt, the following functions can be used: // MyDC(val1, val2) // create a value (allocates space) // free_myADT(myADTval) // free space of top level data constructor // Also, the following macros can be used to structure code: // if_MyDC(myADTval, var1, var2) // if_MyDC_ptr(myADTval, ptr1, ptr2) // else_if_MyDC(var1, var2) // else_if_MyDC_ptr(ptr1, ptr2) // else() // end_if() // switch_myADT(myADTval) // case_MyDC(var1, var2) // case_MyDC_ptr(ptr1, ptr2) // default() // end_switch() // Example code using ADT tree: long sum_tree(tree t) { switch_tree(t) case_Empty() return 0; case_Node(val, tl, tr) return val + sum_tree(tl) + sum_tree(tr); end_switch() } // Example polymorphic code: return length of a list of anything // Assuming length has been declared using a function declaration, // and count_ints is declared as an instance, example.h will contain // function prototypes for both functions plus a definition of // count_ints that calls length int length(list xs) { int len = 0; while (1) { if_Cons(xs, head, tail) len++; xs = tail; else() return len; end_if() } } // Return sum of elements in a list of integers (note constructor names // have _ints appended). int sumlist(ints xs) { if_Cons_ints(xs, head, tail) return head + sumlist(tail) else() return 0; end_if() }