Designing Cloje's host interop API

Lately I have been working on designing Cloje's host interop API. I had the outlines of a design for the API already, but as I started filling in the details and thinking about different use cases, I realized that my simple design would have had some serious problems in actual use. The host interop API was scheduled for Cloje 0.2, but I have decided to push it back to 0.3 to give me time to get it right.

I wrote this post mostly for my own benefit, to help me focus and organize my thoughts. But, it might be interesting to anyone interested in API design.

What is host interop?

For those not familiar, host interop (short for "interoperability" or "interoperaiton") refers to the ability for code written in Cloje to work with code written in the host language (Scheme, and perhaps Common Lisp in the future).

Clojure has a similar concept, Java interop, which allows Clojure code to use classes and functions available in Java, including the countless libraries created by members of the Java community. It also makes it easier for Clojure code to be integrated into existing Java-based systems.

Cloje, on the other hand, is not based on Java, so it does not have Java interop. Instead, it has interop with Scheme/Lisp. But Cloje's host interop is quite different from Clojure's Java interop. Cloje is much more semantically similar to Scheme/Lisp, than Clojure is to Java. For example, Cloje doesn't need support for instantiating classes or calling methods, because in Scheme and Lisp, such tasks are accomplished by calling normal functions, or perhaps an occasional macro — the same semantic building blocks already used by Cloje.

So, the main necessity that Cloje's host interop API must address, is converting data from Cloje types to host types, and vice versa.

At first glance, this seems like a fairly easy problem to solve, especially compared to the complexity of Clojure's Java interop. But when I started fleshing out the details, I encountered some dilemmas.

Non-analogous types

One obvious dilemma is: How do we convert types that have no analogue in the other language?

One example of this is Cloje's hash set data structure. Sets are not part of either the Scheme or Common Lisp standards, nor have most implementations independently added them (Racket being a notable exception). So if you have some data in a Cloje hash set, and you want to pass that data to a function written in the host language, you need a way to convert it to a data structure that the host language understands. (But which one? What if another Cloje type also converts to the same host type?)

This dilemma also applies to types that are not standard, and not universally supported. For example, neither hash tables nor keywords (as a type, rather than a concept) are a standard part of R5RS Scheme. Many implementations provide analogous types, but some implementations do not. So I have to consider the possibility that certain conversions (e.g. from a Cloje map to a host hash table) would be impossible on some host implementations.

It goes in the other direction, too. Pretty much every implementation of Scheme or Common Lisp includes non-standard types that have no analogue in Cloje. Plus, most implementations allow users to define custom types (e.g. SRFI-9 record types). So, it is impossible for Cloje's API to anticipate or build in support for every possible type.

Analogous types

I just described how types that do not have an analogous type present a dilemma. But it turns out that types that do have an analogue also present a dilemma! This is a subtler issue, but with very important implications for the design of the API.

The key issue is that Cloje has data types that are similar or equivalent to data types defined in the host language, but might be implemented differently (either now or in the future).

For example, Cloje has vectors, and Scheme has vectors. Early versions of Cloje have been directly using Scheme vectors, to keep things simple. But, future versions of Cloje may implement persistent immutable vectors, similar to those found in Clojure. If that happens, Cloje vectors will be distinct from host vectors, and it would then be necessary to convert between the two types.

Should I start preparing for the possibility that the implementation may change in the future, or continue to directly use host types until the implementation actually does change?

Continuing to directly use host types would have less computational overhead, and less design and implementation work for me, and less glue code for users to write — for now. But it would lead to even greater pain down the road if the implementations of any types change, because user code may have been written under the assumption that, for example, you can pass a Cloje vector to a function expecting a host vector without doing any conversion. In other words, changing the underlying implementation of any type would be a huge breaking change.

So, I'm certain I should define new types, so that users get in the habit of performing conversion. These new types could (for now) simply be wrappers around the existing host type, so that Cloje types cannot be accidentally used in place of host types. They must be converted anyway, which means I could change the underlying implementation later without breaking anything.

But which types should have wrappers? Vectors and maps are clear candidates, because I can already envision that a new implementation may be created. Lists are unclear, because although Clojure does have persistent immutable lists, normal host lists are central to the normal operation of Scheme and Lisp (because of homoiconicity and macros and such). Strings are also unclear; I'd like to make them immutable, but string literals would be host strings unless I overwrote the double-quote syntax, and that would probably break tons of stuff (e.g. any host code or libraries that use string literals).

There are some types that it doesn't make sense to wrap. For example, wrapping integers and floats would have a significant negative effect on performance, and there would be no real benefit to wrapping them. (I currently have no plans to emulate Clojure's Long/BigInt or Double/BigDecimal dichotomies.) Wrapping functions would be even worse: they would no longer be callable in the normal way, which pretty well defeats the purpose.

So, wrapping no types would be bad (it would lead to user pain further down the road), and wrapping all types would be bad (it would lead to semantic and performance issues). So I will probably end up wrapping only the few types where there is a clear benefit or established future plan, such as vectors, maps, and perhaps lists and/or strings.

But even if I don't create wrapper types for everything, should I still provide "conversion" functions (e.g. to "convert" a Cloje integer to a host integer, even though they are the same thing), and recommend people use them, just in case I need to change how they are implementated in the future? That would probably just be counterproductive overengineering.

If I don't provide conversion functions for certain types, I need to guarantee that Cloje will directly use the host types for the foreseeable future. To do otherwise would be irresponsible: if the implementations of types changed, user code might break, and without conversion functions, the user would have had no means to avoid breakage.

Consolidated versus fine-grained

Another dilemma is whether the API should offer a small number of conversion functions that handle many types (I'll refer to this as a "consolidated" design, for lack of a better term), or a larger number of functions that each handle only a few types (I'll call this a "fine-grained" design).

My initial outline for the API was very consolidated. It had only two main functions: cloje->host, which would convert any supported Cloje type to a corresponding host type, and host->cloje, which would convert any supported host type to a corresponding Cloje type.

In theory, the advantages of that design are that an API with fewer functions is easier to learn and remember, and more convenient to use. You just put any Cloje object in one end, and voila!, an equivalent object with a sensible host type comes out the other end, or vice versa. You could convert entire collections, even heterogenous ones, just by mapping the same conversion function over every element, or recursively convert a nested data structure by walking it with just one function.

That design might make sense if you have a small number of types, and if it's obvious what each type should be converted to. In such a case, it's fairly easy to implement a consolidated, "Do What I Mean" kind of API.

But reality is much messier. "Do What I Mean" is not always obvious. I already described how some types have no analogues. If you passed a Cloje set to cloje->host, what should come out the other end? A list? A vector? A hash table? Or should it throw an error? I could give solid arguments in favor of each of those possibilities, explaining why it makes the most sense in certain situations. But, if I create a consolidated API, I have to choose just one of these behaviors, and changing the behavior later would be a breaking change.

No matter what I decide, there will be situations where the user wants a different behavior. And if all I provide are large, one-size-must-fit-all functions, the user wouldn't have very good tools to customize the conversion behavior.

A consolidated approach is also more fragile. Suppose I decide that Cloje maps should be converted to host hash tables. Now what happens if I want Cloje to run on a Scheme implementation that does not have hash tables? Either I would need to make the API behave differently on that implementation than on all other implementations, or I would need to change the API so that it does not return hash tables on any implementation (which would break backwards compatibility).

Since I anticipate that this situation might occur, if I designed a consolidated API, I would probably design it from the start such that it never returns hash tables, just to avoid later issues. But that would make the API much less useful for the majority of users, who would be using implementations that have hash tables. It reduces the API functionality to the lowest common denominator.

An alternative to a consolidated API, would be a fine-grained API, which provides many functions, where each function is smaller and simpler, targetted to a specific conversion.

In this design, there could be a function that converts a Cloje set to a host list, another function that converts a Cloje set to a host vector, and so on. I could provide a function that converts a Cloje map to a host hash table, with the caveat that it will throw an error if the host implementation does not support hash tables. I could also provide a function that converts a Cloje map to a host list of key-value pairs (aka an association list), as a universally-supported alternative. If new types are added to Cloje, or new needs become apparent, I can simply add more conversion functions, without changing the semantics of the existing functions, and thus without breaking existing user code.

This design gives more power, and responsibility, to the user. Instead of trying to provide a one-size-fits-all solution, it provides the building blocks for users to create custom-tailored solutions to fit the problem they are trying to solve.

The downsides, of course, are that this is more work for the user, and that the API will be cluttered with potentially dozens of functions. Instead of just cloje->host and host->cloje, you would have cloje-set->host-list, cloje-set->host-vector, cloje-map->host-hash-table, host-vector->cloje-vector, host-hash-table->cloje-map, and so on. Not very appealing to look at, I'll admit, but much more semantically stable, robust, and extensible.

After some thought, I've realized that many of the dilemmas I was having are either caused by or exacerbated by my decision to have a consolidated API. With a fine-grained API, I don't have to create one or two functions that can handle every possible situation. It's okay to make functions that just do one thing, with predictable behavior and well-defined limitations.

Deep versus shallow

Another dilemma I was having is whether conversion functions should be deep or shallow. In other words, if the user converts a Cloje list of Cloje vectors, should the Cloje vectors also be converted (resulting in a host list of host vectors), or should only the top-level Cloje list be converted (resulting in a host list of Cloje vectors)?

In a consolidated design, where you have only a single function like cloje->host, it seems counterintuitive that the end result could still contain Cloje objects. That led me to pursue a deep strategy. But deep conversion can get very expensive, and sometimes you don't need or want it.

For example, suppose there is a host function that does really efficient in-place sorting of host vectors. Suppose the function allows you to specify a custom comparison function, so the contents of the vector can be any type, as long as they match the comparison function. Now suppose you have a Cloje vector containing many complex, deeply-nested objects, and you want to sort them.

Ideally, it should be possible to shallowly convert the Cloje vector to a host vector, pass that host vector into the sort function, then shallowly convert the result back into a Cloje vector. Done in a shallow fashion, there would be no need to convert those complex objects.

But if conversion was always deep, you would convert all the complex, deeply nested objects not just once, but twice: once with cloje->host before you sorted them, and then in the opposite direction with host->cloje after you sorted them. Not only would those conversions be expensive, they might also be lossy: the resulting objects (or nested objects deep within them) might not end up as the same type as you started with. And even if they ended up as the same types, they would be copies, not identical? to the originals.

Deep conversion also gets really messy when it comes to maps and hash tables. You have to convert not only the values, but also the keys. And it is entirely possible that two keys that were previously different, will end up the same after conversion, resulting in a key collision and data loss (or an error).

For example, there was a time where I had decided that Cloje sets would be converted to host lists. But what if you had a map where one key was (1 2), and another key was the set #{1 2}? Both keys could possibly end up as the host list (1 2), so one entry might overwrite the other, and because the order of maps is unspecified, it would be unpredictable which entry would win. Even worse, because the order of sets is also unspecified, the set might sometimes be converted to the host list (2 1), which would not cause a key collision. Thus the resulting hash table might contain one entry, the other entry, or both entries, and you would have no way of predicting or controlling the outcome. What a nightmare that would be.

While I was still considering a consolidated API design, I was thinking of offering deep and shallow variants of cloje->host and host->cloje. When cloje->host would convert a collection, it would recursively call cloje->host on each element.

But with a fine-grained API design, a different approach is called for. Suppose cloje-vector->host-vector supported deep conversion. It can't just recursively call itself on each element, because those elements might not be vectors. And I can't pre-program in logic for handling different types of elements, because then we're right back to the problems of a consolidated API.

Rather, I could make it so that collection conversion functions would optionally accept a function to be applied to the elements. The function would default to identity, so the conversion would be shallow by default. To perform deep conversion, the user would provide a function that converts the element, according to whatever rules are appropriate to the situation.

Like the decision to pursue a fine-grained API design, this decision would shift control to the user, giving them more power, but also more responsibility. The API would provide the pieces, but it would be up to the user to put them together the way they want.

Accommodating versus strict

Another dilemma is how accommodating or strict the API should be.

This was more of a dilemma when I was pursuing a consolidated API design. If the user passed a implementation-specific type or a user-defined record type to host->cloje, what should happen? Should it throw an error? Or return the object as-is? This decision would be even more significant when performing recursive conversion, because being strict would make the API blow up if you had an unsupported object anywhere, even several levels deep inside a nested data structure.

Even with a fine-grained, non-recursive API design, this is still an issue to consider. For example, if you have a function that converts a Cloje vector into a host vector, what should it do if given a host vector? Should it throw an error, because it accepts only Cloje vectors? Or should it return the host vector as-is (or, as discussed below, create a copy), because it is already the desired type?

This decision is affected by whether or not types are wrapped. If a Cloje vector is exactly the same type as a host vector, it is impossible to make a function that accepts a Cloje vector but rejects a host vector. The function would necessarily be accommodating, and that would affect the API design even if the two types are made distinct in the future.

If the API is strict, I have to create wrapper types, and users must have a way to distinguish between Cloje types and their host analogues. That means, for example, that either Cloje's vector? must be made to return false for host vectors, or I need to add new, more specific type predicates, such as cloje-vector? and/or host-vector?.

Copying versus non-copying

Along similar lines, there is the question of whether to copy objects or return them as-is, in cases when no conversion is necessary. This affects accommodating APIs, where the user might pass a mutable object to a conversion function that would return the same type. A specific example would be passing a host vector to cloje->host or cloje-vector->host-vector, assuming they are programmed to return a host vector in such a case.

On the one hand, returning the vector as-is (not copying) would potentially be more efficient, by avoiding unnecessary allocation. But it would make the semantics less clear, because sometimes the return value would be safe to modify, and sometimes it would not be. And it would be potentially dangerous, if the user assumed that the return value would be safe to modify when actually it was not.

Note that this isn't much of a dilemma when converting between two distinct types. In some cases, allocating a new data structure is inherently necessary, such as when converting from a persistent immutable vector (implemented as a tree) to a (flat) host vector. And in the case of wrapped data types where the underlying type is mutable, allocating a new data structure is necessary for safety and correctness.

It is also not much of a dilemma when dealing with immutable types. If the type is immutable, and you're returning the same type, there is no safety benefit to allocating a new structure. The issue is really only when dealing with mutable types where you are returning the same type as was originally given.

This also ties into the recursive versus shallow dilemma. If the user passes in a deeply nested data structure, the performance impact of recursively copying everything is much greater, as is the potential for accidental mutation if you re-use nested data structures. There are important trade-offs to be made there, but it must be the user's decision, based on their specific situation.

Final thoughts

So, thirty-five hundred words later, I have something resembling a plan:

None of this is set in stone, mind. There are still some details to work out, and there may be other dilemmas or implications that haven't become apparent yet, and which warrant a change of plan.

[Update 2015-07-14: There is now a follow-up post regarding Cloje types vs host types, and implicit vs explicit conversion.]

Previous post: Cloje 0.2 Status Update Next post: Cloje 0.2.0 Released