Implementation

Razor's model-finding algorithm is a variant of the chase, an algorithm for constructing models for geometric theories. Given an input first-order theory ๐“ฃ, Razor applies a standard syntactic transformation to convert ๐“ฃ to a Geometric Normal Form (GNF). The result of this transformation is a geometric theory ๐“– that is equisatisfiable with ๐“ฃ.

Note: Transformation to GNF consists of transforming the input theory to a Conjunctive Normal Form (CNF) followed by rearranging the conjuncts of the resulting CNF into an implication, where negative and positive conjuncts form the implication's premise and consequence respectively. The implication is semantically equivalent to a geometric sequence and is treated as such.

It's noteworthy that conversion to CNF involves Skolemization, a transformation by which existential quantifiers of the input theory are replaced with fresh "Skolem functions". Therefore, the resulting GNF is a theory over an extension of the original vocabulary over which the input theory is defined.

Let ๐“– be an input geometric theory. The essense of a chase-step on a model ๐•ž is to find a sequent ๐œ‘ โŠข๐˜… ๐œ“ of ๐“– with an assignment from the variables in ๐˜… to the domain of ๐•ž such that ๐œ‘ is true and ๐œ“ is false in ๐•ž. A naive implementation of the chase would iterate over all sequents of ๐“– and examine every assignment from ๐˜… to the domain of ๐•ž. This would lead to |๐˜…||๐•ž| valuations of ๐œ‘ and ๐œ“ in ๐•ž where |๐˜…| and |๐•ž| denote the size of ๐˜… and the domain size of ๐•ž respectively. Such a naive implementation is extremely inefficient for practical model-finding use-cases.

A practical implementation of the chase must address a number of algorithmic challenges: it must provide efficient data-structures for querying models and keeping track of various search branches; it should implement effective heuristics for choosing a search branch and for selecting a sequent to evaluate; and most importantly, it must implement a fast solution for finding assignments from the free variables in ๐˜… to the elements of ๐•ž.

Hello Codd!

Codd's theorem is a well known result from database theory that establishes equivalence between relational algebra and relational calculus thus first-order logic. Razor takes advantage of this equivalence to implement the chase efficiently with the help of database theory. From a database perspective, a model ๐•ž constructed by the chase can be thought of as a monotonically growing database with relational tables for each predicate of the input theory ๐“–. Consequently, positive existential formulae are expressed queries with conjunction (โˆง), disjunction (โˆจ), and existential quantifier (โˆƒ) acting as relational join (โ‹ˆ), set union (โˆช), and projection (โˆ). Finally, a geometric sequent ๐œ‘ โŠข๐˜… ๐œ“ is evaluated by a query ๐›ท - ๐›น, the set difference of the relational expressions corresponding to ๐œ‘ and ๐œ“. Intuitively, evaluating ๐›ท - ๐›น in ๐•ž results in a set of tuples that assign values to the variables in ๐˜… such that ๐œ‘ is true and ๐œ“ is false, as required by the chase-step. After evaluating ๐›ท - ๐›น, the chase-step proceeds by inserting the resulting tuples in the relations mentioned by ๐›น, thus, making ๐œ“ true in ๐•ž.

Razor's implementation of the chase uses codd, a minimal in-memory database with relational expressions as queries. The underlying database abstraction would enable Razor to take advantage of many advances in database theory and particularly streaming databases to run more efficiently. For example, codd supports incrementally updated materialized views that are used to avoid full re-evaluation of expressions during a run of the chase as the database (i.e., the model ๐•ž) grows.

Relationalization

The algorithm based on relational algebra requires a relationalization step to transform a given first-order theory ๐“ฃ with function symbols to an equivalent theory ๐“ฃโ€ฒ without functions. Relationalization is a standard transformation that involves flattening of complex terms followed by replacing function symbols and constants (i.e., nullary functions) with new predicates. This transformation also extends the original theory with integrity axioms that preserve the semantics of the eliminated functions.

A discussion about the details of relationalization is out of the scope of this document. Here, I use an example to explain the core idea. Consider the atomic formula below, where ๐š is a predicate, ๐š and ๐š are function symbols, ๐šก and ๐šข are variables, and ๐—ฐ is a constant:

๐š(๐š(๐š(๐šก), ๐šข), ๐—ฐ)

To flatten this formula, we introduce fresh variables, namely ๐šŸ1, ๐šŸ2, and ๐šŸ3 to recursively replace complex terms, consisting of function applications and constants. Notice that the original and the flattened formulae are semantically equivalent:

๐š(๐šŸ1, ๐šŸ2) โˆง ๐š(๐šŸ3, ๐šข) = ๐šŸ1 โˆง ๐š(๐šก) = ๐šŸ3 โˆง ๐—ฐ = ๐šŸ2

Next, for each function symbol of arity ๐š—, we introduce a predicate of arity ๐š— + 1 and replace the equalities between function applications and variables with atomic formulae over the new predicates:

๐š(๐šŸ1, ๐šŸ2) โˆง ๐™ต(๐šŸ3, ๐šข, ๐šŸ1) โˆง ๐™ถ(๐šก, ๐šŸ3) โˆง ๐™ฒ(๐šŸ2)

In the previous example, ๐™ต, ๐™ถ, and ๐™ฒ replace ๐š, ๐š, and ๐—ฐ in that order. The last position of these predicates correspond to the output of the functions that they replace. Finally, to preserve the semantics of the eliminated functions, the following sequents are added to the resulting theory ๐“ฃโ€ฒ:

๐™ต(๐šก1, ๐šก2, ๐šข) โˆง ๐™ต(๐šกโ€ฒ1, ๐šกโ€ฒ2, ๐šขโ€ฒ) โˆง ๐šก1 = ๐šกโ€ฒ1 โˆง ๐šก2 = ๐šกโ€ฒ2 โŠข ๐šข = ๐šขโ€ฒ
๐™ถ(๐šก, ๐šข) โˆง ๐™ถ(๐šกโ€ฒ, ๐šขโ€ฒ) โˆง ๐šก = ๐šกโ€ฒ โŠข ๐šข = ๐šขโ€ฒ
๐™ฒ(๐šข) โˆง ๐™ฒ(๐šขโ€ฒ) โŠข ๐šข = ๐šขโ€ฒ

These sequents ensure that the new predicates are interpreted by relations that map every vector of input arguments to exactly one output, as the original functions do.

Note: It's common to add another set of sequents to the resulting relationalized theory to preserve the totality of functions. The chase, however, interprets the functions of the input theory as partial functions; therefore, we're not concerned with preserving totality.

Equality

Interpreting equality as a relation can be done by axiomatizing equality as a predicate, captured by the following sequents with = as a special binary predicate symbol:

 
โŠค โŠข =(x, x)                (reflexivity) 
=(๐šก, ๐šข) โŠข =(๐šข, ๐šก)           (symmetry) 
=(๐šก, ๐šข) โˆง =(๐šข, ๐šฃ) โŠข =(๐šก, ๐šฃ) (transitivity) 

Note: Our treatment of equality takes advantage of the fact that equality axioms can be expressed as geometric sequents.

Variable Linearization

The translation to relational algebra must account for a number of practical challenges regarding variables and their order of appearance in formulae. From a database perspective, each position on the first-order predicates represents a relational attribute (i.e., a column of a relational table). Therefore, the variables of the formulae in the input theory enforce equality constraints on the resulting select query. Specifically, the positions of two conjoined predicates that are occupied by the same variable specify the key attributes of a relatoinal join where equality between join attributes is defined by the relation = from the previous section. As a result, Razor applies a standard linearization transformation that assigns unique names to variables that occupy positions on any predicate except for =. When renaming variables, linearization introduces additional equalities between the freshly introduced unique variables that rename the same variable in the input formula.

Now, I will use another example to describe linearization:

๐™ฟ(๐šก) โˆง ๐š€(๐šข, ๐šก, ๐šข)

A possible linearized version of the formula above is obtained by renaming the second occurances of x and ๐šข, in ๐š€(๐šข, ๐šก, ๐šข), with new variables, namely x0 and ๐šข0. In addition, =(x, x0) and =(๐šข, ๐šข0) are conjoined into the formula to preserve equality between a variable and its renaming counterpart:

๐™ฟ(๐šก) โˆง ๐š€(๐šข, ๐šก0, ๐šข0) โˆง =(x, x0) โˆง =(๐šข, ๐šข0)

Range Restriction

I mentioned earlier that in our implementation of the chase based on relational algebra, a sequent ๐œ‘ โŠข๐˜… ๐œ“ is evaluated as a relational expression ๐›ท - ๐›น where ๐›ท and ๐›น are relational expression obtained by transforming ๐œ‘ and ๐œ“. The set difference operator is defined only when the attributes of ๐›ท and ๐›น align; that is, they must share the same attributes in the same order. If ๐›ท and ๐›น share the same set of attributes but in different orders, it's always possible to project either of the expressions to match the order of attributes. If ๐›ท and ๐›น have different attributes, two cases are possible:

  1. An attribute ๐œถ appears in ๐›ท but not in ๐›น. In this case, we simply project ๐œถ out of ๐›ท.
  2. An attribute ๐œท appears in ๐›น but not in ๐›ท. The treatment for this case involves range restriction as described below.

In database theory, a range-restricted query

talk about other implementations