Implementation
Razor's model-finding algorithm is a variant of the chase, an algorithm for constructing models for geometric theories. Given an input first-order theory ๐ฃ, Razor applies a standard syntactic transformation to convert ๐ฃ to a Geometric Normal Form (GNF). The result of this transformation is a geometric theory ๐ that is equisatisfiable with ๐ฃ.
Note: Transformation to GNF consists of transforming the input theory to a Conjunctive Normal Form (CNF) followed by rearranging the conjuncts of the resulting CNF into an implication, where negative and positive conjuncts form the implication's premise and consequence respectively. The implication is semantically equivalent to a geometric sequence and is treated as such.
It's noteworthy that conversion to CNF involves Skolemization, a transformation by which existential quantifiers of the input theory are replaced with fresh "Skolem functions". Therefore, the resulting GNF is a theory over an extension of the original vocabulary over which the input theory is defined.
Let ๐ be an input geometric theory. The essense of a chase-step on a model ๐ is to find a sequent ๐ โข๐ ๐ of ๐ with an assignment from the variables in ๐ to the domain of ๐ such that ๐ is true and ๐ is false in ๐. A naive implementation of the chase would iterate over all sequents of ๐ and examine every assignment from ๐ to the domain of ๐. This would lead to |๐ ||๐| valuations of ๐ and ๐ in ๐ where |๐ | and |๐| denote the size of ๐ and the domain size of ๐ respectively. Such a naive implementation is extremely inefficient for practical model-finding use-cases.
A practical implementation of the chase must address a number of algorithmic challenges: it must provide efficient data-structures for querying models and keeping track of various search branches; it should implement effective heuristics for choosing a search branch and for selecting a sequent to evaluate; and most importantly, it must implement a fast solution for finding assignments from the free variables in ๐ to the elements of ๐.
Hello Codd!
Codd's theorem is a well known result from database theory that establishes equivalence between relational algebra and relational calculus thus first-order logic. Razor takes advantage of this equivalence to implement the chase efficiently with the help of database theory. From a database perspective, a model ๐ constructed by the chase can be thought of as a monotonically growing database with relational tables for each predicate of the input theory ๐. Consequently, positive existential formulae are expressed queries with conjunction (โง), disjunction (โจ), and existential quantifier (โ) acting as relational join (โ), set union (โช), and projection (โ). Finally, a geometric sequent ๐ โข๐ ๐ is evaluated by a query ๐ท - ๐น, the set difference of the relational expressions corresponding to ๐ and ๐. Intuitively, evaluating ๐ท - ๐น in ๐ results in a set of tuples that assign values to the variables in ๐ such that ๐ is true and ๐ is false, as required by the chase-step. After evaluating ๐ท - ๐น, the chase-step proceeds by inserting the resulting tuples in the relations mentioned by ๐น, thus, making ๐ true in ๐.
Razor's implementation of the chase uses codd
, a minimal
in-memory database with relational expressions as queries. The
underlying database abstraction would enable Razor to take advantage
of many advances in database theory and particularly streaming
databases to run more efficiently. For example, codd
supports
incrementally updated materialized views that are used to avoid full
re-evaluation of expressions during a run of the chase as the
database (i.e., the model ๐) grows.
Relationalization
The algorithm based on relational algebra requires a relationalization step to transform a given first-order theory ๐ฃ with function symbols to an equivalent theory ๐ฃโฒ without functions. Relationalization is a standard transformation that involves flattening of complex terms followed by replacing function symbols and constants (i.e., nullary functions) with new predicates. This transformation also extends the original theory with integrity axioms that preserve the semantics of the eliminated functions.
A discussion about the details of relationalization is out of the scope of this document. Here, I use an example to explain the core idea. Consider the atomic formula below, where ๐ is a predicate, ๐ and ๐ are function symbols, ๐ก and ๐ข are variables, and ๐ฐ is a constant:
๐(๐(๐(๐ก), ๐ข), ๐ฐ)
To flatten this formula, we introduce fresh variables, namely ๐1, ๐2, and ๐3 to recursively replace complex terms, consisting of function applications and constants. Notice that the original and the flattened formulae are semantically equivalent:
๐(๐1, ๐2) โง ๐(๐3, ๐ข) = ๐1 โง ๐(๐ก) = ๐3 โง ๐ฐ = ๐2
Next, for each function symbol of arity ๐, we introduce a predicate of arity ๐ + 1 and replace the equalities between function applications and variables with atomic formulae over the new predicates:
๐(๐1, ๐2) โง ๐ต(๐3, ๐ข, ๐1) โง ๐ถ(๐ก, ๐3) โง ๐ฒ(๐2)
In the previous example, ๐ต, ๐ถ, and ๐ฒ replace ๐, ๐, and ๐ฐ in that order. The last position of these predicates correspond to the output of the functions that they replace. Finally, to preserve the semantics of the eliminated functions, the following sequents are added to the resulting theory ๐ฃโฒ:
๐ต(๐ก1, ๐ก2, ๐ข) โง ๐ต(๐กโฒ1, ๐กโฒ2,
๐ขโฒ) โง
๐ก1 = ๐กโฒ1 โง ๐ก2 = ๐กโฒ2 โข ๐ข = ๐ขโฒ
๐ถ(๐ก, ๐ข) โง ๐ถ(๐กโฒ, ๐ขโฒ) โง ๐ก = ๐กโฒ โข ๐ข = ๐ขโฒ
๐ฒ(๐ข) โง ๐ฒ(๐ขโฒ) โข ๐ข = ๐ขโฒ
These sequents ensure that the new predicates are interpreted by relations that map every vector of input arguments to exactly one output, as the original functions do.
Note: It's common to add another set of sequents to the resulting relationalized theory to preserve the totality of functions. The chase, however, interprets the functions of the input theory as partial functions; therefore, we're not concerned with preserving totality.
Equality
Interpreting equality as a relation can be done by axiomatizing equality as a predicate, captured by the following sequents with = as a special binary predicate symbol:
โค โข =(x, x) (reflexivity) =(๐ก, ๐ข) โข =(๐ข, ๐ก) (symmetry) =(๐ก, ๐ข) โง =(๐ข, ๐ฃ) โข =(๐ก, ๐ฃ) (transitivity)
Note: Our treatment of equality takes advantage of the fact that equality axioms can be expressed as geometric sequents.
Variable Linearization
The translation to relational algebra must account for a number of practical challenges regarding variables and their order of appearance in formulae. From a database perspective, each position on the first-order predicates represents a relational attribute (i.e., a column of a relational table). Therefore, the variables of the formulae in the input theory enforce equality constraints on the resulting select query. Specifically, the positions of two conjoined predicates that are occupied by the same variable specify the key attributes of a relatoinal join where equality between join attributes is defined by the relation = from the previous section. As a result, Razor applies a standard linearization transformation that assigns unique names to variables that occupy positions on any predicate except for =. When renaming variables, linearization introduces additional equalities between the freshly introduced unique variables that rename the same variable in the input formula.
Now, I will use another example to describe linearization:
๐ฟ(๐ก) โง ๐(๐ข, ๐ก, ๐ข)
A possible linearized version of the formula above is obtained by renaming the second occurances of x and ๐ข, in ๐(๐ข, ๐ก, ๐ข), with new variables, namely x0 and ๐ข0. In addition, =(x, x0) and =(๐ข, ๐ข0) are conjoined into the formula to preserve equality between a variable and its renaming counterpart:
๐ฟ(๐ก) โง ๐(๐ข, ๐ก0, ๐ข0) โง =(x, x0) โง =(๐ข, ๐ข0)
Range Restriction
I mentioned earlier that in our implementation of the chase based on relational algebra, a sequent ๐ โข๐ ๐ is evaluated as a relational expression ๐ท - ๐น where ๐ท and ๐น are relational expression obtained by transforming ๐ and ๐. The set difference operator is defined only when the attributes of ๐ท and ๐น align; that is, they must share the same attributes in the same order. If ๐ท and ๐น share the same set of attributes but in different orders, it's always possible to project either of the expressions to match the order of attributes. If ๐ท and ๐น have different attributes, two cases are possible:
- An attribute ๐ถ appears in ๐ท but not in ๐น. In this case, we simply project ๐ถ out of ๐ท.
- An attribute ๐ท appears in ๐น but not in ๐ท. The treatment for this case involves range restriction as described below.
In database theory, a range-restricted query
talk about other implementations