Skip to main content

Lagunita is retiring and will shut down at 12 noon Pacific Time on March 31, 2020. A few courses may be open for self-enrollment for a limited time. We will continue to offer courses on other online learning platforms; visit http://online.stanford.edu.

Non-registered mode. Register to save your course progress.

By exploring the course, you are agreeing to our Terms of Service and Privacy Policy. Please read them carefully.

Captions: Nonlinear Mutual Recursion

In this video we'll be exploring

some further issues involved in recursion in the SQL language.

First a reminder of how SQL implements recursion.

There's a with statement in SQL

that can be specified to have

recursively defined relations in it.

We say with recursive, and

then we define a set of

relations, where the query

to find relation could involve

the relation itself, so that's where recursion pops in.

And then, at the end the

final result is a query

that might involve those recursively defined

relations as well as other tables in the database.

As we saw in the previous

video and demo, it's very

common for recursively defined relations

in the with statement to take

the structure of having a

base query that doesn't involve

R, the recursively define relation, unioned

with the recursive query and we saw many examples of that form.

The first thing I want to talk

about in this video is what's called linear recursion.

Linear recursion specifies that in

the recursive definition of R,

and again let's assume it takes

this form of the base query union and the recursive query.

In the recursive query, There

is only one reference to

the recursively defined relation R.

So let's take a look at

an example to understand linear

recursion and nonlinear recursion.

the first example we used

when we introduced recursion was finding

ancestor relationships from a

base table that just has

parent child relationships so a

basic transit of closure

operation and the query

we wanted to run was to find all of Mary's ancestors.

And here's the query that we wrote.

It does take the form of

having a base query here.

Which says if we have

a parent relationship that's also an ancestor relationship.

And then the recursion occurs in

the second part of the union

where we join the recursively defined

ancestor relationship, ancestor relation,

with parents so that we

extend the ancestors with one more generation.

Now this query does have

linear recursion because we only

have one instance here of

the recursively defined relation, ancestor.

So let's take a

look at what happens underneath when this query is executed.

We start with our parent table.

And here it is with

a parent and child and let's

suppose we have say, Sue

and John, and John

and Mary for example, in our parent table.

Then in what's effectively the first iteration,

the base query here is

run that copies the parent table to the ancestor table.

So now we have Sue and

John, and John and Mary,

and anything else that we had

in the parent table in the ancestor table.

As the iteration continues we're

effectively joining the parent

table and the ancestor table

to get additional tuples in the ancestor table.

For example, we see that

Sue and John, the

Sue and John tuple here, would

join with the John and Mary

tuple, and that would give

us Sue and Mary in the ancestor table.

The iteration continues until there

are no new tuples to add to the ancestor table.

And then we're done with our

recursively defined relation, and we

can go ahead and execute the final query in the with statement.

And again, often when I say we, I really mean we the system.

All of this, is of course, being

performed by the system as it

executes the recursively defined with statement.

Now, let's take a look

at a non-linear expression of the same query.

And here it is.

What we see here is that

the primary change is right in here.

Instead of joining the parent

with the ancestor in the

recursive half, we're going

to join two instances of the ancestor relation.

And let's see what happens during

execution when this is how we express our recursion.

So we again start by

copying the contents of the

parent table into the ancestor

table as part of the base query.

And I've already shown that here.

But, now, instead of during

iteration joining the parent table with the ancestor table.

We're actually going to join

the ancestor table with itself to generate new tuples.

For example, we will join

the first two tuples in ancestor

with each other, Sue-John and John-Mary,

in order to obtain what

was the same tuple we obtained with

the linear recursion, which would be the tuple with Sue and Mary.

Just a quick reminder, I intended

to say this earlier, but it's

the fact that we have these

two references to ancestor in

the recursion here, that makes it non-linear.

OK so what's the deal with these two queries?

Why might we prefer one form of the query over the other?

And take my word for it,

by the way, we do get

equivalent results to the

query in its linear and non-linear versions.

Well, here's some pros and cons to non-linear versus linear.

For this particular query and actually

in general, when we can express a query both ways.

First of all there's some

pluses to the non-linear so the query looks cleaner.

If you go back and look at

the 2 queries the non-linear version

is sort of more symmetric, a

little shorter even to express, than the linear version.

Second of all, the nonlinear

version actually converges faster to

the fixed point, to the final state,

than the linear version.

And I'm going to show that a

little bit abstractly because it is actually fairly important.

So I'm going to create this

abstract example, parent-child

relation, which is going

to be completely linear, just for illustrative purposes.

So we have this person here

who's the parent of the person

here, who's the parent of

a person here, and so on.

We're going to make it eight levels deep.

So this is an abstraction of

our "a parent" table, and

now let's see how ancestors are computed.

So in the first step

we'll add one ancestor tuple

for each tuple in the

parent relation, so the

purple are the tuples that are added to ancestor.

Then in the second iteration

we're going to join those with themselves.

I'm sorry, we're going to

join the ancestor tuples with parent tuples

so each ancestor tuple could

be extended by one.

So that's going to give us all pairs of tuples.

I'm sorry, it's already getting a bit crowded here, but I think that you will get the idea.

On the next iteration we're going

to again take our ancestor

tuple and extend them by one, by joining them with parent.

So after the second we'll

have all triples here,

so all great-grandparent relationships.

OK, and that's a big mess.

But you can really see what's going on.

Each time we iterate, we get

one more generation added into the ancestors.

And now let's think about what happens when we use the non-linear version.

Where after the first step, we

join ancestor with itself instead of ancestor with parent.

So as before on the

first step, and now I'm

going to make these red, the ancestor

relation will contain exactly the same as the parents.

And the second step is

the same as well, we're going

to join ancestor with itself but,

since each one of ancestor

is only the parent relationship, we're

again, going to get all pairs

in the second step of the iteration.

The difference begins in the third step.

Now we're joining ancestor with itself.

So we will be joining these

two-step ancestors with the

single ones, just like before, to get all the threes.

But we will also be joining twos with twos.

In other words we will joining

grandparent relationships with grandparent relationships.

And we will be getting in that same iteration the fours.

So as you can see, the

nonlinear version does converge faster.

Now this example is very

small so it's not as blatantly obvious.

But the linear version

is going to take a linear number

of iterations in order to

converge to the final recursively

defined relation contents.

Whereas when we use the non-linear version, it's actually logarithmic.

So for a large database it can be considerably faster.

So what about the downsides of non-linear recursion?

Well, the major downside is

that it's harder to implement or

certainly harder to implement efficiently.

And as a result of that actually

the SQL standard only requires linear recursion.

And the postgres system that

we've been using also only supports linear recursion.

Last modified:
March 1, 2014, 6:58 p.m.