Presentations
Hello, I'm a student in computer science and mathematics.
I've really liked using Julia so far and did some blogs already, some with good feed backs, some a bit less good...
Pronouns are she/they, and I'm proudly transfem.
Anyway, I just want to share some cool stuffs.
Generalities
Tested on julia 1.9, the interface may change in the future, but the ideas will probably still remain.
I'm not going to speak a lot about Iterators.jl nor Lazy.jl, etc. too much because they hide away some things I want to show.
If constructed from not at lot, I believe we can see how it works clearer. (I still much prefer julia than C, maybe I like being blind oo ??)
Blah blah blah, made simplier, blah blah, not the most optimal algorithms, blah blah for learning (also I'm not perfect). (exercice to the reader: change the blahs to make the intended sentence)
There will probably be lots of code, it's where the important part resides, it's better not to blindly copy and paste it, but instead, try to understand it, stop looking at the blog, and come up with a solution for similar problem.
I'm assuming quite a lot of familiarity with Julia (and I do mean it, I'll use a bit of metaprogramming, functors, shortcircuiting, closure, etc. I'm by no mean a pro, but I've dabbled with Julia for a long time), it's a post for Julia programmers who haven't explored the iterator world.
Iterators in Julia and infinite data structures
Iterators
I'll call "iterator" everything (which I'll symbolize with an V
) which can be iterated with the syntax:
for element in V
...
end
Base.iterate
Most of the post will be there, because it's the hearth of all iterators, once you understand it, you understand the others way faster.
The usual way to create an iterator is to overload the Base.iterate
function for your own type (it thus require you to create a new type for your iterating buisness... which I think is the main downfall of this method, but generalizing makes it okay)
Don't forget to do import Base: iterate
before overloading the iterate function, otherwise you'll make a new one, and EVERYTHING fall apart.
To understand the iterate
interface, here's a how iterations are translated (approximately, it's not purely equivalent because it introduce a new symbol, which may already exist, and also continue
and break
makes it more complicated):
for e in V
f(e)
end
becomes
it = iterate(V)
while it !== nothing
e, state = it
f(e)
it = iterate(V, state)
end
If you don't understand:

iterate
takes your iterable at first argument  to return the
e
value, it's the first element returned byiterate
 the
iterate
function return astate
at the second argument if it returns ane
 the
iterate
function will returnnothing
when it stops iterating  the
iterate
function take astate
at second argument, which will help up know where we are iterationwise  at first, the
iterate
function won't take a state, because it's the start and we don't know what the state will be yet :)
This means that we'll have to develop strategies on how to represent the state, which isn't always obvious.
making states
There are no enforced structure for your iterator state, which is good as it is very flexible, and bad at the start because you may not know how to start/model it.
The biggest difficulty I've found in writing raw iterators is transforming nonobvious recursive iteration algorithms to an iterative (even harder than iterative because you'll go to the "top" of your function at each step) one, so this section is all about patterns and exemples.
So here your new friend, the binary tree:
struct BinTree{T<:Number}
value::T
left::Union{Nothing, BinTree{T}}
right::Union{Nothing, BinTree{T}}
end
Here's some helper function to facilitate making binary trees (it makes it easier to try it yourself :) :
BinTree(v::Number) = BinTree(v, nothing, nothing)
function BinTree(t::Tuple)
if length(t) == 1
BinTree(t[1], nothing, nothing)
elseif length(t) == 2
BinTree(t[1], BinTree(t[2]), nothing)
else
BinTree(t[1], BinTree(t[2]), BinTree(t[3]))
end
end
and displaying them nicely:
import Base.Multimedia: display
function display(t::BinTree, n::Integer)
print(" "^n)
println("$(t.value)")
if t.left !== nothing
print(" "^n)
println("left:")
display(t.left, n+1)
end
if t.right !== nothing
print(" "^n)
println("right:")
display(t.right, n+1)
end
end
display(t::BinTree) = display(t, 0)
I did this exemple as an exercice to a friend (and they managed to complete it with an idea I didn't thought about, big congratz) as a sort of julia course, I feel like it's complexe enough to be interesting, and simple enough to show the basics.
We'll come up with various strategies iterate it in preorder.
Cheating a lot
Definitely not always the best, but it's the easy hack so I find it useful to know (the next one if way more useful, but it builds on this one).
If you'd want to print the tree in preorder, you'd (probably) do:
preorder(_::Nothing, _) = nothing
function preorder(t::BinTree)
println(t.value)
preorder(t.left)
preorder(t.right)
end
Really easy, right ? But it sadly doesn't give an iterator :/
But we can make a vector out of it, and iterate through this vector instead !
Here's how:
preorder(nothing, _) = nothing
function preorder(t::BinTree, io)
println(io, t.value)
preorder(t.left, io)
preorder(t.right, io)
end
function iterate(t::BinTree{T}) where T
io = IOBuffer()
preorder(t, io)
V = take!(io) > String > chomp > Base.Fix2(split, r"\r?\n") .> Base.Fix1(parse, T)
close(io)
it = iterate(V)
it === nothing ? nothing : (it[1], (V, it[2]))
end
function iterate(t::BinTree, state)
V, s = state
it = iterate(V, s)
it === nothing ? nothing : (it[1], (V, it[2]))
end
Here, we take the output of preorder as a String, we split it by newlines, and parse it, to make a vector which is at the "right" order, and we iterate through this vector instead.
As you'd imagine, this particular approach have multiple problems:
 It does lots of useless stuffs (putting elements to an IOBuffer before parsing them)
 You have no guarenties that the parse method is implemented for your type
 You have no guarenties that
parse(T, String(x::T)) == x
The idea of wrapping over an iterator is a really powerful one (with some definition of powerful that I'm not gonna give so that I'm always right oo), as an exercice, do a Map
structure taking a function and an iterator and map over it (but not precomputing every values), do the same for Filter
and Product
and Zip
.
Cheating a little bit less
You can use serializer + deserializer to remove some of the above problems (specifically: having a parse and parsing string is same as identity), while still having an overhead, but probably less.
Here, I'll consider only BinTree{Float64}
and BinTree{UInt64}
to implement serializer and deserializer only on Float64
and UInt64
serialize(io, f::Float64) = write(io, f) # write will write f content in pack of 8 bytes.
serialize(io, u::UInt64) = write(io, u) # same here
function deserialize(bytes, pos, ::Type{T}) where T <: Union{UInt64, Float64}
V = @view bytes[pos:pos+81] # get the 8 bytes which represent the result we want
u::UInt64 = 0
if ENDIAN_BOM == 0x04030201 # little endian store bytes in "reverse"
for i in 8:1:1
u <<= 8
u = V[i]
end
else # big endian
for i in 1:8
u <<= 8
u = V[i]
end
end
reinterpret(T, u)
end
and the iterating code will be:
preorder(nothing, _) = nothing
function preorder(t::BinTree{T}, io) where T <: Union{Float64, UInt64}
serialize(io, t.value)
preorder(t.left, io)
preorder(t.right, io)
end
function iterate(t::BinTree{T}) where T
io = IOBuffer()
preorder(t, io)
bytes = take!(io)
close(io)
V = [deserialize(bytes, pos, T) for pos in 1:8:length(bytes)1]
it = iterate(V)
it === nothing ? nothing : (it[1], (V, it[2]))
end
function iterate(t::BinTree, state)
V, s = state
it = iterate(V, s)
it === nothing ? nothing : (it[1], (V, it[2]))
end
It has the disadvantage of having to write serializer and deserializer for lots of types (you can do a lot of them in one go, but not all type is a bit type, and you may want to do another representation to encode a sumtype/uncertainty).
It's still way better in my opinion than last exemple.
Cheating right
I can already hear the screams about what I did above, how obvious it is to do the one I'll show instead in the cheating style, sure, but I showed you two ideas which are more universal than iterating, and I'm happy about it :)
Let's thinkg about what we've been doing so far: iterating via recursion on the tree, putting elements on an IOBuffer, and then retrieving the elements of the IOBuffer into a vector...
There sure seems to be an easy way to do only one step... oo
Oh yeah, simply pushing elements to the vector without doing the extra steps.
Well, this one is simpler to implement (and, it's the simpliest we'll see in my opinion):
preorder(nothing, _) = nothing
function preorder(t::BinTree, V)
push!(V, t.value)
preorder(t.left, V)
preorder(t.right, V)
end
function iterate(t::BinTree{T}) where T
V = T[]
preorder(t, V)
it = iterate(V)
it === nothing ? nothing : (it[1], (V, it[2]))
end
function iterate(t::BinTree, state)
V, s = state
it = iterate(V, s)
it === nothing ? nothing : (it[1], (V, it[2]))
end
Makes you wonder why I did the complicated stuffs above :P (probably to make you come up with this solution, and/or showing you how nice this one is)
There are little overhead, less than the above ones (I don't want to benchmark, but I'd guess so).
Though, it's using $O(n)$ of memory where n
is the number of nodes in the tree. That's suboptimal (if your goal is to not use lots of memory)
The state as the steps to go where you are
Okay, so, the idea seems well explained in the title, but I'll still explain it more in depth:
 the step is a Vector of functions, or elements which you can associate a defined function to (we'll do this one instead).
 the initial object is V, and the "current" object, if your state is
[fn1, fn2, fn3, ...]
isV > fn1 > fn2 > fn3 > ...
 you setup your new state so that it gives you the new "current" object.
An example will make it clearer:
The function representations
@enum Direction begin
Left
Right
end
(dir::Direction)(t::BinTree) = dir == Left ? t.left : t.right
With Left
representing x > x.left
and Right
representing x > x.right
.
The iteration process becomes:
function iterate(t::BinTree, state = Direction[])
for fn in state
next = fn(t)
next === nothing && return nothing
t = next
end
if t.left === nothing && t.right === nothing
right_count = 0
dir = Right # just a default value which is not Left but with the same type
while !isempty(state) && dir != Left
dir = pop!(state)
right_count += 1
end
if dir == Left
push!(state, Right)
return t.value, state
end # state is empty:
for _ in 1:right_count+1
push!(state, Right) # put the state into an invalid combinaison so that the iteration will end after
end
return t.value, state
end
if t.left !== nothing
push!(state, Left)
return t.value, state
end # t.right !== nothing
push!(state, Right)
return t.value, state
end
I recommend taking a piece of paper and apply this algorithm on a tree to get a hand of the logic.
There are good and bad part about this way, on the good parts: it's quite easy, you can use it without wrapping around another iterator, it takes less memory generally than constructing a Vector
and wrapping around it, it doesn't have the initial time cost.
You can change the Left
and Right
enum for integers and remember which means what, so that you don't pollute the namespace with types. (but it a namespace vs readability tradeoff)
Now, on the bad parts: you have to think about how to go from the current to next state by changing functions ahead of computing them, that's not the most intuitive thing, you compute the same operations LOTS of time, in fact, this algorithm is $O(n*h)$ where h is the height of the tree and n the number of nodes, if the tree is not balanced at all, that's $O(n^2)$ !
Exercice: inspire yourself by this algorithm to make a $O(n)$ iterator (the best you can have) with a memory usage of $O(1)$, given a binary tree structure which also have its parent in its fields, and where each of its nodes are distinct.
Replicating the function stack as the state
Great, several hundred of lines in, and finally an "interesting" (as in, you probably didn't do that) idea. It came to me by writing the Hanoi tower iteratively and then writing the Ackermann function in assembly.
(get your seatbelt, there'll be lots of code and explaining in coming)
For the first time I'm going to try to formalize this process instead of yoloing it all the way through, feel free to tell me how unreadable it is :)
You first start by using the recursive function, and the return iterator will be the value returned by the imaginary function yield_iter
function preorder(t::BinTree)
yield_iter(t.value)
if t.left !== nothing
preorder(t.left)
end
if t.right !== nothing
preorder(t.right)
end
end
Then you label each new scope, and the "just after" for every recursive and yield_iter
call:
function preorder(t::BinTree)
@label START
yield_iter(t.value)
@label AFTER_YIELD
if t.left !== nothing
@label LEFT
preorder(t.left)
end
@label RIGHT
if t.right !== nothing
@label RIGHT_NOT_NOTHING
preorder(t.right)
@label END
end
end
Now, the state
will be a vector of tuple of the arguments/environment, and where it is in the function.
We'll take the "where it is" as a Symbol
with values being the same as the preorder labels.
The state
will be at first [(t, :START)]
, and we'll operate on it while it's not empty, and do a big switch case if elseif else end group to manage what to do under each steps.
Which give:
function iterate(t::BinTree, state = [(t, :START)])
while !isempty(state)
t, s = pop!(state)
if s == :START
push!(state, (t, :AFTER_YIELD))
return t.value, state
elseif s == :AFTER_YIELD
push!(state, (t, :RIGHT))
if t.left !== nothing
push!(state, (t, :LEFT))
end
elseif s == :LEFT
push!(state, (t.left, :START))
elseif s == :RIGHT
if t.right !== nothing
push!(state, (t, :RIGHT_NOT_NOTHING))
end
elseif s == :RIGHT_NOT_NOTHING
push!(state, (t.right, :START))
elseif s == :END
nothing
else
throw("forgot label in iterating BinTree")
end
end
return nothing # useless but it makes it clearer that it's expected to return nothing
end
Which, in this case, lots of labels can be merged (some jumps are unecessary), giving the little bit clearer function:
function iterate(t::BinTree, state = [(t, :START)])
while !isempty(state)
t, s = pop!(state)
if s == :START
push!(state, (t, :AFTER_YIELD))
return t.value, state
elseif s == :AFTER_YIELD
if t.right !== nothing
push!(state, (t.right, :START))
end
if t.left !== nothing
push!(state, (t.left, :START))
end
else
throw("forgot label in iterating BinTree")
end
end
end
Be careful, the order in which you process your state is the reverse of the order you put label. (it's a stack)
It's IMO, the most general way to create an iterator, it can be quite complex sometimes though.
So here's some exercices in increasing order of difficulty (while still being manageable): doing a Tower of Hanoi iterator (it returns A => B where A and B are number from 1 to 3, explaining the current step to do to solve the problem), iterator of Ackermann function (returning each intermediate values), doing a lazy SKI combinator calculus interpreter (using the leftmost Sreduction strategy (aka. not evaluating the arguments first, but instead evaluating the leftmost function transformation first)), and finally, writting Kosaraju's algorithm by simulating the function stack (and thus, no recursion).
The state as "what do I need still to do"
Hello friendo, I liked your idea some I'm βοΈ stealing it βοΈ
(I did asked them if it was okay, and it's okay)
So, in some sense, the one above is a specific case of this one, but the "what you need to do" is the step in the algorithm.
Here, instead of the next step in the algorithm, it'll be which elements are still to be proceeded.
Imagine you're processing a node, after returning your value, you'll need to proceed the left side, and then the right side.
We'll put those steps in a stack, and thus we'll put those in a reversed order.
This gives the iterate
definition of:
function iterate(t::BinTree, v=[t])
isempty(v) && return nothing # no steps to be processed
top = pop!(v)
top.right === nothing  push!(v, top.right)
top.left === nothing  push!(v, top.left)
top.value, v
end
I find it really elegant (that's how you can verify that it's not mine oo), it makes for some easy but efficient definitions, and it's easily adaptable for cyclic datastructures, when you want to iterate on each node (I know, not everything is a graph, but I can't come up with a precise word so do if I did...) once.
Other tricks
Handling repetitions
We'll now stop with our BinTree
, I think we've explored it enough.
If you data structure in recursive, in a way that you may come to an already visited node, and you don't want to loop on it again, I recommend have a Tuple{Set{T}, normal_state_if_no_loop(T)}
data structure for your state
, you push!
the seen elements into the set, and you skip the elements that are already inside the set. (relevent exercice: do an iterator on a nonoriented, connex graph)
Stateful Iterators
As someone who likes functional programming, seeing the purity of iterators getting destroyed and adding states to it is truly what I do like. (said no one ever)
Well, it's still a part of iterators in julia, and I'll cover it despite not being so much of a fan of it.
A stateful iterator is one who have an internal sense of state whose scoping goes beyound what you can obtain with the second argument of the iterate
function.
There's multiple ways to do about that, so I'll go from the cursest to the easiest.
You can have global variables, and have your iterator modyfying them with some eval
and crafted expressions in your iterate
function. (I'll be kind and now show example to spare your eyes)
You can have global variables, and have your iterator modyfying them with some gloabal <variabl> =
semantics in your iterate
function. (I'll once again be kind and spare your eyes)
You may say, "but (explicitely saying my username here would be too ambiguous), what about local variables??", and I tried... But basically, esc
on expressions and eval
uating them didn't work, so that's the big sad :/
I just (partially) lied to you, you can do the same with local variables, but with a downside, you won't be able to define your iterate
globally, at least, it's not the one you'll be using, you'll be using a closure.
I lied to you again oo, for
isn't equivalent to the while
way of doing things because the closure won't be passed, which... is quite sad, I'd really like it to work π
So I wrote a pretty long macro to transform for
iterating over one variable (I didn't handled the multiple variable one) into an equivalent (or so I think, I'm not good at macro hygiene):
function transform_for_f(s::Symbol, n)
str = "$s"
!isempty(str) && str[1] == '@' ? s : esc(s), n # you shouldn't escape macro calls, it doesn't work
end
transform_for_f(s, n) = s, n
function transform_for_f(e::Expr, label)
if e.head == :for
continue_label = "continue_" * "$label" > Symbol
break_label = "break_" * "$label" > Symbol
it = Symbol("it_$label")
V = Symbol("V_$label")
state = Symbol("state_$label")
label += 1
i = esc(e.args[1].args[1])
v, label = transform_for_f(e.args[1].args[2], label) # I actually just want to escape stuffs, uggly hack but who write for loops inside the iterator ??
body = eltype(e.args[2].args)[]
for expr in e.args[2].args
new_expr, label = transform_for_f(expr, label)
push!(body, new_expr)
end
w = Expr(
:while,
Expr(:call, :(!==), it, esc(:nothing)),
Expr(
:block,
Expr(:(=), i, Expr(:ref, it, 1)),
Expr(:(=), state, Expr(:ref, it, 2)),
body..., # I like how I don't have to do push! anymore :)
Expr(:macrocall, Symbol("@label"), :(()), continue_label),
Expr(:(=), it, Expr(:call, esc(:iterate), V, state)),
),
)
Expr(
:block,
Expr(:(=), V, v), # don't compute the V lots of time, it may have side effects
Expr(:(=), it, Expr(:call, esc(:iterate), V)),
w,
Expr(:macrocall, Symbol("@label"), :(()), break_label),
),
label
elseif e.head == :continue
continue_label = "continue_" * "$(label1)" > Symbol
Expr(:macrocall, Symbol("@goto"), :(()), continue_label), label
elseif e.head == :break
break_label = "break_" * "$(label1)" > Symbol
Expr(:macrocall, Symbol("@goto"), :(()), break_label), label
else
h = e.head
args = eltype(e.args)[]
for expr in e.args
new_expr, label = transform_for_f(expr, label)
push!(args, new_expr)
end
Expr(h, args...), label
end
end
macro transform_for(f::Expr)
transform_for_f(f, 0)[1]
end
And now we can make it work ! (I won't explain how I did this macro, it was through pain (lots of bugs) and I don't feel confident saying that it's good)
Example:
struct It end
@transform_for function example()
a = 0
function iterate(::It, state=0)
state === nothing && return nothing
last_a = a
a = 1
last_a, nothing
end
for e in It()
println(e) # 0
end
for e in It()
println(e) # 1
end
end
(exercice to the reader: change the macro to accomodate for iterating over multiple variables (the for a in A, b in B ... end
))
Okay, now I can stop being an absolute weirdo (nobody write stateful iterators like that) oo
The easiest (and probably most common) way to do a stateful iterator is to have the iterator be a mutable struct, and mutating it :P
Example:
mutable struct mutFib
a::Int
b::Int
end
function iterate(f::mutFib, _=nothing)
tmp = f.a
f.a = f.b
f.b += tmp
tmp, nothing
end
F = mutFib(1, 0)
for f in F
println(f)
f > 1000 && break
end
for f in F
println(f)
f > 10000 && break
end
It can be used if you don't want to do the iterations from the beginning each time, and also if the iteration is clostly and you decide to store the computed value for the next time you'll need it.
Iterator traits
Someone asked me if I was going to do iterators traits, and I needed to look at what it was :P
So I did some basic research, and I'll address it, here's the documentation, if someone knows more about this, feel free to educate us.
Not a trait, but we all know that type stability is cool, and it seems like julia don't use the infered type of iterate for eltype :/ so fix that !! And override the Base.IteratorEltype
to Base.EltypeUnknown
if the result of eltype
if you don't know the eltype. (the default is HasEltype
)
Some algorithms are different depending on if your iterator is stateful or not, and sadly Stateful
isn't one of the traits...
For the existing traits, you can (should) also override the Base.IteratorSize
, to Base.IsInfinite
if it won't stop, Base.SizeUnknown
if you can't tell in advance and iterating to know would be too computational heavy (or may even not finish !), Base.HasLength
if you know the length and that length(iter)
is implemented and Base.HasShape{N}()
if you know the length and thath size(iter, [dim])
is implemented (for multidimensional iteration). Related to size, but not a trait, try to implement the isdone
function, because isempty
may advance your iterator, which is a problem for stateful ones, so generic code should be done with isdone
.
Generators
Generators is what you obtain with the (f(i) for i in V)
syntax, unlike the [f(i) for i in V]
syntax, it doesn't precompute everything in one go and create a Vector
.
Hmmmm, almost if it was iterating over the V
and returning f(i)
for value...
And that's exactly it, just some syntaxic sugar + compiler magics (which makes it way harder to do multipledispatch, that's not cool).
You can found about by using julia:
(f(i, j) for i in 1:10) > show # for me, shows: "Base.Generator{UnitRange{Int64}, var"#9#10"}(var"#9#10"(), 1:10)"
You see those var"#9#10
? Well, that makes overriding harder :/ But they makes sense, it's because it's creating a lambda under the hood and lambdas have those weirdish names.
They really are just map
over products on its iteration arguments, and if there are a if ...
clause, it's a map iterator over a filter iterator over a product iterator (chaining iterators <3).
And by the way, Base.Generate
really is just the map and product, it doesn't handle the filtering, so you'll need to use Base.Iterators.Filter
(or something similar)
Channels
Channels, despite being in the multithreading section of the julia documentation, can be used as mere iterators.
It has for definition Channel{T=Any}(func::Function, size=0; taskref=nothing, spawn=false)
, we won't talk about taskref
and spawn
(they aren't important for single process iteration)
The way that it works is that it have an internal queue of maximum size size+1
, it'll go over the function, and when the queue is full (or when the function ends), it'll give back the executing control, and it'll resume when the function isn't finished and that queue isn't full anymore.
The func
is a function which takes only one argument, let's call it c
, and its use is that you can put!(c, <value>)
to add <value>
to the channel queue.
If you have troubles understanding, here is the relevent code:
mutable struct Channel{T} <: AbstractChannel{T}
cond_take::Threads.Condition # waiting for data to become available
cond_wait::Threads.Condition # waiting for data to become maybe available
cond_put::Threads.Condition # waiting for a writeable slot
@atomic state::Symbol
excp::Union{Exception, Nothing} # exception to be thrown when state !== :open
data::Vector{T}
@atomic n_avail_items::Int # Available items for taking, can be read without lock
sz_max::Int # maximum size of channel
function Channel{T}(sz::Integer = 0) where T
if sz < 0
throw(ArgumentError("Channel size must be either 0, a positive integer or Inf"))
end
lock = ReentrantLock()
cond_put, cond_take = Threads.Condition(lock), Threads.Condition(lock)
cond_wait = (sz == 0 ? Threads.Condition(lock) : cond_take) # wait is distinct from take iff unbuffered
return new(cond_take, cond_wait, cond_put, :open, nothing, Vector{T}(), 0, sz)
end
end
function Channel{T}(func::Function, size=0; taskref=nothing, spawn=false) where T
chnl = Channel{T}(size)
task = Task(() > func(chnl))
task.sticky = !spawn
bind(chnl, task)
if spawn
schedule(task) # start it on (potentially) another thread
else
yield(task) # immediately start it, yielding the current thread
end
isa(taskref, Ref{Task}) && (taskref[] = task)
return chnl
end
Channel(func::Function, args...; kwargs...) = Channel{Any}(func, args...; kwargs...)
function put!(c::Channel{T}, v) where T
check_channel_state(c)
v = convert(T, v)
return isbuffered(c) ? put_buffered(c, v) : put_unbuffered(c, v)
end
(put_buffered
and put_unbuffered
are a big long so I'm not gonna include them, check it for yourself if you need to :) (I love having julia code being in julia))
The T
in Channel{T}
is the eltype of the iterator, I highly recommend specifying it.
Unlike the basic creating your own type + iterate
overloading, you aren't clustering your namespace with new types, you don't have to manually manage your state, and unlike the generator method, you can return a value and continue where you was before (you don't go to the top of your function each time)
For example:
function isprime(n::Integer)
n < 2 && return false
for i in 2:isqrt(n)
n % i == 0 && return false
end
true
end
second_prime_chnl = Channel{Int}() do c
n = 2
parity = 0
while true
if isprime(n)
if parity % 2 == 0
put!(c, n)
end
parity += 1 # number of representations is a power of two, so overflowing doesn't cause problems
end
n += 1
end
end
There I have a easy to make iterator which goes over prime numbers, and returns every 2 primes (starting with 2). Could we do it with by creating our own types, absolutely! But it would be a bit more involved.
For me, the kill feature is its ability to handle recursive functions, let tree::BinTree{Float64}
, to iterate over it, you can do:
iterator = Channel{Float64}() do c
preorder(::Nothing) = nothing
function preorder(t)
put!(c, t.value)
preorder(t.left)
preorder(t.right)
end
preorder(tree)
end
And unlike the similar example with Vector
and push!
, it's not iterating over everything before yielding values ! That means that you can use it on a structure which don't end.
Infinite data structures
We have already seen multiple "infinite" data structures, such as the "every 2 prime number" iterator.
Because infinite is too big to compute for "modern" computers (I know, it's shocking right), the trick is to not compute it (I know, it's shocking right), we'll use something more similar to lazy evaluation than eager one.
Something important is that the stack is limited, and julia don't have tailcall optimization (TCO), so if you're doing recursion in julia, it will still be using recursion when running. Which is problematic because function call takes some memory on the stack, and they are freed at the end of the function, so as you can guess, having it really deep can StackOverflow. And it doesn't allow you to provide your own heap allocated stack for function call frames (heap memory is also limited, but waaaay less than the stack and it'll take a long time before doing so (depending on the amount of heap you have))
Thus, forget about writing all your algorithms in a nice recursive way, despite it being one of the cleanest way (in my opinion) to handle infinite data structures.
"How to write recusive functions in a nonrecursive way" cool honestly make for another long blog, so I'll be quite brief.
For tailcall optimization (the best/easiest way most of the time), your goal is to transform your function into the form:
function recfun(a, b, c...)
if returncond(a, b, c...)
return f(a, b, c...)
end
return recfun(h(a, b, c...))
end
and then, you can rewrite recfun as:
function recfun(a, b, c...)
while !returncond(a, b, c...)
a, b, c... = h(a, b, c...)
end
return f(a, b, c...)
end
Having no operation done to the recfun
at the end is really important (otherwise it's not in a TCO style)
Now, that's something which you can apply quite easily, unlike true TCO where functions are jumping to other functions instead of themselves (here, if you are the one implementing them, use @goto
and @label
, if you're not the one implementing it, I'm sorry... But there's little you can do)
Another technique for functions which don't TCO easily is to simulate your stack trace as we've done in one of the iterators example, it's quite general, just not really nice to write.
You may have guessed it, we'll mixing infinite immutable vectors (otherly known as "functions on the naturals" :P) and iterators to make more complex structures.
Let's do some wrapper,
import Base: iterate, getindex, firstindex
struct NatIter{T}
f::T
end
getindex(f::NatIter, i::Integer) = f.f(i)
iterate(f::NatIter, n=0) = f[n], n+1
firstindex(::NatIter) = 0 # I use peano construction for natural numbers, 0 *is* a natural number, I'll die to that hill.
Have you ever wanted to do V[2:Inf]
when your V didn't have a fixed size ? Me neither, but now I want to.
Because type piracy is bad, I won't use the a:b:Β±Inf
but instead I'll use the a..b
, you don't know what it is ? Great, because it doesn't exist yet, we'll define it.
struct InfRange{T} # same T for type stability
start::T
step::T
end
a .. b = InfRange(a, b)
iterate(ir::InfRange, state=ir.start) = state, state+ir.step
getindex(ir::InfRange, n::Integer) = ir.start + n*ir.step
Ok, so, first thing we'll see is a sort of view
(gives you the data structure without copying anything), but on infinite indexable structures, and with InfRange
(it's possible to do a more general one, but ... wait, this is starting to get really long oo)
struct InfView{L, R}
V::L
it::R
end
firstindex(::InfView) = 1 # It'll be useful
getindex(V, it::InfRange) = InfView(V, it)
function getindex(V::InfView, i::Integer)
@assert i > 0
it = iterate(V)
@assert it !== nothing
for _ in 2:i
it = iterate(V, it[2])
@assert it !== nothing
end
it[1]
end
# state: (i, state_V, state_it) where i is where we are now and state_V and it are the ones you have to reach to (off by one error seems too easy...)
function iterate(V::InfView, state = (firstindex(V.V), iterate(V.V), iterate(V.it)))
new_it_state = iterate(V.it, state[3][2])
i_dist = state[3][1]  state[1]
@assert i_dist >= 0 # for simplicity reason, I will only consider increasing ranges
new_V_state = state[2] # I'll potentially mutate it later
for _ in 1:i_dist
new_V_state = iterate(V.V, new_V_state[2])
end
new_V_state[1], (state[3][1], new_V_state, new_it_state)
end
And tadaaa, stuffs works! (I don't even believe it myself oo)
exercice: cover the case where the indexing is not monotonus and make so that collect
works for strictly monotonous (infinite) indexing for finite vector, and the reverse too (infinite vector, finite indexing)
The iterators versions of map
, filter
are your friends (you can use map
for conditionals by the way).
exercice: create an infinite 1based vector with value P(n) if n is abundant and P(n) if it isn't, where P(n) is the nth Pell number, and then indexed it to every second prime number (I know that you know what a prime number is... but distinguishing it from gaussian primes ig) of the form 6n+1
, skipping mersenne primes. (we both agree that it doesn't particularly makes sense... but you know, finding cool stuffs can be hard :/)
Great and all, but there are other things which can be considered for "infinite" strucures than linear (indexable vectors) ones.
The choice of your algorithms may also change sometimes due to "it may not terminate" problems.
example:
You have an infinite (on each direction) 2Dgridlike graph structure dependent type M(Ο΅)
with Ο΅ > 0
, with the coordinates (y, x)
being connected to ((y, x+1), (y, x1), (y+1, x), (y1, x))
and with weight of the vertice between two coordinates pos1
and pos2
being f(pos1, pos2) >= Ο΅
for some function f
with f(a, b) = f(b, a)
.
Let's suppose you want to know the sum of the weights of the shortest path between two coordinate p_begin::Tuple{Integer, Integer}
and p_end::Tuple{Integer, Integer}
givien m::M(Ο΅)
(and let's suppose that you don't know the value of Ο΅
). (exercice: proove that there exist a shortest path, and the it has a finite length).
If you knew Ο΅
, you could go up (or down) as long as the y
components aren't the same, then right (or left), get the length, l
of this path, and then restrict the graph to a finite one whoose manhattan distance is less than l/Ο΅
, which would fall back to a finite case too easily.
Here, because it's not a finite graph, you cannot apply Dijkstra, and f(a, b) = 1
, you still couldn't apply DFS (as in, it wouldn't necessarely terminate, and anyway, it'd probably give the wrong restul) though you could use BFS, also because we don't know Ο΅
, we cannot have a better heuristic for A* than h(x)=0
which is meh...
Anyway, by virtue of "I'm way too dumb for this world to understand most path finding algorithms" here's mine, with a time complextity of $O(no)$ but it finishes in a finite amount of time for this problem oo (normally, tbf, floating point rounding may break it). Definition of M
:
struct M{T}
Ο΅::Float64
f::T
function M(Ο΅::Float64, f)
@assert Ο΅ > 0
new{typeof(f)}(Ο΅, f)
end
end
function (m::M)(pos1, pos2)
rs = m.f(pos1, pos2)
@assert rs >= 0
m.Ο΅ + rs
end
My algorithm: (I'm saying that it's mine because I don't think I've seen it, but I don't claim that it doesn't exist yet as it seems really simple)
using DataStructure # importing PriorityQueue
function distance(m::M, pos_begin::T, pos_end::T) where T <: Tuple{Integer, Integer}
pos_begin == pos_end && return 0.0
dist_to_end = Inf
dists = Dict{typeof(pos_begin), Float64}(pos_begin => 0.0)
pq = PriorityQueue(Base.Order.Forward, pos_begin => 0.0)
while !isempty(pq)
node, dist = popfirst!(pq)
y, x = node
dist >= dist_to_end && continue
for new_node in ((y, x+1), (y+1, x), (y1, x), (y, x1))
new_dist = dist + m(node, new_node)
if !haskey(dists, new_node)  dists[new_node] > new_dist
dists[new_node] = new_dist
if new_node == pos_end
dist_to_end = new_dist
else
push!(pq, new_node => new_dist)
end
end
end
end
dists[pos_end]
end
(end example)
Another problem with infinite datastructures is that you may need to use probabilitic algorithm as deterministic ones may not finish:
example:
(problem inspired by percolation theory, if I don't make sense, tell me, but also check this)
Suppose that you have an infinite graph G
where nodes can be placed along a 2D integer coordinate system, and with node having a probability p
to being connected with other node with coordinate being directly adjacent to it and not diagonal, and 0 otherwise.
The problem is: suppose you're given a coordinate (y, x)
, and that you make class equivalence in G
with relation "there exist a path", is the equivalence class in which (y, x)
is in infinite ?
Here, you can easily check if the class if bigger than 2, 10, 1000, 10^6, etc. elements, but not that it's infinite.
So for the positive case, the best you can do (the best I think I can do) is saying that it's probably infinite.
here's an algorithm for that:
function inf_class(G, pos, precision::Integer)
G.p < 0.5 && return false # have you done your percolation homework oo ?
# counting and flooding to check that it have length at least precision with precision being more than 1
found = 1
explored = Set([pos])
V = [pos]
while !isempty(V)
y, x = pop!(V)
for new_v in ((y, x+1), (y+1, x), (y1, x), (y, x1))
new_v in explored && continue
if connected(G, (y, x), new_v) # suppose that this function returns true if the two nodes are connected in G, false otherwise
push!(explored, new_v)
push!(V, new_v)
found += 1
found >= precision && return true
end
end
end
return false
end
(end example)
But remember, infinite datastructures aren't necessarely indexable:
example:
(this example is a total ripoff of the Collatz_conjecture, wanna fight about my lack of imagination ?)
Here, our graph (again!) will have for vertix $\mathbb{Z}$, and each node k
will be connected to the nodes 2k
, (k1)/3
if 6k4
, k/2
if 2k
and 3k+1
if 2k+1
.
We'll again interest ourselves in the equivalence classes of connectiveness. Expressed in another way, given the function step(n) = n % 2 == 0 ? div(n, 2) : 3n+1
, two integersk
and n
are in the same class if there exist two numbers, a
and b
such that step^a(k) = step^b(n)
(where step^w
is the function step repeated w
times (it's identity for w=0
))
Searching if they are infinite or not would be too easy (is_inf_class(n) = n != 0
)
So instead, we well ask the question "given k and n two integers, are they in the same class ?"
Problem: it is unknown if any number won't loop by always applying step
, so we can't wait that it loops, or we write an algorithm which is dependent on this conjecture in a way that instead of giving a wrong result sometimes (which may happen on probabilistic algs), will never terminate (and thus, programs using it may never terminate; which imo, is a way more severe behavior)!
Well, let's define the node type first:
import Base: getproperty
struct Node{T<:Integer}
value::T
end
step(n) = n % 2 == 0 ? div(n, 2) : 3n+one(n)
function getproperty(n::Node, prop::Symbol)
@assert prop in [:left, :right, :down, :value]
prop == :value ? getfield(n, :value) :
prop == :left ? Node(2n.value) :
prop == :right ? (k4 % 6 == 0 ? div(k1, 3) : throw("node field access not valid")) :
Node(step(n.value))
end
And a probabilistic approach to our original probem may be:
function same_tree(a::Node, b::Node)
a.value + b.value == 0 && return a.value == 0 # the graph containing 0 has only 0 # assume no overflow
a.value/b.value <= 0 && return false # different signs
a.value > 0 && return true # collatz conjecture, seems true everytime, so it's a handy shortcut, which will most likely give the right result
# a.value < 0 && b.value < 0
a.value < big(10)^200 && return a.value == b.value  missing # I personally have no guarentee that it'll not diverge to inf if below 10^100
# below 10^100, it'll either go to 1 or 5 or 17, those are the three known loops
while !(a.value in (1, 5, 17)) # assume no overflow
a = a.down
end
while !(b.value in (1, 5, 17)) # assume no overflow
b = b.down
end
a == b
end
this mixes probabilitic means, with some knowledge with the behavior of the collatz sequences.
Not the cleanest code ever, but I think it shows how domainspecific knowledge can tranform questions on infinite datastructures to more manageable ones, and that they aren't necessarely all about being indexable.
(end example)
Problem/fielddomain knowledge can be handy to write your algorithms.
I won't talk about stuffs like continuous infinite datastructures and dimension reduction because I simply don't know enough about that, ask your local machinelearning enthousiast and I'm sure they'll give your better answers than me :P
As you may have noticed, infinite datastructures may be necessary for some problems (such as seeking infinite structures), but they lose some pretty important properties of finite ones, so you need to be a bit more careful.
End speech
Here comes the end of this blog/tutorial/article (I'm unsure myself on how to call it).
I hope that you appreciated it, I expected it to be ~150 lines, done in 2h, weeeeelllllll, that hasn't been the case, it took me some weeks of in and out, around 20h of actual thinking... (last time, someone told me they didn't know how long writing stuffs like that take before saying that they disliked it, so... I'm answering in advance)
Feel free to leave comments, and to write stuffs yourself to educate others, it helps others, but I personally learned a lot doing those "technical" writings.
Top comments (2)
Loved this post! Highly unconventional and I didn't understand a lot of it (I'm too busy to try your exercises), but it was super fun to read. Your explanation of TCO and "simulating the stack trace" cleared up a lot of misconceptions I had about what TCO means.
Your path finding alg is a Bellman style dynamic programming right? You assign a minimum distance to every visited node, your reachable set expanding out...
I did not know how to use channels, @enum, @label and @goto in Julia so learned a LOT about iterations, and Julia in general!
Please, please write more posts!!! You have a unique perspective, thanks for sharing.
P.S. I really really wish writing Bellman style recursions was easier in Julia. I wrote the same thing in Mathematica with little work (slow as heck tho). I keep looking for a nicer way to do recursion in Julia, but your TCO "rewrite code" teaches me a lot.
Some of the issue is solved by enlarging the base case I guess using memorize so the expanding AST gets truncated at stored results. I guess this is a totally different blog post since we are no longer calling
itearate
at all, but I'm still not sure how much of the problem memoization solves, and how much remains. A new evaluation might still stack overflow! I'd love to hear your thoughts on the best way to do robust and general recursion on finite and terminating directed acyclic graphs in Julia: What I mean by DAG is that in the above TCO code, (h(a, b,c...) might return a list (multiple parent nodes) in the AST).Thanks, well, seems like I didn't explain a lot of stuffs correctly as a lot of persons said that they didn't undertood a lot (but in all fairness, this isn't meant to be read in one time and be fully understood at the end, more of a exposure of ideas to explore)
I don't really think it's a Bellman style dynamic programming, but I may understand this term wrongly; but from what I've read/understood, there is a need to decompose it in a recursive manner, which I think I'm doing it more in a distributed programming style than recursive one (where each points are an actor, and they send messages to specify their new distance and to "activate" them (asking them to also send messages) but in a controlled manner. (again I can maybe just not see the right recursion underneath). To explain how my algorithm work: it's doing approximations by reducing the lowerbound (by iterating through the nodes by the closest to the furthest) and then the higherbound (when a path have been found), and thus until it doesn't collapse to 1 number.
That's very nice! That's one reasons I decided to absolutely not care about how technical my julia code could be outside of what I wanted to say :) (the others being that it was simpler for me and allowed to speak about other stuffs)
If there's a need for it and that I have the knowledge to speak about that need, I'll definitely do another blog, I really enjoy making them, I think I'm gonna talk about either algorithm optimisation (making them faster, not algorithms for optimisation) or what kind of optimisation the compiler does.
Nice! I haven't dabbled a lot with Mathematica, so I'm not qualified to speak about it :P
Well, there's multiple ways you can enlarge the base case, setting them directly (probably what you meant) or having them update with functions calls (for example memoization), I'll first talk about what I think you didn't meant: memoization, it's really good for speedsup when it's applicable (it sometimes isn't) to reduce the growthrate of the number of traversals, but first call may still stack overflow (but others are less likely to do so), also, requires lots of memory, but you can cut down the memory usage by not memoizing everything.
For enlarging the base case "statically" (explicit the base case in your code), it's still there, it still may stack overflow, but would be less prone to it than small base case but also would take more time for small n (if your function is f(n), which is maybe not but I don't know how to express it correctly), not caching every number from 0 to k (where k is a natural number) is a way to reduce RAM usage + less checks for some values of n, something like caching linearly 1:c:k or expondentially (1, 2, 4, 8, 16, etc. for base 2) may be a solution depending on the growth rate of the recursion, so overall, it'd be highly context dependent, and I haven't solve recursion in the general case oo (some langage like Haskell does the stack on the heap, which prevent stack overflow, but it's a language thing and thus not really applicable in Julia (at least, not easily you can probably macro it and do analysis on the code to insert some label + goto, but it doesn't seem like an easy task)
So my take is that recursion is less robust than imperative code (at least in Julia) (sadly) so do your best to not write recursion...