Why you might avoid `deepcopy` in Julia

Why use `deepcopy`?

In Julia, copy is a function which creates a shallow copy. For example:

julia> a = [1] # vector with one element, namely 1
1-element Vector{Int64}:
 1

julia> b = [a] # vector with one element, `a`
1-element Vector{Vector{Int64}}:
 [1]

julia> b2 = copy(b) # new vector, also with one element which is `a`
1-element Vector{Vector{Int64}}:
 [1]

julia> push!(a, 2) # mutate `a` so it contains 1 and 2
2-element Vector{Int64}:
 1
 2

julia> b # since `b` contains `a`, we can see its (nested) contents have changed
1-element Vector{Vector{Int64}}:
 [1, 2]

julia> b2 # same for `b2`!
1-element Vector{Vector{Int64}}:
 [1, 2]

Since copy is shallow, b2 still contains the same vector a (whose contents are now [1, 2]), just as b does. The outer vectors b and b2 are nonetheless independent and do not share memory:

julia> push!(b, [10]) # mutate `b`
2-element Vector{Vector{Int64}}:
 [1, 2]
 [10]

julia> b2 # not mutated
1-element Vector{Vector{Int64}}:
 [1, 2]

In contrast, deepcopy is a function which recursively copies objects:

julia> a = [1] # vector with one element, namely 1
1-element Vector{Int64}:
 1

julia> b = [a] # vector with one element, `a`
1-element Vector{Vector{Int64}}:
 [1]

julia> b2 = deepcopy(b) # new vector, with new contents
1-element Vector{Vector{Int64}}:
 [1]

julia> push!(a, 2) # mutate `a` so it contains 1 and 2
2-element Vector{Int64}:
 1
 2

julia> b
1-element Vector{Vector{Int64}}:
 [1, 2]

julia> b2 # contents are unchanged
1-element Vector{Vector{Int64}}:
 [1]

We can see that b2 still just contains [1].
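
The difference between the two kinds of copy can also be checked directly with `===`, which tests whether two mutable objects are the very same object. This is a small sketch using the same names as above, plus a new name b3 for the deep copy:

```julia
a = [1]
b = [a]

b2 = copy(b)       # shallow: new outer vector, same inner vector
b3 = deepcopy(b)   # deep: new outer vector and new inner vector

shallow_outer_shared = b2 === b      # false, the outer vectors are distinct
shallow_inner_shared = b2[1] === a   # true, the inner vector is shared
deep_inner_shared    = b3[1] === a   # false, the inner vector was copied
deep_inner_equal     = b3[1] == a    # true, the copy has equal contents
```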

It's easy to see why deepcopy might be appealing: with a shallow copy, it can be surprising that modifying a also affects b2.

deepcopy also has some other nice properties. For example, since b here contains a twice, modifying a has the following effect on b:

julia> a = [1]
1-element Vector{Int64}:
 1

julia> b = [a, a]
2-element Vector{Vector{Int64}}:
 [1]
 [1]

julia> push!(a, 2)
2-element Vector{Int64}:
 1
 2

julia> b
2-element Vector{Vector{Int64}}:
 [1, 2]
 [1, 2]

deepcopy preserves this internal structure. Continuing the example:

julia> b2 = deepcopy(b)
2-element Vector{Vector{Int64}}:
 [1, 2]
 [1, 2]

julia> a2 = b2[1]
2-element Vector{Int64}:
 1
 2

julia> push!(a2, 3)
3-element Vector{Int64}:
 1
 2
 3

julia> b2
2-element Vector{Vector{Int64}}:
 [1, 2, 3]
 [1, 2, 3]

Pretty nice! Semantically, deepcopy should behave the same as serializing an object and then deserializing the result (that is, composing deserialize with serialize).
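
As a rough illustration of that equivalence, here is a sketch that round-trips a value through the Serialization standard library using an in-memory buffer. The helper name roundtrip is made up for this example, and the sketch only checks the aliasing behavior shown above, not every property of deepcopy:

```julia
using Serialization

# Hypothetical helper: serialize `x` into memory, then read it back.
function roundtrip(x)
    io = IOBuffer()
    serialize(io, x)
    seekstart(io)          # rewind so we read from the start of the buffer
    return deserialize(io)
end

a = [1, 2]
b = [a, a]                 # `b` contains the same vector twice
b2 = roundtrip(b)

same_contents     = b2 == b            # the round-tripped value compares equal
detached          = b2[1] !== a        # but it no longer shares memory with `a`
aliasing_is_kept  = b2[1] === b2[2]    # and the internal sharing is preserved
```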

Why not use `deepcopy`?

deepcopy is reaching into the internals of the object, rather than relying on the API of the object (namely, its method for copy).

It's easy to construct cases in which this is semantically incorrect. For example, let's say we are constructing our own vector type which stores its memory elsewhere and holds a token used to look up that memory. Here is a quick implementation:

# mutable so each instance has its own identity
mutable struct Token end

const STORAGE = Dict{Token, Vector{Float64}}()

struct MyVectorType <: AbstractVector{Float64}
    token::Token
end
# construct from a `Vector`, registering its memory in STORAGE under a fresh token
function MyVectorType(v::Vector)
    token = Token()
    STORAGE[token] = v
    return MyVectorType(token)
end
Base.getindex(m::MyVectorType, i::Int) = STORAGE[m.token][i]
Base.setindex!(m::MyVectorType, v, i::Int) = STORAGE[m.token][i] = v
Base.size(m::MyVectorType) = size(STORAGE[m.token])
function Base.copy(m::MyVectorType)
    return MyVectorType(copy(STORAGE[m.token]))
end

For example:

julia> v = MyVectorType(rand(2))
2-element MyVectorType:
 0.49321258978106763
 0.6022070713363459

Then copy works as expected, since we defined a method for it:

julia> v2 = copy(v)
2-element MyVectorType:
 0.49321258978106763
 0.6022070713363459

julia> v[1] = 2.0
2.0

julia> v
2-element MyVectorType:
 2.0
 0.6022070713363459

julia> v2
2-element MyVectorType:
 0.49321258978106763
 0.6022070713363459

But deepcopy fails:

julia> deepcopy(v)
Error showing value of type MyVectorType:
ERROR: KeyError: key Token() not found
Stacktrace:
  [1] getindex
    @ ./dict.jl:477 [inlined]
  [2] length
    @ ./REPL[6]:1 [inlined]

It has constructed a new Token instance which does not have a corresponding entry in STORAGE.

Using deepcopy, we have made assumptions about the implementation details of how MyVectorType works and constructed an invalid instance!

One can run into similar problems when the object contains references to memory allocated in another language; see, e.g., JuMP's rationale for disabling deepcopy on its models.
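
If you own a type like this, one option is to teach deepcopy about it. The docstring for deepcopy says that user-defined types can customize its behavior by adding a method to Base.deepcopy_internal(x::T, dict::IdDict). The following is a minimal sketch of that for the MyVectorType example, repeating the definitions so the block runs on its own in a fresh session. Keying the dictionary on the token is a choice made for this sketch, so that two wrappers sharing a token still share one after copying:

```julia
mutable struct Token end

const STORAGE = Dict{Token, Vector{Float64}}()

struct MyVectorType <: AbstractVector{Float64}
    token::Token
end
function MyVectorType(v::Vector)
    token = Token()
    STORAGE[token] = v
    return MyVectorType(token)
end
Base.getindex(m::MyVectorType, i::Int) = STORAGE[m.token][i]
Base.setindex!(m::MyVectorType, v, i::Int) = STORAGE[m.token][i] = v
Base.size(m::MyVectorType) = size(STORAGE[m.token])

# Sketch: copy the external storage too, instead of only the token.
function Base.deepcopy_internal(m::MyVectorType, stackdict::IdDict)
    # `stackdict` records objects already copied during this deepcopy call;
    # reusing the entry keeps shared tokens shared in the result.
    new_token = get!(stackdict, m.token) do
        t = Token()
        STORAGE[t] = copy(STORAGE[m.token])  # elements are Float64, so `copy` suffices
        t
    end
    return MyVectorType(new_token::Token)
end

v = MyVectorType([1.0, 2.0])
w = deepcopy(v)        # now has its own entry in STORAGE
v[1] = 5.0             # does not affect `w`

pair = deepcopy([v, v])  # both elements of the copy share one new token
```

Note that entries in STORAGE are never removed in this toy design, so every copy adds to the dictionary; a real implementation would need some cleanup strategy.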

Revisiting reasons to use `deepcopy`

Sometimes, feeling a need to use deepcopy is actually a hint that something else is wrong. I think the main culprit is a missing copy method, but I believe there might be several other reasons; if you think of one, let me know and I might add it here.

"Missing" `copy` method

I think one of the most common reasons to reach for deepcopy is that one of the objects you are working with is missing a copy method, or it is not copying quite what it should¹.

While copy is defined to be a "shallow" copy, it is not always totally clear exactly how deep or shallow that should be. Adding semantics around whether the object represents a nested or flat data structure can clarify this.

For example, a DataFrame is semantically a 2D object, which is implemented with a vector-of-vectors similar to our b in the first example. DataFrame defines a copy method which by default does copy the nested inner vectors, but not any of their contents. I believe this is semantically correct, because the vector-of-vectors construction is simply an implementation detail of DataFrame, and the object itself is a flat 2D object (as shown by e.g. size), and therefore copy should not behave like it is a vector-of-vectors.
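
To make the idea concrete without depending on DataFrames itself, here is a toy sketch of a flat 2D type backed by a vector of column vectors. The name ColumnTable and its layout are invented for this example and are not how DataFrame is actually implemented; the point is only that copy goes one level deeper than the outer vector because the columns are an implementation detail:

```julia
# Hypothetical toy type: semantically a flat 2D table of Float64 values.
struct ColumnTable
    columns::Vector{Vector{Float64}}
end

# (rows, columns), as for a matrix; assumes at least one column.
Base.size(t::ColumnTable) = (length(first(t.columns)), length(t.columns))

# Copy each column, so the result shares no storage with the original.
Base.copy(t::ColumnTable) = ColumnTable(map(copy, t.columns))

t = ColumnTable([[1.0, 2.0], [3.0, 4.0]])
t2 = copy(t)
t2.columns[1][1] = 99.0   # mutating the copy leaves `t` untouched
```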

Sometimes deepcopy usage hints at both a "missing" struct definition and copy method. For example, when using nested dictionaries to store configuration state, one might start with a "default configuration", then deepcopy it to pass to a user to modify. This seems practical and totally fine, but it might be nicer to wrap up the configuration in a struct and define a copy method for it.
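
Here is one way that could look, as a sketch. The struct name SolverConfig and its fields are made up for illustration; the idea is just that the type knows which of its fields are mutable containers and copies exactly those:

```julia
# Hypothetical configuration type with defaults, replacing a nested Dict.
Base.@kwdef mutable struct SolverConfig
    tolerance::Float64 = 1e-6
    max_iterations::Int = 100
    tags::Vector{String} = String[]
end

# Copy the mutable `tags` vector; the other fields are plain values.
function Base.copy(c::SolverConfig)
    return SolverConfig(c.tolerance, c.max_iterations, copy(c.tags))
end

default_config = SolverConfig()
user_config = copy(default_config)   # hand this to the user to modify

user_config.tolerance = 1e-3
push!(user_config.tags, "experimental")
```

After these modifications, default_config still has its original tolerance and an empty tags vector.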

Should you use `deepcopy`?

Maybe! It depends on the situation. To me it is often a sign that some abstraction is not working as intended, or that a copy method is missing somewhere. But it is a good workaround and a useful shortcut in scenarios like writing tests, and there are situations in which it seems like the correct tool for the job.

¹ Note: in this situation, using deepcopy may be the best way to stay unblocked! I don't think there's anything wrong with using it as a workaround, but it may signal that there is an upstream issue somewhere to be filed or fixed.