Tutorials
Juyter Logo

Intro to Julia

Ziyi (Francis) Yin

#Introduction to Julia

Julia is a fast, dynamic, reproducible, composable, general and open source programming language that can be downloaded here. The full documentation of the current release (v1.7) is available here. This tutorial is designed to go over the basics of Julia, and to cover several topics on which Julia's speed is based.

  1. use/add/remove/develop package
  2. basic linear algebra
  3. just-in-time (JIT) compiler
  4. multiple dispatch
  5. type-stability

#Package management

In Julia, packages are loaded via using (like packages are loaded in python via import). We typically load all the packages we want to use at the start of the file as

using LinearAlgebra, Test, BenchmarkTools

To add/remove a package, we can follow Pkg.jl and run

using Pkg
Pkg.add("IterativeSolvers")
    Updating registry at `~/.julia/registries/General`
    Updating git-repo `https://github.com/JuliaRegistries/General.git`
    Updating registry at `~/.julia/registries/SLIMregistryJL`
    Updating git-repo `https://github.com/slimgroup/SLIMregistryJL.git`
   Resolving package versions...
    Updating `~/.julia/environments/v1.7/Project.toml`
  [42fd0dbc] ~ IterativeSolvers v0.9.2 `https://github.com/JuliaLinearAlgebra/IterativeSolvers.jl.git#master` ⇒ v0.9.2
    Updating `~/.julia/environments/v1.7/Manifest.toml`
  [42fd0dbc] ~ IterativeSolvers v0.9.2 `https://github.com/JuliaLinearAlgebra/IterativeSolvers.jl.git#master` ⇒ v0.9.2
Precompiling project...
IterativeSolvers
JOLI
GenSPGL
JUDI
SetIntersectionProjection
MECurvelets
ImageGather
InvertibleNetworks
  8 dependencies successfully precompiled in 16 seconds (356 already precompiled)
Pkg.rm("IterativeSolvers")
    Updating `~/.julia/environments/v1.7/Project.toml`
  [42fd0dbc] - IterativeSolvers v0.9.2
  No Changes to `~/.julia/environments/v1.7/Manifest.toml`

Note that we can also install a Julia package at a specific version, or from a GitHub repository

Pkg.add(name="IterativeSolvers", version="0.9.2")
   Resolving package versions...
    Updating `~/.julia/environments/v1.7/Project.toml`
  [42fd0dbc] + IterativeSolvers v0.9.2
  No Changes to `~/.julia/environments/v1.7/Manifest.toml`
Pkg.add(url="https://github.com/JuliaLinearAlgebra/IterativeSolvers.jl.git", rev="master")
    Updating git-repo `https://github.com/JuliaLinearAlgebra/IterativeSolvers.jl.git`
   Resolving package versions...
    Updating `~/.julia/environments/v1.7/Project.toml`
  [42fd0dbc] ~ IterativeSolvers v0.9.2 ⇒ v0.9.2 `https://github.com/JuliaLinearAlgebra/IterativeSolvers.jl.git#master`
    Updating `~/.julia/environments/v1.7/Manifest.toml`
  [42fd0dbc] ~ IterativeSolvers v0.9.2 ⇒ v0.9.2 `https://github.com/JuliaLinearAlgebra/IterativeSolvers.jl.git#master`
Precompiling project...
IterativeSolvers
JOLI
GenSPGL
JUDI
SetIntersectionProjection
ImageGather
MECurvelets
InvertibleNetworks
  8 dependencies successfully precompiled in 16 seconds (356 already precompiled)

We can also do these on julia's interactive command-line REPL (read-eval-print loop) via

] add IterativeSolvers
] rm IterativeSolvers
] add https://github.com/JuliaLinearAlgebra/IterativeSolvers.jl.git

etc.

#Linear algebra

The syntax of creating multidimensional arrays is a bit similar to MATLAB. We use square brackets [] to set up a multidimensional array, and use semi-colon ; to switch rows.

A = [1 2; 3 4]
2×2 Matrix{Int64}: 1 2 3 4

Note that multidimensional arrays are stored in a column-major order, meaning that they fill up one column at a time. We can verify this by vectorizing A via vec or [:]

A[:]
4-element Vector{Int64}: 1 3 2 4
vec(A)
4-element Vector{Int64}: 1 3 2 4

It is also quite simple to create a vector using square bracket, and Julia assumes 1D vector as column vector.

x = [1, 2]
2-element Vector{Int64}: 1 2

There are built-in basic linear algebraic operations, such as

inv(A)
2×2 Matrix{Float64}: -2.0 1.0 1.5 -0.5
A * x
2-element Vector{Int64}: 5 11
A \ x
2-element Vector{Float64}: 0.0 0.5
det(A)
-2.0

#Just-in-time (JIT) compilation

Julia code is just-in-time compiled---i.e., every statement runs using compiled functions which are either compiled right before they are used, or cached compilations from before. This is different from e.g. C, which is compiled before executed. Due to the JIT compilation, the first run of a function/operation typically takes slightly longer time as it needs to compile first. An example is shown below

A = randn(Float32, 10^4, 10^4);
b = randn(Float32, 10^4);
# first run
@time A * b;
  0.080792 seconds (290.82 k allocations: 16.171 MiB, 81.98% compilation time)
# second run
@time A * b;
  0.017111 seconds (2 allocations: 39.109 KiB)

The first time of execution this "Float32 matrix and Float 32 vector product" takes longer time and more memory allocations as this is the first time to run such functionality in the current notebook. In order for robust estimate of the execution time and memory usage, we can use the @benchmark macro in BenchmarkTools.jl package.

@benchmark A * b
BenchmarkTools.Trial: 205 samples with 1 evaluation. Range (minmax): 16.436 ms 27.088 ms GC (min … max): 0.00% … 0.00% Time (median): 24.335 ms GC (median): 0.00% Time (mean ± σ): 24.451 ms ± 909.266 μs GC (mean ± σ): 0.00% ± 0.00% ▃ 16.4 ms Histogram: frequency by time 26 ms < Memory estimate: 39.11 KiB, allocs estimate: 2.

#Multiple dispatch

The core design to make Julia fast is type-stability through specialization via multiple-dispatch. A single function in Julia can take different types of input and compile the function in specialized ways. This "specialization" brings huge performance gains to Julia. Let's first go over a simple example, namely scalar multiplication, and use the macro @which to check the method being used

@which 1 * 2

In this example, we want to compute the multiplication of 2 integers. As we can see, the underlying method is running the multiplication of integers.

@which 1.0 * 2.0

If we multiply 2 floating point numbers, we can see that a different method is used, which is designed specifically for the type double (Float64 in Julia).

@which 1.0 * 2

From here, we can see that Julia can't find a specialized way to compute 1.0 * 2 according to the type of inputs. In the end, it chooses to use the very generic Number to describe the type of 1.0 and 2 and to perform the multiplication. Note that this is one of the cases that we want to avoid during large-scale computations. We usually want to make sure that the type of input and output is the same in our functions.

This example demonstrates that the same function (multiplication * in this case) is executed via different compiled code blocks that are specialized for different kinds of input. This feature brings huge performance gain and potentially makes Julia to run almost as fast as C/Fortran (which are strictly typed). With that being said, we should also take advantage of this type-stability when implementing functions---i.e., dispatch a function to multiple types of input.

Let's show a simple example: we can define a foo function that works for different kinds of input. Let's first build it to take String inputs.

foo(x::String, y::String) = println("My inputs x and y are both strings!")
foo (generic function with 1 method)
foo("hello", "hi!")
My inputs x and y are both strings!

Notice that now foo is a generic function with 1 method (a.k.a. taking 2 String and output a sentence). Now the foo function won't work for numbers.

foo(3, 4) # doesn't work here
MethodError: no method matching foo(::Int64, ::Int64)

Stacktrace:
 [1] top-level scope
   @ In[23]:1
 [2] eval
   @ ./boot.jl:373 [inlined]
 [3] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
   @ Base ./loading.jl:1196

To get foo to work on integer (Int) inputs, let's add ::Int onto our input arguments when we declare foo.

foo(x::Int, y::Int) = println("My inputs x and y are both integers!")
foo (generic function with 2 methods)
foo(3, 4) # works now
My inputs x and y are both integers!

Now foo(::Int, ::Int) works as well. Notice that it says foo (generic function with 2 methods). This means foo is a general function but contains 2 methods: the first one works for 2 strings and the second one works for 2 numbers. This means that the functionality of foo on 2 strings isn't overwritten when you define how to run foo on another type of input(s). Instead, we just added an additional method to the generic function foo.

foo("hello", "hi!")
My inputs x and y are both strings!

Then let's show an explanatory example to discuss how we should write code in Julia. Suppose we want to build a function to take the 2 norm of a scalar/vector/matrix (while we choose to ignore the existing norm functionality). A naive way to implement this would be

function my2norm(x)
    if isa(x, Number)
        return abs(x)
    elseif isa(x, Vector)
        return sqrt(sum(x.^2))
    elseif isa(x, Matrix)
        return return maximum(svdvals(x))
    else
        println("my function is not defined for this kind of input")
        return -1
    end
end
my2norm (generic function with 1 method)

In the above code block, we defined a function my2norm that uses nested if-else to find the type of x and perform the corresponding operation (absolute value for scalar, euclidean norm for vector, and largest singular value for matrix).

However, there is a smarter and cleaner way to write this function, which provides exactly the same functionality.

mysmart2norm(x::Number) = abs(x)
mysmart2norm(x::Vector) = sqrt(sum(x.^2))
mysmart2norm(x::Matrix) = maximum(svdvals(x))
mysmart2norm(x) = begin println("my function is not defined for this kind of input"); return -1; end
mysmart2norm (generic function with 4 methods)
x = randn(10^2, 10^2);
@test my2norm(x) == mysmart2norm(x)
Test Passed Expression: my2norm(x) == mysmart2norm(x) Evaluated: 19.768341325704956 == 19.768341325704956

This mysmart2norm will lead to cleaner and more maintainable code: when a new type comes to our sight (e.g. an abstract type defined by user), then we don't need to dig into the very long my2norm to add another set of ifelse. A simple mysmart2norm(x::NewType) will do the work. For example, check here to see how JUDI.jl defines norm for judiMultiSourceVector, an abstract type defined for seismic data.

#Type stability

The multiple dispatch feature makes Julia easy to use like Python, and fast to run like C. In order for Julia compiler to produce highly efficient code and show ultimate performance, we should write type-stable code---i.e., given the source code and the concrete type of all arguments, the compiler is able to infer the concrete type of every variable and expression within the method.

Let's consider the following example. funcN takes a sequence of Number (... is the splatting operation) and it aims to compute the product of this entire array.

function funcN(a::Number...)
    c=copy(a[1])
    L=length(a)
    for l=2:L
        c=c*a[l]
    end
    return c
end
funcN (generic function with 1 method)

Now we want to compute the product of the sequence 1, 1.5, 2, 2.5, 3. We have 2 following options:

x1 = (1, 1.5, 2, 2.5, 3)
@benchmark funcN(x1...)
BenchmarkTools.Trial: 10000 samples with 413 evaluations. Range (minmax): 234.228 ns 50.863 μs GC (min … max): 0.00% … 99.35% Time (median): 259.131 ns GC (median): 0.00% Time (mean ± σ): 272.421 ns ± 507.021 ns GC (mean ± σ): 1.86% ± 0.99% ▃ 234 ns Histogram: frequency by time 390 ns < Memory estimate: 96 bytes, allocs estimate: 6.
x2 = (1.0, 1.5, 2.0, 2.5, 3.0)
@benchmark funcN(x2...)
BenchmarkTools.Trial: 10000 samples with 803 evaluations. Range (minmax): 146.661 ns 29.191 μs GC (min … max): 0.00% … 99.30% Time (median): 158.966 ns GC (median): 0.00% Time (mean ± σ): 171.676 ns ± 495.770 ns GC (mean ± σ): 4.99% ± 1.72% ▃ 147 ns Histogram: frequency by time 259 ns < Memory estimate: 96 bytes, allocs estimate: 6.

Interesting! Why do they have different runtimes? Julia provides a macro @code_warntype to check whether the code is type-stable. Let's see.

@code_warntype funcN(x1...)
MethodInstance for funcN(::Int64, ::Float64, ::Int64, ::Float64, ::Int64)
  from funcN(a::Number...) in Main at In[30]:1
Arguments
  #self#::Core.Const(funcN)
  a::Tuple{Int64, Float64, Int64, Float64, Int64}
Locals
  @_3::Union{Nothing, Tuple{Int64, Int64}}
  L::Int64
  c::Union{Float64, Int64}
  l::Int64
Body::Union{Float64, Int64}
1 ─ %1  = Base.getindex(a, 1)::Int64
       (c = Main.copy(%1))
       (L = Main.length(a))
 %4  = (2:L::Core.Const(5))::Core.Const(2:5)
       (@_3 = Base.iterate(%4))
 %6  = (@_3::Core.Const((2, 2)) === nothing)::Core.Const(false)
 %7  = Base.not_int(%6)::Core.Const(true)
└──       goto #4 if not %7
2 ┄ %9  = @_3::Tuple{Int64, Int64}
       (l = Core.getfield(%9, 1))
 %11 = Core.getfield(%9, 2)::Int64
 %12 = c::Union{Float64, Int64}
 %13 = Base.getindex(a, l)::Union{Float64, Int64}
       (c = %12 * %13)
       (@_3 = Base.iterate(%4, %11))
 %16 = (@_3 === nothing)::Bool
 %17 = Base.not_int(%16)::Bool
└──       goto #4 if not %17
3 ─       goto #2
4 ┄       return c

@code_warntype funcN(x2...)
MethodInstance for funcN(::Float64, ::Float64, ::Float64, ::Float64, ::Float64)
  from funcN(a::Number...) in Main at In[30]:1
Arguments
  #self#::Core.Const(funcN)
  a::NTuple{5, Float64}
Locals
  @_3::Union{Nothing, Tuple{Int64, Int64}}
  L::Int64
  c::Float64
  l::Int64
Body::Float64
1 ─ %1  = Base.getindex(a, 1)::Float64
       (c = Main.copy(%1))
       (L = Main.length(a))
 %4  = (2:L::Core.Const(5))::Core.Const(2:5)
       (@_3 = Base.iterate(%4))
 %6  = (@_3::Core.Const((2, 2)) === nothing)::Core.Const(false)
 %7  = Base.not_int(%6)::Core.Const(true)
└──       goto #4 if not %7
2 ┄ %9  = @_3::Tuple{Int64, Int64}
       (l = Core.getfield(%9, 1))
 %11 = Core.getfield(%9, 2)::Int64
 %12 = c::Float64
 %13 = Base.getindex(a, l)::Float64
       (c = %12 * %13)
       (@_3 = Base.iterate(%4, %11))
 %16 = (@_3 === nothing)::Bool
 %17 = Base.not_int(%16)::Bool
└──       goto #4 if not %17
3 ─       goto #2
4 ┄       return c

A red line in the output indicates type-instability in the code. So what happened? In the function, we first read the variable c as the first entry of the sequence. In x1, the first entry is an Int64, so c will get the type of Int64 as well initially. However, c later multiplies with Float64 entries (namely 1.5 and 2.5). Therefore, it is forced to change its type of Float64.

When we are very sure of the input and output types of a function we expect (like in this case we know everything ends up to be Float64), we should keep in mind to write type-stable code so that Julia doesn't find a hard time to struggle on the type of the variables.