
Ziyi (Francis) Yin

# Introduction to Julia

Julia is a fast, dynamic, reproducible, composable, general, and open-source programming language that can be downloaded here. The full documentation of the current release (v1.7) is available here. This tutorial covers the basics of Julia, along with several topics underlying Julia's speed:

1. package management
2. basic linear algebra
3. just-in-time (JIT) compilation
4. multiple dispatch
5. type-stability

## Package management

In Julia, packages are loaded via using (much like packages are loaded in Python via import). We typically load all the packages we want to use at the start of the file:

```julia
using LinearAlgebra, Test, BenchmarkTools
```

To add/remove a package, we can follow Pkg.jl and run

```julia
using Pkg
Pkg.add("IterativeSolvers")
Pkg.rm("IterativeSolvers")
```

Note that we can also install a Julia package at a specific version, or from a GitHub repository

```julia
Pkg.add(name="IterativeSolvers", version="0.9.2")
Pkg.add(url="https://github.com/JuliaLinearAlgebra/IterativeSolvers.jl.git", rev="master")
```

We can also do these in Julia's interactive command-line REPL (read-eval-print loop) via

```
] add IterativeSolvers
] rm IterativeSolvers
] add https://github.com/JuliaLinearAlgebra/IterativeSolvers.jl.git
```

etc.
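For reproducible work, it is also worth knowing about project environments, which record a project's dependencies in their own Project.toml. A minimal sketch (the directory name MyProject is just an example):

```julia
using Pkg

# Activate a dedicated environment; its dependencies are tracked in
# MyProject/Project.toml instead of the global environment.
Pkg.activate("MyProject")
Pkg.status()   # list the packages recorded in this environment
```

Any Pkg.add calls made afterwards are recorded in that environment only.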

## Linear algebra

The syntax for creating multidimensional arrays is similar to MATLAB's. We use square brackets [] to set up a multidimensional array, and a semicolon ; to move to the next row.

```julia
A = [1 2; 3 4]
```

Note that multidimensional arrays are stored in column-major order, meaning that they fill up one column at a time. We can verify this by vectorizing A via vec or [:].

```julia
A[:]
vec(A)
```
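A quick check confirms the column-major layout: vectorizing the matrix above lists the first column before the second.

```julia
A = [1 2; 3 4]
# Columns are stored contiguously, so vectorization walks down each column:
@assert vec(A) == [1, 3, 2, 4]
@assert A[:] == vec(A)   # both forms agree
```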

It is also quite simple to create a vector using square brackets; Julia treats a 1D vector as a column vector.

```julia
x = [1, 2]
```

There are built-in basic linear algebraic operations, such as

```julia
inv(A)
A * x
A \ x
det(A)
```
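As a small sanity check, here is a sketch showing that the backslash operator returns the solution of a linear system, using a right-hand side constructed from a known solution:

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0]
x = [1.0, 2.0]
b = A * x               # right-hand side with known solution x
@assert A \ b ≈ x       # backslash solves A * y = b
@assert inv(A) * b ≈ x  # equivalent, but slower and less accurate in general
@assert det(A) ≈ -2.0   # 1*4 - 2*3
```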

## Just-in-time (JIT) compilation

Julia code is just-in-time compiled---i.e., every statement runs using compiled functions that are either compiled right before they are used or retrieved from a cache of earlier compilations. This is different from, e.g., C, which is compiled before execution. Due to JIT compilation, the first run of a function/operation typically takes slightly longer, since it must compile first. An example is shown below.

```julia
A = randn(Float32, 10^4, 10^4);
b = randn(Float32, 10^4);
# first run
@time A * b;
# second run
@time A * b;
```

The first execution of this Float32 matrix-vector product takes longer and allocates more memory, because the method is compiled on first use in the current session. For a robust estimate of execution time and memory usage, we can use the @benchmark macro from the BenchmarkTools.jl package.

```julia
@benchmark A * b
```
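Compilation is cached per method specialization: calling the same function with a new argument type triggers a fresh compile. A minimal sketch with a small user-defined function (the name square is just an example):

```julia
square(x) = x * x

@time square(2)      # first Int call: includes compilation time
@time square(3)      # second Int call: uses the cached compiled method
@time square(2.0)    # first Float64 call: compiles a new specialization
```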

## Multiple dispatch

The core design that makes Julia fast is type specialization via multiple dispatch. A single function in Julia can take different types of input, and the function is compiled in a specialized way for each. This specialization brings huge performance gains. Let's first go over a simple example, scalar multiplication, and use the macro @which to check which method is being used.

```julia
@which 1 * 2
```

In this example, we want to compute the product of 2 integers. As we can see, the method used is the one defined for integer multiplication.

```julia
@which 1.0 * 2.0
```

If we multiply 2 floating-point numbers, a different method is used, one designed specifically for double-precision numbers (Float64 in Julia).

```julia
@which 1.0 * 2
```

Here, Julia cannot find a method specialized for the input types of 1.0 * 2. It falls back to the generic method for Number, which promotes both operands to a common type before multiplying. Note that this is one of the situations we want to avoid in large-scale computations: we usually want the input and output types of our functions to be the same concrete type.
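We can confirm the promotion directly: the mixed-type product is computed by converting both operands to a common type, so the result is a Float64.

```julia
# Mixed Int/Float64 multiplication promotes both operands to Float64:
@assert typeof(1.0 * 2) === Float64
@assert promote_type(Float64, Int) === Float64
# Keeping both operands the same concrete type avoids the generic path:
@assert typeof(1.0 * 2.0) === Float64
```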

This example demonstrates that the same function (multiplication * in this case) is executed via different compiled code blocks that are specialized for different kinds of input. This feature brings a huge performance gain and allows Julia to run almost as fast as C/Fortran (which are statically typed). With that said, we should also take advantage of multiple dispatch when implementing functions---i.e., define methods of a function for multiple input types.

Let's show a simple example: we can define a foo function that works for different kinds of input. Let's first build it to take String inputs.

```julia
foo(x::String, y::String) = println("My inputs x and y are both strings!")
foo("hello", "hi!")
```

Notice that foo is now a generic function with 1 method (namely, the one taking two Strings and printing a sentence). The foo function won't yet work for numbers.

```julia
foo(3, 4) # doesn't work here
```

To get foo to work on integer (Int) inputs, let's add ::Int annotations to the input arguments when we declare foo.

```julia
foo(x::Int, y::Int) = println("My inputs x and y are both integers!")
foo(3, 4) # works now
```

Now foo(::Int, ::Int) works as well. Notice that Julia reports foo (generic function with 2 methods): foo is a single generic function containing 2 methods, one for two Strings and one for two Ints. Defining foo for a new type of input does not overwrite its behavior on Strings; it simply adds another method to the generic function foo.

```julia
foo("hello", "hi!")
```
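The method table of a generic function can be inspected with methods. A self-contained sketch with a hypothetical function greet:

```julia
greet(x::String) = "a string"
greet(x::Int)    = "an integer"

# One generic function, two methods -- dispatch picks by argument type:
@assert greet("hi") == "a string"
@assert greet(3)    == "an integer"
@assert length(methods(greet)) == 2
```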

Now let's look at an illustrative example of how code should be written in Julia. Suppose we want a function that takes the 2-norm of a scalar/vector/matrix (ignoring the existing norm function for the sake of the example). A naive implementation would be

```julia
function my2norm(x)
    if isa(x, Number)
        return abs(x)
    elseif isa(x, Vector)
        return sqrt(sum(x.^2))
    elseif isa(x, Matrix)
        return maximum(svdvals(x))
    else
        println("my function is not defined for this kind of input")
        return -1
    end
end
```

In the above code block, we defined a function my2norm that uses nested if-else branches to find the type of x and perform the corresponding operation (absolute value for a scalar, Euclidean norm for a vector, and largest singular value for a matrix).

However, there is a smarter and cleaner way to write this function, which provides exactly the same functionality.

```julia
mysmart2norm(x::Number) = abs(x)
mysmart2norm(x::Vector) = sqrt(sum(x.^2))
mysmart2norm(x::Matrix) = maximum(svdvals(x))
mysmart2norm(x) = begin println("my function is not defined for this kind of input"); return -1; end
x = randn(10^2, 10^2);
@test my2norm(x) == mysmart2norm(x)
```

This mysmart2norm design leads to cleaner and more maintainable code: when a new type arises (e.g., a user-defined abstract type), we don't need to dig into a long my2norm to add another if-else branch. A single mysmart2norm(x::NewType) method does the job. For example, check here to see how JUDI.jl defines norm for judiMultiSourceVector, an abstract type defined for seismic data.
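To illustrate, here is a sketch of extending mysmart2norm to a hypothetical user-defined type, ScaledVector, which stores a vector together with a scale factor:

```julia
mysmart2norm(x::Number) = abs(x)
mysmart2norm(x::Vector) = sqrt(sum(x.^2))

# A hypothetical user-defined type: a vector stored with a scale factor.
struct ScaledVector
    scale::Float64
    data::Vector{Float64}
end

# Supporting the new type is a one-line method; existing code is untouched:
mysmart2norm(v::ScaledVector) = abs(v.scale) * mysmart2norm(v.data)

mysmart2norm(ScaledVector(2.0, [3.0, 4.0]))   # 2 * 5 = 10
```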

## Type stability

Multiple dispatch makes Julia as easy to use as Python and as fast to run as C. In order for the Julia compiler to produce highly efficient code, we should write type-stable code---i.e., code where, given the source and the concrete types of all arguments, the compiler can infer the concrete type of every variable and expression within the method.

Let's consider the following example. funcN takes a sequence of Number arguments (... is the splatting operator) and computes the product of the entire sequence.

```julia
function funcN(a::Number...)
    c = a[1]
    L = length(a)
    for l = 2:L
        c = c * a[l]
    end
    return c
end
```

Now we want to compute the product of the sequence 1, 1.5, 2, 2.5, 3. We have the following 2 options:

```julia
x1 = (1, 1.5, 2, 2.5, 3)
@benchmark funcN(x1...)
x2 = (1.0, 1.5, 2.0, 2.5, 3.0)
@benchmark funcN(x2...)
```

Interesting! Why do they have different runtimes? Julia provides the macro @code_warntype to check whether code is type-stable. Let's see.

```julia
@code_warntype funcN(x1...)
@code_warntype funcN(x2...)
```

A red annotation in the output indicates type instability in the code. So what happened? In the function, we initialize the variable c with the first entry of the sequence. In x1, the first entry is an Int64, so c starts out as an Int64. However, c is later multiplied by Float64 entries (namely 1.5 and 2.5), so it is forced to change its type to Float64.

When we know the input and output types a function should have (as here, where everything ends up a Float64), we should keep in mind to write type-stable code, so that the compiler does not have to juggle changing variable types at runtime.
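One way to make the example type-stable (a sketch; the name funcN_stable is just an example) is to promote all arguments to a common concrete type before the loop, so that c keeps a single type throughout:

```julia
function funcN_stable(a::Number...)
    p = promote(a...)     # convert all entries to a common concrete type
    c = p[1]              # c is now, e.g., Float64 and stays Float64
    for l = 2:length(p)
        c *= p[l]
    end
    return c
end

funcN_stable(1, 1.5, 2, 2.5, 3)   # 22.5, with no type change inside the loop
```

Running @code_warntype on this version with the mixed-type tuple x1 should no longer flag the loop variable.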