# Introduction to Julia
Julia is a fast, dynamic, reproducible, composable, general-purpose, and open-source programming language that can be downloaded here. The full documentation of the current release (v1.7) is available here. This tutorial covers the basics of Julia, as well as several of the features on which Julia's speed is based:
- using/adding/removing/developing packages
- basic linear algebra
- just-in-time (JIT) compiler
- multiple dispatch
In Julia, packages are loaded via `using` (much like packages are loaded in Python via `import`). We typically load all the packages we want to use at the start of the file:

```julia
using LinearAlgebra, Test, BenchmarkTools
```
To add or remove a package, we can follow Pkg.jl and run

```julia
using Pkg
Pkg.add("IterativeSolvers")
Pkg.rm("IterativeSolvers")
```
Note that we can also install a Julia package at a specific version, or directly from a GitHub repository.
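A sketch of both, using the same IterativeSolvers package as above; the version number here is only illustrative:

```julia
using Pkg

# install a specific (illustrative) version of a package
Pkg.add(name="IterativeSolvers", version="0.9")

# install directly from a GitHub repository
Pkg.add(url="https://github.com/JuliaLinearAlgebra/IterativeSolvers.jl.git")
```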
We can also do these in Julia's interactive command-line REPL (read-eval-print loop) by typing `]` to enter the package mode:

```julia
] add IterativeSolvers
] rm IterativeSolvers
] add https://github.com/JuliaLinearAlgebra/IterativeSolvers.jl.git
```
The syntax for creating multidimensional arrays is similar to MATLAB. We use square brackets `[ ]` to set up a multidimensional array, and semicolons `;` to separate rows.
```julia
A = [1 2; 3 4]
```
Note that multidimensional arrays are stored in column-major order, meaning that they fill up one column at a time. We can verify this by vectorizing the matrix with `vec`.
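For example, vectorizing the 2-by-2 matrix above returns the entries column by column:

```julia
A = [1 2; 3 4]

# vec stacks the columns: first column [1, 3], then second column [2, 4]
vec(A)  # returns [1, 3, 2, 4]
```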
It is also quite simple to create a vector using square brackets, and Julia treats a 1D vector as a column vector.
```julia
x = [1, 2]
```
There are built-in basic linear algebra operations, such as matrix-vector products and linear solves:

```julia
A * x   # matrix-vector product
A \ x   # solves the linear system A * y = x for y
```
# Just-in-time (JIT) compilation
Julia code is just-in-time compiled: every statement runs using compiled functions which are either compiled right before they are used, or retrieved from a cache of earlier compilations. This is different from, e.g., C, which is compiled before being executed. Due to JIT compilation, the first run of a function/operation typically takes slightly longer, as it needs to compile first. An example is shown below.
```julia
A = randn(Float32, 10^4, 10^4); b = randn(Float32, 10^4);

# first run
@time A * b;

# second run
@time A * b;
```
The first execution of this Float32 matrix and Float32 vector product takes longer and incurs more memory allocations, because it is the first time such functionality runs in the current session. For a robust estimate of the execution time and memory usage, we can use the `@benchmark` macro from the BenchmarkTools.jl package.
```julia
@benchmark A * b
```
# Multiple dispatch

The core design that makes Julia fast is type stability through specialization via multiple dispatch. A single function in Julia can take different types of input, and the function is compiled in specialized ways for each. This specialization brings huge performance gains to Julia. Let's first go over a simple example, namely scalar multiplication, and use the `@which` macro to check the method being used.
```julia
@which 1 * 2
```
In this example, we want to compute the multiplication of 2 integers. As we can see, the underlying method is running the multiplication of integers.
```julia
@which 1.0 * 2.0
```
If we multiply 2 floating-point numbers, we can see that a different method is used, which is designed specifically for the type `Float64`.
```julia
@which 1.0 * 2
```
From here, we can see that Julia can't find a method specialized for exactly these input types to compute `1.0 * 2`. In the end, it falls back to a very generic method that describes the type of `2` as `Number` and performs the multiplication. Note that this is one of the cases we want to avoid during large-scale computations: we usually want to make sure that the input and output types of our functions match.
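A minimal sketch of what happens under the hood in that generic case: Julia promotes both arguments to a common type before multiplying (`promote` and `promote_type` are standard Base functions):

```julia
# the Int argument is converted to Float64 before the multiplication
promote(1.0, 2)             # returns (1.0, 2.0)

# the common type of Float64 and Int is Float64
promote_type(Float64, Int)  # Float64
```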
This example demonstrates that the same function (multiplication `*` in this case) is executed via different compiled code blocks that are specialized for different kinds of input. This feature brings huge performance gains and potentially makes Julia run almost as fast as C/Fortran (which are statically typed). With that being said, we should also take advantage of this when implementing functions: i.e., dispatch a function on multiple types of input.
Let's show a simple example: we can define a `foo` function that works for different kinds of input. Let's first build it to take two strings:

```julia
foo(x::String, y::String) = println("My inputs x and y are both strings!")
```
Notice that `foo` is now a generic function with 1 method (namely, one taking 2 `String`s and printing a sentence). The `foo` function won't work for numbers yet.
```julia
foo(3, 4) # doesn't work here
```
To get `foo` to work on integer (`Int`) inputs, let's add `::Int` annotations to the input arguments when we declare it:

```julia
foo(x::Int, y::Int) = println("My inputs x and y are both integers!")
```
```julia
foo(3, 4) # works now
```
`foo(::Int, ::Int)` works as well. Notice that it now says `foo (generic function with 2 methods)`. This means `foo` is a generic function containing 2 methods: the first works for 2 strings and the second works for 2 integers. In other words, the functionality of `foo` on 2 strings isn't overwritten when you define how to run `foo` on another type of input(s); instead, we just added an additional method to the generic function.
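We can confirm this with `methods`, which lists every method of a generic function. A small self-contained sketch using the two definitions from above:

```julia
foo(x::String, y::String) = println("My inputs x and y are both strings!")
foo(x::Int, y::Int) = println("My inputs x and y are both integers!")

# foo is one generic function with two specialized methods
length(methods(foo))  # returns 2
```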
Now let's walk through an explanatory example to discuss how we should write code in Julia. Suppose we want to build a function that takes the 2-norm of a scalar/vector/matrix (ignoring the existing `norm` functionality). A naive way to implement this would be
```julia
function my2norm(x)
    if isa(x, Number)
        return abs(x)
    elseif isa(x, Vector)
        return sqrt(sum(x.^2))
    elseif isa(x, Matrix)
        return maximum(svdvals(x))
    else
        println("my function is not defined for this kind of input")
        return -1
    end
end
```
In the above code block, we defined a function `my2norm` that uses nested if-else branches to find the type of `x` and perform the corresponding operation (absolute value for a scalar, Euclidean norm for a vector, and largest singular value for a matrix).
However, there is a smarter and cleaner way to write this function, which provides exactly the same functionality.
```julia
mysmart2norm(x::Number) = abs(x)
mysmart2norm(x::Vector) = sqrt(sum(x.^2))
mysmart2norm(x::Matrix) = maximum(svdvals(x))
mysmart2norm(x) = begin
    println("my function is not defined for this kind of input")
    return -1
end
```
```julia
x = randn(10^2, 10^2);
@test my2norm(x) == mysmart2norm(x)
```
`mysmart2norm` leads to cleaner and more maintainable code: when a new type comes along (e.g., an abstract type defined by a user), we don't need to dig into the very long `my2norm` to add another if-else branch. A simple `mysmart2norm(x::NewType)` method will do the work. For example, check here to see how JUDI.jl defines `judiMultiSourceVector`, an abstract type defined for seismic data.
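As a sketch of what such an extension looks like, here is a hypothetical user-defined type (`ScaledVector` is invented for illustration) together with its own `mysmart2norm` method; the `Vector` method from above is repeated so the block is self-contained:

```julia
mysmart2norm(x::Vector) = sqrt(sum(x.^2))

# hypothetical user type: a vector stored with a separate scale factor
struct ScaledVector
    scale::Float64
    data::Vector{Float64}
end

# one extra method is all we need; no edits to existing code
mysmart2norm(x::ScaledVector) = abs(x.scale) * mysmart2norm(x.data)

mysmart2norm(ScaledVector(2.0, [3.0, 4.0]))  # 2 * 5 = 10
```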
The multiple dispatch feature makes Julia easy to use like Python, and fast to run like C. In order for the Julia compiler to produce highly efficient code and reach its full performance, we should write type-stable code: given the source code and the concrete types of all arguments, the compiler should be able to infer the concrete type of every variable and expression within the method.
Let's consider the following example. `funcN` takes a sequence of numbers (where `...` is the splatting operator) and computes the product of the entire sequence.

```julia
function funcN(a::Number...)
    c = a[1]
    L = length(a)
    for l = 2:L
        c = c * a[l]
    end
    return c
end
```
Now we want to compute the product of the sequence `1, 1.5, 2, 2.5, 3`. We have the 2 following options:
```julia
x1 = (1, 1.5, 2, 2.5, 3)
@benchmark funcN(x1...)
```
```julia
x2 = (1.0, 1.5, 2.0, 2.5, 3.0)
@benchmark funcN(x2...)
```
Interesting! Why do they have different runtimes? Julia provides a macro `@code_warntype` to check whether code is type-stable. Let's see.
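A self-contained sketch of the check, repeating the `funcN` and `x1` definitions from above (`@code_warntype` lives in the InteractiveUtils standard library, which the REPL loads automatically):

```julia
using InteractiveUtils

function funcN(a::Number...)
    c = a[1]
    for l = 2:length(a)
        c = c * a[l]
    end
    return c
end

x1 = (1, 1.5, 2, 2.5, 3)

# prints the inferred type of every variable; in the REPL,
# type-unstable variables are highlighted in red
@code_warntype funcN(x1...)
```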
A red line in the output indicates type instability in the code. So what happened? In the function, we first initialize the variable `c` as the first entry of the sequence. In `x1`, the first entry is an `Int64`, so `c` initially gets the type `Int64` as well. However, `c` is later multiplied by `Float64` entries (namely `1.5` and `2.5`), so it is forced to change its type to `Float64` mid-loop, which makes the code type-unstable.
When we are sure of the expected input and output types of a function (as in this case, where we know everything ends up as `Float64`), we should keep in mind to write type-stable code so that Julia doesn't struggle to infer the types of the variables.
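One way to restore type stability here (a sketch, not the only possible fix; `funcNstable` is an invented name) is to promote all entries to a common type before the loop:

```julia
function funcNstable(a::Number...)
    b = promote(a...)   # e.g. (1, 1.5, 2, 2.5, 3) -> all Float64
    c = b[1]
    for l = 2:length(b)
        c = c * b[l]
    end
    return c
end

funcNstable(1, 1.5, 2, 2.5, 3)  # 22.5, with c inferred as Float64 throughout
```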