Rust for Java developers, an introduction
Get ready to get started…
So you’ve heard about Rust and you’re a Java developer who wants to give it a try? This article aims to get you started with the basics: the main differences between Java and Rust, a few pitfalls to avoid, and how not to fight the borrow checker too much. The intent is to help you get into the mindset of a Rust developer, coming from Java.
We won’t cover how to install the toolchain here; that part has been covered more than enough. The guide on the Rust website is probably all that you’ll need anyway.
So let’s dive into it and look at how a Java program is different from a Rust program. What better example is there to use than Hello, World!? We’ll build on this example in this introduction to cover a few topics that will lay some of the foundation to get you started.
Running our first program
When we say Java we mean it to be a few things. The language itself of course; the JDK classes; the JDK tools; and the Java Virtual Machine. All of which we need to use to print Hello, World! onto our computer’s screen. Here’s our HelloWorld.java program’s source:
class HelloWorld {
    public static void main(String... args) {
        System.out.println("Hello, World!");
    }
}
Here we declare a class named HelloWorld. It declares a single static method which, as in many C-style languages, is conventionally named main. That method accepts an array of String instances representing the arguments passed to the program, which the runtime populates at invocation time.
The method’s body is fairly straightforward to any Java developer: it looks up the static field out on the class System and invokes the method .println() on that object, passing it the string Hello, World!.
Let’s try to run this. We’re using Java 19 here, so we can just invoke java, passing in the source file, without first compiling it to bytecode ourselves:
$ java --version
openjdk 19.0.1 2022-10-18
OpenJDK Runtime Environment (build 19.0.1+10-21)
OpenJDK 64-Bit Server VM (build 19.0.1+10-21, mixed mode, sharing)
$ java HelloWorld.java
Hello, World!
All very straightforward, there really is no magic there. Or… is there? Let’s see how achieving the same result in Rust is slightly different. First, we need to write the equivalent program in Rust. Here is our little hello.rs:
fn main() {
    println!("Hello, World!");
}
The source is fairly similar, but you’ve probably noticed that there is no outer class or other decoration around our main method. Well, it isn’t a method: it is a function in this case. One could argue that a static method is pretty close to a function itself. Here, it definitely is a function, as main is attached neither to a class instance nor to a class, the way static methods in Java are. Functions in Rust are declared using a name, as in Java, prefixed by the fn keyword, pronounced "fun". While you could expect a function in Rust to declare a list of typed arguments, just like in Java, our main function here, which serves as the entry point into our program, never takes any arguments. We’ll get back to that in a little bit.
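For comparison, here is a minimal sketch of a function that does declare typed arguments and a return type. The greeting function and its parameters are made up for this illustration and are not part of the article’s program; &str is a string type we’ll come back to later.

```rust
// Hypothetical helper, only here to show typed parameters and a return type.
// Each parameter is written `name: Type`; the return type follows `->`.
fn greeting(who: &str, punctuation: char) -> String {
    // format! builds a String the same way println! prints one.
    format!("Hello, {who}{punctuation}")
}

fn main() {
    // main itself still takes no arguments.
    println!("{}", greeting("World", '!'));
}
```

Note that, unlike Java, the type comes after the parameter name.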
Finally the function’s body, within curly braces just as in Java, is not much different. It invokes a function-like macro. In Rust, invocations of these macros are suffixed with !, which makes them easily recognizable. You might be wondering how it is resolved. Unlike in Java, where you explicitly referenced System.out.println(), nothing indicates where this macro comes from. Its definition can be found at std::println!. Consider std somewhat the equivalent of java.lang, where for instance String can be found, yet we didn’t have to explicitly import it in our Java program either. The compiler takes care of resolving these for us. How javac and rustc each do that differs, and we’ll keep that subject for another time.
Speaking of rustc, the Rust compiler, let’s compile our example. We’ll use version 1.66 of the toolchain here. And finally run our little program:
$ rustc --version
rustc 1.66.0 (69f9c33d7 2022-12-12)
$ rustc hello.rs
$ ./hello
Hello, World!
Success! We’ve had to go through an explicit compilation step as, unlike when using Java, our program doesn’t require a virtual machine to run. We could have compiled the Java example to native code ahead of time, in which case the resulting binary wouldn’t require a Java runtime to execute either, but we’ll use Java bytecode-based execution in this post to illustrate some of the differences between the two languages.
Now that we have seen how to run a basic program in Rust, let’s make it a little more interesting and try to be more specific about who we’d like to greet.
Let’s be a little more precise
We could make our little example a little more interesting by greeting someone in particular rather than just generically saying hello to the whole world. Let’s do this in a few steps, first extracting a variable containing the person being greeted, in its current state: World. We’ll keep on using that as the default subject when no argument is provided. There isn’t much involved in doing this in Rust:
let who = "World";
println!("Hello, {who}!");
As you can see, this is really straightforward. Using the keyword let we declare the binding, our variable, who. On the second line, we let println! resolve the {who} to the binding we just declared and substitute it with World.
All that remains now is overriding the value of who with the argument passed to the program, in case one was actually provided. Again, that seems really straightforward. So much so that I’m sure most of you can picture the Java code below (or a slight variant thereof) as you are reading this:
public class Greeter {
    public static void main(String... args) {
        var who = "World";
        if (args.length > 0) {
            who = args[0];
        }
        System.out.println("Hello, " + who + "!");
    }
}
But we have a few issues doing the same in our Rust program. First, where do we source the argument from? As mentioned before, our main function takes no arguments… ever. The standard library comes to the rescue here, providing us with std::env::args(). This function returns an Args, which is an iterator over the arguments as individual String instances.
Another issue is that, in Rust, variable bindings are immutable by default. This means that we couldn’t reassign a name passed to the program to who as we did in Java. Or at least not as is. One option would be to declare the binding as mutable. This is done by adding the mut keyword to the binding’s declaration:
let mut who = "World";
But that would not be how you’d do that in idiomatic Rust, and there is a good reason not to do it this way. What the line above actually says is that who may be reassigned at any point for as long as it is in scope. The string it points to stays immutable either way. All we want is to bind who once, to either the default or the argument, in case one was provided.
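To see what the compiler accepts and rejects here, consider this small sketch. The rebind function is a hypothetical helper written for this illustration only, and it stands in for "an argument was provided" with a simple emptiness check.

```rust
// Hypothetical helper: returns the given name, or "World" when it is empty.
fn rebind(name: &str) -> &str {
    // `mut` is what allows the assignment further down.
    let mut who = "World";
    if !name.is_empty() {
        who = name;
    }
    who
}

fn main() {
    let who = "World";
    // who = "Java"; // would not compile: error[E0384], `who` is not declared `mut`
    println!("{who} / {} / {}", rebind(""), rebind("Java"));
}
```

This works, but who stays reassignable for the rest of the function, which is more than we need.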
There are a few ways to achieve the binding of who conditionally. We chose the if let pattern in the following example. There are a few reasons for this. But for now think of it as the ternary operator expression you know from Java: <booleanExpression> ? <expression1> : <expression2> (an operator that Rust actually lacks). Using it in our Java version, we would have refactored the assignment of the var who in our example above like so:
var who = args.length > 0 ? args[0] : "World" ;
Below is what our greeter.rs example looks like:
 1  use std::env;
 2
 3  fn main() {
 4
 5      let arguments: Vec<String> = env::args().collect();
 6
 7      let anybody_at_all = arguments.get(1);
 8
 9      let who = if let Some(one) = anybody_at_all {
10          one
11      } else {
12          "World"
13      };
14
15      println!("Hello, {who}!");
16  }
The first line of our program declares the use of the std::env module, which is the equivalent of importing a package in Java. Modules are separated using two colons (::), whereas Java uses the dot (.). Nothing too different from what you already know.
On line 5, we declare an arguments binding with the let keyword as we did with who previously. But this time the variable name is followed by a colon and a type. That’s how you declare a variable to be of a certain type in Rust. In this case, we need to declare arguments as being of type Vec<String> as the compiler can’t infer for us what the iterator of String should be collected into. Vec is a vector, which would be the equivalent of Java’s ArrayList, i.e. a heap-allocated, contiguous, growable array type. Vec makes use of Rust’s generics, just like Java, to declare the type of items in the vector, here String. Rust’s String values are UTF-8 encoded, not UTF-16 as in Java, and can be mutated. Finally, as you can infer from how we are populating the Vec<String>, Rust’s iterators are closer to Java Streams than to its Iterator type.
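To make the comparison with Java Streams a bit more concrete, here is a minimal sketch. The shout_all function and its input are invented for this illustration and are not part of the greeter program; filter, map and collect are standard Iterator methods.

```rust
// Hypothetical helper: drops empty names and upper-cases the others,
// much like names.stream().filter(...).map(...).toList() would in Java.
fn shout_all(names: &[&str]) -> Vec<String> {
    names
        .iter()
        .filter(|name| !name.is_empty())
        .map(|name| name.to_uppercase())
        // The return type tells collect() what to build, here a Vec<String>.
        .collect()
}

fn main() {
    println!("{:?}", shout_all(&["john", "", "jane"]));
}
```

As with env::args().collect(), the target collection has to be known from somewhere: a type annotation on the binding or, as here, the function’s return type.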
Now that we have all the arguments passed to our program handy, we need to verify whether something was actually given to us to print instead of "World". We’re doing so on line 7 by using the .get(index) method on our Vec<String>. But you’d probably be surprised to notice that we don’t do any bounds check before doing so! You could be expecting it to return null if the index is out of bounds, yet that would be really surprising, especially coming from Java, as null is a perfectly fine value within an ArrayList<String>. But Rust couldn’t do that, as Rust has no such concept as null. Yes, you heard that right, no null value, ever! Instead, we get an Option<&String> back. Option<T> is conceptually the equivalent of Optional<T> in Java. But unlike the Java version, this Option<T> is a proper Algebraic Data Type (ADT) and Rust provides first-class support for these, though that’s for another post. They are declared as enum in Rust, but are different from the ones you know in Java, if only because each variant can carry its own data. But for now, you need to know that there are Option::Some(T) and Option::None variants, where the former represents the presence of an instance of type T, while the latter represents the absence thereof. That’s what our binding anybody_at_all now represents: either there is somebody to greet, or no one. Note that Rust uses snake case for variable names rather than camel case (anybodyAtAll).
You may have noticed that we look up the element at index 1 in the Vec, using .get(1). This is not because Rust indexes start at one. Rust uses zero-based indexing just as Java does. But the first element of the vector (i.e. .get(0)) is the name of the binary as invoked at startup, so probably ./greeter if the working directory contained the binary.
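Here is a small sketch of how .get() behaves, using a hand-built vector in place of the real program arguments so that the outcome doesn’t depend on how the binary is invoked. The first_argument function is a hypothetical helper for this illustration.

```rust
// Hypothetical helper: the element at index 1, if there is one.
fn first_argument(arguments: &[String]) -> Option<&String> {
    arguments.get(1)
}

fn main() {
    // Stand-in for what env::args() would yield for `./greeter John`.
    let arguments = vec![String::from("./greeter"), String::from("John")];

    // In bounds: we get Some(reference to the element).
    println!("{:?}", first_argument(&arguments));
    // Out of bounds: no exception and no null, just None.
    println!("{:?}", arguments.get(5));
}
```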
And finally let’s address the ampersand & before String in our Option<&String>. The ampersand in Rust denotes a reference. Since there is no null, we know that an &String points to a String… somewhere. Unlike what we did on line 5, which did construct a Vec<String>, i.e. a vector containing the actual String values, here .get returns an Option of a reference to the String in that vector. There is no copying of the said string happening; the ownership of the String remains with arguments. Rust’s &String is actually the closest equivalent of String in Java, where a variable of type String holds a reference to a String object.
Line 9 is where the conditional assignment of who happens using the if let expression. There are again a few things going on in this somewhat weird line of code, so let’s break it down a bit. First the if let Some(one) = anybody_at_all expression tests whether anybody_at_all is an Option::Some(T). If that’s the case the first branch is evaluated; if not, the else branch is. In the former case, the if let expression also performs what is known as destructuring in Rust. The left-hand side of the equals sign represents the form we expect to match. It declares a variable, one, that will bind the value within the Option::Some. That variable then becomes usable within the scope of the if let branch.
Line 10 is probably also somewhat surprising, or at least as much as line 12 is. They both lack a trailing semicolon (;) at the end. In Rust almost everything is an expression. The last expression of a block, when it has no trailing semicolon, becomes the value the block evaluates to. So what these two lines do is evaluate to either one (a reference to the string value read from the argument list, stored in arguments) or "World" as the value of the if let expression. That value then gets bound to who, which we declared in front of the if let expression on line 9. Line 13 on the other hand has a trailing semicolon. It marks the end of the let statement that assigns to our who variable. So we are finally ready to greet, on line 15, as we did at the beginning of our "small" refactoring…
$ rustc greeter.rs
$ ./greeter
Hello, World!
Whereas if we now provide an argument to our program… drumroll!
$ ./greeter John
Hello, John!
It works! Well, that was easy, was it not?
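The "last expression without a semicolon is the value" rule applies well beyond if let. Here is a minimal sketch of it with a plain block, an if/else chain and a function body. The describe function and the numbers are made up for this illustration.

```rust
// Hypothetical helper: no `return` keyword needed, the value of the
// chosen branch is the value of the whole function body.
fn describe(count: usize) -> &'static str {
    if count == 0 {
        "nobody"
    } else if count == 1 {
        "somebody"
    } else {
        "a crowd"
    }
}

fn main() {
    // A bare block is an expression too.
    let doubled = {
        let base = 21;
        base * 2 // no semicolon: this is what the block evaluates to
    };
    println!("{doubled} {}", describe(1));
}
```

Adding a semicolon after base * 2 would turn it into a statement, and the block would then evaluate to the unit type () instead of a number.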
If the if let expression looks somewhat odd to you, there is another solution you can use, one that is very common in Rust as well: the match expression. Using match is close to a switch expression in Java and makes use of pattern matching and destructuring as did the previous example. Here is what this looks like:
let who = match anybody_at_all {
    Some(one) => one,
    None => "World",
};
A match expression is indeed just an expression, so we use it to assign to who again. Unlike the if let expression though, the match expression has to be exhaustive over all Option variants. Since there are only two, Option::Some(T) and Option::None, there isn’t much difference: we end up with two branches again. But this time, we don’t have to declare a block. We just have each match branch be the value the expression evaluates to. If we wanted to write the real equivalent of our if let expression, we could use a catch-all as the last branch of our match, using the underscore instead of matching explicitly against None, like so: _ => "World".
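Put together, the catch-all variant could look like the sketch below. Wrapping it in a who_to_greet function is our own choice for this illustration, so that both outcomes can be exercised without passing program arguments.

```rust
// Hypothetical helper wrapping the match from the text in a function.
fn who_to_greet(anybody_at_all: Option<&String>) -> &str {
    match anybody_at_all {
        Some(one) => one,
        // Catch-all: anything that is not Some(..), which here can only be None.
        _ => "World",
    }
}

fn main() {
    let name = String::from("John");
    println!("Hello, {}!", who_to_greet(Some(&name)));
    println!("Hello, {}!", who_to_greet(None));
}
```

Matching None explicitly is usually preferable: should a new variant ever appear in an enum you match on, the compiler then points you at every match that needs updating, whereas a catch-all silently swallows it.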
Strings are "just" instances of String, right?
We’ve been introduced to two important concepts with the example above: the String type and references, denoted with the leading & ampersand character. Let us look a little closer at what’s going on with our different strings in the example above. We’ve seen that the Rust compiler can infer many types for us, removing the burden of declaring the types explicitly. We can still declare these types explicitly though, and that can be helpful for programmers reading our code, and that includes ourselves. This example here might be such a case, so here it goes:
 1  use std::env;
 2
 3  fn main() {
 4      let everyone: &str = "World";
 5
 6      let arguments: Vec<String> = env::args().collect();
 7      let anybody_at_all: Option<&String> = arguments.get(1);
 8
 9      let who: &str = if let Some(one) = anybody_at_all {
10          one
11      } else {
12          everyone
13      };
14
15      println!("Hello, {who}!");
16  }
The code changed slightly: not only does it include the explicit types where possible, but there is also a new variable everyone, which is the default "World" we are greeting when no argument was provided. As you can see on line 4, "World" is no String, but of some &str type instead!? The ampersand should be familiar by now: it’s a reference. But a reference to a str? Before explaining what a str is, let us first understand what happens to "World" when we compile our program. The value everyone points to (it’s a reference, remember?) is indeed somewhat special. Unlike the arguments provided to our program, "World" is known at compile time, as it is right there in our source code. The actual value will as such be present within the resulting binary being compiled, in this case an executable. When the program gets loaded into memory, so is "World", and everyone needs to point to that location in memory. That explains the reference, but not the str instead of String!
There are a few reasons why everyone couldn’t be a String. For one, String instances are mutable in Rust, and this string should not be mutated. That being said, rustc already forbids making that reference mutable, i.e. &mut str. Also, a String is heap allocated, always. It is actually backed by a Vec containing the bytes that compose the UTF-8 encoded string. A str, on the other hand, is a slice, a view, into a sequence of valid UTF-8 bytes, sometimes called a string slice. You don’t really ever use a str directly, but rather through a reference. A str is also known to be !Sized, which means its size is unknown at compile time. The reference is what adds that information: it contains the length of the actual string.
So, our everyone: &str is pointing to some place in memory where the bytes World are present. These might be immediately followed by another string value or just garbage. That is our actual str. The reference encodes that location as well as the length of the string it points to, which would be 5 in this particular case. On the other hand, a String keeps its bytes on the heap and wraps a vector, which itself consists of a pointer to some contiguous memory location, a capacity and a current length. That is the form in which our program’s arguments are made available to us by the env::args() function call.
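The sizes involved can be printed with std::mem::size_of, as in this rough sketch. The widths helper is hypothetical. Take the three-word size of String as an observation about current standard library implementations rather than a documented guarantee.

```rust
use std::mem::size_of;

// Hypothetical helper: widths, in bytes, of a `&str` and of a `String`.
fn widths() -> (usize, usize) {
    (size_of::<&str>(), size_of::<String>())
}

fn main() {
    let everyone: &str = "World";
    let owned: String = String::from(everyone);

    let (reference, string) = widths();
    let word = size_of::<usize>();

    // A &str is a pointer plus a length: two machine words.
    println!("&str: {reference} bytes ({} words)", reference / word);
    // A String adds a capacity next to the pointer and the length.
    println!("String: {string} bytes ({} words)", string / word);
    // Both report a length of 5 bytes for "World".
    println!("{} {} {}", everyone.len(), owned.len(), owned.capacity());
}
```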
Let’s take a look at these arguments then. Currently we collect all of them and put them in an arguments: Vec<String> on line 6. We do this because later, we’ll only use a reference to the value to be printed, not the value itself. Here comes another very important difference between Rust and Java: there is no garbage collector. The downside is that now we need to make sure a reference points to something valid. And that something has to eventually be freed when it’s no longer needed. When you look at our code though, there is no free or delete or anything alike you might know from other non-garbage-collected languages. Rust uses an idiom well known to C++ developers: RAII, which stands for Resource Acquisition Is Initialization. The compiler makes sure that when a value goes out of scope it gets freed (or Drop-ped as Rust calls it). In our example above, it means that at the end of our main function, the memory allocated by creating our let arguments: Vec<String> also gets freed and not leaked. Until then it has to stay alive, as our code uses a reference into that vector when printing the message to the user when an argument was provided.
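To watch scope-based cleanup happen, we can implement the Drop trait on a type of our own. This sketch goes slightly beyond what the article has covered so far: Tracked and drop_order are made-up names, and RefCell (plus the 'a lifetime annotation) is only there so that each value can record its own drop in a shared log.

```rust
use std::cell::RefCell;

// Hypothetical type that writes its name into a shared log when dropped.
struct Tracked<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Tracked<'_> {
    // Called automatically by the compiler-inserted cleanup code.
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _outer = Tracked { name: "outer", log: &log };
        {
            let _inner = Tracked { name: "inner", log: &log };
        } // `_inner` goes out of scope here and is dropped
        log.borrow_mut().push("between scopes");
    } // `_outer` goes out of scope here and is dropped
    log.into_inner()
}

fn main() {
    println!("{:?}", drop_order());
}
```

There is no explicit call to drop anywhere: leaving the scope is what triggers it, innermost scope first.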
The Rust compiler makes sure our program only compiles if all references point to something valid. If for instance we tried to save a few lines of code by inlining the arguments variable like so:
let anybody_at_all = env::args().collect::<Vec<String>>().get(1);
The code would not compile. The problem with it is that the Vec<String> that owns the argument’s value that anybody_at_all possibly points to (the &String) is a temporary that is freed at the end of the statement creating it, so the reference would point to something invalid. The compiler gives us a nice error message explaining not only the issue, but also the fix:
$ rustc greeter.rs
error[E0716]: temporary value dropped while borrowed
--> greeter.rs:6:43
|
6 | let anybody_at_all = env::args().collect::<Vec<String>>().get(1);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - temporary value is freed at the end of this statement
| |
| creates a temporary which is freed while still in use
7 |
8 | let who: &str = if let Some(one) = anybody_at_all {
| -------------- borrow later used here
|
help: consider using a `let` binding to create a longer lived value
|
6 ~ let binding = env::args().collect::<Vec<String>>();
7 ~ let anybody_at_all: Option<&String> = binding.get(1);
Now we do not need to keep all of the arguments in memory as we only use the first one. Here is some code that does just that: first by .skip(n)-ping the first entry, the binary’s name, and then only keeping the following one. This gives us an Option<String>, as there might be none. That’s the first line here:
let anybody_at_all: Option<String> = env::args().skip(1).next();
let who = anybody_at_all.as_ref().map_or("World", String::as_str);
On the second line, we do a little dance: first we call .as_ref() on the Option<String>, which returns Option<&String>, i.e. a possible reference to a String. The last call, .map_or(), does one of two things: either there is no value, in which case it’ll return its first argument, i.e. "World", which is of type &str; or, if there is a value, it maps it by invoking the .as_str() method on the actual value, which returns a &str to our string.
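To try both outcomes without passing program arguments, the two steps can be wrapped in a function, as in the sketch below. The greeting function is a hypothetical wrapper for this illustration. The as_deref().unwrap_or() line is an alternative spelling from the standard library that yields the same &str.

```rust
// Hypothetical wrapper around the as_ref() / map_or() dance from the text.
fn greeting(anybody_at_all: Option<String>) -> String {
    // Option<String> -> Option<&String> -> &str (or the default).
    let who: &str = anybody_at_all.as_ref().map_or("World", String::as_str);

    // Alternative: Option<String> -> Option<&str> -> &str (or the default).
    let same: &str = anybody_at_all.as_deref().unwrap_or("World");
    debug_assert_eq!(who, same);

    format!("Hello, {who}!")
}

fn main() {
    println!("{}", greeting(None));
    println!("{}", greeting(Some(String::from("John"))));
}
```

In both spellings anybody_at_all keeps owning the String; who merely borrows from it.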
Now you might be wondering why we didn’t have to "map" our one: &String to &str explicitly in our previous version:
let who: &str = if let Some(one) = anybody_at_all {
    one
} else {
    everyone
};
Why isn’t line 10 reading one.as_str()? The compiler actually takes care of the conversion for us. Not using .as_str() though, but using .deref(), which comes from the trait std::ops::Deref (a trait being the closest thing to an interface in Java). So line 10 becomes one.deref() implicitly. This mechanism is called Deref coercion. To call .deref() explicitly yourself, you’d need to add the use statement for that trait at the top of the file.
Alternatively, you can use the explicit notation &**one, which will deref the &String into a String, that into a str, and then take a reference to the result. That’s quite some weird syntax, one that clippy, the Rust linter, will actually discourage you from using:
warning: deref which would be done by auto-deref
--> greeter.rs:10:9
|
10 | &**one
| ^^^^^^ help: try this: `one`
|
The above is one amongst many reasons to use clippy on your projects. It will at the very least help you get acquainted with the Rust way of writing idiomatic code.
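The different spellings of the &String to &str conversion can be put side by side in a short sketch. All four function names below are made up for this illustration; only Deref, .deref() and to_uppercase() come from the standard library.

```rust
use std::ops::Deref; // needed only for the explicit .deref() call

// Implicit: the compiler inserts the deref because &str is expected.
fn coerced(one: &String) -> &str {
    one
}

// Explicit method call, roughly what the coercion amounts to.
fn explicit(one: &String) -> &str {
    one.deref()
}

// Fully spelled out: &String -> String -> str, then borrowed again.
// This is the form clippy flags as unnecessary.
fn spelled_out(one: &String) -> &str {
    &**one
}

// Coercion also applies to function arguments expecting a &str.
fn shout(text: &str) -> String {
    text.to_uppercase()
}

fn main() {
    let name = String::from("John");
    println!("{} {} {}", coerced(&name), explicit(&name), spelled_out(&name));
    println!("{}", shout(&name)); // a &String passed where &str is expected
}
```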
Let’s recap
We’ve seen how to declare functions in Rust using the fn keyword. Functions are the equivalent of Java’s static methods, but aren’t bound to a class. We’ve declared variables using the let keyword. Their type can be inferred by the compiler in many cases, but we can provide that type information explicitly by suffixing the variable name with a colon : followed by the type.
With Option we saw that an enum in Rust is different from the ones we know from Java. We’ve pattern matched over these in a few ways with if let and match expressions, and leveraged the more functional-style API they expose to conditionally assign to a variable.
And finally we thought a bit more about where data lives and its lifecycle, when gathering the first argument provided to our program. We got introduced to two different types of string, String and str, and played with references and dereferencing them.
With all that, the hope is that you are looking at starting some Rust experimentation on your own. Why not just start today, by bootstrapping a small project:
$ cargo new playground
Created binary (application) `playground` package
$ cd playground
$ find *
Cargo.toml
src
src/main.rs
$ cat src/main.rs
fn main() {
println!("Hello, world!");
}
$ cargo run
Compiling testing v0.1.0
Finished dev [unoptimized + debuginfo] target(s) in 0.78s
Running `target/debug/playground`
Hello, world!
See here, Hello, world! … back to square one! But that would get you started. The build tool cargo is part of the toolchain, the equivalent of Java’s Maven or Gradle, and Cargo.toml is the manifest for your project.
And if you do get started, run cargo clippy regularly on your code base. Unlike the other clippy, I promise this one will be useful! I also gave a presentation that you might find interesting. If you have any feedback or questions, please let me know, as chances are that if anything is unclear to you, it is to someone else as well! And should you be eager for more, feel free to continue to the next post in this series, on ownership, the borrow checker and a few things more!
This post is part of the series Rust for Java developers
- Part 1: Rust for Java developers, an introduction
- Part 2: Rust for Java developers, part 2