Jan 4, 2023

Rust for Java developers, an introduction

Get ready to get started…

So you’re a Java developer, you’ve heard about Rust, and you want to give it a try? This article tries to get you started with the basics: the fundamental differences between Java and Rust, and how to avoid some of the pitfalls, such as fighting the borrow checker too much. The intent is to help you get into the mindset of a Rust developer, coming from Java.

We won’t cover how to install the toolchain here; that part has been covered more than enough. The guide on the Rust website is probably all that you’ll need anyway.

So let’s dive into it and look at how a Java program is different from a Rust program. What better example is there to use than Hello, World!? We’ll build on this example in this introduction to cover a few topics that will lay some of the foundation to get you started.

Running our first program

When we say Java we mean it to be a few things. The language itself of course; the JDK classes; the JDK tools; and the Java Virtual Machine. All of which we need to use to print Hello, World! onto our computer’s screen. Here’s our HelloWorld.java program’s source:

class HelloWorld {
    public static void main(String... args) {
        System.out.println("Hello, World!");
    }
}

Here we declare a class named HelloWorld. It declares a single static method which, as in many C-style languages, is conventionally named main. That method accepts an array of String instances representing the arguments passed to the program, which the runtime populates at invocation time.

The method’s body is fairly straightforward to any Java developer: it looks up the static field out on the class System and invokes the method .println() on that object, passing it the string Hello, World!.

Let’s try to run this. We’re using Java 19 here, so we can just pass the source file to java, without an explicit compilation step:

$ java --version
openjdk 19.0.1 2022-10-18
OpenJDK Runtime Environment (build 19.0.1+10-21)
OpenJDK 64-Bit Server VM (build 19.0.1+10-21, mixed mode, sharing)

$ java HelloWorld.java
Hello, World!

All very straightforward, there really is no magic there. Or… is there? Let’s see how achieving the same result in Rust is slightly different. First, we need to write the equivalent program in Rust. Here is our little hello.rs:

fn main() {
    println!("Hello, World!"); 
}

The source is fairly similar, but you’ve probably noticed that there is no outer class or other decoration around our main method. Well, it isn’t a method; it is a function. One could argue that a static method is pretty close to a function. Here, though, it definitely is a function, as main is attached neither to a class instance nor to a class, the way static methods are in Java. Functions in Rust are declared using a name, as in Java, prefixed by the fn keyword, pronounced "fun". While a function in Rust generally declares a list of typed arguments, just like in Java, our main function, which serves as the entry point into our program, never takes any arguments. We’ll get back to that in a little bit.

Finally the function’s body, within curly braces just as in Java, is not much different. It invokes a function-like macro. In Rust, invocations of these macros are suffixed with !, which makes them easily recognizable. You might be wondering how it is resolved. Unlike in Java, where we explicitly referenced System.out.println(), nothing here indicates where this macro comes from. Its definition can be found at std::println!. Consider std somewhat the equivalent of java.lang, where for instance String can be found; we didn’t have to explicitly import that in our Java program either. The compiler takes care of resolving these for us, and while javac and rustc differ in how they do that, we’ll keep that subject for another time. Speaking of rustc, the Rust compiler, let’s compile our example and finally run our little program. We’ll use version 1.66 of the toolchain here:
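For contrast with the argument-less main, here is a minimal sketch of a function that does declare typed parameters and a return type. The function add and its parameter names are made up for illustration; i32 is Rust’s 32-bit signed integer, comparable to Java’s int.

```rust
// A free function: no enclosing class is needed.
// Parameters are written `name: Type`, the return type follows `->`.
fn add(left: i32, right: i32) -> i32 {
    left + right
}

fn main() {
    // main itself takes no parameters; it simply calls our function.
    println!("{}", add(2, 3)); // prints 5
}
```

In Java this would have been a static int add(int left, int right) method on some class.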

$ rustc --version
rustc 1.66.0 (69f9c33d7 2022-12-12)

$ rustc hello.rs

$ ./hello
Hello, World!

Success! We had to go through an explicit compilation step as, unlike with Java, our program doesn’t require a virtual machine to run. We could have compiled the Java example to native code ahead of time, in which case the resulting binary wouldn’t require a Java runtime to execute either, but we’ll use byte code based execution in this post to illustrate some of the differences between the two languages.

Now that we have seen how to run a basic program in Rust, let’s make it a little more interesting and try to be more specific about who we’d like to greet.

Let’s be a little more precise

Rather than generically saying hello to the whole world, we’d like to greet someone in particular. Let’s do this in a few steps, first extracting a variable containing the subject being greeted, currently World. We’ll keep using that as the default subject when no argument is provided. There isn’t much involved in doing this in Rust:

let who = "World";
println!("Hello, {who}!");

As you can see, this is really straightforward. Using the keyword let we declare the binding, our variable, who. On the second line, println! resolves {who} to the binding we just declared and substitutes its value, World.

All that remains now is overriding the value of who with the argument passed to the program, in case one was actually provided. Again, that seems really straightforward. So much so that I’m sure most of you can picture the Java code below (or a slight variant thereof) as you are reading this:
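As a small sketch of what happens with {who}: the same substitution is available through format!, which builds a String instead of printing. The helper names greet and greet_positional are hypothetical, and the &str parameter type is explained later in this article. Capturing a variable by name inside the braces requires Rust 1.58 or later, which the 1.66 toolchain used here satisfies.

```rust
// `{who}` captures the variable named `who` from the surrounding scope.
fn greet(who: &str) -> String {
    format!("Hello, {who}!")
}

// The positional form: each `{}` is filled by the next argument in order.
fn greet_positional(who: &str) -> String {
    format!("Hello, {}!", who)
}

fn main() {
    let who = "World";
    // Both forms produce exactly the same text.
    assert_eq!(greet(who), greet_positional(who));
    println!("{}", greet(who)); // prints Hello, World!
}
```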

public class Greeter {
    
    public static void main(String... args) {

        var who = "World";
        if (args.length > 0) {
            who = args[0];
        }
        
        System.out.println("Hello, " + who + "!");
    }    
}

But we have a few issues doing the same in our Rust program. First, where do we source the argument from? As mentioned before, our main function takes no arguments… ever. The standard library comes to the rescue here, providing us with std::env::args(). This function returns an Args, which is an iterator over the arguments as individual String instances.

Another issue is that, in Rust, variable bindings are immutable by default. This means that we couldn’t reassign who to a name passed to the program as we did in Java. Or at least not as is. One option would be to declare the binding as mutable. This is done by adding the mut keyword to the binding’s declaration:
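To get a feel for what std::env::args() hands us, here is a minimal sketch that lists every argument with its position. The helper describe is a hypothetical name; it accepts any iterator of String values so that it does not depend on the real command line.

```rust
use std::env;

// Formats one line per argument, prefixed with its zero-based position.
fn describe(arguments: impl Iterator<Item = String>) -> Vec<String> {
    arguments
        .enumerate()
        .map(|(index, argument)| format!("{index}: {argument}"))
        .collect()
}

fn main() {
    // env::args() returns std::env::Args, an iterator yielding String values.
    for line in describe(env::args()) {
        println!("{line}");
    }
}
```

Running the resulting binary with a couple of arguments shows that position 0 is taken by the program itself, something that matters again further down.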

let mut who = "World";

But that is not how you’d do it in idiomatic Rust, and there is a good reason for that. What the line above actually says is that who may be changed at any later point in the program. All we want is to bind it once, to a completely different instance in case one was provided. There are a few ways to bind who conditionally. We chose the if let pattern in the following example, for a few reasons. For now, think of it as the ternary operator expression you know from Java: <booleanExpression> ? <expression1> : <expression2> (an operator that Rust actually lacks). Using it in our Java version, we would have refactored the assignment of the var who in our example above like so:

var who = args.length > 0 ? args[0] : "World";

Below is what our greeter.rs example looks like:

use std::env;

fn main() {

    let arguments: Vec<String> = env::args().collect();

    let anybody_at_all = arguments.get(1);

    let who = if let Some(one) = anybody_at_all {
        one
    } else {
        "World"
    };

    println!("Hello, {who}!");
}

The first line of our program declares the use of the std::env module, which is the equivalent of importing a package in Java. Path segments are separated by two colons (::), whereas Java uses the dot (.). Nothing too different from what you already know.

On line 5, we declare an arguments binding with the let keyword, as we did with who previously. But this time the variable name is followed by a colon and a type. That’s how you declare a variable to be of a certain type in Rust. In this case, we need to declare arguments as being of type Vec<String>, as the compiler can’t infer for us what the iterator of String should be collected into. Vec is a vector, the equivalent of Java’s ArrayList, i.e. a heap-allocated, contiguous, growable array type. Vec makes use of Rust’s generics, just like in Java, to declare the type of the items in the vector, here String. Rust’s String values are UTF-8 encoded, not UTF-16 as in Java, and can be mutated. Finally, as you can infer from how we are populating the Vec<String>, Rust’s iterators are closer to Java’s Streams than to its Iterator type.
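The Stream-like nature of iterators and the need to name the target collection can be sketched as follows. The helpers upper_case and byte_lengths are hypothetical names used only for this illustration.

```rust
// Maps each word to upper case and collects into a Vec<String>,
// much like stream.map(...).collect(Collectors.toList()) in Java.
fn upper_case(words: &[&str]) -> Vec<String> {
    // Here the target collection is named on the binding.
    let upper: Vec<String> = words.iter().map(|word| word.to_uppercase()).collect();
    upper
}

// Same idea, but the target is named on collect() itself (the "turbofish").
fn byte_lengths(words: &[String]) -> Vec<usize> {
    words.iter().map(|word| word.len()).collect::<Vec<usize>>()
}

fn main() {
    let upper = upper_case(&["hello", "world"]);
    println!("{upper:?}");

    // len() counts UTF-8 bytes, not characters: "é" takes two bytes.
    println!("{:?}", byte_lengths(&[String::from("héllo")])); // prints [6]
}
```

Nothing is computed until collect() is called; map on its own only describes the transformation, just as intermediate Stream operations do.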

Now that we have all the arguments passed to our program handy, we need to verify whether something was actually given to us to print instead of "World". We do so on line 7 by using the .get(index) method on our Vec<String>. You’d probably be surprised to notice that we don’t do any bounds check before doing so! You might expect it to return null if the index is out of bounds, yet that would be really surprising, especially coming from Java, as null is a perfectly fine value within an ArrayList<String>. Rust couldn’t do that anyway, as it has no such concept as null. Yes, you heard that right, no null value, ever! Instead, we get an Option<&String> back. Option<T> is conceptually the equivalent of Optional<T> in Java. But unlike the Java version, Option<T> is a proper Algebraic Data Type (ADT), and Rust provides first-class support for these, though that’s for another post. They are declared as enum in Rust, but are different from the enums you know in Java, if only because each value can carry its own state. For now, you need to know that there are Option::Some(T) and Option::None variants, where the former represents the presence of an instance of type T, and the latter the absence thereof. That’s what our binding anybody_at_all now represents: either there is somebody to greet, or no one. Note that Rust uses snake case (anybody_at_all) rather than camel case (anybodyAtAll) for variable names.

You may have noticed that we look up the element at index 1 in the Vec, with .get(1). This is not because Rust indexes start at one. Rust uses zero-based indexing just as Java does. But the first element of the vector (i.e. .get(0)) is the name of the binary as invoked at startup, so probably ./greeter if the working directory contained the binary.

And finally let’s address the ampersand & before String in our Option<&String>. The ampersand in Rust denotes a reference. Since there is no null, we know that an &String points to a String… somewhere. Unlike what we did on line 5, which did construct a Vec<String>, i.e. a vector containing the actual String values, here .get returns an Option of a reference to the String in that vector. No copy of the string is made; the ownership of the String remains with arguments. Rust’s &String is roughly the equivalent of a String variable in Java, which holds a reference to a String object.

Line 9 is where the conditional assignment of who happens, using the if let expression. There are again a few things going on in this somewhat weird line of code, so let’s break it down a bit. First, the if let Some(one) = anybody_at_all expression tests whether anybody_at_all is an Option::Some(T). If that’s the case, the first branch is evaluated; if not, the else branch is. In the former case, the if let expression also performs what is known as destructuring in Rust. The left-hand side of the equals sign represents the form we expect to match. It declares a variable, one, that binds to the value within the Option::Some. That variable is then usable within the scope of the if let branch.
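A minimal sketch of both points: .get() answering with an Option instead of null or an exception, and an enum whose variants carry state. The Subject enum and the lookup helper are hypothetical and not part of the greeter program.

```rust
// A hypothetical enum: the Person variant carries a String of its own,
// so two Person values can hold different names.
#[derive(Debug, PartialEq)]
enum Subject {
    Everyone,
    Person(String),
}

// Thin wrapper around get() to make its return type visible.
fn lookup(arguments: &[String], index: usize) -> Option<&String> {
    arguments.get(index)
}

fn main() {
    let arguments = vec![String::from("./greeter"), String::from("John")];

    println!("{:?}", lookup(&arguments, 1)); // prints Some("John")
    println!("{:?}", lookup(&arguments, 5)); // prints None, no panic and no null

    let subjects = [Subject::Everyone, Subject::Person(String::from("John"))];
    println!("{subjects:?}");
}
```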
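That no copy takes place can be checked by comparing addresses. The following is a sketch under the assumption that pointer identity is a fair way to illustrate it; is_same_string is a hypothetical helper built on std::ptr::eq.

```rust
// True when `candidate` points at the very String stored at `index`
// in `arguments`, i.e. when it is a borrow rather than a copy.
fn is_same_string(arguments: &[String], index: usize, candidate: &String) -> bool {
    std::ptr::eq(&arguments[index], candidate)
}

fn main() {
    let arguments = vec![String::from("./greeter"), String::from("John")];

    // Borrow: a reference into the vector, nothing is copied.
    let borrowed: &String = arguments.get(1).expect("index 1 exists in this sketch");
    // Clone: an independent String with its own buffer.
    let cloned: String = borrowed.clone();

    println!("{}", is_same_string(&arguments, 1, borrowed)); // prints true
    println!("{}", is_same_string(&arguments, 1, &cloned)); // prints false
}
```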

Line 10 is probably also somewhat surprising, or at least as much as line 12 is. They both lack a trailing semicolon (;). In Rust almost everything is an expression. The last expression of a block, when it is not followed by a semicolon, becomes the value the block evaluates to. So what these two lines do is yield either one (a reference to the string value read from the argument list, stored in arguments) or "World" as the value of the if let expression. That value then gets bound to who, which we declared just before the if let expression on line 9. Line 13, on the other hand, has a trailing semicolon. It marks the end of the let statement that assigns to our who variable. We are finally ready to greet, on line 15, as we did at the beginning of our "small" refactoring…

$ rustc greeter.rs

$ ./greeter
Hello, World!

Whereas if we now provide an argument to our program… drumroll!

$ ./greeter John
Hello, John!

It works! Well, that was easy, was it not?
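The rule that a block evaluates to its last expression is not specific to if let. A minimal sketch with a plain if and a bare block follows; the function capped and the variable names are made up for illustration.

```rust
// The function body is a block too: the value of the `if` expression,
// written without a trailing semicolon, is what the function returns.
fn capped(value: u32, limit: u32) -> u32 {
    if value > limit { limit } else { value }
}

fn main() {
    // A bare block is an expression; `named + anonymous` is its value.
    let total = {
        let named = 1;
        let anonymous = 2;
        named + anonymous
    };

    println!("{}", capped(total, 2)); // prints 2
}
```

This is as close as Rust gets to Java’s ternary operator: the if itself produces the value.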

If the if let expression looks somewhat odd to you, there is another solution you can use, one that is very common in Rust as well: the match expression. Using match is close to a switch expression in Java, and it makes use of pattern matching and destructuring as the previous example did. Here is what this looks like:

let who = match anybody_at_all {
    Some(one) => one,
    None => "World",
};

A match is indeed just an expression, so we use it to assign to who again. Unlike the if let expression though, the match expression has to be exhaustive over all Option variants. Since there are only two, Option::Some(T) and Option::None, there isn’t much difference: we end up with two branches again. But this time, we don’t have to declare a block. Each match arm simply is the value the expression evaluates to.

If we wanted to write the real equivalent of our if let expression, we could use a catch-all as the last arm of our match, using the underscore instead of matching explicitly against None, like so: _ => "World".
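Put together, the catch-all variant can be sketched as a small function. The name subject is hypothetical; the body is the match from above with the last arm swapped for the underscore.

```rust
// Picks the name to greet, falling back to "World".
fn subject(anybody_at_all: Option<&String>) -> &str {
    match anybody_at_all {
        Some(one) => one,
        // `_` matches whatever the arms above did not; here that is only None.
        _ => "World",
    }
}

fn main() {
    let john = String::from("John");
    println!("Hello, {}!", subject(Some(&john))); // prints Hello, John!
    println!("Hello, {}!", subject(None)); // prints Hello, World!
}
```

The trade-off of the underscore is that the compiler will no longer point out a forgotten variant, should the enum being matched ever grow a new one.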

Strings are "just" instances of `String`, right?

We’ve been introduced to two important concepts with the example above: the String type and references, denoted with the leading & ampersand character. Let us look a little closer at what’s going on with our different strings in that example. We’ve seen that the Rust compiler can infer many types for us, relieving us of the burden of declaring them explicitly. We can still declare these types explicitly though, and that can be helpful for programmers reading our code, ourselves included. This example might be such a case, so here it goes:

use std::env;

fn main() {
    let everyone: &str = "World";

    let arguments: Vec<String> = env::args().collect();
    let anybody_at_all: Option<&String> = arguments.get(1);

    let who: &str = if let Some(one) = anybody_at_all {
        one
    } else {
        everyone
    };

    println!("Hello, {who}!");
}

The code changed slightly, not only to include the explicit types where possible, but also to add a new variable, everyone, which is our default "World" that we greet when no argument is provided. As you can see on line 4, "World" is no String, but of some &str type instead!? The ampersand should be familiar by now: it’s a reference. But a reference to a str? Before explaining what a str is, let us first understand what happens to "World" when we compile our program. The value everyone points to (it’s a reference, remember?) is indeed somewhat special. Unlike the arguments provided to our program, "World" is known at compile time, as it is right there in our source code. The actual value will as such be present within the binary being compiled, in this case an executable. When the program gets loaded into memory, so does "World", and everyone needs to point to that location in memory. That explains the reference, but not the str instead of String!

There are a few reasons why everyone couldn’t be a String. For one, String instances are mutable in Rust, and this string should not be mutated. That being said, rustc already forbids making that reference mutable, i.e. a &mut str. Also, a String always keeps its contents on the heap. It is actually backed by a Vec containing the bytes that compose the UTF-8 encoded string. A str, on the other hand, is a slice, a view into a sequence of valid UTF-8 bytes, sometimes called a string slice. You don’t really ever use a str directly, but rather through a reference. A str is also said to be !Sized, which means its size is unknown at compile time. The reference is what adds that information: it carries the length of the actual string.

So, our everyone: &str is pointing to some place in memory where the bytes World are present. These might be immediately followed by another string value or just garbage. That is our actual str. The reference encodes that location as well as the length of the string it points to, which would be 5 in this particular case. A String, on the other hand, owns a heap-allocated buffer: it wraps a vector, which itself consists of a pointer to some contiguous memory location, a capacity and a current length. That is how our program’s arguments are made available to us by the env::args() function call.

Let’s take a look at these arguments then. Currently we collect all of them and put them in arguments: Vec<String> on line 6. We do this because, later, we’ll only use a reference to the value to be printed, not the value itself. Here comes another very important difference between Rust and Java: there is no garbage collector. The downside is that a reference now needs to point to something valid, and that something has to eventually be freed when it’s no longer needed. When you look at our code though, there is no free or delete or anything alike you might know from other non-garbage-collected languages.

Rust uses an idiom well known to C++ developers: RAII, which stands for Resource Acquisition Is Initialization. The compiler makes sure that when a value goes out of scope it gets freed (or dropped, as Rust calls it, after its Drop trait). In our example above, it means that at the end of our main function, the memory allocated for our arguments: Vec<String> also gets freed and not leaked. And we need the vector to live that long, as our code uses a reference into it when printing the message to the user when an argument was provided. The Rust compiler makes sure our program only compiles if all references point to something valid. If for instance we tried to save a few lines of code by inlining the arguments variable like so:
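The difference in shape between the two types can be observed with a few standard library calls. This is a sketch only: to_owned_with_room is a hypothetical helper, and the sizes printed by size_of reflect how current compilers lay these types out rather than a promise of the language.

```rust
use std::mem::size_of;

// Copies a string slice into a new, growable String with spare room.
fn to_owned_with_room(text: &str, capacity: usize) -> String {
    let mut owned = String::with_capacity(capacity);
    owned.push_str(text);
    owned
}

fn main() {
    let everyone: &str = "World";
    // A &str carries a pointer and a length...
    println!("&str:   {} bytes, len {}", size_of::<&str>(), everyone.len());

    // ...while a String carries a pointer, a length and a capacity.
    let owned = to_owned_with_room(everyone, 16);
    println!(
        "String: {} bytes, len {}, capacity {}",
        size_of::<String>(),
        owned.len(),
        owned.capacity()
    );

    // Slicing a String yields a &str view into its buffer, without copying.
    let view: &str = &owned[1..4];
    println!("{view}"); // prints orl
}
```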

let anybody_at_all = env::args().collect::<Vec<String>>().get(1);

The code would not compile. The problem is that the Vec<String> owning the argument value that anybody_at_all possibly points to (the &String) is a temporary that is freed at the end of the statement that created it, so the reference would point to something invalid. The compiler gives us a nice error message explaining not only the issue, but also the fix:

$ rustc greeter.rs
error[E0716]: temporary value dropped while borrowed
 --> greeter.rs:6:43
  |
6 |     let anybody_at_all = env::args().collect::<Vec<String>>().get(1);
  |                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^       - temporary value is freed at the end of this statement
  |                          |
  |                          creates a temporary which is freed while still in use
7 |
8 |     let who: &str = if let Some(one) = anybody_at_all {
  |                                        -------------- borrow later used here
  |
help: consider using a `let` binding to create a longer lived value
  |
6 ~     let binding = env::args().collect::<Vec<String>>();
7 ~     let anybody_at_all: Option<&String> = binding.get(1);
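The scope-based cleanup that this error message relies on can be made visible with a type that reports its own destruction. A minimal sketch, in which Resource, DROPPED and drops_in_inner_scope are hypothetical names and the counter exists only so the effect can be measured:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many Resource values have been dropped so far.
static DROPPED: AtomicUsize = AtomicUsize::new(0);

// A hypothetical resource; a real one might wrap a file or a socket.
struct Resource {
    name: String,
}

impl Drop for Resource {
    // Called automatically when a Resource goes out of scope.
    fn drop(&mut self) {
        DROPPED.fetch_add(1, Ordering::SeqCst);
        println!("dropping {}", self.name);
    }
}

// Returns how many drops happened while leaving the inner scope.
fn drops_in_inner_scope() -> usize {
    let before = DROPPED.load(Ordering::SeqCst);
    {
        let inner = Resource { name: String::from("inner") };
        println!("using {}", inner.name);
    } // `inner` goes out of scope here, so drop() runs now
    DROPPED.load(Ordering::SeqCst) - before
}

fn main() {
    let _outer = Resource { name: String::from("outer") };
    println!("drops in inner scope: {}", drops_in_inner_scope());
    println!("end of main");
} // `_outer` is dropped here, after "end of main" is printed
```

A temporary such as the inlined Vec<String> behaves like inner, except that its scope is the single statement it appears in.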

Now, we do not need to keep all of the arguments in memory, as we only use the first one. Here is some code that does just that, first by skipping the program name with .skip(1), and then only keeping the following entry with .next(). Again, this returns an Option, here an Option<String>, as there might be none. That’s the first line here:

let anybody_at_all: Option<String> = env::args().skip(1).next();
let who = anybody_at_all.as_ref().map_or("World", String::as_str);

On the second line, we do a little dance: first we call .as_ref() on the Option<String>, which returns Option<&String>, i.e. a possible reference to a String. The last call, .map_or() does one of two things: either there is no value, in which case it’ll return the first argument, i.e. "World" which is of type &str or, if there is a value, map it by invoking the .as_str() method on the actual value, which returns a &str to our string.
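The little dance can be wrapped in a function to try it with and without a value. Both function names below are hypothetical; the second variant relies on Option::as_deref, a standard library method that turns an Option<String> into an Option<&str> in one step, and is offered as an alternative rather than as what the example above does.

```rust
// The approach from the text: borrow the inner String, then map it to &str.
fn subject(anybody_at_all: &Option<String>) -> &str {
    anybody_at_all.as_ref().map_or("World", String::as_str)
}

// An alternative: as_deref() goes from Option<String> to Option<&str> directly.
fn subject_with_as_deref(anybody_at_all: &Option<String>) -> &str {
    anybody_at_all.as_deref().unwrap_or("World")
}

fn main() {
    let somebody = Some(String::from("John"));
    let nobody: Option<String> = None;

    println!("Hello, {}!", subject(&somebody)); // prints Hello, John!
    println!("Hello, {}!", subject_with_as_deref(&nobody)); // prints Hello, World!
}
```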

Now you might be wondering why we didn’t have to "map" our one: &String to &str explicitly in our previous version:

let who: &str = if let Some(one) = anybody_at_all {
    one
} else {
    everyone
};

Why doesn’t line 10 read one.as_str()? The compiler actually takes care of the conversion for us. Not by calling .as_str() though, but by calling .deref(), which comes from the std::ops::Deref trait (a trait being the closest thing to an interface in Java). So line 10 implicitly becomes one.deref(). This mechanism is called Deref coercion. To call it explicitly yourself, you’d need to add a use statement for that trait at the top of the file.

Alternatively, you can use the explicit notation &**one, which dereferences the &String into a String, that into a str, and then takes a reference to that. That’s quite some weird syntax, one that clippy, the Rust linter, will actually discourage you from using:

warning: deref which would be done by auto-deref
  --> greeter.rs:10:9
   |
10 |         &**one
   |         ^^^^^^ help: try this: `one`
   |

The above being one amongst many reasons to use clippy on your projects. It will at the very least help you get acquainted with the Rust way of writing idiomatic code.
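The three spellings discussed above can be put side by side. In this sketch the function greeting is a hypothetical helper; it also shows that the same coercion applies when a &String is passed where a &str parameter is expected.

```rust
use std::ops::Deref;

// Accepts any string slice; thanks to Deref coercion a &String works too.
fn greeting(who: &str) -> String {
    format!("Hello, {who}!")
}

fn main() {
    let owned = String::from("John");
    let one: &String = &owned;

    let implicit: &str = one; // coercion inserted by the compiler
    let explicit: &str = one.deref(); // needs `use std::ops::Deref`
    let spelled_out: &str = &**one; // the form clippy flags

    assert_eq!(implicit, explicit);
    assert_eq!(explicit, spelled_out);

    println!("{}", greeting(one)); // prints Hello, John!
}
```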

Let’s recap

We’ve seen how to declare functions in Rust using the fn keyword. Functions are the equivalent of Java’s static methods, but aren’t bound to a class. We’ve declared variables using the let keyword. Their type can be inferred by the compiler in many cases, but we can provide that type information explicitly by suffixing the variable name with a colon (:) followed by the type.

With std::option::Option we saw that an enum in Rust is different from the ones we know from Java. We’ve pattern matched over these in a few ways, with if let and match expressions, and leveraged the more functional style API they expose to conditionally assign to a variable.

And finally we thought a bit more about where data lives and its lifecycle, when gathering the first argument provided to our program. We got introduced to two different types of string, String and str, and played with references and dereferencing them.

With all that, the hope is that you are looking at starting some Rust experimentation on your own. Why not just start today, by bootstrapping a small project:

$ cargo new playground
     Created binary (application) `playground` package

$ cd playground

$ find *
Cargo.toml
src
src/main.rs

$ cat src/main.rs
fn main() {
    println!("Hello, world!");
}

$ cargo run
   Compiling playground v0.1.0
    Finished dev [unoptimized + debuginfo] target(s) in 0.78s
     Running `target/debug/playground`
Hello, world!

See, Hello, world!… back to square one! But that will get you started. The build tool cargo is part of the toolchain, the equivalent of Java’s Maven or Gradle, and Cargo.toml is the manifest for your project.

And if you do get started, run cargo clippy regularly on your code base. Unlike the other clippy, I promise this one will be useful! I also gave a presentation that you might find interesting. If you have any feedback or questions, please let me know, as chances are that if anything is unclear to you, it is to someone else as well! And should you be eager for more, feel free to continue to the next post in this series, on ownership, the borrow checker and a few things more!

This post is part of the series Rust for Java developers

Part 1: Rust for Java developers, an introduction
Part 2: Rust for Java developers, part 2
