Scripting Elixir applications

Scripting an application means automating some of its functionality: bundling together basic actions that the application exposes and executing that bundle as if it were a built-in feature. This is powerful because users can (easily) create new functionality themselves. This post describes how we can provide such a capability to Elixir applications.

Script and extension languages

A script language is a programming language dedicated to automating tasks within a given environment. Typical examples are operating systems, which come with various shells that we use to clean file systems, schedule backups, manage users, etc. Many other languages are tagged as “script languages”: think of Perl, JavaScript, Ruby, PHP, Lua, Python and Guile, to list a few. They all share some characteristics:

they are interpreted even though some can be compiled into a byte code or even into native code
they are most of the time general purpose languages: they can compute on various data types and they have access, directly or via libraries, to the operating system (file system, network, sometimes graphics, …), etc.
they often offer a REPL to the programmer so they are quite easy to learn and to experiment with
some of them are embeddable by design into other applications.

Embeddable means that the scripting language can be integrated into an application so that it will be able to execute some code that had not been defined when the application was itself implemented. Think for example about an application that reads its data files in only one format. If a new format is to be processed, it would be “just” a matter of defining a new read function that would transform the data from the new format into the internal format. When a script language is embeddable into an application, it is often called an extension language.

Many major applications have an extension language and I believe that they are major because they can be extended by regular users and not only by developers. Conversely, an application that doesn’t provide such a capability is more likely to disappear and be replaced by one that does. To be convinced, imagine the internet without JavaScript (the older ones may remember Mosaic), MS Office without VB or AutoCAD without AutoLisp. On the open source side, Emacs, despite its Spartan interface, comes with thousands of extensions written in ELisp, GIMP is extended with Scheme, Sublime Text with Python, and Lua itself extends hundreds of applications. Extensibility is a future-proof feature and it is very important to take it into account in the early stages of application design.

The nature of the language itself is not a requirement per se: being object oriented, functional or whatever else is mainly a matter of taste. Yet we won’t be the only users of that language, and the easier it is to learn, the broader the adoption of the application will be.

How does this fit in Elixir’s landscape

Elixir compiles its code down to bytecode that is interpreted by a virtual machine called the beam (well, to be precise, it compiles and executes the code). The beam is actually the VM for the Erlang language and Elixir is just one of several languages that run on the beam.

There are several ways one can think of to integrate an extension language with an application running on the beam.

Integrate the interpreter as a NIF

In the Erlang and beam ecosystem, a NIF (Native Implemented Function) is a function, typically implemented in C, that is linked to the beam VM itself. This makes that function available to Erlang (and Elixir) just like a “regular” function. If we take a language that comes bundled as a C library, it should be possible to write some glue code so that the application and the language can interact. An example is esqlite, an Erlang NIF that encapsulates the sqlite3 database engine and its SQL interpreter.

Never forget, however, that NIFs are dangerous as they can crash the whole beam.

Integrate the interpreter via a port

There is a much more traditional solution in the beam ecosystem to enable communication between the beam and an external process: a port. Basically, a port is a pipe which is connected on one end to a beam process (called the port owner or the connected process) and on the other end to a process spawned by the OS. On the OS process side, communication goes through standard input and output. Because the connected process is a beam process, it can be managed and supervised the Erlang/Elixir way. As an example, the Apache CouchDB database, written in Erlang, uses the SpiderMonkey JavaScript engine for its query language.
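To make the port mechanism more concrete, here is a minimal sketch that uses the Unix `cat` program as a stand-in for an external interpreter. It assumes a Unix-like system with `cat` on the PATH; a real integration would spawn the interpreter executable instead and would need a framing protocol for requests and replies.

```elixir
# Assumption: a Unix-like system where the `cat` executable is on the PATH.
# `cat` simply echoes its standard input, standing in for a real interpreter.
cat = System.find_executable("cat")

# The calling process becomes the port owner (the connected process).
port = Port.open({:spawn_executable, cat}, [:binary])

# Data sent to the port arrives on the external program's standard input.
Port.command(port, "hello from the beam\n")

# Whatever the program writes to standard output comes back as a message.
reply =
  receive do
    {^port, {:data, data}} -> data
  after
    1_000 -> :timeout
  end

IO.inspect(reply, label: "reply")
Port.close(port)
```

Because the reply is an ordinary message, the port owner can be a GenServer placed under a supervisor like any other process.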

Write the full interpreter in Elixir (or Erlang)

In this scenario, the interpreter engine is itself written in Elixir. To be executed, programs are transformed into some internal representation, typically an AST (Abstract Syntax Tree), and possibly further into a bytecode, and evaluated on the fly. An example is Lispex, a toy Lisp interpreter.
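As a rough illustration of the idea (and not of how Lispex itself works), the sketch below evaluates a tiny arithmetic AST made of nested tuples. The module name TinyInterp and the node shapes are hypothetical, chosen only for this example.

```elixir
defmodule TinyInterp do
  # Hypothetical AST: a number is a leaf, {:add, l, r} and {:mul, l, r} are nodes.

  # A number evaluates to itself.
  def eval(n) when is_number(n), do: n

  # Inner nodes evaluate their children first, then combine the results.
  def eval({:add, left, right}), do: eval(left) + eval(right)
  def eval({:mul, left, right}), do: eval(left) * eval(right)
end

# 1 + (2 * 3)
IO.inspect(TinyInterp.eval({:add, 1, {:mul, 2, 3}}), label: "result")
```

A full interpreter would add a parser that produces such a tree from source text, plus an environment for variables and functions.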

Compile the interpreter into the beam bytecode

Just as Elixir compiles to beam bytecode, we can pick another language that also compiles to beam bytecode, integrate it as a library in the application and use it as the extension language.

There are about twenty languages that run on the beam (see the lists on GitHub and on the Erlang Ecosystem Foundation site). Most of them are functional because of the nature of the beam itself, which was designed in the first place for a functional language, Erlang. There are some flavors of Lisp, some languages from the ML family, a Haskell, a Prolog and some unclassifiable languages whose main common trait is to be statically typed. There are also a few actively maintained imperative languages: Lua and PHP. Published in November this year, this post explains how Lua can be used from within Erlang and Elixir to make configurations more flexible.

And in this zoo, there is a nice functional language called Elixir which can also be used to script applications written in… Elixir. Let’s give it a try.

Getting a taste of scripting with Elixir

Elixir’s core API contains a module called Code whose documentation describes it as “Utilities for managing code compilation, code evaluation, and code loading.”

Evaluating strings

eval_string(string, binding \\ [], opts \\ []) evaluates Elixir code contained in the string. For example:

iex(39)> Code.eval_string("1+2")
{3, []}

The first element of the tuple is the result and the second is a keyword list of bindings which, in this example, is empty.

Let’s try with some variables:

iex(45)> Code.eval_string("c = a + b")
Warning: variable "a" does not exist and is being expanded to "a()", ...
...
** (CompileError) nofile:1: undefined function a/0
...

First, notice that the error is a CompileError: our code gets compiled and is not just interpreted, which matters for performance. The error says that the variables a and b don’t exist in the execution context of a+b.

To solve the issue, we need to define the variables and write:

iex(48)> Code.eval_string("""
...(48)> a=1
...(48)> b=2
...(48)> c=a+b
...(48)> """)
{3, [a: 1, b: 2, c: 3]}

or to explicitly write the newline characters:

iex(50)> Code.eval_string("a=1\n b=2\n c=a+b")
{3, [a: 1, b: 2, c: 3]}

or use the binding parameter to define the values of a and b:

iex(54)> Code.eval_string("c=a+b", [a: 1, b: 2])
{3, [a: 1, b: 2, c: 3]}
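Since evaluation problems such as the CompileError above surface as exceptions, an application that evaluates user-provided strings may want to trap them. The following is a minimal sketch; the SafeEval module and its tagged return values are hypothetical conventions for this example, not part of the Code module.

```elixir
defmodule SafeEval do
  # Hypothetical helper: returns tagged tuples instead of letting
  # compilation or parsing errors crash the caller.
  def run(source, binding \\ []) do
    {result, new_binding} = Code.eval_string(source, binding)
    {:ok, result, new_binding}
  rescue
    error in [CompileError, SyntaxError, TokenMissingError] ->
      {:error, Exception.message(error)}
  end
end

# With bindings for a and b the expression evaluates normally.
IO.inspect(SafeEval.run("a + b", a: 1, b: 2))

# Without them the compilation error is returned as a value.
IO.inspect(SafeEval.run("c = a + b"))
```

Only compile-time and parse errors are trapped here; errors raised while the script runs (an ArithmeticError, for instance) would still propagate to the caller.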

The code contained in the string can be of any complexity. We can for example define a module with functions:

iex(80)> s="defmodule M do\n def add(a, b) do\n a+b\n end\n end"
"defmodule M do\n def add(a, b) do\n a+b\n end\n end"
iex(81)> Code.eval_string(s)
{{:module, M,
  <<70, 79, 82, 49, 0, 0, ...>>, {:add, 2}}, []}
iex(82)> M.add(3, 6)
9

Note that here the result of the evaluation is the value returned by defmodule, which includes the compiled bytecode, as there is no other calculation. A module written as a one-line string is not very readable, but the string can also be read from a file. Let’s define the file M.exs to be:

defmodule M do
  def add(a, b), do: a + b
  def sub(a, b), do: a - b
end

then:

iex(104)> {:ok, code} = File.read("M.exs")
{:ok,
  "defmodule M do\n  def add(a, b), do: a + b\n  def sub(a, b), do: a - b\nend\n"}
iex(105)> Code.eval_string(code)
{{:module, M,
  <<70, 79, 82, 49, ...>>, {:sub, 2}}, []}
iex(106)> M.add(M.sub(2, 1), M.sub(5, 4))
2

Evaluating files

Evaluating a file is a matter of loading the file content into a string and evaluating that string. There is a convenience function that encapsulates all this, eval_file(file, relative_to \\ nil). As with eval_string, this function returns the result of the evaluation and the bindings.
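As a quick illustration, the sketch below writes a small script to a temporary file and evaluates it. The file name eval_file_demo.exs is an arbitrary choice for this example.

```elixir
# Hypothetical script location inside the system temporary directory.
path = Path.join(System.tmp_dir!(), "eval_file_demo.exs")
File.write!(path, "a = 1\nb = 2\na + b\n")

# The result is the value of the last expression of the file; the bindings
# are the variables defined at the top level of the script.
{result, bindings} = Code.eval_file(path)

IO.inspect(result, label: "result")
IO.inspect(Keyword.fetch!(bindings, :a), label: "a")
IO.inspect(Keyword.fetch!(bindings, :b), label: "b")

File.rm!(path)
```

Looking values up by key with Keyword.fetch!/2 avoids relying on the order in which the bindings are returned.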

Other interesting functions

There are some other compilation functions:

compile_string(string, file \\ "nofile") compiles the code contained in string and returns a list of tuples, each made of a module name and its generated bytecode. The optional file parameter is used for reporting warnings or errors, if any, as if the code came from that file.
compile_file(file, relative_to \\ nil) compiles the content of the file.
require_file(file, relative_to \\ nil) compiles the content of the file. The difference with compile_file is that if the file is required several times, even by several processes concurrently, it gets compiled only once.

Finally, the module contains functions that make sure a module has been compiled or loaded, that set specific compiler options, etc. Going through the documentation of the Code module also helps to understand how the whole Elixir system works.
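To give a feel for these functions, here is a small sketch built on compile_string and ensure_loaded?. The module name CompileDemo and the file label compile_demo.exs are made up for the example.

```elixir
source = """
defmodule CompileDemo do
  def hello, do: :world
end
"""

# compile_string returns one {module, bytecode} tuple per module defined
# in the string; the second argument is only used in warnings and errors.
[{mod, bytecode}] = Code.compile_string(source, "compile_demo.exs")

IO.inspect(mod, label: "module")
IO.inspect(is_binary(bytecode), label: "bytecode is a binary")

# The freshly compiled module is loaded and can be called right away.
IO.inspect(Code.ensure_loaded?(mod), label: "loaded")
IO.inspect(mod.hello(), label: "hello/0")
```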

Scripting an Elixir application with Elixir

A dummy application for testing

As we’ll be doing some experiments with code, let’s create a dummy command line application that we will modify to illustrate our tests. Each test sits on its own branch. The repository is hosted on GitLab.

To initialize the project, type:

$ mix new se --module SE
$ cd se

Then edit the mix.exs file and add the line escript: [main_module: SE] in the project function:

def project do
  [
    app: :se,
    version: "0.1.0",
    elixir: "~> 1.10",
    start_permanent: Mix.env() == :prod,
    escript: [main_module: SE],           # <<<<< here
    deps: deps()
  ]
end

and finally replace the content of the lib/se.ex file with:

defmodule SE do

  def main(args) do
    IO.inspect(args, label: "Command Line Arguments")
  end

end

Finally, compile the application and run the generated executable, just to make sure everything works properly:

$ mix escript.build
Generated escript se with MIX_ENV=dev
$ ./se 1 2 3
Command Line Arguments: ["1", "2", "3"]

Perfect! This basic code is on branch master.

Reading data from an external script file

Imagine a basic use case where we want to read some constants defined in a configuration file. We’ll first modify the se.ex file to make it load and evaluate a file as discussed in the previous section:

defmodule SE do
  
  def main([]), do: main(["-f", "init.cnf"])
  
  def main(["-f", filename]) do
    IO.puts("This is Elixir code. Config file is #{filename}")
    {_res, bind} = Code.eval_file(filename)
    IO.puts("Result=#{inspect(bind, pretty: true)}")
    
    server = bind[:server]
    port = Keyword.get(bind, :port, 80)
    url = "http://#{server}:#{port}/"
    IO.puts("url=#{url}")
  end
  
  def main(any) do
    IO.puts("Error in argument list #{inspect(any)}")
  end

end

Notice there are three clauses of the main/1 function: thanks to pattern matching, this is a quick way to parse command line parameters. The “interesting main” is the second one: it loads the file whose name is provided after the -f switch and evaluates it. If we define a configuration file, init.cnf, with the following content:

server = "myserver.com"
port = 8080

Code.eval_file() would return a tuple where the first element is the result of the evaluation, that is the value of the last expression, 8080, and the second a keyword list of variable bindings: [port: 8080, server: "myserver.com"]. From within the main() function, we get access to the parameters with, for example, server = bind[:server], or port = Keyword.get(bind, :port, 80) to make the port parameter optional with a default value of 80. This code is on branch configuration-1.

Calling application’s code from the script

One step further is to make the script file call code from the application’s core. Taking the example above, we can define the url variable in the script file rather than in the core application. We’ll also modify the definition of the variables to illustrate that, as the script file contains regular Elixir code, we can use any valid Elixir data structure:

endpoint = %{
  :server => "myserver.com",
  :port   => 8080,
  :path   => "/my/doc"
}

url = "http://#{endpoint[:server]}:#{endpoint[:port]}#{endpoint[:path]}"

The url is built during script evaluation, which calls string interpolation and concatenation functions.

We modify the content of se.ex accordingly:

  def main(["-f", filename]) do
    ## as before

    server = bind[:endpoint].server
    port = bind[:endpoint].port
    path = bind[:endpoint].path
    url = bind[:url]
	
    IO.puts("server=#{server}")
    IO.puts("port=#{port}")
    IO.puts("path=#{path}")
    IO.puts("url=#{url}")
  end

Check out the branch configuration-2 for the code.

Of course, the script code can call any function available to the application. Try for example to add the following at the beginning of the script file:

require Logger
Logger.info("Logging from the script")

This also holds for functions you may have defined yourself in your application. Let’s define the following function in se.ex:

  def pretty_print(msg) do
    IO.puts("Message: #{msg}")
  end

and call it the following way from the script: SE.pretty_print("Start loading configuration file").

Check out the branch configuration-3 for the code.

Calling code defined in the script file

Conversely, we can call code defined in the script file, as we have quickly seen above with the M module. Let’s reuse it and modify se.ex to make it load the M.exs file:

defmodule SE do

  def main(_args) do
    Code.eval_file("M.exs")

    x1 = M.add(1, 2)
    IO.puts("x1=#{inspect(x1)}")

    x2 = M.sub(10, M.add(1, 1))
    IO.puts("x2=#{inspect(x2)}")
  end

end

When we build the executable (with mix escript.build), the compilation succeeds but we get a bunch of warnings:

$ mix escript.build
Compiling 1 file (.ex)
warning: M.add/2 is undefined (module M is not available or is yet to be defined)
Found at 2 locations:
  lib/se.ex:6: SE.main/1
  lib/se.ex:9: SE.main/1

warning: M.sub/2 is undefined (module M is not available or is yet to be defined)
  lib/se.ex:9: SE.main/1

Generated se app
Generated escript se with MIX_ENV=dev
$

Obviously, the module M is not available at compile time as it is only loaded later, when the script is executed. The program itself runs fine:

$ ./se 
x1=3
x2=8
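One possible way to avoid these compile-time warnings, sketched here under the assumption that the script defines a single module, is to take the module name from the evaluation result and call it dynamically. The module name ScriptMath is hypothetical and the script is inlined as a string to keep the example self-contained.

```elixir
source = """
defmodule ScriptMath do
  def add(a, b), do: a + b
end
"""

# defmodule returns {:module, name, bytecode, last_value}; we only keep the name.
{{:module, mod, _bytecode, _last}, _bindings} = Code.eval_string(source)

# The compiler cannot check calls made through apply/3, so it emits no warning.
x1 = apply(mod, :add, [1, 2])

# function_exported?/3 tells whether the script actually provides a function.
x2 =
  if function_exported?(mod, :sub, 2) do
    apply(mod, :sub, [10, 2])
  else
    :not_defined
  end

IO.inspect(x1, label: "x1")
IO.inspect(x2, label: "x2")
```

The trade-off is that a typo in a function name is only detected at run time, which is arguably the nature of scripted code anyway.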

If we want an interactive session with iex, we’ll run into the same issue as during the compilation: the module M is not available. To make it available, we have to execute the main function so that it loads and compiles the source code of the module M:

$ iex -S mix
Erlang/OTP 22 [erts-10.6.4] ...

Interactive Elixir (1.10.1) ...
iex(1)> M.add(1, 2)
** (UndefinedFunctionError) function M.add/2 is undefined (module M is not available)
    M.add(1, 2)
iex(1)> SE.main([])
x1=3
x2=8
:ok
iex(2)> M.add(1, 2)
3
iex(3)> 

Conclusion

We have seen in this post a whole spectrum of possibilities for using Elixir as the scripting language of an Elixir application. At one end, it can be used to define basic configuration constants, just like an .ini file. At the other end, it can be used to write the full business logic of the application, as the ELisp files do with Emacs. Where to put the boundary is a matter of architectural taste, of the flexibility we want to provide to end users, etc. We also need to be aware that this provides a way to execute arbitrary code on the beam VM, so security is an important factor in that choice.
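At the configuration end of the spectrum, one conceivable safeguard is to parse the script without running it and to accept it only if it consists of plain assignments of literals. The sketch below is an illustration under that assumption; the ConfigGuard module is hypothetical and is far from a complete sandbox.

```elixir
defmodule ConfigGuard do
  # Hypothetical helper: accepts only scripts made of `name = literal` lines.
  def safe?(source) do
    case Code.string_to_quoted(source) do
      # Several expressions are wrapped in a :__block__ node.
      {:ok, {:__block__, _meta, exprs}} -> Enum.all?(exprs, &assignment?/1)
      # A single expression comes back unwrapped.
      {:ok, expr} -> assignment?(expr)
      {:error, _reason} -> false
    end
  end

  # A variable on the left-hand side, a literal on the right-hand side.
  defp assignment?({:=, _meta, [{name, _, context}, value]})
       when is_atom(name) and is_atom(context),
       do: literal?(value)

  defp assignment?(_other), do: false

  # Strings, numbers and atoms are represented as themselves in the AST.
  defp literal?(value), do: is_binary(value) or is_number(value) or is_atom(value)
end

IO.inspect(ConfigGuard.safe?("server = \"myserver.com\"\nport = 8080"))
IO.inspect(ConfigGuard.safe?("System.halt(1)"))
```

Anything richer than literals (maps, interpolated strings, function calls) is rejected by this check, so a real application would have to extend the whitelist carefully.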

One last important point is performance. Usually, scripting languages are associated with poor performance. However, as we saw previously, the “scripted” part of the Elixir code is actually compiled in just the same way as the rest of the application, so, except for the compilation phase that needs to happen at application startup, the “scripted” code should be as fast as the rest.

Happy Elixir scripting!