Literals (Constants)

Absent

The simplest constant of all is the unique value absent. This is used as a general default value. (If you know Java then it helps to know that this is the same as Java's null.)

Booleans

These are the distinct unique constants true and false. Boolean values are required by if expressions and while loops.

Numbers

At the time of writing, you can write negative and positive integers in the usual fashion, e.g. 3, -78, 2357. Floating point numbers will be added on request.

Currently all integers are limited to 2's-complement, 32-bit values (-2147483648 to 2147483647), a defect inherited from the underlying implementation (Java). When this proves to be an issue, the restriction will be removed.
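
For readers unfamiliar with this limit, the following Java sketch shows the range of a 2's-complement, 32-bit integer and what Java's own int arithmetic does at the boundary. The class and method names are invented for illustration. Whether MillScript itself wraps silently or reports an error on overflow is not stated here, so the sketch only demonstrates the behaviour of the underlying Java type.

```java
final class IntLimits {
    // Largest and smallest values a 32-bit, 2's-complement integer can hold.
    static final int MAX = Integer.MAX_VALUE; // 2147483647
    static final int MIN = Integer.MIN_VALUE; // -2147483648

    // Plain Java int addition wraps around silently on overflow.
    static int wrappingAdd(int a, int b) {
        return a + b;
    }

    // Math.addExact throws ArithmeticException instead of wrapping,
    // so it can be used to detect that a sum does not fit.
    static boolean overflows(int a, int b) {
        try {
            Math.addExact(a, b);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(MAX);                 // 2147483647
        System.out.println(wrappingAdd(MAX, 1)); // -2147483648
        System.out.println(overflows(MAX, 1));   // true
    }
}
```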

Strings, Characters and Atoms

In MillScript there are three string-like literals: strings, character sequences, and atoms. All three are variants on the idea of a sequence of 16-bit Unicode characters.

Strings are double-quoted sequences such as "abc", and they use backslash as the escape character for quoting awkward characters. One very useful feature is the ability to include HTML entities directly, e.g.

"abc"
"bingle bongle"
"\tHe said, \"Quote me.\"\r\n"
"\&nbsp;"
        

Atoms are back-quoted sequences such as `I'm the only one`. The big difference between atoms and strings is that, for a given character sequence, there is only ever one atom but there can be lots of different strings. [If you are a Java programmer you'll probably want to think of atoms as wrappers around intern'd strings.] This means that atoms are more expensive to create but cheaper to compare than strings. e.g.

`I'm the only one`
        

Strings and atoms are distinct: if you try using one when the other is required, the MillScript interpreter will complain loudly. Long experience shows that this apparent inconvenience is the right design decision.
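
The comparison with Java's intern'd strings can be made concrete with a small Java sketch. It is only an analogy: the class and the helper names atomOf and sameObject are invented here, and nothing below claims to be how MillScript implements atoms. It shows that two separately built strings with the same characters are different objects, while interning gives exactly one shared object per character sequence, so comparison reduces to an identity check.

```java
final class AtomSketch {
    // Identity comparison: true only when both references point at the same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    // Interning maps every equal character sequence to one shared instance,
    // which is the property the text ascribes to atoms.
    static String atomOf(String s) {
        return s.intern();
    }

    public static void main(String[] args) {
        String s1 = new String("I'm the only one");
        String s2 = new String("I'm the only one");
        System.out.println(s1.equals(s2));                      // true
        System.out.println(sameObject(s1, s2));                 // false
        System.out.println(sameObject(atomOf(s1), atomOf(s2))); // true
    }
}
```

The lookup performed by intern is the extra creation cost mentioned above; the identity check is the cheap comparison.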

Character sequences are single-quoted, just as you might expect. However, unlike in C or Java, you can have as many characters as you like. For example, you can write 'yowsa!'. But, you may be asking, what does it mean? The answer is that this innocent expression returns six results, one for each character! e.g.

'yowsa!'
        

MillScript inherits from Spice an elegant and truly simple approach to expressions with multiple values. These are dealt with in more detail in a later section. For the moment, it is enough to note that an expression can return any number of results, from 0 upwards.

Why would you want to return several characters? Well, in MillScript there are plenty of functions that take an arbitrary number of arguments and, because multiple values are handled so elegantly, this all fits together nicely.
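
Java has no multiple-valued expressions, but a rough analogy may help Java readers. In the sketch below a varargs method stands in for a MillScript function that takes an arbitrary number of arguments, and a character array stands in for the results of a character sequence. The names countArgs and results are hypothetical and exist only for this illustration.

```java
final class MultiValueSketch {
    // A varargs method stands in for a function that accepts any number of arguments.
    static int countArgs(char... chars) {
        return chars.length;
    }

    // Hypothetical helper: one "result" per character of the sequence.
    static char[] results(String sequence) {
        return sequence.toCharArray();
    }

    public static void main(String[] args) {
        System.out.println(countArgs(results("yowsa!"))); // 6
        System.out.println(countArgs());                  // 0
    }
}
```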

The backslash escape convention is identical in strings, atoms, and character sequences. The following sequences are supported:

Escape sequence Description
\\ Introduces a single backslash character
\t A single tab character
\n A newline character
\r A carriage return character
\& A single Unicode character, as represented by the following HTML entity. Any valid HTML entity can be used, including named and numeric (decimal and hexadecimal) forms, e.g. "\&nbsp;", "\&#160;" and "\&#xA0;" are all valid and represent the same character.
\( Introduces an interpolated section, up to a matching ). This transforms the string from a literal constant to a dynamic value, which can change each time it is executed. Any expression can be included and any results it returns will be inserted at this point in the string, e.g.
;-) var name = "Kevin";
There are 0 results
;-) "hello, \(name)!";
There is 1 result
"hello, Kevin!"
;-) "5 + 5 = \( 5 + 5 )";
There is 1 result
"5 + 5 = 10"
;-) "A sequence of numbers: \( 1,2,3,4,5 )";
There is 1 result
"A sequence of numbers: 12345"
              
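To show why a named, a decimal and a hexadecimal entity can all denote the same character, here is a minimal Java sketch of entity decoding. It uses the non-breaking space as its example, which HTML writes as &nbsp;, &#160; or &#xA0;. The class, the decode method and the five-entry name table are assumptions made for illustration; this is not MillScript's actual decoder, and a real one would cover every named HTML entity.

```java
import java.util.Map;

final class EntitySketch {
    // Tiny illustrative table of named entities and their code points.
    private static final Map<String, Integer> NAMED = Map.of(
        "nbsp", 0xA0, "amp", 0x26, "lt", 0x3C, "gt", 0x3E, "quot", 0x22);

    // Decodes the text between '&' and ';', e.g. "nbsp", "#160" or "#xA0",
    // and returns the code point it represents.
    static int decode(String entity) {
        if (entity.startsWith("#x") || entity.startsWith("#X")) {
            return Integer.parseInt(entity.substring(2), 16); // hexadecimal form
        }
        if (entity.startsWith("#")) {
            return Integer.parseInt(entity.substring(1));     // decimal form
        }
        Integer codePoint = NAMED.get(entity);                // named form
        if (codePoint == null) {
            throw new IllegalArgumentException("Unknown entity: " + entity);
        }
        return codePoint;
    }

    public static void main(String[] args) {
        System.out.println(decode("nbsp")); // 160
        System.out.println(decode("#160")); // 160
        System.out.println(decode("#xA0")); // 160
    }
}
```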

Regular Expressions

In MillScript there are two regular expression literals: traditional and native. The traditional syntax is based on Java's java.util.regex package, while the native syntax is designed to fit in with the normal string syntax. Currently MillScript only supports traditional regular expressions, so those are the only ones discussed here.

Traditional regular expression literals start with the // sequence, followed by a sequence of characters up to a closing /. The sequence follows the syntax laid down in Java's java.util.regex package. The sequence terminates at the first unprotected /, which you can escape by preceding it with a backslash character.

//a regex/
//a regex with a \/ in it/
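
The rule for finding the closing / can be sketched in Java as a simple scan that skips any character protected by a backslash. This is an illustrative reading of the rule above, not MillScript's actual lexer: the class and method names are invented, and the sketch assumes the backslash is left in the body, which java.util.regex accepts as an escaped slash.

```java
final class RegexLiteralSketch {
    // Returns the regex body of a literal such as //a\/b/, i.e. the text between
    // the opening // and the first / that is not preceded by a backslash.
    static String body(String literal) {
        if (!literal.startsWith("//")) {
            throw new IllegalArgumentException("Literal must start with //");
        }
        int i = 2;
        while (i < literal.length()) {
            char c = literal.charAt(i);
            if (c == '\\') {
                i += 2; // skip the backslash and the character it protects
            } else if (c == '/') {
                return literal.substring(2, i);
            } else {
                i++;
            }
        }
        throw new IllegalArgumentException("Unterminated regex literal");
    }

    public static void main(String[] args) {
        System.out.println(body("//a regex/"));                   // a regex
        System.out.println(body("//a regex with a \\/ in it/")); // a regex with a \/ in it
    }
}
```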
        

A regular expression can be modified with flags that change how it matches against a character sequence. For example, it can be made case insensitive. These flags are specified by modifier characters placed after the closing / character. The following table explains each modifier:

Modifier Description
d Enables Unix lines mode, in which \n is the only character to be considered a line terminator in the behaviour of ., ^ and $.
i Enables case insensitive matching, where the case of ASCII characters in the regular expression will be matched in a case insensitive way. Non-ASCII Unicode characters will still be matched in a case sensitive way, see the u flag to change this.
m Enables multi line mode, where the character ^ matches just after a line terminator and $ just before. By default these characters match at the beginning and end of the input sequence.
s Enables single line, or dotall, mode, where the . character matches any character, including line terminators.
u Enables Unicode case mode, used in conjunction with the i flag. This mode indicates the case insensitive matching should be done in a way that is consistent with the Unicode standard.
x Enables comments mode, where whitespace and comments are permitted in a regular expression. In this mode whitespace is ignored, and comments beginning with # are ignored up to the end of the line.

Let's have a few quick examples of regular expression flags:

//a regex/
//A rEgEx/i  # Matches the same as the above line and more
//^.+$/m     # Matches a single line from the input sequence
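
Since the traditional syntax is based on java.util.regex, the modifiers line up with the flag constants of java.util.regex.Pattern. The Java sketch below spells out that correspondence and reproduces the examples above. The mapping is inferred from the descriptions in the table, and the class and method names are hypothetical; how MillScript actually passes the flags through is not stated here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexFlagSketch {
    // Translates modifier characters into java.util.regex.Pattern flag bits.
    static int toFlags(String modifiers) {
        int flags = 0;
        for (char m : modifiers.toCharArray()) {
            switch (m) {
                case 'd': flags |= Pattern.UNIX_LINES; break;
                case 'i': flags |= Pattern.CASE_INSENSITIVE; break;
                case 'm': flags |= Pattern.MULTILINE; break;
                case 's': flags |= Pattern.DOTALL; break;
                case 'u': flags |= Pattern.UNICODE_CASE; break;
                case 'x': flags |= Pattern.COMMENTS; break;
                default: throw new IllegalArgumentException("Unknown modifier: " + m);
            }
        }
        return flags;
    }

    // True when the whole input matches the regex under the given modifiers.
    static boolean matches(String regex, String modifiers, String input) {
        return Pattern.compile(regex, toFlags(modifiers)).matcher(input).matches();
    }

    // Returns the first match found in the input, or null when there is none.
    static String firstMatch(String regex, String modifiers, String input) {
        Matcher m = Pattern.compile(regex, toFlags(modifiers)).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(matches("a regex", "", "A rEgEx"));     // false
        System.out.println(matches("A rEgEx", "i", "a regex"));    // true
        System.out.println(firstMatch("^.+$", "m", "one\ntwo"));   // one
        System.out.println(firstMatch("^.+$", "", "one\ntwo"));    // null
    }
}
```

Without the m modifier the last pattern finds nothing in a two-line input, because $ only matches at the end of the input and . does not cross the line terminator.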
        

For more information on how to use regular expressions, see the datatypes section, specifically the Regular Expression and Binding sections.