The simplest constant of all is the unique value
absent. This is used as a general default value. (If
you know Java then it helps to know that this is the same as Java's
null.)
The Boolean constants are the two distinct, unique values true and
false. Boolean values are required by if
expressions and while loops.
At the time of writing, you can write negative and positive integers in the usual fashion, e.g. 3, -78 or 2357. Floating point numbers will be added on request.
Currently all integers are limited to 32-bit, two's-complement values, a defect inherited from the underlying implementation (Java). If this proves to be an issue, the restriction will be removed.
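To make the 32-bit limit concrete, here is a minimal Java sketch of the range and of what overflow looks like in the underlying platform. It shows Java's own int behaviour only; whether MillScript wraps silently or reports an error on overflow is not stated here, so treat that part as an assumption. The class and method names are made up for illustration.

```java
class IntLimits {
    // Adds two ints the way Java does by default: silently wrapping on overflow.
    static int wrappingAdd(int a, int b) {
        return a + b;
    }

    // Returns true if a + b does not fit in a 32-bit two's-complement int.
    static boolean overflows(int a, int b) {
        try {
            Math.addExact(a, b); // throws ArithmeticException on overflow
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(Integer.MIN_VALUE);                 // -2147483648
        System.out.println(Integer.MAX_VALUE);                 // 2147483647
        System.out.println(wrappingAdd(Integer.MAX_VALUE, 1)); // -2147483648
        System.out.println(overflows(Integer.MAX_VALUE, 1));   // true
    }
}
```

In practice this means that any integer literal or result outside the range -2147483648 to 2147483647 cannot be represented faithfully under the current restriction.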
In MillScript there are three string-like literals: strings, character sequences, and atoms. All three are variants on the idea of a sequence of 16-bit Unicode characters.
Strings are double-quoted sequences that use backslash as the escape character for quoting awkward characters. One very useful feature is the ability to include HTML entities directly, e.g.
"abc"
"bingle bongle"
"\tHe said, \"Quote me.\"\r\n"
"\&nbsp;"
Atoms are back-quoted sequences such as `I'm the only one`. The big difference between atoms and strings is that, for a given character sequence, there is only ever one atom but there can be lots of different strings. [If you are a Java programmer you'll probably want to think of atoms as wrappers around intern'd strings.] This means that atoms are more expensive to create but cheaper to compare than strings. e.g.
`I'm the only one`
Strings and atoms are distinct: if you try using one when the other is required, the MillScript interpreter will complain loudly. Long experience shows that this apparent inconvenience is the right design decision.
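For Java programmers, the trade-off between strings and atoms can be sketched with String.intern(). This is only an analogy under the "wrappers around intern'd strings" reading given above, not MillScript's actual implementation; the class and helper names are hypothetical.

```java
class AtomSketch {
    // Builds a fresh String object at run time, so it is not the pooled literal.
    static String fresh(String s) {
        return new String(s.toCharArray());
    }

    // String-like comparison: has to examine the characters.
    static boolean sameText(String a, String b) {
        return a.equals(b);
    }

    // Atom-like comparison: a single reference check, meaningful only for interned strings.
    static boolean sameAtom(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String s1 = fresh("I'm the only one");
        String s2 = fresh("I'm the only one");
        System.out.println(sameText(s1, s2));                   // true
        System.out.println(sameAtom(s1, s2));                   // false
        System.out.println(sameAtom(s1.intern(), s2.intern())); // true
    }
}
```

The call to intern() is where the extra creation cost is paid: it looks the text up in a shared table. Once that is done, equality is a pointer comparison, which is why atoms are cheaper to compare.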
Character sequences are single-quoted, just as you might expect. However, unlike C or Java, you can have as many characters as you like. For example, you can write 'yowsa!' - but, you may be asking, what does it mean? The answer is that this innocent expression returns six results! e.g.
'yowsa!'
MillScript inherits from Spice an elegant and truly simple approach to expressions with multiple values. These are dealt with in more detail in a later section. For the moment, it is enough to note that an expression can return any number of results, from 0 upwards.
Why would you want to return several characters? Well, in MillScript there are plenty of functions that take an arbitrary number of arguments and, because multiple values are handled so elegantly, this all fits together nicely.
The backslash escape convention is identical in strings, atoms, and character sequences. The following sequences are supported:
| Escape sequence | Description |
|---|---|
| \\ | Introduces a single backslash character |
| \t | A single tab character |
| \n | A newline character |
| \r | A carriage return character |
| \& | A single Unicode character, as represented by the HTML entity that follows. Any valid HTML entity can be used, whether named or numeric (decimal or hexadecimal), e.g. "\&nbsp;", "\&#160;" or "\&#xA0;" are all valid and represent the same character. |
| \( | Introduces an interpolated section, up to the matching ). This transforms the string from a literal constant into a dynamic value, which can change each time it is executed. Any expression can be included, and any results it returns are inserted at this point in the string. |

For example, interpolation with \( behaves as follows:
;-) var name = "Kevin";
There are 0 results
;-) "hello, \(name)!";
There is 1 result
"hello, Kevin!"
;-) "5 + 5 = \( 5 + 5 )";
There is 1 result
"5 + 5 = 10"
;-) "A sequence of numbers: \( 1,2,3,4,5 )";
There is 1 result
"A sequence of numbers: 12345"
In MillScript there are two regular expression literals: traditional
and native. The traditional syntax is based on the Java
java.util.regex package, while the native is designed
to fit in with the normal string syntax. Currently MillScript only
supports traditional regular expressions, so those are the only ones
we will discuss here.
Traditional regular expression literals start with the
// sequence, followed by a sequence of characters up to
a closing /. The sequence follows the syntax laid down
in the Java java.util.regex
package. The sequence terminates at the first unprotected
/, which you can escape by preceding it with a
backslash character.
//a regex/
//a regex with a \/ in it/
A regular expression can be modified with flags that change
how it matches against a character sequence; for example,
it can be made case insensitive. These flags are specified by
modifier characters placed after the closing /
character. The following table explains each modifier:
| Modifier | Description |
|---|---|
| d | Enables Unix lines mode, in which \n is the only character considered a line terminator in the behaviour of ., ^ and $. |
| i | Enables case insensitive matching, in which ASCII characters in the regular expression are matched without regard to case. Non-ASCII Unicode characters are still matched in a case sensitive way; see the u flag to change this. |
| m | Enables multi-line mode, in which the character ^ matches just after a line terminator and $ just before one. By default these characters match only at the beginning and end of the input sequence. |
| s | Enables single line, or dotall, mode, in which the . character matches any character, including line terminators. |
| u | Enables Unicode case mode, used in conjunction with the i flag. In this mode, case insensitive matching is done in a way that is consistent with the Unicode standard. |
| x | Enables comments mode, in which whitespace and comments are permitted in a regular expression. Whitespace is ignored, and a comment beginning with # is ignored to the end of the line. |
Let's have a few quick examples of regular expressions with flags:
//a regex/
//A rEgEx/i # Matches the same as the above line and more
//^.+$/m # Matches a single line from the input sequence
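Since the traditional syntax follows java.util.regex, the modifier letters line up with the flag constants on java.util.regex.Pattern. The Java sketch below reproduces the examples above on that assumption. The flagsFor and findAll helpers are hypothetical names introduced for illustration and are not part of MillScript; only the Pattern constants and methods are real Java API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexFlagSketch {
    // Hypothetical helper: translates modifier letters into java.util.regex.Pattern flags.
    static int flagsFor(String modifiers) {
        int flags = 0;
        for (char c : modifiers.toCharArray()) {
            switch (c) {
                case 'd': flags |= Pattern.UNIX_LINES; break;
                case 'i': flags |= Pattern.CASE_INSENSITIVE; break;
                case 'm': flags |= Pattern.MULTILINE; break;
                case 's': flags |= Pattern.DOTALL; break;
                case 'u': flags |= Pattern.UNICODE_CASE; break;
                case 'x': flags |= Pattern.COMMENTS; break;
                default: throw new IllegalArgumentException("Unknown modifier: " + c);
            }
        }
        return flags;
    }

    // Collects every match of the regex in the input, in order.
    static List<String> findAll(String regex, String modifiers, String input) {
        Matcher m = Pattern.compile(regex, flagsFor(modifiers)).matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("a regex", "", "A rEgEx"));     // []
        System.out.println(findAll("A rEgEx", "i", "a regex"));    // [a regex]
        System.out.println(findAll("^.+$", "m", "first\nsecond")); // [first, second]
        System.out.println(findAll("^.+$", "", "first\nsecond"));  // []
    }
}
```

The last two calls show why the m flag matters: without it, ^ and $ anchor only to the whole input, and since . does not match a line terminator by default, a two-line input yields no match at all.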
For more information on how to use regular expressions, see the datatypes section, specifically the Regular Expression and Binding sections.