SanUSB Project

Author

~ Stefan Jovanović

SanScript, part 1 - overview

Learning to construct your own programming language (aka language hacking) can be a dangerous thing. Once you get the hang of it, the answer to most of your problems becomes: “I should make a language for it”.

My journey into writing compilers started during my third year of university, when I stumbled upon two books. The first and more daunting one is Compilers: Principles, Techniques, and Tools (the Dragon Book for short) by Aho, Lam, Sethi and Ullman. It can be summarized as the bible of compilers, as it contains over 1000 pages of compiler fundamentals, the different phases of compiler development, runtime environments, etc. Even though it’s an absolute treasure trove of knowledge, I couldn’t make myself read it the way a normal book is read, from start to finish. I treat it more as a reference guide, which proved really useful whenever I had to clear up a concept that wasn’t explained well elsewhere. Like most people in this space, I learn best by doing rather than reading, but I also hate black boxes, so I try to find a balance between theory and practice. The book that hit the spot for me was Crafting Interpreters by Robert Nystrom, which takes a more practical approach by developing the compiler infrastructure for Lox, a scripting language designed specifically for the book. The first part of the book deals with the development of a more primitive tree-walk interpreter in Java. The second part is a more realistic scenario: developing a virtual machine in C that interprets compiled bytecode.

The previous paragraph explained the hows, but not the whys. Why did I even start researching this topic? One of the main reasons is, of course, curiosity, but that alone wouldn’t justify (or maybe it would) the decision to spend months learning and developing a compiler and VM for a language that maybe nobody would ever hear about, let alone use. It was actually part of a bigger personal project, a keystroke injection platform now known as SanUSB, which aims to be a free and open source alternative to the commercial product Rubber Ducky. For those of you not familiar with the Rubber Ducky and keystroke injection attacks: it is a device that looks like a USB flash drive but presents itself to the host as a keyboard (or mouse), and it lets a security researcher (or malicious user) program it with a payload that is executed upon insertion into a USB port. Rubber Ducky payloads are written in the proprietary language DuckyScript, shown in the following code block:

REM Windows Modifier Key Example

REM Open the RUN Dialog
GUI r

REM Close the window
ALT F4

Developing a platform like Rubber Ducky consists of several parts, including:

  • Construction of a language for payload development
  • Designing the device and programming its firmware
  • Developing tools for payload scripting, compiling and flashing the firmware

The current state of SanUSB will be showcased some other time; this post mainly covers the first bullet, the construction of a language for payload development: SanScript. SanScript is a procedural, weakly typed language with dedicated functions and operators for constructing keystroke injection payloads, which makes it somewhat of a domain-specific language. Now, let’s start by exploring the features and syntax of SanScript.

SanScript features and syntax

SanScript syntax can be learned in tens of minutes because most of it is similar to languages like JavaScript, C and even Rust.

Comments

As any sane person who wants to learn a new programming language would do, you have to start by learning how to write comments. Thankfully, I wasn’t tempted enough to reinvent the wheel and went with the obvious:

// This is the only way to write comments in SanScript

We can now tick the comments checkbox and move on:

  • comments
  • data types
  • expressions
  • variables and functions
  • control flow

Data types

As in most languages, there are several supported data types, including:

  • bool (containing two possible values, true or false)
true;
false;
  • number (can be decimal or integer)
2010;
20.10;
  • string
"Nixie tubes are cool"; // as well as flip-dot displays
  • nil (representing absence of a value)
nil; // this code block was absolutely necessary

The following four data types are the most important ones and are the building blocks of payloads:

  • key code (represents a keyboard key; there are a total of 169 supported key codes)
SPACE;
ENTER;
A;
  • key combination (represents a group of key codes that are injected simultaneously; the syntax will be shown later)
  • key sequence (represents a series of key codes and key combinations that are injected one after another; the syntax will be shown later)
  • mouse button (left click, right click or middle click)
LEFT_CLICK;
RIGHT_CLICK;
MIDDLE_CLICK;

Look, we are making progress!

  • comments
  • data types
  • expressions
  • variables and functions
  • control flow

Expressions

When talking about expressions, I’ll mostly be going over the operators, which can be grouped into the following categories:

  • Arithmetic operators
2 + 3 // number addition
2 - 3 // number subtraction
2 * 3 // number multiplication
2 / 3 // number division
-2    // number negation
  • String operators
"String " + "concatenation"
  • Comparison and equality operators
2 < 3                // less than
2 <= 3               // less than or equal to
2 > 3                // greater than
2 >= 3               // greater than or equal to
2 == 3               // equal to
"Knight" == "Bishop" // checks if two strings have identical value
1 == "1"             // equality between different types always returns false
  • Logical operators
!true          // negation
true and false // logical and
true or false  // logical or
  • Grouping operator
(4 + 2) / 3 // expressions inside the innermost parentheses are evaluated first
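
The rule that values of different types are never equal is worth a second look. Below is a minimal Python sketch of how such coercion-free equality could behave. Python values stand in for SanScript values, the helper names type_name and sanscript_equals are made up for illustration, and treating nil as equal to nil is an assumption rather than something stated in this post.

```python
def type_name(value):
    """Map a Python stand-in value to a SanScript type name (hypothetical helper)."""
    if value is None:
        return "nil"
    # bool must be checked before number: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"no SanScript stand-in for {value!r}")


def sanscript_equals(left, right):
    """Equality without coercion: values of different types are never equal."""
    if type_name(left) != type_name(right):
        return False
    return left == right  # same type, so compare the values directly


print(sanscript_equals(2, 3))                # False
print(sanscript_equals("Knight", "Bishop"))  # False
print(sanscript_equals(1, "1"))              # False: number vs string
```

Checking the type first is what keeps a number and a string apart, even when they look alike on screen.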

Bundled with the most important data types come the most important operators:

  • Key code operators
CTRL + ALT + DEL         // produces key combination
CTRL + ALT + DEL | ENTER // produces sequence of key combination and key code

As promised, this is the syntax for key combinations and key sequences. The sequence above, when passed to the appropriate function, would inject the key combination CTRL+ALT+DEL and then the ENTER key.
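
One way to picture how + and | build these values is operator overloading. The sketch below models it in Python under a few assumptions: the class names KeyCombination and KeySequence and the helper key are hypothetical, a single key code is represented as a one-key combination for brevity, and none of this is taken from the actual SanScript implementation. Python also gives + a higher precedence than |, so the expression groups the same way as in the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyCombination:
    keys: tuple  # names of the keys pressed together

    def __add__(self, other):
        # combination + combination -> one larger combination
        return KeyCombination(self.keys + other.keys)

    def __or__(self, other):
        # combination | something -> a sequence that starts with this combination
        return KeySequence((self,)) | other


@dataclass(frozen=True)
class KeySequence:
    steps: tuple  # KeyCombination items injected one after another

    def __or__(self, other):
        if isinstance(other, KeySequence):
            return KeySequence(self.steps + other.steps)
        return KeySequence(self.steps + (other,))


def key(name):
    """Hypothetical helper: a single key code as a one-key combination."""
    return KeyCombination((name,))


CTRL, ALT, DEL, ENTER = key("CTRL"), key("ALT"), key("DEL"), key("ENTER")

sequence = CTRL + ALT + DEL | ENTER
for step in sequence.steps:
    print("+".join(step.keys))
# prints "CTRL+ALT+DEL", then "ENTER"
```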

Before we finish this section, let me mention statements and how they differ from expressions. Expressions are pieces of code that produce a value. For example, the following code produces the value true:

!(5 - 4 > 3 * 2)

Statements, on the other hand, are instructions that our program executes; in that sense they are “self-contained” and don’t produce a value. Every statement in SanScript ends with a semicolon:

!(5 - 4 > 3 * 2); // the value true is discarded since there is no assignment
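
Evaluating the expression !(5 - 4 > 3 * 2) one operator at a time shows why it yields true. The Python sketch below mirrors the steps, assuming the usual C-like precedence (arithmetic first, then comparison), with Python's not standing in for SanScript's ! operator.

```python
difference = 5 - 4                 # 1
product = 3 * 2                    # 6
comparison = difference > product  # 1 > 6 is False
result = not comparison            # negating False gives True

print(result)  # True
```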

For a more detailed comparison of expressions and statements, check out this article by Josh Comeau. Now we are one step closer to fully understanding SanScript (for those of you who came here expecting to learn Sanskrit, sorry to break the news to you, but maybe consider practicing your googling skills):

  • comments
  • data types
  • expressions
  • variables and functions
  • control flow

Variables and functions

Quick legal disclaimer before I introduce variable declaration and mutation: Quarks Team or any of its representatives shall not be liable for actions or non-actions taken by “Haskellers” who have read the post and seen the horrors of mutable state or any other form of side effects.

Variable declaration starts with the keyword let, followed by the name of the variable:

let my_var;

Since SanScript is dynamically typed, a variable’s type is determined by the value assigned to it:

my_var = 3; // my_var is of type number
my_var = "Now it's a string"; // SanScript is a dynamically typed language

We can also assign a value to a variable when we declare it:

let terminal = CTRL + ALT + T;
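
The declare-then-reassign behaviour can be modelled with a small variable store. The Python sketch below is only a mental model, not the SanScript implementation: the Environment class and its method names are hypothetical, and two behaviours are assumptions on my part, namely that a variable declared without a value starts out as nil and that assigning to an undeclared variable is an error.

```python
class Environment:
    """Toy variable store; None stands in for SanScript's nil."""

    def __init__(self):
        self._values = {}

    def declare(self, name, value=None):
        # Models `let name;` and `let name = value;`.
        # Assumption: without an initial value the variable holds nil.
        self._values[name] = value

    def assign(self, name, value):
        # Models `name = value;`.
        # Assumption: assigning to an undeclared variable is an error.
        if name not in self._values:
            raise NameError(f"undefined variable '{name}'")
        self._values[name] = value  # the type of the old value does not matter

    def get(self, name):
        if name not in self._values:
            raise NameError(f"undefined variable '{name}'")
        return self._values[name]


env = Environment()
env.declare("my_var")                      # let my_var;
env.assign("my_var", 3)                    # my_var = 3;
env.assign("my_var", "Now it's a string")  # my_var = "Now it's a string";
print(type(env.get("my_var")).__name__)    # str
```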

Functions are defined with the fn keyword, followed by the function name, parameters and the function body:

fn add(a, b) {
	return a + b;
}

let result = add(2, 3); // function call is standard C-like syntax

Another important topic to cover in this section is the SanScript standard library. All the data types and operators for keystroke construction would be useless without functions that actually inject the keystrokes. At the time of writing, there are a total of 13 functions in the standard library, shown in the table below:

| Function definition | Description |
|---|---|
| inject_keys(key_combination) | Takes a key combination as an argument and emulates a key press action |
| hold_keys(key_combination) | Takes a key combination as an argument and emulates a key hold action |
| release_keys() | Emulates a key release action |
| inject_sequence(key_sequence, delay, jitter) | Injects a key sequence with the desired delay between injected combinations, plus jitter that adds some randomness to the delay for more human-like typing |
| string_to_keys(keys_string) | Takes a string and returns a key sequence consisting of the keys in the string |
| mouse_move(x, y) | Moves the mouse cursor to the given x and y coordinates |
| mouse_click(mouse_button) | Takes a mouse button as an argument and emulates a mouse click action |
| mouse_hold(mouse_button) | Takes a mouse button as an argument and emulates a mouse hold action |
| mouse_release() | Emulates a mouse release action |
| sleep(duration) | Suspends the thread for the given duration in milliseconds |
| random_int(min, max) | Returns a random integer in the range between min and max |
| random_float(min, max) | Returns a random decimal value in the range between min and max |
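
To make the delay and jitter parameters of inject_sequence a bit more concrete, here is a Python sketch of one way the pauses between injected combinations could be computed. Everything in it is an assumption: the function name jittered_delays is made up, the units are taken to be milliseconds (only sleep is documented that way), and the jitter is modelled as a uniform offset of up to plus or minus jitter around the base delay. The real implementation may differ.

```python
import random


def jittered_delays(step_count, delay_ms, jitter_ms, rng=None):
    """Return one pause per gap between steps, each within delay_ms +/- jitter_ms."""
    if rng is None:
        rng = random.Random()
    gaps = max(step_count - 1, 0)  # n steps have n - 1 gaps between them
    return [
        max(0.0, delay_ms + rng.uniform(-jitter_ms, jitter_ms))  # never negative
        for _ in range(gaps)
    ]


# Pauses for a five-step sequence: base delay 100 ms, jitter 20 ms.
for pause in jittered_delays(5, 100, 20):
    print(f"wait {pause:.1f} ms")
```

With a jitter of 0 every pause equals the base delay, which is the perfectly regular rhythm that the jitter is meant to break up.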

With these 13 functions we are able to construct both keystroke injection and mouse injection payloads. We’re in the endgame:

  • comments
  • data types
  • expressions
  • variables and functions
  • control flow

Control flow

SanScript doesn’t deviate much from the usual in this regard, offering a total of three control flow mechanisms:

  • if statement
if (condition) {
	// do something
} else if (other_condition) {
	// do something else
} else {
	// some third option
}
  • while loop
while (condition) {
	// do something as long as the condition holds
}
  • for loop
for (let i = 0; i < 10; i++) {
	// do something over the 10 iterations
}
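
A C-style for loop like the one above can always be rewritten as a while loop with the initializer moved in front and the increment moved to the end of the body. The Python sketch below spells out that correspondence; the function name for_as_while is made up, and whether the SanScript compiler performs this rewrite internally is not something this post states.

```python
def for_as_while(limit):
    """Visit the same values as `for (let i = 0; i < limit; i++)`."""
    visited = []
    i = 0                  # initializer: let i = 0
    while i < limit:       # condition:   i < limit
        visited.append(i)  # loop body
        i += 1             # increment:   i++
    return visited


print(for_as_while(10))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```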

And…that’s about it…a bit anticlimactic I guess:

  • comments
  • data types
  • expressions
  • variables and functions
  • control flow

In the next part we will dive into the architecture of the language and the implementation specifics of writing a compiler infrastructure in Rust. Until then, feel free to check out the repository for the language and the whole SanUSB project, found at the following link.