Official Golf language blog: December 2024

Thursday, December 26, 2024

Encryption: ciphers, digests, salt, IV

What is encryption

Encryption is a method of turning data into an unusable form that can be made useful only by means of decryption. The purpose is to make data available solely to those who can decrypt it (i.e. make it usable). Typically, data needs to be encrypted to make sure it cannot be obtained in case of unauthorized access. It is the last line of defense after an attacker has managed to break through authorization systems and access control.

This doesn't mean all data needs to be encrypted, because often times authorization and access systems may be enough, and in addition, there is a performance penalty for encrypting and decrypting data. If and when the data gets encrypted is a matter of application planning and risk assessment, and sometimes it is also a regulatory requirement, such as with HIPAA or GDPR.

Data can be encrypted at-rest, such as on disk, or in transit, such as between two parties communicating over the Internet.

Here you will learn how to encrypt and decrypt data using a password, also known as symmetrical encryption. This password must be known to both parties exchanging information.

Cipher, digest, salt, iterations, IV

To properly and securely use encryption, there are a few notions that need to be explained.

A cipher is the algorithm used for encryption. For example, AES256 is a cipher. The idea of a cipher is what most people will think of when it comes to encryption.

A digest is basically a hash function that is used to scramble and lengthen the password (i.e. the encryption key) before it's used by the cipher. Why is this done? For one, it creates a well randomized, uniform-length hash of a key that works better for encryption. It's also very suitable for "salting", which is the next one to talk about.

The "salt" is a method of defeating so-called "rainbow" tables. An attacker knows that two hashed values will still look exactly the same if the originals were. However, if you add the salt value to hashing, then they won't. It's called "salt" because it's sort of mixed with the key to produce something different. Now, a rainbow table will attempt to match known hashed values with precomputed data in an effort to guess a password. Usually, salt is randomly generated for each key and stored with it. In order to match known hashes, the attacker would have to precompute rainbow tables for great many random values, which is generally not feasible.

You will often hear about "iterations" in encryption. An iteration is a single cycle in which a key and salt are mixed in such a way to make guessing the key harder. This is done many times so to make it computationally difficult for an attacker to reverse-guess the key, hence "iterations" (plural). Typically, a minimum required number of iterations is 1000, but it can be different than that. If you start with a really strong password, generally you need less.

IV (or "Initialization Vector") is typically a random value that's used for encryption of each message. Now, salt is used for producing a key based on a password. And IV is used when you already have a key and now are encrypting messages. The purpose of IV is to make the same messages appear differently when encrypted. Sometimes, IV also has a sequential component, so it's made of a random string plus a sequence that constantly increases. This makes "replay" attacks difficult, which is where attacker doesn't need to decrypt a message; but rather an encrypted message was "sniffed" (i.e. intercepted between the sender and receiver) and then replayed, hoping to repeat the action already performed. Though in reality, most high-level protocols already have a sequence in place, where each message has, as a part of it, an increasing packet number, so in most cases IV doesn't need it.

Prerequisites

This example uses Golf framework. Install it first.

Encryption example

To run the examples here, create an application "enc" in a directory of its own (see mgrg for more on Golf's program manager):

mkdir enc_example
cd enc_example
gg -k enc

Copied!

To encrypt data use encrypt-data statement. The simplest form is to encrypt a null-terminated string. Create a file "encrypt.golf" and copy this:

 begin-handler /encrypt public
     set-string str = "This contains a secret code, which is Open Sesame!"
     // Encrypt
     encrypt-data str to enc_str password "my_password"
     print-out enc_str
     @
     // Decrypt
     decrypt-data enc_str password "my_password" to dec_str
     print-out dec_str
     @
 end-handler

Copied!

You can see the basic usage of encrypt-data and decrypt-data. You supply data (original or encrypted), the password, and off you go. The data is encrypted and then decrypted, yielding the original.

Golf 136 released

Any number expression can now use string subscription as a number, for instance:
set-string str='hello' set-number num = 10+str[0]
A character is treated as an unsigned number ranging from 0-255 (i.e. unsigned byte).

Tuesday, December 24, 2024

Golf 132 released

Individual bytes of a string (binary or text) can now be set using set-string by specifying the byte with a number expression within []. Since Golf is a memory-safe language, setting a byte this way is subject to a range check. For instance:
set-string str[10] = 'a'
An individual byte of a string (binary or text) can now be obtained (as a number) with set-number using a number expression within []. Since Golf is a memory-safe language, getting a byte this way is subject to a range check. For instance:
set-number byte = str[10]

Note that Golf is a very high level language, and it generally does not start with low-level constructs, such as setting and retrieving bytes from memory; rather its statements perform tasks that take typically many lines of code in other languages. So it makes sense an addition like this would be a "side-note" undertaken later in the language; it's not the focus of it. Still, Golf is also a high performance language and so the above two new capabilities are implemented with that in mind, with the minimum of overhead.

Sunday, December 15, 2024

Distributed computing made easy

What is distributed computing

Distributed computing is two or more servers communicating for a common purpose. Typically, some tasks are divvied up between a number of computers, and they all work together to accomplish it. Note that "separate servers" may mean physically separate computers. It may also mean virtual servers such as Virtual Private Servers (VPS) or containers, that may share the same physical hardware, though they appear as separate computers on the network.

There are many reasons why you might need this kind of setup. It may be that resources needed to complete the task aren't all on a single computer. For instance, your application may rely on multiple databases, each residing on a different computer. Or, you may need to distribute requests to your application because a single computer isn't enough to handle them all at the same time. In other cases, you are using remote services (like a REST API-based for instance), and those by nature reside somewhere else.

In any case, the computers comprising your distributed system may be on a local network, or they may be worldwide, or some combination of those. The throughput (how many bytes per second can be exchanged via network) and latency (how long it takes for a packet to travel via network) will obviously vary: for a local network you'd have a higher throughput and lower latency, and for Internet servers it will be the opposite. Plan accordingly based on the quality of service you'd expect.

How servers communicate

Depending on your network(s) setup, different kinds of communication are called for. If two servers reside on a local network, then they would typically used the fastest possible means of communication. A local network typically means a secure network, because nobody else has access to it but you. So you would not need TSL/SSL or any other kind of secure protocol as that would just slow things down.

If two servers are on the Internet though, then you must use a secure protocol (like TSL/SSL or some other) because your communication may be spied on, or worse, affected by man-in-the-middle attacks.

Local network distributed computing

Most of the time, your distributed system would be on a local network. Such network may be separate and private in a physical sense, or (more commonly) in a virtual sense, where some kind of a Private Cloud Network is established for you by the Cloud provider. It's likely that separation is enforced by specialized hardware (such as routers and firewalls) and secure protocols that keep networks belonging to different customers separate. This way, a "local" network can be established even if computers on it are a world apart, though typically they reside as a part of a larger local network.

Either way, as far as your application is concerned, you are looking at a local network. Thus, the example here will be for such a case, as it's most likely what you'll have. A local network means different parts of your application residing on different servers will use some efficient protocol based on TCP/IP. One such protocol is FastCGI, a high-performance binary protocol for communication between servers, clients, and in general programs of all kinds, and that's the one used by Golf. So in principle, the setup will look like this (there'll be more details later):

Next, in theory you should have two servers, however in this example both servers will be on the same localhost (i.e. "127.0.0.1"). This is just for simplicity; the code is exactly the same if you have two different servers on a local network - simply use another IP (such as "192.168.0.15" for instance) for your "remote" server instead of local "127.0.0.1". The two servers do not even necessarily need to be physically two different computers. You can start a Virtual Machine (VM) on your computer and host another virtual computer there. Popular free software like VirtualBox or KVM Hypervisor can help you do that.

In any case, in this example you will start two simple application servers; they will communicate with one another. The first one will be called "local" and the other one "remote" server. The local application server will make a request to the remote one.

Local server

On a local server, create a new directory for your local application server source code:

mkdir $HOME/local_server
cd $HOME/local_server

Copied!

and then create a new file "status.golf" with the following:

 begin-handler /status public
     silent-header
     get-param server
     get-param days

     print-format "/server/remote-status/days=%s", days to payload
     print-format "%s:3800", server to srv_location

     new-remote srv location srv_location \
         method "GET" url-path payload \
         timeout 30

     call-remote srv
     read-remote srv data dt
     @Output is: [<<print-out dt>>]
 end-handler

Copied!

The code here is very simple. new-remote will create a new connection to a remote server, running on IP address given by input parameter "server" (and obtained with get-param) on TCP port 3800. URL payload created in string variable "payload" is passed to the remote server. If it doesn't reply in 30 seconds, then the code would timeout. Then you're using call-remote to actually make a call to the remote server (which is served by application "server" and by request handler "remote-status.golf" below), and finally read-remote to get the reply from it. For simplicity, error handling is omitted here, but you can easily detect a timeout, any network errors, any errors from the remote server, including error code and error text, etc. See the above statements for more on this.

Make and start the local server

How is memory organized in Golf

Golf is a high-performance memory-safe language. A string variable is the actual pointer to a string (whether it's a text or binary). This eliminates at least one extra memory read, making string access as fast as in C. Bytes before the string constitute the ID to a memory table entry, with additional info used by memory-safety mechanisms:

Length (in bytes) of the string,
"Ref count" (Reference count), stating how many Golf variables point to string,
Status is used to describe string, such as whether it's scope is process-wide, if it's a string literal etc,
"Next free" points to the next available string block (if this one was freed too),
"Data ptr" points back to the string, which is used to speed up access.

Memory is always null-terminated, regardless of whether it's text or binary. Here's what that looks like in a picture:

Each memory block (ID+string+trailing null) is a memory allocated by standard C'd memory allocation, while memory table is a continuous block that's frequently cached to produce fast access to string's properties.

Web file manager in less than 100 lines of code

Uploading and download files in web browser is a common task in virtually any web application or service. This article shows how to do this with very little coding - in less than 100 lines of code. The database used is PostgreSQL, and the web server is Nginx.

You will use Golf as an application server and the programming language. It will run behind the web server for performance and security, as well as to enable richer web functionality. This way end-user cannot talk to your application server directly because all such requests go through the web server, while your back-end application can talk directly to your application server for better performance.

Assuming your currently logged-on Linux user will own the application, create a source code directory and also create Golf application named "file-manager":

mkdir filemgr
cd filemgr
gg -k file-manager

Copied!

Next, create PostgreSQL database named "db_file_manager", owned by currently logged-on user (i.e. passwordless setup):

echo "create user $(whoami);
create database db_file_manager with owner=$(whoami);
grant all on database db_file_manager to $(whoami);
\q"  | sudo -u postgres psql

Copied!

Create database configuration file used by Golf that describes the database (it's a file "db"):

echo "user=$(whoami) dbname=db_file_manager" > db

Copied!

Create SQL table that will hold files currently stored on the server:

echo "create table if not exists files (fileName varchar(100), localPath varchar(300), extension varchar(10), description varchar(200), fileSize int, fileID bigserial primary key);" | psql -d db_file_manager

Copied!

Finally, create source Golf files. First create "start.golf" file and copy and paste:

 begin-handler /start public
    @<h2>File Manager</h2>
    @To manage the uploaded files, <a href="<<print-path "/list">>">click here.</a><br/>
    @<br/>
    @<form action="<<print-path "/upload">>" method="POST" enctype="multipart/form-data">
    @    <label for="file_description">File description:</label><br>
    @    <textarea name="filedesc" rows="3" columns="50"></textarea><br/>
    @    <br/>
    @    <label for="filename">File:</label>
    @    <input type="file" name="file" value=""><br><br>
    @    <input type="submit" value="Submit">
    @</form>
 end-handler

Copied!

Create "list.golf" file and copy and paste:

Fixed bug: process-scoped string would be freed at the end of a code block or request handler when --optimize-memory flag is used.

Wednesday, December 4, 2024

Golf 121 released

Added return-handler statement.
Better error message when request is not found.
Better error message when no .golf files found or no begin-handler statements found.

Monday, December 2, 2024

Passing parameters between local request handlers

In Golf, there are no "functions" or "methods" as you may be used to in other languages. Any encapsulation is a request handler - you can think of it as a simple function that executes to handle a request - it can be called from an outside caller (such as web browser, web API, or from a command-line), or from another handler.

By the same token, there are no formal parameters in a way that you may be used to. Instead, there are named parameters, basically name/value pairs, which you can set or get anywhere during the request execution. In addition, your request handler can handle the request body, environment variables, the specific request method etc. (see request). Here though, we'll focus on parameters only.

You'll use set-param to set a parameter, which can then be obtained anywhere in the current request, including in the current handler's caller or callee. Use get-param to obtain a parameter that's set with set-param.

Parameters are very fast - they are static creatures implemented at compile time, meaning only fixed memory locations are used to store them (making for great CPU caching), and any name-based resolution is used only when necessary, and always with fast hash tables and static caching.

Calling one handler from another

In this article, we'll talk about call-handler which is used to call a handler from within another handler.

Here you'll see a few examples of passing input and output parameters between requests handlers. These handlers are both running in the same process of an application (note that application can run as many processes working in parallel). To begin, create an application:

mkdir param
cd param
gg -k param

Copied!

You'll also create two source files ("local.golf" and "some.golf") a bit later with the code below.

Simple service

Let's start with a simple service that provides current time based on a timezone as an input parameter (in file "local.golf"):

 begin-handler /local/time
     get-param tzone // get time zone as input parameter (i.e. "EST", "MST", "PST" etc.)
     get-time to curr_time timezone tzone
     @<div>Current time is <<p-out curr_time>></div>
 end-handler