Tuesday, January 28, 2025

Fast JSON parser with little coding

Golf's JSON parser produces an array of name/value pairs. A name is a path to value, for instance "country"."state"."name", and the value is simply the data associated with it, for instance "Idaho". You can control if the name contains array indexes or not, for instance if there are multiple States in the document, you might have names like "country"."state"[0]."name" with [..] designating an array element.

You can iterate through this array and get names of JSON elements, examine if they are of interest to you, and if so, get the values. This typical scenario is how Golf's parser is built, since it uses a "lazy" approach, where values are not allocated until needed, speeding up parsing. That is the case in this example. The JSON document below is examined and only the names of the cities are extracted.

You can also store JSON elements into trees or hashes for future fast retrieval, or store them into a database, etc.

To get started, create a directory for this example and position in it:
mkdir -p json
cd json

Save this JSON into a file "countries.json" - we will get the names of the cities from it:
{ "country": [
    { 
        "name": "USA",
        "state": [
            { 
                "name": "Arizona",
                "city": [
                    {
                        "name" : "Phoenix",
                        "population": 5000000
            	    } ,
                    {
                        "name" : "Tuscon",
                        "population": 1000000
            	    } 

                ]
            } ,
            { 
                "name": "California",
                "city": [
                    {
                        "name" : "Los Angeles",
                        "population": 19000000
            	    },
                    {
                        "name" : "Irvine"
            	    }
                ]
            } 
        ] 
    } ,
    { 
        "name": "Mexico",
        "state": [
            { 
                "name": "Veracruz",
                "city": [
                    {
                        "name" : "Xalapa-Enríquez",
                        "population": 8000000
            	    },
                    {
                        "name" : "C\u00F3rdoba",
                        "population": 220000
            	    }
                ]
            } ,
            { 
                "name": "Sinaloa",
                "city": [
                    {
                        "name" : "Culiac\u00E1n Rosales",
                        "population": 3000000
            	    }
                ]
            } 
        ] 
    }
    ]
}

What follows is the code to parse JSON. We open a JSON file, process the document, check for errors, and then read elements one by one. We look for a key "country"."state"."city"."name" because those contains city names. Note use "no-enum" clause in json-doc (which is the Golf's JSON parser), so that element designations aren't showing (meaning we don't have [0], [1] etc. for arrays).

Save this code to "parse-json.golf":
 begin-handler /parse-json public
     // Read the JSON file
     read-file "countries.json" to countries status st
     if-true st lesser-equal 0
         @Cannot read file or file empty
         exit-handler -1
     end-if

     // Parse JSON
     json-doc countries no-enum status st error-text et error-position ep to json

     // Check for errors in JSON document
     if-true st not-equal GG_OKAY
         @Error [<<p-out et>>] at [<<p-num ep>>]
         exit-handler -2
     end-if

     // This is the JSON element we're looking for
     set-string city_name unquoted ="country"."state"."city"."name"

     // Read elements one by one - note you can then store them in a tree or hash for future fast searches
     start-loop
         // Read just a key
         read-json json key k type t
         // Exit if end of document
         if-true t equal GG_JSON_TYPE_NONE
             break-loop
         end-if
         // If matches key we're looking for, get the value, and output it
         if-true city_name equal k
             read-json json value v
             @Value [<<p-out v>>]
             @--------
         end-if
         // Move on to the next JSON element
         read-json json next
     end-loop

     // Optionally delete JSON object, or it will be automatically deleted
     json-doc delete json
 end-handler

Create "json" application ("-k") and compile it ("-q"):
gg -k json -q

Copy the above JSON file into application home directory so we can use it without a path (since this directory is your program's default directory) - you can, of course, also specify any absolute or relative path in Golf code above:
cp countries.json /var/lib/gg/json/app

Run it:
gg -r --req="/parse-json" --exec

The result is:
Content-Type: text/html;charset=utf-8
Cache-Control: max-age=0, no-cache
Pragma: no-cache
Status: 200 OK

Value [Phoenix]
--------
Value [Tuscon]
--------
Value [Los Angeles]
--------
Value [Irvine]
--------
Value [Xalapa-Enríquez]
--------
Value [Córdoba]
--------
Value [Culiacán Rosales]
--------

Golf programs work exactly the same when you run them from command line as they do as web services or applications, hence you get the HTTP header in the output. You can avoid it by using silent-header in your Golf code, or by using "--silent-header" option in the above gg execution call.

You can also omit "--exec" option in the above gg execution call, which will display environment variable settings and the path to your executable. Then you can run those directly. This is useful if you're running a Golf command line program in batch mode from command line repeatedly, as it's faster than using gg.