HPR4227: Introduction to jq - part 3
Overview
In this episode we will continue looking at basic filters. Then we
will start looking at the feature that makes jq very
powerful, the ability to transform JSON from one form to another. In
essence we can read and parse JSON and then construct an alternative
form.
More basic filters
Array/String Slice: .[<number>:<number>]
This filter allows parts of JSON arrays or strings to be
extracted.
The first number is the index of the first element of the array or
string to be extracted, counting from zero. The second number is an
ending index, but it means "up to but not including". If the first index
is omitted it refers to the start of the string or array. If the second
index is omitted it refers to the end of the string or array.
This example shows using an array and extracting part of it:
$ x="[$(seq -s, 1 10)]"
$ echo "$x"
[1,2,3,4,5,6,7,8,9,10]
$ echo "$x" | jq -c '.[3:6]'
[4,5,6]
Here we use the seq command to generate the numbers 1 to 10
separated by commas, and wrap them in square brackets to make a JSON
array. Feeding this to jq on its standard input with the slice request
'.[3:6]' results in a sub-array from element 3 (containing value 4), up
to but not including element 6 (containing 7). Note that the '-c'
option generates compact output, as we discussed in the last episode.
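The description above also covers omitted indices, which this example does not show. Here is a minimal sketch of both cases using the same test array (recreated so the snippet runs on its own; it assumes GNU seq and jq are installed):

```shell
#!/usr/bin/env bash
# Recreate the test array from the example above: [1,2,...,10]
x="[$(seq -s, 1 10)]"

# First index omitted: from the start up to (not including) element 3
echo "$x" | jq -c '.[:3]'
# Second index omitted: from element 7 through to the end of the array
echo "$x" | jq -c '.[7:]'
```

This should print [1,2,3] followed by [8,9,10].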
For a string, the idea is similar, as in:
$ echo '"Hacker Public Radio"' | jq '.[7:10]'
"Pub"
Notice that the JSON string, including its double quotes, is enclosed
in single quotes as the argument to echo. The filter '.[7:10]' starts
from element 7 (letter "P") up to but not including element 10
(letter "l").
Both of the numbers may be negative, meaning that they are offsets
from the end of the array or string.
So, using '.[-7:-4]' in the array example gives the same
result as '.[3:6]', as do '.[3:-4]' and
'.[-7:6]'. This example uses the x variable
created earlier:
$ for f in '.[-7:-4]' '.[3:6]' '.[3:-4]' '.[-7:6]'; do
> echo "$x" | jq -c "$f"
> done
[4,5,6]
[4,5,6]
[4,5,6]
[4,5,6]
Similarly, using '.[-12:-9]' gives the same result as
'.[7:10]' when used with the string.
$ echo '"Hacker Public Radio"' | jq '.[-12:-9]'
"Pub"
As a point of interest, I wrote a little Bash loop to show the
positive and negative offsets of the characters in the test string,
just to help me visualise them. See the footnote at the end for details.
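Put another way, a negative offset is added to the length of the string or array. As a minimal sketch, the builtin length and the comparison operator == (neither of which is covered in this episode) can confirm this for the test string: it has 19 characters, so -12 corresponds to 19 - 12 = 7 and -9 to 19 - 9 = 10.

```shell
#!/usr/bin/env bash
# The test string as JSON (note the embedded double quotes)
s='"Hacker Public Radio"'

# Number of characters in the string
echo "$s" | jq 'length'
# Compare the negative-offset slice with the positive-offset one
echo "$s" | jq '.[-12:-9] == .[7:10]'
```

This should print 19 followed by true.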
Finally, here is how to get the last character of the example string
using positive and negative offsets:
$ echo '"Hacker Public Radio"' | jq '.[18:]'
"o"
$ echo '"Hacker Public Radio"' | jq '.[-1:]'
"o"
Array/Object Value Iterator: .[]
This filter generates values by iterating through an array or an
object. It is similar to the .[index] syntax we have already seen, but
it returns all of the elements rather than just one:
$ arr='["Kohinoor","plastered","downloadable"]'
$ echo "$arr" | jq '.[]'
"Kohinoor"
"plastered"
"downloadable"
The strings in the array are returned separately, not as an array.
This is because .[] is an iterator: it produces a stream of separate
values, each of which can be passed on to other filters.
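To illustrate that linking, here is a small sketch that uses the pipe operator (described later in this episode) to pass each string produced by .[] to the slice filter shown earlier, keeping the first three characters of each:

```shell
#!/usr/bin/env bash
arr='["Kohinoor","plastered","downloadable"]'

# .[] emits each string separately; each one is then sliced in turn
echo "$arr" | jq '.[] | .[0:3]'
```

This should print "Koh", "pla" and "dow", each on its own line.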
It can also be used to iterate over values in an object:
$ obj='{"name": "Hacker Public Radio", "type": "Podcast"}'
$ echo "$obj" | jq '.[]'
"Hacker Public Radio"
"Podcast"
This iterator does not work on other data types, just arrays and
objects.
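An alternative iterator, .[]?, exists which ignores errors. First,
here is the error produced when .[] is given a boolean:
$ echo "true" | jq '.[]'
jq: error (at <stdin>:1): Cannot iterate over boolean (true)
Ignoring errors with .[]? produces no output at all:
$ echo "true" | jq '.[]?'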
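The .[]? form is most useful when the input is a mixture of things. As a sketch, assuming several JSON values arrive one after another on standard input (jq processes each one separately), the non-iterable value in the middle is simply skipped:

```shell
#!/usr/bin/env bash
# Three JSON values on one input stream: an array, a boolean and an object
printf '%s\n' '[1,2]' 'true' '{"a":3}' | jq -c '.[]?'
```

This should print 1, 2 and 3 on separate lines; with plain .[] in place of .[]? jq would report an error when it reached the boolean.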
Using multiple filters
There are two operators that can be placed between filters to combine
their effects: the comma (',') and the
pipe ('|').
Comma operator
The comma (',') operator allows you to chain together
multiple filters. As we already know, the jq program feeds
the input it receives on standard input or from a file into whatever
filter it is given. So far we have only seen a single filter being
used.
With the comma operator the input to jq is fed to all of
the filters separated by commas in left to right order. The result is a
concatenation of the output of all of these filters.
For example, if we take the output from the HPR stats page which was
mentioned in part 1
of this series of shows, and store it in a file called
stats.json we can view two separate parts of the JSON like
this:
$ curl -s https://hub.hackerpublicradio.org/stats.json -O
$ jq '.shows , .queue' stats.json
{
"total": 4756,
"twat": 300,
"hpr": 4456,
"duration": 7640311,
"human_duration": "0 Years, 2 months, 29 days, 10 hours, 18 minutes and 31 seconds"
}
{
"number_future_hosts": 6,
"number_future_shows": 18,
"unprocessed_comments": 0,
"submitted_shows": 0,
"shows_in_workflow": 51,
"reserve": 20
}
This applies the filter .shows (an object identifier-index filter, see
part 2), which returns the contents of the object with that name, and
then applies the filter .queue to the same input, which returns the
contents of the "queue" object.
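Because the comma feeds the same input to each filter in turn, the order of the filters determines the order of the output, and a filter may even be repeated. A small self-contained sketch, reusing the obj value from the iterator section:

```shell
#!/usr/bin/env bash
obj='{"name": "Hacker Public Radio", "type": "Podcast"}'

# Output follows filter order: "Podcast" first, then "Hacker Public Radio"
echo "$obj" | jq '.type , .name'
# The same filter twice yields the same value twice
echo "$obj" | jq '.type , .type'
```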
Pipe operator
The pipe ('|') operator combines filters by feeding the
output of the first (left-most) filter of a pair into the second
(right-most) filter of a pair. This is analogous to the way the same
symbol works in the Unix shell.
For example, if we extract the 'shows' object from
stats.json, we can then extract the value of its
'total' key as follows:
$ jq '.shows | .total' stats.json
4756
Interestingly, chaining two object identifier-index filters
gives the same output:
$ jq '.shows.total' stats.json
4756
(Note: to answer the question in the audio, the two chained filters can
also be written as '.shows .total', with a space between them.)
We will see the pipe operator being used in many instances in
upcoming episodes.
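Since stats.json changes over time and needs a network fetch, here is a self-contained sketch using a hypothetical cut-down stats object holding just two of the values shown above. It checks that the three forms give the same result, and also shows that the comma binds more tightly than the pipe, so both filters after the pipe receive the shows object:

```shell
#!/usr/bin/env bash
# Hypothetical cut-down version of stats.json (two fields only)
stats='{"shows": {"total": 4756, "hpr": 4456}}'

# Three equivalent ways of reaching the same value
echo "$stats" | jq '.shows | .total'
echo "$stats" | jq '.shows.total'
echo "$stats" | jq '.shows .total'

# Comma binds more tightly than pipe: this means .shows | (.total , .hpr)
echo "$stats" | jq '.shows | .total , .hpr'
```

The first three commands should each print 4756, and the last should print 4756 then 4456. This pipe-then-comma pattern is the one used in the country examples below.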
Parentheses
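It is possible to use parentheses in filter expressions in a similar
way to using them in arithmetic, where they group parts together and can
change the normal order of operations. They can be used in other
contexts too. In the following simple arithmetic example, division binds
more tightly than addition, so without parentheses jq calculates 2 / 2
first and adds the result (1) to the total; with parentheses the
addition is done first and the sum is then halved: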
$ jq '.shows.total + 2 / 2' stats.json
4757
$ jq '(.shows.total + 2) / 2' stats.json
2379
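Parentheses are just as useful for grouping filters. Because the pipe binds more loosely than the comma, a pipe after a comma-separated list applies to every item in it; wrapping part of the expression in parentheses limits its reach. A sketch with the same hypothetical cut-down stats object (. is the identity filter, so '. / 2' halves whatever it receives):

```shell
#!/usr/bin/env bash
# Hypothetical cut-down version of stats.json (two fields only)
stats='{"shows": {"total": 4756, "hpr": 4456}}'

# The pipe applies to both values: prints 2378 then 2228
echo "$stats" | jq '.shows.total , .shows.hpr | . / 2'
# Parentheses restrict the division to the second value: 4756 then 2228
echo "$stats" | jq '.shows.total , (.shows.hpr | . / 2)'
```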
Examples
Finding country data #1
Here we are using a file called countries.json obtained
from the GitHub project listed below. This file is around 39,000 lines
long, so it is not being distributed with the show. However, it's quite
interesting and you are encouraged to grab a copy and experiment with it.
I will show ways in which the structure can be examined and reported
with jq in a later show, but for now I will show an example
of extracting data:
$ jq '.[42] | .name.common , .capital.[]' countries.json
"Switzerland"
"Bern"
The file contains an array of country objects; the one with index 42
is Switzerland.
The name of the country is in an object called "name",
with the common name in a keyed field called "common", thus
the filter .name.common.
The country object also contains a key called "capital"
whose value is an array holding the name (or names) of the capital city
(or cities). The filter .capital.[] obtains and displays the
contents of the array.
Note that a pipe feeds the Swiss country object to both filters,
which are joined by a comma operator.
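Without downloading countries.json, the same filter can be tried on a tiny hypothetical stand-in containing a single country object with just the fields used in these examples (so the index is 0 rather than 42). The sketch writes .capital[] rather than .capital.[]; the two should be equivalent, but as far as I know the form with the extra dot is only accepted from jq 1.7 onwards.

```shell
#!/usr/bin/env bash
# Hypothetical, heavily cut-down stand-in for countries.json
countries='[
  {
    "name": {"common": "Switzerland"},
    "capital": ["Bern"],
    "languages": {"fra": "French", "gsw": "Swiss German",
                  "ita": "Italian", "roh": "Romansh"}
  }
]'

# Select the first (only) country, then emit two results with the comma
echo "$countries" | jq '.[0] | .name.common , .capital[]'
```

This should print "Switzerland" followed by "Bern".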
Finding country data #2
Another search of the countries.json file, this time
looking at the languages spoken. There is an object called
"languages" which contains abbreviated language names as
keys and full names as the values:
$ jq '.[42] | .name.common , .capital.[] , .languages' countries.json
"Switzerland"
"Bern"
{
"fra": "French",
"gsw": "Swiss German",
"ita": "Italian",
"roh": "Romansh"
}
Using the filter .languages we get the whole object;
however, using the iterator .[] we get just the
values:
$ jq '.[42] | .name.common , .capital.[] , .languages.[]' countries.json
"Switzerland"
"Bern"
"French"
"Swiss German"
"Italian"
"Romansh"
This output has some shortcomings; for example, the language codes
are no longer shown. We need the construction capabilities of jq to
generate more meaningful output.
Next episode
In the next episode we will look at construction - how new
JSON output data can be generated from input data.
Links
jq:
The jq manual
Test data sources:
HPR Statistics
Random User Generator API
GitHub project mledoze/countries
Romansh language
Previous episodes:
Introduction to jq - part 1
Introduction to jq - part 2
Footnote: A Bash loop to show positive
and negative index values relating to an example string:
$ y='Hacker Public Radio'
$ for ((i=0,j=${#y}; i<"${#y}"; i++,j--)); do printf '%02d: -%02d: %s\n' "$i" "$j" "${y:$i:1}"; done
00: -19: H
01: -18: a
02: -17: c
03: -16: k
04: -15: e
05: -14: r
06: -13:
07: -12: P
08: -11: u
09: -10: b
10: -09: l
11: -08: i
12: -07: c
13: -06:
14: -05: R
15: -04: a
16: -03: d
17: -02: i
18: -01: o
Note that variable y doesn't contain a valid JSON
string; we are just counting characters here to demonstrate the offsets
that can be used when slicing a string!