Thanks, but I haven't really done much in jq before. Doing the above script involved looking a lot through the manual and experimenting. This exercise was a learning experience for me. I was only able to do this because the manpage is well written.
I was expecting some mention of jq for this post, and you did not disappoint. Thank you for the script -- it works great and I'm adding it to my collection.
That's insane. Here I was thinking about how much more challenging it would be to parse and reconstruct the object from jq, and you got the idea to take advantage of the syntax similarity to parse it as jq code itself. Nice. And so, that means the inverse of the jq code I posted would simply be:
Parsing is awkward in jq, but setpath(PATHS; VALUE) will create necessary structure. PATHS uses the array form, like ["movie", "cast", 5] not .movie.cast[5]. Since 1.5, jq has PCRE regex, so could remove ], and separate by . and [.
This would be more suitable to large json files if used with the `--stream` flag. Here's my take on it:
jq -c --stream '
. as $in
| select(length == 2)
| (
$in[0] | map(
if type == "number"
then "[" + tostring + "]"
else "." + .
end
) | add
) + " = " + ($in[1] | tostring)'
Using `--stream` allows jq to start before parsing the entire json file. In my experience, a 700mb json file can take up 5gb of ram in either jq or python -m json.
You're right about `--stream`, but you didn't need the variable assignment. Also, `-c`, besides the fact that it's not available in jq-1.5 which some people are using, is very pointless in this situation, since we're looking to output text, and `-c` is for outputting objects/arrays in a compact format. The fact that `-r` wasn't used causes jq to output the text encoded as json strings. So, instead of outputting:
".movie.name = Interstellar"
".movie.year = 2014"
".movie.is_released = true"
".movie.else = Christopher Nolan"
".movie.cast[0] = Matthew McConaughey"
".movie.cast[1] = Anne Hathaway"
".movie.cast[2] = Jessica Chastain"
".movie.cast[3] = Bill Irwin"
".movie.cast[4] = Ellen \\\\ Burstyn"
".movie.cast[5] = Michael Caine"
Another point is how the strings at the right of the `=` are displayed. They should be quoted. The reason why they're not is because you piped the second element to `tostring` instead of `@json`.
A better version of your suggestion would've been:
jq -r --stream '
select(length > 1)
| (
.[0] | map(
if type == "number"
then "[" + tostring + "]"
else "." + .
end
) | add
) + " = " + (.[1] | @json)
'
The use of `length > 1` instead of `length == 2` is a minor point, but if a future version jq decides to sometimes put 3 elements in these arrays, your filter would ignore those when we're likely to also want those. `length > 1` ensures what we need, that there are at least the elements that we're going to be using, while `length == 2` might filter some of those out, even if it's not right now.
Your use of `add` is neat, though. I wouldn't have thought of that.
You whitelist against what the syntax allows for identifiers and then you blacklist reserved keywords. Writing it this way makes it easier to verify for correctness when comparing with the ECMAScript Specs. This is still a non-exhaustive blacklist and the whitelist regex lacks allowed unicode characters.
Hmm... I just downgraded to 1.5 and it seems to work. jq just has 2 numbers in its versioning[1]. The other number must be from your distribution's package building. Maybe the issue is with something your distribution did while building the package (like adding a patch)? It might also be that the syntax error is not with the script, but with the JSON you input. Sorry, without being able to reproduce the error, I can't help more.
Like another thread mentioned, shebang (#!) parsing is non-standard. In macOS, I think what you tried would work like you'd expect, but it'd work differently on linux. The reason is that in linux, after parsing the path to the executable and a space, everything else is taken as a single argument. So if you were in bash, what you did would be the equivalent of doing:
jq "--stream -rf" path/to/script
and jq doesn't know of any one option called "--stream -rf".
I haven't seen the discussions around these design decisions in the different OSes, but I imagine the crux of the matter is that you have to pick somewhere to stop, and where you chose to stop is largely arbitrary.
I mean, you can have the OS interpret shebangs with multiple arguments, but then you'll want to be able to put spaces in these arguments, so you'll want quoting, and then you'll want to put special characters like newlines inside, so you'll want escaping, etc.
The OS can implement all these things in execve()'s logic, but it might also be preferable to keep the logic simple in the interest of avoiding security-harming bugs. You know, less code, less bugs, less vulnerabilities.
If --stream had a single letter option equivalent, you could stick it together with the other ones. However, since it doesn't, your only option to make a portable script is to use a shell shebang like #!/bin/bash, and then do:
exec jq --stream -rf ...
You might feel that this single argument restriction sucks and is definitely inferior to any implementation of multiple argument shebangs. I don't know if macOS shebangs support quoting, but if they don't and simply split on spaces, then I can tell you they can't do hacky stuff like writing code in a shebang like this:
macOS appears to support multiple args just fine. Which is why it annoys me that Shellcheck bitches about using more than one arg even though I'm writing a script for macOS specifically.
ShellCheck is right insofar as compatibility is concerned. You can only rely on the shebang supporting one argument. I'd personally just ignore that warning if I were writing for MacOS specifically, but you can configure ShellCheck to ignore certain errors that you don't care about[1].
Yeah, but it's simpler just to move the additional args to `set` calls than to remember the syntax for disabling the directive. It's just irritating, and especially so because, for some reason, in VSCode it ends up highlighting literally the entire file as an error instead of just the shebang.
EDIT 2: For those who want to save this script, you can put just the jq code in an executable file with the shebang: