
I was curious whether this could be done with jq, and apparently it can:

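  jq -j '
    [
      # Paths to every leaf, rendered as .key or [index].
      # ("scalars" alone would skip false and null leaves.)
      [
        paths(type | . != "object" and . != "array")
        | map(
          if type == "number"
          then "[" + tostring + "]"
          else "." + .
          end
        ) | join("")
      ],
      # The leaf values in the same order, JSON-encoded.
      [
        .. | scalars | @json
      ]
    ]
    # Pair each path with its value and print one line per pair.
    | transpose
    | map(join(" = ") + "\n")
    | join("")
  '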
EDIT: Fixed the string quoting and escaping.

EDIT 2: For those who want to save this script, you can put just the jq code in an executable file with the shebang:

  #!/usr/bin/jq -jf
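For readers more at home in Python, here is a rough sketch of the same flattening. It is an illustration, not catj's or jq's code: `format_path`, `flatten` and `catj_lines` are hypothetical names, keys are printed unquoted exactly as in the jq script, and values are JSON-encoded the way `@json` does it.

```python
import json


def format_path(path):
    """Render a path like ["movie", "cast", 5] as .movie.cast[5]."""
    parts = []
    for key in path:
        if isinstance(key, int):
            parts.append("[" + str(key) + "]")  # array index
        else:
            parts.append("." + key)  # object key, unquoted like the jq script
    return "".join(parts)


def flatten(value, path=()):
    """Yield (path, leaf) pairs for every scalar leaf, in document order."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, path + (key,))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, path + (index,))
    else:
        yield list(path), value  # strings, numbers, true/false, null


def catj_lines(document):
    """Return one 'path = json-value' line per leaf of a parsed document."""
    return [
        format_path(p) + " = " + json.dumps(v, ensure_ascii=False)
        for p, v in flatten(document)
    ]


if __name__ == "__main__":
    doc = json.loads('{"movie": {"name": "Interstellar", "cast": ["A", "B"]}}')
    for line in catj_lines(doc):
        print(line)
```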


Holy moly, your jq skills are savage.


Thanks, but I haven't really done much in jq before. Doing the above script involved looking a lot through the manual and experimenting. This exercise was a learning experience for me. I was only able to do this because the manpage is well written.


Huh, that's the spirit.


And that's how your skills gradually become savage. Kudos.


I was expecting some mention of jq for this post, and you did not disappoint. Thank you for the script -- it works great and I'm adding it to my collection.


Would probably be shorter if you used `tostream`


You're right. Good tip:

  jq -r '
    tostream
    | select(length > 1)
    | (
      .[0] | map(
        if type == "number"
        then "[" + tostring + "]"
        else "." + .
        end
      ) | join("")
    ) + " = " + (.[1] | @json)
  '
EDIT: For those who want to save this script, you can put just the jq code in an executable file with the shebang:

  #!/usr/bin/jq -rf


Appending ` |` to each line and adding a final line containing `.` turns the output into a jq program that reproduces the original JSON.

  jq -r '
   ( tostream
     | select(length > 1)
     | (
       .[0] | map(
         if type == "number"
         then "[" + tostring + "]"
         else "." + .
         end
       ) | join("")
     )
     + " = "
     + (.[1] | @json)
     + " |"
   ),
   "."
  '


That's insane. Here I was thinking about how much more challenging it would be to parse and reconstruct the object from jq, and you got the idea to take advantage of the syntax similarity to parse it as jq code itself. Nice. And so, that means the inverse of the jq code I posted would simply be:

  ( jq "$(sed 's/$/ |/;$a.')" <<< '{}' )
As in:

  catj example.json \
  | ( jq "$(sed 's/$/ |/;$a.')" <<< '{}' ) \
  > original.json


And a nice example of the path form being amenable to unix tools.

BTW the input json can be null, so -n works (also using process substitution):

  jq -nf <(sed 's/$/ |/;$a.')
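The sed part is ordinary text rewriting. A Python sketch of the same rewrite, using a hypothetical `to_jq_program` helper, shows the exact shape of the program jq ends up compiling; with `-n`, the first assignment runs on null and each one builds up the structure.

```python
def to_jq_program(lines):
    """Rewrite 'path = value' lines into a jq filter that rebuilds the JSON.

    Mirrors sed 's/$/ |/;$a.': each assignment gets a trailing ' |' so the
    updates are piped into one another, and a final '.' prints the result.
    (Unlike sed, which prints nothing for empty input, this still emits '.'.)
    """
    return "".join(line + " |\n" for line in lines) + ".\n"


program = to_jq_program(['.movie.name = "Interstellar"', ".movie.year = 2014"])
print(program, end="")
```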


Parsing is awkward in jq, but setpath(PATHS; VALUE) will create the necessary structure. PATHS uses the array form, like ["movie", "cast", 5], not .movie.cast[5]. Since 1.5, jq has regex support (via Oniguruma), so you could remove the ] characters and split on . and [, converting the index segments back to numbers.
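A Python sketch of that setpath approach, assuming paths only use the `.key` and `[index]` forms printed by the scripts above. Instead of deleting `]` and splitting, it tokenizes each segment so that a key and an index stay distinct. `parse_path`, `set_path` and `rebuild` are hypothetical helpers standing in for the regex step and jq's `setpath`.

```python
import json
import re

# One path segment: .key (no '.', '[' or ']' inside) or [digits].
TOKEN = re.compile(r"\.([^.\[\]]+)|\[(\d+)\]")


def parse_path(text):
    """Turn '.movie.cast[5]' into ['movie', 'cast', 5]."""
    path, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"cannot parse path at {text[pos:]!r}")
        key, index = m.groups()
        path.append(key if key is not None else int(index))
        pos = m.end()
    return path


def set_path(root, path, value):
    """Like jq's setpath: create missing objects/arrays (arrays padded with null)."""
    if not path:
        return value
    head, rest = path[0], path[1:]
    if isinstance(head, int):
        if root is None:
            root = []
        while len(root) <= head:
            root.append(None)
        root[head] = set_path(root[head], rest, value)
    else:
        if root is None:
            root = {}
        root[head] = set_path(root.get(head), rest, value)
    return root


def rebuild(lines):
    """Rebuild a document from 'path = json-value' lines, starting from null."""
    doc = None
    for line in lines:
        path_text, _, value_text = line.partition(" = ")
        doc = set_path(doc, parse_path(path_text), json.loads(value_text))
    return doc
```

Keys containing `.`, `[` or `]` would need a quoted `["key"]` form, which this sketch does not handle.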


Everything you guys have posted in this thread looks like pure ninja magic to me.


looks like ancient Greek for us plebs


This would be more suitable for large JSON files if used with the `--stream` flag. Here's my take on it:

  jq -c --stream '
    . as $in 
    | select(length == 2) 
    | (
      $in[0] | map(
        if type == "number" 
        then "[" + tostring + "]" 
        else "." + . 
        end
      ) | add
    ) + " = " + ($in[1] | tostring)'
Using `--stream` lets jq start producing output before it has parsed the entire JSON file. In my experience, a 700 MB JSON file can take up 5 GB of RAM in either jq or `python -m json.tool`.
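The event form that `tostream` and `--stream` produce can be modelled in a few lines of Python. This is a simplified sketch, not jq's implementation, and it walks an already-parsed value, so unlike `--stream` it saves no memory. Each leaf (scalar or empty container) becomes `[path, leaf]`, and each non-empty container ends with a one-element `[path_of_last_child]` closing event, which is why the scripts in this thread filter on `length > 1`. `stream_events` is a hypothetical name.

```python
def stream_events(value, path=()):
    """Yield jq-style stream events for a parsed JSON value."""
    if isinstance(value, (dict, list)) and value:
        items = value.items() if isinstance(value, dict) else enumerate(value)
        last = None
        for key, child in items:
            yield from stream_events(child, path + (key,))
            last = key
        # Closing event: a 1-element array holding the path of the last child.
        yield [list(path) + [last]]
    else:
        # Leaf event: [path, value]; empty {} and [] are treated as leaves.
        yield [list(path), value]


leaves = [e for e in stream_events({"movie": {"year": 2014}}) if len(e) > 1]
print(leaves)
```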


You're right about `--stream`, but you didn't need the variable assignment. Also, `-c` is pointless in this situation: we're outputting text, and `-c` is for printing objects/arrays in a compact format. Because `-r` wasn't used, jq outputs the text encoded as JSON strings. So, instead of outputting:

  .movie.name = "Interstellar"
  .movie.year = 2014
  .movie.is_released = true
  .movie.else = "Christopher Nolan"
  .movie.cast[0] = "Matthew McConaughey"
  .movie.cast[1] = "Anne Hathaway"
  .movie.cast[2] = "Jessica Chastain"
  .movie.cast[3] = "Bill Irwin"
  .movie.cast[4] = "Ellen \\\\ Burstyn"
  .movie.cast[5] = "Michael Caine"
You're outputting:

  ".movie.name = Interstellar"
  ".movie.year = 2014"
  ".movie.is_released = true"
  ".movie.else = Christopher Nolan"
  ".movie.cast[0] = Matthew McConaughey"
  ".movie.cast[1] = Anne Hathaway"
  ".movie.cast[2] = Jessica Chastain"
  ".movie.cast[3] = Bill Irwin"
  ".movie.cast[4] = Ellen \\\\ Burstyn"
  ".movie.cast[5] = Michael Caine"
Another point is how the strings to the right of the `=` are displayed. They should be quoted, and they aren't because you piped the second element to `tostring` instead of `@json`.
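The difference is easy to see with a small Python model of the two builtins, based on their documented behaviour: `tostring` leaves strings unchanged and JSON-encodes everything else, while `@json` always JSON-encodes. `jq_tostring` and `jq_json` are hypothetical stand-ins, not jq code.

```python
import json


def jq_tostring(value):
    """Model of jq's tostring: strings pass through, other values are JSON-encoded."""
    if isinstance(value, str):
        return value
    return json.dumps(value, ensure_ascii=False, separators=(",", ":"))


def jq_json(value):
    """Model of jq's @json: every value is JSON-encoded, so strings keep quotes."""
    return json.dumps(value, ensure_ascii=False, separators=(",", ":"))


for leaf in ["Interstellar", 2014, True]:
    print(".x = " + jq_tostring(leaf), "   vs   ", ".x = " + jq_json(leaf))
```

Only strings differ, which is why numbers and booleans looked fine in the `tostring` version.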

A better version of your suggestion would've been:

  jq -r --stream '   
    select(length > 1)  
    | (
      .[0] | map(
        if type == "number"
        then "[" + tostring + "]"
        else "." + .
        end
      ) | add
    ) + " = " + (.[1] | @json)
  '
The use of `length > 1` instead of `length == 2` is a minor point, but if a future version of jq sometimes put 3 elements in these arrays, your filter would drop those events even though we'd likely want them too. `length > 1` guarantees exactly what we need, namely that the elements we're going to use are present, while `length == 2` might filter some of them out, even if it doesn't today.

Your use of `add` is neat, though. I wouldn't have thought of that.



Following https://github.com/stedolan/jq/issues/243, I commonly use https://github.com/joelpurra/har-dulcify/blob/master/src/uti... to explore unfamiliar JSON, e.g.:

  $ docker inspect 620f55df9177| structure.sh |grep -i addr
   .[].NetworkSettings.GlobalIPv6Address
   .[].NetworkSettings.IPAddress
   .[].NetworkSettings.LinkLocalIPv6Address
   .[].NetworkSettings.MacAddress
   .[].NetworkSettings.Networks.bridge.GlobalIPv6Address
   .[].NetworkSettings.Networks.bridge.IPAddress
   .[].NetworkSettings.Networks.bridge.MacAddress
  
  $ docker inspect 620f55df9177| jq .[].NetworkSettings.IPAddress
   "192.168.0.2"

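The idea behind that kind of helper can be sketched in Python: collect every leaf path, collapse array indices to `[]`, and deduplicate, so that repeated array elements fold into one line. This is a hypothetical reimplementation for illustration, not the actual `structure.sh` (whose link is truncated above); `structure_paths` is an assumed name.

```python
def structure_paths(document):
    """Sorted, de-duplicated leaf paths of a JSON value, with indices collapsed to []."""
    found = set()

    def walk(node, path):
        if isinstance(node, dict) and node:
            for key, child in node.items():
                walk(child, path + "." + key)
        elif isinstance(node, list) and node:
            for child in node:
                # Every index becomes []; a top-level array is written .[]
                walk(child, (path or ".") + "[]")
        else:
            found.add(path or ".")  # scalars and empty containers are leaves

    walk(document, "")
    return sorted(found)
```

Filtering the result, e.g. `[p for p in structure_paths(doc) if "addr" in p.lower()]`, plays the role of `grep -i addr`.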

That's awesome work. The only problem is that it does not properly handle keys which are not valid JS identifiers (like 1foo, @foo, foo-bar, etc.).


Well, there's this option without the blacklist:

  jq -r '
    tostream
    | select(length > 1)
    | (.[0] | map("[" + @json + "]") | join(""))                          
      + " = " + (.[1] | @json)
  '
And this other option with the blacklist patterns:

  jq -r '
    tostream
    | select(length > 1)
    | (
      .[0] | map(
        if type == "number" or (tostring | test("[@-]|^[0-9]|^else$"))
        then "[" + @json + "]"
        else "." + .
        end
      ) | join("")
    ) + " = " + (.[1] | @json)
  '
(The blacklist here is non-exhaustive, but an example.)


It occurs to me that a better way to blacklist would be something like:

  jq -r '
    tostream
    | select(length > 1)
    | (
      .[0] | map(
        if tostring | (
          test("^[A-Za-z$_][0-9A-Za-z$_]*$")
          and (
            . as $property
            | ["if", "else"] | all(. != $property) 
          )
        )
        then "." + .
        else "[" + @json + "]"
        end
      ) | join("")
    ) + " = " + (.[1] | @json)
  '
You whitelist against what the syntax allows for identifiers and then blacklist reserved keywords. Writing it this way makes it easier to verify correctness against the ECMAScript spec. The blacklist is still non-exhaustive, and the whitelist regex lacks the Unicode characters that identifiers may contain.
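The same whitelist-then-blacklist rule is easy to try outside jq. A Python sketch that reuses the thread's ASCII-only identifier regex and its deliberately short keyword list; `format_key`, `format_path` and `RESERVED` are hypothetical names, and a real version would need the full ECMAScript reserved-word list and Unicode identifier rules.

```python
import json
import re

IDENTIFIER = re.compile(r"[A-Za-z$_][0-9A-Za-z$_]*")  # ASCII-only, as in the jq version
RESERVED = {"if", "else"}  # non-exhaustive, as in the jq version


def format_key(key):
    """Use .key for plain identifiers; otherwise fall back to ["key"] or [index]."""
    if isinstance(key, str) and IDENTIFIER.fullmatch(key) and key not in RESERVED:
        return "." + key
    return "[" + json.dumps(key, ensure_ascii=False) + "]"


def format_path(path):
    return "".join(format_key(k) for k in path)


print(format_path(["movie", "1foo", "foo-bar", "else", 0]))
```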


  jq: error: syntax error, unexpected INVALID_CHARACTER, expecting $end (Unix shell quoting issues?) at <top-level>, line 3:
  jq -j '      
  jq: 1 compile error


This, uh, doesn't work for me on jq-1.5.1.


Hmm... I just downgraded to 1.5 and it seems to work. jq only uses two numbers in its version scheme[1], so the third number probably comes from your distribution's packaging. Maybe the issue is with something your distribution did while building the package (like adding a patch)? It might also be that the syntax error is not in the script but in the JSON you input. Sorry, without being able to reproduce the error, I can't help more.

[1] https://github.com/stedolan/jq/releases


  #!/usr/local/bin/jq -rf
  tostream | select(length > 1) | ( .[0] | map( if type == "number" then "[" + tostring + "]" else "." + . end ) | join("") ) + " = " + (.[1] | @json)


I need to understand how #! works, i.e. `#!/usr/bin/jq --stream -rf` errors with `/usr/bin/jq: Unknown option --stream -rf`

`#!/usr/bin/jq -rf` with the tostream wrapper in the code works fine.


As another thread mentioned, shebang (#!) parsing is non-standard. On macOS, I think what you tried would work as you'd expect, but it works differently on Linux. On Linux, after parsing the path to the executable and a space, everything else is taken as a single argument. So in bash terms, what you did is equivalent to:

  jq "--stream -rf" path/to/script
and jq doesn't know any single option called "--stream -rf".
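A simplified Python model of the Linux rule described above: the interpreter path ends at the first space or tab, and the rest of the line (trimmed) is passed as one argument, followed by the script path. `linux_shebang_argv` is a hypothetical name, and the model ignores details such as the kernel's line-length limit.

```python
import re


def linux_shebang_argv(first_line, script_path):
    """Approximate the argv Linux builds from a '#!' line (simplified model)."""
    if not first_line.startswith("#!"):
        raise ValueError("not a shebang line")
    rest = first_line[2:].rstrip("\n").strip(" \t")
    parts = re.split(r"[ \t]+", rest, maxsplit=1)
    argv = [parts[0]]  # the interpreter path
    if len(parts) > 1:
        argv.append(parts[1])  # everything else, as ONE argument
    argv.append(script_path)
    return argv


print(linux_shebang_argv("#!/usr/bin/jq --stream -rf", "script.jq"))
```

The `--stream -rf` case yields `["/usr/bin/jq", "--stream -rf", "script.jq"]`, matching the error above.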

I haven't seen the discussions around these design decisions in the different OSes, but I imagine the crux of the matter is that you have to pick somewhere to stop, and where you choose to stop is largely arbitrary.

I mean, you can have the OS interpret shebangs with multiple arguments, but then you'll want to be able to put spaces in these arguments, so you'll want quoting, and then you'll want to put special characters like newlines inside, so you'll want escaping, etc.

The OS can implement all these things in execve()'s logic, but it might also be preferable to keep the logic simple in the interest of avoiding security-harming bugs. You know, less code, fewer bugs, fewer vulnerabilities.

If --stream had a single letter option equivalent, you could stick it together with the other ones. However, since it doesn't, your only option to make a portable script is to use a shell shebang like #!/bin/bash, and then do:

  exec jq --stream -rf ...
You might feel that this single argument restriction sucks and is definitely inferior to any implementation of multiple argument shebangs. I don't know if macOS shebangs support quoting, but if they don't and simply split on spaces, then I can tell you they can't do hacky stuff like writing code in a shebang like this:

> https://unix.stackexchange.com/questions/365436/choose-inter...

Granted, it's bad practice, but a little cool nevertheless.


IIRC, args in a hashbang aren't POSIX compatible.


A hashbang is not defined by posix, so using it without args isn't "posix compatible".

From https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V...

> If the first line of a file of shell commands starts with the characters "#!", the results are unspecified.

There is no more specification for "#!/usr/bash" than "#!/usr/bin/jq -jf"

In case you think I'm linking to the wrong portion of the POSIX spec: the exec page says even less about how to interpret shebangs.


Args in a hashbang?

Maybe not, but I'm pretty sure every system supports a single arg. And very few (none?) support more.


macOS appears to support multiple args just fine, which is why it annoys me that ShellCheck bitches about using more than one arg even though I'm writing a script specifically for macOS.


ShellCheck is right insofar as compatibility is concerned. You can only rely on the shebang supporting one argument. I'd personally just ignore that warning if I were writing for macOS specifically, but you can configure ShellCheck to ignore certain errors that you don't care about[1].

[1] https://github.com/koalaman/shellcheck/wiki/Ignore


Yeah, but it's simpler just to move the additional args to `set` calls than to remember the syntax for disabling the directive. It's just irritating, and especially so because, for some reason, in VSCode it ends up highlighting literally the entire file as an error instead of just the shebang.



