Making single-purpose utilities example: filter URLs from input
1 point by textmode on June 6, 2019 | 4 comments

    /* ---
    Made for use with http clients as described in https://news.ycombinator.com/item?id=17689165 and https://news.ycombinator.com/item?id=17689152
    Assuming code below is saved as "030.l", one might compile program "yy030" with something like:
     flex -8iCrfa 030.l
     cc -pipe lex.yy.c -static -o yy030
     --- */


     #define p(x) fprintf(stdout,x,yytext);
     #define jmp BEGIN
    %s xa xb xc
     int e,b,c;
    xa "http://"|"https://"|"ftp://"
    %%
     /* non-printable */
    \200|\201|\204|\223|\224|\230|\231|\234|\235

    {xa} p("%s");jmp xa;
    <xa>[^ \n\r<>"#'|)\]\}]* p("%s\n");jmp 0;

     /* http:\/\/[^ \n\r<>"#'|]*    fprintf(stdout,"%s\n",yytext); */
     /* https:\/\/[^ \n\r<>"#'|]*    fprintf(stdout,"%s\n",yytext); */
     /* ftp:\/\/[^ \n\r<>"#'|]*    fprintf(stdout,"%s\n",yytext); */
    .|\n
    %%
    int main(){ yylex(); return 0; }
    /* returning nonzero tells flex there is no further input at EOF */
    int yywrap(){ return 1; }


Uh, I cannot imagine why one would prefer this to a single “grep -o” invocation.

Not only will the “grep” command be simpler to understand later, it will also be trivially customizable/extensible.
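For reference, a sketch of the grep approach (the sample text and the character class are illustrative; the pattern is a rough approximation of where a URL ends, not a full URL grammar, and `-E`/`-o` support varies between grep implementations):

```shell
# Extract http/https/ftp URLs, one per line. The negated character
# class approximates where a URL ends; it is not a complete grammar.
printf 'see http://example.com/a and https://b.org/x here\n' |
  grep -Eo '(https?|ftp)://[^ <>"]+'
```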


Of course I use grep -o too. This is not a "correct" filter; it is not a perfect regexp for 100% of URLs.

However, for something as simple and essential (for the author) as filtering URLs, I do not want to have to worry about potential differences in shells, different versions of grep, or the absence of grep entirely as I move between computers, OSes, and OS versions. I find this more predictable and portable.

Neither customization nor extensibility are goals. For that a scripting language is better suited.


Change that "grep" to "sed", and you will get a solution that works even on ancient machines, like HP-UX from 1990's. Grab msys, and you'd have your solution for Window-based systems as well.

At the same time, installing "flex" and "cc" on a random machine would be much harder. Old Solaris boxes, for example, come without any C compilers, not to mention lexers.

And finally, what are you going to do with the results? It is very likely that you'd want to pass them through sed/grep anyway, so you will have to worry about differences in shells and versions regardless.

So sorry, I see no advantages of this, just disadvantages. Of course no one cares if you run them yourself, but posting them for other people is just evil.


Yeah, I am pretty good with sed. Probably better than you. I have sed versions of all these programs.

I am not using any computers that cannot run flex and cc.

Results usually go to yy025, a program that makes HTTP requests from URLs.



