More

copyconstruct · on Sept 28, 2019

Netflix also wrote about the drawbacks on this approach https://medium.com/netflix-techblog/netflix-edge-load-balanc...

copyconstruct · on Oct 26, 2017

I'm the author of that post. The goal wasn't to mislead - like I mentioned, I'm learning these things myself and definitely could've gotten several things wrong.

I meant file offsets are per process, not that every process gets its own table entry.

> when two processes perform reads through a shared file table entry for a regular file, the order of reads will matter as between the two processes because of the shared cursor; the sequence of data each process reads could differ based on random scheduling latencies.

Not sure I follow. won't the two processes still have their own descriptors which point to the same file entry but maintain their own offsets? I think what I understood from your comment is that descriptors are _shared_ by the parent and child with share by reference semantics? So both the parent and the child _are using the same descriptor_ which in turn has an offset in the file table entry.

wahern · on Oct 26, 2017

But file offsets _aren't_ per process. File offsets (aka I/O position cursors) are kept in the file table entry data structure, and those are shared for descriptors that have been dup'd or fork'd. If the cursor wasn't shared then this program

  #include <stdio.h>
  #include <stdlib.h>
  
  #include <err.h>
  #include <unistd.h>
  
  int
  main(void) {
  	FILE *fh = tmpfile();
  	if (!fh)
  		err(1, "tmpfile");  
  	int fd = fileno(fh);
  	if (fd == -1)
  		errx(1, "fileno: no descriptor");  
  	const char digits[] = "0123456789";
  	if (sizeof digits != write(fd, digits, sizeof digits))
  		err(1, "write");
  	if (-1 == lseek(fd, 0, SEEK_SET))
  		err(1, "lseek");  
  	if (-1 == fork())
  		err(1, "fork");
  	char ch;
  	switch (read(fd, &ch, 1)) {
  	case -1:
  		err(1, "read");
  	case 0:
  		errx(1, "read: EOF");
  	}
  	printf("%ld: %c\n", (long)getpid(), ch);  
  	return 0;
  }

would print '0' twice. However, it actually prints '0' then '1'.

Descriptor tables are per process, but the only thing a descriptor table entry stores is a flags field (basically, O_CLOEXEC/FD_CLOEXEC, plus maybe some esoteric platform specific flags), and a pointer to a file table entry data structure. Most state, like the O_NONBLOCK flag and file offsets, are kept in the [often shared] file table entry. The file table and its entries are completely independent from any particular process; in fact, traditionally there's only one global file table, just like there's only one process table.

These errors can usually be avoided if one always cite to a primary source (e.g. POSIX standard, vendor source code) for every assertion and/or validates the assertion with actual code. Maybe it's my legal training, but whenever I make an assertion, especially a technical assertion, I make it a habit of following those two rules, even when posting comments. And quite often I end up learning something new in the process.

copyconstruct · on Aug 28, 2017

I wonder what's the breakdown between unique files delivered as opposed to files delivered from the CDN cache. Also, what's the breakdown for file uploads, manipulation and delivery? The 350M API requests per day would make more sense if we get this brakdown

dmitrymukhin · on Aug 28, 2017

Cached/uncached file delivery is close to the universal 80/20 ratio. Cached operations are not included in that number.

Unfortunately, I can't say anything more than that.

copyconstruct · on Aug 28, 2017

Curious - does that mean you serve close to 1.75 billion requests per day, out of which 350M are unique requests that exercise your stack instead of being served from a CDN. It'd be interesting to know more about what's the number of transformations you do at peak, if you can talk about it.

copyconstruct · on Aug 11, 2017

Author of the article here.

That's primarily the reason why there are no "concrete examples", because one person's concrete example is another person's definition of contrived. Splitting hairs over some toy example wasn't something I thought would buttress the ideas presented, though I can imagine why some might need that scaffolding to follow along.

Re resting - here are some points the article makes about how smaller functions can in some cases hurt testing:

1) "Furthermore, when the dependencies aren’t explicit, testing becomes a lot more complicated into the bargain, what with the overhead of setting up and tearing down state before every individual test targeting the itsy-bitsy little functions can be run."

Disambiguating this for you, one of the myths of smaller functions is that they are easier to test. The article makes the claim that this isn't always true, because many who wax lyrical about the beauty of smaller functions also champion for fewer arguments to be passed to the function. The book Clean Code very explicitly states this (read the book, not as a how to guide but as a cautionary tale), and many a time what I've seen happen in the wild (especially in Ruby) is that programmers who like small functions and don't like making deps explicit - which then means the code (and ergo tests) rely on setting up shared global state.

Of course one can write smaller funcs with fewer args and not do this, but that's not the point here. The main argument is that making functions smaller doesn't always make it easier to test.

The article also does provide two examples when having the smallest possible function does help in testing. I'm not going to rehash those arguments again here.

sqeaky · on Aug 14, 2017

I have never read clean code and bringing it up as an argument against me and my points is a straw man. I don't think you meant for that though. Most of my knowledge was hard earned from a couple of decades of cleaning up disgusting long functions.

As for testing it is trivial to create examples of larger functions that cannot be tested each line of code introduces another possible thing that gets in the way of testing. I provided a contrived example (which is better than no example). The summary of it was that if a function creates connection to an external resource, uses it, the disconnects there is no way to test that function. Simply breaking it into three functions allows testing of the logic that uses the resource. Adding parameters that allows the resource or resource creator to be passed in allow testing of all three functions.

By skipping examples because someone will complain about how contrived it seems is to throw the baby out with the bathwater. As it stands your point is not falsifiable because it is possible for people on your side to just say "that's not a good long function" just as if it weren't a true scotsman. With examples you can at least ignore people who complain about the contrivance of them just as most successful technical speakers at conventions do.

> The main argument is that making functions smaller doesn't always make it easier to test

Only a Sith argues in absolutes... Sorry, I had to.

Seriously though know one seriously argues that it "always" makes it better. It just makes it better such a preponderance of the time that arguing against is foolish. It is like arguing for reasonable uses of goto, they might exist, but we as an industry have moved onto better designs.

copyconstruct · on Aug 11, 2017

Author of the article here.

Nope, the article doesn't conflate DRY and small functions. But the quest for DRY can lead to an explosion of small functions, which isn't necessarily a good thing.

The vice versa is true as well - many programmers, in their quest to make functions as small as possible, end up DRYing it up to the fullest extent as well.

There's a relationship between the two, but DRY and small functions aren't synonymous themselves, and neither does the article suggest that anywhere.

copyconstruct · on Aug 11, 2017

Author of the article here.

>The goal is not to make small functions for the sake of making small functions, but it's to compartmentalize some functionality into a nice, easier to reason about thing (function).

This tendency is exactly what the article highlights -- this need to compartmentalize, when taken to extremes, makes code a lot harder to read.

Not everyone does take it to extremes, but many programmers are partial to what I call "the smallest viable function" syndrome, and ergo don't stop compartmentalizing until they've abstracted away every last piece of logic. The article states that:

"Thus, a “single level of abstraction” isn’t just a single level. What I’ve seen happen is that programmers who’ve completely bought in to the idea that a function should do “one thing” tend to find it hard to resist the urge to apply the same principle recursively to every function or method they write."

>For people like me who struggle to maintain multiple layers of complex abstractions in our minds, being able to see a small function and say, "Ok, I trust this one - it does X." makes it easier to navigate up and down through the abstractions.

And that's the problem - needing to maintain what you very aptly call "multiple layers of complex abstractions". This especially hurts programmers new to the codebase (or worse, the language), since they have to juggle so many different layers of complexity. The article calls for reducing this complexity, instead of stacking more and more layers of abstractions in the name of "clean code".

copyconstruct · on Aug 11, 2017

Author here.

> "The claims that "all abstractions leak" and that adherence to DRY makes code "hard to follow" is what gives me this impression."

That's a simplistic -- and cherry-picked -- interpretation of my post, I'm afraid. DRY inherently doesn't make code harder to follow, but an explosion of ultra-small functions (sometimes done in the name of DRY) as advocated for by Fowler and Martin and their ilk most certainly makes the code a lot harder to read.

I'm afraid I'm not against abstractions either - I'm only questioning whether the bottom up form of thinking we generally tend to use is the best mental model around and whether it's doing us a disservice.

copyconstruct · on Aug 11, 2017

Author of the article here.

I find it really interesting you mention this:

>>"when you find yourself violating them, you're supposed to question why"

This is essentially what I see a lot of programmers (including myself) who've internalized these rules tend to do. However I wonder if we've got it backwards - in that, should we be thinking more in terms of about how we design our abstractions upfront and optimize for allowing ourselves enough wiggle room instead of applying the so called "best practices" right away and only stopping to think if something might be wrong when we explicitly violate some of these "best practices" like DRY or small functions or what have you.

I find a lot of us tend to lose sight of the forest for the trees when we focus on cosmetic things like function length. It's a bottom-up view of the abstractions we've built, and maybe we actually need to think about the top-down design more thoroughly?

copyconstruct · on Aug 11, 2017

Author of the article here.

A few things. First, I'm not sure you actually bothered reading the article, because the conclusion very obviously states what you're suggesting here, in that:

"This post’s intention was neither to argue that DRY or small functions are inherently bad (even if the title disingenuously suggested so). Only that they aren’t inherently good either."

As for the lack of examples, I'd imagine most people can extrapolate and draw analogies from their long and storied programming careers of the sort they stake a claim to. If there's something you'd like explained in more detail, let me know and I can try cooking up a contrived example, but it will remain somewhat contrived all the same.

This article isn't for duplication either - as a matter of fact it's against absolutes and blanket generalizations like "Code smells if your functions are longer than 3 lines", which is something I often come across.

I think what you call "mediocre programmers" -- personally though, I'd like to be more charitable and think of this demographic as the average programmer or the vast majority of programmers -- are also the ones most likely to cargo cult a piece of advice that's sold as "programming wisdom". Any "advice" needs to be taken with a grain of salt. The article goes on to state how important this is, as well:

"As with most other things, “the ideal” lies somewhere in between. There is no one-size-fits-all happy medium. The “ideal” also varies depending on a vast number of factors — both programmatic and interpersonal — and the hallmark of good engineering is to be able to identify where in the spectrum this “ideal” lies for any given context, as well as to constantly reevaluate and recalibrate this ideal."

whack · on Aug 11, 2017

Hi. I'm not sure you actually bothered to understand my comment, because it's making 2 main points, which you have glossed over.

1st: The importance of concrete examples in illustrating your point. Sure, we can all think of and agree on extreme examples of short-functions or long-functions which are bad, but what about more realistic examples. In Clean Code, Robert Martin presents many realistic and reasonable code snippets, featuring long-functions, which he then refactors into a form (ie, short-functions) which he claims is more readable and maintainable. For those specific examples, do you agree or disagree that his changes are an improvement? If you happen to agree, can you present other realistic examples of your own to illustrate your point? Doing so would allow us to have a more grounded discussion, comparing and contrasting two realistic alternatives. It would also allow us to better understand where you draw the line between too short and just-right.

2nd: Yes, too-short-functions and too-long-functions are both bad, but I disagree with your false equivalence between them. In my experience, the latter is a bigger problem than the former. I say this because this is the mistake that most mediocre programmers make. In my intro CS classes, I invariably see most people default towards writing their entire code in a single main function, with lots of code duplication, and this tendency tends to stick even afterwards. At my recent companies, I've even seen senior developers, people with fancy degrees and 200k+ salaries, write code that's horrendously hard to understand and maintain, because it's endlessly duplicated and squashed into a single long function. I don't doubt that there are those who take short-functions to an extreme as well, but in my experience, long-duplicated-functions are a much more prevalent problem, with greater downsides as well. Hence my point that short-functions and DRY works as a much better guideline than the converse.

Granted, my 2nd point above is my subjective opinion, and I agree with you that it's better to aim for the ideal, than to settle for erring on either side. So if you don't want to go down that rabbithole, I understand. However, I do think that presenting concrete realistic examples will go a long way towards enhancing this discussion. Looking forward to your follow-up blog post.

copyconstruct · on Aug 11, 2017

>At my recent companies, I've even seen senior developers, people with fancy degrees and 200k+ salaries, write code that's horrendously hard to understand and maintain, because it's endlessly duplicated and squashed into a single long function.

Ever wondered that they might be senior and commanding those salaries because they think and program a certain way? Have you ever discussed this with them? Many senior developers tend to shy away from abstractions in my experience, and they do it for a reason.

whack · on Aug 11, 2017

We're going far off into a tangent but to answer your question:

- the code in question constantly produced production bugs

- the code in question was extremely hard to debug when said production bugs surfaced

- everyone complained about that code being problematic, including other more senior engineers with 300k+ salaries

- all the above problems went away and everyone was pleased when I broke it up into smaller pieces

copyconstruct · on Jan 27, 2017

There was a good talk at Strangeloop about userland thread runtime scheduling in both the Go and the Erlang VM.

https://www.youtube.com/watch?v=8g9fG7cApbc