I will die on that hill, so here goes again: the problem with NPM is not the number of runtime dependencies.
Most Javascript projects would actually fare pretty well when compared to other languages if only runtime dependencies were taken into account.
Javascript staples like React, Vue, Svelte, Typescript and Prettier actually have zero runtime dependencies. Also, the ES6 standard library is not as bad as people claim.
The real problem is with development dependencies. The dependencies pulled in by Babel, Webpack and ESLint are the cause of 99% of the dependency bloat we complain about in JS projects. Those projects are developed as monorepos, but on your machine they're split into tens or hundreds of packages. Also remember that left-pad was only an issue in 2016 because a Babel package required it. If those projects were able to get it together we wouldn't even be having this conversation. Solve this and you'll solve the biggest complaints people have about JS.
I really would like to see a discussion on this, as most people seem to put a lot of blame on JS as a whole, while it's mostly a handful of popular projects generating all the complaints.
> I really would like to see a discussion on this, as most people seem to put a lot of blame on JS as a whole, while it's mostly a handful of popular projects generating all the complaints.
Discussion on what? You're right.
I know we joke about left-pad, but like you pointed out, a lot of big JS hitters don't have many, if any, dependencies. That's true, but irrelevant.
Those dev dependencies are still potential security threats: with all the minification and other crap, it's really hard to know what gets injected into the final runtime. And if not a security problem, it's still development hell. Development, yes, but that's a, if not the, thing programmers really care about.
And even if runtime dependencies are less common, there are a lot of developers who still subscribe to using as many deps as possible, especially because the web can be quite fragmented and they have to support a myriad of different target platforms. So even if it's a lesser issue, I think it's fair to talk about the JS ecosystem as a whole when criticizing its dependency disasters.
Development dependencies are especially terrifying because we generally don't use any sort of sandboxing. Any one of these dependencies could append to .bashrc to get your computer to run literally anything, and then hide the evidence.
And developer machines are particularly juicy targets because they often have ssh keys to production machines lying around.
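To make the .bashrc scenario concrete: npm runs lifecycle hooks from every installed package with the installing user's full privileges. A hypothetical malicious dependency needs nothing more than this in its package.json (all names here are made up for illustration):

```json
{
  "name": "innocuous-helper",
  "version": "2.1.0",
  "scripts": {
    "postinstall": "node setup.js"
  }
}
```

Nothing about `setup.js` is inspected or sandboxed; it can read `~/.ssh`, append to `.bashrc`, and exit 0 so the install looks perfectly clean.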
This is precisely why I created https://gitlab.com/mikecardwell/safernode - I can run "npm start" or "npm install" just like any other nodejs developer, but node is not installed on my host, and my .bashrc and any other files in my homedir are not at risk of being read or modified
> If those projects were able to get it together we wouldn't even be having this conversation. Solve this and you'll solve the biggest complaints people have about JS.
So the way to go would be
1. compile a list of opensource-dependencies by usage/complexity
2. go through their dependencies
3. find easy fixes and create PRs for the projects.
That actually sounds like a good idea, but you need step two-and-a-half: convincing the maintainers of those dependencies that using this strategy is hurting the ecosystem.
> Solve this and you'll solve the biggest complaints people have about JS.
Or, perhaps, we can just tell people that their complaints are nonsensical when most of the deps are not actually packaged into the final app or being used at runtime, instead of contorting ourselves to fit their silly worldview.
I'm not losing any money over this. Why should I care?
Not saying this is automatically a problem, but you will lose money over this. When a build dependency is unavailable or broken, you cannot create production artifacts. You'll lose money over this because broken builds lead to incidents, longer time-to-recovery during incidents, and slower development. All of these cost money.
Your exposure to defects in your build dependencies will depend on a lot of factors, but the exposure will always be there.
Deps aren't unavailable or broken if I cache and pin them, and I do, because not having done so would have cost me a lot of time and money a long time ago.
People did lose money when left-pad was not available. Sure, it was only a day and that whole class of issue is solved now, but there are other modes of attack.
And dev dependency bloat might be nonsensical to you but it's not nonsensical to me. Different people have different priorities, that's it.
So yeah, maybe you can afford not to care, but that doesn't mean you should give those of us who can't a hard time about it :)
I'm pretty sure then that if you actually knew what was inside your C compiler, you might lose your mind. There is something akin to an entire LISP inside gcc. And it uses its own host of deps, which themselves have deps... https://packages.gentoo.org/packages/sys-devel/gcc/dependenc.... It's not like these compilers are large for no reason, there's definitely code in them.
What do you consider the preprocessor if not bloat? CMake or ninja or whatever Google's build system is, is that bloat?
Maybe you just haven't seen a fun enough pipeline. But the web is sufficiently complex that you need a sufficiently complex set of developer tools to build for it in an optimal way. You can try and skip steps there but you'll likely just end up reinventing what others have built.
If you know the left-pad class of issues is gone, why bring it up? Why not just start with the "other modes of attack"? Why do your priorities, whatever they are, matter?
I'm not the one giving others a hard time here; the people who find it fun and trendy to hate on a technology they think is "bloated" (whatever that means) are giving me the hard time.
Yeah, it's called pinning. When you use yarn or npm, they generate a lock file that pins the exact state of your downloaded set of node modules, so you can't accidentally download a different set of artifacts that might be poisoned.
As for "but what if they hide it?" that's just a problem with compilers. A very "Reflections on Trusting Trust" sort of thing. I'm usually a few versions behind, so if anything like that were to exist it would have been caught by someone else.
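For anyone unfamiliar with what pinning actually records: the lockfile stores the exact resolved URL and a subresource-integrity hash per package, so a swapped-out tarball fails verification. An illustrative package-lock.json entry (hash truncated):

```json
"left-pad": {
  "version": "1.3.0",
  "resolved": "https://registry.npmjs.org/left-pad/-/left-pad-1.3.0.tgz",
  "integrity": "sha512-..."
}
```

Using `npm ci` in CI instead of `npm install` additionally refuses to run if the lockfile and package.json have drifted apart.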
> the ES6 standard library is not as bad as people claim.
I don't know how bad some people claim it is, but I claim it's pretty bad when it lacks basics such as string formatting, str{f,p}time, others that I can't bring to mind right now but I'm sure I'd find if I trawled through my old code.
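To make that concrete, here is the sort of strftime stand-in you end up hand-writing (a minimal sketch; `padStart` is the closest thing the standard library offers, and it only arrived in ES2017):

```javascript
// No strftime/sprintf in the standard library, so "YYYY-MM-DD HH:MM"
// means assembling the string by hand:
function formatDateTime(d) {
  const pad = (n) => String(n).padStart(2, "0");
  return `${d.getFullYear()}-${pad(d.getMonth() + 1)}-${pad(d.getDate())}` +
         ` ${pad(d.getHours())}:${pad(d.getMinutes())}`;
}

formatDateTime(new Date(2016, 2, 5, 9, 7)); // → "2016-03-05 09:07"
```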
1. Since they're hurting the perception of the community, maybe Babel/Webpack/ESLint could admit that using multiple packages is the cause of most complaints against the Node.js/NPM community, and revert to single packages? Babel already uses a monorepo, so I don't really see any disadvantage from their side. Most Webpack loaders are maintained by the same people who maintain the core packages. We need the whole community to push for this. I've done my part but I'm only a single person. We need more people reproducing my claim, seeing it with their own eyes, and complaining to those projects.
2. The community could rally around projects that don't use multiple packages. Sebastian McKenzie (writer of Babel) is working on a transpiler/bundler/linter/tester called Rome. There's also SWC, written in Rust, which is a transpiler; they're working on a bundler too.
3. Using ES6 modules directly in the browser instead of transpiled/bundled output. I'm in Europe, which is a big holdout on modern browser adoption, but even considering this, 99% of our customers are on evergreen browsers. With HTTP/2 and gzip on the server you get the same advantages as using Webpack+Babel+minifiers, without the bloat in node_modules.
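For reference, the no-bundler setup is just this (paths illustrative); the browser fetches and resolves the module graph itself:

```html
<!-- index.html: no bundler, no transpiler -->
<script type="module">
  import { renderApp } from "/js/app.js"; // URLs, not bare package names
  renderApp(document.querySelector("#root"));
</script>
```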
I’ve been playing with ESBuild, and it has definitely influenced my thinking that dev tools should be a single binary if even remotely feasible. It’s just so much easier to deal with!
Your comment makes it sound like you don't realize JavaScript runs on the server, where large stacks of transitive runtime deps are actually common, and you're suggesting that because one problem exists (development deps), the other problem (runtime deps) doesn't.
Any programming language runs on a huge stack of dependencies. For most languages it's just named "stl", which makes it okay for most people, funnily enough.
Hell, when is the last time you saw a company C++ project without boost and at least a dozen other libraries?
I do run a lot of JS on the server and the number of dependencies we normally run is not even in the same ballpark of the number of dependencies we have on the frontend... except when the backend project also uses Webpack/Babel/ESLint/Jest/etc.
There's no reason to have Babel/ESLint/Terser live in different packages: they're just parsers with slightly different outputs. There's no reason to have to parse your code 3 times during compilation.
Also, a bundler operating on the AST is much more efficient, as Rollup has proven. But parsing a 4th time is also a waste, so let's put it together in the same tool.
And while you're at it, your test suite also needs a "bundler" of sorts inside itself to transform the code and run tests. Why isn't it the same tool? Jest and Webpack have so many small differences that make life difficult. So put that in the same tool, too.
However... another thing that I hope to see becoming popular is not using a bundler/transformer at all. Except for special situations, or for some regions, you don't really need transpiling/bundling at all, since 99% of your visitors are on evergreen browsers.
By Evergreen Browser I assume you mean that the browser will have the latest support for language syntax? That does not get rid of the problem of bundling, though. And modern SPAs are so complex that bundling your own JS would be a nightmare. This is why NextJS is so great: it splits the JS into SPA "pages" and lets you dynamically load them from the server. I don't know how you could build that into native JS comfortably, since modules are smaller slices by far.
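That said, native JS can approximate the page-splitting with dynamic `import()`; here's a toy sketch (the `./pages/` layout and `render` export are invented for illustration, and the injectable `importer` exists only so the sketch can be exercised outside a browser):

```javascript
// Each "page" lives in its own module and is fetched on first navigation,
// which is roughly what NextJS's page splitting automates for you.
async function loadPage(page, importer = (path) => import(path)) {
  const mod = await importer(`./pages/${page}.js`);
  return mod.render; // by convention here, each page module exports render()
}
```

The browser caches each module after the first load, so subsequent navigations are free; what you don't get for free is the chunk-size tuning a bundler does.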
I would also add that we should be striving to release uncompiled JavaScript, since it is more auditable, more debuggable, and is naturally distributed. Dev dependencies are often basically compilers (Babel, React, TypeScript, Angular, etc), and JS compilers should be avoided if possible.
I still remember working for an old school .Net shop where we wrote raw es5 with one dependency: jQuery. We used discipline to get around the limitations of es5 which came naturally to a team of C++ developers. I probably have my rose tinted glasses on too tight but it felt pretty good and we were very productive.
We built our app on node and typescript, and I would never choose it again at this point because of the package ecosystem. We do a lot to validate integrity of packages (including checking in vetted archives to our repo), but it’s hard. Our images are ballooned to like 500-600MB (we’ve hit past the GB mark because of certain packages messing up dependencies before) based on a pretty conservative list of dependencies because of node_modules. I’m constantly fighting a battle against image size increases. The sheer amount of files in node_modules ensures that io is always a problem for image size and build speed on CI.
Solutions like yarn berry hardly help: zipfs and patched tsservers are still annoying in many editors. Often packages break because package maintainers include implicit dependencies, or the packages their packages depend on do. Arc has frozen emacs for me several times when jumping to definition in a zip.
So we're using multi-stage builds (totally awesome feature!), only including production dependencies, and --squash-ing the final image.
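For anyone fighting the same battle, the shape of that setup looks roughly like this (base images and paths are illustrative); devDependencies exist only in the build stage and never reach the shipped image:

```dockerfile
# Stage 1: full node_modules, including devDependencies, for the build
FROM node:14 AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 2: production deps only
FROM node:14-slim
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY --from=build /app/dist ./dist
CMD ["node", "dist/server.js"]
```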
The issue is that packages often balloon in size from an accidental transitive inclusion of Babel or webpack. Because JS packages depend on so much, the fanout virtually guarantees someone in your dependency tree will accidentally add a dependency every now and then. It's compounded by the fact that I have several different versions of the same package installed because folks don't use peerDependencies when they should (and I'm afraid to pin most packages because I don't want to introduce subtle bugs).
I’d imagine I could solve this for us if I had enough time, but startup life leaves little time for battling with the package manager. And it creeps up on you slowly.
(Also: we use next-transpile-modules @martpie, thanks for a great library)
Yarn 2's Plug'n'Play has actually been a dream dep-size wise. Once you get over the initial hurdles of setup, you can do things like vendor your deps for zero installs. core-js alone (two major versions, because @rjsf felt compelled to polyfill in a library) is 3K files worth of node_modules.
Once you get through unspecified dependencies of dependencies (and pnpifying tools), my last big hiccup with yarn 2 is that zipfs doesn’t work with vs code and typescript’s language server. So you can’t go to code or type definitions within zips.
What specific issues are you seeing? I was just trying out Yarn v2 for the first time over the last couple days. I ran the `yarn dlx @yarnpkg/pnpify --sdk vscode` command suggested on their migration guide, and that generated a config file. VS Code then suggested that I add a ZipFS-related extension, and once I did that, I could view typedef files from libraries.
> Some have argued that the ill health of the npm registry is a social, rather than a technical problem
In some cases it is, yes, for packages that require so many access privileges that they can subvert the entire system they run on.
But this is not the case for (I'd estimate) the majority of libraries, because they are purely computational: they only transform data and do not need access rights to any external interfaces (filesystem, network, user input, displays, ...). Malicious data generated by sandboxed programs is still a problem, but at least the problem would be localized.
There are efforts underway that would allow Javascript programs to effectively and economically sandbox each other and grant only the minimum number of privileges they need to perform their tasks: https://medium.com/agoric/pola-would-have-prevented-the-even...
So basically, rely on about 5% of JavaScript (my copy of JavaScript: The Good Parts is looking slimmer every day) and hope that everything you're either directly or transitively exposed to has exactly the same standards you do and will continue to in perpetuity, and/or build tons of additional scaffolding to try to sandbox violators, because that has always been such a surefire path to secure code.
The language, and its ecosystem, is a baroque Gormenghast of curiosities built on an ancient sewer where nightmare beasts still roam, and you'll never stop it stinking just by holing up in the throne room and hoping a few trusted paladins will decontaminate the rest.
We keep throwing new shit at the wall, and eventually something sticks. To a whole generation of developers it might look like JavaScript is the One True Web Programming Language, but anyone who's lived through a few transitions knows that we replace entrenched technologies on the scale of decades; it has come and it will go like everything else, sic transit gloria lingua.
The usual (but not universal) trigger is a technological arms race between three or more competing firms attached to a compelling new idea.
Not sure how a new language can help with an ecosystem problem. In the old days, people wrote their own code and relied on a vendor-provided standard lib, for example the C++ stdlib, the Java platform or Python's batteries included. Since software is expensive, to save time and money people started to rely on 3rd party libraries, conveniently delivered by package repositories, for example CPAN, PyPI or npm. A new language will be subject to the exact same cost and delivery deadline pressures. If anything, newer languages tend to have ecosystems even more dependent on 3rd party modules. The PL problem is largely solved; the ecosystem problem is not.
I think the fragmented chaos of the JS ecosystem arises as much from the structure of the language as it does the dismal standard library.
That said, I don't buy that language vendors skimp on their standard library due to some marginal cost issue. Quite the opposite: commercial PLs like Go, Kotlin, Swift and the .NET CLI family come with extensive and often surprisingly well-considered standard libraries, and even open-source projects do better than JS (the standout being probably Elixir, since it inherits Erlang/OTP). The idea that JS's ecosystem is the template for future languages seems unsound, which is a relief, since it would also be disheartening.
> chaos of the JS ecosystem arises as much from the structure of the language as it does[...]
Not really. It arises from Sturgeon's law and the matter of accessibility/popularity. The problems of "the JS ecosystem" (correctly stated: the problems with the NPM and its community) are the same problems that plagued Java 15 years ago. (On the other hand, Java at least attempted in its design to enforce good practices at the language level instead of giving everyone an empty canvas, which in the JS world has been considered to be an endorsement that one can and should go absolutely nuts.)
> Sturgeon's law is universal, and it doesn't explain differences in the distribution of the crap.
What differences in distribution? You're either not absorbing what I wrote, or what Sturgeon said, or some combination of both. Sturgeon was responding to the criticism that sci-fi as a genre is bad because of how much of it is crud. Sturgeon's retort was that "Ninety percent of everything is crud."
JS is incredibly accessible and, as a result, massively popular (just like Java). 90% of a large number is a large number.
> I put it to you that this is recognising that the structure of the language is significant in the emergent behaviours and consequences.
Have a reread. Java attempted to enforce good practices by language design. And yet, Java is the posterchild of the sentiment that goes, roughly, "Java sucks—after all, just look at its programmers and the ecosystem it has produced". (I.e., the same thing people say about JS.) But Sturgeon's law is inescapable. Despite the attempt, the Java ecosystem looks like crud. Why? Because Java is extremely popular, and 90% of everything is crud, and 90% of a large number is a large number.
> The major browser vendors are all horrendously self-serving, for example.
They are. It has very little to do with the NPM mess. NodeJS and the browser are at odds, with NodeJS having forked the language. (Just look at modules: NodeJS has a known-bad, non-standard module system, and there was serious discussion about whether it would even ever support ECMAScript's standard modules.)
I don't think that the same community producing huge quantities of single-use libraries for the sake of padding their resumes will get involved with sandboxing. I recently installed a relatively simple piece of software using NPM and was stunned when it downloaded hundreds of dependencies from god knows where; there's simply no ability for anybody to ever evaluate the security risk of NodeJS applications
This module uses widely-used packages in node. In any non-trivial node project you would already have these. The whole point of small single-use packages is to prevent re-inventing the wheel. People bash the node community without understanding it.
Typically you have people publishing hundreds of similar packages to solve a specific problem. Over time, the best-maintained, most feature-complete one "wins" and becomes a standard, at which point other packages converge on and use these 1-2 top solutions. This process allows exploring a large space of possible solutions and prevents app developers from NIH. There is more churn, but also more innovation and productivity.
> there's simply no ability for anybody to ever evaluate the security risk of NodeJS applications
So don't. Tell people that you're not going to run their NodeJS crud and let them know it would be better to write their scripts wherever possible to be able to run on the sandboxed JS runtime that everyone already happens to have installed: the browser.
I really like this idea, but it misses the appeal of nodejs and its ilk...that you can speak the same JSON-native language on the backend as the front.
I think it’s a VERY interesting idea to use a browser for running some tooling, but that’s really just what the idea behind nodejs and deno etc are, aren’t they? It’s just that node tries to be convenient by providing access to things you’d need in a non-browser context (like the filesystem and C interop), thereby breaking the sandbox. Deno tries to give you the best of both.
Running this stuff in the browser is still “just running on V8”, but with extra steps. If you don’t like the npm ecosystem, I get it, but then running nodejs without any packages is comparable to what you’re describing in the link, isn’t it?
> that’s really just what the idea behind nodejs and deno etc are, aren’t they?
Yeah, I guess they're pretty much the same thing, except for all the stuff I mentioned.
> Running this stuff in the browser is still “just running on V8”, but with extra steps.
I don't know what this is supposed to mean. I don't know where this quote comes from, or what it's even trying to say. Nobody starts out with a computer with NodeJS and Deno installed and then has to go install a browser. Everyone already has a browser, on the other hand, and it's standardized and forward-compatible by design/commitment. From that perspective, NodeJS and Deno are '"just running on V8", but with extra steps' (truly—V8 exists for the Web browser and has only incidentally been lifted out and made to drive NodeJS, too—after a lot of contortions).
> you can speak the same JSON-native language on the backend as the front
Huh? Not only does this not make sense, but the backend appeal of NodeJS is completely irrelevant in matters of metatooling, aka "devDependencies" in NodeJS terms.
It looks like the library you are referencing touches lots of different proprietary hardware? While I agree with your overall point, a “universal wrapper” for lots of different things is exactly the sort of thing I would expect to have dependencies from who the heck knows where.
More languages should really be doing this and encouraging it. The JVM can sandbox pretty well using a security manager, but most people don't use the sandbox.
Only very few languages also provide the type of security the JVM (partially) protects against: resource exhaustion attacks. That means being able to prevent time (e.g. infinite loops) and space (memory allocation) exhaustion by specifying absolute or relative limits on these.
The principle of least privilege/authority has been around for a while, and the reason we don't see much adoption of it in real-world systems is not because it's unknown.
The first question is overhead: it's true that the majority of libraries are purely computational, but that means that there's frequent interaction between code written by the end developer and code from the library. If every call to, say, lodash's _.filter goes through a process to marshal the programmer's list, send it to a separate execution environment, and then marshal it right back in the other direction to call the predicate, people would choose not to use it. I do agree that the proposal in the post you link to seems to be on the right track - directly run the code in the current execution environment if it can be statically demonstrated that the code has no access to dangerous capabilities.
The second question is making the policy decision about whether to grant privileges. You might be familiar with this from your mobile phone: the security architecture is miles better than that of your desktop OS, but still, most people do say "yes" when asked to let Facebook, Twitter, Slack, etc. access their photos and their camera and their microphone, because they intentionally want those apps to have some access. What do you do in the above model when, say, the "request" library wants access to the network? Now it can exfiltrate all of your data. (The capability-based model is that you pass into the library a capability to access the specific host it should talk to, instead of giving it direct access, but again, if it did this, people would choose not to use it - the whole point of these libraries is to make writing code more convenient.)
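A sketch of what the capability-passing version looks like in plain JavaScript (names are illustrative; real systems like Agoric's SES do this with compartments rather than ad-hoc wrappers):

```javascript
// Instead of handing a library ambient network access, pass it a
// capability that can only reach one pre-approved host.
function makeHostOnlyFetch(allowedHost, realFetch) {
  return (url, opts) => {
    if (new URL(url).host !== allowedHost) {
      throw new Error(`capability denied for ${url}`);
    }
    return realFetch(url, opts);
  };
}

// A "request"-style library would then receive only the narrowed function:
// const apiFetch = makeHostOnlyFetch("api.example.com", fetch);
```

The friction is exactly what the comment above describes: every consumer of the library now has to decide and wire up which host to allow, which is the convenience cost that keeps people from adopting it.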
The other problem, and perhaps the most important, is that purely-computational libraries can still be dangerous. Yes, _.filter (and perhaps all of lodash) is purely computational, but if you're using it to, say, restrict which user records are visible on a website, and someone malicious takes over lodash, they can edit the filter function to say, "if the username is me, don't filter anything at all." Or if you had a capability-based HTTP client that only talked to a single server, the library could still lie about the results that it got from the server.
I think the way to think about it is that the principle of least privilege is a mitigation strategy, like ASLR or filtering out things that look like SQL statements from web requests. ASLR mitigates not being able to guarantee that your code is memory-safe; if you could, you wouldn't need it. SQL filtering mitigates making mistakes with string interpolation (but it comes with a significant cost, so you really want to avoid it if you can). Least privilege mitigates the reality that you cannot code-review all of your code and its dependencies to ensure that it's free of bugs. But, on the other hand, a mitigation is not a license to stop doing the thing you can't do perfectly - it's just a safety measure. You can still have serious security bugs from buffer overflows even with ASLR; you just have fewer. You should not use ASLR as an excuse to write memory-unsafe code. You can still have SQL injection attacks from people being clever about smuggling strings. You should not use a WAF as an excuse to not use parametrization in SQL queries. And you can still have malicious dependencies cause problems even in a least-privilege situation, because they still have some privilege. You should not use it as a reason to run dependencies you don't trust.
> there's frequent interaction between code written by the end developer and code from the library
Microkernel operating systems that use capability security like KeyKOS or one of its distant successors, sel4 focus on optimizing the call procedure as best as they can. A small performance overhead for guaranteed security properties shouldn't be seen as a tradeoff.
>send it to a separate execution environment, and then marshal it right back[...]
Current runtimes of all kinds always need new instances of something to sandbox things, which incurs a big overhead every time. But it doesn't have to be this way. The KeyKOS kernel was of fixed size, and I built my https://esolangs.org/wiki/RarVM to be so, too; it is stateless. No need for several instances of the interpreter itself, and minimal overhead in the process snapshots.
>policy decision about whether to grant privileges [...] people would choose not to use it
The advantage of capability systems (that start with all rights given in every call by default) is that even if people do not initially use it, they can restrict rights later - "hollow out the attack surface", when this makes economical sense - when a library becomes popular, laws or contracts require it etc.
While users may not interact with it directly or choose not to use it, such systems grant developers at least the ability to secure their software internally, something that is not even possible now.
> [...] it's just a safety measure [...] purely-computational libraries can still be dangerous
But this argument doesn't attack the central point. Yes, this is true, but has nothing to do with the security properties offered by POLA architectures like capability security. It's an orthogonal problem, which has to be mitigated by other mechanisms, possibly social.
I think we're disagreeing on what the central point is, then. I think least-privilege architectures are great, and I use them for many things. I think they do not save you from the problem being addressed in this article. That is, do not read what I'm saying as an argument against least-privilege architectures: read it as an argument against using that hammer to drive in this screw.
In turn, I think that means that there isn't enough justification for using them in this case that users will feel like the additional complexity of wiring through least-privilege across their libraries is worth it. Even if you take the approach of incrementally adding the security to the existing design, the implication is it won't actually be securing end users for a long time, and only against minor and unlikely threats at first, but it will impose increasing complexity all along.
KeyKOS is great and I've read about it and tried to adopt its lessons in my own designs, but the fact remains that KeyKOS is dead. And I certainly agree that a small performance overhead for guaranteed security properties shouldn't be seen as a tradeoff (assuming it is in fact solely a performance overhead, and not a developer mental burden, nor a reviewer mental burden, nor an operational burden) - but I'm not commenting on what I see, I'm commenting that the vast majority of NPM users will see it as a tradeoff, regardless of what you and I believe.
While most Smalltalk-like languages are good candidates for such security mechanisms, they have never been integrated, perhaps because the mostly academic environment in which they are developed is more cooperative than malicious.
Trying to solve the halting problem, are we? Remember that one of the most dangerous JavaScript APIs turned out to be a sub-millisecond monotonically increasing time source.
You can do it the right way, by using total functional programming. Or you can do it the wrong way, by providing a time budget or "gas", and yanking the process if and when it's exceeded.
No, this is more of a Rice's Theorem kind of situation...
Why do you think using 'gas' is the wrong way? Not only does it solve this problem, it solves it in a deterministic way, which has many applications, for example deterministic debugging and replicable computation!
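For illustration, the "gas" approach is just a counter decremented on every interpreter step (a toy sketch, not any particular VM's API):

```javascript
// Deterministic time budget: every step costs one unit of gas, and the
// computation is aborted the moment the budget is exhausted. Because the
// abort point depends only on the step count, not wall-clock time,
// replays and deterministic debugging are reproducible.
function runWithGas(steps, gasLimit) {
  let gas = gasLimit;
  for (const step of steps) {
    if (gas-- === 0) throw new Error("out of gas");
    step();
  }
  return gasLimit - gas; // gas actually consumed
}
```

Ethereum's EVM is the best-known production example of this scheme, charging a fixed gas cost per opcode.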
Here's my hot take: supply chain attacks are a low risk for your organization - they are both low likelihood and low impact.
1) Low likelihood: when popular packages get subverted it is caught quickly due to how widely packages are distributed. After it's caught the problem is also heavily publicized for folks to take action, and registries remove the affected versions immediately so there is a very small exposure window.
2) Low impact: people who write malicious code into these packages don't have a specific target, they are writing dragnet malware, which typically means mining cryptocurrency or ransomware. If you're going to get hacked then that's the best possible outcome (as opposed to, e.g. a data breach).
Your security posture would have to be superb if supply chain attacks were anywhere near the top of your list - for the majority of companies they have more basic and targeted issues to worry about.
Eh... I don’t share your cavalier attitude. You assume these attacks aren’t targeted just because we haven’t seen them, but it wouldn’t be hard at all for an attacker to take control of a package through some means (purchase, social engineering, or just solving a problem more efficiently than others do and aggressively asking others to adopt it), then publish to npm a minified version of the package which includes some targeted exploit that doesn’t activate except in a specific environment. The source on GitHub would of course not include the exploit, and there’s no push for reproducible builds in the npm world, so verifying that npm’s minified JS was built from the GitHub source is nontrivial and not something most shops would bother with.
Unfortunately, targeted attacks have been seen in the wild. The `event-stream` attack linked in the post was one example. Or look at the attack on the Agama cryptocurrency wallet, where the attackers even managed to exfiltrate private wallet keys: https://komodoplatform.com/update-agama-vulnerability/
This is part of the reason why current languages and operating systems simply do not have security properties that would inhibit or entirely prevent these risks: it never mattered economically enough to implement them. Big corporations insure themselves against these risks financially (if at all), not technologically.
The other big reason has been having to maintain backward compatibility: personal computers, and the programming languages built for them, were networked only late compared to some mainframe systems. There have been very interesting historical networked operating systems that were far more secure in their architecture than current contenders: https://github.com/void4/notes/issues/41
There's a peculiar dynamic in the npm ecosystem that folks who publish libraries naturally fully embrace the ecosystem, and thereby have a lot of other library dependencies themselves.
I think most engineers would not have _directly_ introduced something like left-pad into their production application dependencies since that's something people would typically implement themselves, but people who publish open source libraries and embrace the ecosystem would gladly use someone else's package for that since they're also publishing with the expectation that someone will do the same with their own work.
It seems wrong to blame open source producers for using the work of other producers and thereby introducing a deep dependency tree, and yet the security concerns are completely valid. I personally don't have any ideas for a solution, but it's worth thinking about.
It's worse than this. It's not npm specifically, it's GitHub's atrocious permission system. Tons of GitHub integrations ask for far too broad permissions, basically allowing any of those companies, or disgruntled employees, bribed employees, or holders of breached data, to hack your repos.
This isn't just npm, it's any dev hosting a library on GitHub who signed up to give random third parties write access to their repos. It could be a C++ library, C#, a VS Code plugin, a Unity asset. Tons of devs are sharing code and giving out write access to that code.
The Node standard library doesn't do enough compared to Python's. Python in a locked-down environment (where you can't just install whatever you want) isn't bad.
Node is a nightmare without being able to install packages from npm; thus someone can remove left-pad and it's the end of the world. I switched from React Native to Flutter for mobile app development and it was one of the best decisions I've ever made.
This is an outdated perspective, unfortunately. Left-pad is a 2015 problem. In 2020 we already have padStart in every evergreen browser and in Node.js, and it's been there for years.
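To illustrate the point, left-pad's use case is covered directly by the built-in `String.prototype.padStart`:

```javascript
// String.prototype.padStart (ES2017) makes left-pad unnecessary.
console.log("5".padStart(3, "0")); // "005"
console.log("abc".padStart(2));    // "abc" (already long enough, unchanged)
```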
The reason the JS standard library is slimmer than Python's is that JS is mostly a client-side language. It doesn't have to handle Unix filesystems, web crawling or even security, because the client side is not really the place for those. So comparing Python to JS is not really useful: Python is mostly used server-side, and most server-side JS projects already use very few dependencies.
Most of the complex things like URL Requests and HTML parsing are handled by the browser or the DOM.
And it's the client-side projects that are bloated, using hundreds (if not thousands) of dependencies. But those dependencies are not there because the language is lacking. They're there because popular packages like Babel, Webpack, ESLint and others chose to use npm modules to organize themselves, instead of using functions and classes like everything else.
> The reason the JS library is slimmer than Pythons is because it's mostly a client-side language.
This is a bad argument. The web API surface in the browser is substantial, from USB communication to XML parsing.
None of that is shipped with Node.js directly for instance.
The problem is not the JavaScript spec; the problem is that one has to download a package to parse XML with Node.js. Node.js's standard library is too barebones.
You don't have to do that in the client. You have loads of web APIs, perhaps more than Python's standard library.
> Web API is sandboxed with a permission model in the browser.
And? Your comment has absolutely nothing to do with my point, which is that JS in the browser has access to a huge standard API compared to Node.js, which relies too much on third-party packages, unlike Python or Go.
>They're there because popular packages like Babel, Webpack, ESLint and others chose to use NPM modules to organize themselves, instead of using function and classes like everything else.
You can't build anything without Babel for the most part, since ES6 code needs it to transpile down to ES5.
The end result is JavaScript's entire ecosystem is a fragile mess.
As other posters have already answered you: no, you don't need it.
You can even use ES6 modules directly in the browser, including import/export syntax. When I measured back in 2018, 100% of the users of the company I was in had support for that. With Brotli and HTTP/2 the performance is comparable to bundled code.
The only holdover is JSX, of course, and maybe some things like the ?? operator that don't work client-side yet. But TypeScript can take care of that. You can even tell TypeScript to target ES6, so you can use ES6 modules in the browser, and this will let you eschew Webpack.
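For what it's worth, a minimal `tsconfig.json` along those lines might look like this (a sketch, not a complete config, and the exact option values are assumptions about a typical setup):

```json
{
  "compilerOptions": {
    "target": "ES2015",  // keep ES6 output; down-levels newer syntax like ??
    "module": "ES2015",  // emit native import/export the browser loads directly
    "jsx": "react",      // compile JSX away, since no browser understands it
    "strict": true
  }
}
```

With output like this you can serve the compiled files as-is over HTTP/2 and let the browser resolve the module graph itself, no bundler required.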
No. Typescript is its own entirely separate standalone compiler toolchain.
If you are using Babel as a compiler, you can enable the TS syntax plugin to have Babel strip TS type syntax out of its output, although this does not do any typechecking during compilation.
Every major web framework's standard build tools (Create React App, Vue CLI, etc.) are based on Webpack + Babel, and most folks who use one of those frameworks but set up their own build toolchains still base them around Webpack + Babel.
There's some potentially interesting technical alternatives out there, but Babel is a keystone in the modern JS ecosystem.
npm needs to sort out its quality issue first, but this could also be helped by a better core JavaScript library.
Take https://www.npmjs.com/package/is-odd for example. This should not be a package; that it's even allowed to be one is insane. Do the developers importing it not know how to test for that themselves? Should it be part of core JavaScript?
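For comparison, the whole check is a one-liner (a sketch roughly equivalent to what the package does, minus its input validation):

```javascript
// Math.abs handles negatives: -3 % 2 is -1 in JS, not 1.
const isOdd = (n) => Math.abs(n % 2) === 1;

console.log(isOdd(3));  // true
console.log(isOdd(-3)); // true
console.log(isOdd(2));  // false
```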
Does that package exist because JavaScript lacks a modulus operator (I feel like I remember it having one), or because the operator does/doesn’t coerce things into numbers the way you’d expect? Or is it honestly just laziness?
>Does that package exist because JavaScript lacks a modulus operator (I feel like I remember it having one), or because the operator does/doesn’t coerce things into numbers the way you’d expect?
The package is written in javascript and uses the modulus operator.
It's just laziness and a cargo-cult mentality around package granularity that's gotten way out of hand. There's no rational basis for it.
The point is that you don’t need “is_odd” in the standard library; you have modulus. You were arguing that JavaScript needs a better standard library to keep people from importing is_odd.
Yeah, I tried to make an invalid point; my bad. The main point still stands, though: I'm sure there are hundreds of other examples of pointless libraries that are easily covered by something trivial.
You were right: JavaScript lacks a true modulo operator. JavaScript does have a `%` operator, and in many languages that performs the modulo operation ... but in JS it actually calculates the remainder. This is confusing because in some situations remainder and modulo look the same, but they're not. So if we compare JS and Python (where `%` means "modulo") ...
$ python3
Python 3.7.3 (default, Jul 25 2020, 13:03:44)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 10 % 3
1
>>> 10 % -3
-2
>>>
$ node
> 10 % 3
1
> 10 % -3
1
In the case we're talking about - is_odd - I don't think it matters, but there is a difference and it could cause some surprises if you're not prepared.
edit: I called this operation "remainder", and it turns out that's the official term: the ECMAScript spec calls % the remainder operator. You don't exactly say "10 remainder 3" out loud, but it's the right name.
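If you need Python's behavior in JS, a common workaround is to build a true modulo on top of the remainder operator. A small helper along these lines (the name `mod` is just illustrative):

```javascript
// JS's % is a remainder: the result takes the sign of the dividend.
// A Python-style modulo takes the sign of the divisor instead.
function mod(a, n) {
  return ((a % n) + n) % n;
}

console.log(mod(10, 3));  // 1
console.log(mod(10, -3)); // -2, matching Python
console.log(mod(-10, 3)); // 2, matching Python
```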
Outlier is a bit of a strong word. Ruby and many others[0] implement it the same way. Personally, I think the way Ruby does it is the most intuitive. The % operator is a modulo, and there is a separate method for remainder. Even though I prefer this way, I'm dismayed that there are multiple ways of doing it. Like so many things created by humanity, we just can't ever seem to take one definition and stick to it.
Brainstorm: Could it help to run 'coverage' through the web of dependencies and shake out all the code that is not explicitly exercised by an app test? Is that technically feasible? Is that cost effective?
I like the idea - it would at least help to visualize how much useless code you pull in and hopefully to make you realize how the few parts actually used could be implemented without those dependencies.
The question of trust is the wrong question to ask, imo. The bigger threat is unmaintained packages and maintainers who themselves use risky code-management practices (e.g., lack of 2FA on npm). That a dependency is heavily relied on does not mean its maintainer is following security best practices, or that they aren't just as susceptible to social engineering as anyone else.
The strong set of such a nature doesn't come with many guarantees beyond the past history of said users. For example, having commit rights to Debian requires a certain level of security know-how, and being an Arch Trusted User has similar requirements (they moved to YubiKeys everywhere a while back, for example).
We don't even know if all these users have 2FA enabled for their npm accounts. Building a software distribution ecosystem that offers trust guarantees post facto is a really hard challenge, and I think the right answer is in providing developers better sandboxes. That's not to say this can't be used as a signal as the author suggests, just that the "strong-set-user/package=safe" guarantee doesn't have an underlying basis as of yet.
> just that the "strong-set-user/package=safe" guarantee doesn't have an underlying basis as of yet.
Author here, and I agree. There can be no guarantees about the safety of a package based only on its maintainer(s); their accounts could be taken over, or they could be paid off, and so on. I’m hopeful about initiatives like Deno that provide better security controls built into the language.
A significant hurdle to overcome is getting npm (and all open-source) developers to think about trust in the first place. The event-stream incident happened when the previous maintainer handed over control to a random stranger who showed up. We’ve seen similar things happen in other attacks. The thought at this point is that by making trust more explicit, we might start a move in the right direction.
I will bet a lot of the npm dependency problems could be solved if Node directly implemented many of the Web APIs. If PHP can implement a DOM parser, there's no reason Node can't implement one as well, for example.
There’s a pretty good reason, actually, and it’s exactly npm.
Because dependencies are managed so easily in Node, it makes no sense for Node core developers to implement more and more of the ever-expanding APIs offered by the browser. They’re better off spending that time tightening the system and perhaps offering low-level interfaces.
> There’s a pretty good reason, actually, and it’s exactly npm.
You mean whoever created npm the business profited from Node.js's barebones standard library, and the creator of Node.js himself regretted creating the opportunity for a commercial package manager and registry by making his creation depend on such a commercial product.
The description of that dependency used by the BBC makes me wonder why trust is somehow based on popularity. What if the BBC got duped into using a dependency from a bad actor? Is that package trustworthy now?
I wonder if the package repos could come up with some type of standardized, domain verified organization namespaces. I was able to register a decent .com a couple years ago and immediately ran around registering the matching namespace everywhere. That feels a bit dumb when I have a globally unique identifier (the domain) sitting right there.
Why can't I have `example.com` as my organization on NPM? I realize there would be a little complexity in domains changing ownership or being abandoned, but I feel like that's already an issue with first come, first served namespaces. It's just glossed over with the assumption no one will ever give away their account / namespace which isn't true. Is there a way to tell if an organization's owner has changed in NPM?
A domain verified namespace could be on equal footing pretty quickly IMO. If it's limited to organizations, which makes sense to me, have a requirement for the domain owner to declare the official owner of the namespace via DNS or a text file under `/.well-known/`. Ex:
npmjs._dvnamespace.example.com TXT ryan29
Now `ryan29` can claim or take ownership of the `example.com` organization. Every time an artifact is published, that record could be checked to ensure `ryan29` still owns the organization. If it doesn't match, refuse to publish the artifact.
In effect, it's saying "example.com is delegating ultimate trust for this namespace to the user ryan29". If the domain expires, no one can publish to that namespace. If someone new registers the domain and claims the namespace by delegating trust to a new owner, that works as a good indicator that everyone pulling artifacts from the namespace should be notified there was a change in ownership.
It seems like a waste that I'm required to register a new identity for every package manager when I already have a globally unique, extremely valuable (to me), highly brandable identity that costs $8/year to maintain.
Edit:
To add one more thought, I've always been of the opinion that ultimate trust needs to resolve to an individual, not an organization. That probably needs to be done via certificates or key signing and should be done by a local organization.
If I could dictate a system for that, I'd use local businesses to verify ID and sign keys. For example, I'm from Canada and would love to go into Memory Express with my ID and have them sign my GPG key.
I don't think you can get a real WoT like what I think was originally the intent for GPG. There are just too many bad actors these days. I think verifying identity and tying stuff back to a real person is the best you'll get.
And no, I don't want the current code-signing style of verification. It sucks, and the incumbents are nothing more than a bunch of rent-seeking value extractors.
I don't like where this is going. Especially using number of dependents as a measure of trust. Popularity has nothing to do with trustworthiness (it just makes a problem less likely to occur, but when a problem does occur, it will be a lot worse; and npm has in fact encountered such issues in the past).
Just look at the real world: Is the Federal Reserve Bank a trustworthy institution? Sure, there are a lot of people using its product (the US dollar) so it's extremely popular, but is it trustworthy? Is the product actually what its users think it is?
Power structures are very much the same in open source. The ecosystem has been highly financialized; a library is popular because its author has a lot of rich friends who helped them to promote it on Twitter or elsewhere. So if you don't happen to have rich friends, does that make you untrustworthy?
This would lead to censorship of good projects from trustworthy people who have genuinely good intentions.
I think that such algorithms have done enough damage to society already.
I mean... I would consider building a business on the assumption that the Fed will operate how it documents itself to operate, and not do things fraudulently or covertly, to be a lot lower risk than, say, building a business on assuming the same of Tether. Yeah, I'd say the Fed is pretty trustworthy, and the fact that a lot of people depend upon it is a signal of that (not a proof, or a guarantee, but a signal, same as in the library dependency example).
I agree that money = popularity = trust is a risky system. Fraud and scams are high margin activities, so bad actors can end up with more money to spend than a lot of legitimate developers.
It's pretty ridiculous that we have real name policies for social networks, but the dev dependencies for a basic web app can have thousands of unnamed contributors. We really need a low friction system where individuals can start signing their code with verified identities.
If I pull in 1k dev dependencies via NPM, I should be able to get a list of the 1k developers that signed off on those packages. If no one is willing to step up and put their name on a package it shouldn't be used by major projects like React, Vue, etc. IMO.