Criticizing Zotero for privacy, of all things, is a bit bizarre. Zotero is an open-source project from a nonprofit organization with no financial interest in people's research data. It's designed as a local tool specifically to give people complete control over their data, and it's developed in the open. Most similar tools are proprietary programs owned by major publishers or analytics companies with voracious appetites for data.
The page you linked to explains the reasons for every single network connection that Zotero makes and how to disable it. Every one enables a specific Zotero feature — push-based auto-sync, fast translator updates as sites change to minimize save failures, open-access PDF retrieval. When we implemented retraction notifications, we even did it using k-anonymity to avoid sending up library data from people who don't use syncing.
We're always happy to discuss design decisions in our forums, but I'd argue pretty strongly that privacy is one of the main reasons one should use Zotero, not the other way around.
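For anyone unfamiliar with the k-anonymity idea mentioned above, here is a minimal Python sketch of how a prefix-based lookup can work in general. It is not Zotero's actual implementation, which this thread doesn't describe. The names `PREFIX_LEN`, `RetractionServer`, and `find_retracted`, the use of SHA-256, and the choice of DOIs as identifiers are all assumptions made for illustration. The client sends only a short hash prefix, the server returns every retracted-item hash sharing that prefix, and the final match happens locally.

```python
import hashlib

PREFIX_LEN = 5  # hypothetical: number of hex characters the client reveals


def digest(identifier: str) -> str:
    """Hash a normalized identifier (assumed here to be a DOI)."""
    return hashlib.sha256(identifier.strip().lower().encode("utf-8")).hexdigest()


class RetractionServer:
    """Stand-in for a remote service; it only ever sees hash prefixes."""

    def __init__(self, retracted_ids):
        self.queries = []  # recorded to show what the server learns
        self._by_prefix = {}
        for item in retracted_ids:
            h = digest(item)
            self._by_prefix.setdefault(h[:PREFIX_LEN], set()).add(h)

    def lookup(self, prefix: str) -> set:
        self.queries.append(prefix)
        # Return every retracted hash in the bucket, not a yes/no answer.
        return set(self._by_prefix.get(prefix, set()))


def find_retracted(library_ids, server):
    """Return the library items whose full hash appears in the server's bucket."""
    hits = []
    for item in library_ids:
        h = digest(item)
        candidates = server.lookup(h[:PREFIX_LEN])
        if h in candidates:  # the comparison happens on the client
            hits.append(item)
    return hits


server = RetractionServer(["10.1000/retracted.1"])
library = ["10.1000/ok.2", "10.1000/RETRACTED.1"]
flagged = find_retracted(library, server)
```

In this sketch the server cannot tell which item in a bucket the client holds, or whether it holds any of them. A shorter prefix puts more items in each bucket, which improves anonymity at the cost of larger responses.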
You do realize that (unless something changed recently) all your metadata is still on their servers? Last time I checked, a Docker image of the sync server was in the works though.
...and you don't automatically upload pirated scihub papers to Elsevier cloud storage, like Mendeley does. That feature alone makes you a winner on the privacy front as far as I'm concerned!
This conversation is now days old, and people have moved on, but here's one quick belated thought. No reply needed, just fyi, fwiw. I was at an embassy party years ago, and apparently a friendly European embassy, in a US city, had a couple of people keeping track of local research, and doing heads-up for their national industry and academia. That's just one embassy, in one city. I suggest Zotero has very different threat profiles across different fields and research topics. And for some, the possibility of state actors should be included. Which suggests a need for users to easily notice and adjust their exposure profiles. And is a reminder that the user data Zotero servers are least likely to compromise is data that's never seen at all. Fwiw. Thanks again for your work.
Love Zotero. I'd love to install it on my own server instance, although, I think by renting the Zotero storage space now, I'm helping support you and the whole Zotero project.
Nod, a local tool. I have various expectations of my local tools. And if I, say, start Zotero in the morning to read a paper, then exit it for a meeting, then return to it afterward, and then exit for lunch, then at least my own expectations for a local tool are, for example, in tension with those four centralized timestamps. As are the varying tcp routes as I move my laptop among buildings. As is the request when I surf to the NYTimes during lunch.
So what does privacy best practice look like? One comment here suggests the ability to fork and edit the code. Another notes the linked documentation, and being more ethical than Elsevier. The linked page notes the existence of scattered opt-out options. And also "You can avoid these requests by keeping Zotero open while you browse the web."
My own understanding of privacy best practices, includes data exposure being opt-in rather than opt-out, and those privacy preferences being easily seen and changed in one place. My impression is Zotero doesn't do these.
And that's just Microsoft-style privacy practice. It would be even nicer to have knobs, like "check for updates every <start/day/week/...>".
> Criticizing Zotero for privacy, of all things, is a bit bizarre.
I'd be fine with "we have limited resources; know privacy is important; are improving; know we have work to do to implement best practices, are working towards it".
But my own fuzzy long-term impression has been, that such recognition has not been proportional to the potential degree of privacy exposure.
I think it's important to look at these things in the context of the features they're enabling and user expectations. The fact that Zotero is a local, configurable, open-source tool is what gives you complete control over it, but it's not just a local database. It's deeply connected to a world of constantly changing websites, metadata sources, and services, and using most of Zotero's features implies relying on those things. If you want to save metadata from a website, Zotero might need to retrieve metadata from Crossref. If you want it to find an open-access PDF, it needs to connect to an online database to check for one. And if you want saving to continue working as sites change, it needs up-to-date translators. From a normal user's perspective, the alternative is just Zotero not doing the things they downloaded it to do.
> My own understanding of privacy best practices, includes data exposure being opt-in rather than opt-out
Surely you don't expect software to default to not receiving updates automatically? As the linked section says, if you disable translator/style updates and don't use auto-sync, there won't be a persistent connection. But if a high-profile site breaks and we roll out a fix, the longer the delay the more people will just get an error trying to save.
> those privacy preferences being easily seen and changed in one place
We document every single network request that Zotero makes. Expecting them to all be configurable in one place in the software just isn't reasonable. Normal users think of features, not HTTP requests, and auto-sync doesn't have anything to do with translator update checks.
> I'd be fine with "we have limited resources; know privacy is important; are improving; know we have work to do to implement best practices, are working towards it".
OK, but I'm not saying that. I'm saying we consider privacy in all our decisions and believe we've made the right calls (and, for what it's worth, I can't recall a single complaint about our approach to privacy in many years). If you disagree with a specific decision, that's fine — come to the forums and we can discuss. But let's be clear about the features that would break for users as a result.
Thanks for your thoughtful replies. I see one clear disagreement, and speculate about a more-root divergence.
> configurable in one place in the software just isn't reasonable [...] auto-sync doesn't have anything to do with translator update checks
Microsoft has in one place (something very vaguely like) toggles to control the uploading of web history, handwriting, voice commands, and more. Different features of different apps. With explanations of the functionality lost if the user doesn't opt-in to each. One place, for privacy preferences.
The Zotero privacy documentation page similarly gathers in one place, recipes for opting-out of network-based features, with descriptions of use.
Software preferences having a privacy section is a thing. Firefox, chromium, etc.
I'm unclear on why it isn't reasonable for Zotero software to have similar.
> we consider privacy in all our decisions and believe we've made the right calls [...] If you disagree with a specific decision, that's fine — come to the forums and we can discuss
I suggest there's currently a shift in privacy best practices, from one-size-fits-all "make the right calls", to having user preferences for privacy.
So that's the sort-of clear disagreement.
But part of it may be a deeper difference in perspectives...
perhaps call it network minimalism.
When using Zotero, I'd spend more time grovelling over previously collected papers, than collecting new ones. A task that could be done, without loss of functionality, with the net disconnected. My expectation then is, that this local tool, working with local data, will not then start using the net merely because it becomes available. Or rather, that I can easily dissuade such behavior.
Now perhaps that expectation is becoming "old fashioned", as we switch from desktop, to phone apps with only lightly bridled communication lives of their own.
Which might be an underlying issue. Zotero might be thought of as a phone app which just happens to run on desktop-local data. Or it might be thought of as a traditionally desktop application. Design decisions appropriate to the former, might feel a bit odd in the latter. "Local tool" might mean different things.
> I can't recall a single complaint about our approach
In this thread, there was someone suggesting my short paraphrasing of the linked docs was getting it totally wrong. I'm not sure how widely your users are even aware of the approach. It seems users generally aren't. Which, tying things back around, is one of the motivations for having clearly explained privacy preference options.
Thanks for an interesting conversation. Just in case you haven't seen it, the subthread with jmiserez might also be of interest.
> Software preferences having a privacy section is a thing. Firefox, chromium, etc.
Yes, and Firefox's Privacy & Security section doesn't cover Firefox Sync, the default search engine, search bar suggestions, the new tab pane, the default homepage, or app update checks. Those all make network requests to various services, and they're all controlled in their own sections in the preferences where they make more sense. And you can't turn off loading a website when you enter a URL.
Grouping a few more prefs together in Zotero might make sense, but in a modern, web-connected tool, there's just a lot of functionality where the network connectivity is implicit. The main difference in Zotero is that we document it all and tell you how to turn it off.
Specifically switched over to Zotero because of the non-profit status. The only privacy feature request I'd make is to allow some kind of self-hosted sync, i.e. a deployable TLS sync server + preferences entry to specify a sync server ip/port. I imagine it would take load off you guys for syncing, and people would end up hosting sync servers for small groups on university networks.