Hacker News | mweidner's comments

I'm surprised to see the emphasis on tracking lines of text, which ties into the complexity of merge vs. merge-the-other-way vs. rebase. If we are committed to enhancing the change history, it seems wiser to go all in and store high-level, semantically meaningful changes, like "move this code into an `if` block and add an `else` block ...".

Consider the first example in the readme, "Left deletes the entire function [calculate]. Right adds a logging line in the middle". If you store the left operation as "delete function calculate<unique identifier>" and the right operation as "add line ... to function calculate", then it's obvious how to get the intended result (calculate is completely deleted), regardless of how you order these operations.
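To make that concrete, here is a toy model of such high-level operations (the operation names and `Doc` shape are invented for illustration). Because the function is addressed by a unique ID, "delete function" and "add line to function" produce the intended result in either order:

```typescript
// Hypothetical high-level operations, keyed by a unique function ID.
type Op =
  | { type: "deleteFunction"; fnId: string }
  | { type: "addLine"; fnId: string; line: string };

type Doc = Map<string, string[]>; // fnId -> lines of that function

function apply(doc: Doc, op: Op): Doc {
  const next = new Map(doc);
  if (op.type === "deleteFunction") {
    next.delete(op.fnId);
  } else {
    const lines = next.get(op.fnId);
    // Adding a line to an already-deleted function is a no-op - the
    // intended merge result falls out with no conflict resolution.
    if (lines) next.set(op.fnId, [...lines, op.line]);
  }
  return next;
}

const base: Doc = new Map([["calculate#1", ["function calculate() {", "}"]]]);
const left: Op = { type: "deleteFunction", fnId: "calculate#1" };
const right: Op = { type: "addLine", fnId: "calculate#1", line: "  log(x);" };

// Either order yields the same state: calculate is completely deleted.
const leftFirst = apply(apply(base, left), right);
const rightFirst = apply(apply(base, right), left);
```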

I personally think of version control's job not as collaborating on the actual files, but as collaborating on the canonical order of (high-level) operations on those files. This is what a branch is; merge/rebase/cherry-pick are ways of updating a branch's operation order, and you fix a conflict by adding new operations on top. (Though I argue rebase makes the most sense in this model: your end goal is to append to the main branch.)

Once you have high-level operations, you can start adding high-level conflict markers like "this operation changed the docs for function foo; flag a conflict on any new calls to foo". Note that you will need to remember some info about operations' original context (not just their eventual order in the main branch) to surface these conflicts.


You can think of the semantics (i.e., specification) of any CRDT as a function that inputs the operation history DAG and outputs the resulting user-facing state. However, algorithms and implementations usually have a more programmatic description, like "here is a function `(internal state, new operation) -> new internal state`", both for efficiency (update speed; storing less info than the full history) and because DAGs are hard to reason about. But you do see the function-of-history approach in the paper "Pure Operation-Based Replicated Data Types" [1].
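As a toy illustration of the two styles (a grow-only counter; all names invented), the specification is a pure function of the whole history DAG, while the implementation is an incremental reducer that never stores the DAG:

```typescript
// Toy grow-only counter, described two equivalent ways.
type OpNode = { id: string; op: "inc"; parents: string[] }; // node in the history DAG

// Specification style: user-facing state as a function of the entire DAG.
function semantics(history: OpNode[]): number {
  return history.filter((n) => n.op === "inc").length;
}

// Implementation style: (internal state, new operation) -> new internal state.
function step(state: number, _op: OpNode): number {
  return state + 1;
}

const history: OpNode[] = [
  { id: "a", op: "inc", parents: [] },
  { id: "b", op: "inc", parents: ["a"] },
  { id: "c", op: "inc", parents: ["a"] }, // concurrent with "b"
];

const fromSpec = semantics(history);
const fromImpl = history.reduce(step, 0);
// Both describe the same user-facing state, but the reducer stores one
// integer instead of the full DAG.
```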

[1] https://arxiv.org/abs/1710.04469


While this is technically correct, folks discussing CRDTs in the context of text editing are typically thinking of a fairly specific family of algorithms, in which each character (or line) is assigned an immutable ID drawn from some abstract total order. That is the sense in which the original post uses the term (without mentioning a specific total order).

The rebasing step is indeed a transformation. There's some info under the "rebasing" link here [1].

Unlike traditional Operational Transformation, though, there are no "transformation properties" [2] that this rebasing needs to satisfy. (Normally a central-server OT system would need to satisfy TP1, or else users may end up in inconsistent states.) Instead, the rebased operations just need to "make sense" to users, i.e., be a reasonable way to apply your original edit to a slightly-further-ahead state. ProseMirror has this sort of rebasing built in, via its step mappings, which let the collaboration-specific parts of the algorithm look very simple - perhaps deceptively so.
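A standalone sketch of the idea behind step mappings (this is not the ProseMirror API, just the core concept): rebase a pending insert through a concurrent insert by shifting its position.

```typescript
// A text-insert operation at an index in a plain string.
type Insert = { pos: number; text: string };

// Map op's position through `other`, which was applied first.
function rebase(op: Insert, other: Insert): Insert {
  return op.pos >= other.pos
    ? { pos: op.pos + other.text.length, text: op.text }
    : op;
}

function applyInsert(doc: string, op: Insert): string {
  return doc.slice(0, op.pos) + op.text + doc.slice(op.pos);
}

const remote: Insert = { pos: 0, text: ">> " }; // arrives first, in server order
const local: Insert = { pos: 5, text: "!" };    // was made against "hello world"

const result = applyInsert(applyInsert("hello world", remote), rebase(local, remote));
// result === ">> hello! world". No TP1-style algebraic law is verified
// anywhere; the rebased op merely has to "make sense" against the
// slightly-newer state.
```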

[1] https://prosemirror.net/docs/guide/#collab [2] https://en.wikipedia.org/wiki/Operational_transformation#Tra...


Author here, just chiming in to say that Matt has an actual PhD on the subject so rather than explain it worse, I will just let him say the probably-actually-correct thing here.

The PowerSync folks and I worked on a different approach to ProseMirror collaboration here: https://www.powersync.com/blog/collaborative-text-editing-ov... It is neither CRDT nor OT, but does use per-character IDs (like CRDTs) and an authoritative server order of changes (like OT).

The current implementation does suffer from the same issue noted for the Yjs-ProseMirror binding: collaborative changes cause the entire document to be replaced, which messes with some ProseMirror plugins. Specifically, when the client receives a remote change, it rolls back to the previous server state (without any pending local updates), applies the incoming change, and then re-applies its pending local updates; instead of sending a minimal representation of this overall change to ProseMirror, we merely calculate the final state and replace with that.

This is not an inherent limitation of the collaboration algorithm, just an implementation shortcut (as with the Yjs binding). It could be solved by diffing ProseMirror states to find the minimal representation of the overall change, or perhaps by using ProseMirror's built-in undo/redo features to "map" the remote change through the rollback & re-apply steps.
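A minimal sketch of the rollback & re-apply step described above (all names invented; the document is a string and ops are rebase-able functions on it):

```typescript
// An operation transforms the document; it should still "make sense" when
// replayed on top of a collaborator's concurrent changes.
type Op = (doc: string) => string;

function reconcile(serverDoc: string, remote: Op, pending: Op[]): string {
  let doc = serverDoc;   // 1. roll back to the last server state
  doc = remote(doc);     // 2. apply the incoming remote change
  for (const op of pending) doc = op(doc); // 3. re-apply pending local ops
  return doc;
}

const serverDoc = "hello";
const pending: Op[] = [(d) => d + " world"]; // local, not yet acknowledged
const remote: Op = (d) => d.toUpperCase();   // just arrived from the server

const finalDoc = reconcile(serverDoc, remote, pending);
// finalDoc === "HELLO world". The implementation shortcut described above
// hands this final state to ProseMirror wholesale instead of a minimal diff.
```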


Hi Matt! Good to see you here. For those who don't know, Matt also wrote a blog post about how to do ProseMirror sync without CRDTs or OT: https://mattweidner.com/2025/05/21/text-without-crdts.html I mostly cosign everything in it. Our solution doesn't overlap 100% with his, but if it had existed when we started, we might not have gone down this road at all.

Your part 1 post was one of the inspirations for that :)

Specifically, it inspired the question: how can one let programmers customize the way edits are processed, to avoid e.g. the "colour" -> "u" anomaly*, without violating CRDT/OTs' strict algebraic requirements? To which the answer is: find a way to get rid of those requirements.

*This is not just common behavior, but also features in a formal specification [1] of how collaborative text-editing algorithms should behave! "[The current text] contains exactly the [characters] that have been inserted, but not deleted."

[1] http://www.cs.ox.ac.uk/people/hongseok.yang/paper/podc16-ful...
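The anomaly can be reproduced in miniature with per-character IDs and tombstones (illustrative only, not any particular algorithm):

```typescript
// "color" with one immutable ID per character; deletion = tombstone.
type Char = { id: number; ch: string; deleted: boolean };

let nextId = 0;
const doc: Char[] = [..."color"].map((ch) => ({ id: nextId++, ch, deleted: false }));

// User B: insert "u" before the final "r", making "colour" (new ID: 5).
doc.splice(4, 0, { id: nextId++, ch: "u", deleted: false });

// User A, concurrently: delete the whole word - but A never saw the "u",
// so only the original IDs 0..4 are tombstoned.
for (const c of doc) if (c.id < 5) c.deleted = true;

const text = doc.filter((c) => !c.deleted).map((c) => c.ch).join("");
// text === "u": exactly what the quoted specification mandates, and exactly
// the anomaly.
```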


This was my impression as well. If you ignore the paper and just look at the source code - and carefully study Seph Gentle's Yjs-like RGA implementation [1] - I believe you find that it is equivalent to an RGA-style tree, but with a different rule for sorting insertions that have the same left origin. That rule is hard to describe, but with some effort one can prove that concurrent insertions commute; I'm hoping to include this in a paper someday.

[1] https://josephg.com/blog/crdts-are-the-future/


Yes, I think it would be a good paper.

I made a tiny self contained implementation of this algorithm here if anyone is curious:

https://github.com/josephg/crdt-from-scratch/blob/master/crd...

FugueMax (or Yjs) fits in a few hundred lines of code. This approach also performs well (better than a tree-based structure). And there's a laundry list of ways this code can be optimised if you want better performance.

If anyone is interested in how this code works, I programmed it live on camera in a couple hours:

https://www.youtube.com/watch?v=_lQ2Q4Kzi1I

This implementation approach comes from Yjs. The YATA (Yjs) academic paper has several problems, but Yjs's actual implementation is very clever and I'm quite confident it's correct.
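For a flavor of what such an implementation manages, here is a toy RGA-style integrate step (emphatically not FugueMax's or Yjs's actual sorting rule): each item carries a unique ID plus its left origin, and concurrent same-origin inserts are ordered by a deterministic ID tiebreak so replicas converge.

```typescript
// Toy list item: unique (agent, seq) ID plus the ID of the item it was
// inserted after ("left origin"). The tiebreak below is illustrative only.
type Id = [agent: string, seq: number];
type Item = { id: Id; originLeft: Id | null; ch: string };

const idEq = (a: Id | null, b: Id | null) =>
  JSON.stringify(a) === JSON.stringify(b);
const idLess = (a: Id, b: Id) =>
  a[0] < b[0] || (a[0] === b[0] && a[1] < b[1]);

// Insert after the origin, skipping concurrent same-origin items whose ID is
// "larger", so both replicas end up with the same sibling order.
function integrate(list: Item[], item: Item): void {
  const originIdx = list.findIndex((x) => idEq(x.id, item.originLeft));
  let i = originIdx + 1;
  while (
    i < list.length &&
    idEq(list[i].originLeft, item.originLeft) &&
    idLess(item.id, list[i].id)
  ) {
    i++;
  }
  list.splice(i, 0, item);
}

// Alice and Bob concurrently insert at the start of an empty document.
const A: Item = { id: ["alice", 0], originLeft: null, ch: "A" };
const B: Item = { id: ["bob", 0], originLeft: null, ch: "B" };

const replica1: Item[] = [];
integrate(replica1, A); integrate(replica1, B);
const replica2: Item[] = [];
integrate(replica2, B); integrate(replica2, A);
// Both replicas read "BA" despite opposite arrival orders.
```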


Managing "a flat-ish collection of nodes" that can be moved around (without merely deleting and re-inserting nodes) is tricky because of how paragraphs can be split and merged. Notion tackled this for their offline mode: https://www.youtube.com/watch?v=AKDcWRkbjYs

If you take that as a solved problem, do your concerns change?

> Selection & Cursors: Selection across regions is notoriously hard. If "Region A" and "Region B" aren't siblings in a tree, how do we handle a user dragging a selection across both?

You could render them in the DOM as an old-fashioned tree, while internally manipulating your "flat" IR, to make selections work nicely.

This is not too different from how Yjs-ProseMirror works already: Yjs has its own representation of the state as a CRDT tree, which it converts to a separate ProseMirror tree on each update (& it uses a diff algorithm to map local user edits in the other direction).

> Prior Art: Has anyone seen a production system (perhaps in the desktop publishing or CAD world) that successfully treated rich text as a non-hierarchical "content pool"?

This might be how Dato CMS works? https://www.datocms.com/docs/content-modelling (I say this based off of 5 minutes spent watching someone else use it.)

> Are we stuck with trees because they are the "right" abstraction, or just because the browser gives them to us for free?

For lists specifically, I would argue the latter. It's natural to think of a list as a flat sequence of list items, in parallel to any surrounding paragraphs; forcing you to wrap your list items in a UL or OL is (to me) a browser quirk.

I made some progress fighting this in Tiptap: https://github.com/commoncurriculum/tiptap-extension-flat-li... Quill.js already models lists in this "flat" way.


Your reply hits the real tension: a flat model simplifies layout changes, but it shifts complexity into how you map edits and selections. That trade-off feels worth it if the goal is "safe structure changes" and AI-driven transforms.

On the split/merge issue: in a flat model, the split/merge doesn't have to be a structural operation at all. It can live entirely inside the block's text content. The block keeps the same ID, and only its content changes. That avoids the "delete/reinsert" problem and keeps a stable identity for AI or history.

On selection: the cleanest route is to render a normal DOM tree for interaction and treat the flat IR as the truth. So the DOM is just a projection. That buys you native selection and IME behavior without building a custom cursor engine. The only hard part is deciding a consistent reading order (left-to-right, top-to-bottom, region order), so selection feels predictable even when layout is spatial.

On syncing/CRDT: a flat model can be simpler in a different way. You're syncing text inside blocks plus lists of IDs in regions. That's two clear problems instead of one giant nested tree. It doesn't remove the complexity, but it makes it easier to reason about where conflicts live (content vs. layout).

On lists: a flat list of items is closer to how people think. UL/OL is a browser artifact. Quill's model already shows this is workable, and it makes the "content pool + layout map" idea more consistent.

Using TipTap/ProseMirror as the editing surface (selection, IME, rich-text behavior) while keeping a separate IR is a reasonable split: the view stays tree-shaped, the data stays flat.

So overall: this approach looks less like "throw away trees" and more like "trees become a rendering tool, not the canonical structure." That's a meaningful shift, especially if AI or layout transforms are first-class.
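As a sketch of that split (all names invented), a flat IR might keep a content pool of blocks keyed by stable IDs plus per-region lists of IDs. Content edits and layout moves then touch disjoint structures, which makes it easy to see where conflicts live:

```typescript
type BlockId = string;
interface FlatDoc {
  blocks: Map<BlockId, string>;     // content pool: stable ID -> text
  regions: Map<string, BlockId[]>;  // layout map: region -> ordered block IDs
}

// Content edit: the block keeps its ID; no region list is touched.
function editBlock(doc: FlatDoc, id: BlockId, text: string): void {
  doc.blocks.set(id, text);
}

// Layout edit: only region lists are touched; no text is touched.
function moveBlock(doc: FlatDoc, from: string, to: string, id: BlockId): void {
  const src = doc.regions.get(from) ?? [];
  src.splice(src.indexOf(id), 1);
  doc.regions.get(to)?.push(id);
}

const doc: FlatDoc = {
  blocks: new Map([["b1", "Hello"]]),
  regions: new Map([["main", ["b1"]], ["sidebar", []]]),
};

// These two edits commute: one user edits content while another moves the block.
editBlock(doc, "b1", "Hello, world");
moveBlock(doc, "main", "sidebar", "b1");
```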


> More specifically, if you can edit different parts of a same document on different devices, then the document should be split across multiple files that can be synced independently

A more robust idea is to store a log of changes in files that are synced with Dropbox/OneDrive/etc. To prevent conflict warnings from Dropbox, you'll want each device to write to a separate file. Readers re-assemble the logs into (some) canonical total order, then reduce over that to get the document state.

The hard part is re-architecting your app to record all changes, instead of just writing out the current state to disk. However, once you do that, it can form the basis for other features like undo/redo, a view of the file's history, etc.

(The changes don't need to be CRDT/OT messages - anything deterministic works, though it's a good idea to make them "rebase-able", i.e., they will still do something reasonable when replayed on top of a collaborator's concurrent changes.)
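A sketch of the per-device log idea (all names invented): each device appends entries to its own file; readers merge every file into one canonical total order - here, (lamport, deviceId) - and reduce over it.

```typescript
type Entry = { lamport: number; device: string; op: string };

// Merge all per-device logs into one deterministic total order.
function canonicalOrder(logs: Entry[][]): Entry[] {
  return logs.flat().sort(
    (a, b) => a.lamport - b.lamport || a.device.localeCompare(b.device)
  );
}

// Toy "document": ops append words. Real ops should be rebase-able edits.
function reduceState(entries: Entry[]): string {
  return entries.map((e) => e.op).join(" ");
}

// Two devices' log files, as they might sit in a synced Dropbox folder.
const phoneLog: Entry[] = [
  { lamport: 1, device: "phone", op: "hello" },
  { lamport: 3, device: "phone", op: "again" },
];
const laptopLog: Entry[] = [{ lamport: 2, device: "laptop", op: "world" }];

const state = reduceState(canonicalOrder([phoneLog, laptopLog]));
// Every reader computes the same state, no matter which file synced first.
```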


Indeed, Replicache works this way, using server reconciliation (one part of client-side prediction): https://doc.replicache.dev/concepts/how-it-works


As the author, same.

My best guess is:

- Central-server collaborative editing work focuses on Operational Transformation (OT), likely due to inertia (studied since 1989) and the perception that storing an ID per character is inefficient. In fairness, it is, absent the optimizations introduced by RGASplit and Yjs (~2015).

- For decentralized editing, OT is very complicated, and CRDTs took over as the solution of interest (studied since 2005). Storing every operation permanently in a log - needed to use the linked approach without a server - feels inefficient, as does server reconciliation's undo/re-apply process. So CRDT research has focused on avoiding those inefficiencies, sacrificing simplicity along the way, instead of just embracing them as the easy way out.

To me, the "inefficiencies" seem quite manageable. Storage is cheap, text is small, and you probably want a complete op log anyway for auditing and document histories (cf. git). Server reconciliation's undo/re-apply process can be batched aggressively, e.g., only do it a few times per second; that just makes remote ops take a little longer to show up.

Granted, I have not built a complete app around server reconciliation or the linked approach, so perhaps there is a hidden catch. But I am encouraged by the success of Replicache (https://doc.replicache.dev/concepts/how-it-works), which is where I learned of server reconciliation.


I wonder if you could sidestep the inefficiencies of storing IDs for every character by using Redis streams to store the characters with IDs "represented as delta-compressed macro nodes that are linked together by a radix tree", where the IDs are "monotonically incrementing and has two parts: <time>-<counter>. The time is in milliseconds and the counter increases for entries generated in the same milliseconds"

This would seem to avoid clashes and also compress the identifier storage in an efficient way.

https://antirez.com/news/128
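A sketch of generating such <time>-<counter> IDs (an assumption: this mirrors the scheme from the linked post; it is not a Redis client), plus the compression intuition - a run of IDs from one generator can be stored as a start ID and a length:

```typescript
// IDs are monotonic per generator. Cross-device uniqueness would still need
// a per-stream/agent component alongside the timestamp.
type StreamId = [ms: number, counter: number];

function makeIdGenerator(now: () => number) {
  let lastMs = -1;
  let counter = 0;
  return (): StreamId => {
    const ms = now();
    if (ms === lastMs) counter++;   // same millisecond: bump the counter
    else { lastMs = ms; counter = 0; }
    return [ms, counter];
  };
}

// Deterministic clock for the example.
let t = 1000;
const gen = makeIdGenerator(() => t);
const a = gen(); // [1000, 0]
const b = gen(); // [1000, 1]
t = 1001;
const c = gen(); // [1001, 0]
// The run a..b compresses to (start = [1000, 0], length = 2) instead of
// one stored ID per character.
```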


This sounds similar to the idea behind articulated (though with UUID-counter IDs instead of time-counter IDs): https://github.com/mweidner037/articulated

I will check out the antirez post.

