One drawback I can think of is that you lose the "permanent paths" to files in ...

gnosis · on May 22, 2011

"If you start twiddling with relationships between tags, I can imagine how many paths previously stored in software will get broken."

Twiddling with tags is no worse than twiddling with directories.

Few people consider it a design flaw of ordinary filesystems that software relying on a particular directory layout might break when you change that layout.

The same holds for any dependency on a particular organization of your data.

The fault for breaking software that's tightly coupled to a certain underlying organization or layout lies with the software itself (for not tolerating changes) and with the user for making the changes in the first place.

"I can also imagine 'identity problems'. Not counting symlinks and hardlinks, the full file path serves as its URI. How can I be sure if /photos/europe/DSCN0001.JPG and /photos/london/DSCN0001.JPG are the same files? What's the file URI here?"

But symlinks and hardlinks are exactly the filesystem feature that makes ordinary filesystems subject to the very same question. So why would you leave them out?

There are various solutions to this problem on ordinary filesystems. First, your tools (like "ls") can show you that a file or directory is a symlink, though you might have to walk up through the parent directories to find out whether any of them is one. Second, you can use stat to compare the device and inode numbers of the files in question: if both match, they are the same file.
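As a rough illustration of both checks on a POSIX system, here is a minimal Python sketch using only the standard library. The helper names same_file and is_symlink_somewhere are made up for this example. The comparison uses the device number as well as the inode, since inode numbers are only unique within a single filesystem.

```python
import os


def same_file(path_a: str, path_b: str) -> bool:
    """Return True if both paths refer to the same underlying file.

    os.stat() follows symlinks, so a symlink and its target compare equal;
    hard links share an inode, so they compare equal as well.
    """
    a = os.stat(path_a)
    b = os.stat(path_b)
    # Inode numbers are only unique per filesystem, so compare the device too.
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def is_symlink_somewhere(path: str) -> bool:
    """Return True if the path itself or any of its parent directories is a symlink."""
    path = os.path.abspath(path)
    while True:
        if os.path.islink(path):
            return True
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            return False
        path = parent
```

The standard library already ships the first check as os.path.samefile(), which performs the same device/inode comparison.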

It should not be difficult to add similar functionality to a tag-based filesystem.
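One way this could look, sketched under the assumption that the tag store gives every file a unique numeric ID that plays the role of an inode. TagStore and its methods are hypothetical and not Tagsistant's API; the point is only that two tag paths name the same file when both resolve to exactly one and the same ID.

```python
from dataclasses import dataclass, field


@dataclass
class TagStore:
    """Toy tag store in which every file gets a unique numeric ID (its "inode")."""
    next_id: int = 1
    names: dict[int, str] = field(default_factory=dict)      # ID -> filename
    tags: dict[int, set[str]] = field(default_factory=dict)  # ID -> tags

    def add(self, name: str, *tags: str) -> int:
        file_id = self.next_id
        self.next_id += 1
        self.names[file_id] = name
        self.tags[file_id] = set(tags)
        return file_id

    def resolve(self, path: str) -> set[int]:
        """Return the IDs of all files matching a path such as /photos/london/x.jpg.

        Every directory component is treated as a tag the file must carry.
        """
        *dirs, name = path.strip("/").split("/")
        return {fid for fid, n in self.names.items()
                if n == name and set(dirs) <= self.tags[fid]}

    def same_file(self, path_a: str, path_b: str) -> bool:
        """Analogue of comparing (st_dev, st_ino): True iff both paths name one single ID."""
        ids_a, ids_b = self.resolve(path_a), self.resolve(path_b)
        return len(ids_a) == 1 and ids_a == ids_b
```

With a file tagged photos, europe and london, both /photos/europe/DSCN0001.JPG and /photos/london/DSCN0001.JPG resolve to the same ID, so same_file() reports them as one file.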

mnzaki · on May 22, 2011

Another related ramification of doing away with hierarchy and unique identifiers (tree paths) is that you can no longer have two files with the same filename.

Say you have /photos/london/DSCN0001.JPG and /photos/berlin/DSCN0001.JPG, and the relationships "europe contains london" and "europe contains berlin".

Now what does the 'path' /photos/europe/DSCN0001.JPG resolve to?
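To make the ambiguity concrete, here is a small Python sketch that assumes "contains" relations are followed transitively when a path is resolved. The CONTAINS table, FILES list, and helper functions are hypothetical and only model the example above.

```python
# Hypothetical tag relations: each tag maps to the tags it directly contains.
CONTAINS: dict[str, set[str]] = {"europe": {"london", "berlin"}}

# Hypothetical files: (filename, tags), mirroring /photos/london and /photos/berlin.
FILES: list[tuple[str, set[str]]] = [
    ("DSCN0001.JPG", {"photos", "london"}),
    ("DSCN0001.JPG", {"photos", "berlin"}),
]


def expand(tag: str) -> set[str]:
    """Return the tag itself plus every tag it contains, transitively."""
    seen: set[str] = set()
    stack = [tag]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(CONTAINS.get(t, ()))
    return seen


def resolve(path: str) -> list[tuple[str, set[str]]]:
    """Return every file with a matching name that satisfies each directory tag."""
    *dirs, name = path.strip("/").split("/")
    return [(n, tags) for n, tags in FILES
            if n == name and all(tags & expand(d) for d in dirs)]
```

Here resolve("/photos/europe/DSCN0001.JPG") returns both photos, so the filesystem has to pick one, report an error, or rename something.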

tx0 · on May 22, 2011

Tagsistant 0.2 does not allow storing two files with the same name, exactly as you say. But Tagsistant 0.4 will! Well, at the small cost of having a small unique number prepended to each filename.

Tagsistant 0.4 has a broader vision (tagging of entire directories) but is still under development. If you have suggestions or questions, I'll be very happy to discuss them.

spoondan · on May 22, 2011

Can you provide meaningful prefixes for conflicting files? When you detect a file name conflict, construct a distinguishing prefix for each conflicting file from the difference in tags between the conflicting files. (If all the tags are the same, fall back to a synthetic prefix, overwrite the file, error out, or whatever.)

For example, let's say you have /photos/london/DSCN0001.JPG and /photos/vienna/DSCN0001.JPG, where "london" and "vienna" are both included in "europe". This could yield paths like /photos/europe/london:DSCN0001.JPG and /photos/europe/vienna:DSCN0001.JPG.

The big trouble here (and, if I understand correctly, with what you're suggesting as well) is that changing the name or tags of one file can alter the path to another as a side effect. So if I started with just /photos/vienna/DSCN0001.JPG, I might reference it as /photos/europe/DSCN0001.JPG somewhere. But when I later add /photos/london/DSCN0001.JPG, my reference to the photo of Vienna breaks because its name is no longer unique. As TeMPOral points out, this is a general class of problems afflicting a system like this.
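Both the tag-difference prefix and its side effect fit in a short Python sketch. It assumes a directory view is a mapping from a hypothetical internal file ID to (filename, tags), and uses that ID as the synthetic fallback prefix; none of this is Tagsistant's actual behavior.

```python
def display_names(files: dict[str, tuple[str, frozenset[str]]]) -> dict[str, str]:
    """Assign a visible name to each file in one directory view.

    Unique filenames are shown unchanged. Conflicting ones get a prefix made of
    the tags that no other same-named file carries; if there are none, the
    (hypothetical) internal ID is used as a synthetic prefix instead.
    """
    by_name: dict[str, list[str]] = {}
    for fid, (name, _tags) in files.items():
        by_name.setdefault(name, []).append(fid)

    shown: dict[str, str] = {}
    for name, ids in by_name.items():
        if len(ids) == 1:
            shown[ids[0]] = name
            continue
        for fid in ids:
            own = files[fid][1]
            others = set().union(*(files[o][1] for o in ids if o != fid))
            extra = sorted(own - others)
            prefix = "+".join(extra) if extra else fid
            shown[fid] = f"{prefix}:{name}"
    return shown


# The Vienna photo alone is simply "DSCN0001.JPG" in the europe view...
view = {"v1": ("DSCN0001.JPG", frozenset({"photos", "vienna"}))}
print(display_names(view))
# ...but adding a London photo with the same name renames it as a side effect.
view["l1"] = ("DSCN0001.JPG", frozenset({"photos", "london"}))
print(display_names(view))
```

The second call yields vienna:DSCN0001.JPG and london:DSCN0001.JPG, so any stored reference to the bare DSCN0001.JPG no longer resolves, which is exactly the breakage described above.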

tx0 · on May 23, 2011

It does not work exactly this way.

When you create a file "DSCN0001.JPG", it receives a prefix even if there is no conflict, becoming, let's say, "123_DSCN0001.JPG".

But both you and your software (say, a file manager) expect the file to be named "DSCN0001.JPG", not "123_DSCN0001.JPG". To solve that, Tagsistant 0.4 provides an aliasing layer that maps the original name to the prefixed one.
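A toy in-memory model of "always prefix, then alias" might look like the Python sketch below. The counter-based prefixes, the PrefixingStore class, and the rule that the newest file wins an alias are all assumptions for illustration; Tagsistant's real naming and alias handling may differ.

```python
import itertools


class PrefixingStore:
    """Every stored file gets a unique numeric prefix; an alias table maps the
    name the user asked for back to the prefixed name."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.files: dict[str, bytes] = {}  # prefixed name -> contents
        self.aliases: dict[str, str] = {}  # original name -> prefixed name

    def create(self, name: str, data: bytes = b"") -> str:
        stored = f"{next(self._ids)}_{name}"  # prefixed even when there is no conflict
        self.files[stored] = data
        self.aliases[name] = stored  # assumption: the newest file takes over the alias
        return stored

    def lookup(self, name: str) -> bytes:
        # Accept either the real (prefixed) name or an alias pointing at it.
        stored = name if name in self.files else self.aliases[name]
        return self.files[stored]
```

A file manager that just wrote DSCN0001.JPG can read it back under that name, while the prefixed name stays valid for both files once a second DSCN0001.JPG arrives.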

It's still under development, so both the idea and the implementation may change. For example: how long should an alias exist? Only until the first access? Until an estimated expiration time?

I lean toward the latter solution. Since aliases are implemented as an SQL table, adding an expiration column and a garbage-collecting thread should be all that is needed.
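Using Python's built-in sqlite3 module, an expiration column plus a cleanup step could be sketched as follows. The table layout and function names are hypothetical, not Tagsistant's schema. The cleanup is a plain function so the example stays single-threaded; a real garbage-collecting thread would typically open its own connection, since sqlite3 connections are by default restricted to the thread that created them.

```python
import sqlite3
import time
from typing import Optional


def open_alias_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per alias with an absolute expiry timestamp.
    db.execute("""CREATE TABLE aliases (
                      alias      TEXT PRIMARY KEY,
                      target     TEXT NOT NULL,
                      expires_at REAL NOT NULL)""")
    return db


def add_alias(db: sqlite3.Connection, alias: str, target: str,
              ttl_seconds: float, now: Optional[float] = None) -> None:
    now = time.time() if now is None else now
    db.execute("INSERT OR REPLACE INTO aliases VALUES (?, ?, ?)",
               (alias, target, now + ttl_seconds))


def resolve_alias(db: sqlite3.Connection, alias: str,
                  now: Optional[float] = None) -> Optional[str]:
    """Return the target of a live alias, or None if it is missing or expired."""
    now = time.time() if now is None else now
    row = db.execute("SELECT target FROM aliases WHERE alias = ? AND expires_at > ?",
                     (alias, now)).fetchone()
    return row[0] if row else None


def collect_garbage(db: sqlite3.Connection, now: Optional[float] = None) -> int:
    """Delete expired aliases and return how many rows were removed."""
    now = time.time() if now is None else now
    return db.execute("DELETE FROM aliases WHERE expires_at <= ?", (now,)).rowcount
```

Lookups ignore expired rows on their own, so the collector only reclaims space; a late or skipped collection never resurrects an expired alias.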

Of course, expiring aliases merely postpone the problem. But in my opinion Tagsistant is primarily a personal tool, not something that automated procedures or batch jobs are supposed to rely on. I hope that, from this perspective, the alias workaround is an acceptable compromise.

tx0 · on May 22, 2011

I always thought of Tagsistant as an archiving tool, but you are totally right.

However, files are also accessible from the archive/ directory, where nothing is supposed to change as a consequence of tagging.
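The idea of a stable archive path next to volatile tag paths can be sketched in Python. The /archive/<id>_<name> and /tags/<tag>/<name> layouts and the ArchivedFile class are hypothetical stand-ins, not Tagsistant's actual directory scheme.

```python
from dataclasses import dataclass, field


@dataclass
class ArchivedFile:
    file_id: int  # assigned once at creation and never changed
    name: str
    tags: set[str] = field(default_factory=set)

    def archive_path(self) -> str:
        # Depends only on the ID and the name, so retagging cannot change it.
        return f"/archive/{self.file_id}_{self.name}"

    def tag_paths(self) -> list[str]:
        # One path per tag; these appear and disappear as tags are edited.
        return sorted(f"/tags/{t}/{self.name}" for t in self.tags)
```

After retagging a file from london to vienna and europe, its tag paths change completely while its archive path stays the same, which is what makes archive/ the safer place for other software to store references.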

Could that be a reasonable compromise?