Here’s an out-there take, but one I’ve held loosely for a long time and haven’t shed yet: dicts are not appropriate for what people mostly use them for, which is named access to member attributes.
dict is an implementation of a hash table. Hash table are designed for o(1) lookup of items. As such, they are arrays which are much bigger than the number of items they store, to allow hashing items into integers and sidestep collisions. They’re meant to act like an index that contains many records, not a single record.
A single record is more
like a tuple, except you want named access instead of, title = movie[0], release_year = movie[1], etc. And Python had that, in NamedTuple, but it was kinda magical and no one used it (shoutout Raymond Hettinger).
Granted, this rant is pretty much the meme with the guy explaining something to a brick wall, in that dicts are so firmly entrenched as the "record" type of choice in Python (but not so in other languages: struct, case class, etc. and JSON doesn’t just deserialize to a weak type but I digress).
NamedTuples are great, but they let you do too much with the objects. You probably don't want users of your GitHubRepo class to be able to do things like `repo[1]` or `for foo in repo`. Dataclasses have more constrained semantics, so I reach for them by default. In my ideal world they would default to frozen=True, kw_only=True, slots=True, but even without those they're a big improvement.
Dicts in python are for when you have a thing and you aren't sure what the keys are. Dataclasses are for when you have a thing and you're sure what the keys (attributes are). The trouble is when you have a thing and you're sort of sure, but not entirely sure, and some things are definitely there but not everything you might be thinking of.
I think most modern Python codebases are using dataclasses/ something like Pydantic. I think dicts are mostly seen, like the author suggests, because something which you hacked up to work quickly ends up turning into actual software and it's too much work refactor the types
dicts are used internally in the language to look up class and module attributes. They are optimized for this use case. How can it be wrong to use them that way when the very fabric of the language depends on it?
namedtuple is widely used in Python code, especially before the introduction of dataclasses.
A hash function will always be more expensive than a pointer lookup, specially concidering a pointer lookuo is still needed after the hash function.
No matter what you do, a lookup into an array will always be quicker than a hash lookup if you don't need to do a linear search, even in a lot of cases the linear search will be quicker.
Structs in other languages is a lookup of pointer + and offset. Which to my knowledge is also true in python classes using __slots__. There's no reason to use a dict if you know the contents of the data, use a dataclass with slots=True purely because there's no hash function run on every lookup into the datastructure.
It’s not wrong to use dicts, it’s just bad practice when you could use something like a dataclass or pydantic model instead.
Dicts are useful for looking things up, like if you have a list bunch of objects that you need to access and modify, you should use a dict.
If you are using the dict as a container like car={“make”:”honda”,”color”:”red”}, you should use a proper object like a class, dataclass, or pydantic model based on whether you need validation, type safety, etc. This drastically reduces bugs and code complexity, helps others reason about your code, gives you access to better tooling etc.
Subclassing NamedTuple is very ergonomic, and given they're immutable unlike data classes I often reach for them by default. I still use Pydantic when I want custom validation or when it ties into another lib like FastAPI.
dict is an implementation of a hash table. Hash table are designed for o(1) lookup of items. As such, they are arrays which are much bigger than the number of items they store, to allow hashing items into integers and sidestep collisions. They’re meant to act like an index that contains many records, not a single record.
A single record is more like a tuple, except you want named access instead of, title = movie[0], release_year = movie[1], etc. And Python had that, in NamedTuple, but it was kinda magical and no one used it (shoutout Raymond Hettinger).
Granted, this rant is pretty much the meme with the guy explaining something to a brick wall, in that dicts are so firmly entrenched as the "record" type of choice in Python (but not so in other languages: struct, case class, etc. and JSON doesn’t just deserialize to a weak type but I digress).