At runtime, GADTs are actually represented the same way as plain ADTs (that is, a tagged union). The tighter types make it possible to remove some unused branches in pattern matching, so you may remove a few tests, but memory-wise it should be the same.
In my experience, a memory win takes place because you can use something lighter weight (ADTs) to model your data as opposed to something that consumes additional memory (objects or a record of bound functions) simply to get the type system to accept your code.
Furthermore, when you use ADTs to store your data (and build modules that operate on them), I believe the functions that operate on them are resolved entirely at compile time, not at run time. Whether or not that is being capitalized on is a different story, but I believe it has benefits now and will have even more later as the compiler authors continue to add optimizations.
> OCaml classes allow row polymorphism, but disallow immutable updates (immutable updates meaning create a new object with everything the same, but one element different.)
There's a functional update construct for objects:
That syntax works for records, not objects. The {< _ = _ >} syntax is sort of equivalent, except that it can only be used within method definitions (!) and is therefore too restricted.
Faker is Python-specific, and not really about creating databases, but of course you can tack that on easily. It also gives you specific values instead of a set schema and supports different languages. I use faker too, and my favourite is to tack it on to factory_boy (http://factoryboy.readthedocs.org/en/latest/) for unit testing data with random but reasonable values.
Maybe a cool idea is to merge these two projects or use faker to do the value generation part here, but add more stuff in this project that's about generating a consistent schema of related values - fake users that post fake posts from fake locations in a consistent manner.
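To make the "consistent schema of related values" idea concrete, here is a rough sketch using only the Python standard library. It is not how either project works: `make_dataset`, the field names, and the placeholder city list are all made up for illustration, and the `random` calls stand in for the value generation that faker would normally provide.

```python
import random


def make_dataset(n_users, n_posts, seed=0):
    """Build fake users and fake posts that reference each other consistently.

    Hypothetical helper: names and fields are assumptions, not part of
    faker or of the project under discussion.
    """
    rng = random.Random(seed)  # seeded so the same data can be regenerated
    cities = ["Paris", "Lagos", "Osaka"]  # placeholder values; faker would supply these

    users = [
        {"id": uid, "name": f"user{uid}", "city": rng.choice(cities)}
        for uid in range(1, n_users + 1)
    ]

    posts = []
    for pid in range(1, n_posts + 1):
        author = rng.choice(users)  # every post belongs to an existing user
        posts.append(
            {
                "id": pid,
                "user_id": author["id"],     # foreign key into users
                "location": author["city"],  # posted from the author's own city
                "title": f"post {pid} by {author['name']}",
            }
        )
    return users, posts


users, posts = make_dataset(n_users=3, n_posts=5, seed=42)
```

The point is only that the relationships are generated first and the individual values are filled in afterwards, so a post never refers to a user or a location that does not exist.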
Btw, they have counterparts in the Ruby world: https://github.com/stympy/faker (which claims to be a port from Perl) and https://github.com/thoughtbot/factory_girl
I use them in a Rails db/seeds.rb file to create a new dev db every time I need new data to play with, especially after changes to the schema.
Oh, makes sense. I did not see that this is a bridge between DBs and faker. Mixer does something similar directly on top of SQLAlchemy, if I recall correctly.
Thank you for what you're doing, that's awesome!
I'd like to explain why choosing a license is important.
Many people can be picky about the license, because even though you may be full of good intentions, the data you provide is non-free as-is in that it can't be reused elsewhere. Having a license is the clean, boring way to tell everyone "do whatever you want with this data".
That's even a problem for GitHub, or more precisely GitHub users, because many people push code without any explicit license, making this code hardly usable by default: it may be available, but you don't know if you have the right to reuse it, even if the developer said "yeah, use this as you want". That's not enough. All because the developer didn't take a few minutes to pick a license and add a LICENSE file.
It is also important to think about it in the other direction: if you're going to take content from Wookieepedia, their license is CC-BY-SA (http://starwars.wikia.com/wiki/Wookieepedia:Copyrights), which means that you MUST say it comes (at least partially) from them (not that you wouldn't anyway) and that you MUST redistribute your content under a "similar" license. Even though you may think that everything is fine, there is still some work to do on this front.
Now, I'd understand that if only "amateurs" come and play with the data, we're all well-behaved adults and all is fine. But you may not be able to predict how the data will be used, so unfortunately this step is important.
Personally I don't like restricting people, so I try to use CC0 as much as possible. Be aware that Creative Commons licenses may not be best suited for data (they were designed primarily for works of art), as can be seen from OSM's decision on the matter (http://www.osmfoundation.org/wiki/License/We_Are_Changing_Th...), so you may want to avoid them.
Note: I'm not even talking about money, which is an orthogonal problem.
My background is in security & formal methods. In my PhD thesis I wrote a type inference system to detect security bugs in the Linux kernel. I'm passionate about open source and contribute quite a lot to Debian.
Funny how it triggered a bug in Firefox. When the tab is unfocused, its title in the handle is "𝑼𝒏…", but when it gets the focus it becomes "𝑼<D835>…" (in a square box). The next codepoint is U+1D48F whose UTF-16 BE encoding is d8 35 dc 8f.
I'd say that the truncation algorithm operates on bytes and that it can't make sense of d8 35, but I'm not too sure how to fix that since graphemes can have arbitrary length (right?). Do you have to compute the width in advance?
>I'd say that the truncation algorithm operates on bytes
This seems likely, as another notable weirdness is that even with full width tabs, where there's plenty of space for at least "𝑼𝒏𝒊𝒄𝒐𝒅𝒆 𝑻𝒆𝒙𝒕..." it still only shows "𝑼𝒏𝒊𝒄𝒐...".
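For what it's worth, here is a minimal Python sketch of that failure mode. It is not Firefox's code, just an illustration under the assumption that the title is cut at a UTF-16 code unit boundary; all function names are made up. Strictly speaking d8 35 is one 16-bit code unit (a lone high surrogate), so cutting there leaves half a surrogate pair. The "safe" variant only backs off from a split surrogate pair; it does nothing about multi-code-point grapheme clusters, which would need proper segmentation.

```python
def truncate_code_units(text, max_units):
    """Naive cut: keep the first max_units 16-bit units of the UTF-16 BE encoding."""
    return text.encode("utf-16-be")[: 2 * max_units]


def is_valid_utf16_be(data):
    """Return True if the bytes decode cleanly as UTF-16 BE."""
    try:
        data.decode("utf-16-be")
        return True
    except UnicodeDecodeError:
        return False


def truncate_utf16_safe(text, max_units):
    """Cut on code units, but never leave a lone high surrogate at the end."""
    data = text.encode("utf-16-be")[: 2 * max_units]
    # In big-endian order the high byte of the last unit is data[-2];
    # high surrogates are 0xD800..0xDBFF, i.e. high byte 0xD8..0xDB.
    if len(data) >= 2 and 0xD8 <= data[-2] <= 0xDB:
        data = data[:-2]
    return data.decode("utf-16-be")


title = "\U0001D47C\U0001D48F"  # the two characters U+1D47C and U+1D48F

cut = truncate_code_units(title, 3)
print(cut.hex(" "))                   # ends with the dangling unit d8 35
print(is_valid_utf16_be(cut))         # False: the surrogate pair was split
print(truncate_utf16_safe(title, 3))  # only the first character survives
```

Truncating by code point (plain `title[:n]` in Python) avoids the split surrogate too, but neither approach answers the width question: you still have to measure the rendered text to know how much fits.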
Unikernels can be a nice solution for this: it's possible to have a TLS termination proxy built as a Mirage unikernel sitting directly on Xen that does nothing but TLS, using ocaml-tls. It does not support everything yet either, but it's coming.