
emillon · on March 31, 2015

At runtime, GADTs are represented the same way as plain ADTs (that is, as a tagged union). The tighter types let the compiler rule out some pattern-matching branches, so a few runtime tests may disappear, but memory-wise it should be the same.
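A minimal OCaml sketch of the point above. The expression language and all names are made up for illustration. The GADT and the plain ADT have the same constructors in the same order, so their values get the same block tag and size at runtime. The only difference is that the GADT's type index lets the exhaustiveness checker accept a match on an `int expr` that has no `Bool` case. The `Obj` calls are only there to peek at the representation and don't belong in real code.

```ocaml
(* A small typed expression language as a GADT: the type index records
   what each expression evaluates to. *)
type _ expr =
  | Int : int -> int expr
  | Bool : bool -> bool expr
  | Add : int expr * int expr -> int expr
  | If : bool expr * 'a expr * 'a expr -> 'a expr

(* The same constructors as a plain ADT, with no type index. *)
type uexpr =
  | UInt of int
  | UBool of bool
  | UAdd of uexpr * uexpr
  | UIf of uexpr * uexpr * uexpr

(* Typed evaluator: the index tells the checker what each branch returns,
   so no runtime type tests are needed. *)
let rec eval : type a. a expr -> a = function
  | Int n -> n
  | Bool b -> b
  | Add (x, y) -> eval x + eval y
  | If (c, t, e) -> if eval c then eval t else eval e

(* [Bool] cannot build an [int expr], so this match is exhaustive
   without a [Bool] branch. *)
let as_literal (e : int expr) : int option =
  match e with
  | Int n -> Some n
  | Add _ | If _ -> None

(* Debug-only inspection: both values are heap blocks with the same
   tag and the same number of fields. *)
let same_layout =
  let g = Obj.repr (Add (Int 1, Int 2))
  and u = Obj.repr (UAdd (UInt 1, UInt 2)) in
  Obj.tag g = Obj.tag u && Obj.size g = Obj.size u
```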

jordwalke · on March 31, 2015

In my experience, a memory win takes place because you can use something lighter weight (ADTs) to model your data as opposed to something that consumes additional memory (objects or a record of bound functions) simply to get the type system to accept your code.

Furthermore, when you use ADTs to store your data (and write modules that operate on them), I believe the functions that operate on them are resolved entirely at compile time, not at run time. Whether or not that is being capitalized on is a different story, but I believe it has benefits now and will have even more later as the compiler authors continue to add optimizations.

emillon · on March 31, 2015

Yes, exactly. ADTs are entirely known statically so there's no vtable etc.
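A small sketch of the contrast discussed above, with hypothetical `shape`, `Shape` and `rect` names. A variant value is just a tagged block, and `Shape.area` is an ordinary function whose call target is known at compile time. Calling `#area` on an object, by contrast, is dispatched dynamically through the object's method table.

```ocaml
(* Data modelled as a plain variant: a tagged block, no method table. *)
type shape =
  | Circle of float
  | Rect of float * float

(* Operations live in a module; a call to Shape.area is a direct call
   to a function known at compile time. *)
module Shape = struct
  let area = function
    | Circle r -> Float.pi *. r *. r
    | Rect (w, h) -> w *. h
end

(* The object-based alternative: each #area call is dispatched at
   runtime through the object's method table. *)
class rect w h = object
  method area = w *. h
end

(* Summing over plain data needs no objects at all. *)
let total_area shapes =
  List.fold_left (fun acc s -> acc +. Shape.area s) 0.0 shapes
```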

emillon · on March 16, 2015

> OCaml classes allow row polymorphism, but disallow immutable updates (immutable updates meaning create a new object with everything the same, but one element different.)

There's a functional update construct for objects:

http://caml.inria.fr/pub/docs/manual-ocaml/objectexamples.ht...

It works like { r with field = value }.

tel · on March 16, 2015

That syntax works for records, not objects. The {< _ = _ >} syntax is sort of equivalent, except that it can only be used within method definitions (!) and is therefore too restricted.
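To illustrate the restriction, here is a sketch with a hypothetical immutable `point` class. `{< x = x' >}` copies the current object with one instance variable replaced, but it may only appear inside a method body, so every functional update has to be exposed as its own method.

```ocaml
(* An immutable point: with_x returns a copy of the object with x replaced.
   {< ... >} is only allowed inside a method, hence the dedicated method. *)
class point (x0 : int) (y0 : int) = object
  val x = x0
  val y = y0
  method x = x
  method y = y
  method with_x x' = {< x = x' >}
end

let () =
  let p = new point 1 2 in
  let q = p#with_x 10 in
  (* p is unchanged; q shares y with p but has a new x. *)
  Printf.printf "p = (%d, %d), q = (%d, %d)\n" p#x p#y q#x q#y
```

Writing `{< x = 10 >}` at the top level instead of inside a method is rejected by the type checker.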

Instead, try taking a look at the store-comonad mechanism I've got below: https://news.ycombinator.com/item?id=9213482

emillon · on Feb 15, 2015

> Also, when your programs use abstract data types you lose the benefits of pattern matching.

In some cases you can also use "private ADTs". Code outside the module that defines such a type can pattern-match on its values but cannot apply its constructors.
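A minimal sketch of a private variant, using a hypothetical `Nonzero` module. Outside the module, code can match on `Pos` and `Neg`, but values can only be built through the smart constructor `make`, which enforces the invariant.

```ocaml
(* The signature exposes the constructors for matching only. *)
module Nonzero : sig
  type t = private Pos of int | Neg of int
  val make : int -> t option
end = struct
  type t = Pos of int | Neg of int
  (* The only way to obtain a [t]: zero is rejected. *)
  let make n =
    if n > 0 then Some (Pos n) else if n < 0 then Some (Neg n) else None
end

(* Pattern matching works fine outside the module. *)
let describe (v : Nonzero.t) =
  match v with
  | Nonzero.Pos n -> Printf.sprintf "positive %d" n
  | Nonzero.Neg n -> Printf.sprintf "negative %d" n
```

Writing `Nonzero.Pos 0` outside the module is a type error, since the constructors of a private type cannot be applied there.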

emillon · on Dec 17, 2014

Hi! Looks interesting. I usually use faker for this (https://github.com/joke2k/faker). How does fake2db compare to it?

makmanalp · on Dec 17, 2014

Faker is Python-specific, and not really about creating databases, but of course you can tack that on easily. It also gives you individual values rather than a fixed schema, and supports different languages. I use faker too, and my favourite is to tack it onto factory_boy (http://factoryboy.readthedocs.org/en/latest/) for unit-testing data with random but reasonable values.

Maybe a cool idea is to merge these two projects or use faker to do the value generation part here, but add more stuff in this project that's about generating a consistent schema of related values - fake users that post fake posts from fake locations in a consistent manner.

edit: Actually it looks like this does use faker!

pmontra · on Dec 17, 2014

Btw, they have counterparts in the Ruby world: faker (https://github.com/stympy/faker), which describes itself as a port from Perl, and factory_girl (https://github.com/thoughtbot/factory_girl). I use them in a Rails db/seeds.rb file to create a new dev DB every time I need new data to play with, especially after schema changes.

emillon · on Dec 17, 2014

Oh, makes sense. I did not see that this is a bridge between DBs and faker. If I recall correctly, Mixer does something similar directly on top of SQLAlchemy.

emillon · on Dec 5, 2014

Under which license is this data available?

paulhallett · on Dec 5, 2014

The data is being scraped from sources online and by watching the films about a thousand times.

As the data is already available freely elsewhere, I won't be charging or keeping the data, I'll provide it just like I do on http://pokeapi.co

rakoo · on Dec 5, 2014

Thank you for what you're doing, that's awesome! I'd like to explain why choosing a license is important.

Many people can be picky about the license because, even though you may be full of good intentions, the data you provide is non-free as it stands, in that it can't be reused elsewhere. Having a license is the clean, boring way to tell everyone "do whatever you want with this data".

This is a problem on GitHub too, or more precisely for GitHub users: many people push code without an explicit license, which makes that code hard to use by default. It may be available, but you don't know whether you have the right to reuse it, even if the developer said "yeah, use this as you want". That's not enough, and all because the developer didn't take a few minutes to pick a license and add a LICENSE file.

It is also important to think about it in the other direction: if you're going to take content from Wookieepedia, their license is CC BY-SA (http://starwars.wikia.com/wiki/Wookieepedia:Copyrights), which means that you MUST credit them as (at least partially) the source (not that you wouldn't anyway) and that you MUST redistribute your content under a "similar" license. Even if you think everything is fine, there are obligations to meet on this side too.

Now, I understand that if only "amateurs" come and play with the data, we're all well-behaved adults and all is fine. But you can't always predict how the data will be used, so unfortunately this step is important.

Personally I don't like restricting people, so I try to use CC0 as much as possible. Be aware that Creative Commons licenses may not be best suited for data (they were designed primarily for works of art), as OSM's decision on the matter shows (http://www.osmfoundation.org/wiki/License/We_Are_Changing_Th...), so you may want to avoid them.

Note: I'm not even talking about money, which is an orthogonal problem.

emillon · on Dec 1, 2014

Location: Paris, France

Remote: Possible

Willing to relocate: Not immediately but we can discuss it.

Technologies: OCaml, Haskell, Python, C

Resume: http://www-apr.lip6.fr/~millon/cv-emillon.pdf

Github: https://github.com/emillon

Blog: http://blog.emillon.org/

Email: me AT emillon DOT org

My background is in security & formal methods. In my PhD thesis I wrote a type inference system to detect security bugs in the Linux kernel. I'm passionate about open source and contribute quite a lot to Debian.

Happy to chat with you about cool opportunities!

emillon · on Nov 19, 2014

Funny how it triggered a bug in Firefox. When the tab is unfocused, its title in the tab handle is "𝑼𝒏…", but when it gets the focus it becomes "𝑼<D835>…" (with <D835> drawn in a square box). The next code point is U+1D48F, whose UTF-16BE encoding is d8 35 dc 8f.

I'd say that the truncation algorithm operates on bytes and that it can't make sense of d8 35, but I'm not too sure how to fix that since graphemes can have arbitrary length (right?). Do you have to compute the width in advance?
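The `<D835>` left behind is a lone high surrogate, which is what you get when a cut lands between the two UTF-16 code units of a single code point. Here is a minimal sketch of a cut that avoids this, assuming for illustration that the title is held as UTF-16BE bytes in an OCaml string (Firefox's own code may well differ). The function names are hypothetical. It only protects surrogate pairs; it does not implement the UAX #29 grapheme-cluster rules, so a base letter and a following combining mark could still be split.

```ocaml
(* Encode a list of code points as UTF-16BE bytes. *)
let utf16be_of_uchars (us : Uchar.t list) : string =
  let b = Buffer.create 16 in
  List.iter (Buffer.add_utf_16be_uchar b) us;
  Buffer.contents b

(* The 16-bit code unit starting at byte offset i. *)
let code_unit s i = (Char.code s.[i] lsl 8) lor Char.code s.[i + 1]

let is_high_surrogate u = u >= 0xD800 && u <= 0xDBFF

(* Keep at most n code units, backing off by one if the cut would leave
   a high surrogate without its low half. Assumes well-formed input. *)
let truncate_utf16be s n =
  let units = String.length s / 2 in
  let n = max 0 (min n units) in
  let n =
    if n > 0 && is_high_surrogate (code_unit s (2 * (n - 1))) then n - 1
    else n
  in
  String.sub s 0 (2 * n)
```

With "𝑼𝒏" (U+1D47C U+1D48F, four code units), a naive cut after three units keeps the stray d8 35, while `truncate_utf16be` backs off to just "𝑼".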

jabiko · on Nov 19, 2014

It seems like this is a known bug: https://bugzilla.mozilla.org/show_bug.cgi?id=921528

anon1385 · on Nov 19, 2014

http://www.unicode.org/reports/tr29/

There are libraries for doing it in Javascript: https://www.npmjs.org/package/grapheme-breaker (is that part of the Firefox UI done in Javascript? I've no idea)

aninhumer · on Nov 19, 2014

>I'd say that the truncation algorithm operates on bytes

This seems likely, as another notable weirdness is that even with full-width tabs, where there's plenty of space for at least "𝑼𝒏𝒊𝒄𝒐𝒅𝒆 𝑻𝒆𝒙𝒕...", it still only shows "𝑼𝒏𝒊𝒄𝒐...".

pwnna · on Nov 19, 2014

Hm... I'm on Nightly and it seems to be unaffected by this problem.

emillon · on Nov 19, 2014

It depends on the size of the tab headers.

pc86 · on Nov 19, 2014

I am using FF Dev Edition and see "Unico<D835>..." regardless of focus. Weird.

emillon · on Nov 19, 2014

> You must be a registered Apple Developer to use these fonts. Do not download if you don't have a paid Apple Developer Program account.

I presume that this is against GitHub's ToS.

emillon · on Nov 19, 2014

Org mode has a feature for that:

http://orgmode.org/manual/Stuck-projects.html

emillon · on Nov 11, 2014

Unikernels can be a nice solution for this: you can run a MirageOS unikernel directly on Xen as a TLS termination proxy that does nothing but TLS, using ocaml-tls. It does not support everything yet either, but that's coming.

dsl · on Nov 11, 2014

Two new Xen vulnerabilities will be dropping soon.