Bloom: A Language For Disorderly Distributed Programming (speakerdeck.com)
83 points by ColinWright on Oct 13, 2013 | 19 comments


The paper on the underlying Dedalus logic (a datalog variant with explicit time) is much more insightful: http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-17...

The fundamental idea is to build a language out of primitives which compose monotonically. A very simple example is getting a distributed count of "yes" votes. The naive solution is to have each server send an "incrementVote()" RPC call. But what happens if an RPC is retried? Will your count accidentally be too high? A better solution is to send a "vote(myUUID)" message, which adds the UUID to a set. The number of votes is now the cardinality of the set of UUIDs. Voting is idempotent and the count is monotonic. The goal of Bloom is to create a language in which the right properties are ensured by construction.
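A rough sketch of the difference in plain Python (not Bloom; the class and method names here are made up for illustration, and the retry is simulated by delivering the same message three times):

```python
import uuid


class NaiveCounter:
    """Counts votes by incrementing; every retried RPC is counted again."""

    def __init__(self):
        self.count = 0

    def increment_vote(self):
        self.count += 1


class VoteSet:
    """Counts votes as the cardinality of a set of voter UUIDs."""

    def __init__(self):
        self.voters = set()

    def vote(self, voter_id):
        # Adding an element that is already present is a no-op,
        # so redelivery of the same message cannot inflate the count.
        self.voters.add(voter_id)

    def count(self):
        return len(self.voters)


my_uuid = str(uuid.uuid4())
naive = NaiveCounter()
votes = VoteSet()

# One logical "yes" vote, delivered three times because of retries.
for _ in range(3):
    naive.increment_vote()
    votes.vote(my_uuid)

print(naive.count)    # 3: the retries were counted as extra votes
print(votes.count())  # 1: the set absorbed the duplicates
```

The set only ever grows, so the count never goes down, which is the monotonic part.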


Needless to say, this is becoming a hot area due to distributed database research. Everyone is talking about the CAP theorem. Those that picked the AP side are trying to have their cake and eat it too. INRIA has been sponsoring a couple of papers on it (here is a popular one by Marc Shapiro: http://pagesperso-systeme.lip6.fr/Marc.Shapiro/papers/RR-695...).

The idea is to have auto-convergent types in an AP database cluster. Some datatypes and operations can do this: the max function, set union, boolean or. Say one node union-updates set {} with {'a'} and another with {'b'}. The two operations will auto-converge, so to speak, to the set {'a','b'}.
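To make "auto-converge" a bit more concrete, here is a small Python sketch, assuming a replica's state is just a value and an update is applied with a merge function. The helper `is_convergent` is hypothetical: it checks that every delivery order, with or without duplicated messages, ends in the same state.

```python
from functools import reduce
from itertools import permutations


def merge_max(a, b):
    return max(a, b)


def merge_union(a, b):
    return a | b


def merge_or(a, b):
    return a or b


def merge_add(a, b):
    # Plain addition, included as a counter-example: it is order-insensitive
    # but not idempotent, so duplicated messages change the result.
    return a + b


def is_convergent(merge, initial, updates):
    """True if all delivery orders, with and without duplicates, agree."""
    results = []
    for order in permutations(updates):
        results.append(reduce(merge, order, initial))          # each update once
        results.append(reduce(merge, order + order, initial))  # each update twice
    return all(r == results[0] for r in results)


# The example from the comment: two nodes union in {'a'} and {'b'}.
set_result = merge_union(merge_union(frozenset(), frozenset({'a'})),
                         frozenset({'b'}))
print(sorted(set_result))  # ['a', 'b']

print(is_convergent(merge_union, frozenset(),
                    [frozenset({'a'}), frozenset({'b'})]))  # True
print(is_convergent(merge_max, 0, [3, 7, 5]))               # True
print(is_convergent(merge_or, False, [True, False]))        # True
print(is_convergent(merge_add, 0, [3, 7, 5]))               # False
```

Max, union and or are commutative, associative and idempotent, which is why the order and number of deliveries stop mattering.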

If you can detect conflicts (divergent data, aka "siblings" in Riak-speak), you could also build your own app-specific conflict resolution. It is nice if you don't have to, because your db provides it through auto-convergent types.


Naive question: what techniques are used for reliably generating UUIDs? Can you do it without giving each machine a unique identifier first?


There are a few types of standard UUIDs.

https://en.wikipedia.org/wiki/UUID

UUID1 and UUID4 are common. UUID1s are built from a machine ID and a timestamp; UUID4s are supposed to be randomly generated. A UUID1 can be traced to the machine that created it and is somewhat sortable. UUID4s are nice because you don't have to worry about machines somehow ending up with non-unique machine IDs (bad VM cloning can easily do that).

I would just use UUID4 for example.
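For what it's worth, both kinds are available in Python's standard `uuid` module. A quick sketch (the variable names are just for illustration):

```python
import uuid

u1 = uuid.uuid1()  # built from a node (machine) ID and a timestamp
u4 = uuid.uuid4()  # built from random bits

print(u1.version, u4.version)

# The node field of a version 1 UUID is normally the hardware (MAC) address;
# Python falls back to a random 48-bit number if it cannot determine one.
print(hex(u1.node))

# The timestamp embedded in a version 1 UUID (60 bits, 100 ns units).
print(u1.time)

# Two version 4 UUIDs generated on the same machine are independent.
print(u4 != uuid.uuid4())
```

Nothing about the machine has to be configured for `uuid4()`, which is the point of the answer above.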


Thanks. That description says:

Version 4 UUIDs have the form xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx where x is any hexadecimal digit and y is one of 8, 9, A, or B

What is the purpose of the 4 and y?


Versioning. The 4 says it is "version 4" in the RFC 4122 taxonomy, and the fact that the first two bits of y are "10" is meant to indicate that this is a UUID following that standard. Certain Microsoft GUIDs start with "110" in that field, and apparently there's another legacy format (Network Computing System) characterized by the first bit of y being "0". Therefore, an RFC 4122 compliant UUID will not overlap with either of these namespaces.
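You can see both fields by picking them out of a generated UUID. A small Python sketch (the variable names are mine; the character positions assume the usual 8-4-4-4-12 hex formatting):

```python
import uuid

u = uuid.uuid4()
text = str(u)  # xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx

version_char = text[14]  # first digit of the third group
y_char = text[19]        # first digit of the fourth group

# Write y as four bits; the leading bits are the variant field.
y_bits = format(int(y_char, 16), '04b')

print(text)
print(version_char)  # '4' for a version 4 UUID
print(y_char, y_bits)

# The standard library exposes the same information directly.
print(u.version)
print(u.variant == uuid.RFC_4122)
```

Since the top two bits of y are fixed at "10", the remaining two bits are free, which is why y can only be 8, 9, a or b.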


Just generate random UUIDs. The probability of their colliding is negligibly low.
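A back-of-the-envelope check in Python, assuming all 122 random bits of a version 4 UUID are uniformly random and using the standard birthday approximation p ≈ 1 - exp(-n(n-1)/(2N)) with N = 2^122. The function name is hypothetical.

```python
import math

RANDOM_BITS = 122  # 128 bits minus 4 version bits and 2 variant bits
N = 2 ** RANDOM_BITS


def collision_probability(n):
    """Approximate chance that at least two of n random UUIDs collide."""
    # -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-n * (n - 1) / (2 * N))


# Even a trillion UUIDs leave the collision probability far below 1e-12.
p = collision_probability(10 ** 12)
print(p)
```

This only holds if the random source is good; a badly seeded generator breaks the uniformity assumption.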


Related talks, by the creators of the language:

Joe Hellerstein's RICON 2012 talk: http://vimeo.com/53904989

Neil Conway's RICON|East 2013 talk: http://www.youtube.com/watch?v=HqErn9acbto


These slides aren't very useful without the talk.


Chris just gave the talk today at Wicked Good Ruby Conf (http://wickedgoodruby.com). Confreaks recorded the video; it might be a few weeks until it is available.


Here is a good talk about Bloom (not related to OP slide deck): http://vimeo.com/53904989


Notes to self:

http://en.wikipedia.org/wiki/CAP_theorem

  In theoretical computer science, the CAP theorem, also
  known as Brewer's theorem, states that it is impossible
  for a distributed computer system to simultaneously
  provide all three of the following guarantees:

  * Consistency (all nodes see the same data at the same time)
  * Availability (a guarantee that every request receives a response about whether it was successful or failed)
  * Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)

http://es.wikipedia.org/wiki/Teorema_CAP

http://en.wikipedia.org/wiki/ACID

  ACID (Atomicity, Consistency, Isolation, Durability)


From the website: BUD is Bloom as a Ruby DSL (http://www.bloom-lang.net/bud/), and the FAQ tells more (http://www.bloom-lang.net/faq/).


Well, the purpose of distributed programming is to be distributed first. Is this language targeting desktops/servers only? What about Android/iOS/WP/BB and the rest of the mobile world?


I think "distributed" in this context is about multiple machines/servers, not multiple platforms.


Languages don't target platforms imho.

Implementations may, but that is made nearly impossible by the incredible amount of vendor lock-in built into every platform you listed.


Languages are platforms.

Vendor lock-in exists in specialized areas (e.g. documents, protocols). There are areas where some producers _must_ support common standards. Microsoft, for example, had to introduce support for native code in WP8. HTML/JS is a second area where vendor support is a must. OpenGL is the next one (not for WP, but it's a minor platform, and... surprise, surprise, Microsoft supports WebGL!).

For example, nobody is forcing anyone to use native libraries for GUI (Qt, Swing, WPF). There are GUI libs completely independent of the OS ( http://l33tlabs.org/ http://kivy.org/ http://www.pharo-project.org/about/screenshots ).

A Smalltalk/Pharo application developed on a Windows or Linux machine can be deployed on an iPad or Android and it will work without a single byte changed. These things are made by single individuals as hobby projects, with great success.

You are wearing a corporate blindfold.


A language is not a platform, it is a way to describe a computation. In this particular case, the researchers discovered that using a relational language to describe state transitions in a distributed system helps to reason about the system and significantly reduces the amount of code required to describe its behavior. It is in no way dependent on any particular platform; it is not some kind of library you have to plug in somewhere, it is a /language/.


Do bounded join semilattices arise in map-reduce calculations?



