jethroalias97's comments

Figured I'd dust off my old constraint programming language (https://github.com/dbunker/SABR/blob/master/test/Real/Multip...)

with the result (https://github.com/dbunker/SABR/blob/master/test/Real/Multip...). The fifth position is E.

A bit on the verbose side, but might be preferable to some since it explicitly enumerates all the possibilities for each rule.


Turning thoughts into software is what software engineers do every day. At every point along the technology trajectory, technologists have created more and more powerful abstractions. When transistors were first created, print "Hello World" would have seemed a magical melding of the minds of machine and man, whereas today we can create customized eCommerce social networks with a few keystrokes.

However, as powerful as the tools get, you still have to tell the machine exactly what you want it to do. Depending on the tool, sometimes that is easy, sometimes that is hard. I feel like there should be a fundamental law of "ability to easily express" vs "flexibility of expression." I'm not sure how much machine learning can serve as our savior in this regard, even if we can eliminate "make an Amazon clone" posts on cheap-outsourced-developer.com by just using ML to classify them and spit out the boilerplate.

In order for something to be profitable it has to be customized, and when this happens the abstractions inevitably seem to break down. Based on this I think turning thoughts into software will happen around the same time thinking becomes mass-produced. How can you create, "eBay, but for car buying, with a twitter messaging component built for the Brazilian market" without first beating the Turing test as a sub-problem?


I have been using Riak's secondary indexes for my latest project and have generally found them a joy to use. However, I do have to question the way they are architected a bit.

Assuming you are using Riak's default configuration each range query hits 1/3 of the cluster, which could get pretty hairy on large clusters that have lots of requests. Also, there is no pagination, so if an index has a million objects you'll have to be prepared to wait even if you only want the first part of the query.

You could solve this by putting a sort value in the key and using a range query, but this wouldn't work if you want the most recent items keyed with time, because the items could be unevenly spaced back in time. Also, Riak, like many databases based on Dynamo, thrives on fat data which one would think would favor lists. LevelDB is also supposedly slower than Bitcask, the default backend, but I'm not sure if this is still true.
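To make the "sort value in the key" idea concrete, here is a minimal hypothetical sketch (plain Python, not real Riak client code): embedding a zero-padded, inverted timestamp in the key makes a lexicographic range scan return the newest items first.

```python
# Hypothetical sketch of embedding a sort value in the key. Inverting a
# zero-padded timestamp makes "newest" sort lexicographically first, so a
# plain key range scan doubles as a most-recent-first query.
MAX_TS = 10**10  # any bound larger than every timestamp we'll store

def make_key(user_id, ts):
    # Zero-pad so lexicographic order matches numeric order.
    return "%s/%010d" % (user_id, MAX_TS - ts)

keys = sorted(make_key("alice", ts) for ts in [100, 2500, 40])

# Taking the first N keys of the "range query" now yields newest-first.
newest_first = keys[:2]
```

The uneven-spacing objection still applies: you can scan newest-first, but you cannot know in advance which key range holds exactly N items.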

I've been trying to think of ways around these problems. A simple thought I had was to cache the response as pages in Riak. This introduces new problems, though, like knowing how often to reset the cache: too often and I may as well not have the cache; too infrequently and users get stale data. I would also have to handle this with worker threads, because I wouldn't want the odd 100th user to take a big latency hit. The database would also either have to be continually polled, wasting CPU, or potentially not have the data cached when needed.
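The page-cache idea reduces to a TTL trade-off; a minimal sketch (all names here are hypothetical, and a real version would store the pages back in Riak rather than in process memory):

```python
import time

# Sketch of caching query results as pages with a TTL. The TTL is the
# knob from the trade-off above: too short and the cache buys nothing,
# too long and users see stale data.
CACHE_TTL = 60.0  # seconds; tune for tolerable staleness
_cache = {}       # (index, page) -> (cached_at, results)

def fetch_page(index, page, run_range_query):
    now = time.time()
    hit = _cache.get((index, page))
    if hit and now - hit[0] < CACHE_TTL:
        return hit[1]                       # fresh enough: serve the cached page
    results = run_range_query(index, page)  # miss or stale: hit the database
    _cache[(index, page)] = (now, results)
    return results
```

Refreshing from a background worker instead of on the request path would avoid the odd user taking the latency hit, at the cost of polling.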

Another solution I've been considering is to write a secondary index layer on top of Riak, using a skip list or B-tree to know where to add and remove data when it gets very large. This seems like a cool idea, but it might be tricky to implement and to handle conflict resolution for.
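The core of such a layer is just a sorted structure supporting insert, delete, and range reads. A toy sketch using Python's bisect module as a stand-in for the skip list or B-tree (the class and its methods are hypothetical; a real version would be distributed and need conflict resolution):

```python
import bisect

# Toy secondary-index layer: keep (index_value, object_key) pairs sorted
# so adds, removes, and range reads stay cheap as the index grows.
class SortedIndex:
    def __init__(self):
        self._entries = []  # sorted list of (index_value, object_key)

    def add(self, value, key):
        bisect.insort(self._entries, (value, key))

    def remove(self, value, key):
        i = bisect.bisect_left(self._entries, (value, key))
        if i < len(self._entries) and self._entries[i] == (value, key):
            del self._entries[i]

    def range(self, lo, hi):
        # All object keys whose index value falls in [lo, hi].
        # "\xff" sorts after any ASCII object key, closing the range.
        i = bisect.bisect_left(self._entries, (lo, ""))
        j = bisect.bisect_right(self._entries, (hi, "\xff"))
        return [k for _, k in self._entries[i:j]]
```

A skip list or B-tree would replace the flat list to keep inserts O(log n); the interface stays the same.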

My last idea was the most ambitious: implement a separate distributed database specifically for secondary indexes and range queries, one that would not be bound by Dynamo. The idea here is to have each node in charge of a segment of the key space (like Bigtable) and then have it split and coalesce not only based on size, but also on frequency of reads and writes, to handle the bottleneck problem.

I was initially going to pair this with the Dynamo database (https://github.com/dbunker/Dynago) I was experimenting with using Go and LevelDB, but there is no reason it couldn't work with any eventually consistent, hyper-reliable key-value database to provide lightweight secondary indexes. Having it constantly check the core key-value database would mean it wouldn't have to be super reliable in its own right, and so could be kept relatively simple.

But again, the simplest solution may ultimately be the way to go; I'm not sure. All of these seem to have pretty big trade-offs.


Assuming you are using Riak's default configuration each range query hits 1/3 of the cluster, which could get pretty hairy on large clusters that have lots of requests. Also, there is no pagination, so if an index has a million objects you'll have to be prepared to wait even if you only want the first part of the query.

You could solve this by putting a sort value in the key and using a range query, but this wouldn't work if you want the most recent items keyed with time, because the items could be unevenly spaced back in time.

Pagination is coming soon; it's in riak_kv master already, but in buyer-beware #yolo territory.

LevelDB is also supposedly slower than Bitcask, the default backend, but I'm not sure if this is still true.

Bitcask is faster when all the keys fit in memory: it's designed to load any value with a single disk seek. LevelDB can't make that guarantee, but neither can Bitcask with too many keys for available memory.

I've been trying to think of ways around these problems. A simple thought I had was to simply cache the response as pages in Riak. Although this introduces new problems like how to know how often to reset the cache, too often and I may as well not have this cache, too infrequently and users get stale data.

Caching is one of the two hard problems in software engineering (along with "naming things" and "off-by-one errors"), so good luck :) If you're not opposed to running a separate service, Memcache is what I'd use.


Any thoughts on when the pagination goes live? I can't find any information on it online. Memcache would be a good choice, but I am wondering: if I have a few secondary indexes with over a million entries each, wouldn't continually recreating this cache irreparably bog down the cluster?


I believe it's part of Riak 1.4, which is our next release; no date yet.

The number of entries in a 2i isn't going to bog down querying it any more than lots of objects bog down LevelDB. Make sure your indexes have the right content with the right cardinalities and it shouldn't be a problem.

If you want to drop in to #riak on freenode tomorrow (I'm in the America/New_York time zone) I'm brycek in there.


Assuming you are using a distributed architecture, there is no way to verify a user without at least one database lookup because the request could be coming into any API server. So in most cases we're not avoiding cookies and sessions just for the sake of it.


You don't need to do a database lookup if you stuff some context into your token and encrypt it with a secret key. When the server receives the request, it can simply decrypt the token and deserialize it into some sort of strongly typed user context.
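A minimal sketch of the idea, using an HMAC signature instead of encryption (the principle is the same: the server verifies and reads the context without a database lookup). The secret and the context shape here are illustrative, and in a real deployment the key would come from server configuration:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"server-side-secret"  # illustrative only; keep out of source control

def issue_token(user_context):
    # Serialize the context and sign it so the server can trust it later.
    body = base64.urlsafe_b64encode(json.dumps(user_context).encode())
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest().encode()
    return body + b"." + sig

def verify_token(token):
    body, sig = token.rsplit(b".", 1)
    expected = hmac.new(SECRET, body, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(sig, expected):
        return None  # forged or tampered token
    return json.loads(base64.urlsafe_b64decode(body))
```

Signing leaves the context readable by the client; if the context itself is sensitive, encrypt it as the comment suggests rather than merely signing it.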


Doesn't this open you up to replay attacks, though? Since you can't store that a token was already used.


I failed to say that your token context should have a time-based expiration, in that a new token is reissued periodically, as defined by you and your needs. I would refer to the ASP.NET Forms Auth mechanism with its sliding expiration.


Sure you can. Just include a timestamp and expire the token at, say, time + 90 seconds, or whatever makes sense for the application.
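The expiry check is a one-liner once the issue timestamp is embedded in the token; a sketch with an illustrative 90-second window:

```python
import time

TOKEN_TTL = 90.0  # seconds; illustrative window from the comment above

def is_token_fresh(issued_at, now=None):
    # Reject any token older than the window, shrinking the replay exposure.
    now = time.time() if now is None else now
    return (now - issued_at) < TOKEN_TTL
```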


I get that you can expire it, and that helps, but it's not the same as use-once. Of course, just using a timeout is probably fine in many cases, especially if it's used with SSL. But replay attacks are still possible, since there's a window where the token can be re-used.


Replay attacks are always gonna be possible unless you use a one-time token or signature; them's the breaks... unless you wish to get into the something-you-have-and-something-you-know model. How can you do a use-once token with concurrent requests without a strong client-side authentication mechanism, such as issuing private keys to clients... and all the PKI admin overhead? I think it's safe to say that a RESTful API should be stateless, and bottlenecks such as session state are not necessary.


I'll take that as a "yes" ;)

AFAIK, neither signatures nor "something you have, something you know" alone fixes replay attacks. Since this is a well-known problem in cryptography, many solutions exist, all of which are probably overkill for this use.


At least with the use of a digital signature and nonce you can guarantee that the request hasn't been tampered with!
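The replay-blocking piece is the server-side nonce check: each signed request carries a random nonce, and the server rejects any nonce it has already seen. A toy sketch (in practice the seen-set would live in a shared store with a TTL matching the token window, which is what reintroduces the state the thread is debating):

```python
_seen_nonces = set()  # illustrative; really a shared store with a TTL

def accept_nonce(nonce):
    # A signature proves the request wasn't tampered with; the nonce
    # check is what stops the same request being replayed verbatim.
    if nonce in _seen_nonces:
        return False  # replayed request
    _seen_nonces.add(nonce)
    return True
```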


Assuming you are using only "reads HN" as your a priori, all of us with below average pay should expect higher salaries. If you were to argue that those who read HN and also have high salaries are the only ones reporting however, then it would be fair to rap our knuckles with the stats 101 textbook.


Maybe HN readers get paid less because they waste coding time reading HN instead of working.


People always say that remote working will be the way of the future, but humans remain extremely social animals. So much of what we do is guided by genetics and it seems to be human nature that if you don't see someone's face on a regular basis it is very difficult to create an intimate or trusting relationship.

This seems to be one of the main pain points when it comes to outsourcing work abroad: people seem to work best and collaborate the most when they interact frequently in person. In The Pixar Story, I recall Jobs saying he designed the headquarters specifically for unplanned collaboration. As much as the world changes, people remain the same.


When I first read 'GitHub-maintained client libraries' I initially thought they meant built-in libraries or templated stub functions, sort of like what they offer for .gitignore for various programming environments and languages. I was a little disappointed to see it was just for the GitHub API, although I expect it to still be useful.

I feel like this is something they could offer in the future, though, as many IDEs automatically build your environment when a new project is initialized. Assuming you use GitHub as your IDE (i.e. not really using any IDE), project initialization might make sense.


Giter8 (https://github.com/n8han/giter8) sounds like what you’re looking for. “Giter8 is a command line tool to generate files and directories from templates published on github or any other git repository.”


I expect people would be reluctant to use a client not officially sanctioned by bitcoin.org, although I'm not sure how difficult that is to get. It may be as simple as sending an email to get them to add it to the list.

I would think that any new bitcoin wallet could introduce a substantial security risk before it has been extensively field-tested. If someone were to sneak some code in, or even just screw up a security protocol, all the stored bitcoins would be at risk.

1) http://bitcoin.org/en/choose-your-wallet

2) https://en.bitcoin.it/wiki/Wallet


If this were a real client, it would be an interface to bitcoind rather than a replacement.


Right. I've been working on an RPC client for iPhone.


That's awesome, I've been working on a native Cocoa one for OS X.


x86 has also been compiled to JavaScript (http://bellard.org/jslinux/), QEMU-style.


People usually use the word "emulator" for this. I do not believe jslinux actually outputs anything, so to call it a compiler would be misleading, IMHO.


It's a shame he doesn't mention his friend's name at the end of the piece, but if anyone has the right to refer to themselves as "beloved" I'd say it's David Rakoff.

