I don't understand the resistance to using multiple languages in one project. Ev...

mojuba · on Sept 26, 2007

You just translated SQL to an ugly C-ish syntax, which says nothing to me. I believe if we forget the SQL cliche, a different approach, I mean syntactically, is quite possible.

nostrademons · on Sept 26, 2007

I didn't. SQLAlchemy did. ;-) (http://www.sqlalchemy.org/)

You mean simply using normal CS data structures and keeping application data in memory? That approach certainly works - PG uses it for this site, and I believe it's used for Mailinator and Bloglines.

But you should think of the services that a database gives you and consider whether you're going to use them. That includes:

1.) Indexing by arbitrary column

2.) Subranges & slices, again by arbitrary column

3.) Joins

4.) Sorting

5.) Quick summarizing, over any combination of columns

6.) Transactions

7.) Persistence

8.) Concurrency control

9.) Access from multiple languages

10.) Distribution

If you aren't going to use any of these, then by all means Keep It Simple. If you're only going to use one or two - say you need persistence, indexing, and slicing - you can probably roll something up with your language's normal libraries. But if it looks like you'll need a good chunk of them, you'll drive yourself nuts trying to implement them yourself. Even maintaining multiple indexes via your hashtable approach can get very complicated when you have multiple requests inserting, deleting, and changing rows.
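To make the last point concrete, here is a minimal sketch of the hashtable approach with two secondary indexes. The `IndexedTable` class and its method names are hypothetical, invented for illustration; it assumes single-threaded access, so it does nothing about the concurrent-request problem mentioned above.

```python
from collections import defaultdict


class IndexedTable:
    """Toy in-memory table with one hash index per indexed column (hypothetical)."""

    def __init__(self, indexed_columns):
        self.rows = {}  # row_id -> row dict
        self.indexes = {col: defaultdict(set) for col in indexed_columns}
        self._next_id = 0

    def insert(self, row):
        row_id = self._next_id
        self._next_id += 1
        self.rows[row_id] = dict(row)
        # Every index has to learn about the new row.
        for col, index in self.indexes.items():
            index[row[col]].add(row_id)
        return row_id

    def _unindex(self, col, value, row_id):
        # Drop the row from one index bucket and remove the bucket if empty.
        bucket = self.indexes[col][value]
        bucket.discard(row_id)
        if not bucket:
            del self.indexes[col][value]

    def update(self, row_id, **changes):
        row = self.rows[row_id]
        for col, new_value in changes.items():
            if col in self.indexes:
                # Forgetting either step leaves a stale or missing entry.
                self._unindex(col, row[col], row_id)
                self.indexes[col][new_value].add(row_id)
            row[col] = new_value

    def delete(self, row_id):
        row = self.rows.pop(row_id)
        for col in self.indexes:
            self._unindex(col, row[col], row_id)

    def find(self, col, value):
        # .get() avoids creating empty buckets on lookups that miss.
        ids = self.indexes[col].get(value, ())
        return [self.rows[i] for i in sorted(ids)]
```

Even in this toy version, every write path has to touch every index, and nothing here is safe if two requests modify the same row at once.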

mojuba · on Sept 26, 2007

keeping application data in memory

No, I leave that to optimization. It could be in memory, on disk, or wherever else my compiler and run-time system decide to put it. When I see that performance isn't good, I may start fine-tuning my run-time system.

Indexing by arbitrary column

That's a matter of optimization. Thinking in indexes is clearly premature optimization that can be done automatically.

Subranges & slices, again by arbitrary column

That's map/reduce with some clever infrastructure that can optimize things for me.

Joins

Pointers.

Sorting

Sorting is sorting :)

Quick summarizing, over any combination of columns

Map/reduce.

Transactions, Persistence and Concurrency control

Possible without the DBMS.

Access from multiple languages

Honestly, I give up here. If we are going to have a programming language that itself handles large data structures, then other languages are probably out.

Distribution

What you mean by distribution?
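As a rough illustration of the "pointers" and "map/reduce" answers above, here is a small Python sketch. The `User` and `Comment` types and the sample data are made up for the example; it keeps everything in memory and says nothing about where a smarter run-time system might store it.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class User:
    name: str


@dataclass
class Comment:
    author: User  # a direct reference plays the role of a foreign key
    points: int


ann, bob = User("ann"), User("bob")
comments = [Comment(ann, 3), Comment(bob, 5), Comment(ann, 4)]

# "Join": follow the pointer instead of matching keys between two tables.
joined = [(c.author.name, c.points) for c in comments]


# "Summarize": reduce the mapped (author, points) pairs into per-author totals.
def add_points(totals, pair):
    name, points = pair
    totals[name] = totals.get(name, 0) + points
    return totals


totals = reduce(add_points, joined, {})
```

This works as long as the only join you need is the one the pointer encodes; going the other way (all comments for a user) already requires a scan or an extra structure.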

fauigerzigerk · on Sept 26, 2007

Hard-coding access paths is what is premature optimisation. An RDBMS defers decisions about what loops to process until runtime, so the optimiser can make good decisions based on the statistics it has about the data. Those are decisions that programmers do not have to make prematurely.

I think it's pretty clear that if you know exactly how your data is going to be used, then performance is better if you hard code everything and store it in a way that is optimised for retrieval. However, if data is used for multiple purposes, then hard-coding access paths is horrible in terms of performance and maintenance.
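A small sketch of what deferring the access path looks like in practice, using Python's built-in `sqlite3` module as a stand-in for a full RDBMS (SQLite's planner is much simpler than a statistics-driven optimiser, and the table and index names here are invented). The point is only that the query text stays the same while the engine picks a different plan once an index exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO users (name, city) VALUES (?, ?)",
    [("ann", "Boston"), ("bob", "Austin"), ("cy", "Boston")],
)

# The application only states what it wants, not how to fetch it.
QUERY = "SELECT name FROM users WHERE city = ? ORDER BY name"


def plan(sql, params):
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable
    # description; its exact wording varies between SQLite versions.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


before_plan = plan(QUERY, ("Boston",))
before_rows = conn.execute(QUERY, ("Boston",)).fetchall()

# Add an index later, without touching the query or the calling code.
conn.execute("CREATE INDEX idx_users_city ON users (city)")

after_plan = plan(QUERY, ("Boston",))
after_rows = conn.execute(QUERY, ("Boston",)).fetchall()
```

Printing `before_plan` and `after_plan` shows the engine switching from a table scan to the new index, while `before_rows` and `after_rows` are identical.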

anamax · on Sept 26, 2007

I don't know what the poster meant by distribution, but one of the nice properties of a DBMS is that it's available when your application is down and can be accessed by other applications. Everyone is a client, so no one else has to be a server (for that data).

nostrademons · on Sept 26, 2007

"What you mean by distribution?"

Ability to spread your data across multiple physical machines.

Goladus · on Sept 26, 2007

Can you really say that's easier to read than this?

I think it's generally less about readability and more about flexibility and error-checking (when it's not about being stubborn). If you want to generate SQL queries on the fly, it may be faster and less error-prone to use Python objects rather than strings.
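A minimal sketch of that idea, assuming a made-up `Select` class (this is not SQLAlchemy's API, just an illustration of building a query from objects and letting the driver handle the values):

```python
import sqlite3


class Select:
    """Hypothetical mini query builder, for illustration only."""

    def __init__(self, table, columns):
        # Table and column names are interpolated as-is, so this sketch
        # assumes they come from trusted code, never from user input.
        self.table = table
        self.columns = list(columns)
        self.conditions = []  # list of (column, value) pairs

    def where(self, column, value):
        self.conditions.append((column, value))
        return self

    def compile(self):
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.conditions:
            clauses = " AND ".join(f"{col} = ?" for col, _ in self.conditions)
            sql += " WHERE " + clauses
        # Values travel separately as parameters, so no manual quoting.
        return sql, [value for _, value in self.conditions]


# Build the query on the fly from optional filters.
filters = {"city": "Boston", "name": None}
query = Select("users", ["name", "city"])
for column, value in filters.items():
    if value is not None:
        query.where(column, value)
sql, params = query.compile()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("O'Brien", "Boston"), ("bob", "Austin")],
)
rows = conn.execute(sql, params).fetchall()
```

With string concatenation, each optional filter means more bookkeeping for `WHERE` versus `AND` and for quoting; here the object tracks that for you.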

olavk · on Sept 26, 2007

My theory is that there are two kinds of people: those who like everything to be in the same language and environment, and those who like to combine lots of different languages where each language is optimized for a specific task.

nostrademons · on Sept 26, 2007

Could be.

I'm reminded of the two competing camps for large-scale systems design. The UNIX philosophy is "small pieces loosely joined": each component does one thing well, and the system just provides a common metaphor and architecture for them to work together (e.g. pipes and line-oriented text). The Windows philosophy is "big chunks tightly coupled": everything just works out of the box, but if you need to do something the designers didn't think of, you're out of luck. Kinda like "I'll do it myself; show me how" vs. "Do it for me."

Nearly every computing system can be placed somewhere along this continuum. In operating systems, we've got MacOS (pre-X) and Windows on one side and UNIX on the other. In languages, we've got Scheme and Smalltalk on the UNIX side and Java and .NET on the other (ironically, Scheme and Smalltalk have a "small pieces loosely coupled" architecture internally, but are big monolithic chunks when it comes to the outside world). For IDEs, Emacs is all the way over on the UNIX side, vim not far behind, Eclipse is in the middle, and IntelliJ and Visual Studio all the way on the Windows side. For JavaScript libraries, YUI/MooTools/Prototype are all on the Windows side and jQuery is on the UNIX side, though it's moving closer to the Windows philosophy. Among web frameworks, Rails and Django are on the Windows side with Pylons on the UNIX side. The web itself is based on the UNIX philosophy, but many of the largest websites follow the Windows philosophy.

I'm not so certain that people fit rigidly into one category though. Up until about 3-4 years ago, I was firmly in the Windows camp: I did Java programming only, I did it on a Windows machine, I used JBuilder and Eclipse to do it. Then I started branching out, mostly because I started learning about all this stuff that Java wouldn't let me get at. I think many people may follow a similar path: "do it for me" until you want to do something that the designers didn't think of, and then you have no choice but to move to a "do it yourself; we'll show you how" environment.

davidw · on Sept 26, 2007

It's difficult to be good at many languages. Knowing many is not too hard, but being really quick with them is. IMO, at least. And the more you add, the messier things get. Google, which seems to know a thing or two about programming, only allows four: Java, JavaScript, Python and C++.

nostrademons · on Sept 26, 2007

In production code (i.e. deployed to the publicly-facing website). I've heard that for internal-facing apps, 20% projects, and non-Google-branded stuff, you can use any language you like. Orkut was initially written in .NET, for example.

brlewis · on Sept 26, 2007

That's interesting. I'm curious as to where your theory places Lisp hackers. We could arguably be placed in either category.

olavk · on Sept 26, 2007

There are probably all kinds. If you are the type that loves DSLs, but _only_ if they are implemented in Lisp, then I would place you in the first group.