Huge datasets in Frontier

Weblogs.com is an all-Frontier application that handles absolutely huge amounts of data, and as it's grown it's had to do more and more bizarre things to cope with the enormity of it. To give you an idea, these days it's getting over half a million pings a day. And it seems to work. How about that.

Anyway, I can't do anything fancy with it because it's deployed, but it has a baby brother, audio.weblogs.com, which is doing nowhere near the volume, so I was able to build in a better approach to managing data, assuming that at some point it will be handling as much as the main weblogs.com site.

I wanted to share the technique here, along with the code. It's really quite simple, and if you're having trouble managing enormous tables you may want to consider this approach.

Many files instead of one

The key idea: instead of storing all the data in one big table, break it up into many tables spread across many files. Then let the file system help Frontier manage the large data set.
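For people without Frontier, here's a rough sketch of the many-files idea in Python. It is not the Frontier script. The hashing scheme, the function name shard_path, the folder name and the file naming are all assumptions made up for illustration; the actual script may split the data up quite differently.

```python
import hashlib

def shard_path(key, num_shards=256, folder="hugeDataSet"):
    """Map a key to one of num_shards small files.

    Hypothetical layout: the folder name, the file naming and the use
    of a hash are illustrative assumptions, not taken from Frontier.
    """
    # md5 is used only to spread keys evenly and deterministically;
    # it is not doing any security work here.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    shard = int(digest, 16) % num_shards
    return f"{folder}/shard{shard:03d}.root"
```

The same key always lands in the same file, so a lookup only has to open one small file, never the whole data set.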

Also, instead of re-inventing the wheel, we build on Frontier.openDataFile, which manages a pool of files. Before it opens a database file, it closes another, the least-recently-used one. The size of the pool is determined by a pref, user.prefs.maxDataFiles.
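The pooling behavior described above can be sketched in Python like this. It is an illustration of the general least-recently-used idea, not Frontier's implementation. FilePool, max_files and opener are hypothetical names; max_files plays the role that user.prefs.maxDataFiles plays in Frontier.

```python
from collections import OrderedDict

class FilePool:
    """A small pool of open files with least-recently-used eviction.

    Hypothetical sketch: not the code behind Frontier.openDataFile.
    """

    def __init__(self, max_files, opener=open):
        self.max_files = max_files      # assumed analogue of user.prefs.maxDataFiles
        self.opener = opener            # any callable that takes a path and returns an object with close()
        self.open_files = OrderedDict() # path -> handle, oldest use first

    def get(self, path):
        if path in self.open_files:
            # Already open: mark it as the most recently used.
            self.open_files.move_to_end(path)
            return self.open_files[path]
        if len(self.open_files) >= self.max_files:
            # Pool is full: close the least-recently-used file first.
            _, oldest = self.open_files.popitem(last=False)
            oldest.close()
        handle = self.opener(path)
        self.open_files[path] = handle
        return handle
```

With a pool like this, the number of files open at once never exceeds max_files, no matter how many files the data set is spread across.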

Code

The Frontier script: workspace.accessHugeDataSet.

Text listing for people without Frontier.

Yes, you can use it in Radio as well, or any app built out of the Frontier codebase.

What's huge?

What's huge depends on the performance of your computer. On most modern systems Frontier can handle a table with up to 50,000 elements in it without too much of a performance hit. Over time as machines get faster that number goes up. Certainly if you anticipate having millions of data objects, you should use a technique like this.

Aren't relational databases better for this?

Yes, we know this is something relational databases do better, but sometimes you want an all-Frontier app and happen to have a huge dataset. ;->

# Posted by Dave Winer on 4/11/05; 3:52:39 PM


It Worked!

Only geeks allowed here. Everyone else, get outta here! ;->

# Posted by Dave Winer on 4/11/05; 3:37:32 PM