A little silence doesn’t hurt

on September 16, 2009 in Personal with 2 comments by

Well, I haven’t posted in about a week and that’s because I’ve been spending a lot of time on a little project.

Someone on the OVH forums mentioned using Amazon’s S3 service for external storage, and he’s using s3fs for this purpose.

Interested, I took a peek at this project and was immediately taken aback by the lack of support for files larger than 5 GB. I have files ten times that size. So what do you do in this case? Fix it!

I’ve seen another (abandoned) project called S3NBD, which comes closer to how I think things should work. But as its website noted, and this is the reason it was abandoned, sending a 1 GB file in 4K chunks is rather costly.
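To put a number on "rather costly", here is a rough back-of-the-envelope sketch. The per-request price below is an assumed placeholder, not a quoted Amazon rate, and the function names are mine, not anything from s3fs or S3NBD.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# ASSUMPTION: placeholder price in dollars per 1,000 PUT requests.
# Check the actual S3 price list before trusting any total.
ASSUMED_PRICE_PER_1000_PUTS = 0.01


def request_count(file_bytes: int, chunk_bytes: int) -> int:
    """Number of chunk-sized uploads needed for a file (ceiling division)."""
    return -(-file_bytes // chunk_bytes)


def upload_cost(file_bytes: int, chunk_bytes: int,
                price_per_1000: float = ASSUMED_PRICE_PER_1000_PUTS) -> float:
    """Request charges only; storage and bandwidth are not included."""
    return request_count(file_bytes, chunk_bytes) * price_per_1000 / 1000


if __name__ == "__main__":
    # Compare a 1 GiB upload at a few chunk sizes.
    for chunk in (4 * KIB, 1 * MIB, 64 * MIB):
        n = request_count(GIB, chunk)
        cost = upload_cost(GIB, chunk)
        print(f"{chunk:>10} byte chunks: {n:>7} requests, ${cost:.4f}")
```

With 4K chunks a single 1 GiB file already takes 262,144 requests, and every rewrite of the file pays that again. Larger chunks cut the request count, at the price of moving more data for each small change.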

Right now I’ve solved the issues between s3fs and S3NBD on paper and have started on the groundwork. I’ll host the project at SourceForge once I feel confident there are no catastrophic issues. They [SourceForge] may advise to “release early, release often”, but I know that people will generally install things without reading the documentation. So if someone ends up with a trillion billable Amazon S3 requests, who will they blame first? Yes, me. So I’d rather get some initial testing done before I release it in the open.

What’s different

So what’s going to be different? Well, for starters it will not be an NBD but will use FUSE, similar to s3fs. It will also support 0-byte files, networked POSIX locks and, most importantly, files larger than 5 GB. Yes, files larger than 5 GB at Amazon S3. In fact, its current upper limit is 16 Exabytes (16,384 Petabytes or 16,777,216 Terabytes). You never know when this alien drops by to give us more than just Velcro, like a 10 Petabyte hard drive or so.
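For the curious, the size arithmetic can be checked in a few lines, using binary units (1 Exabyte taken as 2^60 bytes). The second half is only an illustration of the general idea of spreading one big file over several smaller S3 objects. The chunk size, the key naming and the function names are all made up for this sketch and say nothing about how the actual project does it.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB
TIB = 1024 * GIB
PIB = 1024 * TIB
EIB = 1024 * PIB

# 16 Exabytes (binary) is exactly what fits in a 64-bit file offset.
LIMIT = 16 * EIB
assert LIMIT == 2 ** 64

# ASSUMPTION: hypothetical chunk size, chosen only for this example.
ASSUMED_CHUNK_BYTES = 64 * MIB


def chunk_key(path: str, index: int) -> str:
    """Hypothetical object name for chunk number `index` of file `path`."""
    return f"{path}/{index:016x}"


def locate(offset: int, chunk_bytes: int = ASSUMED_CHUNK_BYTES) -> tuple:
    """Map a byte offset in the file to (chunk index, offset inside chunk)."""
    return divmod(offset, chunk_bytes)


def chunks_for_range(offset: int, length: int,
                     chunk_bytes: int = ASSUMED_CHUNK_BYTES) -> list:
    """Chunk indexes touched by a read or write of `length` bytes."""
    if length <= 0:
        return []
    first = offset // chunk_bytes
    last = (offset + length - 1) // chunk_bytes
    return list(range(first, last + 1))


if __name__ == "__main__":
    print("Petabytes:", LIMIT // PIB)
    print("Terabytes:", LIMIT // TIB)
    print("Gigabytes:", LIMIT // GIB)
    # A read at the 5 GB mark of a big file lands in this chunk:
    index, inner = locate(5 * GIB)
    print(chunk_key("backups/disk.img", index), inner)
```

Each chunk stays far below the 5 GB per-object limit, so the size of the file as a whole is limited only by how many chunks you are willing to pay for.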

There are some technical hurdles to overcome because Amazon S3 wasn’t particularly meant to be used in this way. It’s not just about overcoming the file size limit, but also about keeping performance at reasonable levels while making sure it remains cost-effective compared to having your own multi-terabyte SAN array. Easier said than done, really.

So when should you expect this to appear here or at SourceForge? Well, the short answer is: never ask an open source developer “is it done yet?”. But the way things are progressing so far, I expect at least something to be released in a week or two. Want to be the guinea pig?


2 comments

  1. Anonymous
    posted on Sep 25, 2009 at 9:43 PM

    You are super man of c++ ? Or is it python ? Or c ? Or HTML !!!?!? :D

    • posted on Sep 27, 2009 at 9:15 PM

      I’m actually programming it in GWBASIC ;-)
