Can you give morespecifics about what you were running on and what you purchased...

Can you give morespecifics about what you were running on and what you purchased for your own gear?

I run an environment that scales to around 1,000 EC2 instances daily. Primarily we run C3.2Xlarge and R3.2xlarge for the core of our application.

We have ~12 nodes in our mongo cluster, and havent had a single issue with these nodes.

I occasionally get a zombie (totally hung VM) but thats very infrequent. I was aggressively using spot instances previously, but have switched to all 12-month reservations (We would lose many machines to a spot outage, new machines - more than those on Richess) and the recovery time for our system is 35 minutes (due to the R3 boxes needing to download their in-memory index from other machines) - so our service is degraded in capacity until the relaunch of these machines completes.

[aside: if youre looking to use spot, do two things - over-provision by a factor of 1.8 and spread across zones, and go look into using ClusterK.com for their balancer product]

Anyway, Just curious what was causing "sometimes daily" outages - I can't imagine that this would be due to AWS and not lacking ability of your application to handle instance losses.