Can you give more specifics about what you were running on and what you purchased for your own gear?
I run an environment that scales to around 1,000 EC2 instances daily. We primarily run c3.2xlarge and r3.2xlarge instances for the core of our application.
We have ~12 nodes in our Mongo cluster and haven't had a single issue with them.
I occasionally get a zombie (a totally hung VM), but that's very infrequent. I was previously using spot instances aggressively, but have switched entirely to 12-month reservations. We would lose many machines to a spot outage (new machines, more than those on Richess), and the recovery time for our system is 35 minutes because the R3 boxes need to download their in-memory index from other machines. Until the replacement machines finish launching, our service runs at degraded capacity.
[Aside: if you're looking to use spot, do two things: over-provision by a factor of 1.8 and spread across zones, and look into ClusterK.com's balancer product.]
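To make the aside's arithmetic concrete, here is a minimal sketch of sizing a spot fleet. Only the 1.8 factor comes from the advice above; the even per-zone split, the zone names, and the `spot_allocation` helper are hypothetical, and it does not model spot pricing, interruption rates, or anything ClusterK does.

```python
def spot_allocation(required: int, zones: list[str], factor_pct: int = 180) -> dict[str, int]:
    """Hypothetical helper: over-provision `required` instances by factor_pct/100
    (180 -> 1.8x) and spread the total as evenly as possible across `zones`."""
    # Integer ceiling division avoids floating-point surprises with factors like 1.8
    total = -(-required * factor_pct // 100)
    base, extra = divmod(total, len(zones))
    # The first `extra` zones each absorb one leftover instance
    return {zone: base + (1 if i < extra else 0) for i, zone in enumerate(zones)}


# Example: 101 instances of real demand across three (illustrative) zones
print(spot_allocation(101, ["us-east-1a", "us-east-1b", "us-east-1c"]))
```

With 101 instances of demand this asks for 182 spot instances split 61/61/60, so losing any one zone to a spot outage still leaves well over the 101 you actually need.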
Anyway, I'm just curious what was causing "sometimes daily" outages. I can't imagine they were due to AWS rather than your application's inability to handle instance losses.