Hacker News

Ask HN: Is S3 down?

2589 points by iamdeedubs 9 years ago

I'm getting

{ "errorCode" : "InternalError" }

When I attempt to use the AWS Console to view s3

1140 Comments
Disclosure: I work on Google Cloud.
Apologies if you find this to be in poor taste, but GCS directly supports the S3 XML API (including v4):
https://cloud.google.com/storage/docs/interoperability
and has easy to use multi-regional support at a fraction of the cost of what it would take on AWS. I directly point my NAS box at home to GCS instead of S3 (sadly having to modify the little PHP client code to point it to storage.googleapis.com), and it works like a charm. Resumable uploads work differently between us, but honestly since we let you do up to 5TB per object, I haven't needed to bother yet.
Again, Disclosure: I work on Google Cloud (and we've had our own outages!).

By boulos 9 years ago
Apologies if this is too much off-topic, but I want to share an anecdote of some some serious problems we had with GCS and why I'd be careful to trust them with critical services:
Our production Cloud SQL started throwing errors that we could not write anything to the database. We have Gold support, so quickly created a ticket. While there was a quick reply, it took a total of 21+ hours of downtime to get the issue fixed. During the downtime, there is nothing you can do to speed this up - you're waiting helplessly. Because Cloud SQL is a hosted service, you can not connect to a shell or access any filesystem data directly - there is nothing you can do, other than wait for the Google engineers to resolve the problem.
When the Cloud SQL instance was up&running again, support confirmed that there is nothing you can do to prevent a filesystem crash, it "just happens". The workaround they offered is to have a failover set up, so it can take over in case of downtime. The worst part is that GCS refused to offer credit, as according to their SLA this is not considered downtime. The SLA [1] states: "with respect to Google Cloud SQL Second Generation: all connection requests to a Multi-zone Instance fail" - so as long as the SQL instance accepts incoming connections, there is no downtime. Your data can get lost, your database can be unusable, your whole system might be down: according to Google, this is no downtime.
TL;DR: make sure to check the SLA before moving critical stuff to GCS.
[1]: https://cloud.google.com/sql/sla

By NiekvdMaas 9 years ago
The GCS being referred to by the GP is Google Cloud Storage, not Cloud Sequel. You really do need failover set up though. That's true for basically any MySQL installation, managed or not.

By fidget 9 years ago
That isn't just a Google issue though. You'd have had the exact same trouble with AWS/RDS if you're running with no replica. The lack of filesystem access is a security "feature" for both. If you have no HA setup then you have no recourse but to restore to a new server from backup, or wait for your cloud provider to fix it.

By adwf 9 years ago
RDS has snapshot backups you can create an instance from iirc so you can self fix this kind of issues.
Sure you get downtime all the same but not the waiting for support to solve an instance crash part

By avereveard 9 years ago
Yes, and RDS offers point in time recovery at that.
We've had to use it and can confirm that it works as advertised.

By unclebucknasty 9 years ago
Not using a failover is a bold choice (not stupid, just bold). A failover is like a good insurance policy: you pay for it, you hope that you'll never need it, but when shit happens you are very happy to have it!

By lbill 9 years ago
21 hours sounds pretty long to me. What type of data was it and how long would you have waited until you continued with a backup of the data on a different machine?

By TekMol 9 years ago
We were definitely prepared to recover from a backup, but the support team told us: "the issue with the file system will likely persevere over a backup/restore". So this, in combination with the data loss you have when recovering from a backup, means we basically had no choice other than to wait till the issue was resolved.

By NiekvdMaas 9 years ago
I've used both Google Cloud and AWS, and as of a year or so ago, I'm a Google Cloud convert. (Before that, you guys didn't at all have your shit together when it came to customer support)
It's not in bad taste, despite other comments saying otherwise. We need to recognize that competition is good, and Amazon isn't the answer to everything.

By JPKab 9 years ago
We were on GCP for around a year, it was my decision I really wanted to love GCP and I initially did. But we recently switched to AWS.
I think there is little GCP does better than AWS. Pricing is better on paper, but performance per buck seems to be on par. Stability is a lot worse on GCP, and I don't just mean service outages like this one (which they had their fair share) but also individual issues like instances slowing down or network acting up randomly. Also lack of service offerings like no PostgreSQL, functions never leaving alpha, no hosted redis clusters etc... Support is also too expensive compared to AWS.
Management interfaces are better on GCP and sustained use discount is a big step up against AWS reservations. Otherwise, I think AWS works better.

By eknkc 9 years ago
I haven't used AWS, but my experience with AppEngine and by extension GCP is similar.
Just last week I got an email saying that they'd discovered an issue on Google Cloud Datastore where certain (strongly consistent!) queries could have been returning incorrect results for a week long period and that I should check my logs to see if anything important had been affected in my application.
That's not the sort of behaviour that inspires confidence in a service.

By aardshark 9 years ago
Functions are going beta this week

By fidget 9 years ago
And discontinued next year?

By Moru 9 years ago
The standard "lol google will kill it in 6 months anyway" troll doesn't really apply to Google Cloud services. They know better than to be fickle with infrastructure offerings.

By benley 9 years ago
[deleted]

By 9 years ago
I, for one, would love to see Redis and Postgres on GCP. That would be enough to get me to switch I think.

By dstroot 9 years ago
Wait for the NEXT event. Radis will be announced.

By doubleorseven 9 years ago
Me too. We switched to Google Cloud years ago at its inception and have never looked back -- always viewed it as a competitive advantage due to its solid, more advanced infrastructure -- faster network, reliable disks, cleaner UI that's easier to manage. Just a cleaner operation all the way around.

By espeed 9 years ago
What indeed is bad taste is your choice of Google Cloud over AWS. No I really like GCP, use it at core of many apps, but if people really want a decentralized web we need to use more than one provider. Don't "convert". Use booth, redundancy ffs.

By snackai 9 years ago
Pity I can't upvote more than once! :)
This whole idea of being angry at a vendor for deprecating something with 1yr notice is just ridiculous!
People need to realize they are choosing lock-in, and are choosing the risk of deprecation every time they decide to use a cloud service with no drop in competition/open source/etc.
Own your choices people, don't blame others...

By kiallmacinnes 9 years ago
Sounds great on paper, but this is infrastructure level stuff with real world constraints.
The expectation of stability beyond a year is certainly not unreasonable when you're asking people to build their businesses/infrastructure on your platform.
And, building redundancy across providers can be impractical, owed to learning curve, cost duplication, higher outbound bandwidth costs, effort duplication, solution complexity, etc.

By unclebucknasty 9 years ago
What's the point of decentralizing by putting 50% on one, 50% on the other, and no overlap of the groups? You used the word redundancy, but who is willing to actually do that true redundancy?

By adjkant 9 years ago
I work in GCP support. I'm really curious: what do you feel changed that led to such improved support? I'd like to make sure we keep doing it.

By advisedwang 9 years ago
Chiming in as I noticed the change too. For a long time it was almost impossible to speak with a human - every query was directed to the extensive but often useless support pages. If a human did respond it often seemed like they weren't savvy enough to handle a microwave let alone solve infra issues.
Then, about a year or two ago - humans actually started responding to and fixing problems. A welcome change!

By anotherturn 9 years ago
Do you have one of the paid support packages, or is this your experience of our google groups/stack overflow etc?

By advisedwang 9 years ago
My experience of support with Google Apps for Business makes me very wary of using anything Google for critical business infra. Google products are nice, but as soon as you hit a problem or edge case, you're on your own in my experience.

By vacri 9 years ago
This.
I used to work on the Azure Portal Team. As much negative things as I can say about Microsoft, they take making things just work for developers seriously, despite high prices and misc. service issues.
The since nixed compute container project I initially worked on really exemplified this.
I tend to use Colo or AWS when possible but I have a client that insisted on Google GCE and Endpoints.
I've spent so much time time digging through source code and working around broken dev tooling, and dealing with incorrect or out of date documentation thanks to that requirement.
In my personal opinion Google has a way to go in mature tooling. Silent failures, or worse failures that don't result in build failures are not acceptable. Requiring paid support contracts to resolve an issue in google infra is not acceptable. Incredibly poor support for local dev environments is not acceptable.
After dealing with this stuff, I find it unlikely that I will ever rely on their systems in the future. AWS/Colo or, with reservations, Azure all the way.

By keithnoizu 9 years ago
Wish I could +1 this more. Any time I get some error, I spend hours sifting through old documentation and forum posts.

By spuiszis 9 years ago
Why not just open a support ticket?

By DerpyNirvash 9 years ago
Exactly. Because they've usually only experienced support for Google's free services, people assume all Google support is minimal - but it isn't. We pay $150 a month for silver support, and in the extremely rare (several years apart) case we need help, we get it.

By foxylad 9 years ago
In my GApps support experience, it's slow and inexperienced.

By vacri 9 years ago
Spot on.
And good luck getting accurate documentation.

By dbg31415 9 years ago
Spot on.
And good luck getting accurate documentation.

By dbg31415 9 years ago
Honestly, if you're a big service that millions of people use, you should not put all your eggs in a single basket and should probably use a mix, in case one of the clouds goes down like in this case.

By ehsankia 9 years ago
One of the biggest reasons you go for a cloud is because you don't want to deal with reliability & scaling issues, and there's a premium price attached to that. I think most companies using S3 in this case believed they put their eggs in different baskets when they put their data in there.

By Svenskunganka 9 years ago
I can't believe anyone would have thought a dependency on a single AWS region, or even single service provider, would count as having eggs in different baskets, at least, I really hope nobody could think that!
I suspect though that most people affected deemed the risks and costs of failure low enough to be acceptable, and for many people it still is - even with this outage. But that's a conscious decision, rather than plain ignorance.

By kiallmacinnes 9 years ago
That depends if you're willing to pay for the cost of hosting all your content twice and the development overhead of managing that. Twice the persistences means twice the chance of an issue occurring.

By Salgat 9 years ago
That's where tech like kubernetes help in making your app/service portable. Or having common APIs like between s3 and google cloud storage.
Twice the persistence means always having at least one backup and thus the occurance of downtime reduces not up

By nashadelic 9 years ago
With containers, I think the devops overhead would be minimal.

By w8rbt 9 years ago
That's if _everything_ is in containers. Also, don't undermine how much of a difference the host machine configuration can make... Docker uses its kernel.

By tejasmanohar 9 years ago
>(Before that, you guys didn't at all have your shit together when it came to customer support)
Sounds like it basically coincides with Diane Greene coming on board to run the show -- which is great news for all of us with increased competition on not just the technical front but also support (which is often the deal maker/breaker)

By hkmurakami 9 years ago
Is Diane really that good?
I was at a talk last year, where she spoke, and as much as I love Google, it was one of the boat boring talks I've ever heard in my life. So monotone and uninteresting... and I'm probably one of the biggest Google fans out there.

By 7ewis 9 years ago
I have no idea about the person in question, but stable and reliable infrastructure can be really boring. Unfortunately, it's also necessary.

By contingencies 9 years ago
I'm not familiar with her public speaking. But you want someone decidedly un-Google-like to run an enterprise software (non-engineering) operation.
Look at Safra Catz's public speaking (Oracle). Terrible public speaker, terrific operator [1].
[1] though we may easily disagree with their business practices.

By hkmurakami 9 years ago
Ability to give interesting talks and ability to run good products are two completely different talents, and your comment is therefore pretty meaningless. In fact, I'd argue a lot of people good at one are not good at the other.

By ocdtrekkie 9 years ago
I just wrote a piece reflecting on the s3 outage and the limitations of s3 metadata/replication:
https://medium.com/@jim_dowling/reflections-on-s3s-architect...

By jamesblonde 9 years ago
Or discuss it here: https://news.ycombinator.com/item?id=13760251

By jamesblonde 9 years ago
GCP has always felt like a forever beta product. On top of that you get a lot of lockin so I would never recommend GCP for a long term project.

By themihai 9 years ago
The brilliance of open sourcing Borg (aka Kubernetes) is evident in times like these. We[0] are seeing more and more SaaS companies abstract away their dependencies on AWS or any particular cloud provider with Kubernetes.
Managing stateful services is still difficult but we are starting to see paths forward [1] and the community's velocity is remarkable.
K8s seems to be the wolf in sheep's clothing that will break AWS' virtual monopoly on IaaS.
[0] We (gravitational.com) help companies go "multi-region" or on-prem using Kubernetes as a portable run-time.
[1] Some interesting projects from this comment (https://news.ycombinator.com/item?id=13738916)
* Postgres automation for Kubernetes deployments https://github.com/sorintlab/stolon
* Automation for operating the Etcd cluster:https://github.com/coreos/etcd-operator
* Kubernetes-native deployment of Ceph: https://rook.io/

By twakefield 9 years ago
Note that Kubernetes "builds upon 15 years of experience of running production workloads [on Borg] at Google" [0], but is different code than Borg.
In addition to Rook, Minio [1] is also working to build an S3 alternative on top of Kubernetes, and the CNCF Landscape is a good way of tracking projects in the space [2].
[0] https://kubernetes.io/ [1] https://www.minio.io/ [2] https://github.com/cncf/landscape
Disclosure: I'm the executive director of CNCF, which hosts Kubernetes, and co-author of the landscape.

By dankohn1 9 years ago
Yes, I was admittedly over generalizing with my statement regarding open sourcing Borg.

By twakefield 9 years ago
Well, you're in the ballpark. I might be wrong, but I've heard they're not averse to the idea of open sourcing Borg and Omega (it wasn't that long ago that the Borg paper would have been nigh unthinkable, interestingly), but the litany of Google specific stuff that is baked in makes refactoring for public release a nonstarter. It's a huge codebase with lots of little tendrils to other internal infrastructure.
Anyway, one needs an on-ramp to containers on Google Cloud. And one can't open source the one that one has, which despite being nearly mature enough to own a driver's license, wouldn't really fulfill the precise need that Kubernetes fills without some frontend work. So one writes Kubernetes. An almost entirely different fundamental architecture, by the way, so it's interesting for those who've seen both to compare.
In other words, you're not entirely off the mark even with the generalization.

By jsmthrowaway 9 years ago
K8s is a better borg! It leaps forward and build upon many years experience of operating the system.

By justicezyx 9 years ago
Is there any way built in to Kubernetes to go multi-AZ, multi-region, or even multi-cloud? Is federation the answer to this?
I remember reading somewhere in the K8s documentation that it is designed such that nodes in a single cluster should be as close as possible, like in the same AZ.

By 013a 9 years ago
Yes, see the blog http://blog.kubernetes.io/2016/07/cross-cluster-services.htm...

By qj_li 9 years ago
I have a component in my business that writes about 9 million objects a month to Amazon S3. But, to leverage efficiencies in dropping storage costs for those objects I created an identical archiving architecture on Google Cloud.
It took me about 15 minutes to spin up the instances on Google Cloud that archive these objects and upload them to Google Storage. While we didn't have access to any of our existing uploaded objects on S3 during the outage, I was able to mitigate not having the ability to store any future ongoing objects. (our workload is much more geared towards being very very write heavy for these objects)
It it turns out this cost leveraging architecture works quite well as a disaster recovery architecture.

By blantonl 9 years ago
Opportunistic, sure. But I did not know about the API interoperability. Given the prices, makes sense to store stuff in both places in case one goes down.

By sachinag 9 years ago
I am surprised more people don't know about it. I get questions like https://github.com/kahing/goofys/issues/158 every now and then and to be fair I don't think they market it well: https://cloud.google.com/storage/docs/migrating
Disclosure: I don't work for google but have an upcoming interview there.

By khc 9 years ago
"Disclosure: I don't work for google but have an upcoming interview there."
Disclosure: I took a tour there one time and have used google.
EDIT: I realized that I was being mean, but why was that disclaimer relevant?

By devmunchies 9 years ago
A few possible reasons, the most obvious being grandparent is disclosing a possible source of bias.
Also it could look suspicious if grandparent gets the job and at some point in the future someone looks back at this comment.
If in doubt, disclose. Especially in the tech industry, that's what Gamergate was actually about.

By Nexxxeh 9 years ago
> I realized that I was being mean, but why was that disclaimer relevant?
Because:
- transparency is always good
- adding a small disclosure to the bottom of a post is very low impact
- someone who is interviewing for a job at a company is likely to have a set of biases that influence what they say even if they think that they're being honest and objective.

By timv 9 years ago
I think it's a fair disclosure of potential bias.

By eric_h 9 years ago
Frankly, if you don't know the difference between a disclosure and a disclaimer, you shouldn't be commenting.

By mbrookes 9 years ago
Not poor taste at all. Love GCP. I actually host two corporate static sites using Google Cloud Storage and it is fantastic. I just wish there was a bucket wide setting to adjust the cache-control setting. Currently it defaults to 1 hour, and if you want to change it, you have to use the API/CLI and provide a custom cache control value each upload. I'd love to see a default cache-control setting in the web UI applying to the entire bucket.
I also want to personally thank Solomon (@boulos) for hooking me up with a Google Cloud NEXT conference pass. He is awesome!

By nodesocket 9 years ago
Out of curiosity, are you also using the cloud CDN?
https://cloud.google.com/compute/docs/load-balancing/http/us...

By dward 9 years ago
I found Google Cloud CDN a little overly complicated to get setup since you need to use load balancers.
I use CloudFlare. They handle generating a SSL certificate, can have a CNAME at the APEX, full-site static caching, 301 http => https redirects, etc.

By nodesocket 9 years ago
How did you get the pass?
Been trying to get one for IO (can't attend NEXT unfortunately)

By 7ewis 9 years ago
Hopefully you're still there even though S3 is back up. I have an interesting question I really, really hope you can answer. (Potential customer(s) here!!)
There are a large number of people out there looking intently at ACD's "unlimited for $60/yr" and wondering what that really means.
I recently found https://redd.it/5s7q04 which links to https://i.imgur.com/kiI4kmp.png (small screenshot) showing a user hit 1PB (!!) on ACD (1 month ago). If I understand correctly, the (throwaway) data in question was slowly being uploaded as a capacity test. This has surprised a lot of people, and I've been seriously considering ACD as a result.
On the way to finding the above thread I also just discovered https://redd.it/5vdvnp, which details how Amazon doesn't publish transfer thresholds, their "please stop doing what you're doing" support emails are frighteningly vague, and how a user became unable to download their uploaded data because they didn't know what speed/time ratios to use. This sort of thing has happened heaps of times.
I also know a small group of Internet archivists that feed data to Archive.org. If I understand correctly, they snap up disk deals wherever they can find them, besides using LTO4 tapes, the disks attached to VPS instances, and a few ACD and GDrive accounts for interstitial storage and crawl processing, which everyone is afraid to push too hard so they don't break. One person mentioned that someone they knew hit a brick wall after exactly 100TB uploaded - ACD simply would not let this person upload any more. (I wonder if their upload speed made them hit this limit.) The archive group also let me know that ACD was better at storing lots of data, while GDrive was better at smaller amounts of data being shared a lot.
So, I'm curious. Bandwidth and storage are certainly finite resources, I'll readily acknowledge that. GDrive is obviously going to have data-vs-time transfer thresholds and upper storage limits. However, GSuite's $10/month "unlimited storage" is a very interesting alternative to ACD (even at twice the cost) if some awareness of the transfer thresholds was available. I'm very curious what insight you can provide here!
The ability to create share links for any file is also pretty cool.

By i336_ 9 years ago
Now that's what I call a shameless plug!

By ptrptr 9 years ago
We would definitely seriously consider switching to GCS more if your cloud functions were as powerful as AWS Lambda (trigger from an S3 event) and supported Python 3.6 with serious control over the environment.

By scrollaway 9 years ago
Is there something about the GCS trigger that doesn't work for you? I hear you on Python 3, but I'm also curious about "serious control over the environment". Can you be more specific?

By boulos 9 years ago
Here are our main issues with Lambda, from highest-to-lowest priority:
- It supports Python 2.7 only. We need Python 3.4+ support.
- We can't increase CPU allocation without increasing RAM allocation, making them far more expensive than we need.
- Using psycopg2 on it is a PITA due to their handling of system dependencies.
- The system is entirely proprietary, making it impossible to run it locally for testing.
- Cloudwatch sucks for finding errors in the functions and is atrociously expensive.
- API gateway is an extremely crufty system, and used not to let you pass around binary data (this has changed)
- We can't disable/change the retry-on-error policy.
We have a pretty hard tie-in to S3 and Redshift, but when GCF can do better on a majority of these points, we'll begin moving to it. But yes, Python 3 at a minimum would be a requirement.

By scrollaway 9 years ago
> The system is entirely proprietary, making it impossible to run it locally for testing.
I assume that you are referring to emulating the triggering of lambdas behind API gateway...? I've found a project that sets up a node environment to do this. Very handy for js/lambda development. A google search suggests similar options may exist for python.

By mypalmike 9 years ago
On a curious note, how do you guys use lambda?

By vikiomega9 9 years ago
It's a little outdated now, but this post details our pipeline: https://hearthsim.info/blog/how-we-process-replays/

By scrollaway 9 years ago
As someone who's literally just starting to look at Lambda, thanks for that quick read.
I had a lot of "chicken and egg"-type questions about using it, and seeing that critical step of bootstrapping the whole thing via the API Gateway was really informative.

By RulerOf 9 years ago
I keep telling people that in my view, Google Cloud is far superior to AWS from a technical standpoint. Most people don't believe me... Yet. I guess it will change soon.

By simonebrunozzi 9 years ago
Google Cloud is the Betamax of cloud... while it might be technically superior it's not the only factor to consider. :)

By natbobc 9 years ago
Aww... that seems a little early to call ;).

By boulos 9 years ago
you don't comment for 4 years and THAT'S the comment you choose to return with?

By packetslave 9 years ago
Yep, replace "compiling" with "S3 recovery" in the following XKCD - https://xkcd.com/303/

By natbobc 9 years ago
What other factors make it doomed for failure like betamax?

By joshontheweb 9 years ago
I wouldn't say that it's doomed to failure but I do think it has a lot of ground to cover to catch-up. Google has a lot of great technology like TensorFlow, Kubernetes, and Go that will keep them relevant.
In support of my flippant remark I see three indicators that hold parallels to Betamax with detail to follow. I qualify that it is largely informed by my own anecdotal experience. Specifically by objections and responses that I've received/observed while myself and peers have proposed or implemented cloud adoption at various companies.
Indicators
1. market share. 2. proprietary tech stack. 3. technical superiority syndrome.
Detail
1. Currently AWS has a major lead, then Azure, then Google. The implication is that market share translates to mindshare, which in turn yields blog articles, OSS libraries/tools, etc. This becomes a virtuous cycle.
For .NET shops that marketshare will tend to favour Azure on the premise that MS knows best.
2. Some of Google's technology stack has a learning curve that is unique to Google. Take GAE as an example and compare to AWS's nearest equivalent Beanstalk (or Heroku). Beanstalk requires few if any changes to an existing application whereas GAE requires that you do it the App Engine way. It might provide a number of benefits, but it's invasive. Containers are shifting the requirement, however not everyone is in a position or has the desire to start with containers on day 1.
Further Google Cloud's project oriented approach while not a bad organisation mechanism detracts from learning. If you assume the premise that exploration is part of learning it forces the user to hold two items in their head: their objective and Google Clouds imposed objective.
AWS on the other hand generally provides defaults that allow you to launch resources almost immediately after sign-up. Google's approach is better for long-term support, maintenance and organisation but the user needs to have the maturity to understand that benefit.
3. It may be technically superior but that statement in of itself is divisive and can shudder some away. It is not enough to simply be technically superior and from my observation the statements tend to originate from X/Googlers.
A number of people will latch onto feature set (for beta, number of films available was a factor). The absence of features will often discount a choice out of the gate (even if those features are irrelevant) as an example:
- regional coverage: AWS - 15 regions/~38 zones Azure - 36 regions/zones Google - 6 regions/18 zones
- partially/fully managed services: AWS is continually growing these, at a level that seems to outpace competitors.
- Outwardly Google appears to tackle the "hard problems" with technically superior solutions (e.g. TensorFlow, BigQuery) but often appears to neglect the "boring" problems a number of companies want as well (e.g. Cloud VDI's, SnowBall, etc).
- Some areas seem to be ossified due to tight coupling (e.g. servlet 3.0 and python support in GAE).
Summary
There is no silver bullet solution. Every provider will have an outage at some point and this could be a big reason that GCE won't be knocked out of the game. I also think Google is working really hard to build community and mindshare. I don't have a crystal ball so only time will tell what happens but technical superiority has rarely been the sole reason that drives adoption.

By natbobc 9 years ago
I appreciate you taking the time to explain. I'm in the process of making decisions on a new cloud storage provider so this is helpful.

By joshontheweb 9 years ago
One service outage determines superiority? I prefer a lot more data than a single point.

By notyourwork 9 years ago
I'm in the process of moving to GCS mostly based on how byzantine the AWS setup is. All kinds of crazy unintuitive configurations and permissions. In short, AWS makes me feel stupid.

By joshontheweb 9 years ago
I should add that someone from the AWS team reached out to me in response to this comment asking for feedback on how they can improve their usability. So I give them credit for that.

By joshontheweb 9 years ago
As far as I understand the S3 API of Cloud Storage is meant as a temporary solution until a proper migration to Google's APIs.
The S3 keys it produces are tied to your developer account. This means that if someone gets the keys from your NAS, he will have access to all the Cloud Storage buckets you have access to (e.g your employer's).
I use Google Cloud but not Amazon. Once I wanted a S3 bucket to try with NextCloud (then OwnCloud). I was really frightened to produce a S3 key with my google developer account.

By andmarios 9 years ago
The HMAC credential that you'd use with the S3-compatible GCS API, also called the "XML API", does need to be associated with a Google account, but it doesn't need to be the main account of the developer. It can be any Google user account. I suggest creating a separate account and granting it only the permissions it needs. It'd be nice if service accounts (aka robot accounts) could be given HMAC credentials, that's not supported. Service accounts can, however, sign URLs with RSA keys.
As another option, you can continue using the XML API and switch out only the auth piece to Google's OAuth system while changing nothing else.
There's a lot more detail available at: https://cloud.google.com/storage/docs/migrating
Disclaimer: I work on Google Cloud Storage.

By BrandonY 9 years ago
Thanks for the advice. I think it would be even nicer if the HMAC credentials could be assigned to a specific bucket via an ACL.
I like GCS (and the gsutil tool) but occasionally a S3 style bucket is needed. For example you need a S3 bucket or a webdav server in order to send alerts with images from Grafana to Slack. A minor issue but nice to have if possible without having to deal with Amazon's control panel.

By andmarios 9 years ago
Is there any equivalent to the Bucket Policies that AWS provides (http://docs.aws.amazon.com/AmazonS3/latest/dev/example-bucke...). Cloud Storage seems to be limited to relatively simple policies without conditionals. For a few AWS IAM keys I set up a policy that limits write/delete access to a range of IPs (among other things). Something like that doesn't seem possible with what Google offers. Or do I miss something?

By dividuum 9 years ago
I am not familiar with AWS bucket policies, but AFAIK there isn't a way to set IP based access to GCS buckets.
To be honest, I do find the GCS permissions a bit complex. You have IAM, you have ACLs and you have S3 keys. Everything is set in a different place and ACLs aren't fully represented on the developers console. S3 keys give full access to everything, IAM service accounts give access per project and ACLs are fine grained (per bucket/object). On the other hand, IIRC, IAM has a write only setting, while ACLs do not. So I can have an account that can write only to all the buckets of my project but not an ACL (not that useful).

By andmarios 9 years ago
> OwnCloud
Kicked the tires, not impressed at all. Notes went missing from the interface could only get them back after manually digging through folders via FTP.

By stef25 9 years ago
"fraction of the cost" - how do you figure? Or are you just saying from a cost-to-store perspective?
Your Egress prices are quite a bit more compared to CloudFront for sub 10TB (.12/GB vs .085/GB).
The track record of s3 outages vs time your up and sending Egress seems like S3 wins in cost. If all your worried about is cross region data storage, your probably a big player and have AWS enterprise agreement in place which offsets the cost of storage.

By rynop 9 years ago
Sorry, my comparison is our Multi Regional storage (2.6c/GB/month) versus S3 Standard plus Cross-Regional Replication. That's the right comparison (especially for outages like this one).
As to our network pricing, we have a drastically different backbone (we feel its superior, so we charge more). But as you mention CloudFront, the right comparison is probably Google Cloud CDN (https://cloud.google.com/cdn/) which has lower pricing than "raw egress".

By boulos 9 years ago
[deleted]

By 9 years ago
So this is more compute related but do you know if there are any plans on supporting the equivalent of the webpagetest.org(WPT) private instance AMI on your platform?
Not only is webpagetest.org a google product but it's also much better suited for the minute by minute billing cycle of google cloud compute. For any team not needing to run hundreds of tests an hour the cost difference between running a WPT private instance on EC2 versus on google cloud compute could easily be in the thousands of dollars.

By Spunkie 9 years ago
Would use Google but I just can't give up access to China. Sad because I also sympathize with Google's position on China.

By malloryerik 9 years ago
boulous not in bad taste at all - happy google convert and gcs user works very well for us ymmv

By zoloateff 9 years ago
boulous is app engine datastore the preferred way to store data or cloud sql or something else, do you mind throwing some light on this thanks

By zoloateff 9 years ago
If you made a .NET library that allows easily connecting to both AWC and GCS by only changing the endpoint I would certainly use that library instead of Amazon's own.
Just saying, it gets you a foot in the door.

By DenisM 9 years ago
I had no idea this was an option. Great to know!

By danielvf 9 years ago
i have had problems integrating apache spark using google storage. especially because s3 is directly supported in spark.
if you are api compatible with s3, could you make it easy /possible to work with google storage inside spark?
remember i may or may not run my spark on Dataproc.

By sandGorgon 9 years ago
You can use the Google cloud storage connector (https://cloud.google.com/hadoop/google-cloud-storage-connect...) which works with hadoop (and therefore spark).

By bluedonuts 9 years ago
What is your NAS box doing with S3/GCS ?

By mbrumlow 9 years ago
Remote backup (Synology). I've asked them more than once to directly support GCS, or even just to accept my damn patch ;).

By boulos 9 years ago
Are you using Hyper Backup? That seems to support S3-compatible destinations, including GCS, at least in DSM 6.1 -
https://www.synology.com/en-us/knowledgebase/DSM/help/HyperB...

By gr2020 9 years ago
S3 applications can use any object store if they use S3Proxy:
https://github.com/andrewgaul/s3proxy

By gaul 9 years ago
How about giving a timeline of when Australia will be launching? I see you're hiring staff, and have a "sometime 2017" goal on the site, but how about a date estimate? :)

By thejosh 9 years ago
Does GCS support events yet?

By philliphaydon 9 years ago
As Relay's chief competitor in this region, we of Windsong have benefited modestly from the overflow; however, until now we thought it inappropriate to propose a coordinated response to the problem.

By hyperpallium 9 years ago
What software are you using for your NAS box?

By espeed 9 years ago
Classy parley. I'll allow it.

By pmarreck 9 years ago
Competition is great for consumers!

By masterleep 9 years ago

S3 is currently (22:00 UTC) back up.

The timeline, as observed by Tarsnap:

    First InternalError response from S3: 17:37:29
    Last successful request: 17:37:32
    S3 switches from 100% InternalError responses to 503 responses: 17:37:56
    S3 switches from 503 responses back to InternalError responses: 20:34:36
    First successful request: 20:35:50
    Most GET requests succeeding: ~21:03
    Most PUT requests succeeding: ~21:52

By cperciva 9 years ago

Thanks for taking the time to post a timeline from the perspective of an S3 customer. It will be interesting to see how this lines up against other customer timelines, or the AWS RFO.

By josephb 9 years ago
Playing the role of the front-ender who pretends to be full-stack if the money is right, can someone explain the switch from internal error to 503 and back? Is that just them pulling s3 down while they investigate?

By kaishiro 9 years ago
My guess based on the behaviour I've seen is that internal nodes were failing, and the 503 responses started because front-end nodes didn't have any back-end nodes which were marked as "not failing and ready for more requests". When Amazon fixed nodes, they would have marked the nodes as "not failed", at which point the front ends would have reverted to "we have nodes we can send traffic to" behaviour.

By cperciva 9 years ago
Could be anything. Most likely scenario is the internal error is a load shedding error and the 503s were when the system became completely unresponsive. If it was a configuration issue then it is more likely that it would have directly recovered rather than going 'internal error -> 503 -> internal error'.

By greenleafjacob 9 years ago
503 is typically what we see when our proxy can't connect to the backend server. We usually get 500 with internal server error when we've messed up the backend server.
So it's likely that the first 500s were the backend for s3 failing, then they took the failing backends offline causing the load balancers to throw 503 because they couldn't connect to the backend.

By hmottestad 9 years ago
S3 is not a monolithic architecture, Amazon is a strong proponent of Service Oriented Architecture for producing scalable platforms.
There are a number of services behind the front end fleet in S3's architecture that handle different aspects of returning a response. Each of those will have their own code paths in the front end, very likely developed by different engineers over the years. As ever, appropriate status codes for various circumstances are something that always seems to spur debate amongst developers.
The change in status code would likely be a reflection of the various components entering unhealthy & healthy states, triggering different code paths for the front end... which suggests whatever happened might have had quite a broad impact, at least on their synchronous path components.

By Twirrim 9 years ago
no. soundcloud uses aws s3. it is still down. this is false information.

By thenewregiment2 9 years ago
Soundcloud recovering from this failure and S3 being operational are two separate issues. We use S3 and it will take us nominally an hour to recover after S3 went up, for example.
S3 has started working as of about 20 minutes ago, and things are running smoothly.

By endersshadow 9 years ago
Thanks!

By quakeguy 9 years ago
There are other Amazon services that were affected. For example, we're still not seeing auto scaling groups working correctly.

By jeffasinger 9 years ago
"[RESOLVED] Increased Error Rates
Update at 2:08 PM PST: As of 1:49 PM PST, we are fully recovered for operations for adding new objects in S3, which was our last operation showing a high error rate. The Amazon S3 service is operating normally."
https://status.aws.amazon.com/

By ta_wh 9 years ago
oh the famous downvote for the smear campaign. just admit it.

By thenewregiment2 9 years ago
You're getting downvotes because you don't understand that B being down is not an effective indicator of the status of A, even if B depends on A.

By joatmon-snoo 9 years ago
Think you're mistaken, I don't have downvote privileges!

By ta_wh 9 years ago
Claiming a statement is false when it's demonstrably true is something that will likely get downvoted every time. It's misleading to others and fills the board with noise.

By espeed 9 years ago
A piece of hard-earned advice: us-east-1 is the worst place to set up AWS services. You're signing up for the oldest hardware and the most frequent outages.
For legacy customers, it's hard to move regions, but in general, if you have the chance to choose a region other than us-east-1, do that. I had the chance to transition to us-west-2 about 18 months ago and in that time, there have been at least three us-east-1 outages that haven't affected me, counting today's S3 outage.
EDIT: ha, joke's on me. I'm starting to see S3 failures as they affect our CDN. Lovely :/

By gamache 9 years ago
Reminds me of an old joke: Why do we host on AWS? Because if it goes down then our customers are so busy worried about themselves being down that they don't even notice that we're down!

By traskjd 9 years ago
Reminds me of an even older joke (from 80's or 90's):
Q: Why computers don't crash at the same time?
A: Because network connections are not fast enough.
(I think we are starting to get there)

By nabla9 9 years ago
These are both pretty good. Added to color fortune clone https://github.com/globalcitizen/taoup

By contingencies 9 years ago
I'm getting the same outage in us-west-2 right now.

By xbryanx 9 years ago
The dashboard doesn't load, nor does content using the generic S3 url [1], but we're in us-west-2 and it works fine if you use the region specific URL [2]. In practice this means our site on S3/Cloudfront is unaffected.
[1]: https://s3.amazonaws.com/restocks.io/robots.txt
[2]: https://s3-us-west-2.amazonaws.com/restocks.io/robots.txt

By firloop 9 years ago
Good catch. My bet is that because s3.amazonaws.com originally referred to the only region (us-east-1) the service that resolves the bucket region automatically is really hosted in us-east-1. I think AWS recommends using the region in the URL for that reason, however that is easier said than done I think. I would bet that a few of Amazon's services use the short version internally and are having issues because of it.

By madmod 9 years ago
Seeing it in eu-west-1 as well. Even the dashboard won't load. Shame on AWS for still reporting this as up; what use is a Personal Health Dashboard if it's to AWS's advantage not to report issues?

By STRML 9 years ago
Now it's in the PHD, backdated to 11:37:00 UTC-6. How could it take an hour to even admit that an issue exists? We have alerts set on this but they're useless when this late.

By STRML 9 years ago
Same here, and it's 100% consistent, not 'increased error rates' but actually just fully down. I'd just stop working but I have a demo this afternoon... the downsides of serverless/cloud architectures, I guess.

By WaxProlix 9 years ago
Heh that "increased error rates" got a chuckle out of me, I guess 100% is technically an increase.

By synicalx 9 years ago
Well what if you'd hosted it on your hard drive and it crashed? It seems like the probability of either is similar nowadays.

By pm90 9 years ago
The difference there is you can potentially do something about it, vs having to wait on an upstream provider to fix an issue for everybody.

By jacobwg 9 years ago
"you can potentially do something about it" vs. "you have to do something about it"
Perspective is everything.

By btgeekboy 9 years ago
Grab different machine, git clone your repo, good to go.
What's the odds of the server with your repo and your own hard drive crashing at the same time?

By JupiterMoon 9 years ago
Strangely, your comment made me read this entire post about working out probabilities.. http://www.statisticshowto.com/how-to-find-the-probability-o...
Quite interesting really!

By _ao789 9 years ago
Our services in us-west-2 have been up the whole time.
I think the problem is globally accessible APIs are impacted. As others have noted, if you can use region/AZ-specific hostnames to connect, you can get though to S3.
CloudFront is faithfully serving up our existing files even from buckets in US-East.

By all_usernames 9 years ago
S3 bucket creation was down in us-west-2, because it relied on us-east-1 (I expect that dependency will get fixed after this), but all S3 operations should have continued to function in us-west-2, other than cross-region replication from us-east-1.

By illumin8 9 years ago
IIRC the console for S3 is global and not region specific even though buckets are.

By codelitt 9 years ago
Also, cross-region replication is a new-ish thing: https://aws.amazon.com/blogs/aws/new-cross-region-replicatio...

By seanp2k2 9 years ago
Same outage in ca-central-1

By Ph4nt0m 9 years ago
I can confirm this as well.

By ngtvspc 9 years ago
Huh, I'm not seeing it on my us-west-2 services. Interesting.

By gamache 9 years ago
My advice is: don't keep your eggs in one basket. AZs a localised redundancy, but as Cloud is cheap and plentiful, you should be using two or more regions, at least, to house your solution (if it's important to you.)
EDIT: less arrogant. I need a coffee.

By movedx 9 years ago
But now you're talking about added effort. Multi-AZ on AWS is easy and fairly automatic, multi-region (and multi-provider) not so much. It's easy to say things like this, but people who can do ops are not cheap and plentiful.

By gamache 9 years ago
The only difficult aspect of multi-region use is data replication, which I can confirm is a (somewhat) difficult problem. This issue was with S3 which has an option to automatically replicate data from the bucket's region to another one. It's a check box. A simple bit of logic in the application and you can move between regions with ease.
Even data replication has options for this, too.
And I work in Ops.

By movedx 9 years ago
Well, you've explained how to do multi-region in S3. Now let's cover EC2, ELB, EBS, VPC, RDS, Lambda, ElastiCache, API Gateway, and all the other bits of AWS that make up my services. And then we can move on to failover application logic.

By gamache 9 years ago
I picked out S3 as this issue is directly related to it, yet the solution is simple: turn on replication and have your application work with it (which is on the developers, not ops.)
EC2: why are you replicating EC2 instances or AMIs across regions? Why aren't you using build tools to automatically create AMIs for you out of your CI processes?
ELB: Eh? Why do I need ELBs to be multi-regional? I'm a little confused by this on, sorry.
EBS: My systems tend to be stateless, storing as much log, audit, or data in external systems such as RDS, DynamoDB, S3, etc. Storing things on the local system's storage is a bit risky, but if you have to there are disk replication solutions available. EFS comes to mind for making that easier. Backups also come to mind in the event of data loss.
VPC: Why does a VPC need to be cross regional? This one is also lost on me.
RDS: Replication is easy -- it's done for you. Convincing developers their application needs to potentially work with a backup endpoint to the data is harder than data replication problems at times. More often than not, it's simply a case of switching to a read-only mode whilst you recover the write copy of your RDS instance, but this is the role of the developers, not ops.
Lambda, ElastiCache, API Gateway... all these things aren't arguments against my original point: architect correctly. Yes it involves more work (from the developer's perspective, mostly), but more often than not in the event of a failure you're left head and shoulders above your nearest competition and left soaking up the profits as a result.
Based on your responses, however, I think we can safely agree to disagree and move on.
Have a great day! I hope you weren't too badly effected by the S3 outage!
EDIT: typo.

By movedx 9 years ago
Two different vendors if you can afford it. It's a bit of a hassle though.

By jacquesm 9 years ago
I like to stick to one, but I have seen some success stories with an AWS/GCE mix :-)
HashiCorp's Terraform makes it a lot easier to go multi Cloud, and abstracting away configuration of the OS and applications/state with Ansible makes the whole process a lot easier too.

By movedx 9 years ago
Here is one success story:
https://cloudplatform.googleblog.com/2017/02/guest-post-mult...

By cfieber 9 years ago
It shouldnt be technically possible to lose S3 on every region, how did amazon screw this up so bad?

By bischofs 9 years ago
I believe the reports here are misleading: if you try to access your other regions through the default s3.amazonaws.com it apparently routes through us-east first (and fails), but you're "supposed to" always point directly at your chosen region.
Disclosure: I work on Google Cloud (and didn't test this, but some other comment makes that clear).

By boulos 9 years ago
Amen. We setup our company cloud 2 years ago in US-West-2 and have never looked back. No outage to date.

By twistedpair 9 years ago
If you have a piece of unvarnished wood handy...

By jacquesm 9 years ago
Is us-east-2 (Ohio) any better (minus this aws-wide S3 issue)?

By compuguy 9 years ago
us-east-2 is brand new and us-east-1 is the oldest region. Any time there is an issue, it is almost always us-east-1. If possible, I would migrate out of us-east-1.

By mullen 9 years ago
Probably valid, though in this case while us-west-1 is still serving my static websites, I can't push at all.

By jchmbrln 9 years ago
The s3 outage covered all regions.

By nola-radar 9 years ago
Really? Even Australia? Can you provide evidence of this so I know for any clients that call me today? :)
EDIT: Found my answer. "Just to stress: this is one S3 region that has become inaccessible, yet web apps are tripping up and vanishing as their backend evaporates away." -- https://www.theregister.co.uk/2017/02/28/aws_is_awol_as_s3_g...

By movedx 9 years ago
That's a really good point!

By notheguyouthink 9 years ago
I used to track DynamoDB issues and holy crap, AWS East had a 1-2 hour outage at least every 2 weeks. Never in any of the other regions. AWS East is THA WURST

By shirleman 9 years ago
The s3 outage covered all regions.

By nola-radar 9 years ago
Yup, same here. It has been a few minutes already. Wanna bet the green checkmark[1] will stay green until the incident is resolved?
[1] https://status.aws.amazon.com/

By alexleclair 9 years ago
The red check mark is hosted on S3...

By nostromo 9 years ago
The truth is stranger than fiction.
https://twitter.com/awscloud/status/836656664635846656

By ak2196 9 years ago
"Care to share the code as an anti pattern?" brilliant.

By crowbahr 9 years ago
[deleted]

By 9 years ago
Comment of the year.

By AtheistOfFail 9 years ago
Fact of the year.

By Kiro 9 years ago
In December 2015 I received an e-mail with the following subject line from AWS, around 4 am in the morning:
"Amazon EC2 Instance scheduled for retirement"
When I checked the logs it was clear the hardware failed 30 mins before they scheduled it for retirement. EC2 and root device data was gone. The e-mail also said "you may have already lost data".
So I know that Amazon schedules servers for retirement after they already failed, green check doesn't surprise me.

By emrekzd 9 years ago
So just as a FYI the reason that probably happened to you is that the underlying host was failing. I am assuming they wanted to give you a window to deal with it but the host croaked before then. I've been dealing w/ AWS for a long long time and I've never seen a maintenance event go early unless the physical hardware actually died...

By smoodles 9 years ago
That what happens when cloud provider doesn't support live migration for VMs.

By amaks 9 years ago
That's completely ridiculous, get some fucking RAID Amazon.
I order drives off newegg directly to my DC and I'm yet to lose data with the cheapest drives available in RAID10.

By problems 9 years ago
Yes, solving problems at your scale and AWS' are quite comparable.

By prdonahue 9 years ago
but I never lost data off an usb stick how hard could it be!

By LoSboccacc 9 years ago
[deleted]

By 9 years ago
Really?!?! Several times USB sticks (and USB HDs) failed on me and other people I work with.

By aurelianito 9 years ago
Not saying my scale is the the same at all - but the fact they can't do something so simple that I can do it as a single individual is embarrassing at best.
Simple solutions to this do scale - Linode and DigitalOcean don't have such issues for example - and while they're not Amazon scale, they are quite large and I'd say they prove the concept.

By problems 9 years ago
EBS data is backed up in multiple redundant ways (using erasure encoding I think).
Local storage is not intended for permanent storage, and is more use at your own risk. That's also why most of the new EC2 instances don't even support local storage.
Availability =/= durability of course

By phonon 9 years ago
It's not just a RAID that can fail. And everyone who uses AWS should expect failures. You should build your infrastructure to handle such failures well.

By foxylion 9 years ago
They offer no RAID on local storage and only the expensive, IO restricted EBS as an alternative.

By problems 9 years ago
Yes, the only way a server can die is from non-raided disks.

By vacri 9 years ago
Otherwise they should at least be providing customers their data back.

By problems 9 years ago
I think you misunderstood the local storage. It is not intended to permanently store data. It's a volatile storage like RAM.

By foxylion 9 years ago
It's crazy how much better the communication (including updates and status pages) is of the companies that rely on AWS than AWS' communication itself.
https://status.heroku.com/incidents/1059

By tuna-piano 9 years ago
Blake Gentry gave a full accounting of Heroku's response process here - http://www.heavybit.com/library/video/every-minute-counts-co...
Amazon should take notes.

By tcsf 9 years ago
I feel for them. Imagine, 40 or 50 different engineering teams all responsible for updating their statuses. At this moment on the AWS status page I see random usage of the red, yellow, and green icons, even though all the status updates are "Increased error rates." What that tells me is that there's no unified communication protocol across the teams, or they're not following it. And just imagine what it's like being on the S3 team right now.
I notice even Cloudflare is starting to have problems serving up pages now.

By all_usernames 9 years ago
Font Awesome went out for me for a bit, but they did a great job getting back up and keeping their users in the loop.
https://status.fortawesome.com/

By SnowingXIV 9 years ago
Also, https://status.pantheon.io

By busterarm 9 years ago
These service health boards are more like advertisement page then actual status of the service.

By tlogan 9 years ago
I guess their bizarre thinking is something along the lines of: "unless we have proof that noone can access the service, we won't change the indicator from green to yellow.
Seriously: I don't understand why you guys stay with AWS.

By mwfj 9 years ago
Because you perceive public clouds only as virtual machine providers, that you can replace with other provider in two days. A detailed cloud migration consists of replacing some parts of your software to use managed services provided by a specific cloud provider, and AWS is still has the best service offerings IMHO. When you use these services carefully also you will see that AWS is very cheap and reliable enough. Outages like today's are happening in every platform and it is possible to mitigate them.
You can use Adwords as a self-service user. Without knowing so much of details you can run your ads but also you can bery easily ruin your budget. But many enterprise customers use it very differently than those users and they are extremely optimizing the cost. Cloud is the same. If you don't know how big customers use AWS, it is normal that you are surprised because AWS is still leading the market.
You say GCP is better than AWS. Which part is better? GCP does not have many services of AWS we benefit from. How can you compare totally different providers? You can only say AWS EC2 is worse than GCP. But you cannot compare whole platforms in one sentence.

By cagataygurturk 9 years ago
(Sorry, I'm late to reply, but since you addended your comment you might still be listening...)
After spending a year evaluating both AWS and GCP (with an emphasis on their managed database services; both SQL and no-SQL) my general feeling is this:
"Microsoft Windows is to Unix as AWS is to GCP".
(Or perhaps closer to the truth: "VMS is to Unix as AWS is to GCP".)
Baically AWS services seem like they are badly designed by buerocratic mediocre engineers following some bureocratic template for "a service".
GCP feels a lot saner (both API- and UI/console-wise). I often got the feeling it's designed by people who:
a) are smart and well-rounded in terms of experiences. It does take cleverness and experience to design something elegant that is also useful.
b) take pride in their work (it does show)
(And then, as a bonus: It's cheaper!)

By johansch 9 years ago
You talk about SQL and No-SQL as managed services and it shows that your experience is limited to a classical application consisting of virtual machines and some data storage. However these are not the only services offered by both platforms and currently AWS has a richer feature set. For example Lambda and its deep integration with whole AWS platform is the biggest game changer from my point of view. If we are talking about virtual machines and databases, I can accept this comparison. However we are talking about 30+ services, some of them are even not available somewhere else and solving serious business problems in production and at scale. It is very wrong to put everything into basket and compare. Maybe GCP has better pub/sub service and AWS has better object storage. These should be compared seperately. Answering to your question, why do we still stay at AWS, because it is solving our problems in the most cost effective way and with reduced complexity, we are happy with it.

By cagataygurturk 9 years ago
Sorry for endless number of typos and mistakes. Obviously I was sleepy while I was writing this.

By cagataygurturk 9 years ago
> Seriously: I don't understand why you guys stay with AWS.
Personally I've been using it for ages and I know most services inside and out. They do suffer downtime in some regions occasionally, but it'd be too expensive at this point to move.
And who doesn't suffer downtime? You can't avoid it; you just need a plan to deal with it. For example, having a backup replica bucket in another region and the ability to quickly switch your CDN over would probably be a good idea here; that's what I did.
If you want to go further you can replicate your data to another cloud provider entirely and use low TTLs to switch to a backup CDN if your system is that mission-critical (in the event of a worldwide AWS failure doomsday scenario).
All systems will fail you and it's our responsibility as IT professionals to have a plan to mitigate this.

By gtsteve 9 years ago
Low TTL on DNS entries might do more harm than good: if your DNS provider gets seriously DDoS, being able to rely on caches can save the day.
Anyway, I agree with your conclusion.

By nicolaslem 9 years ago
Sunk cost fallacy.
I do agree that we should all plan for failures.
However, I also think it's a sign of failure in planning and architecture foresight if it's too expensive to move away from a particular cloud provider.

By johansch 9 years ago
The sunk cost fallacy is when you (irrationally) decide to stick with what you're doing purely because you've already spent a lot of resources on it. It doesn't apply when you've done an economic analysis and found out it doesn't make sense to swap.
There are plenty of cases where it just wouldn't make sense to switch after looking at the costs, opportunity costs, etc. For example, if his site makes him $10 a month, outages cost him $1 a month that could be mitigated by moving, and it would cost $1000 of labor to swap providers. (Depends on interest rates.)
Perhaps it was originally a failure to not have a plan to easily move from a provider, but it doesn't seem unreasonable to me that right now it may cost too many hours of work to justify the move.

By froogle 9 years ago
> I don't understand why you guys stay with AWS.
Who do you recommend instead (assuming in-house or Hetzner-equiv is out of reach)? Google Cloud? Azure? Rackspace?

By rattray 9 years ago
Google Cloud if you're looking for something similar. It's just so much better and cheaper. I think a lot of the resistance here towards that kind of move is just because people are inherently lazy and they aren't paying the bill themselves.
(I'm guessing a relatively large part is also selfish attachment to the market leader because of employment reasons. I hate wasting money, both for myself and for my employer, so I don't really understand this kind of thinking - but I do understand how it could flourish in a venture capital-rich time/locale.)
I also recommend reading:
https://thehftguy.com/2016/06/15/gce-vs-aws-in-2016-why-you-...

By johansch 9 years ago
Google Cloud doesn't exactly have the greatest reliability/uptime either.

By debaserab2 9 years ago
Google also doesn't have the best record for developer tools.

By joatmon-snoo 9 years ago
GC's CDN doesn't cache files bigger than 4Mb. No Windows VMs. Bound to AWS for these 2 reasons.

By chebum 9 years ago
What about something like B2 from https://www.backblaze.com/ ?

By kohuma 9 years ago
S3 in a single region is based out of multiple data centres / availability zone, with data distributed so that the loss of a single availability zone won't impact either data availability or durability, even to the point of being comfortable with complete physical destruction of an AZ. The same applies for Azure, GCP etc.
B2 is based out of a single DC (or at least, was at launch and I don't see anything that suggests that has changed?) You've got to decide what's most important to you. Data persistence or $$$.

By Twirrim 9 years ago
OVH

By pmalynin 9 years ago
Bad idea there, support is horrible.

By LeoHaggins 9 years ago
I think it's more, "if the service can't do what people need it to do, that's a problem; if the service cluster gets wedged hard enough to stop responding to the requests of our monitoring system, that's a failure."
Which would make sense (and is sorta-kinda a best-practice) if Amazon wrote services such that they "crashed early"—but instead they're seemingly written so the backend lock up and be rendered completely useless at "doing its job" but will continue to run just fine.
Either of those two design decisions is potentially a good thing on its own, but they need to be considered in light of one-another if you want your status page to make any sense. If you want to report cluster failures, code your clusters to actually fail. If you want to keep your clusters up, write your monitoring checks as whole-stack acceptance tests.

By derefr 9 years ago
> Seriously: I don't understand why you guys stay with AWS.
You don't seem to have enough experience to comment on the issue.

By notyourwork 9 years ago
Please visit this comment sub-tree:
https://news.ycombinator.com/item?id=13765786

By johansch 9 years ago
That is a regurgitation of your opinion without any facts.
Comparing technology and saying "it seems" or "i feel" isn't really a good argument to convince me one way or the other.

By notyourwork 9 years ago
> Seriously: I don't understand why you guys stay with AWS.
I tried them all and Amazon is still the best.

By tlogan 9 years ago
Postgres on RDS

By cgag 9 years ago
Come to NEXT in a week! :).

By boulos 9 years ago
Any chance UDF iterators for Cloud Bigtable are in the works?
Being able to run distributed D4M/GraphBLAS queries in Cloud Bigtable would be killer.
"From NoSQL Accumulo to NewSQL Graphulo: Design and Utility of Graph Algorithms inside a BigTable Database" https://arxiv.org/pdf/1606.07085.pdf

By espeed 9 years ago
I'm seeing green checkmarks across the board, but they just added a notice to the top of the page:
> Increased Error Rates
> We are investigating increased error rates for Amazon S3 requests in the US-EAST-1 Region.

By hartleybrody 9 years ago
I guess sub-1% to 100% failure rate is technically an "increase".

By syntheticcdo 9 years ago
I guess file uploads and downloads are technically "API calls".

By Artemis2 9 years ago
the worst thing is when your system cant handle these "increased error rates" as your control plane cascades failure due to something like this....
The worst "increased error rate" problem I had was when the API was failing and my autoscale system couldnt deal and launched thousands of instances because it couldnt tell when instances were launched (lack of API access) and the instances pummelled the fuck out of all other parts of the system and we basically had to reboot the entire platform....
Luckily, amazon is REALLY forgiving with respect to costs in these (and actually most) circumstance....

By samstave 9 years ago
recalls numerous times
Yes. Yes they are. Thankfully.

By salvor 9 years ago
I always joke that if one of those statuses ever went to red, it means the zombie apocalypse has begun.

By matwood 9 years ago
The number of non-green marks is the number of ICBMs currently in flight towards an AWS data center.

By paulddraper 9 years ago
The good news is, if Amazon's services are marked as offline, you're allowed to use Amazon Lumberyard to control nuclear power plants.

By cperciva 9 years ago
In case anyone wants to see what mysterious the red icon looks like: https://status.aws.amazon.com/images/status3.gif
At best when there are problems (not like now I guess) I will see the "note" green icon https://status.aws.amazon.com/images/status1.gif

By chrisan 9 years ago
I've heard (on the Fnord new show on the most recent CCC congress, so take it with a grain of salt and a bucket of humor) that Amazon's TOS are more or less void when a Zombie Apocalypse breaks out.
They had some convoluted but fairly specific wording in their TOS, whoever wrote must have had a lot of fun.

By krylon 9 years ago
From https://aws.amazon.com/service-terms/
> 57.10 Acceptable Use; Safety-Critical Systems. Your use of the Lumberyard Materials must comply with the AWS Acceptable Use Policy. The Lumberyard Materials are not intended for use with life-critical or safety-critical systems, such as use in operation of medical equipment, automated transportation systems, autonomous vehicles, aircraft or air traffic control, nuclear facilities, manned spacecraft, or military use in connection with live combat. However, this restriction will not apply in the event of the occurrence (certified by the United States Centers for Disease Control or successor body) of a widespread viral infection transmitted via bites or contact with bodily fluids that causes human corpses to reanimate and seek to consume living human flesh, blood, brain or nerve tissue and is likely to result in the fall of organized civilization.

By jfim 9 years ago
First the fall of human civilization has to be a real threat per the TOS so not sure they'll care.
Second, I know the lawyer and yes he had fun.

By grogenaut 9 years ago
Then I guess it has begun, the page is now showing red. I'd put a picture on imgur but it's not loading.

By obsurveyor 9 years ago
http://downdetector.com/status/aws-amazon-web-services looks like a reasonable alternative place to check/report downtime.

By jonstaab 9 years ago
I just check Twitter, since Amazon's status is always a lie. My personal dashboard is still showing no problems. It's bad enough that the main public status is always green even when there's clearly a problem, but you'd think they could at least make the private status accurate.

By zedpm 9 years ago
Which is coincidently down.

By eicnix 9 years ago
Maybe they are hosted on S3 facepalm or maybe they just got a surge in traffic

By talawahdotnet 9 years ago
[deleted]

By 9 years ago
downdetectordown.com ?

By booleanbetrayal 9 years ago
yep, page won't load.

By pure_ambition 9 years ago
Gah. It was up 3 minutes ago. Anyone have any suspicion this is another ddos episode? I saw that SO was down last night too: https://twitter.com/StackStatus/status/836450836322516992

By jonstaab 9 years ago
Pretty confident that isn't it. S3 was returning InternalErrors for 22 seconds before it started timing out and/or returning 503s to all my requests.
I'd bet that something broke (causing InternalError responses) and then nodes started marking themselves as failed (causing the timeouts and 503s soon after).

By cperciva 9 years ago
I want to see the botnet capable DDoSing S3. That would be something.

By chx 9 years ago
Apparently, that's down too. Sigh.

By vjdhama 9 years ago
So, global S3 outage for more than an hour now. Still green, still talking about "US East issue". I'm amazed.

By Fiahil 9 years ago
It doesn't appear to be global; my app in eu-west-1 appears unaffected.
It's possible that the console won't work however as I believe that's served from us-east-1.

By gtsteve 9 years ago
My site hosted on S3 is also running.

By chebum 9 years ago
Looks like they have fixed the issue with their health dashboard now.
From https://status.aws.amazon.com/ : Update at 11:35 AM PST: We have now repaired the ability to update the service health dashboard. The service updates are below. We continue to experience high error rates with S3 in US-EAST-1, which is impacting various AWS services. We are working hard at repairing S3, believe we understand root cause, and are working on implementing what we believe will remediate the issue.

By gordon_freeman 9 years ago
There was an alert on the personal health dashboard[1] a second ago, it said S3 Operational issue in us-east-1 but when I tried to view the details it showed an error.
Then I refreshed and the event disappeared altogether.
[1] https://phd.aws.amazon.com/phd/home?region=us-east-1#/dashbo...

By talawahdotnet 9 years ago
Same here. But it is in the general status dashboard: http://status.aws.amazon.com/

By socialentp 9 years ago
Still green now, 8 minutes in.

By tuna-piano 9 years ago
I've had a few non-Amazon providers tell me AWS things are not working in the last 5 minutes, no note from Amazon though.
Nice.

By bpicolo 9 years ago
Just sent out a notice to our customers via our status page. I really wanted to be able to add a link back to AWS detailing the issue but that's a pipe dream I suppose.

By leesalminen 9 years ago
... still green

By fudged71 9 years ago
Calling @jeffbarr
https://news.ycombinator.com/user?id=jeffbarr

By clamprecht 9 years ago
Looks like his personal site isn't loading... :)

By adrenalinelol 9 years ago
Yup, it is indeed hosted with Amazon.

By johansch 9 years ago
Me right now: https://www.youtube.com/watch?v=_cHa063Mwos

By ceejayoz 9 years ago
We have a slack emoji for it called greenish. It's the classic AWS green checkmark with an info icon in the bottom. Apparently it's NOT an outage if you don't acknowledge it. It's called alt-uptime.

By ak2196 9 years ago
I really liked it. But when trying to add it to my HipChat group it failed to upload. Why? S3 outage, what an irony.

By foxylion 9 years ago
AWS internal lingo calls this the "green-i"

By nhumrich 9 years ago
if(true){ displayGreenCheck() }

By jrcii 9 years ago
Just went yellow
Edit: nevermind

By cheeze 9 years ago
Did it? Still fields of green for me.

By schneidmaster 9 years ago
While keeping the status green for s3, they have at least put up a notice at the top:
Increased Error Rates
We are investigating increased error rates for Amazon S3 requests in the US-EAST-1 Region.

By malloci 9 years ago
Yeah I just now saw that. Probably regional cache clearing or something.

By schneidmaster 9 years ago
Still green for me

By matthuggins 9 years ago
[deleted]

By 9 years ago
Just went yellow
Increased Error Rates
We are investigating increased error rates for Amazon S3 requests in the US-EAST-1 Region.
https://status.aws.amazon.com/

By leesalminen 9 years ago
Check individual services ...
Amazon Simple Storage Service (US Standard) Service is operating normally

By nkozyra 9 years ago
Well, at least our decision to split services has paid off. All of our web app infrastructure is on AWS, which is currently down, but our status page [0] is on Digital Ocean, so at least our customers can go see that we are down!
A pyrrhic victory... ;)
[0] - http://status.hrpartner.io
EDIT UPDATE: Well, I spoke too soon - even our status page is down now, but not sure if that is linked to the AWS issues, or simply the HN "hug of death" from this post! :)
EDIT UPDATE 2: Aaaaand, back up again. I think it just got a little hammered from HN traffic.

By cyberferret 9 years ago
When even the status page is down, panic.

By jariz 9 years ago
Could be worse, your entire infrastructure could be hosted on Heroku.
You don't use S3 but because they do, your entire infrastructure crumbles.

By AtheistOfFail 9 years ago
I didn't realize Heroku used s3 until today, when my heroku app failed. Makes me wonder why I'm using heroku instead of just aws.

By bananabill 9 years ago
If you're looking for the simplicity of Heroku but want to run on raw AWS check out Convox. It's open source and free to try.
https://convox.com
Disclosure: I'm one of the cofounders

By ddollar 9 years ago
Heroku is a rewrap of AWS for simplicity.

By AtheistOfFail 9 years ago
You're paying Heroku to not have to think about deployment or scale, which is also why their marketplace is successful - who wants to think about managing a database, when two clicks and a few environment variables later you can have one. Heroku is great for devs who don't want to think about ops and can afford to throw money at the problem (it gets really expensive really fast).

By eddieroger 9 years ago
Skyliner is an option if you're doing real production stuff.

By sachinag 9 years ago
I don't see why this is being downvoted. It's a pretty legitimate concern.

By lambdasquirrel 9 years ago
esp since my side project on heroku which uses nothing is down because of this.

By 0xCMP 9 years ago
The biggest change heroku needs to make is support different regions.

By ExactoKnight 9 years ago
HTTP 500 :(

By insomniacity 9 years ago
Plot twist: Digital Ocean is secretly hosted on AWS

By crack-the-code 9 years ago
shhhhhhhhhhh!!!

By neom 9 years ago
LOL...I just got notice that our status server is down now too! Maybe DO is just a rebranded offshoot of AWS after all... :D

By cyberferret 9 years ago
FYI to S3 customers, per the SLA, most of us are eligible for a 10% credit for this billing period. But the burden is on the customer to provide incident logs and file a support ticket requesting said credit (it must be really challenging to programmatically identify outage coverage across customers /s)
https://aws.amazon.com/s3/sla/

By gmisra 9 years ago
The 10% savings of ~$10 does not compare to time/potential business lost, but thanks for the tip :)

By primitivesuave 9 years ago
> potential business lost
My startup's op team had a great discussion today because of this that basically boils down to "if we hit our sales goals, an incident like this a year from now would end our company".
Looks like our plans to start prepping for multi-cloud support will be a higher priority.

By mabbo 9 years ago
> an incident like this a year from now would end our company
I'm genuinely curious, what kind of business are you in that a four hour outage would end the company? High frequency trading or something?

By omni 9 years ago
You're in the right ball park- services for traders and brokers in the finance industry. A two hour outage during trading hours would be an extinction level event.

By mabbo 9 years ago
not op, but when i worked in the ticketing business if this happened during a big on sale we would get hosed.

By itake 9 years ago
A low-cost first step is enabling cross-region replication [1].
[1] http://docs.aws.amazon.com/AmazonS3/latest/dev/crr.html

By kevan 9 years ago
I'd be willing to bet that the effort you put into a multi-cloud solution will be more expensive than you think, and far more brittle in the event of an emergency. It always is.

By owenmarshall 9 years ago
You're not wrong, but if your customers can't afford an outage, you can't afford not having a fallback plan. Probably worth a few trial runs during low volume hours to be sure.

By mabbo 9 years ago
[deleted]

By 9 years ago
[deleted]

By 9 years ago
thats for below 99.9% - they are at 99.997% .. you are never getting that 10% credit..

By machbio 9 years ago
0.1% of 28 days is 40 minutes, so it seems likely to happen.

By christop 9 years ago
I was calculating it for a year - maybe the availability applies to per billing cycle - you may be correct..

By machbio 9 years ago

You got your orders of magnitude wrong ;)

    99.9964583 = 100 - 153/(30*24*60)
    99.6458333 = 100 * (1 - 153/(30*24*60))

By joatmon-snoo 9 years ago

I was calculating it for a year - maybe the availability applies to per billing cycle.. I am still not able to understand your math - mind explaining

By machbio 9 years ago
Your numbers are still off for a year.
```
    99.9997089 = 100 - 153/(365*24*60)
    99.9708904 = 100 - 100*153/(365*24*60)
```
The formula is 100% minus 100% times downtime/time in month/year.
153 is the number of minutes they were down going off the reported updates at https://status.aws.amazon.com/ - 11:35AM PST was when they fixed the status page, 2:08PM PST was when S3 was fully back online. (And 153 is underestimating it, because there were errors going on for long before they fixed the status page, but I don't have timestamps on that.)

By joatmon-snoo 9 years ago
From Amazon: https://twitter.com/awscloud/status/836656664635846656
```
    The dashboard not changing color is related to S3 issue.
    See the banner at the top of the dashboard for updates.
```
So it's not just a joke... S3 being down actually breaks its own status page!
By geerlingguy 9 years ago
For this kind of page it might be best for them to use a data URI image to remove as many external resources as possible.

By etler 9 years ago
Unicode characters would work fine, and be even smaller.
Warning sign, octagonal sign, no Entry (all filtered by HN).
There are plenty of possibilities.

By Symbiote 9 years ago
I was thinking they should host the little green check mark icons on s3.

By JangoSteve 9 years ago
Thank god I checked HN. I was driving myself crazy last half hour debugging a change to S3 uploads that I JUST pushed to production. Reminds me of the time my dad had an electrician come to work on something minor in his house. Suddenly power went out to the whole house, electrician couldn't figure out why for hours. Finally they realized this was the big east coast blackout!

By jliptzin 9 years ago
Disadvantage of being in the detail I guess. My thinking was Imgur seems broken today >>> Something major on the intertubes must be fk'd.

By Havoc 9 years ago
Precisely how I discovered it. Imgur down. Imgur is almost like a piece of critical Internet infrastructure. That + some other site misbehaving tipped me off that something very wrong is happening...

By TeMPOraL 9 years ago
irc.freenode.net / ##aws (must be registered with nickserv to join)
outage first reported around 11:35CST.

By stevehawk 9 years ago
Corporate language is entertaining while we all pull out our hair.
"We are investigating increased error rates for Amazon S3" translates to "We are trying to figure out why our mission critical system for half the internet is completely down for most (including some of our biggest) customers."

By ethanpil 9 years ago
Coincidence?
https://twitter.com/homakov/status/836649802842591232
I've been fuzzing S3 parameters last couple hours...
And now it's down.

By maxerickson 9 years ago
[deleted]

By 9 years ago
All: I hate to ask this, but HN's poor little single-core server process is getting hammered and steam is coming out its ears. If you don't plan to post anything, would you mind logging out? Then we can serve you from cache. Cached pages are updated frequently so you won't miss anything. And please do log back in later.
(Yes it sucks and yes we're working on fixing it. We hate slow software too!)

By dang 9 years ago
"I felt a great disturbance in the Force, as if millions of voices suddenly cried out in terror, and were suddenly silenced. I fear something terrible has happened."

By greenhathacker 9 years ago
Down for us as well. We have cloudfront in front of some of our s3 buckets and it is responding with
```
    CloudFront is currently experiencing problems with requesting objects from Amazon S3.
```
Can I also say I am constantly disappointed by AWS's status page: https://status.aws.amazon.com/ it seems whenever there is an issue this takes a while to update. Sometimes all you see is a green checkmark with a tiny icon saying a note about some issue. Why not make it orange or something. Surely they must have some kind of external monitor on these things that could be integrated here?
edit: Since posting my comment they added a banner of
"Increased Error Rates
We are investigating increased error rates for Amazon S3 requests in the US-EAST-1 Region."
However S3 still shows green and "Service is operating normally"

By chrisan 9 years ago
To mitigate the effect of S3 going down I use cross-region replication to replicate objects to another S3 region. If S3 went down in my primary region I could update the app config to write back to the backup region and update the CDN configuration to use the backup region as an origin.
I did that out of paranoia but it turns out this could happen to us. Does that sound like a sensible approach?
Fortunately all my company's stuff is in eu-west-1 which still seems to be fine.

By gtsteve 9 years ago
I'm certainly not an expert here, but just to make you feel good (if nothing else), this is exactly what we did this morning (Melbourne time). Woke up to a bunch of flailing Lambda funcs on us-east-1. Luckily we're using Apex so cross deploying them to Singapore took all of 30 seconds. We were concerned about API Gateway since it was also sitting in us-east-1 but ended up not being an issue. Realized that redeploying S3 and Lambda across regions can be done in practically no time, but we would have been in trouble had we needed to replicated our APIs in another region. Going to start exploring spec'ing up our existing gateways in swagger to help with this.

By kaishiro 9 years ago
Sysadmin: I can forgive outages, but falsely reporting 'up' when you're obviously down is a heinous transgression.
Somewhere a sysadmin is having to explain to a mildly technical manager that AWS services are down and affecting business critical services. That manager will be chewing out the tech because the status site shows everything is green. Dishonest metrics are worse than bad metrics for this exact reason.
Any sysadmin who wasn't born yesterday knows that service metrics are gamed relentlessly by providers. Bluntly there aren't many of us, and we talk. Message to all providers: sysadmins losing confidence in your outage reporting has a larger impact than you think. Because we will be the ones called to the carpet to explain why <services> are down when <provider> is lying about being up.

By johngalt 9 years ago
People were joking about this but it turns out to be true: they host the status icons on their service: https://twitter.com/awscloud/status/836656664635846656

By carbocation 9 years ago
Due to HN's flaky Cloudflare 503 Bad Gateway error, I noticed that Cloudflare is also being affected by S3 being down in a similar but subtle way. See their status page's broken logo on the upper left hand corner.[1] It was actually directly linking to a S3 URL: https://s3.amazonaws.com/statuspage-production/pages-transac...
[1]: https://www.cloudflarestatus.com/

By devy 9 years ago
https://www.cloudflarestatus.com/ http://www.trellostatus.com/
Looks like they are both using the same solution for their status pages. The icon for trellostatus did also fail to display.

By nso 9 years ago
Was HN affected by Cloudbleed?
I would rather access HN without Cloudflare as man-in-the-middle, especially over HTTPS.

By frik 9 years ago
All websites that use Cloudflare were potentially affected by Cloudbleed which is what makes it such a terrible thing.
You will never know the exact damage, the only thing you can do to play it 100% safe is to rotate all credentials on sites using Cloudflare.
And you can't access HN without going through Cloudflare (unfortunately, but HN is having a hard enough time to keep up with traffic as it is, without Cloudflare it would perform a lot worse than it does).

By jacquesm 9 years ago
Hahaha too bad.
Google Analytics, Cloudflare, AWS, those are things you can never escape from.

By shp0ngle 9 years ago
Disagree on Google Analytics.

By jacquesm 9 years ago
EDIT: Misread parent.

By qeternity 9 years ago
Because your status page shouldn't depend on your own infrastructure. Literally the problem Amazon is having right now.

By dsl 9 years ago
[deleted]

By 9 years ago
I guess they don't want to host their status page on their own CDN in case it went down too.

By chinhodado 9 years ago
But only the logo image hosted on S3 is broken though. It seems to be preventable if they host the logo image together with their status page.

By devy 9 years ago
[deleted]

By 9 years ago
[deleted]

By 9 years ago
Saw that too, sounds like a convenient excuse for being caught in a lie.
AWS Employee #1: Hey, people are catching on that our status page isn't accurate
AWS Employee #2: Tell them it's cause of S3

By swearfu 9 years ago
I'd suspect the humiliation of hosting your own status page on the infrastructure it's monitoring would far outweigh the "lie".

By joshmanders 9 years ago
People are much more forgiving of mistakes than they are of deception.

By swearfu 9 years ago
The icons aren't hosted there (or if they are, they are cached). https://status.aws.amazon.com/images/status3.gif
The status information is hosted there.

By paulddraper 9 years ago
Looks like the dashboard is fixed https://twitter.com/awscloud/status/836662601090134017

By samaysharma 9 years ago
The "red check mark is stored on S3" may have been sarcasm, but apparently there was a kernel of truth to it?!
Poor show when a service disruption means the status page can't be updated....

By JimboOmega 9 years ago
[deleted]

By 9 years ago
While status icon being hosted on S3 is funny, I think it's more likely that it's not the icon itself that caused the status page to not getting updated, but rather the fault of service information (say, a JSON file) that used to generate the status page that was stored on S3. The banner could probably be configured locally, so they choose to update that for the time being (e.g. while moving the status bucket somewhere else).

By sirn 9 years ago
> Update at 11:35 AM PST: We have now repaired the ability to update the service health dashboard.

By buryat 9 years ago
I like how HN (and others) handle this - there should be a static link to a 3rd party source, like a twitter feed, at the top of any status page.

By jpwgarrison 9 years ago
if that simple, why the text desc for Details also didn't reflect the incident?

By taobility 9 years ago
Is there any service that distributes your files to multiple cloud services at the same time? With this recent S3 outage, I'm now feeling uneasy to store files on S3 for mission critical apps.

By jaequery 9 years ago
The outage was on us-east-1. If you are hosting mission-critical files in a single region, S3 is not the problem.

By outworlder 9 years ago
To be fair, most people don't use the region replication thing for S3. Of course this is why I push folks to use GCS's default Multi-Regional buckets, because when a regional outage does occur, it's awfully nice to at least have the option to spin up elsewhere. If your critical data is in Virginia today, all you can do is wait.
Disclosure: I work on Google Cloud, and we've had our fair share of outages.

By boulos 9 years ago
I believe that IPFS together with Filecoin is intended to be something like this in a broader, free market sense. Unfortunately IPFS is probably far from ready for mission critical apps and Filecoin hasn't launched at all.

By RehnoLindeque 9 years ago
ins3ption

By kangman 9 years ago
It's unbelievable that the status page is still showing green checkmarks, almost what, 2 hours into the outage?
edit: oh, it is actually because of the outage! So if they can't get a fresh read on the service status from s3, they just optimistically assume it's green... even though the service failing to provide said read... is one of the services they're optimistically showing as green XD

By MaxfordAndSons 9 years ago
Hey can't change it due to the S3 issue. See their twitter post: https://twitter.com/awscloud/status/836656664635846656

By purplecones 9 years ago
Wow so they can't update the S3 status page currently due to S3 issues, including the status page processing to update it, which runs upon S3.
That raises many more questions about how well accounted outages have been in the past and equally reported. Then the design aspect that in itself highlights if you run things in the cloud, what fallback do you have if that goes wrong. So certainly the impact from this outage is going to echo for a while, with many questions being asked.

By Zenst 9 years ago
Smells BS. As pointed out in https://news.ycombinator.com/item?id=13757284, text should have reflected the real situation. So the icons are either not the only problem or just an excuse.

By smarx007 9 years ago
It seems status pages should be on entirely independent infrastructure, give the criticality of the information they provide. Perhaps even a separate domain.

By douglasfshearer 9 years ago
Same related flaw as Three Mile Island. Fail closed and measure the output, not the intent.

By sitkack 9 years ago
> Because we [sysadmins] will be the ones called to the carpet to explain why <services> are down when <provider> is lying about being up.
But isn't that the whole point of lying: to the less technical manager (often the only person whose view matters at major customers), the status board saying "up" means the problem is the sysadmins, not the vendor.

By dragonwriter 9 years ago
That works in the vendor's favor in the short term, but can screw them in the long term because you get staff who go the extra mile to avoid the vendor in the future, including structuring requirements to avoid them.
For example, by experience and gossip I know Wind stream has awful reliability, but they handwave that away. By including a requirement I knew they couldn't meet (dynamic E911), they were knocked out of a 200 site VoIP RFP early.

By mjcl 9 years ago
Hurry, look now, so you can tell your grandchildren!!!
Greenish ELB, RDS.
Yellow EC2, Lambda.
Red S3, Auto Scaling.
EDIT: A few dozen services in us-east-1 are down/degraded.

By paulddraper 9 years ago
> but falsely reporting 'up' when you're obviously down is a heinous transgression.
When SLA's are in play and so are job performance scores and bonuses there is probably a strong incentive to fudge numbers. It can be done officially ("Ah but sub-chapter 3 of chapter X in the fine print explains this wasn't technically an outage") or unofficially.

By rdtsc 9 years ago
When I worked in Antarctica any outage affecting users that lasted over 50 minutes was considered an official "outage" and had to be reported to mission command. So of course ALL maintenance was rolled back/backed out if it came anywhere even close to 50 minutes, just so we wouldn't have to fill out the stupid outage paperwork.

By vocatus_gate 9 years ago
Thank you for the insight. Could you and/or any sysadmin on here elaborate on what a "nail in the coffin" situation might look like? For example, is this current outage with inaccurate status updates enough to seriously consider migrating to another CDN provider? If so, which one would you migrate to?

By primitivesuave 9 years ago
Disclaimer, not a job-toting sysadmin quite yet, but here's my 2¢:
- Architectural SPOFs (single points of failure) need to be carefully weighed up in any design, and "ALL our files are on $single_provider" is one such huge red flag. Unfortunately these considerations are all too frequently drowned out by the ease of going with the least path of resistance.
For example GitHub occasionally goes down, which breaks a remarkable amount of infrastructure: a huge number of people don't know how to use Git, do full clones from scratch each time, and have no idea how to work without a server (even though Git is built to work locally); CI systems tend to want to do green-field rebuilds, so start out with empty directory trees and need to do full clones each build (I'm not sure if any CI systems come with out-of-the-box Git caching); GH-powered authentication systems fall apart; etc. Kinda crazy, scary and really annoying, but yeah.
In terms of "nail in the coffin", that depends on a lot of factors, including a subjective analysis of how much local catastrophe was caused by the incident; subjective opinions about the provider's reaction to the issue, what they'll do to mitigate it, perhaps how transparent they are about it; etc.
Ultimately, the Internet likes to pretend that AWS and cloud computing is basically rock-solid. Unfortunately it's not, and stuff goes down. There were some truly redundant architecture experiments in the 80s (for example, the Tandem Nonstop Computer, one of which was recently noted to have been running continuously for 24 years: https://news.ycombinator.com/item?id=13514909) but x86 never really went there, and superscalar computing is built on a sped-up version of the same ideas that connect desktop computers together, so while there are lots of architectural optical illusions, well, stuff falls apart.
- Everyone in this thread is talking about Google Compute Engine, but it really depends on your usage patterns and requirements. GCE is pretty much the single major competitor to AWS, although the infrastructure is _completely_ different - different tools, different APIs, different pricing infrastructure. The problem is that it's not like like MySQL vs PostgreSQL or Ubuntu vs Debian; it's like SQL vs Redis, or Linux vs BSD. Both work great, but you basically have to do twice the integration work, and map things manually. With this said, if you don't have particularly high resource usage, VPS or dedicated hosting may actually work out more cost-effectively.
TL;DR: you go back to the SPOF problem, where _you_ have to foot the technical debt for the reliability level you want. Yay.

By i336_ 9 years ago
This is why I always set up my own monitoring for services in addition to the provider's status page. Simple SmokePing graphs have saved me a ton of time when it comes to troubleshooting provider outages. It especially helps when I can show them exactly when there are problems.

By discreditable 9 years ago
Why is always the manager that is the bad guy in these scenarios? Haven't we grown up yet?

By jnordwick 9 years ago
The manager is not the bad guy. They are doing everything they should do in the scenario I presented. Checking into an outage affecting a critical system. Criticizing the sysadmin's findings based on the evidence that Amazon's status page disagrees. I don't expect a non-technical party to believe me over Amazon.
The bad guys are the providers who report false positives to preserve metrics.

By johngalt 9 years ago
Just commenting here because hopefully people can see: AWS status page updated: 1:44 CST

By rabidonrails 9 years ago
[deleted]

By 9 years ago
Any is always the manager that is the bad guy in these scenarios? Haven't we grown up yet?

By jnordwick 9 years ago
[deleted]

By 9 years ago
[deleted]

By 9 years ago
It's not a lie, it's an "alternative fact" about how totally like awesome AWS is!

By rdiddly 9 years ago
They don't show it on the status dashboard at https://status.aws.amazon.com/ (at least at the time I originally posted this comment).
But if you go to your personal health dashboard (https://phd.aws.amazon.com/phd/home#/dashboard/open-issues) they report an S3 operational issue event there.
Edit: Mine is reporting region us-east-1
Edit 2: And now the event disappeared from my personal health dashboard too. But we are still experiencing issues. WTH.

By jrs235 9 years ago
Also seeing the intermittent event on the personal dashboard. Wonderful.

By chickenfries 9 years ago
Apparently they need a special/separate status page for everyone's personal health dashboard too. SMH.

By jrs235 9 years ago
[deleted]

By 9 years ago
It's interesting to note the cascading effects. For example, I was immediately hit by three problems:
* Slack file sharing no longer works, hangs forever (no way to hide the permanently rolling progress bar except quitting)
* Github.com file uploads (e.g. dropping files into a Github issue) don't work.
* Imgur.com is completely down.
* Docker Hub seems to be unavailable. Can't pull/push images.

By atombender 9 years ago
A doctor's office that's unable to process patients due to the outage:
https://mobile.twitter.com/drjincali/status/8366578638879989...

By mcphilip 9 years ago
I mean, that's not really AWS's problem, is it? Outages happen. If you have a mission-critical service like health care, you really shouldn't write systems with single points of failure like this, especially not systems that depend on something consumer-grade like S3.
This appears to be a normal doctor's office where there are routine appointments. Emergencies would be referred to the ER anyway. And while I obviously don't know the details of how his office is run, you'd think that you could get by on a pen-and-paper fallback to manage the office. Maybe that's an advantage to keeping experienced office staff on board.

By cookiecaper 9 years ago
I work in the healthcare industry and there's a big push from AWS into offering HIPAA compliant services for things like patient records. It's becoming much more common to tie in third party services into electronic healthcare software. Obviously no mission critical system should have a single point of failure and doctor's offices should have fallback plans for handling service outages, but most care providers don't have staff onsite with the technical expertise to understand the extent of the coupling. I'm just closely watching this space and found that tweet interesting in relation to the parent comment's remark about realizing the scope of this S3 outage. There's no blame unique to AWS here, but it is becoming an increasingly important piece of plumbing in the industry.

By mcphilip 9 years ago
Fine. But that's just buying into one of the very common misconceptions about AWS (or any hosting provider), no? This idea that Amazon sells a fault tolerant product. They don't. Amazon sells you tools that can make a fault tolerant product, but making your own product resilient is entirely upon you.

By inferiorhuman 9 years ago
Whatever software that doctor is using should have been built with offline capability (local storage, syncs to servers when network connectivity is restored).

By ArlenBales 9 years ago
On the upside.. no one is stealing more data from the CloudPets thing we were talking about earlier today.

By monksy 9 years ago
We're still up and running (Sync.com) if you need to share some files.

By jasonsync 9 years ago
AWS - too big to fail.

By cylinder 9 years ago
My own service was running just fine, until I tried to push out a (fairly critical) bugfix. Sadly, my deploy procedure relies on my build server pushing docker images to quay.io, and prod servers pulling them back down from quay.io, and quay.io hosts their Docker registry in S3. Time to make some apologies and excuses to my users...

By azernik 9 years ago
* Hipchat file sharing no longer works, hangs forever * CircleCI cannot access artifacts

By Ph4nt0m 9 years ago
I was also trying to open gotomeeting web client, no luck.

By kevin2r 9 years ago
Zoom was down too.

By smackfu 9 years ago
Heroku has issues too.

By chrisper 9 years ago
i was unable to purchase cat food at my local pet store with a credit card because their POS software was down

By pweissbrod 9 years ago
what's truly incredible is that S3 has been offline for h̶a̶l̶f̶ ̶a̶n̶ ̶h̶o̶u̶r̶ two hours now and Amazon still has the audacity to put five shiny green checkmarks next to S3 on their service page.
they just now put up a box at the top saying "We are investigating increased error rates for Amazon S3 requests in the US-EAST-1 Region."
increased error rates? really?
Amazon, everything is on fire. you are not fooling anyone
edit: in the future, please subscribe to @MyFootballNow for timely AWS service status updates https://pbs.twimg.com/media/C5xdm9_WMAAY7y_.jpg:large

By fletom 9 years ago
@mikecb on Twitter explained it well. "The red icon is stored in S3 US East."

By idlewords 9 years ago
Apparently this is not a joke:
“The dashboard not changing color is related to S3 issue. See the banner at the top of the dashboard for updates.”
https://twitter.com/awscloud/status/836656664635846656

By jasoncrawford 9 years ago
I think that might be my favourite tweet this year so far.

By noir_lord 9 years ago
There are some real gems @Pinboard too: "Green checkmark = no lava in data center. Green checkmark with information icon = data center filling with lava https://status.aws.amazon.com"

By mpetrovich 9 years ago
FYI, in seriousness you can see the fabled red status icon here:
https://status.aws.amazon.com/images/status3.gif
It does exist, apparently.

By paulddraper 9 years ago
It showed up during the big DynamoDB outage last year.

By rsynnott 9 years ago
Is the manager of that group still working there post-outage?

By busterarm 9 years ago
I thought this was a joke, but apparently not too far off:
"The dashboard not changing color is related to S3 issue. See the banner at the top of the dashboard for updates."
https://twitter.com/awscloud/status/836656664635846656

By tlrobinson 9 years ago
[deleted]

By 9 years ago
[deleted]

By 9 years ago
While that may be true, that's not the reason you're seeing green. You should have been seeing a broken image or a status page not finishing loading if that was an issue.

By scarlac 9 years ago
https://en.m.wikipedia.org/wiki/Sarcasm

By eknkc 9 years ago
Or so we thought...
https://twitter.com/awscloud/status/836656664635846656

By eherot 9 years ago
The best jokes always have a grain of truth.

By mikecb 9 years ago
If this is how they're going to handle an outage of their premier AWS service, it's other cloud providers that will be seeing green.

By koolba 9 years ago
Funny, cloudflare is also having trouble, page only showed on the third request.
Perfect storm.
As for other cloud providers seeing green: Or maybe people will come to their senses and will see that monocultures are bad, whether in biology or hosting.

By jacquesm 9 years ago
> Funny cloudflare is also having trouble, page only showed on the third request.
I bet they're related. The moment I got an alert of the S3 outage I started refreshing a bunch of status pages at a feverish pitch. Multiply that by a thousands of others doing the same and boom you've got the equivalent of a DDOS.

By koolba 9 years ago
I've been getting intermittent Bad Gateways on HN for the last few days.
Ray ID: 33863460edf54231

By maxerickson 9 years ago
That was (obviously) sarcasm :)

By general_failure 9 years ago
Humour is wasted on the internet.

By tonyedgecombe 9 years ago
or was it? dun dun

By mikecb 9 years ago
(Disclaimer: I work for AWS.)
The dashboard is not changing color due to the S3 issue. We're updating the banner in place of that.
Edit: Update at 11:35 AM PST: We have now repaired the ability to update the service health dashboard. The service updates are below. We continue to experience high error rates with S3 in US-EAST-1, which is impacting various AWS services. We are working hard at repairing S3, believe we understand root cause, and are working on implementing what we believe will remediate the issue.
http://status.aws.amazon.com/

By ckozlowski 9 years ago
For some reason, reading about "believe we understand root cause" made me think of: "A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable."

By Ph4nt0m 9 years ago
Maybe you could encourage your colleagues to host the status page outside of AWS?

By perlgeek 9 years ago
We'll have to wait for the postmortem, but I bet it was an unintentional dependency on S3 that no one realized had come into place until S3 went down -- especially considering how fast they were able to remove the dependency and fix it.

By oxguy3 9 years ago
This reminds me of a GitHub outage from them having a build dependency on GitHub. IIRC, they tried to roll back to building a prior version but since the site was offline, the build failed.

By koolba 9 years ago
S3 gets used to store a lot of static content. Can't speak for that team, but I'm sure they'll take that feedback. Happy the banner functionality remained unimpeded.

By ckozlowski 9 years ago
Possibly AWS status page wisely relied on a third party, which relied on a fourth party, which relied on S3.

By all_usernames 9 years ago
It took them ~30 minutes.

By idlewords 9 years ago
It took them 2 hours actually.

By LunaSea 9 years ago
Maybe with GCP? :)

By johansch 9 years ago
I'm happy to offer up some spare space on my godaddy hosted Linux plan if that helps...

By apapli 9 years ago
Could you go more in depth? What does S3 have anything to do with it?

By danappelxx 9 years ago
I think the most reasonable guess is that they have some backend system that continously pushes some status json/xml file to an S3 bucket.
Then there's the frontend, that apparently periodically reads this file from S3 and caches the results.
I guess the comment they added on the top after two hours of being in the dark was likely manually added to the web frontend.
Obviously all of this would be hilariously badly designed if it was made this way. Still...

By johansch 9 years ago
It's where they store the error icon.

By unfunco 9 years ago
So the "working" icon works, but the "not working" doesn't? I'm not sure that's right.

By danappelxx 9 years ago
Tragic comedy gold

By vocatus_gate 9 years ago
[deleted]

By 9 years ago
What we need are status pages that are driven by votes from verified customers, which could also serve to inform the provider about issues.
This would address issues that are only visible from the outside.

By stretchwithme 9 years ago
And system monitoring which isn't dependent on itself. Kind of a "duh" kind of thing...

By laughfactory 9 years ago
http://outage.report/ does this pretty much, except for the "verified customers" part.

By ATsch 9 years ago
> Amazon, everything is on fire. you are not fooling anyone
Fun story, when I was an intern at Amazon there was actually a warehouse fire. The result was a lot of manual database entry updating as products were determined to be destroyed or still fit for sale.

By j2kun 9 years ago
I'm curious about what happened to products that were no longer fit for sale, but still fit for use. Do you recall?

By wildmusings 9 years ago
In the military, a warehouse fire or equivalent suddenly generates a ton of "backdated transfer requests" showing that various stock had been sent to the warehouse just previously!

By robaato 9 years ago
This sounds like rank corruption. Surely such a thing is rare in the military?

By dTal 9 years ago
To be fair, there's a plausible explanation for what robaato describes that doesn't involve corruption. Suppose it's standard or common to move things first and then file such "backdated transfer requests". After a fire that destroys everything in a warehouse, there would be a flurry of activity to quickly account for everything that was destroyed, so paperwork that would otherwise have trickled in over a month or two might suddenly be hurriedly filed in a few days.

By wildmusings 9 years ago
It could just be lackadaisical administration that only gets urgently addressed when there is something perceived as a particular problem.
The military is not exactly known for being great at keeping track of things that aren't nuclear weapons, and sometimes falls short even on those.

By dragonwriter 9 years ago
[deleted]

By 9 years ago
There is "Amazon Warehouse Deals", Amazon itself acting as a used products seller on Amazon. This is usually used for returns etc., but I wouldn't be surprised if they also handle something like this.

By hobofan 9 years ago
There are businesses that specialize in remaindering fire-damaged goods -- mostly stuff that smells like smoke.
They showed up in my town in the early 1980s after one of our local malls had a smokey fire. They sold a bunch of stuff that came from other places, too, including a ton of 15mm miniature soldiers.

By greglindahl 9 years ago
The AWS Status page will lie to you: https://medium.com/@ev.dev.dev/the-aws-status-page-will-lie-...

By evtothedev 9 years ago
I like how this post says "if you look at the AWS Status Page, this what you see". but you can't see the image. because S3 is down.

By fletom 9 years ago
I thought this was funny so I took a screenshot of the blog post and uploaded it to the company Slack. The upload failed because Slack uses S3.
This is getting crazy.

By brianpgordon 9 years ago
If this isn't good evidence that amazon downright lies on their status page and that no green checkmark should ever be considered trustworthy, I don't know what is.

By kevin_b_er 9 years ago
Why is the status board hosted on AWS? Most providers host such pages on a 3rd party, specifically for this reason; correlated failure.

By twistedpair 9 years ago
> edit: in the future, please subscribe to @MyFootballNow for timely AWS service status updates https://pbs.twimg.com/media/C5xdm9_WMAAY7y_.jpg:large
So this is what centralization looks like.

By rinze 9 years ago
I hear that their process for updating the status page involves S3.

By skywhopper 9 years ago
it would appear that you are correct
"The dashboard not changing color is related to S3 issue. See the banner at the top of the dashboard for updates." https://twitter.com/awscloud/status/836656664635846656

By fletom 9 years ago
I very seriously thought that it was a joke to say that S3 was needed to show the red icon, but apparently they can't update the dashboard about the status of S3 because of S3.

By kevin_b_er 9 years ago
So... https://twitter.com/awscloud/status/836656664635846656
"The dashboard not changing color is related to S3 issue. See the banner at the top of the dashboard for updates."

By knaik94 9 years ago
the non-green icon is probably hosted on s3 (i'm not trying to be funny)

By gtrubetskoy 9 years ago
[deleted]

By 9 years ago
[deleted]

By 9 years ago
"Update at 11:35 AM PST: We have now repaired the ability to update the service health dashboard."
Yep.

By whafro 9 years ago
I love it, like that fixes the problem! ..now fix the REAL problem

By _ao789 9 years ago
[deleted]

By 9 years ago
[deleted]

By 9 years ago
I don't think they intentionally kept the checkmarks there. They probably just didn't update it as quickly as developers made a post on Hacker News (not surprising, they were probably investigating).

By artursapek 9 years ago
After having seen multiple AWS outages/service disruptions, with nothing other than a green checkmark ever showing, I am now very confident that the checkmarks are hardcoded and there is no logic behind them.

By hobofan 9 years ago
It's already been confirmed by amazon employees on HN that the color can only be changed manually by an employee and it needs a high level of approval.
Also, there are incentives based on colors, so the managers really don't want to admit any failure.

By user5994461 9 years ago
While this is a personal feeling and I don't have any data (metrics) to back it up: I think a large percentage (and probably a majority) of metrics don't end up helping a company once they are created - especially if any salary or bonuses are based on them. They are always so gamed that they become worthless.
This is a great case in point if true.

By bdavisx 9 years ago
Your point hits on a true thing. One problem is that companies measure proxies for performance, not performance itself. A great book on the topic (and related topics) is Weapons of Math Destruction. Anyway, a green checkbox is pretty far into proxie-land. It's not very closely related to client retention or profitability, and now we see it's not even related to operational time of the equipment. Yikes. So a proxy like this is not even worth using as a metric; it can only cause false confidence that some information is known, and that leads to bad decisions. Not the least of which is bonusing incompetent managers.

By rab-the-goat 9 years ago
Amazon has metrics so they can tell a story, not so they can measure things.
As a cute example, one of their senior people (in a stats heavy role) couldn't explain how they'd detect if people wanted to be able to automatically order socks and tshirts on a buying cycle outside of what I call the "scheduling horizon", eg every 3-6mos. (Things I need regularly, but sparsely enough it doesn't stand out to do proactively -- eg, I buy socks when they all have holes, not on a reasonable replacement cycle.)

By SomeStupidPoint 9 years ago
Yup probably some incentives due to SLA's for their larger customers.

By KnoopKnoop 9 years ago
> Also, there are incentives based on colors, so the managers really don't want to admit any failure.
A textbook case of "wrong incentives". #1 incentive should be satisfied customers.

By mschuster91 9 years ago
You can have a high level of customer satisfaction if you lie to them and arrange so that they don't even notice, and by having a good damage control strategy for when some customers do notice they aren't getting that they were promised.
Such approach has better ROI than actually doing high quality products or services, which is why so much of what we buy is utter shit. That's especially true on the mass market, when satisfaction of individual customers doesn't impact your company at all, as long as they're not complaining too loud.

By TeMPOraL 9 years ago
Are any incentives affected by customer complaints? Because if so, all we need to do is complain and they might actually start using the status system meaningfully. (I'm sure this is wilful thinking, though; I doubt people don't complain in this situation!)

By oneeyedpigeon 9 years ago
when you start investigating an outage, that is exactly when you should change your checkmark to yellow if not red. if you're as big as AWS there should not be any more than a minute or two between when your service goes down and when you actually update your status page to show that.

By fletom 9 years ago
It should actually just be automated.

By CaptSpify 9 years ago
It's not just us-east-1! They're being extremely dishonest with the green checkmarks. We can't even load the s3 console for other regions. I would post a screenshot, but Imgur is hosed by this too.

By STRML 9 years ago
IIRC the Console is mostly hosted out of US-EAST-1. Direct API calls to other S3 regions are likely to work, but it's not surprising the Console's having trouble.

By ceejayoz 9 years ago
Its unreal watching key web services fall like dominoes. Its too bad the concept of "too big to fail" applies only to large banks and countries.

By rrggrr 9 years ago
[deleted]

By 9 years ago
"To big [to allow] to fail" is what that term means.
This would extend to a service like amazon actually, where survival of the service would be an extraordinary effort in case this problem lasted for a long time.
The way you imagined it, as 100% uptime, is incorrect.

By elastic_church 9 years ago
I didn't get that at all from the OP. His comment I'm fairly certain is of the "why the heck are we centralizing the web to 2 or 3 infrastructure companies" for what amounts to a minor amount of convenience.
We've seen this story play out in other industries and it never works out well for average people. It's been astounding for me to watch the pace of this centralizing and who is helping it along.
The tldr; point is that a single service provider should not have the amount of control Amazon does over the Internet. At least that's my take.
I know my opinion on this wildly differs from the HN crowd and SV "decision makers" these days - what is so curious to me is that this is a complete 180 from that same demographic even 10 years ago.

By phil21 9 years ago
Yes: "why the heck are we centralizing the web to 2 or 3 infrastructure companies". Its a systemic asset, unregulated, in a world where every systemic asset is regulated (eg. utilities, transportation infrastructure, etc.).

By rrggrr 9 years ago
Regulations means you force your standards on other people who might not need or want them.
If you want other infrastructure companies or decentralized internet, you are free to do that yourself via voluntary means.

By witty_username 9 years ago
It's because centralization is really, really convenient when it works correctly. And most of the time, it does. It's just when it breaks it's really, really bad.

By vocatus_gate 9 years ago
The fact that we even have "key" services is absurd.

By nialv7 9 years ago
Thanks for sharing. I overheard someone on my team say that a production user is having problems with our service. The team checked AWS status, but only took notice of the green checkmarks.
Through some dumb luck (and desire to procrastinate a bit), I opened HN and, subsequently, the AWS status page and actually read the US-EAST-1 notification.
HN saves the day.

By mabramo 9 years ago
On thing I learned here. When something seems horribly wrong, check HN first, it may be "global" problem.

By foxylion 9 years ago
Same here, was eating lunch and browsing HN when hipchat started lighting up with customer complaints.

By davewritescode 9 years ago
Wow, S3 is a much bigger single point of failure than I have imagined. Travis CI, Trello, Docker Hub, ... I can't even install packages because the binary cache of NixOS is down. Love living in the cloud.

By rnhmjoj 9 years ago
Twilio call recordings appear to be missing as well.

By stri8ed 9 years ago
Trello API is up, so you can use any Trello client except the web client.

By krallja 9 years ago
Notice how Amazon.com itself is unaffected. They're a lot smarter than us.

By benwilber0 9 years ago
Browsing works but some functionality is broken; for example you can't view your order history.

By officelineback 9 years ago
I do recall reading somewhere that Amazon.com isn't actually hosted or fully leveraging on the AWS platform, mostly due to the political struggle between the AWS and the merchant department.

By yichi 9 years ago
Sales Department: "We require 100% uptime! Can you do that?"
AWS Department: "Wellll, if we don't change the status to red, it's as if we were up all the time!"

By Raphmedia 9 years ago
There are public talks on Youtube from Amazon.com titled "Drinking our own Champagne" where they say the opposite.

By p0rkbelly 9 years ago
Yeah, that was the original pitch for AWS. Engineering presentations since the initial launch have included "yeah... not quite" admissions though.

By krakensden 9 years ago
The ability is there for any company to take advantage of multiple regions. Takes time and money, but it's doable.

By bpicolo 9 years ago
And they've just broken four-9's uptime (53 minutes). They must be pretty busy, since they still haven't bothered to acknowledge a problem publicly...

By bandrami 9 years ago
Best thing about incidents like these: post-mortems for systems of this scale are absolutely fascinating. Hopefully they publish one.

By obeattie 9 years ago
Have they even acknowledged the mortem..?

By magic_beans 9 years ago
Everything is green. There are no tanks in Baghdad center.

By malchow 9 years ago
Yes.
"We've identified the issue as high error rates with S3 in US-EAST-1, which is also impacting applications and services dependent on S3. We are actively working on remediating the issue."

By obeattie 9 years ago
> We are actively working on remediating the issue.
I do love corporate-speak.
It's a rapidly oxidising waste receptacle (rather than a dumpster fire).

By noir_lord 9 years ago
Agreed. AWS' postmortems are fascinating.

By officelineback 9 years ago
This seems like an appropriate time as any... Anyone want to list some competitors to S3? Bonus if it also provides a way to host a static website.

By AndyKelley 9 years ago
Backblaze's B2 is an S3 near-clone: https://www.backblaze.com/b2/cloud-storage.html
I don't know about reliability, but it's a fraction of the price of S3.

By danohu 9 years ago
As I've said previously: I love the Backblaze folks, but B2 is apparently hosted in a single building. So that's not a question of if, but when.
Disclosure: I work on Google Cloud.

By boulos 9 years ago
In theory you are totally right but looking at the AWS it seams that the likelihood of issues due to ops complexity could be higher than the risk of a simple but single site service going down.

By qaq 9 years ago
Google Cloud Storage comes to mind. From what I recall they can host static websites as well.

By alfredxing 9 years ago
Yes, we can. You can host static content in GCS in the same way that you would using S3, or you can Google Cloud Load Balancer to do more complex setups, such as mixing static GCS content and compute URLs on the same domain.
If you're interested: https://cloud.google.com/solutions/web-serving-overview#host... https://cloud.google.com/compute/docs/load-balancing/http/us...
Disclaimer: I work on Google Cloud Storage.

By BrandonY 9 years ago
Load Balancing looks very interesting. Hopefully that'll be out of Beta sometime soon?

By PuffinBlue 9 years ago
Google Cloud Storage or its Firebase hosting version.

By edgartaor 9 years ago
An open source project minio.io implements Amazon S3 APIs, and allows you to create your own in-house S3 like infra.

By technofide 9 years ago
If its static site, put it anywhere and just cache it with Cloudflare. (turn on the always on feature)

By ruchit47 9 years ago
You also get your ssl sessions backed up on global webcaches as an added feature.

By dokument 9 years ago
What does it mean?

By angry-hacker 9 years ago
He's referring to CloudFlare's newest feature, CloudBleed: https://en.wikipedia.org/wiki/Cloudbleed

By ArlenBales 9 years ago
Time to move your stuff away from Crimeflare folks

By remx 9 years ago
To do this you'd create a 'Page Rule' that specifically tells Cloudflare to 'cache everything' for a URL pattern like https://examplesite.tld/* .
You can set a cache invalidation time too.
Always online is a slightly different feature I believe.

By PuffinBlue 9 years ago
IBM Bluemix offers file storage[NFS], block storage[iSCSI], and cloud storage [S3 API, Swift API].
we also offer a large number of boilerplates such as Flask, ASP,net, node.js
we have been making lots of changes lately check us out!
static page deploy guide: https://www.ibm.com/blogs/bluemix/2014/08/deploying-static-w...
https://console.ng.bluemix.net

By solotronics 9 years ago
And Swift.

By Entangled 9 years ago
GitLab Pages [0] is pretty good for static websites IMO, you can use GitLab CI to build more complex sites as well [1].
Disclosure: I work for GitLab
[0]: pages.gitlab.io [1]: https://about.gitlab.com/2016/12/07/building-a-new-gitlab-do...

By connorshea 9 years ago
Slightly related, FYI all of the package downloads for Gitlab are timing out with a 504 error from Cloudfront ("Cloudfront is having difficulty accessing S3". The registry is also showing an error.
I'm trying to downgrade to an older version because our install is not working but can't get the DEB unfortunately.

By trashcan 9 years ago
Looks like it's back up now, sorry about that :(

By connorshea 9 years ago
GitLab really needs to provide a way to force HTTPS for domains that have it set up. I have to manually link with `https://` everywhere.

By sotojuan 9 years ago
Yeah, we've wanted that ourselves. There's an issue for it, but it's not scheduled yet. If you know Go, feel free to take a crack at it :)

By connorshea 9 years ago
I would say use CloudFlare, but...

By jdormit 9 years ago
Use cloudron.io and install Ghost/Wordpress/Write your own. Just run the server wherever you want.

By newsat13 9 years ago
Interesting that they still host the cloudron.io site on Amazon platform
ping cloudron.io -> 54.192.7.94 -> server-54-192-7-94.dfw3.r.cloudfront.net (Amazon Technologies) [1]
[1] http://whatismyipaddress.com/ip/54.192.7.94

By puddintane 9 years ago
Correct me if I am wrong but you are pointing out that their website is on a CDN. Are you saying they need to host their website on Wordpress/Ghost? AFAIK, Ghost & Wordpress do not easily scale compared to a CDN if your site has heavy traffic (which landing pages must be engineered for as opposed to small blogs).

By newsat13 9 years ago
Dream Objects https://www.dreamhost.com/cloud/storage/

By pas 9 years ago
Netlify is great for hosting static sites. Free for non-commercial sites.

By kylemathews 9 years ago
Netlify's status page and pre-rendering service are down due to the S3 outage.
https://twitter.com/netlifystatus/status/836643259053023232

By frakkingcylons 9 years ago
Disclaimer: I work for netlify and posted that tweet.
Yup, they took those portions of our service down, but we now have redundant status page hosting setups and prerendering that is not tied to S3 (the latter is the only part of our service that was affected, and it was fixed within an hour of the outage)

By _fool 9 years ago
Any other region of AWS would also have worked around this one.

By ec109685 9 years ago
Rackspace's Cloudfiles. Does support static websites.

By josephlord 9 years ago
I use both RS Cloud Files and Google's Cloud Storage. Google's is superior in nearly every way.
The only con is that it is a Google product that could be deprecated at any point in time. But, with all the acquisition stuff happening over at RS, I'd be lying if I said I wasn't worried about them killing of their cloud offering.

By leesalminen 9 years ago
Two things:
1) Google Cloud Storage can host static websites:
https://cloud.google.com/storage/docs/hosting-static-website
2) Google Cloud Platform has a 1 year deprecation policy, which would never happen with a product that so many companies and customer rely on (Google Reader had a small but passionate base)
Disclaimer: I work on Google Cloud Platform

By deesix 9 years ago
To clarify on #2, are you saying that Google Cloud Platform in its entirety has a 1 year deprecation policy? Or that individual products within the platform have a 1 year deprecation policy? I'm not worried about Google deprecating the Storage service but that they could kill off the entire platform.
Also just wanted to say that I've been extremely happy with GCP thus far and all the services I've tried thus far have more features than RS. I really hope GCP is here for the long haul.

By leesalminen 9 years ago
Sorry, for any product on GCP, there is a 1 year deprecation policy. GCP isn't going anywhere. See the comment about Snap and Diane Greene's involvement (she is an Alphabet board member).
Disclaimer: I work on Google Cloud Platform

By deesix 9 years ago
For what it's worth, a fairly large 5 year contract between Google Cloud and Snap Inc was recently made public.

By vgt 9 years ago
I worry about everything Google hosts because they have such a track record of just randomly axing products with no or little warning.

By vocatus_gate 9 years ago
https://twitter.com/aws_shd/status/836635812020158464
"Increased API Error Rates - 9:52 AM PST We are investigating increased error rates in the US-EAST-1" "S3 operational issue - us-east-1"

By 140am 9 years ago
Yeah and still a green check mark :D

By JepZ 9 years ago
Would love to know the threshold for "Increased" -_-

By FLGMwt 9 years ago
"more than one"

By SteveNuts 9 years ago
Started a list of "things to do when S3 is down."
https://justinjackson.ca/s3/
What else should I add?

By mijustin 9 years ago
Have a philosophical debate with yourself (just within your mind) as to whether all this interweb/webternet stuff is a worthwhile pursuit...knowing that it would not survive should a comet hit the earth again...or, maybe it will? ;-)

By mxuribe 9 years ago
Yes. Go for a walk and interact with people in the community. You might meet someone interesting and learn something as well!

By vinayan3 9 years ago
My suggestion would be to rediscover the clarity and focus of thinking about systems and code on paper.

By contingencies 9 years ago
Migrate all your stuff to Cloudfront.
S3 is not a CDN!

By remx 9 years ago
Or instead of Cloudfront, you could migrate to a service provider that provides meaningful status pages.

By hobofan 9 years ago
I'm going to go do my laundry.

By beanland 9 years ago
It's winter - hit the slopes!

By coreywstone 9 years ago
I'm loving #16 right now...

By vacri 9 years ago
What kills me is that their status page still shows nothing is wrong.
https://status.aws.amazon.com/

By ethanpil 9 years ago
Their status page probably can't refresh because S3 is down.

By jeffijoe 9 years ago
correct.

By amazon_throw 9 years ago
Apple's iCloud is having issues too, probably stemming from AWS. Ironically Apple's status page has been updated to reflect the issue while Amazon's page still shows all green. https://www.apple.com/support/systemstatus/

By valine 9 years ago
I can't stream music from my iCloud library.

By brandon272 9 years ago
Wow this is a fun one. I almost pooped my pants when I saw all of our elastic beanstalk architecture disappear. It's so relieving to see it's not our fault and the internet feels our pain. We're in this together boys!
I'm curious how much $ this will lose today for the economy. :)

By dfischer 9 years ago
[deleted]

By 9 years ago
Was just pentesting it, and have some minor result. If you are using S3 browser uploads, make sure parameters you supply to Presign do not contain \n or it can lead to format injection https://s3.amazonaws.com/doc/s3-developer-guide/RESTAuthenti...
Many aws SDK libs don't remove \n for you.
(I hope it wasn't me who broke it lol)

By homakov 9 years ago
"Was just pentesting it" ... hopefully with their permission. Be careful.

By buildbuildbuild 9 years ago
It wasnt heavy pentesting, just some params jungling. No way it could cause anything :) still funny coincidence

By homakov 9 years ago
Incredible how much stuff this affected for me. Opbeat is not loading and I can't even deploy because CircleCI seems to depend on S3 for something and my build is "Queued". This seems so dangerous...

By rawrmaan 9 years ago
Hi, Beni from Opbeat here. Our community site is indeed down unfortunately, but the rest should be unaffected. Where do/did you experience issues?

By piquadrat 9 years ago
Hi Beni! My dashboard wasn't loading (the JS assets from cloudfront.net seemed to be throwing an access-control-origin error) but it is now working again. Very slow, but it works. Fortunately nothing critical is happening right now so no worries. Love your service :)

By rawrmaan 9 years ago
Ah right, static assets are an issue, didn't notice it right away due to local browser caching. Sorry about the trouble!

By piquadrat 9 years ago
circleci.com won't even load for me.

By vidarh 9 years ago
It is, of course the checkmark will stay green throughout this as Amazon doesn't care about actually letting its customers know they have a problem.

By c4urself 9 years ago
Now might be a good time to ponder a lasting solution. Clearly, we cannot trust AWS, or any other single provider, to stay up. What is the shortest, quickest to implement, path to actual high availability?
You would have to host your own software which can also fail, but then at least you could do something about it. For example, you could avoid changing things during critical times of your own business (e.g. a tradeshow), which is something no standard provider could do. You could also dial down consistency for the sake of availability, e.g. keep a lot of copies around even if some of them are often stale - more often than not this would work well enough for images.

By DenisM 9 years ago
High availability is improved by hosting in multiple AWS regions.
S3 offers alternative region replication functionality and you can use Cloudfront of another CDN to load balance between buckets

By alexbilbie 9 years ago
Yup, all of our fail overs worked flawlessly. You wouldn't even be able to tell if you didn't know.

By Cshelton 9 years ago
But do you always serve files from S3? Wasn't the entire S3 down today? Or was it just some regions? I couldn't even connect to s3.amazonaws.com ...

By DenisM 9 years ago
Only one region, us-east.

By bendbro 9 years ago
How can you use Cloudfront to point to multiple buckets? I'm pretty sure each origin needs a path and Cloudfront serves the first path that matches.

By temuze 9 years ago
[deleted]

By 9 years ago
I use a master and four mirror slaves. Then use round robin for reads. I used to have three slaves, but when two of them was down at the same time I got nervous and set up one more.

By z3t4 9 years ago
How does that work with S3? Your master and slaves are presumably just SQL databases, right? Or?

By DenisM 9 years ago
vps via different providers. think RAID but with VPS.

By z3t4 9 years ago
Quickest path? Use AWS (or similar). Outages happen. The more robust you want to be against outage, the more prep work you have to do. In order to have 'no outage', you have to do an incredible amount of work. There is no quick path.

By vacri 9 years ago
That sound you hear is every legacy hosting company firing up its marketing machine

By bandrami 9 years ago
I've always wondered why people dismiss dedicated hosting without a second thought. It's actually cheaper than AWS if you factor in all of the performance you get.

By yichi 9 years ago
Post about S3 not being a CDN hosted on an S3-powered blog:
https://jdorfman.posthaven.com/medium-bitcoin-660x493-dot-jp...
The irony

By remx 9 years ago
So S3's been down for at least 3 hours. Does AWS break this year's S3 durability & reliability promise of eleven 9s by now? [1][2]
[1]: https://aws.amazon.com/s3/details/
[2]: https://en.wikipedia.org/wiki/High_availability#Percentage_c...

By devy 9 years ago
That's durability (data loss) not availability.
Here's [1] their official SLA. This outage so far brings them to less than 3 nines of uptime this month (43.8 minutes) but still more than 2 nines (7.2 hours) so it sounds like everyone gets 10% off their S3 bill.
Very curious if Amazon will apply this automatically or only if you complain.
Edit: from further down the same page, it looks like only if you write in to support do you get these broken SLA credits. Kind of lame since everything else about their billing is so precise and automatic.
[1] https://aws.amazon.com/s3/sla/

By dcosson 9 years ago
I have gotten several credits from AWS for under a dollar due to incorrect billing calculations on their end that they caught, notified me about and sent a credit for. So I will assume yes, they do this automatically (if they don't I'd be quite surprised).

By kseifried 9 years ago
11 9's is durability not availability.

By noir_lord 9 years ago
I don't think this has anything to do with durability & reliability (they probably haven't lost data). It's about availability.

By kccqzy 9 years ago
But wait. Isn't S3 "the cloud". Everyone promised the cloud would never go down, ever. It has infinite uptime and reliability.
Well good thing I have my backups on [some service that happens to also use S3 as a backend].

By caravel 9 years ago
I know your comment is in jest, but Amazon do say their API SLA for S3 is 99.99% available [1]
[1] https://aws.amazon.com/s3/sla/

By djhworld 9 years ago
After more than an hour we are now at ~99.8%, so everybody affected should be able to claim a 10% discount, right?

By hobofan 9 years ago
Which is less than 5 minutes per month, I guess they've already used that :).

By M4v3R 9 years ago
> Everyone promised the cloud would never go down, ever.
No, they didn't. Large portions of AWS's documentation details how you, the developer, are responsible for using their tools to engineer a fault-tolerant, highly available system. Everything goes down. AWS promises varying amounts of nines everywhere, not 100%.

By ceejayoz 9 years ago
> Isn't S3 "the cloud". Everyone promised the cloud would never go down, ever.
S3 is not the cloud, it's one system running in the cloud. The cloud is not down, S3 and services dependent on (and possibly related to) it are.
One of the selling points of the cloud is that dynamically provisioned services from multiple providers enable engineering fault tolerant systems that are relatively secure against the failure of any single backend. But, yeah, if you are dependent on one infrastructure vendor's service -- particularly running in one particular region/zone -- you are probably better off than running on a single server for reliability against failures, but you aren't anywhere close to immune to failures. I don't think even cloud vendors have been particularly reluctant to make that point.

By dragonwriter 9 years ago
We used to have a word for "the cloud" back in my day. It was called "outsourcing."

By vocatus_gate 9 years ago
Who told you that? I'm not sure anyone ever told you that.

By draw_down 9 years ago
Not sure if its related or not (I'll just assume it is), but dockerhub is down as well. Haven't been able to push or pull for the last 15 minutes, some other folks complaining of the same thing.

By agotterer 9 years ago
Yea my self hosted concourse is down. Feels quite bizarre to have our internal CI so crippled by S3... though i'm still investigating, perhaps the workers are just hung after all of the failed docker pulls.

By notheguyouthink 9 years ago
Yeah, bad morning. I started by trying to pull a Docker image. When I couldn't, I tried to push some stuff to S3. Now of course I'm checking HN. :p

By jchmbrln 9 years ago
Can't pull images from either Dockerhub or Quay right now

By serialpreneur 9 years ago
confirmed: dockerhub is down too

By eggie5 9 years ago
Hi all. I came across this forum on Google. I have the same error - and it's all a bit beyond me. I'm not a techie or coder but set up Amazon S3 several months ago to backup my websites and it generally works fine - and has saved my bacon on a couple of occasions. (Also back up in Google Drive.)
As someone who's really only a yellow belt (assuming you're all black belts!), just so I understand ('cos I'm cacking myself!) ...
I'm seeing the same issue. Does this mean there's a problem with Amazon? I can't access either of my S3 accounts even if I change the region, and I'm concerned it may be something I've done wrong, and deleted the whole lot. It was working yesterday!!!
Would be massively grateful for a heads up. Thanks in advance.

By robineyre 9 years ago
Yes this is an issue with Amazon and there is little you can do besides wait until it has been resolved

By z4chj 9 years ago
> Update at 11:35 AM PST: We have now repaired the ability to update the service health dashboard. The service updates are below. We continue to experience high error rates with S3 in US-EAST-1, which is impacting various AWS services. We are working hard at repairing S3, believe we understand root cause, and are working on implementing what we believe will remediate the issue.
"Believe" is not inspiring.

By flavor8 9 years ago
From https://status.aws.amazon.com/: "Update at 12:52 AM PST: We are seeing recovery for S3 object retrievals, listing and deletions. We continue to work on recovery for adding new objects to S3 and expect to start seeing improved error rates within the hour."
(I think the AM means PM)

By samaysharma 9 years ago
And now:
> Update at 1:12 PM PST: S3 object retrieval, listing and deletion are fully recovered now. We are still working to recover normal operations for adding new objects to S3.

By boulos 9 years ago
It looks like the S3 outage is spreading to other systems or the root cause of the S3 problem is affecting different services. There are at least 20 services listed now. [1]
[1]: http://status.aws.amazon.com/

By redm 9 years ago
Yup it looks so. My console says I have zero buckets, my Lambdas are timing out and https://aws.amazon.com/ returns a big:
"500 The server encountered an error processing your request." message

By talawahdotnet 9 years ago
Our lambda functions are also unavailable. Lucky for us we didn't move any of our critical functionality to lambda yet although we are planning to once we have an EC2 backup in place...

By alanning 9 years ago
Don't know if share911 is still your product (its about page is down), but you could run your critical services on top of something like PagerDuty to give you the reliability you need.

By ec109685 9 years ago
Sometimes refreshing the console gives this error instead of showing ZERO buckets https://pbs.twimg.com/media/C5xZVGKUYAAXYGj.jpg:large

By gaia 9 years ago
Apologizes for the "me too" post:
It appears to be impacting gotomeeting, I get this error when trying to start a 12pm meeting here:
CloudFront is currently experiencing problems with requesting objects from Amazon S3.
Edit: ironically, my missed 12pm meeting was an Azure training session.

By vpeters25 9 years ago
Years ago when we launched our product i decided to use the US-WEST-2 region as our primary region and to build fail over to US-EAST-1 (Anyone here remember the outage of 2011? Yeah, that was why).
There is something to be said about not being located in the region where everything gets launched first, and where most the customers are not [imo all the benefits of the product, processes and people, but less risk].
Good luck to everyone impacted by this...crappy day.

By verelo 9 years ago
Status Pages (Services & Products affected by S3 outage)
- https://status.heroku.com/
- https://status.aws.amazon.com/
- https://medium.statuspage.io/
- https://status.slack.com/
- http://status.filestack.com/
- http://www.trellostatus.com/
- https://health.autodesk.com/
- http://status.ifttt.com/
- http://status.imgur.com/
- http://status.docker.com/

By jedicoder107 9 years ago
Pretty good list of other affected sites/services in this article: http://venturebeat.com/2017/02/28/aws-is-investigating-s3-is...
Some big names and services popular with HN mentioned there. Quora, AirBnb, SendGrid, Downdetector(heh).

By tyingq 9 years ago
<% if(service.isUp || true) { renderGreenButton() } %>

By malchow 9 years ago
Amazon outage just reported on NBC News.[1]
AMZN stock down $3.45 (0.41%).
[1] http://www.nbcchicago.com/news/national-international/Amazon...

By Animats 9 years ago
https://twitter.com/Schmidt_RB/status/836641520321179648
"I know I'm piling on here, but Amazon's stock price is a better uptime indicator than their status page. #AWS #S3 #awscloud"

By johansch 9 years ago
Just reported on USA Today.[1]
[1] http://www.usatoday.com/story/tech/news/2017/02/28/amazons-c...

By Animats 9 years ago
Fox News now has the story.[1]
[1] http://www.fox5ny.com/news/238689310-story

By Animats 9 years ago
London Daily Express [1], CBS[2] now reporting the outage.
[1] http://www.express.co.uk/life-style/science-technology/77332... [2] http://losangeles.cbslocal.com/2017/02/28/amazon-web-service...

By Animats 9 years ago
Reuters, Associated Press, and The Hill now reporting the outage.
"http://www.isitdownrightnow.com/" and DownDetector are down.

By Animats 9 years ago
> AMZN stock down
YES! Buy on rumor, sell on fact as the saying goes.

By gist 9 years ago
It's been down that level from before this started happening. Surprised this outage hasn't moved it yet.

By BWStearns 9 years ago
Target's stock price absolutely tanked today. I bet that, while the s3 outage will have a negative effect on AMZN, Target reporting way under estimates is probably creating a positive effect that evens it out.

By CephalopodMD 9 years ago
That makes sense. Hadn't heard about the Target news.

By BWStearns 9 years ago
This is why it's important to write code that doesn't depend on only a single service provider. S3 is great. But it's better to set up a Riak cluster on AWS than to actually use S3, if you can.
The only services my team uses directly are EC2 and RDS, and I'm thinking of moving RDS over to EC2 instances.
We are entirely portable. We can move my entire team's infrastructure to a different cloud host really quickly. Our only dependency is a Debian box.
I flipped the switch today and cloned our prod environment, including VPN and security rules, over to a commodity hosting provider.
Change the DNS entry for the services, and we were good to go. We didn't need to do anything because everyone was freaking out about everything else being down. But our internal services were close to unaffected.
At least for my team.
Obviously, we aren't Trello or some of the other big people affected. And we don't have the same needs they do. But setting up the DevOps stuff for my team in the way that I think was correct to begin with (no dependencies other than a Debian box) really shined today. Having a clear and correct deployment strategy on any available hardware platform really worked for us.
Or at least it would have if people weren't so upset about all our other external services being down that they paid no attention to internal services.
Lock-in is bad, mmkay?
If your company is the right size, and it makes sense, do the extra work. It's not that hard to write agnostic scripts that deploy your software, create your database, and build your data from a backup. This can be a big deal when some providers are flipping out.
All-your-junk-in-one-place is really overrated, in my opinion. Be able to rebuild your code and your data at any given point in time. If you don't have that, I don't really know what you have.

By ianamartin 9 years ago
Not knowing your situation exactly, but there could be a cost of running your own infrastructure and not taking advantage of their services? For example, are the chances of losing data higher in riak (take into account disaster and operational bugs that could result in data loss or availability issues) than in one of amazon's supported data stores.
I don't necessarily disagree with what you are saying but there is cost of doing everything yourself.
You would have been equally protected if you had been in more than one region.

By ec109685 9 years ago
Yes. There's a cost to every decision you make. Sometimes it's the cost of s3 being down. Sometimes it's the cost of some developer time to make services agnostic. The value of that isn't immediately obvious, perhaps.
But the developer cost here (my time) was worth it. Our shit wasn't down, while everyone else's was.
I also want to point out that I spent minimal time setting this up. We can deploy to GCE or commodity VPCs at a moment's notice, and that a project I did over a couple of weekends piggybacking on the ansible playbooks I wrote for AWS.
It's not that hard. You have to get your developers on board with being provider agnostic, and you have to be agnostic yourself. But it is not insurmountable.
It also help when you're the lead dev or your team and also have a good relationship with the devops guy. :)

By ianamartin 9 years ago
You still didn't address the reliability aspect. S3's durability is likely much higher than other solutions you might chose.

By ec109685 9 years ago
We're in US-West-2 and our ELBs are dropping 5XXs like there's no tomorrow. This is definitely cascading.

By vegasje 9 years ago
We're in US-West-2 and not seeing any issues. Are the instances behind your ELB trying to access S3 in their application logic?

By soccerdave 9 years ago
Nope. No S3 logic behind the scenes. A few of the ELBs are fine, and a few are not. Seems random.
The EC2 instances themselves are fine, but the affected ELBs are spitting out 500s.

By vegasje 9 years ago
All our systems are running just fine in us-west-2 right now.

By twistedpair 9 years ago
Canvas (the educational software platform) is down, and my friends/students are in bad shape now. 'sso.canvaslms.com' returns 504, assume from this S3 outage.

By huac 9 years ago
[deleted]

By 9 years ago
Anyone want to share their real experience with their reliability of Google Cloud Storage.

By etse 9 years ago
Down in US-East-1 as of 17:40 GMT. Amazon SES also down in US-East-1 as of a few minutes later.
Hearing reports of EBS down as well.

By scrollaway 9 years ago
EBS is down for 30% of my servers as well

By kureikain 9 years ago
I confirm for SES being down in US-East-1 :-(

By St-Clock 9 years ago
The status page shows a lot of yellow and red now.
From http://status.aws.amazon.com/ Update at 11:35 AM PST: We have now repaired the ability to update the service health dashboard. The service updates are below. We continue to experience high error rates with S3 in US-EAST-1, which is impacting various AWS services. We are working hard at repairing S3, believe we understand root cause, and are working on implementing what we believe will remediate the issue.

By oshoma 9 years ago
You think this is bad? Just look at what's happening in Sweden...

By FussBudget86 9 years ago
Okay, it's been a few hours and this is starting to get ridiculous. When was the last time that we had a core infrastructure outage this major, that lasted for this long?

By Fej 9 years ago
It really is amazing how many web services are dependent on S3. For instance, the Heroku dashboard is currently down for me. Along with all of my services that are on Heroku.

By kevindong 9 years ago
Same here, but worse. Some of the apps I have hosted on Heroku (including APIs) are showing "Application Error". Like you, tried logging into dashboard and got a Heroku error page.

By TravelTechGuy 9 years ago
Turns out that not all Heroku dynos are hosted on US-East. One of my friend's dyno is still up and running great.

By kevindong 9 years ago
Same here :(. Not sure why serving a connection to my dyno depends on S3 being up...

By mcjiggerlog 9 years ago
I'll bet that what Heroku does is precompile all the code when you push to Heroku (or doesn't), saves it to a S3 bucket, and kills the running process after a certain amount of inactivity. Once it detects a pending network request, the code gets loaded from the S3 bucket into the EC2 instance, and then your code spins up.

By kevindong 9 years ago
For those using Heroku - any workarounds to at least put the app in maintenance mode? It seems their entire platform, including API, is down.

By natashabaker 9 years ago
Quora is down.

By olegkikin 9 years ago
[deleted]

By 9 years ago
I am having trouble sending attachments in the Signal app - seems unlikely, but could this be related?
[edit- looks like they do have a pretty heavy reliance on S3, per https://github.com/WhisperSystems/Signal-Server/blob/master/... and various other sources.]

By jpwgarrison 9 years ago
https://twitter.com/whispersystems/status/836651250842124288

By mayneack 9 years ago
Sendgrid, Twilio, Quora is also down. Is this related to S3. Entire world depends on AWS

By ganesharul 9 years ago
Twilio's outgoing SMS seems to be working fine for me.

By leesalminen 9 years ago
Sendgrid being down particularly hurts - we need it to notify our users of the problem

By yodon 9 years ago
That's quite ironic. http://isitdownrightnow.com is also down.

By jyriand 9 years ago
Thank you, HN, for giving me the answer the AWS Service Health Dashboard could not.

By booleandilemma 9 years ago
Yes. Have heard confirmation from Amazon that this outage is affecting us-east-1.

By BlackjackCF 9 years ago
Now at the top of Drudge http://drudgereport.com/

By leesalminen 9 years ago
I bet the outage is related to the new color coded CloudWatch metrics: https://twitter.com/awscloud/status/836630468778864640
As part of the release they wanted to make sure everybody gets a chance to see "red" metrics.

By koolba 9 years ago
'Increased Error Rates' is a bit harsh, couldn't they call it 'Sub-prime Success Rates'?

By mixedbit 9 years ago
My Atom keep crashing and the log says it can't resolve :
https://atom-installer.github.com/
is there a part of this hosted on S3? I cannot open Atom anymore, it keep crashing on the check for updates screen...

By Globz 9 years ago
Yup, same here. For a moment, I was worried that the UI showed 0 buckets. Gave me a heart attack.

By newsat13 9 years ago
"Inc." is quoting comments from here.[1]
[1] http://www.inc.com/sonya-mann/amazon-web-services-outage.htm...

By Animats 9 years ago
Google DNS 8.8.8.8 was (for the first time that I've noticed) spotty about 30 minutes ago. Something big is happening: http://map.norsecorp.com/#/

By l0c0b0x 9 years ago
AWS is updating twitter here. No red icons on the status page IS an AWS issue:
https://twitter.com/awscloud/status/836656664635846656

By dangle 9 years ago
We got timeouts to our bucket address from every location we tried starting at 10:37 Mountain time (GMT-7). Slack uploads started failing, imgur isn't working, and the landing page for the AWS console is showing a 500 error in the image flipper in the middle of the page. The Amazon status page has been all green, but there is a forum post about people having problems at https://forums.aws.amazon.com/thread.jspa?threadID=250319&ts...
In the last couple of minutes that forum post has gone from not existing to 175 views and 9 posts.

By linsomniac 9 years ago
Funny, status page is incorrect because of S3
https://twitter.com/awscloud/status/836656664635846656

By ayemeng 9 years ago
Updated:
Amazon Elastic Compute Cloud (N. Virginia) Increased Error Rates less 11:38 AM PST We can confirm increased error rates for the EC2 and EBS APIs and failures for launches of new EC2 instances in the US-EAST-1 Region. We are also experiencing degraded performance of some EBS Volumes in the Region.
Amazon Elastic Load Balancing (N. Virginia) Increased Error Rates more
Amazon Relational Database Service (N. Virginia) Increased Error Rates more
Amazon Simple Storage Service (US Standard) Increased Error Rates more
Auto Scaling (N. Virginia) Increased Error Rates more
AWS Lambda (N. Virginia) Increased Error Rates more

By rabidonrails 9 years ago
Where are you seeing these updates?

By amasad 9 years ago
https://status.aws.amazon.com/

By dlb_ 9 years ago
According to the personal health dashboard, they've root-caused the S3 outage and are working to restore.
In the meantime, EC2, ELB, RDS, Lambda, and autoscaling have all been confirmed to be experiencing issues.

By joatmon-snoo 9 years ago
Meanwhile engineers across the globe scramble to fix outages due to AWS s3, $AMZN is unaffected on the stock market. Just shows the disconnect between emotions and reality.
https://www.google.com/finance?chdnp=0&chdd=0&chds=1&chdv=0&...

By nodesocket 9 years ago
Why would this affect Amazon's stock? Amazon generates $136B per year. AWS comprises $8B of that. Even if 5% of customers left because of this (and that's never going to happen), it would barely create as much as a tiny ripple in their ocean of revenue. Investors care not about Amazon's cloud services.

By IAmGraydon 9 years ago
AWS's profit helps fund the rest of their business.

By ec109685 9 years ago
I was listening to sessions from AWS Re:invent last night. What jumped out at me was the claim of 11 9's for S3. How many of those 9's have they blown through with this outage?

By bdcravens 9 years ago
That's a durability target, not an availability SLA. Durability != Availability.

By ta_wh 9 years ago
That makes more sense (since I was listening to it in the context of a conference session, I'm not sure if I heard a distinctive term being used, though I'm sure they're careful about the language they use)

By bdcravens 9 years ago
That's for data retention, S3 only guarantees three 9's availability.

By wintermute-_- 9 years ago
http://venturebeat.com/2017/02/28/aws-is-investigating-s3-is...
https://www.theregister.co.uk/2017/02/28/aws_is_awol_as_s3_g...

By trakl 9 years ago
Experiencing issues with Elastic Beanstalk and Cloudfront as well.

By dyeje 9 years ago
I cannot eb init or deploy to us-east-1

By seibelj 9 years ago
I can't download purchased MP3's from amazon's own site, I get "We’re experiencing a problem with your music download. Please try downloading from Your Orders or contact us."
When I go to my orders I get "There's a problem displaying some of your orders right now. If you don't see the order you're looking for, try refreshing this page, or click "View order details" for that order."
It seems that Amazon is eating its own dog food.

By jasonl99 9 years ago
I just spent the last hour trying to figure out why in the hell I can't update the function code on a lambda instance. Next time I will remember to check HN first!

By splatcollision 9 years ago
Omg I wish I googled this earlier. Wasted hours debugging :(

By machinarium 9 years ago
Yep. I was wondering why my heroku deploys were hanging so I was looking into every possible issue on my end.
And then I see the news.

By SnowingXIV 9 years ago
There goes my Trello to do list. Now I'm lost. Oh well.

By spacecadets 9 years ago
My ELBS and EB related instances are also down. I can't even get to Elastic Beanstalk or Load Balancers in the web console. Anyone else having this issue?

By ryanmarr 9 years ago
Yes, we're experiencing this issue as well. Are your ELBs also in us-east-1?

By JBerryMedX 9 years ago
It doesn't look that bad, think about it S3 is such a critical part of almost any web application, it is treated like a realtime micro-service. So looks like most of the Internet in the U.S. is affected but nevertheless no one is dead yet and the world has not ended. So even if hypothetically let's say China attacked us using cyber-warfare it wouldn't be so bad after all... This was kind of like a test.

By soheil 9 years ago
I think this explains why the docker registry is down as well.
http://status.docker.com/

By khamoud 9 years ago
Still no RCA? I'd love to hear what the issue was for this. A couple of coworkers and I are betting that it was a networking issue of some sort.

By sc30317 9 years ago
Negative comment on all this in Forbes.[1] Too much centralization. CEOs read that.
[1] https://www.forbes.com/sites/ryanwhitwam/2017/02/28/amazon-s...

By Animats 9 years ago
New update:
"Update at 11:35 AM PST: We have now repaired the ability to update the service health dashboard. The service updates are below. We continue to experience high error rates with S3 in US-EAST-1, which is impacting various AWS services. We are working hard at repairing S3, believe we understand root cause, and are working on implementing what we believe will remediate the issue."

By robxu9 9 years ago
https://twitter.com/xiaodown/status/836656364965371904
https://twitter.com/ArturMakly/status/836665379233628161

By artur_makly 9 years ago
Update[1]: AWS Status dashboard now showing icons other than green. https://status.aws.amazon.com/
[1] https://twitter.com/awscloud/status/836662601090134017

By fernandopj 9 years ago
One of my heroku apps is down, and I cant' log into the heroku dashboard to check it out. I'm guessing this is related.

By learc83 9 years ago
Me too. This couldn't have come at a worse time. Just launched a new site.

By robeastham 9 years ago
[deleted]

By 9 years ago
[deleted]

By 9 years ago
[deleted]

By 9 years ago
It is, they are downloading slugs from S3.

By jpmw 9 years ago
Yes it's down for me. I can't access files stored on S3. Also, the service I run is hung trying to store files on S3.

By vinayan3 9 years ago
My company's ELBs in us-east-1 are experiencing massive amounts of latency causing the instances to be marked unhealthy.

By JBerryMedX 9 years ago
FreshDesk makes extensive use of S3 and it's been unbearably slow to load for the past hour or so. All on S3 requests.

By leesalminen 9 years ago
Hate to ask, but does anybody now of an alternative storage solution? Also, anyone have any alternative to Heroku for now?

By rajangdavis 9 years ago
https://cloud.google.com/storage/

By moreisee 9 years ago
Google Cloud Storage (as mentioned) for storage, Google App Engine for PaaS. :)
(I work on Cloud, specifically Datastore.)

By wsh91 9 years ago
Thanks for the recommendation; is there a way to do back ups?
For example, I have images and various assets stored on S3; would there be a way to change the storage provider on the fly on a website?
The other case is could I have apps hosted on Heroku and set up a service to duplicate the app code and database over to Google for redundancy? This isn't super critical as the apps are not customer focused, but they generate content that is customer focused.

By rajangdavis 9 years ago
We're down too with www.paymoapp.com - pretty frustrated that the status page shows everything is up and running.

By janlukacs 9 years ago
This is truly serverless computing at work.

By poofyleek 9 years ago
After few requests timed out, started to dig a bit. The CNAME for a bucket endpoint was pointing to s3-1-w.amazonaws.com with a TTL of at least an other 5600 secods. Doing a full trace was giving back a new s3-3-w.amazonaws.com The IP related to s3-1-w was/is timing out, all cool instead for the s3-3-w.

By xtus 9 years ago
"We’re continuing to work to remediate the availability issues for Amazon S3 in US-EAST-1. AWS services and customer applications depending on S3 will continue to experience high error rates as we are actively working to remediate the errors in Amazon S3." Last Update 1:54pmEST
It shows up in the event log now too.

By knaik94 9 years ago
I'm running into timeouts trying to download elixir packages, and I'm willing to bet this is the cause

By samgranieri 9 years ago
Only limited impact to Aiven services due to service migration capability http://help.aiven.io/announcements/aiven-customer-notice-aws...

By melor 9 years ago
Per AWS :
For S3, we believe we understand root cause and are working hard at repairing. Future updates across all services will be on dashboard.
https://twitter.com/awscloud/status/836666548311859200

By mmansoor78 9 years ago
Amazon: The dashboard not changing color is related to S3 issue. See the banner at the top of the dashboard for updates.
https://twitter.com/awscloud/status/836656664635846656

By axg 9 years ago
Same here. I can log in to the new S3 console UI, but all of my buckets/resources are missing. Same error as you in the old UI. Also unable to connect through the AWS CLI (says, "An error occurred (AccessDenied) when calling the ListBuckets operation: Access Denied"). Fun.

By socialentp 9 years ago
Just in case if anybody needs :) https://www.microsoft.com/developerblog/real-life-code/2016/...

By gopalakrishnans 9 years ago
Uh-oh. Same here... and tried taking a screenshot of pinging s3.amazonaws.com and Slack upload hung.

By willcodeforfoo 9 years ago
Sorry, my simplistic mind is only thinking this right now:
http://alessandrobender.com.br/wp-content/uploads/2015/07/fi...

By devenrl 9 years ago
all of your jokes about the dashboard not turning red b/c the icon is hosted on US EAST are true:
Amazon Web Services‏Verified account @awscloud 8m8 minutes ago More The dashboard not changing color is related to S3 issue. See the banner at the top of the dashboard for updates.

By eggie5 9 years ago
AWS is claiming that Simple Storage (US Standard) is starting to come back up as of 12:54 PM PST.

By Animats 9 years ago
Where is that "Show HN" that will let me check if a site is affected by an S3 outtage?

By adamveld12 9 years ago
Their status page images are hosted on S3, so will be a while for the green checkmarks to update

By pfela 9 years ago
Look like the dashboard has been updated to no longer use S3: AWS is having a major meltdown right now
http://status.aws.amazon.com/#ecr-us-east-1_1488312155

By cdnsteve 9 years ago
"We have now repaired the ability to update the service health dashboard. " - full of yellow red icons now indeed https://status.aws.amazon.com/

By tudorconstantin 9 years ago
The AWS status page is still showing all green but how has a header saying they are investigating increased error rates. https://status.aws.amazon.com/

By linsomniac 9 years ago
It appears Docker Hub is hosted on S3 as well, none of the official images can be pulled.

By tzaman 9 years ago
I have in the middle of thoughts of moving out of AWS and having a dedicated provider as our billing has increased a lot with the scale. The only thing which was holding me was the uptime confidence. Now I feel it's not a bad idea.

By ruchit47 9 years ago
I get this in my aws console.
Increased API Error Rates
09:52 AM PST We are investigating increased error rates in the US-EAST-1 Region.
Event data Event S3 operational issue Status Open Region/AZ us-east-1 Start time February 28, 2017 at 6:51:57 PM UTC+1 End time - Event category Issue

By jontro 9 years ago
Our static site hosted on eu-central-1 is still up: http://www.creativepragmatics.com.s3-website.eu-central-1.am...

By manmal 9 years ago
Based on reports from the field, it looks like S3 was down for about three hours for most of their customers.
S3 promises four nines of availability (11 nines of durability), so today we got about 3-4 years worth of downtime in one fell swoop. Oops.

By metafunctor 9 years ago
Where do you see four nines for S3 SLA?
https://aws.amazon.com/s3/sla/ shows 99.9%

By cowkingdeluxe 9 years ago
It was from memory. I distinctly remember it being advertised as "four nines". Perhaps they've adjusted their marketing. The S3 FAQ still says [1]: "S3 Standard is designed for 99.99% availability".
[1]: https://aws.amazon.com/s3/faqs/

By metafunctor 9 years ago
durability is different from availability

By machbio 9 years ago
Which is why he made the distinction between the two.

By IAmGraydon 9 years ago
It is 3 nines [1] for availability and .999*365/12/24=1.266 hours per month so it is actually 3-4 times their monthly SLA depending on your get/put requests (gets came up sooner than puts but that is not factored into the SLA AFAIK).
[1]: https://aws.amazon.com/s3/sla/

By mrep 9 years ago
We are starting to see recoveries, our SES emails have mostly gone out and our data synchronization has updated 2 of our 3 feeds. Amazon has posted a message that they expect "improved error rates" in the next 45 minutes.

By linsomniac 9 years ago
https://twitter.com/awscloud/status/836630468778864640
At least now we can see all the network failures in full RGB.

By nlightcho 9 years ago
I'm getting this using s3cmd:
$ s3cmd ls WARNING: Retrying failed request: / ([Errno 60] Operation timed out) WARNING: Waiting 3 sec... WARNING: Retrying failed request: / ([Errno 60] Operation timed out) WARNING: Waiting 6 sec...

By tbeutel 9 years ago
Yes it is down
https://status.aws.amazon.com/
Half internet is down the data center in Virginia the one with the cloud is totally dead apparently. Enjoy the cloud bullshit :)

By francesco1975 9 years ago
I wrote a quick post discussing this outage. I figured I should share here https://blog.containership.io/aws-got-you-down

By phildougherty 9 years ago
Does anyone have trouble with the Cloud Console? The JS assets for the CloudFront dashboards seem broken, so unfortunately it’s not possible to change the behaviours of the Distributions (e.g. to point them to another bucket)

By jotaen 9 years ago
So much has broken thanks to this. Web apps, slack uploads, parts of Freshdesk etc. I don't love you right now AWS.
https://status.aws.amazon.com/

By mmaunder 9 years ago
Great, all my billing services on Heroku are turned off. Why do they need S3 access for me to access my web dynos?
I'd rather my app load but appear broken so I can show my own status rather than just shutting down every single app...

By Exuma 9 years ago
Same here. S3 as a point of failure makes zero sense for dyno uptime, I'm very frustrated with Heroku. (the dyno was already running, no need to download a new slug in my opinion)

By buildbuildbuild 9 years ago
Is anybody else having trouble loading http://platform.twitter.com/widgets.js? It is probably hosted on S3 I assume

By rrecuero 9 years ago
Fine for me.

By ladybro 9 years ago
Thanks. It's back up now :)

By rrecuero 9 years ago
I never understood why so many devs flocked to AWS. I actually find their abstraction of services gets in the way and slows down my dev instead of making it easier like so many devs claim it does. I prefer Linode.

By fjabre 9 years ago
It lets them pretend that ops isn't a skillset you need people to specialize in. Just throw more devs and servers at the problem instead of building a good infrastructure.

By Sanddancer 9 years ago
One of a really rare times when it's good to be in Europe (s3 works here).

By samat 9 years ago
Interestingly, I placed an order on amazon.com and while the order appears when I look at my account, none of the usual automated emails have come. I wonder how deeply this is effecting their retail customers.

By tjpaudio 9 years ago
[deleted]

By 9 years ago
Down from the outside; The internal access (from within EC2) APIs still work.

By pmalynin 9 years ago
https://twitter.com/sadserver/status/818937064552951809
Interesting tweet from last month.

By awsoutage 9 years ago
All of our S3 assets are unavailable. Cloudfront is accessible but returning a 504 status with the message: "CloudFront is currently experiencing problems with requesting objects from Amazon S3."

By jhaile 9 years ago
Our IBM cloud- Softlayer provides secure and stable cloud environment with private network, for baremetal,dedicated, private and public cloud. Leave a comment if you want to learn more. also HIPAA ready.

By ibmcloud 9 years ago
Here we go again:
Technology leads to technology (and wealth) monopolies, in other words: more centralization. Which has always been bad.
Just like with Cloudflare leaking highly sensitive data all over the Internet, a couple of days ago.

By benevol 9 years ago
Wouldn't go that far. It's always been the case in the cloud that if you're not region/provider replicated, you're susceptible to localized outages.

By ta_wh 9 years ago
but the sentiment is valid imo (albeit arguably benign)...

By ___start 9 years ago
Yeah, we host on S3 (US-East-1 I think) with Cloudfront for caching / SSL. Some of our requests get through but it's been intermittent. Lots of 504 Gateway Time-Outs when retrieving CSS, JS.

By andrewfong 9 years ago
[deleted]

By 9 years ago
Totally fucked.

By meddlepal 9 years ago
I think there was some fontawesome loading issues related to this, I also noticed a site trying to load twitter messages but couldn't Get the JavaScript loaded during that time today.

By cdevs 9 years ago
Update: AWS dashboard has been fixed and is now showing outages https://status.aws.amazon.com/

By mpetrovich 9 years ago
My EB instances and Load Balancers are also down. I can't even get to load balancers in ec2 web console or to elastic beanstalk in web console. It's been almost an hour now.

By ryanmarr 9 years ago
As of 4:30PM Pacific, we're still having trouble with EC2 autoscaling API operations in US-East-1. Basically very long delays in launching new instances or terminating old ones.

By all_usernames 9 years ago
We have a red error, finally!
Source: https://status.aws.amazon.com/
After two hours, they have finally updated their dashboard.

By AtheistOfFail 9 years ago
Same here, i mistakingly went to the dashboard first too. Silly me.

By notheguyouthink 9 years ago
S3 is down? Official Twitter feed is also "unaware" https://twitter.com/awscloud

By krlkv 9 years ago
Looks like our dashboard is still sustaining it https://acedashboard.cbp.dhs.gov/

By netvisao 9 years ago
It's fixed... I mean the status page https://status.aws.amazon.com/

By ttttytjj 9 years ago
DockerHub is down as well. DockerHub was down in Oct 2015 because S3 was down in US-EAST. They should have known to cache images in multiple S3 regions since then.

By garindra 9 years ago
Can anyone comment on mitigating issues like this with S3 Cross-region replication? I'm reading up on it now while one of my services is dead in the water.

By zedpm 9 years ago
I did some digging and experimentation, and so far it looks like you could keep a backup bucket in another region and use bidirectional replication [0] to keep the two buckets in sync. If something like this happened again, you could point your app(s) at the bucket in another region and keep accepting data. The objects would eventually get replicated back to the original bucket, and you could cut over again when service was restored. There does seem to be an appreciable replication lag, so you could run into problems during your cut where some objects had not yet been replicated, but your app ought to handle things like that gracefully anyway.
[0] https://docs.aws.amazon.com/AmazonS3/latest/dev/crr.html

By zedpm 9 years ago
The only appropriate comment is that this issue is affecting all of our buckets, both in us-west and us-east. Replicating to another region would yield no useful benefits in this specific failure scenario.

By thraway2016 9 years ago
Can't agree with this. Buckets in eu-west-1 are fine

By nolite 9 years ago
http://downforeveryoneorjustme.com/isitdownrightnow.com
RIP

By beeftime 9 years ago
Alexa smart home component stopped working, if you try to reinstall the Alexa app on your phone, you'll find that you can't even login anymore.

By rebornix 9 years ago
Can't access my website which is hosted on s3 (http://joshuajherman.com).

By zitterbewegung 9 years ago
[deleted]

By 9 years ago
This is one of the times that I am glad to be running my own distributed object storage. I'm sure it's not as robust as Amazon, but......

By bkruse 9 years ago
What are you using?

By jasonjayr 9 years ago
Openstack Swift - easy to get setup and reliable. Takes a little expertise but was well worth the investment in learning it!

By bkruse 9 years ago
The status page is stored in s3. It can't be updated. The page you see is cached in cloudfront. They are working on updating the status page.

By shifted316 9 years ago
What a shame they took down MegaUpload! Clearly we need greater competition in the wholly-owned-infrastructure, file-hosting-as-a-service space.

By contingencies 9 years ago
Getting Issues with Citrix Sharefile api (which I've suspected to run in S3). Seems to only be impacting writes in preliminary assessment.

By jefe_ 9 years ago
[deleted]

By 9 years ago
Just posted on their Twitter:
"The dashboard not changing color is related to S3 issue. See the banner at the top of the dashboard for updates."

By LeonM 9 years ago
Additionally, Zendesk is apparently failing to process new tickets, so our users can't report the errors they're encountering.

By BrandonM 9 years ago
I'm confused, just logged into work account, and site, and some contract stuff I do. All use S3 / Cloudfront... no errors...

By philliphaydon 9 years ago
[deleted]

By 9 years ago
Heroku as well: https://status.heroku.com/

By afshinmeh 9 years ago
[deleted]

By 9 years ago
There is no cloud, there is only someone else's computer.
:(

By newman314 9 years ago
I'm having issues with CloudWatch and related monitoring services; eg auto-scaling groups are unable to scale up or down.

By bkanber 9 years ago
Post your opinion on http://wp.me/p7HKNy-5h

By vanpupi 9 years ago
> We have now repaired the ability to update the service health dashboard.
It seems their status page is hosted ... as a S3 static website.

By gcoguiec 9 years ago
> We have now repaired the ability to update the service health dashboard -- AWS Status
Well that explains all the green checkmarks /s

By c4urself 9 years ago
Yes, affecting elb in us-east-1 right now. web services are down and unable to bring up the elb screen in the aws console.

By stevefram 9 years ago
Yeah, same here on US-WEST-2. Unable to use the S3 Console, but I can still upload/get content via the API it seems.

By alfg 9 years ago
Is cli working for anyone else? I can't use the console UI, but aws s3 ls and get commands seem to be working fine.

By 4wmturner 9 years ago
I can't get to my Amazon Orders page. "There's a problem displaying some of your orders right now."

By travelton 9 years ago
Yeah, it looks like that was part of the failure.

By palad1n 9 years ago
Any specific regions? us-west-2 seems fine to me. [edit] now I can't see any of my buckets in the web interface.

By kyleblarson 9 years ago
Appears to be us-east-1

By jrs235 9 years ago
Yes it is down
https://status.aws.amazon.com/

By francesco1975 9 years ago
Yes Trello is down and they are using S3 :(

By Globz 9 years ago
Dropbox using this? Can't seem to sync

By cwe 9 years ago
They used to, but i think they are off it now.

By rckclmbr 9 years ago
"Amazon CloudFront: Service is operating normally"
This is bullshit if you're using an S3 origin in your distribution.

By bas 9 years ago
Heroku apps are also down because of this!

By ignaces 9 years ago
This can't be only the US-EAST-1 region. I'm a european resident and most things are down for me too.

By Svenskunganka 9 years ago
I've checked eu-west-1 and it works fine for both reads and writes.

By samat 9 years ago
I'm not an AWS customer, but if it is comparable to Google Cloud's Multi-regional Storage it should be geo-redundant. Doesn't S3 replicate the data across regions, so in case a region goes offline it won't affect the service?

By Svenskunganka 9 years ago
My EC2 Servers are also not provisioning.

By indytechcook 9 years ago
Same here. I can't find any information about that though.

By amasad 9 years ago
Unable to log into my servers. They are still up and taking traffic, but no contact. Also unable to provision new servers at this late hour.

By partisan 9 years ago
Getting the same error on the GUI but the aws cli and sdk seem to be working fine (our site is still up too)

By dgelks 9 years ago
My fire tv stick is totally unusable too. Seems I can't access any applications (even Lodi or Netflix)

By maccard 9 years ago
Still down for us. S3 seems to be the only thing affected - our mobile apps work fine (EC2 and RDS backend)

By headcanon 9 years ago
Anyone else seeing ELB/ALB issues?

By djb_hackernews 9 years ago
Yup. Some of our machines in us-east-1e dropped out of the load balancers.

By redthrowaway 9 years ago
Yes

By asolove 9 years ago
If I listen closely, I think I can hear the pagers going off in South Lake Union from Downtown Seattle.

By mcheshier 9 years ago
You'll remember me when the west wind moves
Upon the fields of barley
You'll forget the sun in his jealous sky
As we walk in fields of green

By kardashev 9 years ago
Status page is lit up like a Christmas tree! Looks like AWS finally found the myriad non-green icons.

By shiven 9 years ago
Huh, I wonder if that's why Origin (EA's Steam competitor) cloud sync just stopped working

By dageshi 9 years ago
[deleted]

By 9 years ago
they already admitted it: https://www.theregister.co.uk/2017/02/28/aws_is_awol_as_s3_g...

By skiril 9 years ago
"bit-barn bods" is some of the worst alliteration I've ever seen.

By draw_down 9 years ago
TierPoint, a large hosting service, is reporting a massive DDOS attack on their infrastructure.

By murphy52 9 years ago
source?

By ignaces 9 years ago
1. Announce security vulnerability
2. People push updates as fast as possible to fix security
3. No tests, so everything blows up

By k__ 9 years ago
Thanks for posting this. I've passed this information through my network.
Slack image uploads are hanging.

By balls187 9 years ago
Anyone doing a region failover? Any issues so far? We are making plans to flip to us-west-1

By happyrock 9 years ago
We did a failover to us-west-2 and it seemed to work.

By whorleater 9 years ago
Looks like SoundCloud is hosting the tracks on S3 , can't program without my music...

By myth_drannon 9 years ago
this still works for me https://musicforprogramming.net

By yuxt 9 years ago
For our app, both S3 and SES have been completely down in us-east-1 for hours now.

By JustinAiken 9 years ago
Is S3 down outside of Us-East too? I can't seem to create a bucket in US-West or EU

By thomassharoon 9 years ago
Other regions still work, but the web console relies on us-east-1 so you should use the API to create new buckets until the issue is resolved.

By officelineback 9 years ago
You can always check by going to www.isitdownrightnow.com/
Oh wait. The site sits on S3. Never mind.

By KurtMueller 9 years ago
Do the engineering thing and build fault tolerant systems. Maybe adopt features that have been around since 2015:
https://aws.amazon.com/blogs/aws/new-cross-region-replicatio...

By zerotolerance 9 years ago
SES seems to be down for us as well in Virginia. Of course nothing on the status page.

By nicpottier 9 years ago
Not sure if it's related... but I'm having issues with Amazon Cloud-drive.

By mwambua 9 years ago
I would assume, I couldn't even share a screen shot of my evidence to my team on slack!

By iamdeedubs 9 years ago
Lol... It's interesting how much depends on Amazon's infrastructure at the moment.

By mwambua 9 years ago
Also, interesting everyone is posting that various sites which are down in this thread. Feels like the internet is down!

By vinayan3 9 years ago
Just finished reading The Everything Store... I bet a "?" email went out.

By jsperson 9 years ago
Definitely experiencing non-loading for dependencies hosted on S3 at the moment...

By magic_beans 9 years ago
S3 and Elastic Beanstalk (S3 dependencies) ... no issues with RDS at the moment

By booleanbetrayal 9 years ago
Been unreliability informed about 1 hour ETA for a fix. fingers crossed

By mystcb 9 years ago
We're on AWS GovCloud and our S3 is all good. GovCloud is its own region.

By ondrae 9 years ago
Shame we pay a bazillion dollars for it. Anyway, curious to know how you've thought about mitigating something like this? I worry if it happens to us govcloud users we obviously have much fewer options for redundancy.

By neom 9 years ago
It appears to be down. My website runs on S3 and my monitors are going nuts!

By amcrouch 9 years ago
Yes serious API problems started about 15 minutes ago. Around noon central.

By tech4all 9 years ago
If you've ever felt the AWS health dashboard was dubious before now...

By edcoffin 9 years ago
Apparently app updates on iOS are failing right now, too. Could be related?

By oaktowner 9 years ago
Yep, currently have over 20,000 people on site seeing no images. Wonderful

By Exuma 9 years ago
Even Kindle books aren't able to be served; download attempts hang.

By DocK 9 years ago
Anybody else seeing 500 errors with AWS Cognito for us-east-1?
They are consistent for me.

By Rockastansky 9 years ago
[deleted]

By 9 years ago
We host with TierPoint and they are reporting a massive DDOS attack

By murphy52 9 years ago
We're seeing queries using Athena against S3 fail in us-east-1

By tomharrisonjr 9 years ago
Any truth to this being a DOS by some kiddies named Phantom Squad?

By reiichiroh 9 years ago
Wow, amazing to watch stuff go down as this problem ripples out!

By framebit 9 years ago
Same here. Also having trouble publishing to S3 via CLI and API.

By austinkurpuis 9 years ago
Is anyone else also seeing 500 errors for cognito on us-east-1?

By Rockastansky 9 years ago
Yup. Every single image on my site is hosted there.... eek! :|

By kolemcrae 9 years ago
Yah getting the same error in multiple regions as of 1:12 EST

By exodos 9 years ago
S3 Ireland (eu-west-1) seems to be doing fine at first sight.

By twiss 9 years ago
Is there a list of all apps/services that rely on S3?

By j_shi 9 years ago
Would be easier to compile the inverse.

By mierenga 9 years ago
- launching AMIs (they are stored on s3)
- cloudfront
- ses
- ebs
- rds (snapshots/backups on s3)
- lambda functions (appear to be stored in s3)
perhaps others

By scottlinux 9 years ago
they're down to 3 nines
edit: for the year, it only takes 52.57 minutes

By gtrubetskoy 9 years ago
It's still down. All morning! So much business lost.

By simplehuman 9 years ago
Experiencing issues with S3 and ELB for over an hour now.

By hyperanthony 9 years ago
We're getting errors indicative of an S3 outage too.

By jacobevelyn 9 years ago
yeah, looks like Travis CI is down, too: https://www.traviscistatus.com

By afshinmeh 9 years ago
And Trello

By freyr 9 years ago
Heroku API/Dashboard is down, Bugsnag is down, etc.

By dorianm 9 years ago
Heroku API/Dashboard is down, Bugsnag is down, etc.

By dorianm 9 years ago
via AWS twitter account "The dashboard not changing color is related to S3 issue. See the banner at the top of the dashboard for updates."

By ajmarsh 9 years ago
QUESTION. There could be data lost from this failure?

By edgartaor 9 years ago
Upload failing for me from Sacramento --> us-east-1

By Rapzid 9 years ago
Quora is down too. Getting 504. Gateway Timeout.
Is it related to S3??

By the_arun 9 years ago
Finally! The status page admits something's up.

By oneeyedpigeon 9 years ago
Can anyone get Alexa to play music? Is this related?

By rbirkby 9 years ago
No, it's not https://status.aws.amazon.com/rss/s3-us-standard.rss

By paulddraper 9 years ago
Looks like the RSS feed hasn't been updated for an hour and a half.

By lancefisher 9 years ago
[deleted]

By 9 years ago
We're seeing recovery across our services now.

By carimura 9 years ago
I'm trying to reach S3 hosted website, no luck

By orn 9 years ago
I'm seeing the same error on eu-west as well.

By ianopolous 9 years ago
Dropbox is down as well. This is going to be gud.

By sz4kerto 9 years ago
finally status are updated https://goo.gl/wCINaC

By manshoor 9 years ago
I'm curious to see the postmortem for this.

By nomadic_09 9 years ago
Never depend your business on a single provider.

By aytekin 9 years ago
Yes. and Now my HN profile page is down as well.

By oculusthrift 9 years ago
I've been getting sporadic "Bad Gateway" errors on HN for the past half hour or so, not apparently associated with any one feature (such as profile pages). My suspicion is that it's unrelated to anything happening at AWS (HN doesn't have any AWS dependencies, does it?), though maybe everyone checking HN to see what the status is has increased load.

By grzm 9 years ago
Making an already troublesome day worse. Yeehaw

By grimmdude 9 years ago
I'm seeing problems with Kindle downloads.

By chiph 9 years ago
bets as to the cause? internal DDoS against their dynamo clusters backing s3? DNS issues between amazon's services?

By nickstefan12 9 years ago
hex.pm and docker hub are both failing, a lot of projects can't CI because of these. The house of cards we built.

By chx 9 years ago
[deleted]

By 9 years ago
Looks like they store the statuses on S3

By kopy 9 years ago
US-West(Oregon) just went down as well.

By Trisell 9 years ago
us-west-2 (Oregon) is still up for me from the CLI.

By deboflo 9 years ago
Looks like it. Brief panic caused here.

By josephlord 9 years ago
Any opinions you can post on http://wp.me/p7HKNy-5h as well

By vanpupi 9 years ago
No

By Exuma 9 years ago
SES also down.

By _callcc 9 years ago
StatusPage.io survived. Thanks gents.

By outericky 9 years ago
trello and giphy both seemed affected

By jflowers45 9 years ago
Heroku seems to suffer from this too

By uranian 9 years ago
Dashboard has been updated, finally!

By magic_beans 9 years ago
SES seems to be down for us as well.

By nicpottier 9 years ago
SES seems to be downf or us as well.

By nicpottier 9 years ago
Not seeing any errors from eu-west-1

By nodefortytwo 9 years ago
Having issues as well.. big issues..

By rhelsing 9 years ago
Mr. Robot live shoot? :)
slack file services down too

By SubiculumCode 9 years ago
Leap day bug?

By aabajian 9 years ago
And yet people think I'm crazy for wanting to wrap get time functions so code can be tested...

By kyled 9 years ago
wow even services like Intercom are affected, I can't see who is on my website right now.

By soheil 9 years ago
yeah still all green in AWS status.... maybe their red and yellow icons are kept on S3. :-)))

By sweddle 9 years ago
Many a true word is spoken in jest - that's exactly what the problem is :)

By camperman 9 years ago
Now what kind of business choose to remain down for 2 hours plus during the peak business hours?
Seems cloud computing still has a lot to learn.

By sk2code 9 years ago
[deleted]

By 9 years ago
Is this only affecting US-EAST-1?

By rhelsing 9 years ago
eu-west-1 is doing great. Obviously European ops are superior to their US counterparts.

By nvarsj 9 years ago
Same here. We are seeing issues.

By bseabra 9 years ago
Same here. US East (N. Virginia)

By ARolek 9 years ago
We all laughed at the notion om moon people dropping rocks at the earth.
Then they started dropped rocks on S3 and who is laughing now?

By mvindahl 9 years ago
in the s3 web interface requests to S3 backend end with 503 Service Unavailable

By cryreduce 9 years ago
Works for me, in us-west-2.

By Beacon11 9 years ago
you sure? it's not for me.

By joshuahaglund 9 years ago
Same here

By the_arun 9 years ago
is it just us-east-1? could it be prevented by using a different region?

By thepumpkin1979 9 years ago
[deleted]

By 9 years ago
Dead as a doornail for me

By 65827 9 years ago
Yea seeing the same thing

By danielmorozoff 9 years ago
news.ycombinator.com seems really slow right now. s3 dependencies?

By SubiculumCode 9 years ago
[deleted]

By 9 years ago
My website is not down.

By Eyes 9 years ago
same

By thadjo 9 years ago
Yup - dead in the water

By jgacook 9 years ago
Seeing it here as well.

By baconomatic 9 years ago
Cmon but the cloud is magic and very reliable let's move everything to the cloud

By qaq 9 years ago
quite ironic that 'isitdown.com' is also down

By mrep 9 years ago
you should always try 'isisitdown.com/down' first

By Humdeee 9 years ago
isup.me is fine.

By notriddle 9 years ago
Netflix is up. Enjoy

By AzzieElbab 9 years ago
Same here in US EAST

By Raphmedia 9 years ago
Seeing the same here

By xvolter 9 years ago
Outage as a Service

By julenx 9 years ago
The same here still

By ahmetcetin 9 years ago
Yes, appears to be.

By dbg31415 9 years ago
Same problem bro...

By TheVip 9 years ago
Quora is down too.

By prab97 9 years ago
Is it down again?

By sonnyhe2002 9 years ago
It's down :(

By jsanroman 9 years ago
SES is also down

By 0xCMP 9 years ago
[deleted]

By 9 years ago
Yep, same here.

By mtdewulf 9 years ago
is this affecting dockerhub for anyone?

By eggie5 9 years ago
Yep, can't push new images right now

By sklarsa 9 years ago
Yes, docker pulls are failing

By newsat13 9 years ago
same here, east us seems non-responsive

By jahrichie 9 years ago
The same here

By ahmetcetin 9 years ago
Down for me.

By methurston 9 years ago
any one get more info from AWS?

By kangman 9 years ago
what's the SLA for s3?

By kangman 9 years ago
Here it is, in typical SLA language... only AWS knows what it means https://aws.amazon.com/s3/sla/

By nomadicactivist 9 years ago
There's logic behind the complicated rules. The idea is that you can't cause the calculated availability to go down by generating lots of requests when the service is down.

By jpalomaki 9 years ago
And my favourite part: "To receive a Service Credit, you must submit a claim by opening a case in the AWS Support Center." -- for a company that has built itself on automation, surely you could automate some bill credits based on the SLA.

By nomadicactivist 9 years ago
Many companies don't give out refunds and you have to proactively request. (alas my employer's value prop, in a completely different industry of course)

By bdcravens 9 years ago
You can open tickets in support with API calls ... just sayin.

By kesor 9 years ago
Does the AWS Support Center also run on AWS?

By ucaetano 9 years ago
Yes

By aarondf 9 years ago
Same

By davidsawyer 9 years ago
Yes.

By GabeIsman 9 years ago
region-west2 is also down

By dhairya 9 years ago
heroku API is down for me

By thadjo 9 years ago
They placed their API into maintenance mode: https://status.heroku.com/incidents/1059

By smmnyc 9 years ago
So why did the outage occur?

By kfkhalili 9 years ago
Azure is also down. Related?

By davidcollantes 9 years ago
What, to Yahoo Mail being down? Can't see the connection myself. Even with HN I had the twitter status down page before I reloaded five seconds later, in disbelief... Maybe it is not S3.

By Theodores 9 years ago
getting the same...

By renzy 9 years ago
yes, confirmed.

By eggie5 9 years ago
yes it is.

By simook 9 years ago
yup

By b01t 9 years ago
[deleted]

By 9 years ago

  Increased Error Rates

  Update at 11:35 AM PST: We have now repaired the ability to 
  update the service health dashboard. The service updates 
  are below. We continue to experience high error rates with 
  S3 in US-EAST-1, which is impacting various AWS services. 
  We are working hard at repairing S3, believe we understand 
  root cause, and are working on implementing what we believe 
  will remediate the issue.

Amazon hosted their status page on their failing service, ouch. Now they fixed the status page, after more than one hour.

  The dashboard not changing color is related to S3 issue. 
  See the banner at the top of the dashboard for updates.

https://twitter.com/awscloud/status/836656664635846656

By frik 9 years ago

So this is particularly weird - one of my instances was showing 0% CPU in CloudWatch (dropped from 60% at the start of the event), but the logs were saying 'load 500'. I ssh'd in... and the problem resolved itself. The only thing I did was run htop to look at the load, and it dropped from 500 (reported in htop) to it's normal level. Just ssh'ing in fixed that issue.

By vacri 9 years ago
Sorry, my simplistic mind is only thinking this right now:
http://alessandrobender.com.br/wp-content/uploads/2015/07/fi...

By devenrl 9 years ago
According to the personal health dashboard, they've root-caused the S3 outage and are working to restore.
In the meantime, EC2, ELB, RDS, Lambda, and autoscaling have all been confirmed to be experiencing issues.

By joatmon-snoo 9 years ago
According to the personal health dashboard, they've root-caused the S3 outage and are working to restore.
In the meantime, EC2, ELB, RDS, Lambda, and autoscaling have all been confirmed to be experiencing issues.

By joatmon-snoo 9 years ago
[deleted]

By 9 years ago
[deleted]

By 9 years ago
Its also affecting http://status.fabric.io/
cant build Android apps or update beta builds

By skryshtafovych 9 years ago
Getting Issues with Citrix Sharefile api (which I've suspected to run in S3). Seems to only be impacting writes in preliminary assessment.

By jefe_ 9 years ago
[deleted]

By 9 years ago
[deleted]

By 9 years ago
[deleted]

By 9 years ago
Is cli working for anyone else? I can't use the console UI, but aws s3 ls and get commands seem to be working fine.

By 4wmturner 9 years ago
Looks like SoundCloud is hosting the tracks on S3 , can't program without my music...

By myth_drannon 9 years ago
[deleted]

By 9 years ago
Well this took out Quay and CircleCI! Hopefully this gets resolved ASAP.

By chadscira 9 years ago
EVERYBODY PANIC! US-EAST-1 is what we use, down for us.

By nanistheonlyist 9 years ago
Dropbox is down as well. This is going to be gud.

By sz4kerto 9 years ago
[deleted]

By 9 years ago
Is it down again?

By sonnyhe2002 9 years ago
soundcloud uses aws s3. it is still down.

By thenewregiment2 9 years ago
back up

By la6470 9 years ago
are you openly admitting that the AWS service status page runs on AWS? because that is far more embarrassing than this downtime ever could be

By fletom 9 years ago
That's piling on, and rudely at that. Please don't be rude on HN.
We detached this subthread from https://news.ycombinator.com/item?id=13756997 and marked it off-topic.

By dang 9 years ago
maybe just the image icon? idk...

By elwell 9 years ago
Mass outage like this is exactly one of the things we are looking to avoid by building a decentralized storage grid with Sia.
Sia are immune to situations like this because data is stored redundantly across dozens of servers around the world that are all running on different, unique configurations. Furthermore, there's no single central point of control on the Sia network.
Sia is still under heavy development, but it's future featureset and specifications should be able to fully replace the S3 service (including CDN capabilities).
https://sia.tech

By Taek 9 years ago
An occasional relevant reference to one's own work is fine on HN, but you've crossed into what this community considers spam, both by doing it too often and by doing it in off-topic places. That will eventually get your account and site banned, so please stop overdoing it, especially if your work is actually the kind of thing the community would find interesting in appropriate doses and places.

By dang 9 years ago
Understood. I will tone it back.

By Taek 9 years ago
Not to push my own product...proceeds to push own product

By elbigbad 9 years ago
[deleted]

By 9 years ago
MaidSafe's SafeNetwork is a much more robust proposition. Sia and Storj could be just simple apps in the SafeNetwork. Using the Blockchain is not adequate for storage purposes, and proof of work is just as silly.
MaidSafe is rewriting all the OSI layers from layer 3 and above, guaranteeing extreme resilience, security and anonymity natively, besides of being totally distributed atomically, self-healing, self-encrypted and self-authenticaticated.
Both Storj and Sia are cute hacks compared to the massive architectural reimagining that MaidSafe is doing.
Read more about it:
Article in Techcrunch: http://techcrunch.com/2014/07/23/maidsafe/
SafeNetwork explained for bitcoiners: https://safe-network-explained.github.io/safe-for-bitcoiners

By inmiseravincit 9 years ago
A cute hack that has been up and running in a fully decentralized fashion for more than 18 months. Something that MaidSafe cannot claim even after over a decade of development.
Sia certainly has less ambitious goals. But it also has a strong track record of delivering.

By Taek 9 years ago
>Not to push my own product
Why would you say that? That's exactly what you're doing...

By thiht 9 years ago
[deleted]

By 9 years ago
At first I thought, "meh.." and, seriously, just the globally distributed filesystem alone is incredibly hard to make very reliable, but that's the only critical job: give data back when asked.
But then I looked at your site.. looks like bidding on surplus storage on different systems. Great idea, especially if you can ensure that people don't botnet it to death. I'm looking forward to hearing great things from you in the future.

By jamiesonbecker 9 years ago
Every time there's an outage like this (AWS, Github, etc) you'll see posts about decentralizing. I consider it, then realize that getting started on decentralization will take longer than the highly reliable service's downtime I'm trying to plan for.

By bdcravens 9 years ago
Worth a look tbh.

By DevKoala 9 years ago
Oddly, the self promo worked for a Google employee (see the top rated comment by boulos), but not for you.

By bbcbasic 9 years ago
The google guy pointed out that they are API-compatible (thus easy to run side-by-side) and, most import IMHO, he was humble, self-deprecating, and honest.
This here is insane. The claim that something nobody has ever heard of is more reliable than AWS or Google is the worst possible way to promote the product – even if true.

By matt4077 9 years ago
Perhaps I did a bad job of explaining, but it's a fundamentally different architecture, based on the same technology as Bitcoin. It's a platform that's been designed to outlive its parent company. And many blockchains already have outlived the companies and developers that built them. This one just happens to focus on cloud storage.

By Taek 9 years ago
That would be because the Google employee didn't claim they were "immune" to problems, and actually admitted having their own problems from time to time.

By IAmGraydon 9 years ago
I like how you know this comment is in poor taste, and posted it anyways.

By ocdtrekkie 9 years ago
That's a non sequitur and rude. Please don't be rude in comments on HN.
Edit: I just refreshed my memory of what's going on with your comments and it's not ok. As we've explained to you before, your attacks on one specific organization (it's obvious which one, but it doesn't matter which one it is) go to such an extreme that they long ago crossed into trolling. We asked you to stop; since you either can't or won't, we've banned your account. I hesitate to, because it exposes us to the criticism some people love to make, that we're protecting interest X or interest Y. But you've so clearly been abusing HN that I don't see what choice we have. This site is for thoughtful conversation, not tendentious vendettas.
As it happens, we banned another user some time ago for doing the same thing about the same organization, only they were promoting it instead of attacking it. Either way it's not ok.
We detached this subthread from https://news.ycombinator.com/item?id=13758088 and marked it off-topic.

By dang 9 years ago
I'm happy he posted it. TIL that they're API compatible. That's a big deal for anybody thinking of insulating themselves from something like this in the future without maintaining two code bases.

By koolba 9 years ago
"you might find the comment poor in taste" != "they knew it was poor in taste"

By treehau5 9 years ago
Right, it's more of a grey area. It's clearly a bad day for all involved, but there are enough sub-threads here about "What can I move to" that I felt a top-level comment about how GCS has worked hard to continue S3 interop support was warranted (it's clearly not widely known).

By boulos 9 years ago
"I'm sorry if you're offended..." :)

By markcerqueira 9 years ago
Well, getting offended by something that isn't a direct, personal attack is a sign of emotional immaturity.

By TeMPOraL 9 years ago
I would argue the very notion that he thought the disclaimer might be necessary meant he full well knew he was posting in poor taste.
And as he states, he knows they have outages too. So it's like if there were two towns in tornado alley, and one town got hit by a tornado, and the other was like "well, it might be in poor taste, but there isn't currently a tornado in our town".
If Google had some higher ground to stand on when they made their marketing posts, maybe they'd have merit. But when they're just pushing product that's no better than the product they're commenting about, it's just spam. If he said "we have 30% less outages than AWS" or something, at least there'd be merit to posting it.

By ocdtrekkie 9 years ago
I'm happy the top comment was posted, and simultaneously happy that you pointed out that it is in questionable taste/timing.
Life is full of tradeoffs and balances, it's not wrong to point out the social or human ramifications of a post, just as it's not wrong to reply, "Yes, but nevertheless..."

By braythwayt 9 years ago
fuck the police

By tommy1212 9 years ago
I try not to put all my eggs in one basket, that's why for images I use imgur. They have a great API and it's 100% free. There is a handy ruby gem [1] which takes a user uploaded image and sticks it on imgur and returns its URL with dimensions etc. On top of that you don't have to pay for traffic to those assets.
[1] https://github.com/soheil/imgur

By soheil 9 years ago
Well, imgur.com is currently down, too. I assume they are still using AWS: http://blog.imgur.com/2013/06/04/tech-tuesday-our-technology...

By Narretz 9 years ago
Imgur's down right now as well (except direct links to images) and it's not lossless (they resave images).

By K2L8M11N2 9 years ago
I admire your sword of sarcasm +1

By SubiculumCode 9 years ago
:)

By soheil 9 years ago
Using imgur over a service that you pay for (with an SLA) is, as my college CS professor used to call it, "skating on thin ice".

By bastawhiz 9 years ago
imgur doesn't use S3 behind the scenes?

By arrty88 9 years ago
It uses it, and it's completely down right now.

By BoorishBears 9 years ago
[deleted]

By 9 years ago
You guys are so funny sometimes.

By draw_down 9 years ago
[deleted]

By 9 years ago