Bad Timing & The Mysterious AWS XML Exception
Sometimes karma is just going to get you. There's no point fighting it. A series of events & issues just come together in cosmic bliss guaranteed to completely wreck your weekend.
We rolled out some new code for a new client last week. One of the "big stories" for both us and the client was a migration away from in-house content storage to a cloud based solution. We had opted to use Amazon S3 for the file storage part and after ~6 weeks of project development, rigourous QA & Regression testing, and a bit of a stressful production release we were good to go. Our Biz/Mkt team had seen it and they were happy. Last minute checks of the production service were done, all looked good, and home we went.
System.ApplicationException: Error Occured in
System.Xml.XmlException: Root element is missing.
at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.ThrowWithoutLineInfo(String res)
Vague & unhelpful on a Saturday afternoon. After a visit to the office, many hours of debugging, testing on multiple servers & pcs with the most bizarre and unhelpful results, one bright spark noticed that the clock on the server was wrong. So we fixed the time.
And it "magically" started working.
*sigh* This kind of software voodoo drives me nuts so I took a few minutes today to figure out what the heck had actually happened.
So why would an incorrectly set system clock possibly manifest itself as a `Root element is missing`
XmlException. A little context first. The exception was occuring in our code on the AWS SDK's S3Client
Note: some code omitted & replaced with "..." for brevity
Time to break out .NET Reflector and go through the SDK Code.
As we follow the chain of calls, we find this little gem, the webservice sets it's own custom header called x-amz-date with the current time from the local system running the App. Well that's some progress. Some quick googling and we find that this timestamp requirement is mandatory for REST calls to the service and that it must be within 15 minutes of the Amazon System Time when the request is received otherwise a
RequestTimeTooSkewed error status will be thrown back.
After some quick testing on my local machine, this does indeed seem to be case. 14 minute difference. No problem. 16 minute difference. Still the XML Exception. As a side note, it turns out our PDC was serving time to our entire production domain of about 14 minutes & 59 seconds out of whack with the real world and lost another second over the weekend. That's just bad luck.
Alas, the docs would appear to be incorrect (at least from the aspect of the the .NET Library behavior). In fact there's a pretty horrid bug in the internal exception handling of the AWS SDK for .NET. When attempting to call the
GetResponse in the main invoke method, it throws an exception and dumps you down into the generic catch all WebExceptions block.
This is where it all goes pear shaped. The
WebException thrown is actually a 403 Access Forbidden. But the
ProcessRequestError method doesn't account for this. It'll trigger the retry flag on an InternalServerError or ServiceUnavailable status code but otherwise it will continue on, and attempt to parse the response body into an S3Error. In our case the response body is
XmlException gets thrown and the original
WebException gets lost in the ether
*Wanders off to find a way to log bugs with the AWS for .NET Dev team*