Category Archives: Security

Uptime Metrics

I read a post at Royal Pingdom the other day (via RWW) about Feedburner’s uptime, and it got me thinking about uptime in general: I feel that total percentage of uptime is a misleading metric. By analogy, if a broadcast meteorologist says there’s a 50% chance of rain in the next 24 hours, it should mean that there’s a 1 in 2 chance that rain will fall on me, which is probably how every listener interprets that information. I doubt very much (though I may be wrong) that forecasters take into account where in their broadcast area the rain will fall and adjust the probability for how many of their listeners will actually be rained upon. I live not far from the Pacific Ocean, and I’m sure lots of the forecasted rain falls out there on very few people. Although the level of meteorological expertise I’d like probably only exists in Back to the Future Part II, I think personalized forecasts are nonetheless a realistic goal.

When it comes to Internet meteorology, which is what Pingdom does, a number like 99.94% uptime is missing critical information. I take it to mean that 0.06% of the time Pingdom tested Feedburner, it got an error, but what does that mean to me? Did everyone who uses Feedburner have a similar experience, or were there 5,000 people who saw Feedburner down more like 2% of the time? Also, serving up RSS is more like serving up mail than serving up web pages: errors are more hidden, and recovery should be graceful. RSS being down is quite different from broken images on an otherwise working site, or from a full outage; in my opinion it’s not a binary state. Many sites have visitors who only stop by for a few minutes each day, and downtime is less critical in that case. Furthermore, downtime is not necessarily systemic; two hours of downtime may be catastrophic, but if it only happens once every five years, it’s hardly a trend.
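To make the 5,000-user question concrete: an aggregate uptime figure is just an audience-weighted average, so a small group with much worse service can disappear into it. The sketch below assumes, purely hypothetically, that everyone outside that group saw no downtime at all; the function names are illustrative.

```python
# Hypothetical audience: one group of 5,000 users sees 2% downtime,
# everyone else sees none. All numbers are illustrative.

def blended_downtime(groups):
    """Audience-weighted downtime fraction.

    groups: list of (user_count, downtime_fraction) pairs.
    """
    total_users = sum(count for count, _ in groups)
    weighted = sum(count * fraction for count, fraction in groups)
    return weighted / total_users


def audience_needed(affected_users, their_downtime, reported_downtime):
    """Total audience size at which affected_users, each seeing
    their_downtime, average out to reported_downtime overall."""
    return affected_users * their_downtime / reported_downtime


total = round(audience_needed(5_000, 0.02, 0.0006))
print(total)  # 166667
print(f"{blended_downtime([(5_000, 0.02), (total - 5_000, 0.0)]):.2%}")  # 0.06%
```

Under that assumption, an audience of roughly 167,000 users could include 5,000 who saw 2% downtime while the headline figure still read 99.94%.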

I’m not sure there’s a metric that can easily describe the nuances of uptime better than the percentage, but I hope there is one. Here are some ideas. Bear in mind that these stats should be based on data such as the geographic distribution of the site’s audience and the average session length. In short, I suggest coloring Pingdom’s downtime stats with Compete.com’s traffic data to get a better picture of outages and spot trends.

  • Total User Downtime: the number of hours of server downtime in a specific time period times the number of users affected. Short downtime that affects everyone would be balanced with long downtime on a subset of users.
  • Daily Audience Percentage Affected: the percentage of the day’s users affected by an incident. If a rolling incident affects only 5% of all users at a time but reaches everyone at some point in the day (as upgrades sometimes do), the percentage of affected users would be high, even though the incident would look like 5% downtime from a single user’s standpoint. On the other hand, a 50% outage when 90% of the users are not using the site shows good planning.
  • Last X Days Peak [TUD/DAPA]: given one of the above metrics, report the worst (peak) value in the last X days. Incidents would take a while to decay out of this number, which would be useful for people shopping for reliable hosting.
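The three proposed metrics can be sketched in Python. The Incident record, the function names, and the per-day bookkeeping are all hypothetical, and DAPA here assumes that separate incidents on the same day hit different users, so their counts can simply be added.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    day: int             # day number the incident happened on
    hours: float         # outage length seen by the affected users
    users_affected: int  # distinct users who ran into it


def total_user_downtime(incidents, day):
    """TUD: hours of downtime times users affected, summed for one day."""
    return sum(i.hours * i.users_affected for i in incidents if i.day == day)


def daily_audience_pct_affected(incidents, day, daily_audience):
    """DAPA: fraction of the day's audience that ran into any incident.

    Assumes different incidents on the same day hit different users,
    so counts are added (and capped at 100%).
    """
    affected = sum(i.users_affected for i in incidents if i.day == day)
    return min(1.0, affected / daily_audience)


def last_x_days_peak(metric_by_day, today, x):
    """Worst (highest) value of a daily metric over the last x days."""
    window = [v for d, v in metric_by_day.items() if today - x < d <= today]
    return max(window, default=0.0)


incidents = [Incident(day=1, hours=2.0, users_affected=1_000),
             Incident(day=5, hours=1.0, users_affected=100_000)]
tud = {d: total_user_downtime(incidents, d) for d in range(1, 8)}
print(last_x_days_peak(tud, today=7, x=7))  # 100000.0
```

In this example, a two-hour outage for 1,000 users on day 1 counts as 2,000 user-hours, while a rolling upgrade on day 5 that eventually touches all 100,000 daily users for an hour each counts as 100,000 user-hours and a DAPA of 100%. A 7-day peak taken on day 7 still reports the day-5 figure.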

I am not an expert on these things, just a concerned web software developer who wants better data. If better thinking on uptime metrics has been done elsewhere by smarter people, please let me know with a link in the comments.

Plugging SmugMug’s “Hole”

Today a blogger named Philipp Lenssen wrote a post on Google Blogoscoped showing how private but otherwise unprotected SmugMug galleries can be downloaded without the owner’s consent. In the wake of the recent, similar MySpace private pictures hole, this seems like a serious PR problem waiting to happen. How long will it be before someone’s “private” SmugMug pictures get some major unwelcome publicity, and SmugMug along with them? I’m sure someone’s crawling all of SmugMug right now and packaging it up as a torrent file (and no, not me).

Here’s a quick description of the “hole” as I understand it. All of SmugMug’s galleries use an ID number in the URL. If you want to see someone else’s photos, you just change the ID number by hand; it’s as easy as changing a URL from smugmug.com/galleries/1000 to smugmug.com/galleries/1001. As long as the photos are not password protected (which is a separate preference setting), you can view them regardless of whether the user has marked the gallery itself “private”. Mr. Lenssen goes on to suggest that one solution is to replace the numeric IDs with GUIDs, which are non-sequential and almost impossible to guess. Don MacAskill, CEO of SmugMug, has already posted his thoughts about this, and an e-mail from him quoted in the original post admits that GUIDs would be preferable:

I’m in completely agreement, that GUIDs would help greatly here, but I’m afraid our system wasn’t built for GUIDs, and retrofitting our code and database to support GUIDs would be an extremely expensive proposition. [...] We’re also very open to change – nearly every feature, bug fix, and enhancement is driven by customer feedback, like yours. If our customers (or potential customers) asked us to adopt GUIDs because this was a bigger issue than we were aware – we would.
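For comparison, here is the difference Mr. Lenssen is pointing at, in a few lines of Python. The URL shape is taken from the example above, and the helper names are hypothetical. Python’s uuid.uuid4 produces a random (version 4) UUID with 122 random bits, so knowing one gallery’s token tells you nothing about any other.

```python
import uuid


def next_sequential_id(gallery_id: int) -> int:
    # Sequential IDs: one known gallery ID reveals its neighbours.
    return gallery_id + 1


def new_gallery_token() -> str:
    # Random (version 4) UUID: 122 random bits, nothing to step through.
    return str(uuid.uuid4())


print(f"smugmug.com/galleries/{next_sequential_id(1000)}")  # smugmug.com/galleries/1001
print(f"smugmug.com/galleries/{new_gallery_token()}")
```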

I have a cheap alternative for Mr. MacAskill that would solve the guessable-URL problem without GUIDs: a minor patch to SmugMug’s web code that doesn’t necessarily require a database change, although one would help. It would also satisfy one of SmugMug’s design goals for private pictures and galleries, namely that you can send someone a link to a private item. The suggestion is this: leave the URLs alone, but add a checksum key as a separate parameter, computed with a private hash salt.
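A minimal sketch of the checksum idea, assuming the “private hash salt” is a server-side secret and using HMAC-SHA256 as the keyed hash (my choice of technique; the post doesn’t name one). The names SECRET_SALT, gallery_key, private_link, and may_view_private are hypothetical. The gallery URL keeps its numeric ID; only a request that also carries the matching key would be allowed to see a private gallery.

```python
import hashlib
import hmac

# Hypothetical server-side secret; never sent to clients.
SECRET_SALT = b"replace-with-a-long-random-server-secret"


def gallery_key(gallery_id: int) -> str:
    """Checksum key for a gallery ID, computed with the private salt."""
    mac = hmac.new(SECRET_SALT, str(gallery_id).encode("ascii"), hashlib.sha256)
    return mac.hexdigest()[:16]  # truncated to 64 bits to keep URLs short


def private_link(gallery_id: int) -> str:
    # The existing URL is unchanged; the key rides along as an extra parameter.
    return f"/galleries/{gallery_id}?key={gallery_key(gallery_id)}"


def may_view_private(gallery_id: int, key: str) -> bool:
    # Constant-time comparison so timing doesn't leak how much of the key matched.
    return hmac.compare_digest(gallery_key(gallery_id), key)
```

Changing 1000 to 1001 in a shared link no longer works, because the key for 1001 can’t be computed without the secret. Because this key depends only on the gallery ID and the secret, no database change is needed, but revoking a leaked link would mean changing the secret for everyone; storing a per-gallery random value in the database is one way to make links individually revocable.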

Sharepoint Locked for Editing

While I was working on a document through our new Sharepoint team site at GWC, Word crashed due to some permission changes. After trying unsuccessfully to open it again many times, I came across this MS support article:

You receive a “(Filename) is locked for editing by ‘another user’” message when you try to modify a document in Windows SharePoint Services even though you are the user who previously opened the document

CAUSE

When a document is opened by a client program, Windows SharePoint Services puts a write lock on the document on the server. The write lock times out after 10 minutes. Users cannot modify the document during the time when the document is locked.

In a scenario where the program that opens the document unexpectedly quits or crashes and you try to open the document again before the write lock times out, the message that you receive says that the document is locked by another user. This behavior occurs even though you are user who previously opened the document.

WORKAROUND

To work around this behavior, wait 10 minutes before you click Edit in ProgramName to open the document again. (dismayed emphasis mine)

God Bless Microsoft, and bless their arbitrary 10-minute timeout that protects me from myself. I have six minutes and counting. (Hums impatiently to self)
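The behavior the KB article describes can be modeled in a few lines. This is not SharePoint’s implementation; it is a toy under two assumptions: time is measured in minutes, and the lock is tied to the editing session (a hypothetical session string here) rather than to the user, which would explain why a relaunched Word counts as “another user”.

```python
from dataclasses import dataclass
from typing import Optional

LOCK_TIMEOUT_MIN = 10  # the timeout stated in the KB article


@dataclass
class WriteLock:
    session: str        # which client session holds the lock (assumption)
    acquired_at: float  # minutes since some reference point


class Document:
    def __init__(self) -> None:
        self.lock: Optional[WriteLock] = None

    def _lock_expired(self, now: float) -> bool:
        return self.lock is None or now - self.lock.acquired_at >= LOCK_TIMEOUT_MIN

    def open_for_edit(self, session: str, now: float) -> bool:
        """True if this session gets (or already holds) the write lock."""
        if self._lock_expired(now):
            self.lock = WriteLock(session, now)
            return True
        # A crashed Word never releases its lock, and a relaunched Word is a
        # new session, so it is refused even though the user is the same.
        return self.lock.session == session

    def minutes_until_unlock(self, now: float) -> float:
        if self._lock_expired(now):
            return 0.0
        return LOCK_TIMEOUT_MIN - (now - self.lock.acquired_at)


doc = Document()
doc.open_for_edit("word-before-crash", now=0)
print(doc.open_for_edit("word-after-crash", now=4))  # False
print(doc.minutes_until_unlock(now=4))               # 6
```

Opening at minute 0, crashing, and retrying at minute 4 is refused with six minutes to go, which is exactly the wait I’m sitting through.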

The importance of good DNS

We have reached a resolution regarding our DNS problem with BulkRegister. Meetings were had, apologies were offered, and promises were made. In the end, no system can guard 100% against human error, and they assure us that human error was the root problem here. We’re now faced with a choice: stay with BulkRegister, who now promises never to turn off skweezer.net again, or find another company whose trustworthiness has yet to be tested. For our part, now that our account has been admitted into BulkRegister’s theoretical gold club, we feel it’s worth staying with them and fostering this relationship.

Question: if someone claims {choose: (ask/yahoo/google/microsoft).com} is a fraudulent site, are they automatically disabled? I think not. What does it take to get in this club? Why is there a club in the first place?
We have learned that DNS is an unavoidable single point of failure for a web company and deserves the same security attention and planning as network, hardware, and power. This saga also demonstrates why going after a phisher’s registrar, rather than its ISP or hosting company, is so effective: it really cuts a site off at the knees. For my part, I am tired of being on the wrong end of the digilantes who don’t understand that Skweezer is a mobilizing web proxy service, not a copyright infringer or phishing portal. That’s partly why I have this blog: so that one of these posts will be among the top search results for “skweezer phishing” or “skweezer is stealing my content!” For all of you who got here and are still wondering: no, it’s not.

Skweezer – still not a phishing site

Update: Skweezer.net DNS came back online around 11 PM last night, as far as I could tell. Was it a mistake? Does the abuse department at eNom have someone on call at night? I still don’t know even this morning. In the meantime, I am investigating DNS monitoring services such as DNS Stuff’s DNSAlert. DNS is as much a part of security as RAID or UPS.
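At its simplest, a DNS monitor keeps asking whether the name still resolves. This sketch uses Python’s socket.gethostbyname, which goes through the machine’s local resolver; the function name and the pluggable resolver argument are illustrative, not part of DNSAlert or any other service.

```python
import socket
from typing import Callable, Tuple


def dns_ok(hostname: str,
           resolve: Callable[[str], str] = socket.gethostbyname) -> Tuple[bool, str]:
    """Return (True, address) if hostname resolves, else (False, error text)."""
    try:
        return True, resolve(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, str(exc)
```

Run every few minutes against www.skweezer.net (from cron or any scheduler), a False result is the cue to raise an alert. Because the local resolver may keep answering from cache for a while after a registrar pulls a domain, a hosted service that queries the authoritative servers directly can notice a suspension sooner.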

Right now www.skweezer.net is completely down because our registrar, BulkRegister/eNom, has suspended DNS service, even though we had been in contact with them earlier in the month. The reason? We’ve been reported once again as a phishing site, which we obviously are not. I believe the true culprit is Netcraft’s overly zealous anti-phishing service (more on why I think so below), but BulkRegister has not evaluated the claim appropriately. I guess we’re going to have to get this out of the way once a year, so once again, repeat after me: Skweezer is not a phishing site. In the meantime, if you want to access Skweezer, you’ll have to do it via IP address, http://72.1.97.146/, or try our temporary alternate domain, http://www.skweezer.org. The problem with the IP address URL is a new one to me:

Skweezer suspected of Phishing


The Login Barrier is a Barrier to Growth

Jeff Atwood wrote about the login barrier a few days ago, and I found it to be an interesting read, confirming our real-world experience with opening up Skweezer to the anonymous masses a few years ago. His basic point is that by hiding functionality behind a login/sign-up screen, sites turn users away. His conclusion:

If your application requires users to log in, don’t underestimate the impact of the login barrier you’re presenting to users. Consider utilizing anonymous, cookie-based accounts to give users a complete experience that more closely resembles the experience that named users get. By removing the login barrier and blurring the line between anonymous users and named users, you’re likely to gain a lot more of the latter.

As onerous as that barrier is for normal users, it is much worse for mobile users, who have to triple-tap their passwords through multiple screens to fill in a form. God help them if they mistype a password and have to do it twice. Once we had the courage to open Skweezer to anonymous users, however, usage skyrocketed. To put a number on “skyrocket”: if memory serves, we received more unique anonymous users in the month following the switch than had visited Skweezer in the prior two years combined. Oddly enough, this eventually caused a spike in registered users too, as Jeff predicted above. Increased usage forced us to grow server capacity, which enabled even higher traffic, which in turn made us serious contenders for larger partnerships, and the cycle continues. Today Skweezer handles hundreds of thousands of users each week on the public site, as opposed to just a few thousand subscribers per month. We just celebrated our 150 millionth page. Our revenue today is much higher than the “lost” subscription revenue, not to mention the priceless partnership opportunities that our enterprise-grade capacity has opened up. You can’t get big unless you’re able to grow. I can’t wait to celebrate the next Skweezer milestones that are coming soon: the first million-user week, the first million-page day, and so on.

Looking back, it is clear that our perceived need for mandatory registration was a relic of our old business model of charging users a subscription for access. As for the future of Skweezer account registration, we will continue to allow anonymous site usage and will endeavor to require sign-up only for actions that absolutely need it, such as saving favorites. My personal wish is to enable OpenID sign-up someday, to cut the steps in a mobile user’s registration even further.
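Jeff’s anonymous, cookie-based accounts might look something like this on the server side. It is a sketch, not Skweezer’s code: the cookie name anon_id and the one-year lifetime are assumptions, and it uses only Python’s standard http.cookies and secrets modules. The first request gets a random ID and a Set-Cookie header value; later requests are recognized by that ID, which a sign-up form could later attach to a named account.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Optional, Tuple

COOKIE_NAME = "anon_id"              # hypothetical cookie name
COOKIE_MAX_AGE = 60 * 60 * 24 * 365  # one year, an arbitrary choice


def get_or_create_anon_id(cookie_header: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (anonymous user ID, Set-Cookie value or None).

    A returning visitor is recognized from the Cookie header; a new one
    gets a random ID and a cookie value to send back in the response.
    """
    incoming = SimpleCookie()
    incoming.load(cookie_header or "")
    if COOKIE_NAME in incoming:
        return incoming[COOKIE_NAME].value, None

    new_id = secrets.token_urlsafe(16)  # unguessable, cookie-safe characters
    outgoing = SimpleCookie()
    outgoing[COOKIE_NAME] = new_id
    morsel = outgoing[COOKIE_NAME]
    morsel["max-age"] = COOKIE_MAX_AGE
    morsel["path"] = "/"
    morsel["httponly"] = True
    return new_id, morsel.OutputString()
```

Nothing here asks the user for anything, which is the point on a phone keypad: registration only has to happen when the user wants something, like saved favorites, that needs a named account.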

IIS local SSL certificate install illustrated

One of the all-time most popular posts here continues to be “Testing with a local SSL certificate for free“, even though it is over a year old. To save you the click, here are those three simple steps again:

  1. Download and install the IIS Diagnostics Toolkit from Microsoft.
  2. Run the newly installed SSL Diagnostics program.
  3. Right click on your local website and choose “Create new cert”. It will install a two-week locally signed certificate on your machine that is not technically valid, but will at least allow you to test SSL activity.

Create Certificate

It recently occurred to me that the UI for this free Microsoft tool is not straightforward: it is not immediately obvious where you can right-click. Here are some screenshots to show what step three above looks like. First, the image to the right shows the SSL Diagnostics screen and where you should click to make the certificate. Once you’ve completed this step, the screen will update to show information much like the screen below.

Post-Create Certificate

When you visit your local server to see if it works, you may get some security errors. For example, these are the two messages that I get when I try to visit https://localhost/ using Firefox:

Firefox Certificate Error / Firefox Security Error
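To check the local HTTPS setup from a script instead of a browser, Python’s standard library can be told to accept the untrusted test certificate. This is a sketch for local testing only, since turning verification off defeats the point of SSL anywhere else; the function names are mine.

```python
import ssl
import urllib.request


def local_test_context() -> ssl.SSLContext:
    """TLS context that accepts an untrusted test certificate. Local testing only."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # must be switched off before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE  # skip certificate verification entirely
    return ctx


def fetch_local(url: str = "https://localhost/") -> int:
    """Request url over HTTPS and return the HTTP status code."""
    with urllib.request.urlopen(url, context=local_test_context()) as resp:
        return resp.status
```

A stricter alternative is to export the test certificate and pass it as ssl.create_default_context(cafile="path/to/test-cert.pem") (the path is a placeholder), so that only that certificate is trusted; with hostname checking left on, this should work if the name on the certificate matches the host you connect to.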

On apologies

Most are familiar with the AOL fiasco this week (AOL accidentally made public the search logs of more than 650,000 users) and the resulting apology:

Although there was no personally identifiable data linked to these accounts, we’re absolutely not defending this. It was a mistake, and we apologize. We’ve launched an internal investigation into what happened, and we are taking steps to ensure that this type of thing never happens again.

Still, the data was specific enough for the NY Times to track one user down. When confronted with that story, the AOL spokesperson apologized specifically to the unmasked woman but added, “there is not a whole lot we can do.” He went on to explain that the system that collected this data did not record the screen names of the users whose searches were captured, which I do not believe. There has to be a way for AOL to identify each of the 650,000 customers affected by this breach of privacy and apologize to them directly, or otherwise try to make it right. After all, how were the anonymous user numbers in the logs consistently generated in the first place? (Perhaps each number is an internal customer ID, or maybe it’s a hash of the username.) Instead, AOL issued conditional apologies that can be summed up as: “we’re sorry, but what’s done is done, it’s really not so bad, and it probably won’t happen again.”
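To illustrate why I don’t buy the “no screen names” explanation, suppose (purely hypothetically; AOL hasn’t said how its numbers were made) that each log number was derived from a hash of the screen name. Anyone holding the list of screen names, which AOL obviously does, can recompute every number and match it. The function names are illustrative.

```python
import hashlib


def pseudonymous_id(screen_name: str) -> int:
    """Hypothetical scheme: a stable number derived from a screen name.

    SHA-256 truncated to 64 bits; the same name always gives the same number.
    """
    digest = hashlib.sha256(screen_name.encode("utf-8")).hexdigest()
    return int(digest[:16], 16)


def reidentify(published_ids, all_screen_names):
    """Map each published number back to a screen name (None if unknown).

    Whoever holds the full list of screen names just recomputes every
    number and looks the published ones up.
    """
    lookup = {pseudonymous_id(name): name for name in all_screen_names}
    return {pid: lookup.get(pid) for pid in published_ids}
```

If the numbers were internal customer IDs instead, the mapping is even more direct: it is a database lookup.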

I contrast this “apology” with a message I received yesterday from Peter Blum, developer of some very useful ASP.NET controls that I downloaded this week. Out of respect I won’t go into the specifics of what happened, but by e-mail he described the problem, sincerely expressed his personal remorse (“I feel really bad about my mistake”) and extended my license period. Here’s the kicker: this is a trial product and I’m not a paying customer (yet). I think other companies might have hoped their mistake went unnoticed, or qualified their apology and gotten defensive. Further, Mr. Blum made sure to give me something as proactive compensation (an extension of the trial period), even though I had not yet complained. I am impressed.

As long as they are staffed by human beings, companies will occasionally make mistakes. The lesson in it for us here at Greenlight Wireless, a company that is also entrusted with sensitive user data, is to do our best in protecting that data, but be forthright and proactively apologetic if/when we accidentally let our customers down.

Skweezer is not a phishing site!

Update: it seems Skweezer is green now, even though there haven’t been any new ratings besides mine. Maybe this post triggered a review? We’ll never know, and my concern about lack of transparency stands.

Phishing scams are a serious problem on the Internet, and thankfully the issue is getting more attention. Although I consider myself immune to pleas for my bank information, I recently installed the McAfee SiteAdvisor Firefox extension, which rates the sites you visit by safety (this was after reading some of the latest info about OpenDNS, which I intend to write about later). Naturally, one of the sites I visit daily is skweezer.net. Imagine my surprise to see in my toolbar that both skweezer.net and greenlightwireless.net are flagged as dangerous by SiteAdvisor.

The entry for greenlightwireless.net has no ratings of its own, but is flagged evil because of its association with skweezer.net. Skweezer’s black eye, in turn, is because of a single user, “JoshMeister”, who on July 8 left this supremely insightful comment for Skweezer.net:

Phishing scam at http://www.skweezer.net/s.aspx/2/signin.ebay.com/ws/eBayISAPI.dll?SignIn

Skweezer affiliations (SiteAdvisor)

And that’s it: we’re officially evil. My hat is off to you, JoshMeister. I’m glad the Internet is full of network experts like yourself who can tell the difference between a phishing site and a transcoding service. My response to this false charge is on the SiteAdvisor site (I validated as the website owner), but if you would like to help us out and set the record straight, please register as a reviewer and leave a comment in our favor.
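Splitting the reported URL into its parts shows what the reviewer was looking at. The layout assumed here (/s.aspx/&lt;option&gt;/&lt;host&gt;/&lt;path&gt;) is inferred from this single example and is not documented Skweezer behavior; the option segment is skipped rather than interpreted, and proxied_target is a hypothetical helper.

```python
from urllib.parse import urlsplit


def proxied_target(skweezer_url):
    """Split a Skweezer-style URL into (proxy_host, target_host, target_path).

    Assumes the layout /s.aspx/<option>/<target host>/<target path>,
    inferred from a single example; <option> is skipped, not interpreted.
    """
    parts = urlsplit(skweezer_url)
    segments = parts.path.split("/")
    i = segments.index("s.aspx")
    target_host = segments[i + 2]
    target_path = "/" + "/".join(segments[i + 3:])
    if parts.query:
        target_path += "?" + parts.query
    return parts.hostname, target_host, target_path


print(proxied_target(
    "http://www.skweezer.net/s.aspx/2/signin.ebay.com/ws/eBayISAPI.dll?SignIn"))
```

The browser’s host is www.skweezer.net, and signin.ebay.com appears only as the page being requested through the proxy, which is how a transcoding service addresses the pages it reformats.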

I am concerned about the ripple effect of things like this. How many users are warned away from our site for the wrong reason? I doubt that the reviewer was malicious (probably just mistaken), but what if this was a competitor that wants us censored or hamstrung? How many services or ISPs license SiteAdvisor’s database? What is McAfee’s vetting process here? Each reviewer has a “reputation score”, but we were nailed by a single reviewer with a score of 2/7. Frankly, I’m glad that anti-phishing services like this exist, but I’m concerned about one that relies so heavily on a community of users to provide a service that would otherwise be too expensive to staff properly. If this rating passed an internal review, that review is not thorough enough; a thorough one would never have flagged Skweezer.net.

Tough love for WebTV Skweezer users

I haven’t yet commented on the change we made to Skweezer a few weeks ago to discontinue free WebTV access (WebTV is now MSN TV, but I prefer to call it by the old name). To recap: since April 24, 2006, if you try to browse Skweezer with your WebTV device and you’re not logged in as a Skweezer Pro subscriber, you get the following message:

Skweezer® Notification

Hello MSN TV user! Skweezer was developed for mobile phone and PDA users. In order to use Skweezer with your MSN TV device, you need to create a Skweezer Pro account. Click here to sign up for Skweezer Pro, or to update your existing Skweezer account to Skweezer Pro.

If you are not using MSN TV and believe you have received this message mistakenly, please let us know.

It was not done lightly. After all, how cool was it that a completely unintended group of users found a new use for our technology? Theoretically, Skweezer is a perfect fit for WebTV: the content is reformatted for the lower-resolution screen, and our dynamic compression really speeds up the web for the mostly dial-up connections. It’s similar to the problem that mobile users have, and we thought that was pretty cool at the time. However, as we experienced rapid growth, it became important to re-examine our traffic patterns to see if there was some way we could improve service quality.
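The rule in the notice boils down to a two-part check: is this an MSN TV device, and is the visitor a Pro subscriber? In this sketch the detection strings are assumptions (WebTV browsers are generally reported to include “WebTV” in their user-agent string; “MSNTV” is a guess for newer models), and the function names are hypothetical; the post doesn’t say how Skweezer actually detects these devices.

```python
from typing import Optional

# Assumed user-agent substrings for WebTV / MSN TV browsers.
MSN_TV_TOKENS = ("webtv", "msntv")


def is_msn_tv(user_agent: Optional[str]) -> bool:
    ua = (user_agent or "").lower()
    return any(token in ua for token in MSN_TV_TOKENS)


def needs_pro_notice(user_agent: Optional[str], is_pro_subscriber: bool) -> bool:
    """True when the request should get the 'Hello MSN TV user!' page."""
    return is_msn_tv(user_agent) and not is_pro_subscriber
```

A substring match like this is also why the notice invites anyone who sees it by mistake to get in touch: user-agent strings are easy to spoof and occasionally match the wrong browser.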