Monthly Archives: July 2006

The hash race

In my earlier post this week about coding for speed, I said that reinventing the framework is bad. That is to say, it’s unlikely that some home-grown bit of code will outperform similar code that’s already been written by finer minds. There are always exceptions, though, and I present to you a case in point: some experimental code for hashing binary data.

Hashing is useful when you need to check whether data has changed (a checksum), or when you need to generate a compact, practically unique number from some arbitrary content. The standard solution to this problem is to generate an MD5 hash. A more secure option is to generate a SHA hash. Both are built into the .NET Framework, so all you need to know is that you feed a byte array in one end and get a small byte array out the other. You can then turn that byte array into a number, a string, or whatever you need. Another option is to XOR the whole thing together, which is much simpler but offers none of the security of MD5 or SHA. A 32-bit example XOR hash function in VB.NET will follow this post.
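
To make the three options concrete, here is a minimal sketch in Python (not the VB.NET function mentioned above). The standard library's hashlib supplies MD5 and SHA-1, and xor_hash32 is a hypothetical helper that splits the data into 32-bit words and XORs them together. Reading the words as little-endian and zero-padding the last partial word are my own assumptions.

```python
import hashlib

def xor_hash32(data: bytes) -> int:
    """XOR-fold data into a 32-bit value (a checksum, not a secure hash)."""
    # Zero-pad so the length is a multiple of 4 bytes.
    padded = data + b"\x00" * (-len(data) % 4)
    result = 0
    for i in range(0, len(padded), 4):
        # Treat each 4-byte chunk as a little-endian unsigned 32-bit word.
        result ^= int.from_bytes(padded[i:i + 4], "little")
    return result

data = b"hello, world"
print(hashlib.md5(data).hexdigest())   # 128-bit digest, as hex
print(hashlib.sha1(data).hexdigest())  # 160-bit digest, as hex
print(f"{xor_hash32(data):08x}")       # 32-bit XOR fold, as hex
```

One design caveat: because XOR is order-independent and self-cancelling, swapping two words or repeating a word twice leaves the XOR result unchanged, so it catches accidental changes far less reliably than MD5 or SHA.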

I ran a short test to compare how long it would take to hash 1 million random bytes 1,000 times using MD5, SHA, and three versions of XOR: 64-bit, 32-bit, and 16-bit.

Hash race chart

In this case, 64-bit XOR hashing took half as long as MD5 (in other words, it was twice as fast), just as MD5 took half as long as SHA. The lesson is: be familiar with the framework so you know your options, but code algorithms yourself when there are obvious speed advantages.
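
For anyone who wants to rerun the race, here is a rough sketch of a timing harness in Python rather than the .NET code behind the chart. Assumptions: "SHA" is taken to mean SHA-1; xor_fold and time_it are hypothetical helpers; the XOR variants reinterpret the buffer as native-endian 16-, 32- or 64-bit words via memoryview.cast; and repeats is cut to 10 because a pure-Python XOR loop is slow.

```python
import hashlib
import os
import time
from functools import reduce
from operator import xor

# memoryview format codes for 16-, 32- and 64-bit unsigned words
# (assumes the usual C sizes: 'H' = 2 bytes, 'I' = 4, 'Q' = 8).
WORD_FORMATS = {2: "H", 4: "I", 8: "Q"}

def xor_fold(data: bytes, width: int) -> int:
    """XOR all width-byte words of data together (native byte order)."""
    padded = data + b"\x00" * (-len(data) % width)  # zero-pad the last word
    words = memoryview(padded).cast(WORD_FORMATS[width])
    return reduce(xor, words, 0)

def time_it(label: str, fn, data: bytes, repeats: int) -> float:
    """Call fn(data) `repeats` times; print and return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn(data)
    elapsed = time.perf_counter() - start
    print(f"{label:>6}: {elapsed:.3f} s")
    return elapsed

if __name__ == "__main__":
    data = os.urandom(1_000_000)  # 1 million random bytes, as in the post
    repeats = 10                  # the post used 1,000
    contenders = {
        "MD5": lambda d: hashlib.md5(d).digest(),
        "SHA-1": lambda d: hashlib.sha1(d).digest(),
        "XOR-64": lambda d: xor_fold(d, 8),
        "XOR-32": lambda d: xor_fold(d, 4),
        "XOR-16": lambda d: xor_fold(d, 2),
    }
    for label, fn in contenders.items():
        time_it(label, fn, data, repeats)
```

In Python, hashlib runs in compiled C while the XOR loop runs in the interpreter, so this harness will likely not reproduce the .NET ranking; the point is to measure on your own platform before deciding to hand-roll anything.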

Interview question SQL

So we’re in the midst of hiring another developer, and this time we whomped up an interview test. Technical tests are an excellent means of separating the big talkers from the true techies. Here is just one of the test questions; to my surprise, no one so far has come close to answering it:

Given the following tables, write SQL to retrieve:
(a) the names of all parents with children who are at least 18 years old and still live at home
(b) the ratio of men to women per zip code, assuming that the Gender field is either “M” or “F”

CREATE TABLE Person
      (PersonID int NOT NULL IDENTITY (1, 1),
      FirstName varchar(50) NOT NULL,
      LastName varchar(50) NOT NULL,
      ParentID int NULL,
      AddressID int NOT NULL,
      DateOfBirth datetime NOT NULL,
      Gender char(1) NOT NULL)

CREATE TABLE Address
      (AddressID int NOT NULL IDENTITY (1, 1),
      AddressLine1 varchar(100) NOT NULL,
      AddressLine2 varchar(100) NULL,
      City varchar(50) NOT NULL,
      State char(2) NOT NULL,
      ZIP char(10) NOT NULL)

For (a), I want to see them do a self join or an inner select, and at least take a stab at some date math. Instead, they don’t read the directions, which ask for the parents of children who are at least 18, and write SQL that selects people born before July 26, 1988. They don’t even try the “still live at home” part, that is, children with the same AddressID as their parent. Read the directions!
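
Here is one possible answer to (a), sketched with Python's sqlite3 so it can run anywhere; it is not an official answer key. SQLite stands in for SQL Server: IDENTITY becomes INTEGER PRIMARY KEY, dates are ISO text, and the 18-year cutoff uses SQLite's date(:today, '-18 years') where T-SQL would use DATEADD(year, -18, GETDATE()). The query uses an inner select (EXISTS); a self join with DISTINCT would work too. The sample rows and the :today parameter are hypothetical.

```python
import sqlite3

# SQLite stand-in for the post's Person table.
SCHEMA = """
CREATE TABLE Person (
    PersonID    INTEGER PRIMARY KEY,
    FirstName   TEXT NOT NULL,
    LastName    TEXT NOT NULL,
    ParentID    INTEGER,
    AddressID   INTEGER NOT NULL,
    DateOfBirth TEXT NOT NULL,  -- ISO 8601, e.g. 1988-07-26
    Gender      TEXT NOT NULL);
"""

PARENTS_OF_ADULT_CHILDREN_AT_HOME = """
SELECT p.FirstName, p.LastName
FROM Person AS p
WHERE EXISTS (
    SELECT 1
    FROM Person AS c
    WHERE c.ParentID = p.PersonID                      -- c is a child of p
      AND c.AddressID = p.AddressID                    -- still lives at home
      AND c.DateOfBirth <= date(:today, '-18 years'))  -- at least 18 on :today
ORDER BY p.LastName, p.FirstName;
"""

def parents_with_adult_children_at_home(conn, today):
    return conn.execute(PARENTS_OF_ADULT_CHILDREN_AT_HOME, {"today": today}).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO Person VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "Ann", "Smith", None, 10, "1960-01-01", "F"),
        (2, "Bob", "Smith", 1, 10, "1988-07-26", "M"),  # 18 on 2006-07-26, at home
        (3, "Cal", "Jones", None, 20, "1962-05-05", "M"),
        (4, "Dee", "Jones", 3, 21, "1985-03-03", "F"),  # adult, moved out
        (5, "Eve", "Brown", None, 30, "1965-09-09", "F"),
        (6, "Fay", "Brown", 5, 30, "1995-02-02", "F"),  # at home, but a minor
    ],
)
print(parents_with_adult_children_at_home(conn, "2006-07-26"))  # [('Ann', 'Smith')]
```

The three conditions in the inner select map directly onto the three parts of the question: parent, at least 18, and still at home.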

For (b), I have also seen some half-hearted attempts. We need two recordsets: the number of males per ZIP code and the number of females per ZIP code. Join them on ZIP, divide one count by the other and voilà! Gender ratio by ZIP code. Bonus points for converting to a numeric data type first, since dividing one integer count by another truncates the result to a whole number.
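
A sketch of the two-recordset approach for (b), again using sqlite3 as a stand-in for SQL Server and hypothetical sample data. CAST(... AS REAL) plays the role that something like CAST(m.Men AS decimal(10, 2)) would in T-SQL. Because the two recordsets are inner-joined, a ZIP code with no men or no women simply drops out, which also sidesteps division by zero.

```python
import sqlite3

# SQLite stand-ins for the post's tables (IDENTITY -> INTEGER PRIMARY KEY).
SCHEMA = """
CREATE TABLE Address (
    AddressID    INTEGER PRIMARY KEY,
    AddressLine1 TEXT NOT NULL,
    AddressLine2 TEXT,
    City         TEXT NOT NULL,
    State        TEXT NOT NULL,
    ZIP          TEXT NOT NULL);
CREATE TABLE Person (
    PersonID    INTEGER PRIMARY KEY,
    FirstName   TEXT NOT NULL,
    LastName    TEXT NOT NULL,
    ParentID    INTEGER,
    AddressID   INTEGER NOT NULL,
    DateOfBirth TEXT NOT NULL,
    Gender      TEXT NOT NULL);
"""

# Two recordsets (men per ZIP, women per ZIP), joined on ZIP and divided.
GENDER_RATIO_BY_ZIP = """
SELECT m.ZIP,
       CAST(m.Men AS REAL) / w.Women AS MenPerWoman  -- avoid integer division
FROM (SELECT a.ZIP, COUNT(*) AS Men
      FROM Person AS p JOIN Address AS a ON a.AddressID = p.AddressID
      WHERE p.Gender = 'M'
      GROUP BY a.ZIP) AS m
JOIN (SELECT a.ZIP, COUNT(*) AS Women
      FROM Person AS p JOIN Address AS a ON a.AddressID = p.AddressID
      WHERE p.Gender = 'F'
      GROUP BY a.ZIP) AS w
  ON w.ZIP = m.ZIP
ORDER BY m.ZIP;
"""

def gender_ratio_by_zip(conn):
    return conn.execute(GENDER_RATIO_BY_ZIP).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Hypothetical sample data: three addresses, each in its own ZIP code.
conn.executemany(
    "INSERT INTO Address VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "1 Main St", None, "Springfield", "IL", "12345"),
        (2, "2 Oak Ave", None, "Shelbyville", "IL", "67890"),
        (3, "3 Elm Rd", None, "Ogdenville", "IL", "11111"),
    ],
)
# (AddressID, Gender): ZIP 12345 has 3 men and 2 women,
# ZIP 67890 has 1 man and 2 women, ZIP 11111 has 1 man and no women.
residents = [(1, "M"), (1, "M"), (1, "M"), (1, "F"), (1, "F"),
             (2, "M"), (2, "F"), (2, "F"), (3, "M")]
conn.executemany(
    "INSERT INTO Person VALUES (?, ?, ?, ?, ?, ?, ?)",
    [(i, f"Person{i}", "Doe", None, address_id, "1970-01-01", gender)
     for i, (address_id, gender) in enumerate(residents, start=1)],
)
print(gender_ratio_by_zip(conn))  # [('12345', 1.5), ('67890', 0.5)]
```

If you would rather keep one-sided ZIP codes in the output, a LEFT JOIN with NULLIF(w.Women, 0) in the divisor is one option.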

Am I being too rough? Is this really so hard?

Firefox is my development platform

Mozilla’s Firefox browser is my browser of choice for testing and developing web applications. The browser itself is not the thing; it’s the extensions that make it so choice for what I do. Just today I had to spoof a referrer header to test whether some code would validate it, and I found an extension called TamperData that does just that; so far, so good. Not including TamperData (which I assume will become essential over time), here are the extensions I consider essential to my Firefox experience while testing and developing sites:

Reinventing the framework is bad

There’s a bit of code on this site that gets a few hits every day via search engines, namely my IP-to-country lookup code. (BTW, WordPress is not so good at sharing code.) That code includes a function that converts an IP address string into a 64-bit integer (Int64). It does this by splitting the address into an array of four strings, getting the hexadecimal value of each string, joining the strings back together, and then determining the numeric value of the result. Sub-optimal, but it works.

It turns out that the .NET Framework’s System.Net.IPAddress class already has several methods that do the same thing, only better. They are more robust, they handle both IPv4 and IPv6 addresses, and my functions, rewritten to use them, run in about a third of the time (will post after the jump). My point is that before writing code, one should be familiar enough with the underlying framework to be sure that the same thing hasn’t already been written. One gets better code in less time.
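
The same lesson carries over to other platforms. As a sketch in Python (not the post's .NET code, which would start from IPAddress.Parse), ip_to_int_by_hand is a hypothetical approximation of the split-hex-join approach described above, while ip_to_int hands the job to the standard ipaddress module, which validates its input and understands IPv6.

```python
import ipaddress

def ip_to_int_by_hand(ip: str) -> int:
    """Approximates the home-grown approach: split, hex each octet, rejoin, parse."""
    octets = ip.split(".")
    hex_string = "".join(f"{int(octet):02x}" for octet in octets)
    return int(hex_string, 16)

def ip_to_int(ip: str) -> int:
    """Let the standard library parse and validate an IPv4 or IPv6 address."""
    return int(ipaddress.ip_address(ip))

print(ip_to_int_by_hand("192.168.1.1"))  # 3232235777
print(ip_to_int("192.168.1.1"))          # 3232235777

for candidate in ("2001:db8::1", "192.168.1.256"):
    try:
        print(candidate, ip_to_int(candidate))
    except ValueError as err:
        print(candidate, "rejected:", err)
```

Note that the by-hand version happily returns a number for "192.168.1.256", because 256 becomes a three-digit hex string, while the library version rejects it; that kind of robustness is exactly what you get for free from the framework.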

ESPN MVNO, part II

I had my misgivings about ESPN Mobile’s service some time ago. Since then, things haven’t gone so well for them, unfortunately. I really do mean “unfortunately,” because I’d like to see data services do better in America. Rising tide and all that.

While reading the news on this, I found a really good write-up on MVNO math by Julie Ask, a JupiterResearch analyst, describing how small this market really is. I have since subscribed to the Wireless section of the JupiterResearch weblogs. Good stuff.

Skweezer is not a phishing site!

Update: it seems Skweezer is green now, even though there haven’t been any new ratings besides mine. Maybe this post triggered a review? We’ll never know, and my concern about the lack of transparency stands.

Phishing scams are a serious problem on the Internet, and thankfully the issue is getting more attention. While I consider myself immune to pleas for my bank information, I recently installed the McAfee SiteAdvisor Firefox extension (after reading some of the latest info about OpenDNS, which I intend to write about later). SiteAdvisor rates the sites you visit by safety. Naturally, one of the sites I visit daily is skweezer.net. Imagine my surprise to see in my toolbar that both skweezer.net and greenlightwireless.net are flagged as dangerous by SiteAdvisor.

The entry for greenlightwireless.net has no ratings of its own, but it is flagged as evil because of its association with skweezer.net. Skweezer’s black eye, in turn, comes from a single user, “JoshMeister”, who on July 8 left this supremely insightful comment about Skweezer.net:

Phishing scam at http://www.skweezer.net/s.aspx/2/signin.ebay.com/ws/eBayISAPI.dll?SignIn

Skweezer affiliations (SiteAdvisor)

And that’s it: we’re officially evil. My hat is off to you, JoshMeister. I’m glad the Internet is full of network experts like yourself who can tell the difference between a phishing site and a transcoding service. My response to this false charge is on the SiteAdvisor site (I validated as the website owner), but if you would like to help us out and set the record straight, please register as a reviewer and leave a comment in our favor.

I am concerned about the ripple effect of things like this. How many users are warned off our site for the wrong reason? I doubt that the reviewer was malicious (probably just mistaken), but what if this were a competitor who wanted us censored or hamstrung? How many services or ISPs license SiteAdvisor’s database? What vetting process does McAfee follow here? Each reviewer has a “reputation score”, but we were nailed by a single reviewer with a score of 2/7. Frankly, I’m glad that anti-phishing services like this exist, but I’m concerned about one that relies too heavily on a community of users to provide a service that would otherwise be too expensive to staff properly. If this flag passed an internal review process, then that process is not thorough enough; a thorough one would never have flagged Skweezer.net.

The trouble with contextual ads

Self-googling is so wrong, but nevertheless I followed a recent link from a Google News search on “Greenlight Wireless” to one of our PR articles, which celebrates our union with our web host, Data393. In case the link stops working later, here’s a screenshot for the full effect:

Screenshot of PR article

Note the Rackspace ad right in the middle of an article about their competitor. Either this is totally brilliant or a complete disaster, depending on your point of view. I think it does more harm than good in this case, but that’s just my opinion. The article is all about how we went with Data393 and found them superior in all respects. When I see this ad, I am reminded of all the reasons why we didn’t go with Rackspace. It is the advertising equivalent of wearing the losing team’s jersey as you leave the stadium, and just another example of how placing ads alongside content can have unintended consequences.

Mobile Model Muddle

Quick: what kind of mobile phone do you have? You may know the brand, but what’s the model number? Don’t know? You’re not alone, according to this blurb from The Register:

A survey of 761 mobile phone users aged 15 and over, commissioned from Ipsos MORI by LogicaCMG, found that 49 per cent of mobile phone users didn’t know what model they use. A further nine per cent were unaware of the make.

This is a serious barrier to entry for services that require you to know this information before using them, and that translates into lost sales.