Mourning Posterous: how and why I built Urgeous

When Posterous announced it was closing its doors, I was upset. The writing had been on the wall for some time (ever since they were acquired by Twitter), and I had been prudent enough to use my own domain, but it still hurt. I would have to move my posts, and most importantly, find a new service.

Then I thought that instead of whining, I should do something about it.

The what

What I liked most about Posterous was the ability to post by email; it let you post from any platform, without any special app. If you only had your phone, you could post. If you were in a cybercafé half-way around the world, but had access to Gmail, you could post. So any system that I would build would need to have that.

I also liked the fact that you could post using Markdown syntax (although one had to declare one's intentions at the beginning of each post).

The rest of Posterous I didn't much care about: I never really got what "Spaces" were about, for example.

And as with most (or all?) other webapps, I hate logging in with a passion. I clear my browser cache often, and use many different computers; and we all have so many accounts on so many web services now.

I understand that I'm being irrational, that logging in is necessary and that there are many solutions for managing credentials. Still, I don't like it. I don't like wanting to do something and being presented with a security screen and having to produce a "password". I'm not crossing borders in a war zone -- I'm just editing my blog from my home!!

So I decided that my blogging system would not require users to log in to post, or to manage their posts.

As a start, those were the only three requirements: post-by-email, Markdown, no login.

The how

I had been toying with Mailgun (email management as-a-service) for some time, and liked it a lot; the team is super-responsive, the service seems rock-solid, and I find it on the whole much simpler to use than their competitors. The ability to process incoming email was included almost from the start.

(Other experiments have described the Mailgun solution in more detail before.)

For storage, after looking at several possibilities (such as PostgreSQL 9.2, which I could never install!, or simple files on the filesystem, which become a mess very quickly), I decided to use Amazon Web Services: not the cheapest approach, but it scales indefinitely and rarely breaks (and when it breaks, everything else is broken too, so it doesn't really look like it's your fault, since all of the modern Internet seems to run on AWS). I used:

  • S3 (simple storage service) for messages
  • SimpleDB for metadata.

SimpleDB is less well-known than S3 but is super useful; it's a key-value storage solution that one might call "NoSQL" -- if not for the fact that it's queried with a SQL-like syntax. What I like about it is that one can add attributes on the fly without "creating" them beforehand, and query them immediately. Database columns really are a pain in this regard.

And so, here's how it works: when an email is sent to post@urgeous.com, it is forwarded by Mailgun to a script on urgeous.com; that script

  • makes sure the email is actually from Mailgun
  • takes the "stripped-text" part of the post and parses it into HTML ("stripped-text" is a field populated by Mailgun that includes the text version of the message without quoted parts or signature blocks)
  • stores the metadata to AWS SimpleDB and the HTML + Markdown text to AWS S3
  • sends a message back to the user
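The first of these steps, checking that a request really comes from Mailgun, can be sketched as follows. This is a minimal sketch and not the actual Urgeous script: it assumes Mailgun's documented webhook signing scheme (an HMAC-SHA256 over the "timestamp" and "token" fields, keyed with the account's signing key), and the function name is hypothetical.

```python
import hashlib
import hmac


def is_from_mailgun(signing_key: str, timestamp: str, token: str, signature: str) -> bool:
    """Return True if `signature` matches the posted timestamp and token.

    Hypothetical helper: the three values are assumed to come from the
    form fields Mailgun sends along with the forwarded message.
    """
    expected = hmac.new(
        key=signing_key.encode("utf-8"),
        msg=(timestamp + token).encode("utf-8"),
        digestmod=hashlib.sha256,
    ).hexdigest()
    # Constant-time comparison, so the check does not leak how many
    # characters of the signature were correct.
    return hmac.compare_digest(expected, signature)
```

A request whose signature does not verify would simply be rejected before any parsing or storage happens.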

The other benefit of letting users post only by email is that Mailgun flags spam and can even filter it; we'll see how it works in practice but I think email spam is easier to spot than blog or comment spam.

In order to let users edit or delete their posts, a random key is generated for each message received and sent back to the user, in direct clickable links (so, no login and no typing). This is not very secure as the keys travel in the clear, but I feel it's an acceptable compromise (and since there is one key per message, no one key can give access to all posts).
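The per-message key idea could look something like this. It is only a sketch under assumptions: the URL layout, the key length and all the names below are hypothetical, not what Urgeous actually uses.

```python
import hmac
import secrets

BASE_URL = "http://urgeous.com"  # assumed URL layout, for illustration only


def new_post_key() -> str:
    """Generate an unguessable, URL-safe key for a single post."""
    return secrets.token_urlsafe(16)


def management_links(post_id: str, key: str) -> dict:
    """Build the clickable edit/delete links sent back to the author."""
    return {
        action: f"{BASE_URL}/{action}/{post_id}?key={key}"
        for action in ("edit", "delete")
    }


def key_is_valid(stored_key: str, supplied_key: str) -> bool:
    """Compare the key from a clicked link with the one stored for the post."""
    return hmac.compare_digest(stored_key, supplied_key)
```

Because each post gets its own key, a leaked link exposes only that one post.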

And that's it!

Posts can be seen by querying the post-id and fetching the corresponding S3 file.

There are also a couple of other functions (ability to reply to posts, saving of last 3 versions when editing, automatic tagging of messages using OpenCalais, etc.) but they can be described later and not all of them are implemented in full.

But for now, this thing works!

The future

I'm not making any promises except that I'm going to use this system for myself. But if it can benefit other people, then it would also be very cool. If it takes off, we'll figure out how to make it work, either with some kind of advertising, or by selling "premium" features.

But if it doesn't take off, then I will probably be able to maintain this as it is for an indefinite amount of time (since I'm using it myself anyway).

Mon, 25 Feb 2013 • permalink

For some queries, all of the first 10 results on Google are spam

The other day, I was looking for the phone number of my hairdresser to make an appointment; it turns out that when searching for the salon's name, Google provides only spam results that aren't helpful and may even end up costing you money.

The story

I couldn't remember the salon's name, but I knew where it was, so I looked it up in StreetView (the salon is in Sèvres, a small town near Paris, France):

My hairdresser in Sèvres

Next, I looked for "franck provost sevres", and I got two things:

  • "Places for franck provost near Sèvres":

Places near Sèvres

Those are interesting and would be helpful, except none of them are actually in Sèvres.

  • A list of web results for my query:

Pages mentioning franck provost sevres

It looked odd: no directory site was present in the results (such as "pagesjaunes.fr", "118218.fr", etc.); instead, there were only a bunch of specialized websites with very specific names:

  • meilleurcoiffeur.com means "best hairdresser"
  • justacote.com means "next door"
  • beaute-addict.com means, well, beauty addict
  • etc.

Checking them out, we find that every site suggests a different phone number for the same salon at the same physical address; here are some of those numbers: 08 99 18 58 xx, 08 99 10 35 xx, 08 99 02 19 xx, 08 99 51 06 xx...

Notice something? Here's a hint: the real number for the salon is in fact 01 46 23 86 xx.

Phone numbers beginning with 08 99 are toll numbers. Calling a number starting with 01 to 05 will probably be free (or very cheap, depending on your phone company), but calling an 08 99 number will cost you 1.34 euros just to make the connection, plus 0.34 euros per minute. And this money doesn't go to the salon, it goes to the site that set the number up (with the provider of toll numbers taking a hefty cut).

In short, the sites in Google's first page provide zero value and try to scam you into calling their toll number instead of the real number.

(What's worse, many of these sites serve Google a different version of their page from the one users get: a version in which the actual phone number can be seen.)

It's a shame these businesses exist, but there's not much we can do about it. But why would Google help them thrive?

In Google's defense

Is there anything that can be said in Google's defense? Actually, yes.

There's no "Franck Provost" salon in Sèvres anymore. It changed its name some time ago (left the Provost franchise) and is now "Fréquence Beauté Coiffure Rive Droite" (a local franchise).

Street View wasn't updated recently enough to reflect the change (which is understandable), but all online directories were, as well as the Google index. So in the Google index, the only pages that contain the tokens "franck provost sevres" are pages from spam sites! (which are probably updated less frequently).

When searching for Fréquence Beauté Coiffure Rive Droite Sèvres one gets, almost exclusively, directory sites in the results, and the correct phone number is immediately available on the results page, without further clicking.

But of course, nobody is going to search like this, because nobody knows the new name of the salon (or if they do, they already have its phone number!)

What to do

Ironically, the freshness of Google's index can cause problems.

It could make sense to have access to some "historical index" ("here's what the results would have looked like last year for this query"). But it would probably be quite confusing.

Retaining changes at the item or semantic level could be helpful. The salon is the same, only the name changed. Google should try to know that, and provide the correct phone number while searching for the old name. (A hard problem, certainly).

In any case, giving so much visibility to spammers and scammers seems very wrong.

Mon, 25 Feb 2013 • permalink