Against Vendor Lock-in: How I Consume Content

There’s a lot of talk right now about Apple’s recent move to require that all iOS apps offering access to premium content sold outside the App Store also offer that content via in-app purchase, thereby giving Apple a 30% cut.

The heart of the argument goes like this:

  • this 30% cut is reasonable for content publishers, but eats all of the distributors’ margin
  • but that’s the point: to push distributors off the platform, leaving only Apple between users and content
  • but is it fair for Apple to keep increasing its dominant position?
  • well, nobody’s forcing anybody to use Apple products
  • but users are locked in! Once you have all your music on iTunes, etc., moving out could prove extremely expensive: “This lock-in effect is only going to become more pronounced as Apple shifts content ownership to the cloud and has users stream the movies they ‘own’ from its own servers.” (from TechCrunch)

The point about user lock-in is real; content producers and distributors all want to lock users in, of course, because once the cow is in the barn they can milk it day in and day out.

I believe it’s the responsibility of users to push back against lock-in. Here’s how I do it.

Music

All my music is in DRM-free mp3, encoded at 192 kbps. I ripped about 90% of it from CDs; I bought some on Amazon (also in mp3 format) and exactly zero on iTunes.

These files play on every audio player ever built; even better, with a Sonos system I can stream them anywhere in my house.

Movies

Just like music, I rip movies off of DVDs: that way I get a perfect quality movie, with just the subtitles I want, and it’s also legal (downloading movies from the Internet gives you low quality, usually without subtitles or not in the language you want, and you’re breaking the law doing it).

Ripped DVDs can be watched on any device except Apple’s; for the movies I’d like to watch on an iPad, I re-encode them to m4v.

Books

For now I still read books on paper, which has many advantages: you can read a paper book anywhere, the distributor cannot take it back from you after you purchased it, you can lend it to anyone you please, etc.

At some point I’d like to go digital, but it’s unclear to me how I could stay clear of DRM. There are two approaches that might be feasible:

  • rip the books myself using a book scanner such as the “booksaver”; this would be analogous to what I do for music and films, but seems a little time-consuming. It may be worth trying, though (the booksaver is not available yet, but there are other solutions, including countless DIY tutorials for building a personal book scanner)
  • buy Kindle versions and then strip the DRM: this is probably the simplest approach, but its legality is questionable

How much does it cost?

Everything is stored on a file server; the Netgear NV+ provides incredible value for under $500: four 1 TB disks give you 2.7 TB of fault-tolerant storage.
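The 2.7 figure is easy to check. The sketch below rests on two assumptions not stated above: that the NV+ arranges its four disks RAID-5 style, spending one disk’s worth of capacity on redundancy, and that the free space is reported in binary units (TiB, 2^40 bytes) while disks are sold in decimal terabytes (10^12 bytes).

```python
# Assumption: four disks in a RAID-5-style array, so one disk's worth of
# capacity is spent on parity and the rest is usable.
DISKS = 4
DISK_BYTES = 10**12  # a "1 TB" disk as marketed: 10^12 bytes

usable_bytes = (DISKS - 1) * DISK_BYTES
# Many operating systems report sizes in binary units (1 TiB = 2^40 bytes).
usable_tib = usable_bytes / 2**40

print(f"usable: {usable_bytes / 10**12:.0f} TB (decimal) = {usable_tib:.1f} TiB")
```

If those assumptions hold, the 2.7 is simply 3 decimal terabytes expressed in binary units.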

Before that, to store just music, I had a Linksys NSLU2, which turns any USB disk into a network device; you can still find an NSLU2 on eBay for 30-50 bucks (total setup cost: $30 for the NSLU2 + $50 for a 100 GB USB disk = $80!!)

I use the TwonkyMedia server software for streaming movies; it runs on many devices, including the NV+ and the NSLU2 (the NSLU2 version can be hard to find, but it exists); the licence costs around $20.

TL;DR

The details of my setup are not the point here; the point is that end-users should fight vendor lock-in, and that it’s feasible, simple, and cheap.

Thu, 17 Feb 2011

How a Search Engine Works, Part 1

The principles at work behind a search engine are really simple, but, I think, seldom understood.

This is a series of posts about the subject; my goal is to keep each post really short. This first post is about principles; in the following parts we’ll build a search engine from scratch.

Simple, you say?

Google and Bing employ hundreds of the best engineers in the world, and their products are far from perfect: so the problem must be really difficult, right?

Well, yes: the many problems Google and Bing are trying to solve are extremely complex, especially ranking and spam-fighting.

(Ranking is useful if you’re searching a very big corpus; spam-fighting is relevant if there are people out there trying to game your results. But if you’re trying to search a closed corpus of reasonable size, then you don’t have these problems.)

I insist, the basics are simple. A search engine:

  1. indexes a group of documents by the words they contain
  2. uses this index to answer queries

What is an index?

An index is a list of words and, for each word, the list of documents containing that word. So, for example, if our corpus contains the following two documents:

  • doc1 “The white cat ate a mouse.”
  • doc2 “My dog is more lazy than my cat.”

the index for this corpus (with words lowercased and punctuation removed) will be:

  • a doc1
  • ate doc1
  • cat doc1 doc2
  • dog doc2
  • is doc2
  • lazy doc2
  • more doc2
  • mouse doc1
  • my doc2
  • than doc2
  • the doc1
  • white doc1

That’s it. That’s the single most important concept of how a search engine actually works.

(We’ll come back to how to build an index later in this series.)
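As a preview, here is a minimal sketch of that index in Python. It assumes a deliberately simple tokenizer (lowercase everything, keep only runs of letters), which is what turns “The” into “the” and drops the trailing periods; `tokenize` and `build_index` are names made up for this example, not something defined elsewhere in the series.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Assumption: lowercase the text and keep only runs of letters a-z.
    return re.findall(r"[a-z]+", text.lower())


def build_index(corpus):
    """Map each word to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    # Sort words alphabetically and each document list by id.
    return {word: sorted(ids) for word, ids in sorted(index.items())}


corpus = {
    "doc1": "The white cat ate a mouse.",
    "doc2": "My dog is more lazy than my cat.",
}

for word, docs in build_index(corpus).items():
    print(word, " ".join(docs))
```

Running it prints the same twelve entries as the index above, in alphabetical order.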

How do queries work?

Once we have indexed our corpus, answering queries is just a matter of reading the index:

  • retrieving the list of documents for each word in the query
  • intersecting those lists, i.e. returning only the documents common to all of them.
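Those two steps translate almost directly into code. A minimal sketch, assuming the index is a plain dictionary from word to list of document ids (the `INDEX` contents are copied from the example index; `search_and` is a hypothetical name) and that queries are lowercased and split on whitespace:

```python
# The index from the example corpus: word -> list of documents.
INDEX = {
    "a": ["doc1"], "ate": ["doc1"], "cat": ["doc1", "doc2"],
    "dog": ["doc2"], "is": ["doc2"], "lazy": ["doc2"],
    "more": ["doc2"], "mouse": ["doc1"], "my": ["doc2"],
    "than": ["doc2"], "the": ["doc1"], "white": ["doc1"],
}


def search_and(index, query):
    """Return the documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    # Step 1: retrieve the list of documents for each word
    # (a word missing from the index gets an empty list).
    lists = [set(index.get(word, [])) for word in words]
    # Step 2: keep only the documents common to all lists.
    return sorted(set.intersection(*lists))


print(search_and(INDEX, "dog cat"))  # ['doc2']
print(search_and(INDEX, "big dog"))  # []
```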

Let’s see how this works for sample queries:

  • query1: “dog cat”
  • meaning of the query: give me the list of documents that contain both words “dog” and “cat”
  • how to solve the query:
    • retrieve the list of docs for “dog”: doc2
    • retrieve the list of docs for “cat”: doc1 doc2
    • compute the intersection of those two lists: doc2
    • return the result: doc2

  • query2: “big dog”
  • meaning of the query: give me the list of documents that contain both words “big” and “dog”
  • how to solve the query:
    • retrieve the list of docs for “big”: -nil-
    • return the answer: -nil-
    • there are no other steps, because the intersection of any list with an empty list is always empty
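The “no other steps” shortcut in query2 is worth making explicit. The sketch below shows one common way to do it, not necessarily the one this series will use: it assumes each list of documents is kept sorted, intersects two lists with a linear merge, and stops as soon as a list (or the running result) is empty. The names `intersect_sorted` and `solve` are made up for this example.

```python
def intersect_sorted(a, b):
    """Intersect two sorted lists of document ids with a linear merge."""
    i = j = 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return common


def solve(index, query):
    """Answer an implicit-AND query, stopping early on an empty list."""
    result = None
    for word in query.lower().split():
        docs = index.get(word, [])  # sorted list of doc ids, or []
        result = list(docs) if result is None else intersect_sorted(result, docs)
        if not result:
            # Intersecting with an empty list can only give an empty list,
            # so the remaining words need not be looked up.
            return []
    return result or []


# Only the entries the sample queries need; "big" is absent on purpose.
INDEX = {"cat": ["doc1", "doc2"], "dog": ["doc2"]}

print(solve(INDEX, "dog cat"))  # ['doc2']
print(solve(INDEX, "big dog"))  # []
```

The merge walks each list once, so intersecting lists of lengths m and n takes O(m + n) steps; that is the payoff for keeping the lists sorted.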

But what about “OR”?

The astute reader will notice that in the sample queries above we always imply an “AND” operator, which means we ask the search engine which documents contain ALL of the words in the query (“dog AND cat”). This is how modern search engines work, because it usually matches what the user means and it reduces noise: each extra word narrows the results instead of widening them.

But of course you might want a list of documents that contain ANY of the words in the query (“dog OR cat”). AltaVista operated that way by default: I don’t think it helped them stay in business, but let’s try to do that anyway.

All we have to do is to build a union of the lists of documents instead of an intersection; the solution to query2 would therefore be:

  • query2b: “big OR dog”
  • meaning of the query: give me the list of documents that contain either “big” or “dog”
  • how to solve the query:
    • retrieve the list of docs for “big”: -nil-
    • retrieve the list of docs for “dog”: doc2
    • compute the union of those two lists: doc2
    • return the result: doc2

And that’s it!
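The union version is just as short. A minimal sketch, again with the index as a plain dictionary (`search_or` is a hypothetical name), assuming as a simplification that the uppercase word “OR” is only ever an operator, so this toy version treats every query as an OR query:

```python
INDEX = {
    "a": ["doc1"], "ate": ["doc1"], "cat": ["doc1", "doc2"],
    "dog": ["doc2"], "is": ["doc2"], "lazy": ["doc2"],
    "more": ["doc2"], "mouse": ["doc1"], "my": ["doc2"],
    "than": ["doc2"], "the": ["doc1"], "white": ["doc1"],
}


def search_or(index, query):
    """Return the documents that contain at least one word of the query."""
    # Drop the "OR" operator itself; every remaining word is a search term.
    words = [w.lower() for w in query.split() if w != "OR"]
    result = set()
    for word in words:
        # A word missing from the index contributes nothing to the union.
        result |= set(index.get(word, []))
    return sorted(result)


print(search_or(INDEX, "big OR dog"))    # ['doc2']
print(search_or(INDEX, "dog OR white"))  # ['doc1', 'doc2']
```

Note that the early exit from the AND case does not apply here: an empty list simply adds nothing to a union.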

Quick summary: a search engine

  • indexes documents
  • (therefore, needs an index)
  • returns lists of documents from its index that match the words in the query

Tue, 15 Feb 2011

LunaTik Ill Traducted in The Réal

Last week, at long last, I received my LunaTik: it’s a gadget that lets you wear an iPod nano as a watch.

The LunaTik is a great success story that has been told numerous times: in short, they needed $15,000 to get started, used Kickstarter and ended up with $941,718 pledged from more than 13,000 people — the most successful Kickstarter project of all time.

Before them there were many other attempts to make a wristband for the Nano, but none approached the same level of success; part of it is due to marketing, especially a brilliant video and all those blog posts, but the core of their success really lies in execution: the thing is perfect. Sturdy and beautiful.

Alas, perfection is not of this world, of course, and when I received the LunaTik I noticed a little problem.

The packaging displays instructions in English, French and Spanish; the French translation seems to have been made using some automated software that knows neither French nor English. Here’s an example:

  • original English sentence: “Multi-Touch Watch Band Assembly”
  • their “French” version: “L'Assemblée pour Multi-touchent le bracelet de montre”
  • their version, translated back to English: “The congress for Multi are touching the watch band”. (Yes, it’s a plural “are”, because “touchent” is plural. But the whole sentence is completely nonsensical anyway).
  • what they meant to say: “Schéma de montage du bracelet” (“Band assembly diagram”)

I sent an email to Minimal telling them exactly that, and they responded kindly that they would address the problem in the future.

I appreciate that they took the time to read and answer my email, and they’re obviously dedicated and nice people. But the damage is done, and my point is that this is simply rude. It sends the exact opposite of the message the brand is trying to convey. It says “I don’t care”. Translation is admittedly a trivial matter; but if they don’t care about it, one wonders what else they didn’t care about.

The American Translators Association has put together a short document called “Translations: getting it right” (PDF); it contains many insights such as these:

  • In many cultures, awkward or sloppy use of the local language — especially by a native English speaker — is not amusing. It is insulting.
  • Professional translators work into their native language; if you want your catalog translated into German and Russian, the work will be done by a native German speaker and a native Russian speaker. As a translation buyer, you may not be aware of this, but a translator who flouts this basic rule is likely to be ignorant of other important quality issues as well.

In the end I’m very happy with the LunaTik and quite grateful to the Minimal people for having built it; but I would be even happier if they had cared enough not to take a shortcut on translation.

Tue, 15 Feb 2011