Category Archives: Tech

More on TTS (or maybe it should be Moron: TTS?)


I’ve never done this before, so bear with me while I figure out exactly what I want to say. I’m sure I have written bad blog entries before (this is where you say “No! Absolutely not! This was your first! Well, not that it was bad really, just not up to your usual standards. What were we talking about again?”), but I’m not especially happy with parts of the one I wrote on Thursday night. It’s not that I’ve changed my mind on how silly the NY Times article was, but my article focused on the fact that the technology to read the books is simply not there yet (as Wil Wheaton’s test proved beyond any doubt). But upon reflection, I think I missed at least some of the point of Mr. Blount’s article, so I’m going to post a semi-rebuttal to my own article.

The fact that the technology isn’t there yet isn’t really the point. I’m sure Mr. Blount would acknowledge that right now, a computer reading a book is not the same thing as a human reading it, and that right now the Kindle’s TTS (text-to-speech) feature is not a huge threat to the audio book industry. Not to put words in Mr. Blount’s mouth, but I think his issue is that in five or ten years, the technology may have advanced enough that it will be much harder to distinguish between a computer-read book and a human-read book. As hard as that is to believe, it could certainly be yet another in the long list of things that we take for granted today that would have been difficult to imagine a few years ago. If that happens, then computer-read books might pose a real threat to the audio book industry, and so he wants to head that off before it becomes a problem. I can understand that, but history is full of new inventions that were supposed to kill off entire industries and didn’t. Remember how the VCR was going to kill the movie industry? Remember how PVRs were going to make commercials obsolete? MP3 players have reduced sales of CDs, but they haven’t killed the industry entirely. Even sharing of digital media and places like Pirate Bay haven’t killed music sales or movie revenues. There have been many movies made with completely computer-generated characters, and the animation is getting better all the time, but I don’t hear the actors’ guild advocating that filmmakers abandon the use of computers.

The Amazon people could easily change their TTS feature so that it reads only blogs, newspapers, and magazines and not books, which would still take care of the guy who wants to listen to the newspaper while in the car. I don’t know the exact numbers, but I’m sure there are thousands of books available for the Kindle that are not available as audio books, so people who want to listen to those books would be screwed.

It’s possible that the best solution to this “problem” is for the audio book industry to expand their advertising and PR to make sure that people know that audio books exist and how cool they are. They need to make sure that they stress the point that they have talented actors (sometimes the authors themselves) reading the books, not just some nobody off the street. Once people are hooked on audio books, the thought of having a computer read to them will be unthinkable, regardless of how good the technology gets.

Stephen Hawking performs Wil Wheaton


Roy Blount Jr., the President of the Authors Guild, has written an article* in the NY Times about the Amazon Kindle and its built-in text-to-speech feature. He says that this feature takes money out of the hands of authors and publishers because it’s essentially turning any book you buy for your Kindle into an audio book, without paying audio book royalties. This is ridiculous beyond belief.

* I originally read the article without logging into NYTimes (since I don’t have an account there), but now when I visit that link, it says I have to log in. Don’t know why. If you don’t have a login, you can use “bugmenot555” as the user and “bugmenot” as the password. Thanks bugmenot.com.

Wil Wheaton, an author and audio book performer himself, wrote a blog post about it today, in which he attached a ten-minute audio snippet. The file contained a short portion of his latest book which he read himself, and then the same portion read by some software on his computer. Not surprisingly, there’s just no comparison. The text-to-speech software was actually more impressive than I expected. It wasn’t just words said in a monotone computer voice, it did almost sound like someone reading it aloud, complete with pauses where a comma would be found. The intonation (not sure if that’s the right word) was mostly correct, meaning that the person’s voice went down at the end of a sentence, and things like that. There was even the sound of someone taking a breath at the beginning of sentences. But it was still obviously a computer voice.

Wheaton’s reading was just so much more expressive. In some cases, there were pauses missing in the computer version, and even though there was no way for the computer to know that there was supposed to be a pause there because there’s no punctuation, the way the sentence or paragraph is written makes it obvious to a human reader. Take the one part where Wil talks semi-sarcastically about a Walkman being something like an iPod “that used these things called ‘cassette tapes’”. To a human reader, it’s obvious that that sentence should be read in a slightly different tone than the surrounding sentences, but there’s no way to encode that in the text passed to the software. You just gotta know.

While reading the Times article, my first thought was “I guess I shouldn’t be reading to my kids at night”, and Blount indeed addresses this at the end of his piece:

For the record: no, the Authors Guild does not expect royalties from anybody doing non-commercial performances of “Goodnight Moon.” If parents want to send their children off to bed with the voice of Kindle 2, however, it’s another matter.

Why is it another matter? If I’m reading to my kids, I’m being as expressive as possible. If someone in the book is happy, I try to sound happy. If someone is unhappy, I try to sound unhappy. I even sometimes try accents (though that got old really quick during the first Harry Potter book, when we realized that almost every character would have an English accent. I always did it for Hagrid though). So my “performance” would be a lot closer to the one you might get if you bought an audio book than the one the Kindle would give you. Wouldn’t that be more “threatening” to the audio book publishers?

The other obvious point that Mr. Blount missed is that the Kindle can read any text that it has. Once the Authors Guild provides an audio recording of every book available through the Kindle, plus the daily newspapers (New York Times, Wall Street Journal), weekly magazines (Time, Newsweek), and over 1000 blogs, in real time, then maybe Amazon will remove this feature. Until then, I cannot believe that authors really feel threatened by this. People who like audio books are not going to stop buying them because they can get their Kindle to read them. It’s just not the same. The people using this feature are people who might want to listen to the newspaper during their morning commute or while riding the stationary bike at the gym. I imagine this would be a great feature for the blind (though as Blount points out, using the on-screen controls would be impossible for the blind anyway).

I’ve bought a couple of audio books, and they’re OK. (I joined audible.com for a while, got a free book because I listen to TWiT, and then bought another, but I quit because of the way their accounts work: you have to buy a book every month. If I could be a member and then just buy books whenever I wanted to, I might do that.) Either way, the computer voice is just not real enough for me to listen to a computer read me stuff for any length of time. I think the technology still has a long way to go before it’s even going to be remotely comparable.

Dumb question of the day


I am investigating a couple of compression libraries, and comparing both their compression and speed. zlib is what we use now, and is moderately fast and gives pretty good compression. fastlz is blindingly fast, but doesn’t compress quite as well, and the code isn’t “stable”. lzma is one that I just started looking at, and my initial tests were abysmal. It gave the best compression ratio, but compression took over sixty times as long as fastlz (0.27 seconds vs. 16.47 seconds for the same 11 MB file). I posted a question on their forum asking what I was doing wrong, and got this reply (emphasis mine):

What do you compress and why do you need it faster?

Excuse me? Your algorithm runs an order of magnitude slower than the others I’m looking at, and you are seriously asking why I need it faster? To his credit, the suggestion he gave me did speed it up so that it was about 3.2 seconds; still the slowest of the bunch, but at least it’s now acceptable. And it did still have the best compression ratio. I’m just stunned that any software guy would ask such a question.
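For anyone who wants to run this kind of comparison themselves, here is a minimal sketch in Python using the zlib and lzma modules from its standard library. It is not the test harness described above: fastlz is left out because it isn't in the standard library, the sample data is a made-up stand-in for the 11 MB file, and the lzma preset is just one common speed-versus-ratio knob, not necessarily the suggestion from the forum reply.

```python
import lzma
import time
import zlib


def benchmark(name, compress, decompress, data):
    """Time one compressor on `data` and return (name, seconds, ratio)."""
    start = time.perf_counter()
    packed = compress(data)
    elapsed = time.perf_counter() - start
    # Make sure the round trip is lossless before trusting the numbers.
    assert decompress(packed) == data
    return name, elapsed, len(packed) / len(data)


# Hypothetical sample input: repetitive text standing in for a real file.
sample = b"The quick brown fox jumps over the lazy dog. " * 20000

results = [
    benchmark("zlib level 6", lambda d: zlib.compress(d, 6), zlib.decompress, sample),
    benchmark("lzma preset 0", lambda d: lzma.compress(d, preset=0), lzma.decompress, sample),
    benchmark("lzma preset 6", lambda d: lzma.compress(d, preset=6), lzma.decompress, sample),
]

for name, seconds, ratio in results:
    print(f"{name:14s} {seconds:8.4f} s   compressed to {ratio:.2%} of original")
```

The timings will vary from machine to machine and depend heavily on the input, so treat the output as a rough comparison rather than a verdict on any of the libraries.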

The New OS from Micro-apple


I recently started using sitemeter.com to measure stats on my blog — which pages are read, where people came from (the vast majority from google searches), how long they stay, stuff like that. Sitemeter can also give you information on each visit, like what kind of browser was being used, what OS, even what screen resolution. But this one puzzled me:

What exactly is “Macintosh WinXP”?

Ship it now, test it later


There’s a question on StackOverflow entitled “What real life bad habits has programming given you?”, which is quite hilarious for programmers. Answers include things like thinking 256 is a nice round number, wanting to use Ctrl-F on an actual book, or starting to count items at 0 and ending up with one less than everyone else.

This may seem unrelated, but bear with me. Shortly after Ryan was first born, I decided that children, particularly babies, were badly designed:

  • babies need to eat, but don’t know how right away, and frequently spit up what they’ve already eaten or refuse to eat. It takes years before a child can even make himself a bowl of cereal.
  • babies need to sleep, but getting them to go to sleep (or stay asleep) can be challenging. When Ryan was a baby, he wouldn’t go to sleep by himself; we had to walk with him until he fell asleep in our arms and then gently put him in his crib. If he wasn’t sufficiently asleep (read: unconscious), he’d wake up and you’d have to start all over. Sometimes we’d have to walk with him for 45 minutes before we could go back to sleep ourselves.
  • babies can’t roll over for a few months after they’re born, can’t crawl until six months, and can’t walk for the better part of a year. Baby deer are walking within minutes of birth.
  • children are self-centred. They have tantrums when things don’t go the way they want, even if the circumstances are beyond anyone’s control, or if getting their way would inconvenience or even hurt others. Older kids have been known to give their parents attitude (and I’m one of the lucky parents whose children have reached that stage), and teenagers sometimes take “attitude” to a whole new level.
  • some babies in the animal kingdom are on their own from the moment they are born. Others are under the care of their parents for a few years. Human children frequently live with their parents for twenty years (sometimes more), or about 25% of the average human lifespan.

Despite these challenges, parents continue to love and nurture their children, so obviously parents are generally better designed than children. However, children turn into parents without having been “re-designed”, so it occurs to me that the real problem is not with design, which means it must be implementation. Obviously babies are born before they’re really ready — before all the bugs have been worked out, before things have been streamlined and optimized.

The real problem is that babies are shipped while still in early beta.

Buzzword overload


There’s a question on StackOverflow about the best source code comments people have seen or written. My favourite answer is this one, which doesn’t require any programming knowledge to understand, since it’s impossible to understand anyway. I found it quite hysterical:

/**
This method leverages collective synergy to drive "outside of the box" thinking and formulate key objectives into a win-win game plan with a quality-driven approach that focuses on empowering key players to drive-up their core competencies and increase expectations with an all-around initiative to drive down the bottom-line. I really wanted to work the word "mandrolic" in there, but that word always makes me want to punch myself in the face.
*/

Nothing about “solutioning”, though; perhaps it’s out of date. I’d never heard the word “mandrolic” before, so perhaps I’m out of date.

Gail has on occasion actually used some of these words, particularly “synergy” and “solutioning”, and doesn’t understand why I laugh every time. I have an idea what it “means”, or what it’s supposed to mean, but to me, “synergy” is just the quintessential buzzword that doesn’t actually mean anything.

Failure is not an option


Our camera stores pictures on a compact flash memory card. The other day when changing the card, Gail managed to bend a pin inside the camera, so it wouldn’t recognize any card. We took it into a camera repair shop yesterday, and it’s going to cost us $200 to get fixed. The repair guy said that Gail likely tried to put the card in sideways or backwards or something and that it’s not that uncommon. For a fairly expensive piece of equipment, this seems like a blatant design flaw. If a card should only go in one way, why can’t they design them so that it’s physically impossible to put it in wrong? Make it so that it’s impossible to screw it up. Failure should not be an option.

We had the same problem with an old wireless PCMCIA card. We had a PCI card in the computer, and that card had a slot that the PCMCIA card could slip into. But it was entirely possible to put the card in the wrong way, in which case it simply wouldn’t work. (Luckily it didn’t damage the card.) Unfortunately, it wasn’t plug and play, so you had to shut the computer down, put the card in, and boot it up again. If you got it backwards, you’d have to shut the computer down again, reverse the card, then boot it back up.

The designers of the SD card that’s in my kids’ $89 cameras seemed to get it right:

  1. Make it a rectangle that’s longer than it is wide, so you can’t put it in sideways
  2. Put a notch in one corner so that if you put it in backwards, the notch makes it not fit before the card gets to the pins.

Update: I wrote the above before talking to the camera guy. Turns out it is impossible to put the card in backwards or upside down, but it is not impossible to put it in sideways. If they had made it “portrait instead of landscape” as the camera guy said, this possibility would have been removed as well.

They might know science, but they don’t know web programming


While trying to buy tickets online for the Ontario Science Centre, I saw this:

Seriously, how hard is it to accept a postal code without a space and add it yourself?
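To show how little work that is, here is a rough sketch in Python. It has nothing to do with the Science Centre's actual code; the function name is made up, and the pattern is a simplified letter-digit-letter digit-letter-digit check that ignores the fact that real postal codes don't use every letter.

```python
import re

# Simplified pattern: letter-digit-letter, an optional space or hyphen,
# then digit-letter-digit. Real postal codes exclude some letters.
_POSTAL_CODE = re.compile(r"^([A-Z]\d[A-Z])[ -]?(\d[A-Z]\d)$")


def normalize_postal_code(raw):
    """Return the code as 'A1A 1A1', or None if it doesn't look like a postal code."""
    match = _POSTAL_CODE.match(raw.strip().upper())
    if match is None:
        return None
    return f"{match.group(1)} {match.group(2)}"


print(normalize_postal_code("k1a0b1"))    # K1A 0B1
print(normalize_postal_code(" K1A 0B1 "))  # K1A 0B1
print(normalize_postal_code("12345"))     # None
```

The form can then accept whatever the user typed, store the normalized version, and complain only when the function returns None.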

Later on in the transaction, I hit an SSL error because the certificate was valid for “www.ontariosciencecentre.ca” but I happened to type “ontariosciencecentre.ca” into my browser, and none of the links after that redirected me to “www.”. If their certificate relies on the “www.” prefix, then their web server should be redirecting me.

They might know everything there is to know about science (and they do, I’ve loved going to the Science Centre since I was a kid), but their webmaster has a few things to learn.

Update: As you can see in the comments, Ken Huxley from the Science Centre has fixed the postal code problem and is working on fixing the SSL problem as well. Kudos to him, and my apologies for my condescending “has a few things to learn” comment above (not to mention the title of the post). When he mentioned that he was going to reconfigure DNS to fix the SSL problem, I realized that I understand at a high level what he’s going to do, but I have no idea how to actually do it. I guess I have a few things to learn as well. But if I hadn’t whined (er, written) about the problems I found, they wouldn’t have gotten fixed, so it’s nice to know that my blog has made the world a better place.

That was random


One thing that computers are really bad at is coming up with random numbers. You ask the computer to do something specific and tell it exactly how, and it will do it exactly right every time. But ask it to come up with a random number, and it will ask “how?” Well, there is no “how” for that question; you just pick a number at random. A human can do it better, but even we’re not that good at it. If you ask a million people to each pick a random number between 1 and 1000, the number of people who choose numbers less than 10 or greater than 990 should be around 2% (19 of the 1000 possible values) but is likely to be much less. And who’s going to pick 500? Or 666? That’s not very random, right?

In recent years, some computers have been built with real random number generators; these use things like thermal noise generated by the processor itself to come up with truly random numbers. In the majority of cases, though, this isn’t available, or at least it may not be, so you have to assume it isn’t. You are then forced to use a pseudo-random number generator (PRNG), which gives you a sequence of numbers that appear to be random, but are actually completely reproducible if you “seed” the generator with the same value each time. This is helpful for us programmers when trying to reproduce a problem, but in general you want your pseudo-random numbers to be closer to real randomness, so you need to seed the PRNG with a different value each time. Coming up with enough entropy in your PRNG seed can be a difficult problem.
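The reproducibility part is easy to see for yourself. Here is a small sketch using Python's standard random module; the seed values 42 and 43 and the helper name are arbitrary choices for illustration.

```python
import random


def sequence(seed, count=5):
    """Return `count` pseudo-random numbers from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.randint(1, 1000) for _ in range(count)]


first = sequence(42)
second = sequence(42)
other = sequence(43)

print(first)
print(second)
print("same seed, same numbers:", first == second)
print("different seed, same numbers:", first == other)
```

The first two lists always match because the generator starts from the same internal state, which is exactly what you want when chasing a bug and exactly what you don't want in production.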

Many programs choose the number of seconds since midnight January 1, 1970, since it’s a fairly easy number to obtain in the C language (for historical reasons that I am not going to go into here, mainly because I have no idea what they are). However, if you have multiple programs starting at the same time, they can end up using the same seed and therefore the same sequence of pseudo-random numbers, which can be a serious security hole. So some programs go much further in picking a seed — combining the current time with the process ID or some other piece of data that changes frequently in an unpredictable way. I have heard of programs that ask the user to type a sentence, and calculate entropy by analyzing the typing pattern of the user — the average number of milliseconds between keystrokes and stuff like that.
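Here is a rough Python sketch of the collision problem and the "mix in the process ID" workaround. The helper names and the process IDs 1001 and 1002 are made up for illustration; a real program would call os.getpid(). The last few lines show another option, which is to ask the operating system for the seed via os.urandom.

```python
import os
import random
import time


def time_seed(now):
    """Whole seconds since the epoch: programs started in the same second collide."""
    return int(now)


def mixed_seed(now, pid):
    """Shift the time left and XOR in the process ID so simultaneous starts differ."""
    return (int(now) << 32) ^ pid


now = time.time()

# Two hypothetical programs launched in the same second get the same seed.
a = random.Random(time_seed(now))
b = random.Random(time_seed(now))
same = [a.randint(1, 1000) for _ in range(5)] == [b.randint(1, 1000) for _ in range(5)]
print("time-only seeds give identical sequences:", same)

# With the (made-up) process IDs mixed in, the seeds no longer match.
print("mixed seeds differ:", mixed_seed(now, 1001) != mixed_seed(now, 1002))

# Alternative: let the operating system supply the seed from its own entropy pool.
os_seed = int.from_bytes(os.urandom(16), "big")
rng = random.Random(os_seed)
print(rng.randint(1, 1000))
```

Keep in mind that a time-plus-PID seed is still fairly guessable, and Python's own documentation says the random module should not be used for security purposes; it points to the secrets module for that.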

The end result is that programs sometimes have to go to a lot of trouble to come up with a seed for the PRNG that has sufficient entropy. Essentially, you need to come up with a good pseudo-random number in order to generate pseudo-random numbers.