Our test environment takes 6-9 seconds to load before any tests get run.
I tire of this during the ~30 times I run the test suite a day,[1] so I wanted
to make it faster.
For better or worse, the API runs on Sails.js. Before running
model/controller tests, a bootstrap file in our tests calls sails.lift:

require('sails').lift(function(err) {
  if (err) { throw err; }
  // Run the tests
});
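In practice that bootstrap amounts to a lift-once-before-all-tests hook. A rough sketch of what such a file might look like (mocha, the timeout value, and the log override are my assumptions here, not our actual bootstrap):

var sails = require('sails');

before(function(done) {
  // The lift itself is the slow part, so give it a generous timeout.
  this.timeout(15000);
  sails.lift({ log: { level: 'warn' } }, done);
});

after(function(done) {
  sails.lower(done);
});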
This lift call generates about 400 queries against Postgres to retrieve the
database schema, each of which looks like this:
SELECT x.nspname || '.' || x.relname as "Table", x.attnum
as "#", x.attname as "Column", x."Type", case x.attnotnull
when true then 'NOT NULL' else '' end as "NULL",
r.conname as "Constraint", r.contype as "C", r.consrc,
fn.nspname || '.' || f.relname as "F Key", d.adsrc as "Default"
FROM (SELECT c.oid, a.attrelid, a.attnum, n.nspname, c.relname,
a.attname, pg_catalog.format_type(a.atttypid, a.atttypmod) as "Type",
a.attnotnull FROM pg_catalog.pg_attribute a, pg_namespace n,
pg_class c WHERE a.attnum > 0 AND NOT a.attisdropped
AND a.attrelid = c.oid and c.relkind not in ('S','v')
and c.relnamespace = n.oid and n.nspname
not in ('pg_catalog','pg_toast','information_schema')) x
left join pg_attrdef d on d.adrelid = x.attrelid
and d.adnum = x.attnum left join pg_constraint r on r.conrelid = x.oid
and r.conkey[1] = x.attnum left join pg_class f on r.confrelid = f.oid
left join pg_namespace fn on f.relnamespace = fn.oid
where x.relname = 'credits' order by 1,2;
I'm not really sure what those queries do, since Sails seems to
ignore the schema that's already in the database when generating table queries.
Since we use the smallest, safest subset of the ORM we can find, I tried
commenting out the sails-postgresql
module code that makes those
queries to see if the tests would still pass, and the tests did pass... but the
load time was still slow.
The next step was to instrument the code to figure out what was taking so long.
I wanted to have the Sails loader log the duration of each part of the load
process, but this would have required a global variable and a whole bunch of
calls to console.log. It turns out the Unix utility ts can do this for you,
as long as log lines are generated at the appropriate times. Basically, it's an
instant, awesome tool for getting insight into a program's runtime without
needing to add timestamps in the underlying program.
NAME
ts - timestamp input
SYNOPSIS
ts [-r] [-i | -s] [format]
DESCRIPTION
ts adds a timestamp to the beginning of each line of input.
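As a concrete example, the whole pipeline looks something like this (the test
command here is a placeholder; anything that writes the Sails verbose logs to
the terminal will do):

# Hypothetical invocation; substitute your real test command.
npm test 2>&1 | ts '[%Y-%m-%d %H:%M:%.S]'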
I set the Sails logging level to verbose and piped the output through
ts '[%Y-%m-%d %H:%M:%.S]'. I pretty quickly found a culprit:
[2015-04-19 21:53:45.730679] verbose: request hook loaded successfully.
[2015-04-19 21:53:45.731032] verbose: Loading the app's models and adapters...
[2015-04-19 21:53:45.731095] verbose: Loading app models...
[2015-04-19 21:53:47.928104] verbose: Loading app adapters...
[2015-04-19 21:53:47.929343] verbose: Loading blueprint middleware...
That's a 2-second gap between loading the models and loading the adapters.
I started adding profiling to the code near the "Loading app models..." line. I
expected to see that attaching custom functions (findOne, update, etc.) to the
Sails models was the part that took so long. Instead I found out that a module
called include-all accounted for almost all of the 2.5 seconds. That module
simply requires every file in the models directory, about 30 files.
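Roughly speaking, include-all boils down to something like this (a simplified sketch, not the module's actual code):

var fs = require('fs');
var path = require('path');

// Synchronously require every .js file in a directory, one after another.
function loadModels(dir) {
  var models = {};
  fs.readdirSync(dir).forEach(function(file) {
    if (path.extname(file) === '.js') {
      // Each require blocks until the file and all of its dependencies
      // have been located, read, and compiled.
      models[path.basename(file, '.js')] = require(path.join(dir, file));
    }
  });
  return models;
}

var models = loadModels(path.join(__dirname, 'api', 'models'));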
Further reading and logging revealed that each require call was being executed
in turn. I've found it, I thought: just use different CPUs to require them all
at the same time and see a speedup. Unfortunately, the require operation is
synchronous in Node and runs on one CPU, so the process can still only perform
one require at a time.
I tried loading just one model to see how slow that would be. This script took
an average of 700 milliseconds to run on my high-end MacBook Pro:

var start = Date.now();
require('./api/models/Drivers');
console.log(Date.now() - start);
700 milliseconds to require a model file, and that file's dependencies! I
can send a packet to and from New York 8 times in that amount of time. What
the hell is it actually doing? For this I turned to good old strace (or
dtruss, its closest equivalent on Macs). First start up a shell to record
syscalls for any process named node:
# (You'll want to kill any other node processes before running this.)
sudo dtruss -d -n 'node' > /tmp/require.log 2>&1
Open up another shell session and run the little script above that calls
require, prints the startup time, and exits. You should then have a few
thousand lines in a file called /tmp/require.log. Here's what I found near the
start:
1186335 stat64("/Users/burke/code/api/api/models/node_modules/async\0", 0x7FFF5FBFE608, 0x9) = -1 Err#2
1186382 stat64("/Users/burke/code/api/api/models/node_modules/async.js\0", 0x7FFF5FBFE5B8, 0x9) = -1 Err#2
1186405 stat64("/Users/burke/code/api/api/models/node_modules/async.json\0", 0x7FFF5FBFE5B8, 0x9) = -1 Err#2
1186423 stat64("/Users/burke/code/api/api/models/node_modules/async.node\0", 0x7FFF5FBFE5B8, 0x9) = -1 Err#2
1186438 stat64("/Users/burke/code/api/api/models/node_modules/async.coffee\0", 0x7FFF5FBFE5B8, 0x9) = -1 Err#2
1186473 open("/Users/burke/code/api/api/models/node_modules/async/package.json\0", 0x0, 0x1B6) = -1 Err#2
1186501 stat64("/Users/burke/code/api/api/models/node_modules/async/index.js\0", 0x7FFF5FBFE5B8, 0x1B6) = -1 Err#2
1186519 stat64("/Users/burke/code/api/api/models/node_modules/async/index.json\0", 0x7FFF5FBFE5B8, 0x1B6) = -1 Err#2
1186534 stat64("/Users/burke/code/api/api/models/node_modules/async/index.node\0", 0x7FFF5FBFE5B8, 0x1B6) = -1 Err#2
1186554 stat64("/Users/burke/code/api/api/models/node_modules/async/index.coffee\0", 0x7FFF5FBFE5B8, 0x1B6) = -1 Err#2
1186580 stat64("/Users/burke/code/api/api/node_modules/async\0", 0x7FFF5FBFE608, 0x1B6) = -1 Err#2
1186598 stat64("/Users/burke/code/api/api/node_modules/async.js\0", 0x7FFF5FBFE5B8, 0x1B6) = -1 Err#2
1186614 stat64("/Users/burke/code/api/api/node_modules/async.json\0", 0x7FFF5FBFE5B8, 0x1B6) = -1 Err#2
1186630 stat64("/Users/burke/code/api/api/node_modules/async.node\0", 0x7FFF5FBFE5B8, 0x1B6) = -1 Err#2
1186645 stat64("/Users/burke/code/api/api/node_modules/async.coffee\0", 0x7FFF5FBFE5B8, 0x1B6) = -1 Err#2
1186670 open("/Users/burke/code/api/api/node_modules/async/package.json\0", 0x0, 0x1B6) = -1 Err#2
1186694 stat64("/Users/burke/code/api/api/node_modules/async/index.js\0", 0x7FFF5FBFE5B8, 0x1B6) = -1 Err#2
1186712 stat64("/Users/burke/code/api/api/node_modules/async/index.json\0", 0x7FFF5FBFE5B8, 0x1B6) = -1 Err#2
1186727 stat64("/Users/burke/code/api/api/node_modules/async/index.node\0", 0x7FFF5FBFE5B8, 0x1B6) = -1 Err#2
1186742 stat64("/Users/burke/code/api/api/node_modules/async/index.coffee\0", 0x7FFF5FBFE5B8, 0x1B6) = -1 Err#2
1186901 stat64("/Users/burke/code/api/node_modules/async\0", 0x7FFF5FBFE608, 0x1B6) = 0 0
1186963 stat64("/Users/burke/code/api/node_modules/async.js\0", 0x7FFF5FBFE5B8, 0x1B6) = -1 Err#2
1187024 stat64("/Users/burke/code/api/node_modules/async.json\0", 0x7FFF5FBFE5B8, 0x1B6) = -1 Err#2
1187050 stat64("/Users/burke/code/api/node_modules/async.node\0", 0x7FFF5FBFE5B8, 0x1B6) = -1 Err#2
1187074 stat64("/Users/burke/code/api/node_modules/async.coffee\0", 0x7FFF5FBFE5B8, 0x1B6) = -1 Err#2
1656 __semwait_signal(0x10F, 0x0, 0x1) = -1 Err#60
1187215 open("/Users/burke/code/api/node_modules/async/package.json\0", 0x0, 0x1B6) = 11 0
That's a lot of wasted open() calls, and that's just to find one dependency.[2]
To load the single model, node had to open and read 300 different files,[3] and
every require that wasn't a relative path did the same
find-the-node_modules-folder dance. The documentation seems to indicate this is
desired behavior, and there doesn't seem to be any way around it.
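That dance follows the documented resolution algorithm: for a bare name like async, Node probes a node_modules directory at every level from the requiring file up to the filesystem root, trying several file names and extensions at each stop. A rough illustration of the candidate list (my own sketch of the documented algorithm, not Node's actual source):

var path = require('path');

// List the node_modules candidates Node will probe for require('async')
// issued from a file in api/models, innermost directory first.
function candidateDirs(fromDir) {
  var dirs = [];
  var dir = fromDir;
  while (true) {
    dirs.push(path.join(dir, 'node_modules', 'async'));
    var parent = path.dirname(dir);
    if (parent === dir) break; // hit the filesystem root
    dir = parent;
  }
  return dirs;
}

// Each candidate triggers a handful of stat/open calls: async, async.js,
// async.json, async.node, async/package.json, async/index.js, and so on.
console.log(candidateDirs('/Users/burke/code/api/api/models'));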
Now, failed stats are not the entire reason require is slow, but if I am
reading the timestamps in the dtruss log correctly, they are a fair amount of
it, and the most obviously wasteful thing. I could rewrite every require
to be relative, e.g. require('../../node_modules/async'), but that seems
cumbersome and wasteful when I can state the exact rule I want beforehand:
if it's not a relative file path, look in node_modules at the top level of
my project.
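You could approximate that rule with a small wrapper, at the cost of some ugliness (a hypothetical sketch; PROJECT_ROOT and fastRequire are names I made up, not anything we actually use):

var path = require('path');

// Assumption: all of the project's dependencies live in one top-level node_modules.
var PROJECT_ROOT = '/Users/burke/code/api';

// Resolve a bare module name straight from the top-level node_modules,
// skipping the walk up the directory tree. Relative requires stay as
// ordinary require('./...') calls.
function fastRequire(name) {
  return require(path.join(PROJECT_ROOT, 'node_modules', name));
}

var async = fastRequire('async');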
So that's where we are: require for a single model takes 700 milliseconds,
require for all the models takes 2.5 seconds, and there don't seem to be
great options for speeding that up. There's some discouraging discussion
here from core developers about the possibility of speeding up module
import.
You are probably saying "load fewer dependencies," and you are right; I would
love to, but that is not an option at this point, since "dependencies"
basically means Sails, and while we are trying to move off of Sails, we're
stuck on it for the time being. Where I can, I write tests that work with
objects in memory and don't touch our models or controllers, but Destroy All
Software videos only get you so far.
I will definitely think harder about importing another small module versus
just copying the code and license wholesale into my codebase, and I'll look
into adding something that can warn about unused imports or variables in the
codebase.
I'm left concluding that importing modules in Node is dog slow. Yes, 300
imports is a lot, but a 700ms load time seems way too high. Python imports can
be slow too, but as far as I remember that's mostly for packages that compile C
on the fly; for everything else you can work around it by rearranging sys.path
so the directories that satisfy the most imports come first (not possible
here). If there's anything I can do (some kind of compile option, saving the V8
bytecode, or something else), I'm open to suggestions.
[1] If you follow along with the somewhat-crazy convention of crashing
your Node process on an error and having a supervisor restart it, that means
your server takes 6-9 seconds of downtime as well.
[2] The story gets even worse if you are writing CoffeeScript, since require
will also look for files matching .litcoffee and .coffee.md at every level.
You can hack require.extensions to delete these keys.
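Something like this, for example (a sketch; require.extensions is a deprecated API, and these keys only exist once a CoffeeScript loader has registered itself):

// Stop the resolver from probing for .litcoffee and .coffee.md at every level.
delete require.extensions['.litcoffee'];
delete require.extensions['.coffee.md'];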
[3] For unknown reasons, node occasionally decided not to stat/open
some imports that were require'd in the file.