Author Archives: kevin

About kevin

I write the posts

You Shouldn’t Use Faker (or other test randomization libraries)

Posted on November 4, 2017

It's a common, and distressing, pattern to have factories in tests that call out to a library like Faker to "randomize" the returned object. So you might see something like this:

User.create({
    id: uuid(),
    email: faker.random.email(),
    first_name: faker.random.firstName(),
    last_name: faker.random.lastName(),
    address: faker.random.address(),
});

package = Package.create({
    width: faker.math.range(100),
    height: faker.math.range(200),
    length: faker.math.range(300),
})

This is a bad practice and it's distressing to see it in such widespread use. I want to explain some reasons why it's bad.

The corpus is not large enough to avoid uniqueness failures

Let's say you write a test that creates two different users, and for whatever reason, the test fails if the users have the same name (due to a database constraint failure, write to same map key, array having 1 value instead of 2, &c). Faker has about three thousand different possible values for a first name. This means that every 3000th run on average, the randomly chosen names will collide, and your test will fail. In a CI environment or on a medium size team, it's easy to run the tests 3000 times in a week. Alternatively, if the same error exists in ten tests, it will emerge on average once every 300 runs.

The onus on catching failures falls on your team

Because the error only appears once in every 3000 test runs, the odds are you haven't fully or partially probed the space of possible test values when you submit your code and tests for review. This means that the other members of your team will be the ones who run into the error caused by your tests. They might lack the context necessary to effectively debug the problem. This is a very frustrating experience for your teammates.

It's hard to reproduce failures

If you don't have enough logging to see the problem on the first failure, it's going to be difficult to track down a failure that appears only once every 3,000 test runs. Run the test ten or even fifty times in a loop and it still won't appear.

An environment where tests randomly fail for unknown reasons is corrosive to build stability. It encourages people to just hit the "Rebuild" button and hope for a pass on the next try. Eventually this attitude will mask real problems with the build. If your test suite takes a long time to run (upwards of three minutes) it can be demoralizing to push the deploy or rebase button and watch your tests fail for an unrelated reason.

It's a large library (and growing)

It's common for me to write Node.js tests and have an entire test file with 50 or 100 tests exit in about 10 milliseconds.

By definition a "fixture" library needs to load a lot of fake data. In 2015 it took about 60 milliseconds to import faker. It now takes about 300 milliseconds:

var a = Date.now()
require('faker');
console.log(Date.now() - a);

$ node faker-testing/index.js
303

You can hack around this by only importing the locale you need, but I've never seen anyone do this in practice.

What you need instead

So faker is a bad idea. What should you use? It depends on your use case.

Unique random identifiers. If you need a random unique identifier, just use a uuid generator directly. If you create 103 trillion objects in a given test, the odds of a uuid collision are still only one in a billion, so the odds of a collision are much lower than using the faker library. Alternatively, you can use an incrementing integer each time you call your factory, so you get "Person 0", "Person 1", "Person 2", etc.
A fuzzer, or QuickCheck. Faker has tools for randomizing values within some input space, say a package width that varies between 0 and 100. The problem is that the default set up is to only exhaust one of those values each time you run the test, so you're barely covering the input space.

What you really want is a fuzzer, that will generate a ton of random values within a set of parameters, and test a significant percentage of them each time the test runs. This gives a better guarantee that there are no errors. Alternatively, if there are known edge cases, you can add explicit tests for those edge cases (negative, 0, one, seven, max, or whatever). Tools like Haskell's QuickCheck or afl-fuzz make it easy to add this type of testing to your test suite.

If you want random human-looking data I suggest adding it yourself and using some scheme that makes sense for your company. For example, users named Alice and Bob are always friends that complete actions successfully, and Eve always makes errors. Riders with Android phones always want to get picked up in Golden Gate Park and taken to the Mission; riders with iPhones want to get picked up in the Tenderloin and taken to Russian Hill (or something). This scheme will provide more semantic value, and easily pinpoint errors (a "Bob" error means we should have submitted something successfully but failed to, for example).

That's all; happy testing, and may your tests never be flaky!

Liked what you read? I am available for hire.

Proxying to a subcommand with Go

Posted on October 22, 2017

There are a number of Unix commands that manipulate something about the environment or arguments in some way before starting a subcommand. For example:

xargs reads arguments from standard input and appends them to the end of the command.
chpst changes the process state before calling the resulting command.
envdir manipulates the process environment, loading variables from a directory, before starting a subcommand

Go is a popular language for writing tools that shell out to a subprocess, for example aws-vault or chamber, both of which have exec options for initializing environment variables and then calling your subcommand.

Unfortunately there are a number of ways that you can make mistakes when proxying to a subcommand, especially one where the end user is specifying the subcommand to run, and you can't determine anything about it ahead of time. I would like to list some of them here.

First, create the subcommand.

cmd := exec.Command("mybinary", "arg1", "-flag")
// All of our setup work will go here
if err := cmd.Run(); err != nil {
    log.Fatal(err)
}

Standard Output/Input/Error

You probably want to proxy through the subcommand's standard output and standard error to the parent terminal. Just reassign stdout/stderr to point at the parent command. Obviously, you can redirect these later.

cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
// If you need
cmd.Stdin = os.Stdin

Environment Variables

By default the subcommand gets all of the environment variables that belong to the Go process. But if you reassign cmd.Env it will only get the variables you provide, so you have to be careful with your implementation.

cmd.Env = []string{"TZ=UTC", "MYVAR=myval"}

Signals

If someone sends a signal to the parent process, you probably want to forward it through to the child. This gets tricky because you can receive multiple signals, that all should be sent to the client and handled appropriately.

You also can't send signals until you start the process, so you need to use a goroutine to handle them. I've copied this code from aws-vault, with some modifications.

if err := cmd.Start(); err != nil {
    log.Fatal(err) // Command not found on PATH, not executable, &c.
}
// wait for the command to finish
waitCh := make(chan error, 1)
go func() {
    waitCh <- cmd.Wait()
    close(waitCh)
}()
sigChan := make(chan os.Signal, 1)
signal.Notify(sigChan)

// You need a for loop to handle multiple signals
for {
    select {
    case sig := <-sigChan:
        if err := cmd.Process.Signal(sig); err != nil {
            // Not clear how we can hit this, but probably not
            // worth terminating the child.
            log.Print("error sending signal", sig, err)
        }
    case err := <-waitCh:
        // Subprocess exited. Get the return code, if we can
        var waitStatus syscall.WaitStatus
        if exitError, ok := err.(*exec.ExitError); ok {
            waitStatus = exitError.Sys().(syscall.WaitStatus)
            os.Exit(waitStatus.ExitStatus())
        }
        if err != nil {
            log.Fatal(err)
        }
        return
    }
}

Just Use Exec

The above code block is quite complicated. If all you want to do is start a subcommand and then exit the process, you can avoid all of the above logic by calling exec. Unix has a notion of parent and children processes - when you start a process with cmd.Run, it becomes a child process. exec is a special system call that sees the child process become the parent. Consider the following program:

cmd := exec.Command("sleep", "3600")
if err := cmd.Run(); err != nil {
    log.Fatal(err)
}
fmt.Println("terminated")

If I run it, the process tree will look something like this:

 | |   \-+= 06167 kevin -zsh
 | |     \-+= 09471 kevin go-sleep-command-proxy
 | |       \--- 09472 kevin sleep 3600

If I run exec, the sleep command will replace the Go command with the sleep command. In Go, this looks something like this:

 | |   \-+= 06167 kevin -zsh
 | |     \--= 09862 kevin sleep 3600

This is really useful for us, because:

We don't need to proxy through signals or stdout/stderr anymore! Those are automatically handled for us.
The return code of the exec'd process becomes the return code of the program.
We don't have to use memory running the Go program.

Another reason to use exec: even if you get the proxying code perfectly right, proxying commands (as we did above) can have errors. The Go process can't handle SIGKILL, so there's a chance that someone will send SIGKILL to the Go process and then the subprocess will be "orphaned." The subprocess will be reattached to PID 1, which makes it tougher to collect.

I should note: Syscalls can vary from operating system to operating system, so you should be careful that the arguments you pass to syscall.Exec make sense for the operating system you're using. Rob Pike and Ian Lance Taylor also think it's a bad idea.

Liked what you read? I am available for hire.

Running Bazel tests on Travis CI

Posted on October 21, 2017

Bazel is a build system that was recently open sourced by Google. Bazel operates on configuration files - a WORKSPACE file for your entire project, and then per-directory BUILD.bazel files. These declare all of the dependencies for your project, using a language called Skylark. There are a number of Skylark rules for things like downloading and caching HTTP archives, and language-specific rules.

This can be a pain to set up for a new project, but the advantage is you get builds that are perfectly hermetic. In addition, if your project has a lot of artifacts, Bazel has useful tools for caching those artifacts, and builds a dependency graph that makes it easy to see when things change. For example, if you compile Javascript from Typescript, you can cache the compiled Javascript. If the Typescript source file changes, Bazel knows that only the file that changed needs to be regenerated.

Anyway there are some gotchas about running Bazel on Travis CI that I wanted to cover, and when I was looking at instructions they weren't great.

1) Install Bazel

The official instructions ask you to add the Bazel apt repository, and then install from there. You can shave a few seconds by only updating that repository, instead of all of them.

echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
curl --silent https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -
sudo apt-get update -o Dir::Etc::sourcelist="sources.list.d/bazel.list" \
    -o Dir::Etc::sourceparts="-" \
    -o APT::Get::List-Cleanup="0"
sudo apt-get install openjdk-8-jdk bazel

You can shave a few more seconds by just getting and installing the deb directly, though this requires you to manually update the version number when Bazel releases a new version. (This is in a Makefile, you'll need to change the syntax for Bash or a travis.yml file).

BAZEL_VERSION := "0.7.0"
BAZEL_DEB := "bazel_$(BAZEL_VERSION)_amd64.deb"

curl --silent --output /tmp/$(BAZEL_DEB) "https://storage.googleapis.com/bazel-apt/pool/jdk1.8/b/bazel/$(BAZEL_DEB)"
sudo dpkg --force-all -i /tmp/$(BAZEL_DEB)

2) Run Tests

Bazel ships with a ton of flags but these are probably the ones you want to know about/use:

--batch: Run in batch mode, as opposed to client-server mode.
--noshow_progress / --noshow_loading_progress: Travis pretends to be a TTY so it can show colors, but this causes Bazel to dump a whole bunch of progress messages to the screen. It would be nice if you could say "I can handle ANSI escape sequences relating to colors, but not screen redraws" but you only get the binary "I'm a TTY" / "I'm not", which is unfortunate. Anyway these options turn off the "show progress" log spam.
--test_output=errors: By default Bazel logs test failures to a log file. This option will print them to the screen instead.
--features: Enable features specific to a test runner. In particular for Go you may want the --features=race rule to run tests with the race detector enabled.

3) Cache Results

Bazel is much faster when it can load the intermediate results from a cache; on one library I maintain, tests run in 13 seconds (versus 47 seconds) when they are run with a full cache.

However, the Bazel caches are enormous. Caches for Go tests include the Go binary and source tree, which runs about 95MB alone. Add on other artifacts (including the Java JDK's used to run Bazel) and it's common to see over 200MB. If you are uploading or downloading caches from somewhere like S3, this can be a really slow operation. Also, it's difficult to remove large items from Bazel's cache without removing everything; the file structure makes the cache a little tough to remove parts of.

In particular, Travis's cache does go to S3, which means it skips the local caching proxies Travis sets up for e.g. apt and npm. Travis times out the cache download after 3 minutes, and Bazel quits if the cache is corrupt. In my experience Travis will not complete a 200MB cache download in 3 minutes, so your builds will fail.

That's it

For now it's probably best to eat the cold cache startup time. That's really unfortunate though; I hope we can find a better solution in the future. One way may be for Travis to run a local "remote cache", which responds to the WebDAV protocol. This would allow closer caching of Bazel objects.

It would also be nice if Travis had a "language: bazel" mode, which would come with the right version of Bazel installed for you.

Liked what you read? I am available for hire.

Make your Go Binaries Homebrew Installable

Posted on October 16, 2017

It's easier than you think to make a software package installable via Homebrew. If you depend on a very specific version of a software package (say, Postgres 9.5.3 with readline support), I highly recommend creating a Homebrew repository and publishing recipes to it. Then your team can install and update packages as easily as:

brew tap mycompany/packages
brew install mycompany/packages/postgresql

You can use the existing formulas as a jumping off point, and modify as you see fit. (Obviously this won't work for Linux folks on your team, however in my experience people running Linux in a Mac software shop have more experience building dependencies on their own).

Anyway, I wanted to describe how to install Go binaries via Homebrew. One way to do this is to compile binaries, upload them to Github releases, and install from there. However, the Homebrew core team requires that packages are buildable from the source code. (This helps check that a binary wasn't tampered with, and avoids compatibility problems with e.g. 32 bit and 64 bit systems).

If you vendor dependencies, and check in the vendor folder to Github, installation is super easy.

# Classname should match the name of the installed package.
class Hostsfile < Formula
  desc "CLI for manipulating /etc/hosts files"
  homepage "https://github.com/kevinburke/hostsfile"

  # Source code archive. Each tagged release will have one
  url "https://github.com/kevinburke/hostsfile/archive/1.2.tar.gz"
  sha256 "cc1f3c1cb505536044cbe01f44ad7da997e6a3928fac1f64590ef69d73da8acd"
  head "https://github.com/kevinburke/hostsfile"

  depends_on "go" => :build

  def install
    ENV["GOPATH"] = buildpath

    bin_path = buildpath/"src/github.com/kevinburke/hostsfile"
    # Copy all files from their current location (GOPATH root)
    # to $GOPATH/src/github.com/kevinburke/hostsfile
    bin_path.install Dir["*"]
    cd bin_path do
      # Install the compiled binary into Homebrew's `bin` - a pre-existing
      # global variable
      system "go", "build", "-o", bin/"hostsfile", "."
    end
  end

  # Homebrew requires tests.
  test do
    # "2>&1" redirects standard error to stdout. The "2" at the end means "the
    # exit code should be 2".
    assert_match "hostsfile version 1.2", shell_output("#{bin}/hostsfile version 2>&1", 2)
  end
end

Basically, download some source code, move it to $GOPATH/src/path/to/binary, build it, and put the compiled binary in $(brew --prefix)/bin.

If you don't vendor dependencies, the story gets a little more complicated because you need to download a version of all of your dependencies. Say for example I had one dependency in my project, I would add a go_resource line for each dependency, and then call stage_deps to download/install all of them in the correct places.

require "language/go"

# Classname should match the name of the installed package.
class Hostsfile < Formula
  desc "CLI for manipulating /etc/hosts files"
  homepage "https://github.com/kevinburke/hostsfile"

  # Source code archive. Each tagged release will have one
  url "https://github.com/kevinburke/hostsfile/archive/1.2.tar.gz"
  sha256 "cc1f3c1cb505536044cbe01f44ad7da997e6a3928fac1f64590ef69d73da8acd"
  head "https://github.com/kevinburke/hostsfile"

  go_resource "github.com/mattn/go-colorable" do
    url "https://github.com/mattn/go-colorable.git",
        :revision => "40e4aedc8fabf8c23e040057540867186712faa5"
  end


  depends_on "go" => :build

  def install
    ENV["GOPATH"] = buildpath


    bin_path = buildpath/"src/github.com/kevinburke/hostsfile"
    # Copy all files from their current location (GOPATH root)
    # to $GOPATH/src/github.com/kevinburke/hostsfile
    bin_path.install Dir["*"]

    # Stage dependencies. This requires the "require language/go" line above
    Language::Go.stage_deps resources, buildpath/"src"
    cd bin_path do
      # Install the compiled binary into Homebrew's `bin` - a pre-existing
      # global variable
      system "go", "build", "-o", bin/"hostsfile", "."
    end
  end

  # Homebrew requires tests.
  test do
    # "2>&1" redirects standard error to stdout. The "2" at the end means "the
    # exit code should be 2".
    assert_match "hostsfile version 1.2", shell_output("#{bin}/hostsfile version 2>&1", 2)
  end
end

And that's it! You can test your new package by creating a symlink from /usr/local/Homebrew/Library/Taps/homebrew to wherever you keep your homebrew-core checkout:

ln -s ~/code/homebrew-core /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core

Then you can just use brew install commands and they'll work just as you expect.

Liked what you read? I am available for hire.

CircleCI trusts 8 analytics companies with your source code and API tokens

Posted on October 9, 2017

When you navigate to your project in CircleCI's UI, Javascript from eight different analytics companies gets loaded and executed in your browser.

Pusher
Intercom
Launch Darkly
Amplitude
Appcues
Quora (??)
elev.io
Optimizely

You can see this in my Network tab here:

This is a problem because the CircleCI browser context has full access to the CircleCI API, which is hosted on the same domain, so all eight of those companies' scripts can make requests to CircleCI API endpoints. Furthermore CircleCI customers frequently either include credentials in source code or as environment variables in CircleCI. Set these, and you are trusting that CircleCI won't get compromised, or at least, your application is at most as secure as CircleCI is.

However, with eight different companies running Javascript in your browser with access to the CircleCI API, your source code and secrets are at most as secure as the union of eight different analytics companies' Javascript environments. If any of those eight gets compromised, it's trivial to execute Javascript that creates a new API token for your account. Once that token is created, an attacker can easily export it to a domain controlled by the attacker. Once an attacker has the token, they can use the "Test Commands" API to add new commands that will dump your environment variables and/or all files in source code to the logs, then download your logs or artifacts via the same API.

Analytics companies frequently get hacked, serve malware or otherwise contain vulnerabilities. Just last week it was revealed that "Code Hive" was running coin mining code on CBS's website. In 2014, jQuery.com was compromised and used to serve malware. If the same happened to any of the eight companies, you would be screwed.

This is frankly unacceptable for a company that manages source code and secrets for a large number of companies in the industry. It's unacceptable enough that your browsing data on CircleCI is potentially exposed to eight different companies, let alone API access to your source code. There are a number of steps you can take to mitigate the issue:

Put all domains used by the above servers in /etc/hosts, for each machine and each person on your team that accesses CircleCI. I have a tool that you can use to automate this process. However, this may break Javascript on CircleCI or other sites that are not coded in a defensive style.
Only use command line tools + API's to access CircleCI. I have written one such tool.
If there is an API to disable the "Test Commands" setting, turn it off, as an attacker can use that to put data in logs or artifacts without being able to control the circle.yml file or push to Github.
Don't put any sensitive keys in CircleCI, or in source code; inject them directly into the production runtime.
Demand that CircleCI take steps to make their dashboard and your source code more secure.

Finally, there are a number of steps CircleCI should enable immediately:

Don't load analytics scripts from eight different companies on pages that contain sensitive content, or that have access to the CircleCI "create token" API. Host the dashboard and API on a separate domain from the marketing page.
Send an email any time an API access token is created. Add a setting to allow org-wide disabling of API token creation.
Add an option to disable the "Test Commands" API, as this would allow an attacker without access to Github to place whatever content they want (source code/environment variables) in logs.
Add an option to delete old logs. If you have ever dumped env vars to the log file, an attacker can export these.
Enable subresource integrity, or serve JS from each of these companies from the CircleCI domain.

With embarrassing compromises literally every week, and new Congressional scrutiny into Facebook and Google's advertising practices in the 2016 election, as an industry we need to do better on issues like this.

Update: CircleCI posted their response.

Notes

Why don't you include a POC? I have one that demonstrates this attack, but I don't want to show script kiddies how. If you can't figure out how to construct it by reading the above description and the network traffic, you don't deserve to know how.

Liked what you read? I am available for hire.

Let’s talk about Javascript string encoding

Posted on September 1, 2017

Node string encoding is all over the place. Let's try to straighten out how it works.

First, some very basics about string encoding. A string is a series of bytes. A byte is 8 bits, each of which can be 0 or 1, so a byte can have 2⁸ or 256 different values. Encoding is the process of squashing the graphics you see on screen, say, 世 - into actual bytes. There are over a million possible Unicode characters - think about all of the different languages and emoji and symbols on the planet.

You could easily represent all of the characters in the Unicode set with an encoding that says simply "assign one number, 4 bytes (or 32 bits) long, for each character in the Unicode set." One 32-bit combo for each character means you can have 4 billion distinct characters. But this would be really inefficient for most documents. For example, the English Bible or dictionary or people's email folders are mostly the characters a-z, A-Z, 0-9, and punctuation. It would be inefficient to waste 4 bytes on every "a" in the document - we want a way to represent it more efficiently.

The most common encoding is UTF-8. The main advantage of UTF-8 is that it needs a single byte (instead of four) to encode Unicode characters 0-127 - the so-called ASCII set, which includes the ones I listed above. Less common characters are represented with two bytes, and even less common characters with three bytes, and so on.

Because Javascript was invented twenty years ago in the space of ten days, it uses an encoding that uses two bytes to store each character, which translates roughly to an encoding called UCS-2, or another one called UTF-16. The letter 'a' is Unicode code point 97, so stored in a Javascript string, the first byte of a UTF-16 code point would be 97 and the second byte would be 0. The euro symbol (€) is Unicode code point 8364, so the first byte would be 172 and the second byte would be 32. (The relationship between them: 8364 = (32 << 8) + 172).

That only gets you Unicode code points 0 through (256*256 = ) 65536, though, so: if the first two-byte character is between 55296 and 56319, it's a surrogate, and you have to read the second two-byte character to figure out what Unicode code point it represents. Javascript will handle this multi-character read for you if you use the new codePointAt operator, which can return all Unicode code points, up to 1.1 million. The old charCodeAt operator only reads one character at a time (between 0 and 65536) and you'll have to read/decode the second one yourself.

Where this gets complicated

Because everything else in the world is moving to UTF-8, Node is also trying to move to UTF-8. This gets confusing because Node sometimes asks you to choose which encoding you want, in places where you wouldn't really expect it to ask.

Frequently, you want to convert from a UTF-16 encoded Javascript string to UTF-8 bytes in some form, whether a Buffer or a Uint8Array. Unfortunately Node doesn't offer easy API's to do that; it buries the conversion logic in C++ code, which eventually calls out to the V8 binary to handle it.

Buffers

Node offers a Buffer type, where a Buffer is an array of bytes, and an API, Buffer.from(string, encoding). encoding defaults to UTF-8. So far, so good! Unfortunately, sometimes encoding refers to the encoding of string, and sometimes it refers to the encoding of the bytes in the Buffer.

Buffer.from('7468697320697320612074c3a97374', 'hex') will decode the input as a series of hex characters, and store the bytes corresponding to each 2-digit hex character in the Buffer. But in var a = 'tést'; Buffer.from(a, 'utf8'); the 'utf8' refers to how the bytes will be stored in the resulting Buffer. Remember, strings are always two bytes per character, so declaring they are 'utf8' doesn't make sense.

Similarly, the encoding parameter in buf.toString(encoding) does not refer to the encoding of the output string - it refers to the encoding of the data in the buffer. Unless of course you specify hex or base64, in which case it does refer to the encoding of the output string. Got it?

HTTP

Incoming HTTP requests are often encoded using UTF-8. Node will give you data as a Buffer via the HTTP request's .on('data') event handler, which can be decoded using the content-type from the HTTP headers. In Express, this is handled in the body-parser module, which defers to the iconv-lite module for the actual encoding and decoding.

But if you expect e.g. a JSON or XML input, you will probably eventually want to turn that Buffer into a string - see the JSON section below.

When writing string data as an HTTP response, Node will wait until the last possible minute to convert that string into a Buffer that can be written to the socket. This leads to odd behaviors like the net library needing to know what encoding to use for a string you pass to it.

JSON

JSON.parse operates on a string and returns a JSON object, which may be a string or contain strings. JSON.parse and JSON.stringify don't do anything with the string encoding - if you pass in a garbage string, you get a garbage string back out.

So you need to depend on something else to determine the character encoding, before you pass a string to JSON.parse. Usually this will be your HTTP middleware; you have to trust (or verify) that it handles character sets correctly, before creating UTF-8 strings.

Files on disk

Node source files are expected to be encoded with UTF-8. That means you can encode UTF-8 source characters in a string, like this:

var x = "¢"

Where the cent character is the UTF-8 encoded byte sequence "\xc2\xa2". When Node starts and you try to reference x in your program, it will be re-encoded as a UTF-16 string. If you type the literal characters:

var x = "\xc2\xa2";

This will be turned into the UTF-16 string "\xc2\x00\xa2\x00". So be careful to mind your inputs and outputs.

Conclusion

Encoding in Node is extremely confusing, and difficult to get right. It helps, though, when you realize that Javascript string types will always be encoded as UTF-16, and most of the other places strings in RAM interact with sockets, files, or byte arrays, the string gets re-encoded as UTF-8.

This is all massively inefficient, of course. Most strings are representable as UTF-8, and using two bytes to represent their characters means you are using more memory than you need to, as well as paying an O(n) tax to re-encode the string any time you encounter a HTTP or filesystem boundary.

There's nothing stopping us from packing UTF-8 bytes into a UTF-16 string: to use each of the two bytes to store one UTF-8 character. We would need custom encoders and decoders, but it's possible. And it would avoid the need to re-encode the string at any system boundary.

Liked what you read? I am available for hire.

Things to Use Instead of JWT

Posted on May 7, 2017

You might have heard that you shouldn't be using JWT. That advice is correct - you really shouldn't use it. In general, specifications that allow the attacker to choose the algorithm for negotiation have more problems than ones that don't (see TLS). N libraries need to implement M different encryption and decryption algorithms, and an attacker only needs to find a vulnerability in one of them, or a vulnerability in their combination. JWT has seen both of these errors; unlike TLS, it hasn't already been deployed onto billions of devices around the world.

This is a controversial opinion, but implementation errors should lower your opinion of a specification. An error in one implementation means other implementations are more likely to contain the same or different errors. It implies that it's more difficult to correctly implement the spec. JWT implementations have been extremely buggy.

But The Bad Implementations Were Written by Bad Authors

In the 1800's rail cars were coupled by an oval link on one end and a socket on the other. A railway worker would drop a pin down through the socket, keeping the link in place.

The train engineer could not see the coupler at the time of coupling, and the operation was fraught. Many couplers had their hands mangled. Worse, there was no buffer between the cars, and it was easy to get crushed if the coupling missed. Tens of thousands of people died.

Link-and-pin railway coupler

Still, the railroads stuck with them because link-and-pin couplers were cheap. You could imagine excuses being made about the people who died, or lost their fingers or hands; they were inattentive, they weren't following the right procedure, bad luck happens and we can't do anything about it, etc.

In 1893 Congress outlawed the link-and-pin coupler and deaths fell by one third within a year, and that's despite the high cost of switching to automatic couplers.

Alternatives

Update, January 2018: PAST is an excellent library implementing many of the suggestions here. Use PAST. Follow its development for use in your language.

What should you be using instead of JWT? That depends on your use case.

I want users to authenticate with a username and secret token

Have them make a request to your server over TLS; you don't need any additional encryption. TLS provides an encryption layer, you don't need any additional encryption or hashing besides TLS.

I want to post a public key and have users send me encrypted messages with it

The technical name for this is asymmetric encryption; only the user with the private key can decrypt the message. This is pretty magical; the magic is that people don't need the private key to send you messages that you can read. It was illegal to ship this technology outside of the US for most of the 90's.

JWT supports public key encryption with RSA, but you don't want to use it for two reasons. One, RSA is notoriously tricky to implement, especially when compared with elliptic curve cryptography. Thomas Ptáček explains:

The weight of correctness/safety in elliptic curve systems falls primarily on cryptographers, who must provide a set of curve parameters optimized for security at a particular performance level; once that happens, there aren't many knobs for implementors to turn that can subvert security. The opposite is true in RSA. Even if you use RSA-OAEP, there are additional parameters to supply and things you have to know to get right.

You don't want the random person implementing your JWT library to be tuning RSA. Two, the algorithm used by JWT doesn't support forward secrecy. With JWT, someone can slurp all of your encrypted messages, and if they get your key later, they can decrypt all of your messages after the fact. With forward secrecy, even if your keys are exposed later, an attacker can't read previous messages.

A better elliptic curve library is Nacl's box, which only uses one encryption primitive, and doesn't require any configuration. Or you can have users send you messages with TLS, which also uses public key encryption.

I want to encrypt some data so third parties can't read it, and then be able to decrypt it later

You might use this for browser cookies (if you don't want the user to be able to read or modify the payload), or for API keys / other secrets that need to be stored at rest.

You don't want to use JWT for this because the payload (the middle part) is unencrypted. You can encrypt the entire JWT object, but if you are using a different, better algorithm to encrypt the JWT token, or the data in it, there's not much point in using JWT.

The best algorithm to use for two-way encryption is Nacl's secretbox. Secretbox is not vulnerable to downgrade or protocol switching attacks and the Go secretbox implementation was written by a world-renowned cryptographer who also writes and verifies cryptographic code for Google Chrome.

I want to send some data and have users send it back to me and verify that it hasn't been tampered with

This is the JWT use case. The third part of a JWT is the signature, which is supposed to verify that the header and the payload have not been tampered with since you signed them.

The problem with JWT is the user gets to choose which algorithm to use. In the past, implementations have allowed users to pass "none" as the verification algorithm. Other implementations have allowed access by mixing up RSA and HMAC protocols. In general, implementations are also more complicated than they need to be because of the need to support multiple different algorithms. For example in jwt-go, it's not enough to check err == nil to verify a good token, you also have to check the Valid parameter on a token object. I have seen someone omit the latter check in production.

The one benefit of JWT is a shared standard for specifying a header and a payload. But the server and the client should support only a single algorithm, probably HMAC with SHA-256, and reject all of the others.

If you are rejecting all of the other algorithms, though, you shouldn't leave the code for them lying around in your library. Omitting all of the other algorithms makes it impossible to commit an algorithm confusion error. It also means you can't screw up the implementations of those algorithms.

For fun, I forked jwt-go and ripped out all of the code not related to the HMAC-SHA256 algorithm. That library is currently 2600 lines of Go, and supports four distinct verification algorithms. My fork is only 720 lines and has much simpler API's.

// old
func Parse(tokenString string, keyFunc func(*Token) (interface{}, error)) (*Token, error)
func (m *SigningMethodHMAC) Sign(signingString string, key interface{}) (string, error)
func (m *SigningMethodHMAC) Verify(signingString, signature string, key interface{}) error

// new
func Parse(tokenString string, key *[32]byte) (*Token, error)
func Sign(signingString string, key *[32]byte) string
func Verify(signingString, signature string, key *[32]byte) error

These changes increased the type safety and reduced the number of branches in the code. Reducing the number of branches helps reduce the chance of introducing a defect that compromises security.

It's important to note that my experiment is not JWT. When you reduce JWT to a thing that is secure, you give up the "algorithm agility" that is a proud part of the specification.

We should have more "JWT"-adjacent libraries that only attempt to implement a single algorithm, with a 256-bit random key only, for their own sake and for their users. Or we should give up on the idea of JWT.

Liked what you read? I am available for hire.

Pragmatic Web Development in Go

Posted on April 10, 2017

You should write your next web server in Go. Yes, you! Compared with Ruby, PHP, Python, or Javascript, you're going to get great memory and latency performance, and libraries that do what you expect.

The standard library can be a bit lacking though, if you are used to developing with a tool like Rails. I found myself adding the same helpers to every web project that I started in Go, so I compiled those tools into a starter pack. Most of the code is thin wrappers around the standard library, and any/all of it can be ripped out. I wanted to go over some of the tools in the starter pack.

Serving Static Assets

I compile static assets into the binary. This way you don't need to worry about copying static assets to your deployment server, passing in the relative path from the working directory to the static asset directory, or ensuring that you have the right version of the static assets for the right version of the binary.

A make target runs go-bindata, which compiles everything in the static directory and the templates directory into a single Go file. When a request comes in for static assets, we find the right one and serve it:

data, err := assets.Asset(strings.TrimPrefix(r.URL.Path, "/"))
if err != nil {
    rest.NotFound(w, r)
    return
}
http.ServeContent(w, r, r.URL.Path, s.modTime, bytes.NewReader(data))

Tests ensure that the static files on disk and the Go assets are up to date.

Routing

The Go http router only offers exact match, or the ability to route a directory, e.g. http.Handle("/static/", staticHandler()). I use a http.Handler that takes a regular expression as the first argument, instead of a string, and lets you specify which methods you want to handle.

r := new(handlers.Regexp) // github.com/kevinburke/handlers
r.HandleFunc(regexp.MustCompile("^/$"), []string{"GET"}, func(w http.ResponseWriter, r *http.Request) {
    w.Header().Set("Content-Type", "text/html; charset=utf-8")
    io.WriteString(w, "<html><body><h1>Hello World</h1></body></html>")
})

You can use regexp.FindStringSubmatch to pull parameters out of the URL, or replace this with your own routing framework.

HTTP/2 Server Push

"We shall go south," Paul said.

"Even if I say we shall turn back to the north when this day is over?" Stilgar said. "We shall go south," Paul repeated.

r.HandleFunc(regexp.MustCompile(`^/$`), []string{"GET"}, func(w http.ResponseWriter, r *http.Request) {
    if pusher, ok := w.(http.Pusher); ok {
        pusher.Push("/static/style.css", nil)
    }
    // Render the homepage HTML
})

It's really, really easy to do server push in Go, and the idea of pushing resources your clients are about to ask for is so exciting that I wanted to use it even if the benefits aren't going to be that large for your project.

Server push requires that you terminate TLS in your Go server. Run make generate_cert to generate a self-signed certificate for use locally. Use autocert.NewListener() for one-line certificate provisioning in production.

If you can't terminate TLS (say you are running on App Engine Flex or Heroku), the starter pack falls back to sending a preload Link header.

Logging

Being able to see details about every incoming request and response helps with debugging and is rarely a performance bottleneck on a domain that's being browsed by humans. In particular it's useful to see the user agent, the status code, the request ID (for correlating with downstream services), and any username used for HTTP basic authentication.

INFO[07:52:24.810-07:00] method=GET path=/ time=0 bytes=468 status=200
remote_addr=[::1]:50170 host=localhost:7065 user_agent=HTTPie/1.0.0-dev
request_id=a8cf0a8c-4f30-4b01-a49d-396611b3ca15

I use a logging middleware for this but you are welcome to rip it out and use your own.

Flash Messages

You'll probably want to show error and success messages to your end users. Again this is trying to take the simplest possible approach - a flash message is set by encrypting a string and setting the result as a cookie. Retrieving the message deletes the cookie.

func FlashSuccess(w http.ResponseWriter, msg string, key *[32]byte) {
	c := &http.Cookie{
		Name:     "flash-success",
		Path:     "/",
		Value:    opaque(msg, key),
		HttpOnly: true,
	}
	http.SetCookie(w, c)
}

We use the secretbox library to encrypt and decrypt cookies using a secret key - you can use the same to encrypt session ID's or other data in a tamper-safe way.

Loading Configuration

You will probably need to configure the server somehow. I prefer loading configuration from a YAML file to the command line or to environment variables, because it's more difficult to accidentally leak secrets from YAML to e.g. Sentry (as you can with environment variables), and your secrets can't be read from /proc/<pid>/env by other users.

main.go walks you through loading a YAML file and using it to configure the server.

Release Binaries

Run make release version=0.x.y to increment the server's Version, cross compile binaries for Windows, Mac, and Linux, and push them to the releases page on Github. You'll want to set a GITHUB_TOKEN in the environment with permission to push new releases to your project, and you'll probably need to change the username in your version of the Makefile.

Conclusion

It's not too difficult to get a lot of the core functionality of a larger framework in a different language, with a lot better performance and a more understandable set of libraries. It should be pretty easy to rip any of these parts out; you should deploy this by copying the files into your own project and modifying as you see fit.

I hope the starter pack is useful and I hope you choose Go for your next web development project!

Liked what you read? I am available for hire.

We Can’t Keep Ignoring the Bay’s Housing Politics

Posted on March 30, 2017

Do you work in the tech industry in the Bay Area? You should start learning about, and getting involved in, local housing politics.

The prognosis for housing and rent prices is bad, and things are likely going to get worse for tech workers in the Bay, unless we start taking action. I will explain why prices will keep going up, and what you can do to help.

Why Should You Care?

Salaries in the tech industry are really good! Rent may be high, but you make more, and can count on raises to outpace rent increases. Why should you care?

You may want your children to be able to afford to live where they grew up. Thirty years ago, you could buy a house in the Bay Area for 3-4x the average income. Frequently now that number is 8-10x, and that house may be out in Antioch.¹
San Francisco has a huge number of people commute downtown every day, which stress our highways and tunnels. Long commutes are correlated with lower happiness, and are more harmful for the environment. If we built more housing near where people work, commutes would be shorter.
Lower rents mean that there's more money in your pocket.
You may want to buy a home, and you may not have rich parents, or have been an early employee at a rocketship startup.
Cheaper housing helps less fortunate people make it and get a leg up. The Bay Area has one of the tightest job markets and most dynamic local economies in the US. People not making very much in other areas can move here, get a job and earn a higher salary than they can in, say, Reno. The high starting prices for housing discourage this, which means it's more difficult for the middle class to get a foothold in the Bay Area.
You may want to send your children to school in the area. Schoolteachers largely can't afford to live in San Francisco, which makes it harder to recruit good teachers for your kids, in public or private schools. Your employees have this same problem.
If you want to start a company, you'll need to hire employees. Higher home prices and rents mean that you have to pay higher salaries and more for rent. This makes your startup less viable.
If you want to fund startups, high salaries and rents mean you have to have larger rounds, and that your money doesn't go as far.

How an Empty Plot of Land Becomes Housing You Can Live In

You have to jump through many hoops to build housing in San Francisco. This section is long, but it's important to know how many different opportunities there are for NIMBY's to stall a project they don't like.

Buy land you want to develop on. There are many underdeveloped properties in San Francisco - parking lots, unused office spaces, or undeveloped lots owned by the City.

Submit a building plan to the City that follows the zoning code. Put up signs in the neighborhood explaining what you are building. Start working on permits. This part is pretty standard across all cities.

If your property is on the waterfront, your project needs to be approved by a majority of city residents in the next citywide election, thanks to 2014's Proposition B, which requires any new waterfront development to be voted on by the entire city. If you want to build in SOMA or the Mission and your housing would replace a production, distribution or repair business, you need to create one elsewhere or add space in your building for it, thanks to 2016's Prop X.

You have to submit an "Environmental Impact Report" (EIR) which explains the environmental impact your building will have. There were over 200 different "impacts" that can be considered - noise, traffic, crime, etc. All of these were given equal weight until a few years ago. These "impacts" are local to the area — you can't count "People will have to commute from Stockton if we don't build in SF" as an impact, even though it's true, and bad. Most of the time, you can reuse an existing Environmental Impact Report that has been prepared for a given neighborhood. More on this later.
If a neighbor doesn't like your project, they can pay just $578 to ask for a "Discretionary Review" by the Planning Commission. This is supposed to be for extraordinary circumstances, but pretty much anyone can file for any reason. Common ones are because a project will block your view, will cast a shadow on your beer garden, won't fit with "the character of the neighborhood," or requires a "variance," some small change from the zoning code.

You are supposed to meet with the community at this point. Your neighbors probably won't like your project. They may invoke the words "3 story monstrosity" to describe it, pass around flyers saying it will "ruin neighborhood character", or say there will be increased crime, a harder time finding parking, etc. If they can't block the project, they want you to make it shorter and smaller. But resolving the issue by reducing the number of units makes your project less viable.

If you can't resolve the issue it goes to the Planning Commission, which has seven members, four of whom are appointed by the Mayor, three by the City Supervisors. You get ten minutes to explain why you should be allowed to build. The opponents get ten minutes to complain about their views. There is a "public comment" section where members of the public get 2 minutes to talk about the project.²

The Planning Commission may ask you to compromise, approve the project, deny the project, or punt the decision by a month. They frequently deny projects. Let's say they approve the project. Hurray!
Your neighbors may appeal the decision to the full Board of Supervisors. There are 11 supervisors, one for each district in San Francisco. The Board is currently split between people who want more housing and people who say they want more housing, but repeatedly vote for rules that make it harder to build housing.

The most frequent appeal angle is to say that the EIR is not valid. Recently, one guy in the Mission appealed a 100% affordable, 94-unit building for senior citizens. You can read the appeal for yourself. The main reason the appellant thinks there should be a new EIR is because these poor senior citizens will cause vagrancy, crime, and littering in the neighborhood. Certainly these effects would be much worse if we didn't build the housing, and these seniors would be on the street.³

You are about six to eighteen months into the permit process at this point. And the Board may vote to turn down your project! Recently a 157-unit project in the Mission, with 39 affordable units, was denied by the Board.

Neighborhood groups may try to make a "deal" where the developer essentially buys their support. Calle 24, an anti-gentrification group in the Mission, recently negotiated a "deal" with another building where they would drop their protests in exchange for $1 million.

At any time during this process your permit might run out, the bank might decide to cancel your loan, market conditions may change, the City might vote to make your project infeasible. This is why your neighbors use so many stalling tactics - the longer they can stall, the more likely you will pull out of the project.
If the BoS approves your project, your neighbors have one more recourse: they can file a California Environmental Quality Act (CEQA) lawsuit. The lawsuit alleges, essentially, that a project would be harmful to the environment, and the developers haven't sufficiently considered those impacts.

If you were considering the environment at a regional level, you would probably want to build as densely as possible, and optimize for short commutes - so you'd file lawsuits to block low density projects that build on undeveloped land. However, a recent study found that CEQA lawsuits target infill projects (which increase density) by a 4 to 1 ratio. CEQA is well intentioned, but frequently abused. In one instance, abortion opponents filed a CEQA lawsuit to block a Planned Parenthood. They said that the Planned Parenthood hadn't adequately considered the noise that the protesters themselves would generate!.
Finally, if you navigate this intimidating gauntlet, and are determined to stick with the project, and your permits haven't expired, you can break ground on your project. It's extremely expensive to build here, and while the sums are large and the rent is high, developers do not make much money. One developer, Boston Properties, targets a 7% return, which is not large given the risk involved.

How does this work in places that keep prices low? States like Texas have lots of land and loose zoning codes. Other countries, like Japan, work around this by setting housing policy at the regional or national level. Essentially, they don't let NIMBY's have a say in the process.

Other Housing Errata

The city of San Francisco is half as dense as Brooklyn. We can achieve large decreases in rent prices by just building 4-5 stories on empty lots around the city.

If you live in a building that is older than 1979, you are entitled to rent control, which means that your landlord can only increase your rent by a few percent per year (the exact percentage is controlled by the SF Rent Board and tied to inflation. Last year it was 2.2%). You have a wide array of rights as a tenant; you may even have tenants rights if you have been living in a place for more than 30 days and don't have a formal contract with your landlord, or are not on the lease.

Econ 101 classes will teach you that rent control depresses supply, but I don't think it's too relevant to the SF housing crunch. If we didn't touch a single rent controlled unit, and just built 4-5 story units on the underdeveloped lots around the city, we'd be in a great place, housing wise.
There are well intentioned people who believe that building new housing supply actually creates more demand, and show up to meetings opposing any new development. This theory is not borne out by the evidence; the city has built a ton of new housing since 2015, and as a result, market rate rents have fallen by about 5% since the peak of the market. Other cities like Denver and Seattle have also seen rent decreases in response to new housing coming on the market. Furthermore, if increased supply really increased demand, the opposite would also be true - reducing supply would reduce demand even further! But no one suggests that destroying housing in SF would lower prices.
There are well intentioned people who believe that the only new housing in SF should be 100% affordable. 100% affordable projects are not profitable or viable for private developers, so they require subsidies. There is only so much money for subsidies, and in addition the Trump administration is interested in cutting federal subsidies for affordable housing.
There are well intentioned people who believe that the only way to prevent gentrification is to prevent any new buildings from being built - groups like MEDA and Calle 24, who have successfully fought projects in the Mission. Blocking new supply doesn't do anything about high demand, of course, and the result is existing buildings getting resold for millions of dollars, as we're seeing in the Mission, and ordinary people can't find new places to live, or can't move within their neighborhood.

There are less well intentioned landlords and real estate agents who oppose new housing because no new supply drives up the demand for existing housing, which increases their property values.
The most frequent complaints are that a project has too many luxury units and that it's too tall. These complaints and process impediments drive up the price of building here, which mean that the only viable projects are (1) targeted at the high end of the market, and/or (2) contain lots of units. Ironically, the complaints about luxury and height make it harder to build projects that aren't very tall and very high end!
Construction unions often oppose projects (and sometimes file CEQA lawsuits to block them) if the construction wages aren't high enough, or if the developer wants to use non-union labor.

No one here thinks they are a NIMBY. Even the guy filing a lawsuit to block 94 units of affordable housing told me in an email, "I am not a NIMBY."
Denying or asking a development to reduce the number of units is illegal in many circumstances, but cities still do it. Developers and pro-housing groups are starting to file lawsuits when projects get rejected or downsized.

You can complain to the Planning Commission that a project will cause shadows or block your view. In 99% of the country, these disputes are resolved by buying an easement, a contract that prevents your neighbor from blocking your view or casting a shadow.

The Coase Theorem says this should be good enough; if you don't have an easement, we can conclude your neighbors value building high more than you value your view or your sunlight. Except in San Francisco! If your neighbor doesn't want to grant you an easement, you can complain to the Planning Commission and block their project.

How to Get Involved

All of this means that building new housing is really difficult in San Francisco, and as a result, rents and home prices are going to continue to increase, you will pay more in rent, you won't be able to hire employees or buy a house, and your kids' teachers won't have a place to live.

The problem is that prices are too high and there are too many roadblocks to building. The political goal is to lower prices by a) building more, and b) making it easier to build more. I support pretty much every project - 100% affordable, super high end market rate - as long as it gets a shovel in the ground. Every new unit helps - even if it's not in your price range, it means people in that price range aren't outbidding you for a place in your range.

Things that don't work

Bitching on Twitter - This accomplishes nothing, as the last election showed. The same way Indivisible Team and others are encouraging people to show up to town halls and flood their reps offices with phone calls - you need to do this for your local supervisors. The good news is that your local officials pick up the phone! And they don't get as many calls, so they're more responsive to your phone calls.
Building an app - We are not going to hackathon our way out of the housing crisis. We need to show up to planning meetings and whip support for bills.
Apathy - The default mode for tech employees, which is to not care about local politics and then outbid other residents for apartments. As long as this is true homes will be unaffordable, your kids schools will have tired teachers and your rents will keep rising.

If you have 5 minutes a week

Calling actually makes a difference! Call your supervisor or state rep and ask them to support building more housing, and to make the approval process simpler.

The most pressing SF issue concerns the percentage of affordable units per new development. Two SF supervisors want to force all new developments to have 28% of units be affordable, which is unworkable for many projects - 28% of zero new projects means that zero new units get built. A competing plan would lower the percentage to 18%, which would increase the total number of affordable units, despite being a lower percentage, because it would make more projects viable. Call your supervisor and ask them to support the Breed/Safai Prop C plan.

The most pressing area issue is what will happen to an industrial park in Brisbane. A developer wants to build 4,300 apartments on empty land, but the city is fighting back. If you live in Brisbane, call the mayor's office and tell them you support the development, or show up to the nightly meetings.
The next election, figure out how the candidates stand on housing, and vote for the pro-housing candidates. This is tricky, because everyone says they are pro-housing, but many will not help get shovels in the ground. "Preserve neighborhood character" is a red flag. "100% affordable only" is a red flag. Any reference to wind tunnels, shadows, or height is a red flag.
Sign up for SF Do Something. Follow SFYimby or East Bay Forward on Twitter, and call your supervisor when you hear about a helpful piece of legislation, or a stalled housing project.
There are several bills in the California State Legislature that help. SB 35 (sponsored by SF's Senator, Scott Wiener) would make it more difficult to stall projects if cities are not meeting their state-mandated housing goals. SF's Representative, David Chiu, sponsored a bill to tax vacation homes and use the money to support affordable housing. SB 167 would make it harder for cities to block housing projects. Call your state senator and representative and ask them to support these bills.

If you have an hour a week (especially if your work hours are flexible)

SF Yimby holds meetings once a month teaching new people how to make a difference in their community; check their Facebook for notifications about the next meeting.

Show up to your local city hearings and speak in support of projects. The Planning Commission meets at City Hall every Thursday and posts their agenda here. You can show up and give a comment.

Take BART or MUNI to City Hall, a block from the Civic Center station.
Ask the City Hall door guard where the Planning Commission meeting is.
There are little cards at the front of the room. Write down your name, and check the box that says "Support."
Sit in the meeting. When the Commissioner asks for comment, line up by the microphone. Say you are a city resident, and you support building the project. Here is a template:

Hi, I'm [name here], a [renter|homeowner] in District [your district]. I'm speaking in support of the project. We are in the middle of a housing crisis, and the more housing we build, the more people can afford to live here. This project will help [X number of people] live in the city, close to where they work, and should go forward. Thank you for your time.

The wild thing is that the commissioners and supervisors apply ad-hoc, per project guidelines and often vote based on how many people comment in support or in opposition of the project. So your voice really does matter! But you have to show up.

The meetings can take a long time. City Hall has good wifi; I've sat in the back of the room and worked for hours at a time before while waiting for a bill.

You can also write an email to the Planning Commission in support of a particular project.

If you work in SF and commute to a megacorp in the South Bay

Menlo Park, Palo Alto, Mountain View and co love to build office space, but hate to build housing. Consider calling the city managers in the city you work in, and ask them to support more housing, so you can afford to live near where you work.

Your C-level managers or venture capital backers may not realize how bad the problem is for employees. Ask them to support pro-housing candidates and organizations.

Ask your company to try to build apartments in the area; Facebook is trying this.

If you have money

The tech community is learning slowly that it's not enough to air drop cash three months before an election; we have to build organizations and coalitions, get elected to committees like the local DCCC, and show up to meetings between elections. Donate to these local pro-housing organizations:

CARLA sues cities that break California state law to deny proposed projects. This is a surprisingly effective technique that is already producing results throughout the Bay Area, but lawsuits are expensive.
[Yimby Action][yimby] is a pro-housing lobbying group.
East Bay Forward is focused on building housing in the East Bay.
Donate to local pro-housing candidates. Volunteer for their campaigns; knock on doors.

Conclusion

Demand for housing in the Bay Area far outstrips supply, but it's really hard to build here, and there are a lot of people and procedures that want to keep things expensive and scarce. If you want to ever afford to buy a house here, it's time to start getting involved.

¹ This doesn't stop old people from showing up to planning meetings and saying "I worked hard and built a house here and these spoiled young people can too!" It's three or four times as difficult now as it was in 1970.

²The comment quality can vary widely - at one recent public comment section, a commenter worried what would happen to the "rare plants" on a vacant lot.

³70% of San Francisco's homeless were homed in San Francisco before they were on the street; it's a very local problem.

Liked what you read? I am available for hire.

Why Leap Seconds are Tricky and How to Deal With Them

Posted on January 3, 2017

You may have seen this on New Year's Eve:

Another leap second, another slew of outages. Handling time correctly is hard!https://t.co/kJepOfsKkv pic.twitter.com/Fwz2Xtpzkd
— Dan Luu (@danluu) January 1, 2017

I'd heard a little about this problem, but I didn't understand how it broke code, and what to do about it. So here is an explainer.

Background

Earlier this year, the International Earth Rotation and Reference Systems Service decided to add an additional second to the year. This is due to the fact that the Earth's day-to-day rotation does not perfectly line up with a 86400 second day.

So when the clock hit 23:59:59 on New Year's Eve, it usually ticks over to 00:00:00, but this year, repeated the second 23:59:59. So if you had measurement code that looked like:

a := time.Now() // Dec 31 2016 23:59:59 and 800ms
// While this operation is in progress, the leap second starts
doSomeExpensiveOperation()
duration := time.Since(a) // Taken at Dec 31 2016 23:59:59 (#2!) and 100ms
fmt.Println(duration)

We usually assume in our code that time can't go backwards, but the duration there is -700ms! This can cause big problems if you have code that always expects nonnegative time durations. In particular code that accepts timeouts might panic or immediately error if it gets a negative value. Cloudflare passed a negative duration to rand.Int63n, which crashed some of their servers.

But this is only a problem on leap seconds, which occur once a year at most. That's not good, but I can handle one problem a year. That would be nice, but unfortunately time on your servers can jump around quite a bit. An individual server might get a few seconds ahead or behind the rest of your servers. When it gets the correct time from the NTP server, the time on the server might jump forward or backward a few seconds. If you run any number of servers you are bound to hit this problem sooner or later.

Ok, how do I deal with it?

Good question! The Linux community implemented CLOCK_MONOTONIC as an option to clock_gettime to solve this problem. CLOCK_MONOTONIC returns an integer number of nanoseconds that only ever increases. You can use this to get more accurate (and always nonnegative) deltas than getting the system time twice and subtracting the samples.

You'll have to check whether your programming language uses CLOCK_MONOTONIC for implementing calls to get the system time. Many languages don't use it for the simple "what time is it" call, because it returns you time from a random starting point, not the epoch.

So you can't really translate between a CLOCK_MONOTONIC value and any given human date; it's only useful when you take multiple samples and compare them.

In particular

The Go standard library does not use CLOCK_MONOTONIC for calls to time.Now(). Go has a function that implements CLOCK_MONOTONIC, but it's not a public part of the standard library. You can access it via the monotime library - replace your calls to time.Now() with monotime.Now(). Note you will get back a uint64 that only makes sense if you call monotime.Now() again a little while later (on the same machine) and subtract it from the first value you got.

import "time"
import "github.com/aristanetworks/goarista/monotime"
func main() {
    a := monotime.Now()
    someExpensiveOperation()
    b := monotime.Now()
    // b - a will *always* be greater than 0
    fmt.Println(time.Duration(b - a))
}

Porting code errata

I ported Logrole (a Twilio log viewer) to implement monotonic time. In the past it was really easy to call t := time.Now() and then, later, somewhere else in the code, call diff := time.Since(t) to get a Duration. This let you both use t as a wall clock time, and get an elapsed amount of time since t, at some unspecified point later in the codebase and in time. With CLOCK_MONOTONIC you have to separate these two use cases; you can get a delta or you can get a Time but you can't get both.

The patch was pretty messy and an indication of the problems you might have in your own codebase.

Another problem to watch out for is a uint64 underflow. If you have code like this:

if since := time.Since(aDeadline); since > 0 {
    return nil, errors.New("deadline exceeded")
}

You might port it to something like:

now := monotime.Now()
deadline := now + uint64(500*time.Millisecond)
someExpensiveOperation()
if monotime.Now() - deadline > 0 {
    return nil, errors.New("deadline exceeded")
}

The problem here is you are subtracting uint64 values and if now is under the deadline, you'll underflow and get a very large uint64 value (which is greater than 0), so you'll always execute the if condition. Instead you need to avoid the subraction as well and use

if monotime.Now() > deadline {

instead.

Conclusion

It is hard to get this right, but it's probably a good practice to separate out wall clock time values from time values used for deltas and timeouts. If you don't do this from the start though you are going to run into problems. It would be nice if this was better supported in your language's standard library; ideally you'd be forced to figure out which use case you needed a time value for, and use it for that.

Liked what you read? I am available for hire.