
The SSL library ecosystem needs diversity

The Heartbleed bug was really bad for OpenSSL - it let an attacker ask a server a simple question like "How are you?" and have the server respond with whatever happened to be sitting in its memory (password data, private keys that could be used to decrypt all traffic), and the server would have no idea it was happening.

A lot of people have said that we should ditch OpenSSL, both because this bug is so bad and because parts of the codebase are odd - the kind of code that would usually indicate bad programmers, except that it is found in a library deployed everywhere.

Ditching OpenSSL is not going to happen any time soon because it is the standard implementation for any server that has to terminate SSL traffic, and writing good crypto libraries is very difficult. So this is not a promising approach.

However this bug and the subsequent panic (as well as the flood of emails telling me to reset passwords etc) indicate the problem with having every software company in the world rely on the same library. Imagine that there were three different SSL software tools that each had a significant share of the market. A flaw in one could affect, at most, the users of that library. Diversification reduces the value of any one exploit and makes it more difficult to find general attacks against web servers.

This diversity is what makes humans so robust against diseases like the Spanish Flu, which killed perhaps 50 to 100 million people but didn't make a dent in the overall human population. Compare that with the banana: nearly every banana sold is a clone of the same cultivar, so a single disease could wipe out the entire stock of bananas around the world.

You can see the benefits of diversity in two places. First, even within the OpenSSL project, users had different versions of the library installed on their servers. Servers that weren't running one of the affected versions (1.0.1 through 1.0.1f) - like Twilio's - were not vulnerable, which was good.

The second is that servers use different programming languages and different frameworks. This means that the series of Rails CVEs was very bad for Rails deployments but didn't mean anything for anyone else (another good thing).

After Heartbleed I donated $100 to the OpenSSL Foundation, in part because the project is really important, and in part because it has saved me from having to think about encrypting communication with clients (most of the time), which is really, really neat. I will match that donation to other SSL libraries, under these conditions:

  • The library's source code is available to the public.

  • There is evidence that the code has been used in a production environment to terminate SSL connections.

  • The project has room for more funding.

This is not a very large incentive, but it's at least a step in the right direction; if you want to join my pledge, I'll update the dollar amounts and list your name in this post. A $10 million prize put a rocket into space; I'm hoping this pledge can help spur diversity in the SSL ecosystem as well.


Source code stolen from Github.com

The open source community was shocked to learn Tuesday that millions of lines of source code had gone missing from Github.com, a popular online version control website.

Github stores source code in "repositories", which are big chunks of code that can be edited by Github members. Most version control websites will keep a small portion of the source code online (collectively known as the "hot repos") and store the rest of the repos offline, to prevent a mass download of all of the source code. Instead of using hot repos and cold repos, Github stored all of the source code online, which allowed the attackers to download all of it.

It's unclear how long the source code has been missing. Slides from a leaked Keynote deck indicated that Github's main strategy was to "just kinda ask people to push their code back up to the site without noticing anything". On Twitter, some people attributed the theft to an honest mistake (Github left the popular port 22 open for the attackers), while others speculated that the founders absconded with the code after building up trust in Github.

Github is based on a "distributed version control" system, designed so that many different copies of the source code can live on different computers. But because everyone stores their source code in Github, it became very easy for the attackers to download all of the source code from one place.

"My code could be running on anyone's computer right now, anywhere in the world," said open source developer Andrew Benton. "Frankly, that is terrifying." Other members of the community laughed at anyone who thought their source code was secure when hosted with a version control system that runs in the cloud.

Github could not be reached for comment, but they did release a special "Hackedocat" to commemorate the occasion.

Hackedocat!

At press time, the top comments on Hacker News were from a person complaining about how dumb Github is for losing the code, another person explaining to everyone that this article is satire, a third person explaining that while he understands this is satire, the article is "dumb" and "not that funny", and seven non-sequiturs about the wisdom of free markets.

with thanks to Kyle Conroy, Andrew Benton, and Gabriel Gironda for reading drafts, and to Kyle for the Hackedocat


Go to Hong Kong Disneyland

On a Friday in April, the lines for Space Mountain and Thunder Valley did not exceed 10 minutes all day. Astro Blasters twice had 4 people in line. The price is $60, which is ~$30 cheaper than Anaheim.

It was about 80 degrees during the day and 70 at night. Also when you exit Disneyland HK you are in Hong Kong, which is much nicer than Orange County, but maybe not for dim sum. You can take the metro from downtown and get there in under 45 minutes.

There are analogues for most Disneyland rides; notable absences are Indiana Jones and Pirates of the Caribbean.

Big draws are the character shows, Winnie the Pooh, and Fantasyland, so if you’re not really interested in those you are in lots of luck.


A look at the new retry behavior in urllib3

"Build software like a tank." I am not sure where I read this, but I think about it a lot, especially when writing HTTP clients. Tanks are incredible machines - they are designed to move rapidly and protect their inhabitants in any kind of terrain, against enemy gunfire, or worse.

HTTP clients often run in unfriendly territory, because they usually involve a network connection between machines. Connections can fail, packets can be dropped, the other party may respond very slowly, or with a new unknown error message, or they might even change the API out from under you. All of this means writing an HTTP client like a tank is difficult. Here are some examples of things a tank-like HTTP client should do for you, none of which you get by default.

  • If a request fails to reach the remote server, we would like to retry it no matter what. We don't want to wait around for the server forever though, so we want to set a timeout on the connection attempt.

  • If we send the request but the remote server doesn't respond in a timely manner, we want to retry it, but only for requests where it is safe to send the request again - so-called idempotent requests.

  • If the server returns an unexpected response, we want to always retry if the server didn't do any processing - a 429, 502 or 503 response usually indicates this - as well as on all idempotent requests.

  • Generally we want to sleep between retries to allow the remote connection/server to recover, so to speak. To help prevent thundering herd problems, we usually sleep with an exponential back-off.

Here's an example of how you might code this:

import time

import requests
from requests.exceptions import ConnectionError, ConnectTimeout, ReadTimeout

IDEMPOTENT_METHODS = ['GET', 'PUT', 'DELETE']

def resilient_request(method, uri, retries):
    while True:
        try:
            resp = requests.request(method, uri)
            if resp.status_code < 300:
                return resp
            if resp.status_code in (429, 502, 503):
                # the server did no processing, so any request is safe
                # to retry; if we're out of retries, hand the error
                # response back to the caller
                retries -= 1
                if retries <= 0:
                    return resp
                time.sleep(2 ** (3 - retries))
                continue
            if resp.status_code >= 500 and method in IDEMPOTENT_METHODS:
                retries -= 1
                if retries <= 0:
                    return resp
                time.sleep(2 ** (3 - retries))
                continue
            # a non-retryable error (a 404, say); give it to the caller
            return resp
        except (ConnectionError, ConnectTimeout):
            # the request never reached the server; always safe to retry
            retries -= 1
            if retries <= 0:
                raise
            time.sleep(2 ** (3 - retries))
        except ReadTimeout:
            # the server may have started processing, so only retry
            # idempotent requests
            if method not in IDEMPOTENT_METHODS:
                raise
            retries -= 1
            if retries <= 0:
                raise
            time.sleep(2 ** (3 - retries))

Holy cyclomatic complexity, Batman! This suddenly got complex, and the control flow here is not simple to follow, reason about, or test. Better hope we caught everything, or we might end up in an infinite loop, or try to access resp when it has not been set. There are some parts of the above code that we can break into sub-methods, but you can't make the code much more compact than it is, since most of it is control flow. It's also a pain to write this type of code and verify its correctness; most people just try once, as this comment from the pip library illustrates. This is a shame, and the reliability of services on the Internet suffers for it.

A better way

Andrey Petrov and I have been putting in a lot of work to make it really, really easy for you to write resilient requests in Python. We pushed the complexity of the above code down into the urllib3 library, closer to the request that goes over the wire. Instead of the above, you'll be able to write this:

def retry_callable(method, response):
    """Determine whether to retry this response."""
    return ((response.status >= 400 and method in IDEMPOTENT_METHODS)
            or response.status in (429, 503))

retry = urllib3.util.Retry(read=3, backoff_factor=2,
                           retry_callable=retry_callable)
http = urllib3.PoolManager()
resp = http.request(method, uri, retries=retry)

You can pass a callable to the retries object to determine the retry behavior you'd like to see. Alternatively you can use the convenience method_whitelist and codes_whitelist helpers to specify which methods to retry.

retry = urllib3.util.Retry(read=3, backoff_factor=2,
                           codes_whitelist=set([429, 500]))
http = urllib3.PoolManager()
resp = http.request(method, uri, retries=retry)

And you will get the same results as the long example above. urllib3 will do all of the hard work for you to catch the conditions mentioned above, with sane (read: non-intrusive) defaults.

This is coming soon to urllib3 (and with it, to Python Requests and pip); we're looking for a bit more review on the pull request before we merge it. We hope this makes it easier for you to write high performance HTTP clients in Python, and appreciate your feedback!

Thanks to Andrey Petrov for reading a draft of this post.


How to create rich links in your Sphinx documentation

This will be short, but it seems there's some difficulty doing this, so I thought I'd share.

The gist is, any time you reference a class or method in your own library, in the Python standard library, or in another third-party extension, you can provide a link directly to that project's documentation. This is pretty amazing and only requires a little bit of extra work from you. Here's how.

The Simplest Type of Link

Just create a link using the full import path of the class or attribute or method. Surround it with backticks like this:

Use :meth:`requests.Request.get` to make HTTP Get requests.

That link will show up in text as:

Use requests.Request.get to make HTTP Get requests.

There are a few different types of declarations you can use at the beginning of that phrase:

:attr:
:class:
:meth:
:exc:

The full list is in the Sphinx documentation for the Python domain.

I Don't Want to Link the Whole Thing

To specify just the method/attribute name, and not any of the modules or classes that precede it, use a squiggly (a tilde), like this:

Use :meth:`~requests.Request.get` to make HTTP Get requests.

That link will show up in text as:

Use get to make HTTP Get requests.

I Want to Write My Own Text

This gets a little trickier, but still doable:

Use :meth:`the get() method <requests.Request.get>` to make HTTP Get requests.

That link will show up in text as:

Use the get() method to make HTTP Get requests.

I want to link to someone else's docs

In your docs/conf.py file, add 'sphinx.ext.intersphinx' to the end of the extensions list near the top of the file. Then, add the following anywhere in the file:

    # Add the "intersphinx" extension
    extensions = [
        'sphinx.ext.intersphinx',
    ]
    # Add mappings
    intersphinx_mapping = {
        'urllib3': ('http://urllib3.readthedocs.org/en/latest', None),
        'python': ('http://docs.python.org/3', None),
    }

You can then reference other projects' documentation the same way you reference your own, and Sphinx will magically make everything work.
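
For example, with the mapping above in place, you could write a reference like this (assuming the target project - urllib3 here - documents the object you're linking to):

Configure retry behavior with a :class:`urllib3.util.Retry` object.

Sphinx will resolve the reference against urllib3's published object inventory and link the text urllib3.util.Retry straight to its hosted documentation.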

I want to write the documentation inline in my source code and link to it

Great! I love this as well. Add the 'sphinx.ext.autodoc' extension, then write your documentation. There's a full guide to the inline syntax on the Sphinx website; confusingly, it is not listed on the autodoc page.

    # Add the "intersphinx" extension
    extensions = [
        'sphinx.ext.autodoc',
    ]
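
As a quick sketch of how the pieces fit together (mypackage.client and fetch are hypothetical names, purely for illustration), write the docstring in your source code:

    # mypackage/client.py
    def fetch(url):
        """Fetch a URL and return the response.

        Uses :meth:`requests.Request.get` under the hood, so the
        rendered docs get a rich link for free.
        """

Then pull it into your documentation with an automodule directive in any .rst file:

    .. automodule:: mypackage.client
       :members:

Sphinx imports the module, reads the docstrings, and renders them - links and all - as if you had written them directly in your .rst files.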

Hope that helps! Happy linking.


New blog post about HAProxy

Over on the Twilio Engineering Blog, I have a new post about optimizing your HAProxy configuration. I wrote it mostly because we had some confusion in our own configuration about how these options should be set, and I figured if we were confused, others would be as well. Here's a sample:

    retries 2
    option redispatch

When I said a 30 second connect timeout meant HAProxy would try a bad connection for 30 seconds, I lied. It turns out that by default HAProxy will retry the connect attempt 3 times. So our 30 second connect timeout is actually a 120 second connect timeout, blowing through our SLA and meaning we're returning an empty response to the customer.
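
For reference, here is roughly how those directives fit together in a config file; the 5 second value is purely illustrative, not a recommendation from the post:

    defaults
        # worst case connect time: (1 + retries) * "timeout connect" = 15s
        timeout connect 5s
        retries 2
        option redispatch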

Read the full post to learn more about HAProxy.


Automating your IPython Notebook Setup (and getting launchctl to work)

Recently I've fallen in love with the IPython Notebook. It's the Python REPL on steroids, and I've probably only scratched the surface of what it can actually do. This will be a short post, because long posts make blogging again feel painful. This is also really more about setting up launchctl than IPython, but hopefully that's useful too.

Starting it from the command line is kind of a pain (it tries to save .ipynb files in your current directory, and it warns you to save files before closing tabs), so I thought I'd just set it up to run in the background whenever my machine is on. Here's how you can get that set up.

Create a virtualenv with IPython

First, you need to install the ipython binary, and the other packages you need to run IPython Notebook.

    # Install virtualenvwrapper, then source it
    pip install virtualenvwrapper
    source /path/to/virtualenvwrapper.sh

    # Create the virtualenv and install the packages
    mkvirtualenv ipython
    pip install ipython tornado pyzmq

Starting IPython When Your Mac Boots

Open a text editor and add the following:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
    <plist version="1.0">
    <dict>
      <key>Label</key>
      <string>com.kevinburke.ipython</string>
      <key>ProgramArguments</key>
      <array>
          <string>/Users/kevin/.envs/ipython/bin/ipython</string>
          <string>notebook</string>
      </array>
      <key>RunAtLoad</key>
      <true/>
      <key>StandardOutPath</key>
      <string>/Users/kevin/var/log/ipython.log</string>
      <key>StandardErrorPath</key>
      <string>/Users/kevin/var/log/ipython.err.log</string>
      <key>ServiceDescription</key>
      <string>ipython notebook runner</string>
      <key>WorkingDirectory</key>
      <string>/Users/kevin/.ipython_notebooks</string>
    </dict>
    </plist>

You will need to replace the word kevin with your username and adjust the file locations for your own filesystem. I also save my notebooks in a directory called .ipython_notebooks in my home directory; you will need to create that directory (or change the WorkingDirectory value) as well.
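
For example:

    mkdir -p ~/.ipython_notebooks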

Save that in /Library/LaunchDaemons/<yourname>.ipython.plist. Then change the owner to root:

sudo chown root:wheel /Library/LaunchDaemons/<yourname>.ipython.plist

Finally load it:

sudo launchctl load -w /Library/LaunchDaemons/<yourname>.ipython.plist

If everything went okay, IPython should open in a browser tab. If it didn't, check /var/log/system.log for errors, or one of the two logfiles specified in your plist.
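
You can also ask launchd directly whether the job is loaded:

    sudo launchctl list | grep ipython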

Additional Steps

That's it! I've also found it really useful to run an nginx redirector locally, along with a new rule in /etc/hosts, so I can visit http://ipython and get redirected to my notebooks. But that is a topic for a different blog post.


Speeding up test runs by 81% and 13 minutes

Yesterday I sped up our unit/integration test runs from 16 minutes to 3 minutes. I thought I'd share the techniques I used during this process.

  • We had a hunch that an un-mocked network call was taking 3 seconds to time out. I patched this call throughout the test code base. It turns out this did not have a significant effect on the runtime of our tests, but it's good to mock out network calls anyway, even if they fail fast.
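
    A sketch of that kind of patch, using the mock library (the dotted path and names here are hypothetical; patch wherever the un-mocked call actually lives):

    import unittest

    import mock

    class ExampleTest(unittest.TestCase):
        # Replace the slow network call with a stub that returns
        # instantly, so the test never waits on a timeout.
        @mock.patch('ourapp.network.fetch_remote_config')
        def test_does_not_hit_network(self, fetch_mock):
            fetch_mock.return_value = {}
            # ... exercise the code under test here ...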

  • I ran a profiler on the code. Well, that's not true: I just timed various parts of the code to see how long they took, using some code like this:

    import datetime
    start = datetime.datetime.now()
    some_expensive_call()
    total = (datetime.datetime.now() - start).total_seconds()
    print "some_expensive_call took {} seconds".format(total)
    

    It took about ten minutes to zero in on the fixture loader, which was doing something like this:

    def load_fixture(fixture):
        model = find_fixture_in_db(fixture['id'])
        if not model:
            create_model(**fixture)
        else:
            update_model(model, fixture)
    

    The call to find_fixture_in_db was doing a "full table scan" of our SQLite database, and taking about half of the run-time of the integration tests. Moreover in our case it was completely unnecessary, as we were deleting and re-inserting everything with every test run.

    I added a flag to the fixture loader to skip the database lookup if we were doing all inserts. This sped up observed test time by about 35%.
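
    Sketching it out, the new code path looks something like this (the assume_insert flag name is hypothetical):

    def load_fixture(fixture, assume_insert=False):
        # When the suite wipes the database before loading fixtures,
        # every fixture is new, so skip the full-table-scan lookup.
        if assume_insert:
            create_model(**fixture)
            return
        model = find_fixture_in_db(fixture['id'])
        if not model:
            create_model(**fixture)
        else:
            update_model(model, fixture)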

  • I noticed that the local test runner and the Jenkins build runner were running different numbers of tests. This was really confusing. I ended up doing some fancy stuff with the xunit xml output to figure out which extra tests were running locally. Turns out, the same test was running multiple times. The culprit was a stray line in our Makefile:

    nosetests tests/unit tests/unit/* ...
    

    The tests/unit/* change was running all of the tests in compiled .pyc files as well! I felt dumb because I actually added that tests/unit/* change about a month ago, thinking that nosetests wasn't actually running some of the tests in subfolders of our repository. This change cut down on the number of tests run by a factor of 2, which significantly helped the test run time.

  • The Jenkins package install process would remove and re-install the virtualenv before every test run, to ensure we got up-to-date dependencies with every run. Well that was kind of stupid, so instead we switched to running

    pip install --upgrade .
    

on our setup.py file, which should pull in the correct version of dependencies when they change (most of them are specified either with double-equals, ==, or greater-than, >=, signs). Needless to say, skipping the full reinstall on every run saved about three to four minutes.

  • I noticed that pip would still uninstall and reinstall packages that were already there. This happened for two reasons. One, our Jenkins box is running an older version of pip, which doesn't have this change from pip 1.1:

    Fixed issue #49 - pip install -U no longer reinstalls the same versions of packages. Thanks iguananaut for the pull request.

    I upgraded the pip and virtualenv versions inside of our virtualenv.

    Also, one dependency in our tests/requirements.txt would install the latest version of requests, which would then be overridden in setup.py by a very specific version of requests, every single time the tests ran. I fixed this by explicitly setting the requests version in the tests/requirements.txt file.

That's it! There was nothing major wrong with our process; we just fixed the way we did a lot of small things throughout the build. I have a couple of other ideas to speed up the tests, including loading fewer fixtures per test and/or instantiating some objects (like Flask's test_client) globally instead of once per test. You might not have been as dumb as we were, but you'll likely find some speedups if you check your build process as well.


Eliminating more trivial inconveniences

I really enjoyed Sam Saffron's post about eliminating trivial inconveniences in his development process. It resonated with me, as I tend to get really distracted by minor hiccups in the development process (a page reload taking more than 2 seconds, switching to a new tab, etc.). I took a look at my own development process and found a few easy wins.

Automatically run the unit tests in the current file

Twilio's PHP test suite is really slow - we're sloppy about keeping unit tests from hitting the disk, which means the suite takes a while to run. I wrote a short Vim command that will run only the tests in the current file. This makes the test iteration loop much, much faster, and I can run the entire suite once the current file is passing. The <leader> key in Vim is excellent and I recommend you become familiar with it.

nnoremap <leader>n :execute "!" . "/usr/local/bin/phpunit " . bufname('%') . ' \| grep -v Configuration \| egrep -v "^$" '<CR>

bufname('%') is the file name of the current Vim buffer, and the last two commands are just grepping away output I don't care about. The result is awesome:

Unit test result in vim

Auto reloading the current tab when you change CSS

Sam has a pretty excellent MessageBus option that listens for changes to CSS files, and auto-refreshes a tab when this happens. We don't have anything that good yet but I added a vim leader command to refresh the current page in the browser. By the time I switch from Vim to Chrome (or no time, if I'm viewing them side by side), the page is reloaded.

function! ReloadChrome()
    execute 'silent !osascript ' . 
                \'-e "tell application \"Google Chrome\" " ' .
                \'-e "repeat with i from 1 to (count every window)" ' .
                \'-e "tell active tab of window i" ' . 
                \'-e "reload" ' .
                \'-e "end tell" ' .
                \'-e "end repeat" ' .
                \'-e "end tell" >/dev/null'
endfunction

nnoremap <leader>o :call ReloadChrome()<CR>:pwd<cr>

Then I just hit <leader>o and Chrome reloads the current tab. This works even if you have the Developer Tools open as a separate, focused window - it reloads the open tab in every window of Chrome.

Pushing the current git branch to origin

It turns out that the majority of my git pushes are just pushing the current git branch to origin. So instead of typing git push origin <branch-name> 100 times a day I added this to my .zshrc:

    push_branch() {
        branch=$(git rev-parse --symbolic-full-name --abbrev-ref HEAD)
        git push $1 $branch
    }
    autoload push_branch
    alias gpob='push_branch origin'

I use this for git pushes almost exclusively now.

Auto reloading our API locally

The Twilio API is based on the open-source flask-restful project, running behind uWSGI. One problem we had was that changes to the application code required a full uWSGI restart, which made local development a pain. Until recently, it was pretty difficult to get new Python code running in uWSGI short of a manual reload - you had to implement a file watcher yourself, and then communicate with the running process. But last year uWSGI added the py-auto-reload feature, where uWSGI will poll for changes in your application and automatically reload itself. Enable it in your uWSGI config with

py-auto-reload = 1   # 1 second between polls

Or at the command line with uwsgi --py-auto-reload=1.

Conclusion

These changes have all made me a little bit quicker, and helped me learn more about the tools I use on a day to day basis. Hope they're useful to you as well!


Disability benefits: the new unemployment checks

This American Life is an excellent podcast, but occasionally puts out episodes on subjects I don't care for: fiction, reminiscences about home life, etc. There is one heuristic you should use for filtering This American Life episodes: listen to the ones that tell a single story for the whole hour.

Examples of whole-hour episodes that are great stories: the NUMMI plant in Fremont, a story about Amanda Williams and juvenile justice in Georgia, and a story on the Social Contract and why it's so hard to fix the country's current budget problems.

The latest story is similarly excellent and spans an entire episode. Ostensibly it's about healthcare in the US, but the true story is about a class of US citizens who are no longer fit for the workplace, and the steps they're taking to cope.

Occupational change over time is completely normal, and in fact, a very good thing for everyone. At one point in time, 98% of US workers were farmers. Imagine if the government had implemented protective measures for jobs in farming that were at risk of disappearing, as farming tools got better and workers became more productive. It would have prolonged the use of inefficient farming techniques and delayed moves into more productive industries.

Historically, sectoral shifts in the US economy have been handled without too much disruption to society. Workers retire out of less productive sectors, and new graduates enter promising industries. Of course, in individual instances a mill may shut down and leave people without a job, but on the whole it's worked out okay.

Lately there's been lots of evidence that the economy is starting to shift much faster than the retirement/new entry process can adjust to. The result is a giant swath of society that is unable to contribute in a meaningful way, or earn their keep. This American Life focuses in on this group of people, currently numbering in the tens of millions (as well as the group of rent seekers catering to this group). I'd suggest you tune in, because this problem is not going away.

I don't have solutions or criticism; the story is more sad than anything. You should be tuned into what is happening with the workforce in the US today, especially when most of us live in areas surrounded by people that share our socioeconomic background and status.

I'd encourage you to read Kevin Kelly's recent post on The Post-Productive Economy. It's one view of where we might be heading.
