Posts Tagged With: Code

Easy maintenance of your AUTHORS file

It's important to recognize the people who have contributed to your project, but it can be annoying to keep your project's AUTHORS file up to date, and tedious to ask everyone to add themselves in the correct format.

So I did what any good engineer should do, and automated the process! I added a simple make target that will update the authors file based on the current Git history:

    # Run this command to update AUTHORS.md with the latest contributors.
    # (printf, unlike plain echo on some shells, reliably interprets the \n escapes.)
    authors:
        printf "Authors\n=======\nWe'd like to thank the following people for their contributions.\n\n" > AUTHORS.md
        git log --raw | grep "^Author: " | sort | uniq | cut -d ' ' -f2- | sed 's/^/- /' >> AUTHORS.md

Essentially, that snippet gets every author from the git history, sorts and unique-ifies the list, strips the word "Author: " from each line, and outputs the new list to the AUTHORS.md file. Now updating the authors list is as easy as running make authors from the command line.

Liked what you read? I am available for hire.

The SSL library ecosystem needs diversity

The Heartbleed bug was really bad for OpenSSL - it let an attacker ask a server a simple question like "How are you?" and get back whatever happened to be in the server's memory (password data, private keys that could be used to decrypt all traffic), and the server would have no idea it was happening.

A lot of people have said that we should ditch OpenSSL because this bug is so bad, and because parts of the codebase are odd - the kind of code that would usually indicate inexperienced programmers, except that it is found in a library deployed everywhere.

Ditching OpenSSL is not going to happen any time soon: it is the standard implementation for any server that has to terminate SSL traffic, and writing good crypto libraries is very difficult. So this is not a promising approach.

However this bug and the subsequent panic (as well as the flood of emails telling me to reset passwords etc) indicate the problem with having every software company in the world rely on the same library. Imagine that there were three different SSL software tools that each had a significant share of the market. A flaw in one could affect, at most, the users of that library. Diversification reduces the value of any one exploit and makes it more difficult to find general attacks against web servers.

This diversity is what makes humans so robust against diseases like the Spanish Flu, which killed ~100 million people but didn't make a dent in the overall human population. Compare that with the banana: because commercial bananas are genetic clones of one another, a single disease could wipe out the entire stock of bananas around the world.

You can see the benefits of diversity in two places. One, even inside the OpenSSL project, users had different versions of the library installed on their servers. This meant that servers that didn't have versions 1.0.1a-f installed (like Twilio) were not vulnerable, which was good.

The second is that servers use different programming languages and different frameworks. This means that the series of Rails CVEs was very bad for Rails deployments but didn't mean anything for anyone else (another good thing).

After Heartbleed I donated $100 to the OpenSSL Foundation, in part because it is really important and in part because it's saved me from having to think about encrypting communication with clients (most of the time) which is really, really neat. I will match that donation to other SSL libraries, under these conditions:

  • The library's source code is available to the public.

  • There is evidence that the code has been used in a production environment to terminate SSL connections.

  • The project has room for more funding.

This is not a very large incentive, but it's at least a step in the right direction; if you want to join my pledge, I'll update the dollar amounts and list your name in this post. A $10 million prize put a rocket into space; I'm hoping this pledge can help spur diversity in the SSL ecosystem as well.


A look at the new retry behavior in urllib3

"Build software like a tank." I am not sure where I read this, but I think about it a lot, especially when writing HTTP clients. Tanks are incredible machines - they are designed to move rapidly and protect their inhabitants in any kind of terrain, against enemy gunfire, or worse.

HTTP clients often run in unfriendly territory, because they usually involve a network connection between machines. Connections can fail, packets can be dropped, the other party may respond very slowly, or with a new unknown error message, or they might even change the API from under you. All of this means writing an HTTP client like a tank is difficult. Here are some examples of things a resilient HTTP client should do for you, that are rarely there by default.

  • If a request fails to reach the remote server, we would like to retry it no matter what. We don't want to wait around for the server forever though, so we want to set a timeout on the connection attempt.

  • If we send the request but the remote server doesn't respond in a timely manner, we want to retry it, but only for requests where it is safe to send the request again - so-called idempotent requests.

  • If the server returns an unexpected response, we want to retry whenever the server didn't do any processing - a 429, 502 or 503 response usually indicates this - and also retry any idempotent request.

  • Generally we want to sleep between retries to allow the remote connection/server to recover. To help prevent thundering herd problems, we usually sleep with an exponential backoff.

Here's an example of how you might code this:

import time

import requests
from requests.exceptions import ConnectionError, ConnectTimeout, ReadTimeout

def resilient_request(method, uri, retries):
    while True:
        try:
            resp = requests.request(method, uri)
            if resp.status_code < 300:
                return resp
            # The server did no processing; safe to retry any method.
            if resp.status_code in (429, 502, 503):
                retries -= 1
                if retries <= 0:
                    resp.raise_for_status()
                time.sleep(2 ** (3 - retries))
                continue
            # Other server errors: only retry idempotent methods.
            if resp.status_code >= 500 and method in ('GET', 'PUT', 'DELETE'):
                retries -= 1
                if retries <= 0:
                    resp.raise_for_status()
                time.sleep(2 ** (3 - retries))
                continue
        except (ConnectionError, ConnectTimeout):
            # The request never reached the server; always safe to retry.
            retries -= 1
            if retries <= 0:
                raise
            time.sleep(2 ** (3 - retries))
        except ReadTimeout:
            # The server may have processed the request; only retry
            # idempotent methods.
            if method not in ('GET', 'PUT', 'DELETE'):
                raise
            retries -= 1
            if retries <= 0:
                raise
            time.sleep(2 ** (3 - retries))

Holy cyclomatic complexity, Batman! This suddenly got complex, and the control flow here is not simple to follow, reason about, or test. Better hope we caught everything, or we might end up in an infinite loop, or try to access resp when it has not been set. There are some parts of the above code that we can break into sub-methods, but you can't make the code too much more compact than it is there, since most of it is control flow. It's also a pain to write this type of code and verify its correctness; most people just try once, as this comment from the pip library illustrates. This is a shame and the reliability of services on the Internet suffers.
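One of those extractable sub-methods is the repeated decrement-and-sleep logic. Here's a sketch of what such a helper might look like - the name is invented, and the exponent assumes an initial retry budget of 3, matching the code above:

```python
import time

def backoff_or_raise(retries, exc=None):
    """Spend one retry, sleeping with exponential backoff, or give
    up by raising when the budget is exhausted."""
    retries -= 1
    if retries <= 0:
        if exc is not None:
            raise exc
        raise RuntimeError("out of retries")
    # With an initial budget of 3, this sleeps 2, then 4 seconds.
    time.sleep(2 ** (3 - retries))
    return retries
```

Even with the helper, though, the control flow of the loop itself stays hard to follow, which is the real argument for pushing this logic down into a library.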

A better way

Andrey Petrov and I have been putting in a lot of work to make it really, really easy for you to write resilient requests in Python. We pushed the complexity of the above code down into the urllib3 library, closer to the request that goes over the wire. Instead of the above, you'll be able to write this:

def retry_callable(method, response):
    """ Determine whether to retry this response. """
    return ((response.status >= 400 and method in IDEMPOTENT_METHODS)
            or response.status in (429, 503))

retry = urllib3.util.Retry(read=3, backoff_factor=2,
                           retry_callable=retry_callable)
http = urllib3.PoolManager()
resp = http.request(method, uri, retries=retry)

You can pass a callable to the retries object to determine the retry behavior you'd like to see. Alternatively, you can use the method_whitelist and codes_whitelist convenience helpers to specify which methods and status codes to retry.

retry = urllib3.util.Retry(read=3, backoff_factor=2,
                           codes_whitelist=set([429, 500]))
http = urllib3.PoolManager()
resp = http.request(method, uri, retries=retry)

And you will get out the same results as the 30 lines above. urllib3 will do all of the hard work for you to catch the conditions mentioned above, with sane (read: non-intrusive) defaults.

This is coming soon to urllib3 (and with it, to Python Requests and pip); we're looking for a bit more review on the pull request before we merge it. We hope this makes it easier for you to write high performance HTTP clients in Python, and appreciate your feedback!

Thanks to Andrey Petrov for reading a draft of this post.


How to create rich links in your Sphinx documentation

This will be short, but it seems there's some difficulty doing this, so I thought I'd share.

The gist is, any time you reference a class or method in your own library, in the Python standard library, or in another third-party extension, you can provide a link directly to that project's documentation. This is pretty amazing and only requires a little bit of extra work from you. Here's how.

The Simplest Type of Link

Just create a link using the full import path of the class or attribute or method. Surround it with backticks like this:

Use :meth:`requests.Request.get` to make HTTP Get requests.

That link will show up in text as:

Use requests.Request.get to make HTTP Get requests.

There are a few different types of declarations you can use at the beginning of that phrase:

:attr:
:class:
:meth:
:exc:

The full list is here.

I Don't Want to Link the Whole Thing

To specify just the method/attribute name, and not any of the modules or classes that precede it, use a squiggly (~), like this:

Use :meth:`~requests.Request.get` to make HTTP Get requests.

That link will show up in text as:

Use get to make HTTP Get requests.

I Want to Write My Own Text

This gets a little trickier, but still doable:

Use :meth:`the get() method <requests.Request.get>` to make HTTP Get requests.

That link will show up in text as:

Use the get() method to make HTTP Get requests.

I want to link to someone else's docs

In your docs/conf.py file, add 'sphinx.ext.intersphinx' to the end of the extensions list near the top of the file. Then, add the following anywhere in the file:

    # Add the "intersphinx" extension
    extensions = [
        'sphinx.ext.intersphinx',
    ]
    # Add mappings
    intersphinx_mapping = {
        'urllib3': ('http://urllib3.readthedocs.org/en/latest', None),
        'python': ('http://docs.python.org/3', None),
    }

You can then reference other projects' documentation the same way you reference your own project's, and Sphinx will magically make everything work.

I want to write the documentation inline in my source code and link to it

Great! I love this as well. Add the 'sphinx.ext.autodoc' extension, then write your documentation. There's a full guide to the inline syntax on the Sphinx website; confusingly, it is not listed on the autodoc page.

    # Add the "autodoc" extension
    extensions = [
        'sphinx.ext.autodoc',
    ]
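With autodoc enabled, the same roles work inside your docstrings. Here's a small example - the function itself is invented for illustration:

```python
def shorten(text, width):
    """Truncate *text* to at most *width* characters.

    Sphinx turns the roles below into links when autodoc picks
    this docstring up:

    :param str text: The string to truncate.
    :param int width: Maximum length of the result; compare
        :func:`textwrap.shorten` in the standard library.
    :returns: The possibly-truncated string.
    :rtype: str
    :raises ValueError: If *width* is negative.
    """
    if width < 0:
        raise ValueError("width must be non-negative")
    if len(text) <= width:
        return text
    return text[:width - 1] + "…"
```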

Hope that helps! Happy linking.


New blog post about HAProxy

Over on the Twilio Engineering Blog, I have a new post about optimizing your HAProxy configuration. I wrote this mostly because we had some confusion in our configuration about setting options, and I figured that if we were confused, others would be as well. Here's a sample:

    retries 2
    option redispatch

When I said a 30 second connect timeout meant HAProxy would try a bad connection for 30 seconds, I lied. It turns out that by default HAProxy will retry the connect attempt 3 times. So our 30 second connect timeout is actually a 120 second connect timeout, blowing through our SLA and meaning we're returning an empty response to the customer.

Read the full post to learn more about HAProxy.


Automating your IPython Notebook Setup (and getting launchctl to work)

Recently I've fallen in love with the IPython Notebook. It's the Python REPL on steroids, and I've probably just scratched the surface of what it can actually do. This will be a short post, because long posts make me feel pain when I think about blogging again. This is also really more about setting up launchctl than IPython, but hopefully that's useful too.

Starting it from the command line is kind of a pain (it tries to save .ipynb files in your current directory, and it warns you to save files before closing tabs), so I set it up to run in the background whenever my Mac boots. Here's how you can get that set up.

Create a virtualenv with IPython

First, you need to install the ipython binary, and the other packages you need to run IPython Notebook.

    # Install virtualenvwrapper, then source it
    pip install virtualenvwrapper
    source /path/to/virtualenvwrapper.sh

    # Create a virtualenv and install IPython Notebook's dependencies into it
    mkvirtualenv ipython
    pip install ipython tornado pyzmq

Starting IPython When Your Mac Boots

Open a text editor and add the following:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
    <plist version="1.0">
    <dict>
      <key>Label</key>
      <string>com.kevinburke.ipython</string>
      <key>ProgramArguments</key>
      <array>
          <string>/Users/kevin/.envs/ipython/bin/ipython</string>
          <string>notebook</string>
      </array>
      <key>RunAtLoad</key>
      <true/>
      <key>StandardOutPath</key>
      <string>/Users/kevin/var/log/ipython.log</string>
      <key>StandardErrorPath</key>
      <string>/Users/kevin/var/log/ipython.err.log</string>
      <key>ServiceDescription</key>
      <string>ipython notebook runner</string>
      <key>WorkingDirectory</key>
      <string>/Users/kevin/.ipython_notebooks</string>
    </dict>
    </plist>

You will need to replace the word kevin with your username, and adjust the file locations to match your file system. I also save my notebooks in a directory called .ipython_notebooks in my home directory; you may want to create that directory as well.

Save that in /Library/LaunchDaemons/<yourname>.ipython.plist. Then change the owner to root:

sudo chown root:wheel /Library/LaunchDaemons/<yourname>.ipython.plist

Finally load it:

sudo launchctl load -w /Library/LaunchDaemons/<yourname>.ipython.plist

If everything went ok, IPython should open in a tab. If it didn't go okay, check /var/log/system.log for errors, or one of the two logfiles specified in your plist.

Additional Steps

That's it! I've also found it really useful to run an nginx redirector locally, along with a new rule in /etc/hosts, so I can visit http://ipython and get redirected to my notebooks. But that is a topic for a different blog post.


Speeding up test runs by 81% and 13 minutes

Yesterday I sped up our unit/integration test runs from 16 minutes to 3 minutes. I thought I'd share the techniques I used during this process.

  • We had a hunch that an un-mocked network call was taking 3 seconds to time out. I patched this call throughout the test code base. It turns out this did not have a significant effect on the runtime of our tests, but it's good to mock out network calls anyway, even if they fail fast.

  • I ran a profiler on the code. Well that's not true, I just timed various parts of the code to see how long they took, using some code like this:

    import datetime
    start = datetime.datetime.now()
    some_expensive_call()
    total = (datetime.datetime.now() - start).total_seconds()
    print "some_expensive_call took {} seconds".format(total)
    

    It took about ten minutes to zero in on the fixture loader, which was doing something like this:

    def load_fixture(fixture):
        model = find_fixture_in_db(fixture['id'])
        if not model:
            create_model(**fixture)
        else:
            update_model(model, fixture)
    

    The call to find_fixture_in_db was doing a "full table scan" of our SQLite database, and taking about half of the run-time of the integration tests. Moreover in our case it was completely unnecessary, as we were deleting and re-inserting everything with every test run.

    I added a flag to the fixture loader to skip the database lookup if we were doing all inserts. This sped up observed test time by about 35%.

  • I noticed that the local test runner and the Jenkins build runner were running different numbers of tests. This was really confusing. I ended up doing some fancy stuff with the xunit xml output to figure out which extra tests were running locally. Turns out, the same test was running multiple times. The culprit was a stray line in our Makefile:

    nosetests tests/unit tests/unit/* ...
    

    The tests/unit/* glob was running all of the tests in compiled .pyc files as well! I felt dumb because I actually added that tests/unit/* glob about a month ago, thinking that nosetests wasn't actually running some of the tests in subfolders of our repository. This change cut down on the number of tests run by a factor of 2, which significantly helped the test run time.

  • The Jenkins package install process would remove and re-install the virtualenv before every test run, to ensure we got up-to-date dependencies with every run. Well that was kind of stupid, so instead we switched to running

    pip install --upgrade .
    

on our setup.py file, which should pull in the correct version of dependencies when they change (most of them are pinned, either with double-equals (==) or greater-than (>=) signs). Needless to say, skipping the full virtualenv rebuild every time saved about three to four minutes per run.

  • I noticed that pip would still uninstall and reinstall packages that were already there. This happened for two reasons. One, our Jenkins box is running an older version of pip, which doesn't have this change from pip 1.1:

    Fixed issue #49 - pip install -U no longer reinstalls the same versions of packages. Thanks iguananaut for the pull request.

    I upgraded the pip and virtualenv versions inside of our virtualenv.

    Also, one dependency in our tests/requirements.txt would install the latest version of requests, which would then be overridden in setup.py by a very specific version of requests, every single time the tests ran. I fixed this by explicitly setting the requests version in the tests/requirements.txt file.

That's it! There was nothing major that was wrong with our process, just fixing the way we did a lot of small things throughout the build. I have a couple of other ideas to speed up the tests, including loading fewer fixtures per test and/or instantiating some objects like Flask's test_client globally instead of once per test. You might not have been as dumb as we were, but you'll likely find some speedups if you check your build process as well.


Eliminating more trivial inconveniences

I really enjoyed Sam Saffron's post about eliminating trivial inconveniences in his development process. This resonated with me, as I tend to get really distracted by minor hiccups in the development process (a page reload taking >2 seconds, switching to a new tab, etc.). I took a look at my development process and found a few easy wins.

Automatically run the unit tests in the current file

Twilio's PHP test suite is really slow - we're sloppy about keeping unit tests from hitting the disk, which means the suite takes a while to run. I wrote a short Vim command that will run only the tests in the current file. This makes the test iteration loop much, much faster, and I can run the entire suite once the current file is passing. The <leader> mapping in Vim is excellent and I recommend you become familiar with it.

nnoremap <leader>n :execute "!" . "/usr/local/bin/phpunit " . bufname('%') . ' \| grep -v Configuration \| egrep -v "^$" '<CR>

bufname('%') is the file name of the current Vim buffer, and the last two commands are just grepping away output I don't care about. The result is awesome:

Unit test result in vim

Auto reloading the current tab when you change CSS

Sam has a pretty excellent MessageBus option that listens for changes to CSS files, and auto-refreshes a tab when this happens. We don't have anything that good yet but I added a vim leader command to refresh the current page in the browser. By the time I switch from Vim to Chrome (or no time, if I'm viewing them side by side), the page is reloaded.

function! ReloadChrome()
    execute 'silent !osascript ' . 
                \'-e "tell application \"Google Chrome\" " ' .
                \'-e "repeat with i from 1 to (count every window)" ' .
                \'-e "tell active tab of window i" ' . 
                \'-e "reload" ' .
                \'-e "end tell" ' .
                \'-e "end repeat" ' .
                \'-e "end tell" >/dev/null'
endfunction

nnoremap <leader>o :call ReloadChrome()<CR>:pwd<cr>

Then I just hit <leader>o and Chrome reloads the current tab. This works even if you have the "Developer Tools" open as a separate window, and focused - it reloads the open tab in every window of Chrome.

Pushing the current git branch to origin

It turns out that the majority of my git pushes are just pushing the current git branch to origin. So instead of typing git push origin <branch-name> 100 times a day I added this to my .zshrc:

    push_branch() {
        branch=$(git rev-parse --symbolic-full-name --abbrev-ref HEAD)
        git push $1 $branch
    }
    autoload push_branch
    alias gpob='push_branch origin'

I use this for git pushes almost exclusively now.

Auto reloading our API locally

The Twilio API is based on the open-source flask-restful project, running behind uWSGI. One problem we had was that changes to the application code required a full uWSGI restart, which made local development a pain. Until recently, it was pretty difficult to get new Python code running in uWSGI without a manual reload - you had to implement a file watcher yourself, and then communicate with the running process. But last year uWSGI gained the py-auto-reload feature, where uWSGI will poll for changes in your application and automatically reload itself. Enable it in your uWSGI config with

py-auto-reload = 1   # 1 second between polls

Or at the command line with uwsgi --py-auto-reload=1.

Conclusion

These changes have all made me a little bit quicker, and helped me learn more about the tools I use on a day to day basis. Hope they're useful to you as well!


Submit forms using Javascript without breaking the Internet, a short guide

Do you write forms on the Internet? Are you planning to send them to your server with Javascript? You should read this.

The One-Sentence Summary

It's okay to submit forms with Javascript. Just don't break the Internet.

What Do You Mean, Break the Internet?

Your browser is an advanced piece of software that functions in a specific way, often for very good reasons. Ignore those reasons and you will annoy your users, and user annoyance translates into lower revenue for you.

Here are some of the ways your Javascript form submit can break the Internet.

Submitting to a Different Endpoint Than the Form Action

A portion of your users are browsing the web without Javascript enabled. Some of them, like my friend Andrew, are paranoid. Others are on slow connections and want to save bandwidth. Still others are blind and browse the web with the help of screen readers.

All of these people, when they submit your form, will not hit your fancy Javascript event handler; they will submit the form using the default action and method for the form - which, if unspecified, default to a GET to the current page. Likely, this does not actually submit the form. Which leads to my favorite quote from Mark Pilgrim:

Jakob Nielsen's dog

There is an easy fix: make the form action and method default to the same endpoint that you are POSTing to with Javascript.

You are probably returning some kind of JSON object with an error or success message and then redirecting the user in Javascript. Just change your server endpoint to redirect if the request is not an AJAX request. You can detect this because popular Javascript libraries such as jQuery attach an X-Requested-With: XMLHttpRequest HTTP header to asynchronous requests.
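Here's a sketch of what that server-side branch might look like, written framework-agnostically in Python; the handler name, header dict, and URLs are all invented for illustration:

```python
def handle_form_post(headers, save_form):
    """Process a form POST so it works with and without Javascript.

    Returns a (status_code, payload) pair that the surrounding web
    framework would turn into a real HTTP response.
    """
    save_form()
    # jQuery and friends set this header on asynchronous requests.
    if headers.get("X-Requested-With") == "XMLHttpRequest":
        # Javascript caller: return JSON and let the client-side
        # code perform the redirect itself.
        return 200, {"success": True, "next": "/welcome"}
    # Plain form submission (Javascript disabled): redirect the
    # browser directly.
    return 302, "/welcome"
```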

Changing Parameter Names

Don't change the names of the submitted parameters in Javascript - just submit the same names that you had in your form. In jQuery this is easy: just call the serialize method on the form.

var form = $("#form-id");
$.post('endpoint', form.serialize(), function(response) {
    // do something with the response.
});

Attaching the Handler to a Click Action

Believe it or not, there are other ways of submitting a form besides clicking on the submit button. Screen readers, for example, don't click - they submit the form directly. There are also lots of people like me who use tab to move between form fields and press the spacebar to submit forms. This means if your form submit starts with:

$("#submit-button").click(function() {
    // Submit the form.
});

You are doing it wrong and breaking the Internet for people like me. You would not believe how many sites don't get this right. Examples in the past week: WordPress, Mint's login page, JetBrains's entire site.

The correct thing to do is attach the event handler to the form itself.

$("#form-id").submit(function() {
    // Write code to submit the form with Javascript
    return false; // Prevents the default form submission
});

This will attach the event to the form however the user submits it. Note the use of return false to avoid submitting the form.

Validation

It's harder to break the Internet with validation. To give the user a fast feedback loop, you should detect and prevent invalid input on the client side.

The annoying thing is you have to do this on both the client side and the server side, in case the user gets past the client-side checks. The good news is the browser can help with most of the easy stuff. For example, if you want to check that an email address is valid, use the "email" input type:

<input type="email" />

Then your browser won't actually submit a form that doesn't have a valid email. Similarly you can note required fields with the required HTML attribute. This makes validation on the client a little easier for most of the cases you're trying to check for.
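On the server side, a mirror of those checks might look something like this - the field names and the deliberately loose email pattern are my own, not from any particular framework:

```python
import re

# Deliberately loose: the goal is catching typos, not enforcing
# the full RFC 5322 grammar.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_signup(form):
    """Return a list of error messages; empty means the input is OK."""
    errors = []
    email = form.get("email", "").strip()
    if not email:
        errors.append("Email is required.")
    elif not EMAIL_RE.match(email):
        errors.append("That doesn't look like an email address.")
    return errors
```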

Summary

You can submit forms with Javascript, but most of the time you'll have to put in extra effort to duplicate functionality that already exists in your browser. If you're going to go down that road, please put in the extra effort.


Helping Beginners Get HTML Right

If you've ever tried to teach someone HTML, you know how hard it is to get the syntax right. It's a perfect storm of awfulness.

  • Newbies have to learn all of the syntax, in addition to the names of HTML elements. They don't have the pattern-matching skills (yet) to notice when their markup is not right, or the domain knowledge to know it's spelled "href" and not "herf".

  • The browser doesn't provide feedback when you make mistakes - it will render your mistakes in unexpected and creative ways. Miss a closing tag and watch your whole page suddenly acquire italics, or get pasted inside a textarea. Miss a quotation mark and half the content disappears. Add in layouts with CSS and the problem doubles in complexity.

  • Problems tend to compound. If you make a mistake in one place and don't fix it immediately, you can't determine whether future additions are correct.

This leads to a pretty miserable experience getting started - people should be focused on learning how to make an amazingly cool thing in their browser, but instead they get frustrated trying to figure out why the page doesn't look right.

Let's Make Things A Little Less Awful

What can we do to help? The existing tools to help people catch HTML mistakes aren't great. Syntax highlighting helps a little, but sometimes the errors look as pretty as the actual text. XML validators are okay, but tools like HTML Validator spew out red herrings as often as they do real answers. Plus, you have to do work - open the link, copy your HTML in, read the output - to use it.

We can do better. Most of the failures of the current tools are due to the complexity of HTML - which, if you are using all of the features, is Turing complete. But new users are rarely exercising the full complexity of HTML5 - they are trying to learn the principles. Furthermore, the mistakes they make follow a Pareto distribution - a few problems cause the majority of the mistakes.

Catching Mistakes Right Away

To help with these problems I've written a validator which checks for the most common error types and displays feedback immediately when the user refreshes the page - so they can instantly find and correct mistakes. It works in the browser, on the page you're working with, so you don't have to do any extra work to validate your file.

Best of all, you can drop it into your HTML file in one line:

<script type="text/javascript" src="https://raw.github.com/kevinburke/tecate/master/tecate.js"></script>

Then if there's a problem with your HTML, you'll start getting nice error messages, like this:

error message

Read more about it here, and use it in your next tutorial. I hope you like it, and I hope it helps you with debugging HTML!

It's not perfect - there are a lot of improvements to be made, both in the errors we can catch and in removing false positives. But I hope it's a start.

PS: Because the browser will edit the DOM tree to wipe the mistakes users make, I have to use raw regular expressions to check for errors. I have a feeling I will come to regret this. After all, when parsing HTML with regex, it's clear that the <center> cannot hold. I am accepting this tool will give wrong answers on some HTML documents; I am hoping that the scope of documents turned out by beginning HTML users is simple enough that the center can hold.
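To give a flavor of the approach (tecate itself is written in Javascript; this Python sketch and its two rules are simplified stand-ins, not the tool's actual checks):

```python
import re

# Two of the high-frequency beginner mistakes: misspelled
# attribute names, and a quote left unclosed inside a tag.
COMMON_TYPOS = {"herf": "href", "scr": "src"}

def check_html(source):
    """Return human-readable warnings for common HTML slips."""
    warnings = []
    for typo, correct in COMMON_TYPOS.items():
        if re.search(r"<[^>]*\b%s=" % typo, source):
            warnings.append('Did you mean "%s" instead of "%s"?'
                            % (correct, typo))
    # An odd number of double quotes inside a tag usually means a
    # quote was left unclosed.
    for match in re.finditer(r"<[^>]*>", source):
        if match.group(0).count('"') % 2 == 1:
            warnings.append("Possible missing quote in: " + match.group(0))
    return warnings
```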
