Author Archives: kevin


We Can’t Keep Ignoring the Bay’s Housing Politics

Do you work in the tech industry in the Bay Area? You should start learning about, and getting involved in, local housing politics.

The prognosis for housing and rent prices is bad, and things are likely going to get worse for tech workers in the Bay, unless we start taking action. I will explain why prices will keep going up, and what you can do to help.

Why Should You Care?

Salaries in the tech industry are really good! Rent may be high, but you make more, and can count on raises to outpace rent increases. Why should you care?

  • You may want your children to be able to afford to live where they grew up. Thirty years ago, you could buy a house in the Bay Area for 3-4x the average income. Frequently now that number is 8-10x, and that house may be out in Antioch.1

  • San Francisco has a huge number of people who commute downtown every day, which stresses our highways and tunnels. Long commutes are correlated with lower happiness, and are more harmful to the environment. If we built more housing near where people work, commutes would be shorter.

  • Lower rents mean that there's more money in your pocket.

  • You may want to buy a home, and you may not have rich parents, or have been an early employee at a rocketship startup.

  • Cheaper housing helps less fortunate people make it and get a leg up. The Bay Area has one of the tightest job markets and most dynamic local economies in the US. People not making very much in other areas can move here, get a job and earn a higher salary than they can in, say, Reno. The high starting prices for housing discourage this, which means it's more difficult for the middle class to get a foothold in the Bay Area.

  • You may want to send your children to school in the area. Schoolteachers largely can't afford to live in San Francisco, which makes it harder to recruit good teachers for your kids, in public or private schools. Your employees have this same problem.

  • If you want to start a company, you'll need to hire employees. Higher home prices and rents mean that you have to pay higher salaries and more for rent. This makes your startup less viable.

  • If you want to fund startups, high salaries and rents mean you have to have larger rounds, and that your money doesn't go as far.

How an Empty Plot of Land Becomes Housing You Can Live In

You have to jump through many hoops to build housing in San Francisco. This section is long, but it's important to know how many different opportunities there are for NIMBYs to stall a project they don't like.

  1. Buy land you want to develop on. There are many underdeveloped properties in San Francisco - parking lots, unused office spaces, or undeveloped lots owned by the City.

  2. Submit a building plan to the City that follows the zoning code. Put up signs in the neighborhood explaining what you are building. Start working on permits. This part is pretty standard across all cities.

    If your property is on the waterfront, your project needs to be approved by a majority of city residents in the next citywide election, thanks to 2014's Proposition B, which requires a citywide vote on any new waterfront development. If you want to build in SOMA or the Mission and your housing would replace a production, distribution or repair business, you need to create one elsewhere or add space in your building for it, thanks to 2016's Prop X.

  3. You have to submit an "Environmental Impact Report" (EIR) which explains the environmental impact your building will have. There are over 200 different "impacts" that can be considered - noise, traffic, crime, etc. - and all of them were given equal weight until a few years ago. These "impacts" are local to the area — you can't count "people will have to commute from Stockton if we don't build in SF" as an impact, even though it's true, and bad. Most of the time, you can reuse an existing Environmental Impact Report that has been prepared for a given neighborhood. More on this later.

  4. If a neighbor doesn't like your project, they can pay just $578 to ask for a "Discretionary Review" by the Planning Commission. This is supposed to be for extraordinary circumstances, but pretty much anyone can file for any reason. Common reasons: a project will block your view, will cast a shadow on your beer garden, won't fit with "the character of the neighborhood," or requires a "variance," some small change from the zoning code.

    You are supposed to meet with the community at this point. Your neighbors probably won't like your project. They may invoke the words "3 story monstrosity" to describe it, pass around flyers saying it will "ruin neighborhood character", or say there will be increased crime, a harder time finding parking, etc. If they can't block the project, they want you to make it shorter and smaller. But resolving the issue by reducing the number of units makes your project less viable.

    If you can't resolve the issue it goes to the Planning Commission, which has seven members, four of whom are appointed by the Mayor, three by the City Supervisors. You get ten minutes to explain why you should be allowed to build. The opponents get ten minutes to complain about their views. There is a "public comment" section where members of the public get 2 minutes to talk about the project.2

    The Planning Commission may ask you to compromise, approve the project, deny the project, or punt the decision by a month. They frequently deny projects. Let's say they approve the project. Hurray!

  5. Your neighbors may appeal the decision to the full Board of Supervisors. There are 11 supervisors, one for each district in San Francisco. The Board is currently split between people who want more housing and people who say they want more housing, but repeatedly vote for rules that make it harder to build housing.

    The most frequent appeal angle is to say that the EIR is not valid. Recently, one guy in the Mission appealed a 100% affordable, 94-unit building for senior citizens. You can read the appeal for yourself. The main reason the appellant thinks there should be a new EIR is because these poor senior citizens will cause vagrancy, crime, and littering in the neighborhood. Certainly these effects would be much worse if we didn't build the housing and these seniors were on the street.3

    You are about six to eighteen months into the permit process at this point. And the Board may vote to turn down your project! Recently a 157-unit project in the Mission, with 39 affordable units, was denied by the Board.

    Neighborhood groups may try to make a "deal" where the developer essentially buys their support. Calle 24, an anti-gentrification group in the Mission, recently negotiated a "deal" with another building where they would drop their protests in exchange for $1 million.

  6. At any time during this process your permit might run out, the bank might decide to cancel your loan, market conditions may change, or the City might vote to make your project infeasible. This is why your neighbors use so many stalling tactics - the longer they can stall, the more likely it is that you will pull out of the project.

  7. If the BoS approves your project, your neighbors have one more recourse: they can file a California Environmental Quality Act (CEQA) lawsuit. The lawsuit alleges, essentially, that a project would be harmful to the environment, and the developers haven't sufficiently considered those impacts.

    If you were considering the environment at a regional level, you would probably want to build as densely as possible, and optimize for short commutes - so you'd file lawsuits to block low-density projects that build on undeveloped land. However, a recent study found that CEQA lawsuits target infill projects (which increase density) by a 4 to 1 ratio. CEQA is well intentioned, but frequently abused. In one instance, abortion opponents filed a CEQA lawsuit to block a Planned Parenthood. They said that the Planned Parenthood hadn't adequately considered the noise that the protesters themselves would generate!

  8. Finally, if you navigate this intimidating gauntlet, and are determined to stick with the project, and your permits haven't expired, you can break ground on your project. It's extremely expensive to build here, and while the sums are large and the rent is high, developers do not make much money. One developer, Boston Properties, targets a 7% return, which is not large given the risk involved.

How does this work in places that keep prices low? States like Texas have lots of land and loose zoning codes. Other countries, like Japan, work around this by setting housing policy at the regional or national level. Essentially, they don't let NIMBYs have a say in the process.

Other Housing Errata

  • The city of San Francisco is half as dense as Brooklyn. We can achieve large decreases in rent prices by just building 4-5 stories on empty lots around the city.
  • If you live in a building built before 1979, you are entitled to rent control, which means that your landlord can only increase your rent by a few percent per year (the exact percentage is set by the SF Rent Board and tied to inflation; last year it was 2.2%). You have a wide array of rights as a tenant; you may even have tenants' rights if you have been living in a place for more than 30 days and don't have a formal contract with your landlord, or are not on the lease.

    Econ 101 classes will teach you that rent control depresses supply, but I don't think it's too relevant to the SF housing crunch. If we didn't touch a single rent-controlled unit, and just built 4-5 story buildings on the underdeveloped lots around the city, we'd be in a great place, housing-wise.

  • There are well intentioned people who believe that building new housing supply actually creates more demand, and show up to meetings opposing any new development. This theory is not borne out by the evidence; the city has built a ton of new housing since 2015, and as a result, market rate rents have fallen by about 5% since the peak of the market. Other cities like Denver and Seattle have also seen rent decreases in response to new housing coming on the market. Furthermore, if increased supply really increased demand, the opposite would also be true - reducing supply would reduce demand even further! But no one suggests that destroying housing in SF would lower prices.

  • There are well intentioned people who believe that the only new housing in SF should be 100% affordable. 100% affordable projects are not profitable or viable for private developers, so they require subsidies. There is only so much money for subsidies, and in addition the Trump administration is interested in cutting federal subsidies for affordable housing.

  • There are well intentioned people who believe that the only way to prevent gentrification is to prevent any new buildings from being built - groups like MEDA and Calle 24, who have successfully fought projects in the Mission. Blocking new supply doesn't do anything about high demand, of course. The result is existing buildings reselling for millions of dollars, as we're seeing in the Mission, while ordinary people can't find new places to live, or can't move within their neighborhood.

  • There are less well intentioned landlords and real estate agents who oppose new housing because a lack of new supply drives up demand for existing housing, which increases their property values.

  • The most frequent complaints are that a project has too many luxury units and that it's too tall. These complaints and process impediments drive up the price of building here, which means that the only viable projects are (1) targeted at the high end of the market, and/or (2) contain lots of units. Ironically, the complaints about luxury and height make it harder to build projects that aren't very tall and very high end!

  • Construction unions often oppose projects (and sometimes file CEQA lawsuits to block them) if the construction wages aren't high enough, or if the developer wants to use non-union labor.

  • You can complain to the Planning Commission that a project will cause shadows or block your view. In 99% of the country, these disputes are resolved by buying an easement, a contract that prevents your neighbor from blocking your view or casting a shadow.

    The Coase Theorem says this should be good enough; if you don't have an easement, we can conclude your neighbors value building high more than you value your view or your sunlight. Except in San Francisco! If your neighbor doesn't want to grant you an easement, you can complain to the Planning Commission and block their project.

How to Get Involved

All of this means that building new housing is really difficult in San Francisco. As a result, rents and home prices are going to continue to increase: you will pay more in rent, you won't be able to hire employees or buy a house, and your kids' teachers won't have a place to live.

The problem is that prices are too high and there are too many roadblocks to building. The political goal is to lower prices by a) building more, and b) making it easier to build more. I support pretty much every project - 100% affordable, super high end market rate - as long as it gets a shovel in the ground. Every new unit helps - even if it's not in your price range, it means people in that price range aren't outbidding you for a place in your range.

Things that don't work

  • Bitching on Twitter - This accomplishes nothing, as the last election showed. Just as the Indivisible team and others are encouraging people to show up to town halls and flood their reps' offices with phone calls, you need to do the same for your local supervisors. The good news is that your local officials pick up the phone! And they don't get as many calls, so they're more responsive to yours.

  • Building an app - We are not going to hackathon our way out of the housing crisis. We need to show up to planning meetings and whip support for bills.

  • Apathy - The default mode for tech employees is to not care about local politics and then outbid other residents for apartments. As long as this is true, homes will be unaffordable, your kids' schools will have tired teachers, and your rents will keep rising.

If you have 5 minutes a week

Calling actually makes a difference! Call your supervisor or state rep and ask them to support building more housing, and to make the approval process simpler.

  • The most pressing SF issue concerns the percentage of affordable units per new development. Two SF supervisors want to force all new developments to have 28% of units be affordable, which is unworkable for many projects - 28% of zero new projects means that zero new units get built. A competing plan would lower the percentage to 18%, which would increase the total number of affordable units, despite being a lower percentage, because it would make more projects viable. Call your supervisor and ask them to support the Breed/Safai Prop C plan.
  • The most pressing area issue is what will happen to an industrial park in Brisbane. A developer wants to build 4,300 apartments on empty land, but the city is fighting back. If you live in Brisbane, call the mayor's office and tell them you support the development, or show up to the nightly meetings.

  • In the next election, figure out where the candidates stand on housing, and vote for the pro-housing candidates. This is tricky, because everyone says they are pro-housing, but many will not help get shovels in the ground. "Preserve neighborhood character" is a red flag. "100% affordable only" is a red flag. Any reference to wind tunnels, shadows, or height is a red flag.

  • Sign up for SF Do Something. Follow SFYimby or East Bay Forward on Twitter, and call your supervisor when you hear about a helpful piece of legislation, or a stalled housing project.

  • There are several bills in the California State Legislature that help. SB 35 (sponsored by SF's Senator, Scott Wiener) would make it more difficult to stall projects if cities are not meeting their state-mandated housing goals. SF's Representative, David Chiu, sponsored a bill to tax vacation homes and use the money to support affordable housing. SB 167 would make it harder for cities to block housing projects. Call your state senator and representative and ask them to support these bills.

If you have an hour a week (especially if your work hours are flexible)

SF Yimby holds meetings once a month teaching new people how to make a difference in their community; check their Facebook for notifications about the next meeting.

Show up to your local city hearings and speak in support of projects. The Planning Commission meets at City Hall every Thursday and posts their agenda here. You can show up and give a comment.

  • Take BART or MUNI to City Hall, a block from the Civic Center station.

  • Ask the City Hall door guard where the Planning Commission meeting is.

  • There are little cards at the front of the room. Write down your name, and check the box that says "Support."

  • Sit in the meeting. When the Commissioner asks for comment, line up by the microphone. Say you are a city resident, and you support building the project. Here is a template:

    Hi, I'm [name here], a [renter|homeowner] in District [your district]. I'm speaking in support of the project. We are in the middle of a housing crisis, and the more housing we build, the more people can afford to live here. This project will help [X number of people] live in the city, close to where they work, and should go forward. Thank you for your time.

The wild thing is that the commissioners and supervisors apply ad-hoc, per-project guidelines and often vote based on how many people comment in support of or in opposition to the project. So your voice really does matter! But you have to show up.

The meetings can take a long time. City Hall has good wifi; I've sat in the back of the room and worked for hours at a time while waiting for a bill to come up.

You can also write an email to the Planning Commission in support of a particular project.

If you work in SF and commute to a megacorp in the South Bay

Menlo Park, Palo Alto, Mountain View and co love to build office space, but hate to build housing. Consider calling the city manager's office in the city you work in, and ask them to support more housing, so you can afford to live near where you work.

Your C-level managers or venture capital backers may not realize how bad the problem is for employees. Ask them to support pro-housing candidates and organizations.

Ask your company to try to build apartments in the area; Facebook is trying this.

If you have money

The tech community is slowly learning that it's not enough to airdrop cash three months before an election; we have to build organizations and coalitions, get elected to committees like the local DCCC, and show up to meetings between elections. Donate to these local pro-housing organizations:

  • CARLA sues cities that break California state law to deny proposed projects. This is a surprisingly effective technique that is already producing results throughout the Bay Area, but lawsuits are expensive.

  • Yimby Action is a pro-housing lobbying group.

  • East Bay Forward is focused on building housing in the East Bay.

  • Donate to local pro-housing candidates. Volunteer for their campaigns; knock on doors.

Conclusion

Demand for housing in the Bay Area far outstrips supply, but it's really hard to build here, and there are a lot of people and procedures that want to keep things expensive and scarce. If you want to ever afford to buy a house here, it's time to start getting involved.

1 This doesn't stop old people from showing up to planning meetings and saying "I worked hard and built a house here and these spoiled young people can too!" It's three or four times as difficult now as it was in 1970.

2 The comment quality can vary widely - at one recent public comment section, a commenter worried about what would happen to the "rare plants" on a vacant lot.

3 70% of San Francisco's homeless were housed in San Francisco before they were on the street; it's a very local problem.

Liked what you read? I am available for hire.

Why Leap Seconds are Tricky and How to Deal With Them

You may have seen software break when the leap second hit on New Year's Eve.

I'd heard a little about this problem, but I didn't understand how it broke code, and what to do about it. So here is an explainer.

Background

Earlier this year, the International Earth Rotation and Reference Systems Service decided to add an additional second to the end of the year. This is because the Earth's rotation does not perfectly line up with an 86,400-second day.

So when the clock hits 23:59:59 on New Year's Eve, it would usually tick over to 00:00:00 - but this year, it repeated the second 23:59:59. So if you had measurement code that looked like this:

a := time.Now() // Dec 31 2016 23:59:59 and 800ms
// While this operation is in progress, the leap second starts
doSomeExpensiveOperation()
duration := time.Since(a) // Taken at Dec 31 2016 23:59:59 (#2!) and 100ms
fmt.Println(duration)

We usually assume in our code that time can't go backwards, but the duration here is -700ms! This can cause big problems if you have code that expects time durations to be nonnegative. In particular, code that accepts timeouts might panic or immediately error if it gets a negative value. Cloudflare passed a negative duration to rand.Int63n, which crashed some of their servers.

But leap seconds occur at most once a year - surely one problem a year is manageable? Unfortunately, time on your servers can jump around quite a bit anyway. An individual server might drift a few seconds ahead of or behind the rest of your servers. When it gets the correct time from the NTP server, the time on the server might jump forward or backward a few seconds. If you run any number of servers, you are bound to hit this problem sooner or later.

Ok, how do I deal with it?

Good question! The Linux community implemented CLOCK_MONOTONIC as an option to clock_gettime to solve this problem. CLOCK_MONOTONIC returns an integer number of nanoseconds that only ever increases. You can use this to get more accurate (and always nonnegative) deltas than you'd get by reading the system time twice and subtracting the samples.

You'll have to check whether your programming language uses CLOCK_MONOTONIC when implementing calls to get the system time. Many languages don't use it for the simple "what time is it" call, because it returns time measured from an arbitrary starting point, not the epoch.

So you can't really translate between a CLOCK_MONOTONIC value and any given human date; it's only useful when you take multiple samples and compare them.

In particular

The Go standard library does not use CLOCK_MONOTONIC for calls to time.Now(). Go has a function that implements CLOCK_MONOTONIC, but it's not a public part of the standard library. You can access it via the monotime library - replace your calls to time.Now() with monotime.Now(). Note you will get back a uint64 that only makes sense if you call monotime.Now() again a little while later (on the same machine) and subtract the first value from the second.

import (
    "fmt"
    "time"

    "github.com/aristanetworks/goarista/monotime"
)

func main() {
    a := monotime.Now()
    someExpensiveOperation()
    b := monotime.Now()
    // b - a is never negative, even across a leap second or NTP step
    fmt.Println(time.Duration(b - a))
}

Porting code errata

I ported Logrole (a Twilio log viewer) to implement monotonic time. In the past it was really easy to call t := time.Now() and then, later, somewhere else in the code, call diff := time.Since(t) to get a Duration. This let you both use t as a wall clock time, and get an elapsed amount of time since t, at some unspecified point later in the codebase and in time. With CLOCK_MONOTONIC you have to separate these two use cases; you can get a delta or you can get a Time but you can't get both.

The patch was pretty messy and an indication of the problems you might have in your own codebase.

Another problem to watch out for is a uint64 underflow. If you have code like this:

if since := time.Since(aDeadline); since > 0 {
    return nil, errors.New("deadline exceeded")
}

You might port it to something like:

now := monotime.Now()
deadline := now + uint64(500*time.Millisecond)
someExpensiveOperation()
if monotime.Now() - deadline > 0 {
    return nil, errors.New("deadline exceeded")
}

The problem here is that you are subtracting uint64 values, and if now is before the deadline, you'll underflow and get a very large uint64 value (which is greater than 0), so you'll always execute the if body. Instead you need to avoid the subtraction entirely and use

if monotime.Now() > deadline {

instead.

Conclusion

It is hard to get this right, but it's probably a good practice to separate wall-clock time values from time values used for deltas and timeouts. If you don't do this from the start, though, you are going to run into problems. It would be nice if this were better supported in your language's standard library; ideally you'd be forced to figure out which use case you need a time value for, and use the right type for it.


Ethical Considerations for Software Engineers

The next president of the United States showed a willingness to violate historical norms while campaigning, and there's little evidence that he has any moral compass - the examples are legion; one of the worst is his cutting off medical treatment for his sick nephew over a legal dispute. His kids are going to run his businesses (with his name on them) while he is in office. He has also asked for security clearances for them. This is at best an unusual arrangement, and at worst opens the door to massive corruption.

During the election the Russian government hacked and leaked the DNC's emails, then hacked and leaked the email of Hillary Clinton's campaign chief. Trump denied Russia's involvement publicly at a debate even though he'd been briefed on it. Trump has taken many sides on many issues, but his praise for Putin and Russia has been consistent. Trump just made a paid Russia Today commentator his National Security Adviser. It is likely that Russian (and Chinese, Iranian, etc.) hacking of US government offices and US companies will be tolerated over the next four years, especially if it benefits Trump and hurts his political opponents.

It's important to note these attacks won't come out of the blue. It's not sunny one day and the next there are men in suits asking for data center access. There will probably be some pretext - a foreign war, a terror attack, something else - that'll be used to justify the unethical request. It's easy to imagine saying "Of course I would refuse!" in the abstract, and much harder in the moment, or in a climate of fear.

Note also that if you are an engineer, these requests may come outside of normal channels. Last year, Yahoo fielded a request to search all emails for a given term. Yahoo's C-level executives went around the security team and asked engineers to implement this directly, at an extremely low level. Alex Stamos, Yahoo's CSO, resigned when he found out. You should be prepared to do the same. Don't expect unethical requests to show up on the backlog - it'll be a meeting you're pulled into with the CTO, or a man showing up at your apartment and threatening your immigration status unless you insert a backdoor.

Employees (and especially engineers) will be the key people to push back. Customers aren't always aware of shenanigans, and management can be under more pressure to make their company succeed. Especially in Silicon Valley, most employees have multiple job options, which gives us unique leverage. Every employee at a Silicon Valley company should be prepared for unethical or illegal requests, and (where appropriate) be prepared for state sponsored attacks, from the US government or another one. Every employee should be prepared to put pressure on management, and the legal team, to deny requests.

Here are some examples of ethical problems you might run into. I'd encourage you to have these discussions internally before you get put in the situation discussed below, and lay out bright lines for everyone in the company to follow, to make it clear where you stand and what's not acceptable. I would also encourage you to ask about these when you interview.

All

The pledge at neveragain.tech has covered this in more detail but here are some good questions to ask in an interview:

  • Do you encrypt messages that go from datacenter to datacenter? The NSA has spied on this data in the past.

  • Do you offer end-to-end encryption of messages sent between users?

  • Do you destroy sensitive data if it's not needed anymore? Do you destroy user data if they delete their accounts?

  • What is your policy to responding to requests from the US government and other governments?

  • Do you have data that would be valuable to foreign governments, or embarrassing to customers if it was made public? What's your strategy for protecting that data against sophisticated nation states?

  • Would you take money from the Trump Organization or its affiliates in exchange for an explicit or implicit guarantee of "protection"?

Venture Capitalists / CEO's

  • Donald Trump's children or their representatives may ask for a share in your fund, in exchange for favorable treatment from the federal government. Would you accept such a request? Note they may ask after they have successfully applied this approach to other companies.

  • You may be approached for an investment by a company or entity that has ties to the Russian government, or ties to the Trump Organization. This may be accompanied by a threat of harassment from the federal government, hacking, DDOS, or other. Would you accept the investment?

Slack

  • By default you store a company's entire conversation history, including DMs. Private information like this is easy to distort and take out of context. Russians hacked the DNC and trickled emails to the press, with devastating effects. Should the default behavior for a Slack installation be to store a company's entire history?

  • What efforts are you making to educate users about the risks of storing their entire conversation history on Slack? What are the highest-value targets for hackers who'd like to compromise the Slack network?

  • What progress have you made on end-to-end encryption for Slack messages?

  • Is there a way to store the data where a compromise would not allow a hacker to access every message for every company in your system? Say you had three different datastore designs.

Uber/Lyft

  • Your companies store a massive amount of data on where users have been and where they are going. If exposed, this data could be used to embarrass people - why is this married Congressman requesting a ride from outside a gay bar, or a hotel in the middle of the day?

  • What options do users have for removing their trip history from your site?

  • What employees can access user data, and under what circumstances? What tools do you have for anonymizing data that's not viewed in aggregate?

  • Many Trump voters cited a feeling of being left behind as a reason to vote for him. Uber drivers are 1099 contractors, which means you are prohibited from providing them with training. What responsibility do corporations have to put their workers on an upwards career path?

  • Many of your 1099 contractors get health care from the government, or on government-mandated exchanges. These exchanges are being threatened by Republican governors in many states, and Republicans in Congress. What responsibility does Uber have to work for healthcare for its drivers?

  • Your legal page says "We generally require a valid request issued in accordance with applicable law before we can process private requests for information." What does "generally" mean in this context? If China passes a law that says "we can ask for everything," would Uber comply?

  • You've taken money from Saudi Arabia's public investment arm. Would you say no to that money if the Saudi Arabian government asked for data on customers as a condition of the deal?

Stripe/Braintree

  • You collected millions of dollars in revenue from the Trump campaign in 2016. If Trump acts like an authoritarian in office, or severely restricts the rights of minorities or immigrants, will you process payments for his campaign again in 2020?

  • Does Stripe receive requests from law enforcement? What is your policy for responding to subpoenas?

  • If Stripe processes a credit card payment, who can see the record of that transaction? Who should be able to see it, and/or remove it?

Twilio

  • Do you encrypt messages passing from datacenter to datacenter?

Facebook

  • Historically newspapers and other media organizations have had a strong understanding of their role in promoting democracy and enforcing accountability from the government and our business leaders. Facebook has become a very important part of how people figure out what's going on in the world around them. What responsibility does Facebook have to ensure people have a mostly-correct view of the world? Should Facebook have a role in promoting democracy and in rejecting authoritarianism?

  • Facebook tells advertisers that their ads can change users' minds. But Facebook also insists that the algorithms it uses to show information didn't sway the US election (or overseas elections). Which is it?

  • Has Facebook responded to queries from governments on the lines of "Muslims/blacks/immigrants living in state/city/county X"?

  • Facebook's current policy is to censor/restrict content according to local laws. If a law was passed to restrict speech in the United States, would Facebook comply?

  • Does Facebook encrypt data being sent from datacenter to datacenter?

Twitter

  • What line would Donald Trump have to cross for you to suspend or ban his account?

In sum

You are the most likely agent of change at your company. A lot may happen in the next four years, and it's good to think about, and declare now, while things are relatively sane, what you will and won't agree to do, because in the aftermath of another 9/11 or similar event, you may be asked to do a lot.

I've laid out my own consulting ethics guide here.

Liked what you read? I am available for hire.

Tradeoffs in Software Provisioning Tools

A while ago my friend Alan and I were discussing configuration management. In particular we wondered why every configuration management tool has to ship a DSL, or be loaded from YAML files.

We wondered if it would be possible to just write code that deploys servers — it might let you describe what you want to do much more precisely.

I started working on a library that lets you do this. Basically, turn every module + state combination in Ansible into a function, add any required arguments as part of the function signature, and add an Opts struct for any optional arguments. Right now it looks something like this.

if err := core.AddGroup(ctx, host, "wheel", core.GroupOpts{
    System: false,
    Gid: "1001",
}); err != nil {
    log.Fatal(err)
}

But starting to implement this led to several more non-obvious tradeoffs.

Abstraction

This is the most obvious reason to use a configuration management tool. Whether you are deploying OpenBSD or Ubuntu or Darwin, you still need to create users and create folders and install packages. A good provisioning tool will abstract these for you, and choose sensible defaults.

However abstractions can be leaky; maybe one filesystem offers a feature that others don't, and it can be hard to make this available while also saying "this is only supported on these systems."

Run Commands On Local Machine vs. Remote Machine

Do you want to run commands on the machine that triggered the provisioning process, or the machine being provisioned? Take mysql for example. If you have a mysql client on the local machine, you can issue commands to the remote machine using the mysql protocol on the wire.

This requires the remote machine to expose port 3306, which you might not want to do. You also need to trust mysql's ability to encrypt a connection and trust a remote certificate. Compared with SSH, mysql has had much less auditing of its security code, and is not as good a bet for encrypting/safely compressing content going over the wire. (This becomes more salient when you have N protocols for issuing commands over the wire, instead of just SSH.)

Another option would be to SSH to the remote machine, then run a mysql client on that machine to configure/provision MySQL. But this requires that you have a MySQL client on the remote machine. It's also considerably trickier to issue multiple MySQL commands over a single SSH session, and take action based on the results.

Run Multiple Commands Per SSH Connection

A single task in Ansible for "create this recursive directory" embeds a ton of complexity. If the directory exists, but has the wrong permissions, or the wrong owner, the directories are recursively created and chowned/chmodded. You can do this with SSH commands, e.g. ssh host 'mkdir foo && chmod 755 foo && chown -R root:wheel foo' (note the quotes, so the whole && chain runs on the remote machine), but it gets more and more complicated, and tougher to determine which command failed, the more commands you layer on.

You can work around this by issuing each command as part of a single SSH connection, then getting the result, and making some decision based on it. But this vastly increases the latency of what you're trying to do; even with pipelining enabled, you're looking at about one second of latency per operation.

Ansible works around this by copying a Python file to the remote machine, then running that file on the remote machine with several arguments. This occurs for each directive that Ansible runs. This has two implications: each machine needs to have Python on it, and Ansible is really slow - think one second per directive you put in Ansible.

With Go, we could copy a binary to the remote host and then run it. This would let us take advantage of Go's standard libraries (instead of issuing Unix commands directly via SSH). We could either compile+SCP this binary on the fly, or ship precompiled binaries for each architecture as part of the distribution.

But if we are going to go to that length, why not just add tools to compile the user's entire program, SCP that to the remote filesystem, and run it there?

Run Multiple Directives Per SSH Connection

The only way you are going to get really fast execution is by executing multiple directives/tasks/modules as part of a single SSH connection to the host. The way to achieve the most benefits would be to compile the user's entire configuration program, SCP the binary to the host, then run the binary on the host.

But this requires giving up some flexibility as well. Some Ansible tasks involve copying data from the remote machine to a local machine - for example, mysql_db in target mode. You can do this over the SSH connection, but it might be tricky to separate output that's part of control flow - e.g. "RUN: add group wheel" - from output that's supposed to be copied to the local machine — e.g. a mysql dump. Alternatively, if you need to copy a file from the local machine to the remote machine, you need to bundle that file as part of the target you SCP to the remote machine.

For Go specifically, you'd either need the Go toolchain on the remote machine, plus a copy of all of the source files, or you'd need to cross compile the source on the local machine, which means things like user.Current() won't work.

Conclusion

There are a few thorny problems that weren't immediately apparent when I started working on this. For the moment I'm going to try to proceed with the Go solution, porting over an existing set of Ansible directives, and I'm going to try to prioritize speed of remote execution.

But I'm much less confident it's going to work well without a lot of effort.


An API Client that’s Faster than the API

For the past few weeks I've been working on Logrole, a Twilio log viewer. If you have to browse through your Twilio logs, I think this is the way that you should do it. We were able to do some things around performance and resource conservation that have been difficult to accomplish with today's popular web server technologies.

Picture of Logrole

Fast List Responses

Twilio's API frequently takes over 1 second to return a page of calls or messages via the API. But Logrole usually returns results in under 100ms. How? Every 30 seconds we fetch the most recent page of Calls/Messages/Conferences and store them in a cache. When we download a page of resources, we get the URL to the next page - Twilio's next_page_uri — immediately, but a user might not browse to the next page for another few seconds. We don't have to wait for you to hit Next to get the results - on the server side, we fetch/cache the next page immediately, so it's ready when you finally hit the Next button, and it feels really snappy.

The cache is a simple LRU cache. We run Go structs through encoding/gob and then gzip before storing them, which takes the size of a cache value from 42KB to about 3KB. At this size, about 8,300 values can fit in 25MB of memory.

var buf bytes.Buffer
writer := gzip.NewWriter(&buf)
enc := gob.NewEncoder(writer)
if err := enc.Encode(data); err != nil {
	panic(err)
}
if err := writer.Close(); err != nil {
	panic(err)
}
c.mu.Lock()
defer c.mu.Unlock()
c.c.Add(key, buf.Bytes())
c.Debug("stored data in cache", "key", key, "size", buf.Len(),
	"cache_size", c.c.Len())
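Reading a value back out of the cache reverses those two steps: gunzip, then gob-decode. Here's a minimal round-trip sketch, using a simplified stand-in struct rather than Logrole's actual types:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/gob"
	"fmt"
)

// page is an illustrative stand-in for a cached API resource.
type page struct {
	URI  string
	Body string
}

// encode runs a value through gob, then gzip, mirroring the write path.
func encode(v interface{}) ([]byte, error) {
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf)
	if err := gob.NewEncoder(w).Encode(v); err != nil {
		return nil, err
	}
	// Close flushes the gzip trailer; without it the data is truncated.
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decode reverses encode: gunzip first, then gob-decode into out.
func decode(data []byte, out interface{}) error {
	r, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return err
	}
	defer r.Close()
	return gob.NewDecoder(r).Decode(out)
}

func main() {
	in := page{URI: "/2010-04-01/Calls", Body: "..."}
	data, err := encode(in)
	if err != nil {
		panic(err)
	}
	var out page
	if err := decode(data, &out); err != nil {
		panic(err)
	}
	fmt.Println(out == in) // prints true
}
```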

Right now one machine is more than enough to serve the website, but if we ever needed multiple machines, we could use a tool like groupcache to share/lookup cached values across multiple different machines.

Combining Shared Queries

The logic in the previous paragraphs leads to a race. If the user requests the Next page before we've finished retrieving/saving the value in the cache, we'll end up making two requests for the exact same data. This isn't too bad in our case, but it doubles the load on Twilio, and means the second request could have gotten the results sooner by reusing the response from the first request.

The singleflight package is useful for ensuring only one request ever gets made at a time. With singleflight, if a request is already in progress with a given key, a caller will get the return value from the first request. We use the next page URI as the key.

var g singleflight.Group
g.Do(page.NextPageURI, func() (interface{}, error) {
    // 1) return value from cache, if we've stored it
    // 2) else, retrieve the resource from the API
    // 3) store response in the cache
    // 4) return response
})

This technique is also useful for avoiding thundering herd problems.

Canceling Unnecessary Queries

You've configured a 30 second request timeout in nginx, and a query is taking too long, exceeding that timeout. nginx returns a 504 Gateway Timeout and moves on, but your server is still happily processing the request, even though no one is listening. It's a hard problem to solve because it's much easier for a thread to just give up than to give up and tell everyone downstream of you that they can give up too. A lot of our tools and libraries don't have the ability to do that kind of out of band signaling to a downstream process.

Go's context.Context is designed as an antidote for this. We set a timeout in a request handler very early on in the request lifecycle:

ctx, cancel := context.WithTimeout(req.Context(), timeout)
defer cancel()
req = req.WithContext(ctx)
h.ServeHTTP(w, req)

We pass this context to any method call that does I/O - a database query, an API client request (in our case), or an exec.Command. If the timeout is exceeded, we'll get a message on a channel at ctx.Done(), and can immediately stop work, no matter where we are. Stopping work when a context is canceled is a built-in property of http.Request and os/exec in Go 1.7, and will be in database/sql starting with Go 1.8.

This is so nice - as a comparison, one of the most popular npm libraries for "stop after a timeout" is the connect-timeout library, which lets you execute a callback after a particular amount of time, but does nothing to cancel any in-progress work. No popular ORMs for Node support canceling database queries.

It can be really tricky to enforce an absolute deadline on an HTTP request. In most languages you compute a timeout as a duration, but this timeout might reset to 0 every time a byte is received on the socket, making it difficult to enforce that the request doesn't exceed some wall-clock amount of time. Normally you have to do this by starting a second thread that sleeps for a wall-clock amount of time, then checks whether the HTTP request is still in progress and kills the HTTP request thread if so. This second thread also has to clean up and close any open file descriptors.

Starting threads and finding/closing FDs may not be easy in your language, but Contexts make it easy to set a deadline for sending/receiving data and to communicate that deadline to a lot of different callers. Then the HTTP request code can clean up the same way it would for any canceled request.

Metrics

I've been obsessed with performance for a long time and one of the first things I like to do in a new codebase is start printing out timestamps. How long did tests take? How long did it take to start the HTTP server? It's impossible to optimize something if you're not measuring it and it's hard to make people aware of a problem if you're not printing durations for common tasks.

Logrole prints three numbers in the footer of every response: the server response time, the template render time, and the Twilio API request time. You can use these numbers to get a picture of where the server was spending its time, and whether template rendering is taking too long. I use custom template functions to implement this - we store the request's start time in its Context, and then print time elapsed on screen. Obviously this is not a perfect measure since we can't see the time after the template footer is rendered - mainly the ResponseWriter.Write call. But it's close enough.

Page Footer

HTML5

Logrole loads one CSS file and one font. I would have had to use a lot more Javascript a few years ago, but HTML5 has some really nice features that eliminate the need for Javascript. For example, HTML5 has a built-in date picker that people can use to filter calls/messages by date (it's best supported in Chrome at the moment). Similarly, you don't need Javascript to play recordings anymore; HTML5 has an <audio> element that will provide controls for you.

I've needed Javascript in only three places so far:

  • a "click to show images" toggle button where the CSS necessary to implement it would have been too convoluted
  • a "click-to-copy" button
  • To submit a user's timezone change when they change it in the menu bar (instead of having a separate "Submit" button).

About 50 lines in total, implemented in the HTML below the elements where it's needed.

Conclusion

Combining these techniques, we get a server that uses little memory, doesn't waste any time doing unnecessary work, and responds and renders a slow data set extraordinarily quickly. Before starting a new project, evaluate the feature set of the language/frameworks you plan to use - whether the ORM/API clients you are planning to use support fast cancelation, whether you can set wall-clock timeouts and propagate them easily through your stack, and whether your language makes it easy to combine duplicate requests.

If you are a Twilio customer I hope you will give Logrole a try - I think you will like it a lot.

Thanks to Bradley Falzon, Kyle Conroy and Alan Shreve for reading drafts of this post. Thanks to Brad Fitzpatrick for designing and implementing most of the code mentioned here.


Election Guide (Part 2) – CA Ballot Propositions, State Senate, more

This is Part 2 of my voter guide. Part 1 covers the 24 San Francisco ballot propositions and city supervisor races.

The deadline to register to vote in California is October 24. I highly recommend you sign up. Click here to register to vote.

A few notes I cover in more detail in Part 1: More housing is the most important issue for me on this year's ballot, and by default I vote "no" on ballot propositions, since I think we shouldn't be deciding policy by statewide or citywide ballot.

California State Initiatives

Prop 51 (School Bonds): Yes

The real story here is that Proposition 13, passed decades ago, limits the state's ability to collect property taxes, enriching a generation of homeowners at everyone else's expense. This is why our schools constantly need more money.

I also wish the Legislature were able to figure out its budget and set priorities so we didn't have to vote on things like this. I don't feel too strongly in either direction.

Prop 52 (Medi-cal): No

Hospitals pay a required fee to the CA State government (about $5 billion a year). When the State allocates this money for Medi-cal, the federal government provides about $4 billion in matching funds.

In the past the State has diverted some of the hospital fee money to the general fund, which hurts twice over - not only does Medi-cal miss out on the fee money, it also misses out on the federal matching funds.

This measure would require the hospital fee money to be spent on Medi-cal, which seems reasonable.

I'm upset that we have to vote on this; I would rather the legislature do the right thing. I'm also upset that this amends the state Constitution; I don't think the Constitution should get into the specifics of how things should be funded. I also think we should be trying to loosen the hands of our legislators, not restrict them further, and that they're as aware of the cost of giving up matching funds as voters are.

Prop 53 (Voter Approval for Megaprojects): No

I'm really torn on this. On the one hand, you are putting voters in charge of deciding even more things about what the government does. On the other, megaprojects frequently fail and the majority come in at least 50% over budget (high speed rail is only the most prominent example of this). Politicians also like to build big things so they can have a "legacy" and the history of big things lately has been really mixed - see high speed rail and also the Bay Bridge which has required frequent fixes and may be cracking.

Our politicians might not make great decisions with our money but I think voters would make worse decisions. Note voters approved the first $9 billion of a high speed rail project whose final cost may be upwards of $60 billion, when no real funding source for the other $51 billion was in sight. This would also increase uncertainty and delay the start of any project until the next statewide vote.

Prop 54 (72 Hr Bill Freeze): Yes

I am unhappy that this measure amends the Constitution. But apparently there are numerous instances of state legislators shoehorning in special-interest-friendly language at the last second:

There was that budget measure that limited the amount of reserves local school districts could maintain as a cushion against lean times (a gift to the teachers union, which wanted to make those dollars available for immediate spending); the 2009 waiver of environmental rules for a downtown Los Angeles football stadium (on the argument that time was of the essence to secure an NFL team ... the project never broke ground); or the 2011 bill that Democrats rushed through to force all voter initiatives on the November ballot, thus breaking a deal with Republicans to put spending reform on the June 2012 ballot.

Prop 55 (Extending Income Taxes on High Earners): No

The share of tax each resident pays is something that the Legislature should resolve. I also agree with the Chronicle that this measure will increase the variability of revenue in the state budget, which isn't great.

Prop 56 ($2 Cigarette Tax): Yes

In general taxes are a good way to discourage behavior you don't want. Cigarettes are unhealthy and incur significant spillover costs due to secondhand smoke, and the additional burden on the healthcare system from insuring/treating patients with cancer and emphysema.

I would have preferred for the Legislature to vote for this tax as well.

Prop 57 (Parole): No

Many people are serving sentences that are too long and the prisons are overcrowded. But the language is confusing and I don't see why the Legislature can't pass legislation to deal with this issue.

Prop 58 (Local Language Education Flexibility): Yes

Apparently this is on the ballot because it repeals a previous voter-passed initiative from 1998. The worry is that voting Yes will allow students to graduate without mastering English at all, which isn't good. But it seems like all of parents, students and schools want students to learn English, they just don't agree that "all English classes, all the time" is the best way to do it.

Prop 59 (Citizens United): No

I'm voting No because this is a waste of energy and we shouldn't be voting on things like this, not based on any opinion about Citizens United.

Prop 60 (Porn Stars Wear Condoms): No

The practical effect of this bill would be to shift the porn industry in California to Nevada or another nearby state. The porn industry also requires performers to get tested every two weeks. There are problems that probably deserve more scrutiny - the exploitation of performers in some scenarios - but it's not clear that this initiative is the vehicle or the method to fix them.

Prop 61 (Drug Prices): No

I agree with the Chronicle that the right solution here is to make drug prices (and the rates each agency pays) public, instead of ensuring that the prices Medi-cal and the VA pay are the same. I also think there are legitimate concerns about reduced access to necessary drugs and the ability of the Legislature to override this initiative if there are unforeseen problems.

Prop 62 (Repeal Death Penalty): Yes

Prop 66 (Quicken Death Penalty): No

Leaving aside whether it is ethical to put someone to death for crimes they have committed, I am against the death penalty for the following, more practical reasons:

  • It's entirely possible we have put an innocent person to death, a monstrous miscarriage of justice that should never be allowed to happen.

  • It's argued that the death penalty deters people from violent crimes. But there's a lot of evidence that deterrence depends much more on the severity and the certainty of punishment. Death, if it comes at all for death row inmates, is applied years or decades after the fact.

  • There are legitimate concerns about whether execution can be done "humanely" and a number of states have had problems sourcing the drugs used to put people to death.

  • It's expensive to execute someone, both in pure cost and in the cost of the appeals process - a death sentence must be appealed to the Supreme Court.

Repeal would also save California a significant amount of money.

Prop 63 (Ammunition): No

The biggest effect of a Yes vote would be to add additional charges for people who would like to buy ammunition. I don't think we need to vote on this.

Prop 64 (Marijuana Legalization): No Position

In general I'd prefer for drugs to be legalized and heavily regulated + taxed, instead of illegal, especially when you consider the potential revenue. I also think criminal sentences for possessing or distributing marijuana should be smaller than they are (the initiative provides for this). However, I'm concerned that marijuana is only as expensive as it is because it is illegal. Marijuana is not an expensive crop, and if it becomes legal to grow the price per ounce could go really low. I'm worried the flat taxes per ounce are too low, and the 15% sales tax should be a flat tax or a guaranteed minimum price per ounce.

The results on public health so far are mixed; one study reports a 7% increase in traffic fatalities for every 1% increase in marijuana consumption. The penalties for drunk drivers are not currently high enough and I'm worried we don't know how to measure whether a driver is high.

On top of this I am worried that the Legislature won't have the flexibility to override a state initiative; any amendments require a 2/3 vote.

Prop 65 (Money from Paper Bags to Environment): No

This directs revenue from grocery bag fees to specific environmental causes. I don't think we should put additional constraints on where the Legislature should direct money, and I don't think we should pass things by state initiative.

Prop 66: No (see #62 above)

Prop 67 (Affirm Plastic Bag Ban): Yes

Proposition 67 is a referendum on the existing bag law (10 cents a bag); a "Yes" vote says "Yes, please keep the law the way it is." I prefer the Legislature to write laws, not California voters, so I am voting Yes.

Superior Court

Victor Hwang, who has experience working as a public defender.

Board of Education

Stevon Cook, Matt Haney, Rachel Norton, Jill Wynns.

Community College Board

Rafael Mandelman, Amy Bacharach, Alex Randolph, Shanell Williams.

Bart Director

Gwyneth Borden, who has been endorsed by the Chronicle and is open to a ban on BART strikes.

California State Senate: Scott Wiener

This is one of the most important races on the ballot due to the difference in quality between the candidates. Wiener is running against Jane Kim, who has opposed numerous housing projects, and is sponsoring some of the poorer propositions on the city ballot. Scott Wiener understands how to build more housing in San Francisco.

Kim also recently sponsored "legacy status" for Luxor Cab Company, which gives them a permanent subsidy from the City of San Francisco. This is a terrific waste of money compounded by the fact the benefit won't do anything for the company's cab drivers, only its 20 or so full time employees. Vote for Scott Wiener.

California State Representative: David Chiu

Chiu is running against Matthew Del Carlo, who does not appear to have policy positions listed anywhere publicly; it's not clear what he would run for, or do in office.

Chiu slammed Governor Brown for including $0 in affordable housing in this year's budget. The housing measures were tied to the Governor's "by right" housing legislation, which would have done more to lower rent/housing prices in San Francisco than any other legislative measure in a decade. It's not clear whether Chiu supported or opposed this measure.

I reached out to Del Carlo multiple times asking him to post his policy positions publicly, and he has refused to do so.

United States Senate: Kamala Harris

Harris is running for Barbara Boxer's old seat. We really need a California Senator who understands the technology industry and is willing to fight for it; who understands you can't just make a "golden key" to read messages that only the US government can access, as in Dianne Feinstein's horrible encryption bill.

President

Hillary Clinton.


San Francisco Voting Guide – Propositions and Supervisors

I think this is useful and the ballot's complicated so I wanted to share how I'm voting this year. I used several sources to compile this guide:

  • The SF Chronicle's endorsements - they follow these issues every day.

  • The ballot book mailed to every voter, especially the text of the law and the main pro/con arguments.

I highly recommend voting by mail. You can feel too rushed or disorganized in the ballot booth, especially in this election, when there are so many things to vote on.

San Francisco Ballot Initiatives

The #1 issue for me in this election is housing. People make a fundamental mistake when analyzing the SF housing market: they see lots of increased demand (maybe 10%) and little increased supply (maybe 1%), and conclude "We're building housing but prices are still rising; the new housing must be causing the price increases." In reality, if demand is outpacing supply you'd expect to see both prices and supply rise - the new housing stock is preventing prices from rising even faster than they already are!

I also see a lot of hypocrisy. SF is full of liberals, and social mobility is a traditional liberal plank. In one of the hottest economies in the country, high rents are preventing poor people from moving here and establishing a foothold. Lowering the price of housing in our fastest growing economies is a moral imperative.

San Francisco added 5000 new units this year, and SF condos are 8% cheaper this year than last year. The market rate of rent also slowed from its normal double-digit increase. We need to build on this progress.

I want there to be more housing in San Francisco, of all shapes and sizes. In this election, anything that makes housing more complicated to build is a No; anything that makes housing easier is a Yes. Affordable housing is admirable but isn't a full answer, and gets more expensive as market rent rises. The easiest path to more affordable housing is to lower market rent.

I'll say two other things; in general I am opposed to deciding things by ballot initiative that could be resolved by the Board of Supervisors or the State Senate, since election votes tend to tie the hands of our elected officials, and can require supermajorities to unwind. So all other things being equal I am more likely to vote No on any given ballot measure.

I am also generally opposed to measures that set aside percentages of the budget, or specific dollar amounts, for any cause, no matter how noble. They reduce the flexibility of our elected officials to balance a budget, which is why we elect them in the first place. The percentage of the city budget each interest group would like to reserve for itself would well exceed 100%.

Measure A (School Bond): Yes

Measure B ($99 Parcel Tax for City College): Yes

Measure C (Repurpose Earthquake Bonds for Housing): Yes

The City is sitting on $261 million in unspent earthquake safety bonds and would like to redirect it to housing. This will increase the supply of housing.

Measure D (Short Term Appointment Rules): No

Some replacement public officials are named by the mayor to replace someone else who left their term. This measure would prevent them from running for a full term. I see no reason why appointees should not be allowed to run for a full term. The SF Chronicle opposes this measure.

Measure E (Trees Fund): No

$19 million per year for trees. In the words of the SF Chronicle, "San Francisco is running a near $10 billion budget. The civic bill for tree care is pegged at $20 million. There should be room for this expense without carving out a program that can't be changed."

Measure F (Youth Vote): No

This would let 16 and 17 year olds vote in local elections.

Measure G (Police Oversight): Yes

This would grant additional powers to a citizen review board. I think police organizations have trouble regulating themselves and this is a good step in the right direction.

Measure H (Public Advocate): No

This creates a new elected position with no power to do anything. "It's posturing minus responsibility, a dream job in the political world," according to the Chronicle.

Measure I (Senior Citizen Fund): No

This measure would set aside $38 million a year for programs for senior citizens and adults with disabilities. I support programs for senior citizens, but would rather our elected officials make decisions about the budget, instead of voters.

Measure J (Homeless Housing and Services): No Position

Measure K (Sales Tax Increase): Yes

In general I'd like to see more parcel tax increases and fewer sales tax increases, since the former hit property owners, who have been granted great gifts by Prop 13. But parcel taxes are politically unpalatable, which is how we end up with sales taxes instead.

Measure L (Muni Board): No

This would let the Board of Supervisors appoint three of the seven members of the Muni Agency. I don't see why the mayor shouldn't appoint members of the Muni Agency.

Measure M (New Housing Committee): No

This would add another layer of approval in the housing approval process, which would make it more difficult to add housing. I am against measures that would make it more difficult to add housing.

Measure N (Noncitizen Resident Voting): No

I'm sympathetic but this would likely be subject to a legal challenge.

Measure O (Office Exemptions): Yes

The city limits new office construction to 950,000 square feet a year. This is a silly rule, which makes it hard for startups, among others, to rent in San Francisco. This measure would exempt the Candlestick Point development from that square footage rule.

I would like to see similar rules applied to speed housing growth, but there you go.

Measure P (Competitive Bidding for Affordable Housing): No

This makes it more difficult to build housing by discouraging projects that can't get at least three bids. From the Chronicle:

Prop. P obliges the city to seek three bids when offering city land to affordable housing builders. But City Hall already beats the bushes for multiple contenders. By one count, the last 10 projects had at least two bidders. Locking in a three-bid minimum could kill projects which don’t attract that threshold number of entrants. The measure has the potential to stop promising deals, the last thing San Francisco needs.

Measure Q (Prohibit Tent Placement): No

This wouldn't have much practical effect, and won't really help much to address the shortage of beds.

Measure R (Neighborhood Crime Unit): No

This would allocate 3% of the police force for neighborhood crime. Even if this is an issue that could be addressed by this allocation, I don't think the right answer is for the voters to make allocation decisions for the police department.

Measure S (Hotel Money Allocation): No

This would allocate the 8% hotel tax for the arts and for the homeless. In general I'm against allocating revenue for specific purposes; this isn't an exception. I doubt this will matter; the Chronicle has no position and there are no arguments against the measure in the ballot book.

Measure T (Lobbying Rules): Yes

This would add tighter restrictions on what lobbyists are allowed to do and spend to influence votes.

Measure U (Median Income): No

This would help middle income families qualify for affordable housing at the expense of lower income families. Per the Chronicle, "The guidelines for competitive bidding and income qualifications are better left to a process of legislative hearings, study and political compromise that balances the competing goals and concerns. These are not issues to be settled at the ballot box."

The solution here is more housing of all stripes, and hopefully market rate housing that is affordable to middle income families. This wouldn't help.

Measure V (Soda Tax): Yes

Charging a higher price for something is a good way to discourage people from buying it. This strategy has been used very successfully with cigarettes, which harm others via secondhand smoke: raising the price makes smoking an expensive habit. The goal of this measure is to make sugary drinks more expensive, and non-sugary drinks cheaper by comparison. The fact that it raises money for the City is an ancillary benefit.

I'm also dismayed by the efforts of the measure's opponents, who have sent volumes of mail and mislead voters by calling this a "grocery tax." It's a 1 cent per ounce tax on sugary drinks.

Measure W (Higher City Transfer Taxes): No Opinion

The arguments for this measure all seem to say "this will help make City College free", which is very odd since it seems the tax money will go into the General Fund.

The arguments against point out that this also applies to rent controlled buildings and large buildings.

Measure X (Arts Use in New Buildings): No

This would add restrictions if you want to build housing in an area that was formerly used for the arts or certain types of small businesses. We shouldn't be voting on this, and it makes it more difficult to build housing, maybe more so than any other measure on the ballot.

San Francisco Board of Supervisors

District 1: Marjan Philhour

Marjan wants to build more housing of all shapes and sizes to address the area's housing crisis. She's also been endorsed by the Chronicle.

District 3: No Recommendation

Aaron Peskin is the incumbent who is going to win going away. Peskin has held up new housing on several occasions. He's also supported symbolic efforts to oppose Governor Brown's by right legislation, which would have done more for housing growth than any other proposal in a long time. Peskin also believes that you should only be allowed to exceed existing density limits if you build 100% affordable housing, which is a great way to grandstand for affordable housing while ensuring no new housing gets built.

He is being opposed by Tim Donnelly, who supports "respecting building limits", increasing parking, expanding rent control, and "giving residents a voice" because changes have been made "despite overwhelming opposition from the local community." It does not sound like Mr. Donnelly is in favor of more housing.

District 5: No Recommendation

London Breed voted against Governor Brown's by right legislation, which would have helped increase the market-rate and affordable housing stock in San Francisco by letting developers build any project that followed local zoning rules and had 20% affordable housing. She also supports affordability requirements that make it difficult to build more housing.

She is being opposed by Dean Preston, who is running against "rent gouging", and supports an "anti-demolition" ordinance for "historic" buildings. Mr. Preston would not make it easier to build more housing in San Francisco.

District 7: Joel Engardio

Engardio is running against Norman Yee, who supports CEQA, a law that is frequently abused to oppose housing. Engardio supports building more housing. "I also know that building more housing will help middle income residents become homeowners -- and we want to keep families from leaving San Francisco. Restricting supply only drives prices higher," he writes.

District 9: No Endorsement

Hillary Ronen pledges to "fight for an affordable San Francisco" and wants to build 5000 units of affordable housing in 10 years. There was a very easy way to have accomplished 5000 units of affordable housing - support Governor Brown's by right housing legislation, which would have guaranteed that 20% of every new building in San Francisco would have been affordable. Her boss, David Campos, voted against it. She also wants to leverage state and federal funds to build affordable housing. Her boss's vote against by right legislation helped remove $400 million for affordable housing from the state budget.

District 11: No Endorsement

None of the candidates in either of these districts seem to agree that building more housing of any shape and size is the best way to alleviate our affordability crisis for everyone. Notably bad is District 11's Kim Alvarenga, running on a platform of "more parking" and "100% affordable housing", which is very difficult to build.

Coming Soon!

California State Propositions, BART director, judicial elections, State Senate and US Senate.

Liked what you read? I am available for hire.

Dumb Tricks to Save Database Space

I have seen a few databases recently that could have saved a lot of space by storing their data more efficiently. Sometimes this isn't a big deal; if a table isn't going to grow particularly quickly, it hardly matters. But on fast-growing tables it can become a big problem, and you can be leaving a lot of disk savings on the table.

Let's review some of the benefits of smaller tables:

  • Indexes are smaller. This means your database needs less space to index your tables, and more RAM can be used to cache results.

  • The cache can hold more objects, since the objects are smaller.

  • You'll delay the point at which your database won't fit on a single disk, and you have to shard.

  • Query results which might have fit in 2 TCP packets will now fit in one.

  • Backups complete more quickly.

  • Your application servers will use less RAM to hold the result.

  • Migrations complete more quickly.

  • Full table searches complete more quickly.

Let's review some common data types and strategies for storing them. If these are obvious to you - great! You can stop reading at any point. They're not obvious to a lot of people.

A brief reminder before we get started - a bit is a single 0 or 1, and a byte is a series of 8 bits. Every ASCII character can be represented in a single byte.

UUIDs

It's common to store UUIDs as text fields in your database. A typical UUID representation - "ad91e02b-a147-4c47-aa8c-1f3c2240c0df" - takes up 36 bytes, and more if you store it with a prefix like SMS or user_. But a UUID uses only 16 different characters, the hex digits; the hyphens are for display only, like the hyphens in a phone number. This means you need only 4 bits to store each UUID character. There are 32 hex characters in a UUID, so you can fit a UUID in 16 bytes, a savings of 55%. If you're using Postgres, you can use the uuid data type to store UUIDs, or the bytea binary type - in MySQL, you can use a binary(16).

CREATE TABLE users (id uuid PRIMARY KEY);

It's often useful to store a prefix with a UUID, so you know what the ID represents just by looking at it - for example, SM123 or user_456. I wrote a short library that stores a prefix with a UUID, but strips it before writing to the database. To read UUIDs out of the database with a prefix, prepend the prefix in the SELECT statement:

SELECT 'user_' || id FROM users LIMIT 5;

My old team at Shyp recently converted text IDs to UUIDs and wrote about that process on their engineering blog.

UUIDs in JSON

It's becoming more common to store relational data in JSON or JSONB columns. There are a lot of reasons to do this or not do this - I don't want to rehash that discussion here. JSONB does lead to inefficient storage for UUIDs, however, since you are limited to characters that are valid in a JSON string. That means you can't get down to 16 bytes, because you can't store just any byte in JSON. You can, however, base64 encode your 16-byte UUID. In Go, that encoding dance looks something like this:

import "encoding/base64"
import "encoding/hex"
import "fmt"
import "strings"
rawUUID := "ad91e02b-a147-4c47-aa8c-1f3c2240c0df"
// Strip the hyphens
uuidStr := strings.Replace(rawUUID, "-", "", 4)
// Decode the hex string into a slice of 16 bytes.
bits, _ := hex.DecodeString(uuidStr)
// Re-encode that 16-byte slice using base64.
fmt.Println(base64.RawURLEncoding.EncodeToString(bits))

That outputs rZHgK6FHTEeqjB88IkDA3w, which is only 22 bytes, a 38% improvement.

Use smaller numbers

A default Postgres integer is 4 bytes and can hold any number from -2147483648 to 2147483647. If you know that the integer you are storing is never going to exceed 32,767, you can use a smallint (2 bytes) to store it and save 2 bytes per row.

Use an enum type

Let's say you have a subscription that can be in one of several states (trial, paid, expired). Storing the strings "trial", "paid", "expired" in the database takes up extra space. Instead, use an enum type, which is only 4 bytes in Postgres (1 byte in MySQL) and ensures you can't accidentally write a bad status like "trail". Another alternative is to store a smallint and convert it to meaningful values in the application, but this makes it harder to tell what things are if you're querying the database directly, and doesn't prevent mistakes.

Use binary for password hashes

Most password hashing algorithms give you back raw bytes (though some libraries return them already encoded as text). If you have the raw bytes, store them directly in the database using bytea, rather than a hex or base64 text encoding.

Move fields out of JSON columns

One downside of JSON/JSONB is that the key gets stored alongside the value in every row. If you are storing a boolean like {"show_blue_button": true} in JSON, you're using 18 bytes per row for the string "show_blue_button" (with its quotes) to carry a single bit of information. If you store this field as a regular Postgres boolean column instead, it takes one byte per row. Moving the field to a column pays off in space even if only a small fraction of rows - very roughly one in twenty-five - actually carry it in the JSON. It's also much easier to add indexes on columns than on JSON fields.

Conclusion

That's it! A small amount of thought and work upfront can pay big dividends down the line. Migrating columns after they're already in place can be a pain. In general, the best approach is to do the following:

  • Add the new column with the correct type.

  • Edit your application to write/update both the new and the old column.

  • Backfill the new column, copying over all values from the old column for old records in the database. If the table is large, do this in batches of 1000 rows or so to avoid locking your table for too long.

  • Edit your application to read exclusively from the new column.

  • Drop the old column.

I hope this helps!

Inspired by some tweets from Andrey Petrov and a Heap Analytics post about JSONB.


More Comment-Preserving Configuration Parsers

For the past few weeks I've been on the hunt for a configuration file format with the following three properties:

  1. You can use a library to parse the configuration. Most configuration formats allow for this, though some (nginx, haproxy, vim) aren't so easy.

  2. You can manipulate the keys and values, using the same library.

  3. When that library writes the file to disk, any comments that were present in the original config file are preserved.

Why bother? First, allowing programs to read/write configuration files allows for automated cleanup/manipulation. Go ships with a first-class parser/AST, and as a consequence there are many programs that can lint/edit/vet your source code. These wouldn't be possible without that ast package and a set of related tools that make parsing and manipulating the source easy.

You can imagine installers that automatically make a change to your configuration; for example, certbot from the Let's Encrypt project tries to automatically edit your Apache or Nginx configuration. This is an incredibly difficult task, due to the complexity of the configurations that have piled up over the years, and the fact that those files weren't built with automatic editing in mind.

Backwards incompatible changes are never good, but their downsides can be mitigated by effective tools for parsing and updating configuration.

You want comments in your configuration file because configurations tend to accumulate over the years and it can be incredibly unclear where values came from, or why values were set the way they were. At Twilio, the same HAProxy config got copied from service to service to service, even though the defined timeouts led to bad behavior. Comments allow you to provide more information about why a value is set the way it is, and note values where you weren't sure what they should be, but had to pick something besides "infinity" before deploying.

What problems do you run into when you try to implement a comment-preserving configuration parser? Many config parsers turn the file into a simple data type like a dictionary or an array, which immediately loses a lot of the fidelity present in the original file. The second problem is that dictionaries in most languages do not preserve ordering, so you might write the configuration out in a different order than you read it, which scrambles git diffs and comment placement.

You are going to need to implement something that is a lot closer to an abstract syntax tree than a dictionary; at the very least, maps of keys and values should be stored as an array of tuples, not a dictionary type.

The next problem you run into is that syntax trees are great for preserving the fidelity of source code, but tend to be unwieldy when all you want to do is index into an array or get the value for a key, especially when a value may be any of several types: a number, a string, a date, or an array of those. The good news is that configuration files tend to need only a subset of the syntax and fidelity of a programming language (you don't need or want functions, for example), so you can hopefully get away with defining a simpler set of interfaces for manipulating the data.

(Incidentally I realized in the course of researching this that I have written two libraries to do this - one is a library for manipulating your /etc/hosts file, and the other is a library for bumping versions in Go source code. Of course those are simpler problems than the one I am trying to solve here).

So let's look at what's out there.

  • JSON is very popular, but it's a non-starter because there's no support for comments, and JSON does not define an ordering for keys and values in a dictionary; they could get written in a different order than they are read. JSON5 is a variant of JSON that allows for code comments. Unfortunately I haven't seen a JSON5 parser that maintains comments in the representation.

  • YAML is another configuration format used by Ansible, Salt, Travis CI, CircleCI and others. As far as I can tell there is exactly one YAML parser that preserves comments, written in Python.

  • XML is not the most popular format for configuration, but the structure makes it pretty easy to preserve comments. For example, the Go standard library parser contains tools for reading and writing comments. XML seems to have the widest set of libraries that preserve comments - I also found libraries in Python and Java and could probably find more if I looked harder.

  • TOML is a newer format that resembles INI files, with a more rigorously specified grammar. There are no known parsers for TOML that preserve comments.

  • INI files are used by Windows programs and the Python configparser module, among others. I have found one parser, in Perl, that tries to preserve comments.

  • Avro is a data serialization format that is gaining in popularity for things like database schema definitions. Unfortunately its schemas are written in JSON, so it's out for the same reasons JSON is out.

  • You can use Go source code for your configuration. Unfortunately the tools for working with Go syntax trees are still pretty forbidding, for tasks beyond extremely simple ones, especially if you want to go past the token representation of a file into actually working with e.g. a struct or an array.

I decided on a configuration file format called HCL, from HashiCorp. It resembles nginx configuration syntax, but ships with a Go parser and printer. It's still a little rough around the edges for getting values out, so I wrote a small library for getting and setting keys in a configuration map.

This is difficult - it's much easier to write a parser that just converts to an array or a dictionary than one that preserves the structure of the underlying file. But I think we've only scratched the surface of the benefits, with tools like Go's automatic code rewriting and npm init/npm version patch. Hopefully going forward, new configuration formats will ship with a proper comment-preserving parser from day one.


Real Life Go Benchmarking

I've been following the commits to the Go project for some time now. Occasionally someone will post a commit with benchmarks showing how much the commit improves performance along some axis or another. In this commit, they've increased the performance of division by 7 (a notoriously tricky number to divide by) by about 40% on machines running the ARM instruction set. I'm not 100% sure what the commit does, but it switches around the instructions that get generated when you do long division on ARM in a way that makes things faster.

Anyway, I wanted to learn how to do benchmarks and practice making something faster. Some motivation came along when Uber released their zap logging library, with a set of benchmarks showing my friend Alan's logging library as the worst performer. So I thought it would be a good candidate for benchmarking.

Fortunately Alan has already included a set of benchmarks in the test suite. You can run them by cloning the project and then calling the following:

go test -run=^$ -bench=.

You need to pass -run=^$ to exclude all tests in the test suite; otherwise all of the tests run in addition to the benchmarks. We only care about the benchmarks, and the pattern ^$ matches no test names, so every test is filtered out.

We get some output!

BenchmarkStreamNoCtx-4                   	  300000	      5735 ns/op
BenchmarkDiscard-4                       	 2000000	       856 ns/op
BenchmarkCallerFileHandler-4             	 1000000	      1980 ns/op
BenchmarkCallerFuncHandler-4             	 1000000	      1864 ns/op
BenchmarkLogfmtNoCtx-4                   	  500000	      3866 ns/op
BenchmarkJsonNoCtx-4                     	 1000000	      1816 ns/op
BenchmarkMultiLevelFilter-4              	 2000000	      1015 ns/op
BenchmarkDescendant1-4                   	 2000000	       857 ns/op
BenchmarkDescendant2-4                   	 2000000	       872 ns/op
BenchmarkDescendant4-4                   	 2000000	      1029 ns/op
BenchmarkDescendant8-4                   	 1000000	      1018 ns/op
BenchmarkLog15AddingFields-4             	   50000	     29492 ns/op
BenchmarkLog15WithAccumulatedContext-4   	   50000	     33599 ns/op
BenchmarkLog15WithoutFields-4            	  200000	      9417 ns/op
PASS
ok  	github.com/inconshreveable/log15	30.083s

The name on the left is the benchmark name; the trailing number (4) is the number of CPUs (GOMAXPROCS) used for the benchmark. The number in the middle is the iteration count; to get a good benchmark, you want to run the code as many times as feasible, then divide the total runtime by the number of iterations. Otherwise you run into problems like coordinated omission, weighting the extreme outliers too much, or failing to weight them at all.

To get a "good" benchmark, you want to isolate the running code from anything else running on the machine. For example, say you run the tip benchmarks while a YouTube video is playing, make a change, and then run the benchmarks again while nothing is playing. Video playback takes a lot of CPU and RAM, so all other things being equal, the benchmark is going to look worse with YouTube running. There are a lot of ways to accidentally bias the results, too: you might get bored during the tip run and browse around, then make a change and watch the console intently to see the new results. Just by not switching tabs or screens during the second run, you're biasing the results!

If you have access to a Linux machine with nothing else running on it, that's going to be your best bet for benchmarking. Otherwise, shut down as many other programs as are feasible on your main machine before starting any benchmark tests.

Running a benchmark multiple times can also be a good way to compensate for environmental effects. Russ Cox's benchstat program is very useful for this; it gathers many runs of a benchmark, and tells you whether the results are statistically significant. Run your benchmark with the -count flag to run it multiple times in a row:

go test -count=20 -run=^$ -bench=Log15AddingFields | tee -a master-benchmark

Do the same for your change, writing to a different file (change-benchmark), then run benchstat:

benchstat master-benchmark change-benchmark

You'll get really nice looking output. This is generally the output used by the Go core development team to print benchmark results.

$ benchstat benchmarks/tip benchmarks/early-time-exit
name                 old time/op  new time/op  delta
StreamNoCtx-4        3.60µs ± 6%  3.17µs ± 1%  -12.02%  (p=0.001 n=8+6)
Discard-4             837ns ± 1%   804ns ± 3%   -3.94%  (p=0.001 n=7+6)
CallerFileHandler-4  1.94µs ± 2%  1.88µs ± 0%   -3.01%  (p=0.003 n=7+5)
CallerFuncHandler-4  1.72µs ± 3%  1.65µs ± 1%   -3.98%  (p=0.001 n=7+6)
LogfmtNoCtx-4        2.39µs ± 2%  2.04µs ± 1%  -14.69%  (p=0.001 n=8+6)
JsonNoCtx-4          1.68µs ± 1%  1.66µs ± 0%   -1.08%  (p=0.001 n=7+6)
MultiLevelFilter-4    880ns ± 2%   832ns ± 0%   -5.44%  (p=0.001 n=7+6)
Descendant1-4         855ns ± 3%   818ns ± 1%   -4.28%  (p=0.000 n=8+6)
Descendant2-4         872ns ± 3%   830ns ± 2%   -4.87%  (p=0.001 n=7+6)
Descendant4-4         934ns ± 1%   893ns ± 1%   -4.41%  (p=0.001 n=7+6)
Descendant8-4         990ns ± 2%   958ns ± 2%   -3.29%  (p=0.002 n=7+6)

OK! So now we have a framework for measuring whether a change helps or hurts.

How can I make my code faster?

Good question! There's no one answer for this. In general, there are three strategies:

  • Figure out a way to do less work than you did before. Avoid doing an expensive computation where it's not necessary, exit early from functions, &c.

  • Do the same work, but in a faster way; use a different algorithm, or use a different function, that's faster.

  • Do the same work, but parallelize it across multiple CPU's, or threads.

Each technique is useful in different places, and it can be hard to predict where you'll be able to extract performance improvements. It is useful to know how expensive various operations are, so you can evaluate the costs of a given operation.

It's also easy to spend a lot of time "optimizing" something only to realize it's not the bottleneck in your program. If you optimize something that takes 5% of the code's execution time, the best overall speedup you can get is 5%, even if you get the code to run instantaneously. Go's test framework has tools for figuring out where your code is spending the majority of its time. To get the best use out of them, focus on profiling the code execution for a single benchmark. Here, I'm profiling the StreamNoCtx benchmark in the log15 library.

$ go test -cpuprofile=cpu.out -benchmem -memprofile=mem.out -run=^$ -bench=StreamNoCtx -v
BenchmarkStreamNoCtx-4   	  500000	      3502 ns/op	     440 B/op	      12 allocs/op

This will generate 3 files: cpu.out, mem.out, and log15.test. These are binary files; you use the pprof tool to evaluate them. First let's look at the CPU profile. I've started pprof and run top5 to get the top 5 functions.

$ go tool pprof log15.test cpu.out
Entering interactive mode (type "help" for commands)
(pprof) top5
560ms of 1540ms total (36.36%)
Showing top 5 nodes out of 112 (cum >= 60ms)
      flat  flat%   sum%        cum   cum%
     180ms 11.69% 11.69%      400ms 25.97%  runtime.mallocgc
     160ms 10.39% 22.08%      160ms 10.39%  runtime.mach_semaphore_signal
     100ms  6.49% 28.57%      260ms 16.88%  github.com/inconshreveable/log15.escapeString
      60ms  3.90% 32.47%       70ms  4.55%  bytes.(*Buffer).WriteByte
      60ms  3.90% 36.36%       60ms  3.90%  runtime.stringiter2

The top 5 functions are responsible for 36% of the program's runtime. flat is how much time is spent inside of a function, cum shows how much time is spent in a function, and also in any code called by a function. Of these 5, runtime.stringiter2, runtime.mallocgc, and runtime.mach_semaphore_signal are not good candidates for optimization. They are very specialized pieces of code, and they're part of the Go runtime, so changes need to pass all tests and get approved by the Go core development team. We could potentially figure out how to call them less often though - mallocgc indicates we are creating lots of objects. Maybe we could figure out how to create fewer objects.

The likeliest candidate for improvement is the escapeString function in the library's own codebase. pprof's list command is super useful for seeing where a function spends its time.

(pprof) list escapeString
ROUTINE ======================== github.com/inconshreveable/log15.escapeString in /Users/kevin/code/go/src/github.com/inconshreveable/log15/format.go
      30ms      330ms (flat, cum) 23.40% of Total
      10ms       10ms    225:func escapeString(s string) string {
         .          .    226:	needQuotes := false
         .       90ms    227:	e := bytes.Buffer{}
         .          .    228:	e.WriteByte('"')
      10ms       50ms    229:	for _, r := range s {
         .          .    230:		if r <= ' ' || r == '=' || r == '"' {
         .          .    231:			needQuotes = true
         .          .    232:		}
         .          .    233:
         .          .    234:		switch r {
         .          .    235:		case '\\', '"':
         .          .    236:			e.WriteByte('\\')
         .          .    237:			e.WriteByte(byte(r))
         .          .    238:		case '\n':
         .          .    239:			e.WriteByte('\\')
         .          .    240:			e.WriteByte('n')
         .          .    241:		case '\r':
         .          .    242:			e.WriteByte('\\')
         .          .    243:			e.WriteByte('r')
         .          .    244:		case '\t':
         .          .    245:			e.WriteByte('\\')
         .          .    246:			e.WriteByte('t')
         .          .    247:		default:
         .      100ms    248:			e.WriteRune(r)
         .          .    249:		}
         .          .    250:	}
         .       10ms    251:	e.WriteByte('"')
         .          .    252:	start, stop := 0, e.Len()
         .          .    253:	if !needQuotes {
         .          .    254:		start, stop = 1, stop-1
         .          .    255:	}
      10ms       70ms    256:	return string(e.Bytes()[start:stop])

The basic idea here is to write a string, but add a backslash before double quotes, newlines, and tab characters. Some ideas for improvement:

  • We create a bytes.Buffer{} at the beginning of the function. We could keep a Pool of buffers, and retrieve one, so we don't need to allocate memory for a buffer each time we escape a string.

  • If a string doesn't contain a double quote, newline, tab, or carriage return, it doesn't need to be escaped. Maybe we can avoid creating the Buffer entirely for that case, if we can find a fast way to check whether a string has those characters in it.

  • If we call WriteByte twice in a row, we could probably replace it with a WriteString(), which will use a copy to move two bytes, instead of growing the array twice.

  • We call e.Bytes() and then cast the result to a string. Maybe we can figure out how to call e.String() directly.

You can then look at how much time is being spent in each area to get an idea of how much any given idea will help your benchmarks. For example, replacing WriteByte with WriteString probably won't save much time; you're at most saving the time it takes to write every quote and newline, and most strings are made up of letters and numbers and space characters instead. (It doesn't show up at all in the benchmark but that's because the benchmark writes the phrase "test message" over and over again, and that doesn't have any escape-able characters).

That's the CPU profile: how much time the CPU spends running each function in the codebase. There's also the memory profile: how much memory each function allocates. Let's look at that! We have to pass one of these flags to pprof to get it to show memory information.

Sample value selection option (for heap profiles):
  -inuse_space      Display in-use memory size
  -inuse_objects    Display in-use object counts
  -alloc_space      Display allocated memory size
  -alloc_objects    Display allocated object counts

Let's pass one. Notice in this case, the top 5 functions allocate 96% of the objects:

go tool pprof -alloc_objects log15.test mem.out
(pprof) top5
6612331 of 6896359 total (95.88%)
Showing top 5 nodes out of 18 (cum >= 376843)
      flat  flat%   sum%        cum   cum%
   2146370 31.12% 31.12%    6612331 95.88%  github.com/inconshreveable/log15.LogfmtFormat.func1
   1631482 23.66% 54.78%    1631482 23.66%  github.com/inconshreveable/log15.escapeString
   1540119 22.33% 77.11%    4465961 64.76%  github.com/inconshreveable/log15.logfmt
    917517 13.30% 90.42%    1294360 18.77%  github.com/inconshreveable/log15.formatShared
    376843  5.46% 95.88%     376843  5.46%  time.Time.Format

Let's look at a function:

ROUTINE ======================== github.com/inconshreveable/log15.logfmt in /Users/kevin/code/go/src/github.com/inconshreveable/log15/format.go
   1540119    4465961 (flat, cum) 64.76% of Total
         .          .     97:		if i != 0 {
         .          .     98:			buf.WriteByte(' ')
         .          .     99:		}
         .          .    100:
         .          .    101:		k, ok := ctx[i].(string)
         .    2925842    102:		v := formatLogfmtValue(ctx[i+1])
         .          .    103:		if !ok {
         .          .    104:			k, v = errorKey, formatLogfmtValue(k)
         .          .    105:		}
         .          .    106:
         .          .    107:		// XXX: we should probably check that all of your key bytes aren't invalid
         .          .    108:		if color > 0 {
         .          .    109:			fmt.Fprintf(buf, "\x1b[%dm%s\x1b[0m=%s", color, k, v)
         .          .    110:		} else {
   1540119    1540119    111:			fmt.Fprintf(buf, "%s=%s", k, v)
         .          .    112:		}
         .          .    113:	}

In the common case on line 111 (when color = 0), we're calling fmt.Fprintf to write the key, then an equals sign, then the value. Fprintf has to allocate memory for its own byte buffer, parse the format string, and then interpolate the two variables. It might be faster, and avoid allocations, to just call buf.WriteString(k), write the equals sign, then call buf.WriteString(v). But you'd want to benchmark it first to double-check!

Conclusion

Using a combination of these techniques, I was able to improve the performance of log15 by about 30% for some common code paths, and reduce memory allocations as well. I was not expecting this, but I also found a way to speed up JSON encoding in the Go standard library by about 20%!

Want someone to benchmark/improve performance in your company's application, or teach your team more about benchmarking? I am available for consulting; email me and let's set something up!
