Next month will mark the one-year anniversary of when I emerged back into the real world, staggering into the light from an 18-year stay in the developer mines at Google. Like in those sci-fi stories where an astronaut on a long voyage returns to Earth, I was thrust back into a world that had changed without me being a part of it. It was an interesting once-in-a-lifetime experience seeing how everything had evolved, and how it compared to the internal Google developer experience.
The key difference between the two worlds is that Google is almost completely using its own stack, which is very different from the outside world, even when the basic components are exposed through Google Cloud. Within Google, that means you have to become an expert on a very specialized set of tools, and that extends to almost every part of software engineering. That expertise is mostly worthless in the outside world. New hires likewise lose much of the expertise they bring with them, which is troubling. I often find myself wondering what the external equivalent of some internal technology is, and so do so many other people that there’s a site for it! Maybe the isolated technology stack helps Google retention in some way, but at the current moment in the industry, retention seems like a negative, or at least not a priority, so right now this friction is a real loss for everyone.
The real world, though, can suffer from a lack of cohesion. I used to program in C++ at Google, and we used pretty modern C++, which was great. Now I program in Python. Python has type information which can be checked by certain tools. It has asyncio. It’s gotten much more sophisticated since I last used it, and it’s been pretty sophisticated for about 10 years. But the real world is all over the place in how it uses Python, and it’s mostly stuck in the past. Virtually no library includes type annotations in its documentation. Many common libraries do not use async at all. This wouldn’t happen at Google; there would be a company-wide effort to update usage to reflect state-of-the-art programming practices. The nice aspect for real-world users is that they are rarely forced to migrate. But if you don’t migrate, the rate of progress of the ecosystem is glacial, and that affects everyone, even new users doing new things.
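For anyone who hasn’t touched Python in a while, here is a minimal sketch of what annotated, async code looks like. The function names and the simulated delay are made up for illustration, `asyncio.sleep` stands in for real I/O, and I’m assuming Python 3.9 or newer for the `list[str]` syntax. A checker such as mypy can verify the annotations.

```python
import asyncio


async def fetch_length(name: str, delay: float) -> int:
    """Pretend to do some I/O, then return the length of the name."""
    # asyncio.sleep is a stand-in for a real network or disk call.
    await asyncio.sleep(delay)
    return len(name)


async def total_length(names: list[str]) -> int:
    """Run one fetch per name concurrently and sum the results."""
    # gather schedules all the coroutines at once and returns their
    # results in the same order as the inputs.
    lengths = await asyncio.gather(*(fetch_length(n, 0.01) for n in names))
    return sum(lengths)


if __name__ == "__main__":
    print(asyncio.run(total_length(["blaze", "bazel", "pants"])))
```

The annotations cost nothing at runtime, but a type checker would flag a call like `total_length("blaze")` before the code ever ran.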
There are other great things about Google’s internal tools. The observability is fantastic. I never had a problem with getting my metrics into our system. In contrast, we’ve learned that using OpenTelemetry in concert with Google Cloud’s exporter will log errors no matter what we try, and the most we can do is minimize those errors. Such a common thing to need, but evidently even common things are fairly broken.
But our trust in Google’s internal tools has not always paid off. I love Google’s internal build system, Blaze, so using Bazel seemed like a natural choice. It turned out to be completely unworkable, even for extremely common things. I couldn’t make heads or tails of the documentation, and everything suffered from the fact that Bazel was going through some architectural transitions. The classic Google choice between the deprecated way and the way that doesn’t work yet. We ended up using Pants, and we’re pretty happy with it.
Sometimes the real world is better. It’s fairly easy to create an alert in Google Alerts, for example. You have a metric graph and essentially say that if this metric is above or below the line for some amount of time, raise the alert. Internally at Google, you’d have to do this with Python. No idea why. And that’s the easy method; sometimes you were stuck with the old system, borgmon, and knowing how to deal with that is some expertise I’ll never make use of again.
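The rule described above, a metric staying past a line for some amount of time, can be sketched in a few lines of Python. This is only an illustration of the idea, not how Google Cloud or any internal Google system implements it; `Sample`, `should_alert`, and their parameters are hypothetical names, and I’m assuming samples arrive sorted by timestamp.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp: float  # seconds since some arbitrary start
    value: float


def should_alert(
    samples: list[Sample],
    threshold: float,
    duration: float,
    above: bool = True,
) -> bool:
    """Return True if the metric has stayed past the threshold for at
    least `duration` seconds, up to and including the latest sample."""
    breach_start = None
    for s in samples:  # assumed sorted by timestamp
        breached = s.value > threshold if above else s.value < threshold
        if breached:
            # Remember when the current unbroken breach began.
            if breach_start is None:
                breach_start = s.timestamp
        else:
            # Any sample back inside the line resets the clock.
            breach_start = None
    if breach_start is None:
        return False
    return samples[-1].timestamp - breach_start >= duration
```

For example, with one sample per minute, `should_alert(samples, threshold=5.0, duration=120.0)` only fires once the last two minutes of samples are all above 5.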
Docker is a good example of the real world being far ahead of Google. We didn’t have a real internal equivalent. The functionality it provides wasn’t as critical at Google; the homogeneity of Google’s production machines meant that ensuring the proper environment was mostly not an issue. Mostly. There are still plenty of teams who are occasionally affected by experiments that they have no control over. Docker would help with that. But Docker’s another example of the issues with control and ecosystem that we saw before. If everyone can tightly control their runtime environment, evolving that runtime is hard. That runtime environment might benefit from many improvements, but smaller companies have neither the scale nor the willingness to try. Cross-fleet improvements are powerful, and like language migrations, they require each team to sacrifice a bit. That can never happen outside of a large company.
The best thing about the real world, though, is probably all the tools that let you do things quickly at a small scale. SQL! Cloud functions! Internally at Google, you’re pushed to other tools that really are optimized for scalability or ease of management. At Google, it’s hard to scale down. It is easier than it used to be, but that’s really not what Google is optimizing for.
All in all, I think if you are doing something quick and easy, the real world is better. The costs that Google imposes are too high. But that same freedom means that you must live in tech ecosystems that aren’t as healthy. The larger and more complicated your project gets, the better it is to be in a tightly controlled environment like Google’s. May we all be so successful!
Great post!
> Docker is a good example of the real world being far ahead of Google. We didn’t have a real internal equivalent.
It has been a while since my Google days, but didn't Midas / MPM packages provide some of the functionality of Docker?