| Infrastructure Engineering
I’ve never heard of a company that has a business that doesn’t also occasionally have things go wrong. Something going wrong might turn into a support ticket, an angry email, or an alert popping up on an on-call engineer’s phone. If there is user or business impact, and an engineer might need to respond, then it becomes an incident. After the incident, the folks involved in mitigation write an Incident Review Template, and that document is discussed in this meeting, the Incident Re...| infraeng.dev
As the organization starts to write more Technical Specifications, you’ll eventually want a forum to discuss the key decisions. At most companies, that meeting is the Tech Spec Review. The Tech Spec Review is a forum to review feedback on new Tech Specs, resolve open points of discussion, and flag new context to be considered before finalizing the design. Secondarily, it’s a valuable forum for keeping the wider organization aware of new and upcoming technology changes.| infraeng.dev
Interview in May, 2022. Learn more about Matthew on his blog, twitter, and linkedin. Tell us a little about your current role: where do you work, your title and generally the sort of work you and your team do. I work at Spotify as a Senior Backend Infrastructure Engineer. My team builds and maintains the tools that enable Spotify engineers to deploy safely and quickly whenever they need to. We work a lot with Kubernetes, which Spotify uses to deploy and manage most of its websites and backend...| infraeng.dev
Written interview in May, 2022. Learn more about Mahdi on his website, linkedin, and his StaffEng podcast interview. Tell us a little about your current role: where do you work, your title and generally the sort of work you and your team do. I am currently a Senior Staff Engineer at 1Password, leading the Server Architecture team. We are involved in our systems’ overall design while pushing for the modernization of legacy systems.| infraeng.dev
Written interview in early February, 2022. Learn more about Smruti on linkedin, twitter and on LeadDev. Tell us a little about your current role: where do you work, your title and generally the sort of work you and your team do. I currently lead the Data Platform group at Stripe – we operate the centralized data lake, and the bigdata, async & stream processing infrastructure for Stripe’s mission-critical business, while ensuring security, reliability and efficiency.| infraeng.dev
Fork this template on Google Docs Negotiating contracts is an important part of managing costs, but it’s also something that you only do infrequently. Particularly in an earlier stage company, you might only negotiate one large contract a year. It’s quite hard to get better at something that you do so infrequently, but using a checklist is one way to be consistent in your approach, and to ensure learnings from one negotiation carry over into the next.| infraeng.dev
In my early career roles, I worked at companies that never worried about their infrastructure costs at all. They were simply too low a cost and growing too slowly for the Finance team to pay much attention to it. This “ignore it until it’s too large to ignore” approach served me well. Until it didn’t. Working at Uber, I was caught off guard when a new Director joined and overnight infrastructure costs were recategorized from insignificant to requiring urgent, detailed review every ...| infraeng.dev
Fork this template on Google Docs As your company gets larger and more complex, it’s easy to become embroiled in supporting incoming asks from other teams. That’s important work, but it’s also important that your team is operating effectively and prioritizing your goals in addition to the goals of other teams making requests. If you’re getting mixed signals on whether your team is doing the right work, the Business Review Template can help cut through the confusion.| infraeng.dev
Early on in your company’s lifetime, you’ll form the seed of your infrastructure organization: a small team of four to eight engineers. Maybe you’ll call it the infrastructure team. It’s very easy to route infrastructure requests, because they all go to that one team. Later on, things are easy as well. You have seventy engineers spread across eight to ten mutually exclusive and collectively exhaustive teams with names like Storage, Traffic, and Compute. You’ll pull up the organizati...| Infrastructure Engineering
Example survey, Example analysis While you should rely on your organizational metrics to measure developer productivity, quantitative measurement will sometimes miss important context. For example, you might be proud of how the backend developers are having a great time with their CI/CD, only to realize that the iOS engineers hate their release process that isn’t instrumented in any of your dashboards. A Developer Productivity Survey is an effective tool to bring qualitative feedback into y...| Infrastructure Engineering
Fork this template on Google Docs Healthy engineering organizations make a lot of technical decisions. Many of those decisions impact multiple teams (Frontend, Backend) and functions (Engineering, Product, Customer Success, Finance). It’s normal to either feel like you’re moving too slow (“too many stakeholders in every decision”) or that your reckless pace creates frequent rework as issues are discovered late (“this problem would have been obvious if you’d just talked to Security...| Infrastructure Engineering
Fork this template on Google Docs Something about the close-knit social chemistry of a small team gives them a shared brain. Of course you know that last week Michelle decided all new frontend work would happen in TypeScript. Ambient awareness is less and less effective as an alignment tool as an organization grows, and becomes quite unreliable as an organization grows past ~twenty folks. One tool that folks use to scale alignment around key decisions is the “decision log.”| Infrastructure Engineering
Fork the org growth template and the org design template. Having been involved in quite a few budget and headcount processes over the years, one thing that continues to surprise me is how often folks make major headcount requests without having done any organizational design of how those requested heads will compose into an organization. The good news is that the high-level sort of organizational design required for headcount planning is abstract, low granularity, and it’ll likely only tak...| Infrastructure Engineering
Fork this template on Google Docs It’s impossible to avoid headcount planning when running a large team within an engineering organization. On the other hand, many folks find it’s impossible to be usefully involved in headcount planning when the folks running the process aren’t closely involved with your work: Infrastructure? Oh, that’s going great: no crashes or breaches lately, we don’t need to invest here!| Infrastructure Engineering
Fork this template on Google Sheets At some point in your planning process, you’re going to get a headcount target. It’s tempting to immediately jump into allocating that headcount–we’re going to do so much this year–but it’s helpful to take an hour to model out recruiting capacity to understand whether your headcount target is realistic. Once you’ve gone through the exercise, you’ll finish with a simple chart that shows your progress over the year towards that headcount targe...| Infrastructure Engineering
TODO: Find better vocabulary to distinguish between “leadership team” in your org (that you manage) and “leadership team” that you’re a member of or report to I once walked into an annual headcount planning session to learn that the other engineering managers in the room had already decided together how they would reallocate the senior members from the infrastructure organization that I supported to the teams that they ran. This was, they assured me, optimal for their roadmaps.| Infrastructure Engineering
Interview occurred in February, 2022. Read more from Shawn on his blog, twitter, and his book, The Coding Career Handbook. Tell us a little about your current role: where do you work, your title and generally the sort of work you and your team do. I’m currently Head of Developer Experience at Temporal.io, an open source workflow engine for long running, durable processes powering companies as small as 2-person YCombinator startups, to enterprises as large as Stripe, Snap, Datadog, Netflix, ...| infraeng.dev