The System Design Vocabulary Every PM Needs (No Engineering Background Required)

When an engineer says "this will create a bottleneck in the write path" or "we would need to add a cache layer for this to work at scale," your ability to respond meaningfully depends on vocabulary. If you nod without understanding, you cannot push back, ask clarifying questions, or make an informed decision. Vocabulary is the prerequisite for genuine collaboration with an engineering team.

The three-tier model you need to know

Almost every software product is built on three layers. The frontend is what users see — the web page, the mobile app, the interface. The backend is where business logic runs — the code that decides what a user is allowed to do, calculates prices, sends emails, and coordinates everything. The database is where data is stored — user accounts, orders, messages, product information. Most features touch all three layers, which is why engineering estimates are often larger than product managers expect. A simple-looking change to the UI frequently requires backend logic and a database schema change to support it.
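To make the three layers concrete, here is a minimal sketch in Python. Every name in it (`DATABASE`, `save_order`, `place_order`, `render_confirmation`) is hypothetical and chosen for illustration; a real product would use an actual database and a web or mobile framework. The point is only that a single user action passes through all three layers.

```python
# Database tier: an in-memory dict stands in for a real database (assumption).
DATABASE = {"orders": []}


def save_order(order):
    """Database layer: store the order and return its id."""
    DATABASE["orders"].append(order)
    return len(DATABASE["orders"])


def place_order(user, item, unit_price, quantity):
    """Backend layer: apply business rules, calculate the price, persist."""
    if quantity < 1:
        raise ValueError("quantity must be at least 1")
    total = unit_price * quantity
    order_id = save_order({"user": user, "item": item, "total": total})
    return {"order_id": order_id, "total": total}


def render_confirmation(result):
    """Frontend layer: turn the backend's answer into what the user sees."""
    return f"Order #{result['order_id']} confirmed. Total: ${result['total']:.2f}"


message = render_confirmation(place_order("ana", "notebook", 4.50, 3))
print(message)  # Order #1 confirmed. Total: $13.50
```

Adding something as small as a "gift wrap" checkbox to this flow would mean a new input in the frontend function, a new pricing rule in the backend function, and a new field in the stored order, which is the estimate-inflating pattern described above.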

Scalability in one paragraph

When more users arrive, you have two options for handling the load. Vertical scaling means making your existing server bigger: more CPU, more memory. It is simple but has a ceiling, because a single machine can only get so large. Horizontal scaling means adding more servers and distributing traffic across them. Its ceiling is far higher, but it requires more complex architecture, including a load balancer that routes incoming requests across the server pool. Most modern products are built to scale horizontally. When engineering says a feature does not scale, they usually mean its cost or response time grows faster than usage does, often because it was designed around a single server or a single database and cannot be easily distributed.
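As an illustration of what a load balancer does, the sketch below uses round-robin routing, which is one simple strategy among several. The class name `RoundRobinBalancer` and the server names are hypothetical; production load balancers also track server health and current load, which this sketch leaves out.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands each incoming request to the next server in the pool, in turn."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("need at least one server")
        # cycle() repeats the server list endlessly: a, b, c, a, b, c, ...
        self._servers = cycle(servers)

    def route(self, request):
        """Return the name of the server that should handle this request."""
        return next(self._servers)


balancer = RoundRobinBalancer(["server-a", "server-b", "server-c"])
assignments = [balancer.route(f"request-{i}") for i in range(6)]
print(assignments)
# ['server-a', 'server-b', 'server-c', 'server-a', 'server-b', 'server-c']
```

Scaling horizontally in this picture means adding a fourth name to the list. The catch is that any server may receive any request, so a feature that keeps user state in one server's memory breaks as soon as traffic is spread across the pool.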

The terms that come up in feature discussions

Latency is how long the system takes to respond to a single request, usually measured in milliseconds. Throughput is how many requests the system can handle per second. A bottleneck is the component that limits both: the slowest part of the chain. A cache stores the results of expensive operations so the system does not have to recalculate them for every request. If a thousand users ask for the same product page, the system builds it once and the cache serves the stored copy to everyone else. Rate limiting is how you prevent one user or one client from consuming all of the system's capacity. It is why APIs limit you to a certain number of requests per minute.
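Two of these terms, the cache and the rate limiter, are easy to show in a few lines. The sketch below is a simplified illustration, not how any particular product implements them: `PageCache`, `slow_render`, and `FixedWindowRateLimiter` are hypothetical names, the cache never expires its entries, and the limiter uses a basic fixed time window.

```python
import time


class PageCache:
    """Builds each page once, then serves the stored copy on later requests."""

    def __init__(self, render):
        self._render = render
        self._store = {}
        self.render_calls = 0  # counts how often the expensive work ran

    def get(self, page_id):
        if page_id not in self._store:
            self.render_calls += 1
            self._store[page_id] = self._render(page_id)
        return self._store[page_id]


def slow_render(page_id):
    # Stand-in for an expensive operation such as several database queries.
    return f"<html>product {page_id}</html>"


cache = PageCache(slow_render)
for _ in range(1000):
    cache.get("sku-42")
print(cache.render_calls)  # 1


class FixedWindowRateLimiter:
    """Allows each client at most `limit` requests per time window."""

    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window_seconds = window_seconds
        self._counts = {}  # client -> (window start time, requests so far)

    def allow(self, client, now=None):
        now = time.monotonic() if now is None else now
        start, count = self._counts.get(client, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # the old window ended, start a new one
        if count >= self.limit:
            self._counts[client] = (start, count)
            return False
        self._counts[client] = (start, count + 1)
        return True


limiter = FixedWindowRateLimiter(limit=3, window_seconds=60)
results = [limiter.allow("client-1", now=0) for _ in range(5)]
print(results)  # [True, True, True, False, False]
after_window = limiter.allow("client-1", now=61)
print(after_window)  # True
```

The product questions hide in the parts this sketch skips. For a cache: how stale can a stored page be before a price change must show up? For a rate limit: what does the user see when a request is refused?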

The microservices conversation

When engineering says "we need to break this out as a separate service," they mean a feature or system has grown complex enough that it should run independently rather than as part of the main application. A separate service can be deployed, scaled, and updated without touching the rest of the product. This is usually a weeks-long engineering project and comes with significant testing and monitoring requirements. As a PM, when you hear this suggestion, it is worth asking: what problem does this solve, what is the scope of the work, and what does the product experience look like during the migration?
