Discovery is uncomfortable because it requires genuine uncertainty: you have to accept that your idea may be wrong, that the problem may not be real, or that the solution you love may not work. Delivery is comfortable because the question is settled and the work is execution against a plan. Organizations reward delivery (shipping) more visibly than discovery (learning), which creates an incentive to skip the uncomfortable phase and go straight to building. The most expensive product failures are delivery successes: features that are perfectly built and that nobody uses. The team ships on time, under budget, with clean code, and the feature sits unused six months after launch because the discovery phase that would have revealed the problem was skipped in favor of getting to work faster.
What "build the cheapest test of your riskiest assumption" means in practice
The riskiest assumption is not always the technical one. Sometimes it is whether users will change behavior, whether they will pay, or whether the distribution channel exists. The cheapest test depends on what the assumption is. For a behavior change question, a landing page that describes the product and measures sign-ups is cheaper than a prototype. For a usability question, a Figma prototype is cheaper than code. For a value question, a Wizard-of-Oz test, in which a human manually does what the software would do, is cheaper than building the automation. The goal is always to learn the most important thing for the least investment. The discipline lies in identifying which assumption, if wrong, would cause you to abandon or significantly change the plan, and testing that assumption first, before spending effort on assumptions that are merely cheaper or more comfortable to test.
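The selection rule above can be sketched in a few lines of Python. This is an illustration only: the `Assumption` fields, the 1-to-5 `impact_if_wrong` score, and the sample assumptions are hypothetical, and the mapping simply restates the three examples from the paragraph.

```python
from dataclasses import dataclass

# Cheapest test per kind of question, restating the examples in the text.
# The dictionary keys are illustrative labels, not an established taxonomy.
CHEAPEST_TEST = {
    "behavior_change": "landing page that measures sign-ups",
    "usability": "Figma prototype",
    "value": "Wizard-of-Oz test",
}


@dataclass(frozen=True)
class Assumption:
    statement: str
    kind: str             # one of the keys in CHEAPEST_TEST
    impact_if_wrong: int  # hypothetical score: 5 = we would abandon the plan


def next_test(assumptions: list[Assumption]) -> tuple[Assumption, str]:
    """Pick the assumption that would hurt most if wrong, then its cheapest test."""
    # Rank by impact, not by how cheap or comfortable the test is.
    riskiest = max(assumptions, key=lambda a: a.impact_if_wrong)
    test = CHEAPEST_TEST.get(riskiest.kind, "no listed test; design one")
    return riskiest, test


# Hypothetical backlog of assumptions for an imaginary product.
backlog = [
    Assumption("Users can find the export button", "usability", 2),
    Assumption("Users will switch from spreadsheets", "behavior_change", 5),
    Assumption("Automated summaries are worth paying for", "value", 4),
]

riskiest, test = next_test(backlog)
print(f"Test first: {riskiest.statement!r} using a {test}")
```

The point of the design is that test cost only enters after the assumption has been chosen, so a cheap test of a low-stakes assumption never jumps the queue.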
Why kill criteria matter as much as success criteria
Teams that define success metrics before a test but not failure criteria are setting themselves up for confirmation bias, usually without realizing it. If the test produces ambiguous results, teams without explicit failure criteria tend to rationalize continuing: the sample was too small, the timing was bad, the users were not the right segment. Defining in advance what "this did not work" looks like, in specific numbers and specific behaviors, forces the team to take a negative result seriously and prevents the most common discovery failure mode: continuing to build something that the evidence says users do not want. A kill criterion might be "fewer than fifteen percent of users who complete onboarding return within seven days." If the result meets that criterion, you stop and go back to discovery. Without the criterion written down before the test, teams almost always find a reason to continue.
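One way to make a kill criterion hard to reinterpret after the fact is to record it as data before the test and evaluate it mechanically afterwards. The sketch below encodes the fifteen percent example from the paragraph above. The `KillCriterion` class, the `evaluate` function, and the user counts are hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KillCriterion:
    """A failure threshold written down before the test runs."""
    description: str
    min_rate: float  # the test fails if the observed rate is below this


def evaluate(criterion: KillCriterion, onboarded: int, returned: int) -> str:
    """Return 'stop' if the kill criterion is met, otherwise 'continue'."""
    if onboarded <= 0:
        raise ValueError("no onboarded users, so the test produced no evidence")
    if not 0 <= returned <= onboarded:
        raise ValueError("returned must be between 0 and onboarded")
    rate = returned / onboarded
    return "stop" if rate < criterion.min_rate else "continue"


# The example criterion from the text: fewer than 15% return within seven days.
seven_day_return = KillCriterion(
    description="users who complete onboarding return within seven days",
    min_rate=0.15,
)

# Hypothetical result: 24 of 200 onboarded users came back (12%).
print(evaluate(seven_day_return, onboarded=200, returned=24))
```

Because the threshold is frozen before any results exist, the only decision left after the test is whether the counts were measured correctly, not whether the bar should move.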