Skip to main content

Defining the Class NP\mathcal{NP}

The first thing my brain reached for when I heard the phrase "solutions can be verified quickly" was a Rubik's Cube.

Not solving one. Checking one.

When someone hands me a solved Rubik's Cube, I know within about two seconds whether it is actually solved. Six faces. Each face one solid color. Green: done. White: done. Red, blue, orange, yellow: done. The whole check takes no effort at all. It doesn't matter how the cube got to that state or how many moves it required. The verification is instant.

The solving is not instant. I practiced for months before I could do it without stopping to think. The gap between those two experiences, checking and solving, is exactly the gap this post is about.

NP\mathcal{NP} is the class of problems where checking a solution is fast. This post builds that definition carefully, shows what it means with real examples, and explains why the class is more interesting, and stranger, than it first appears.


Starting From What We Already Know​

Post 7 defined P\mathcal{P}: the set of decision problems solvable in polynomial time on a deterministic Turing machine. Sorting, binary search, shortest paths, primality testing: all of them live there. The defining feature of P\mathcal{P} is that you can find the answer quickly.

NP\mathcal{NP} is different. The defining feature of NP\mathcal{NP} is that you can check an answer quickly.

Decision Problems Again

Just like with P\mathcal{P}, we work with decision problems: questions whose answer is either yes or no. "Is this Sudoku solution correct?" is a decision problem. "Does this graph have a coloring that uses at most three colors?" is a decision problem. The yes/no framing keeps everything clean to reason about.

Let me be precise about what "checking an answer" means, because there is more structure to it than you might expect.


Showing Your Work​

Think about how you would convince someone that a Sudoku puzzle has a valid solution.

You could describe your reasoning. You could explain how you eliminated possibilities. You could walk them through your logic step by step. Or you could just hand them the completed grid.

The completed grid is all they need. They check every row, every column, every box. If everything passes, the puzzle has a solution. Your entire argument collapses into one object: the answer itself. Hand it over, let them check, done.

This is the core idea behind NP\mathcal{NP}. For every problem in this class, a yes answer comes with a piece of evidence you can hand over. That evidence has two properties: it is short (not longer than the problem itself, roughly speaking), and checking it is fast (polynomial time). You do not need to explain how you found it. You just produce it, and verification takes care of the rest.

Computer scientists call this piece of evidence a certificate. The word sounds formal, but it is just that: the thing you hand over to prove the answer is yes. A completed Sudoku grid is a certificate. A list of moves that solves a scrambled Rubik's Cube is a certificate. A coloring of a graph that uses only three colors is a certificate.

The Formal Definition of NP\mathcal{NP}

NP\mathcal{NP} is the set of all decision problems where, for every yes instance, there exists a certificate of polynomial length that can be verified in polynomial time by a deterministic Turing machine.

Three things to notice: the certificate must be short (polynomial in the input size), the verification must be fast (polynomial time), and the definition only requires this for yes instances. More on that last point shortly.


The Rubik's Cube in More Detail​

Let me slow down on the Rubik's Cube because it is the example that made this feel real for me.

The question we can ask in NP terms is: "Given a scrambled Rubik's Cube, does a solution exist?" The certificate for a yes answer is just the sequence of moves that solves it. I hand you the move sequence. You apply those moves to the scrambled cube and check whether the result is fully solved. Checking that takes time proportional to the length of the solution sequence, which is at most 20 moves for any standard cube (this was proven in 2010). Polynomial in the input size. Fast.

Notice what this does not give me. The definition says a certificate exists and can be verified quickly. It does not say finding that certificate is fast. For the Rubik's Cube, finding the optimal 20-move solution from a given scramble is genuinely hard. Checking it once you have it is easy. That gap is exactly the point.

This is the same asymmetry I noticed when I was learning to solve the cube. Checking someone else's completed cube: two seconds. Learning to produce the solution myself: months of practice.


Problems Inside NP​

Let's look at several real problems that live in NP\mathcal{NP}, and for each one, identify what the certificate is.

The problem: Given a partially filled Sudoku grid, does a valid completion exist?

The certificate: A completely filled-in grid.

The verification: Check every row for repeated digits. Check every column. Check every 3Γ—33 \times 3 box. For an nΓ—nn \times n generalized grid, this takes O(n2)O(n^2) time, which is clearly polynomial.

Why finding is hard: The number of ways to fill in the blank cells grows exponentially with the grid size. For a standard 9Γ—99 \times 9 Sudoku, the space is still manageable with good techniques. For a generalized nΓ—nn \times n Sudoku, the problem is NP-Complete, meaning it is among the hardest problems in NP\mathcal{NP}. We will revisit that term properly in Phase 2.


The Second Definition of NP​

In Post 4, we met the nondeterministic Turing machine: a theoretical machine that, at every branch in a computation, somehow explores all possible paths simultaneously (or, equivalently, always guesses the right path). Post 4 showed that this machine is the mechanical model behind NP\mathcal{NP}.

The formal connection is this: NP\mathcal{NP} is exactly the set of problems solvable by a nondeterministic Turing machine in polynomial time.

A nondeterministic machine solves a problem in NP\mathcal{NP} in two stages. First, it nondeterministically guesses a certificate, instantaneously and perfectly, if one exists. Second, it deterministically verifies that the guessed certificate is correct, in polynomial time.

These two definitions say the same thing in different languages. The certificate definition is intuitive: here is the evidence, check it. The nondeterministic machine definition is operational: here is the kind of machine that can solve this class of problems. The N\mathcal{N} in NP\mathcal{NP} stands for Nondeterministic. NP\mathcal{NP} is Nondeterministic Polynomial time.

A Common Misconception

NP\mathcal{NP} does not stand for "Non-Polynomial." It stands for "Nondeterministic Polynomial." The problems in NP\mathcal{NP} are not defined by being hard to solve. They are defined by being easy to verify. Some of them, as we will see, are easy to solve too.


The Part That Felt Weird to Me​

When I worked through this properly, something bothered me that I didn't see addressed anywhere.

If NP\mathcal{NP} is the class of problems whose solutions are easy to verify, then what about sorting? Given a sorted list, I can verify it is sorted in O(n)O(n) time. Does sorting belong to NP\mathcal{NP}?

Yes. It does.

And binary search. And shortest paths. And primality testing. Every problem in P\mathcal{P} is also in NP\mathcal{NP}.

That felt wrong to me when I first saw it. P\mathcal{P} contains the "easy" problems. NP\mathcal{NP} contains the "hard" problems. Shouldn't they be separate?

They are not separate. P\mathcal{P} sits inside NP\mathcal{NP}:

PβŠ†NP\mathcal{P} \subseteq \mathcal{NP}

The reason is simple once you see it. If a problem can be solved in polynomial time, then it can certainly be verified in polynomial time too. The certificate is just the answer itself, and you verify it by solving the problem again. Solving is at least as powerful as verifying. So everything solvable fast is also verifiable fast.

NP\mathcal{NP} is not the class of hard problems. It is the class of problems with fast verification. Some of those problems are also easy to solve (they live in P\mathcal{P}). Others appear to require exponential time to solve, even though checking a given solution is fast.

The open question is whether those "others" actually require exponential time, or whether polynomial-time solutions for them exist and we just haven't found them yet. Which is exactly the P\mathcal{P} vs NP\mathcal{NP} question.


The Landscape So Far​

Here is the picture of what we know for certain:

PβŠ†NP\mathcal{P} \subseteq \mathcal{NP}

P\mathcal{P} sits inside NP\mathcal{NP}. This is proven. Every problem solvable in polynomial time is also verifiable in polynomial time.

What we do not know is whether P\mathcal{P} and NP\mathcal{NP} are actually the same set. The question mark in P=?NP\mathcal{P} \stackrel{?}{=} \mathcal{NP} is asking whether the containment is proper (they are truly different sets, with problems in NP\mathcal{NP} that are not in P\mathcal{P}) or whether the two sets are identical.

Almost every researcher in the field believes P≠NP\mathcal{P} \neq \mathcal{NP}. Sudoku, TSP, graph coloring, and hundreds of other problems feel fundamentally harder to solve than to verify. The intuition is strong. But intuition is not a proof, and no proof exists in either direction.

What We Have Established

NP\mathcal{NP} is the set of decision problems where every yes instance has a short certificate that can be verified in polynomial time. This is equivalent to the set of problems solvable by a nondeterministic Turing machine in polynomial time. P\mathcal{P} sits inside NP\mathcal{NP}, because anything solvable fast is also verifiable fast. The central question is whether anything else does too.


One More Thing Worth Sitting With​

The certificate definition of NP\mathcal{NP} only requires fast verification for yes instances. For no instances, nothing is guaranteed.

Consider Sudoku. If the puzzle has a valid solution, I can hand you a filled-in grid and you can check it. But if the puzzle has no solution, what certificate could I give you? There is no short piece of evidence that "proves" no solution exists, at least not obviously. You would have to check every possible completion and confirm all of them fail. That is a much harder thing to certify.

This asymmetry gives rise to the class coNP\mathcal{coNP}: the class of problems where no instances have short certificates. We will see that class properly in Phase 3. For now, just notice that NP\mathcal{NP} is not symmetric. Yes and no are not treated equally.


Up Next​

We now have both classes defined: P\mathcal{P} (solvable fast) and NP\mathcal{NP} (verifiable fast). We know PβŠ†NP\mathcal{P} \subseteq \mathcal{NP}. We know the question is whether that containment is strict.

Post 9 sits with that gap directly. We have used the words "solving" and "verifying" throughout this series, but we have not yet put them side by side and asked: how different are they, really? That is what the next post is about. A direct, honest comparison of what it actually costs to find an answer versus what it costs to check one someone hands you.

See you there. πŸš€


Azeon is an open documentation project. If something in this post is unclear or incorrect, open an issue on the GitHub repository linked in the sidebar.