Defining the Class $\mathcal{NP}$

The first thing my brain reached for when I heard the phrase "solutions can be verified quickly" was a Rubik's Cube.

Not solving one. Checking one.

When someone hands me a solved Rubik's Cube, I know within about two seconds whether it is actually solved. Six faces. Each face one solid color. Green: done. White: done. Red, blue, orange, yellow: done. The whole check takes no effort at all. It doesn't matter how the cube got to that state or how many moves it required. The verification is instant.

The solving is not instant. I practiced for months before I could do it without stopping to think. The gap between those two experiences, checking and solving, is exactly the gap this post is about.

$\mathcal{NP}$ is the class of problems where checking a solution is fast. This post builds that definition carefully, shows what it means with real examples, and explains why the class is more interesting, and stranger, than it first appears.

Starting From What We Already Know

Post 7 defined $\mathcal{P}$ : the set of decision problems solvable in polynomial time on a deterministic Turing machine. Sorting, binary search, shortest paths, primality testing: all of them live there. The defining feature of $\mathcal{P}$ is that you can find the answer quickly.

$\mathcal{NP}$ is different. The defining feature of $\mathcal{NP}$ is that you can check an answer quickly.

Decision Problems Again

Just like with $\mathcal{P}$ , we work with decision problems: questions whose answer is either yes or no. "Is this Sudoku solution correct?" is a decision problem. "Does this graph have a coloring that uses at most three colors?" is a decision problem. The yes/no framing keeps everything clean to reason about.

Let me be precise about what "checking an answer" means, because there is more structure to it than you might expect.

Showing Your Work

Think about how you would convince someone that a Sudoku puzzle has a valid solution.

You could describe your reasoning. You could explain how you eliminated possibilities. You could walk them through your logic step by step. Or you could just hand them the completed grid.

The completed grid is all they need. They check every row, every column, every box. If everything passes, the puzzle has a solution. Your entire argument collapses into one object: the answer itself. Hand it over, let them check, done.

This is the core idea behind $\mathcal{NP}$ . For every problem in this class, a yes answer comes with a piece of evidence you can hand over. That evidence has two properties: it is short (not longer than the problem itself, roughly speaking), and checking it is fast (polynomial time). You do not need to explain how you found it. You just produce it, and verification takes care of the rest.

Computer scientists call this piece of evidence a certificate. The word sounds formal, but it is just that: the thing you hand over to prove the answer is yes. A completed Sudoku grid is a certificate. A list of moves that solves a scrambled Rubik's Cube is a certificate. A coloring of a graph that uses only three colors is a certificate.

The Formal Definition of

\mathcal{NP}

$\mathcal{NP}$ is the set of all decision problems where, for every yes instance, there exists a certificate of polynomial length that can be verified in polynomial time by a deterministic Turing machine.

Three things to notice: the certificate must be short (polynomial in the input size), the verification must be fast (polynomial time), and the definition only requires this for yes instances. More on that last point shortly.

The Rubik's Cube in More Detail

Let me slow down on the Rubik's Cube because it is the example that made this feel real for me.

The question we can ask in NP terms is: "Given a scrambled Rubik's Cube, does a solution exist?" The certificate for a yes answer is just the sequence of moves that solves it. I hand you the move sequence. You apply those moves to the scrambled cube and check whether the result is fully solved. Checking that takes time proportional to the length of the solution sequence, which is at most 20 moves for any standard cube (this was proven in 2010). Polynomial in the input size. Fast.

Notice what this does not give me. The definition says a certificate exists and can be verified quickly. It does not say finding that certificate is fast. For the Rubik's Cube, finding the optimal 20-move solution from a given scramble is genuinely hard. Checking it once you have it is easy. That gap is exactly the point.

This is the same asymmetry I noticed when I was learning to solve the cube. Checking someone else's completed cube: two seconds. Learning to produce the solution myself: months of practice.

Problems Inside NP

Let's look at several real problems that live in $\mathcal{NP}$ , and for each one, identify what the certificate is.

🧩 Sudoku
🗺️ Traveling Salesman
🎨 Graph Coloring
➕ Subset Sum

The problem: Given a partially filled Sudoku grid, does a valid completion exist?

The certificate: A completely filled-in grid.

The verification: Check every row for repeated digits. Check every column. Check every $3 \times 3$ box. For an $n \times n$ generalized grid, this takes $O(n^2)$ time, which is clearly polynomial.

Why finding is hard: The number of ways to fill in the blank cells grows exponentially with the grid size. For a standard $9 \times 9$ Sudoku, the space is still manageable with good techniques. For a generalized $n \times n$ Sudoku, the problem is NP-Complete, meaning it is among the hardest problems in $\mathcal{NP}$ . We will revisit that term properly in Phase 2.

The Second Definition of NP

In Post 4, we met the nondeterministic Turing machine: a theoretical machine that, at every branch in a computation, somehow explores all possible paths simultaneously (or, equivalently, always guesses the right path). Post 4 showed that this machine is the mechanical model behind $\mathcal{NP}$ .

The formal connection is this: $\mathcal{NP}$ is exactly the set of problems solvable by a nondeterministic Turing machine in polynomial time.

A nondeterministic machine solves a problem in $\mathcal{NP}$ in two stages. First, it nondeterministically guesses a certificate, instantaneously and perfectly, if one exists. Second, it deterministically verifies that the guessed certificate is correct, in polynomial time.

These two definitions say the same thing in different languages. The certificate definition is intuitive: here is the evidence, check it. The nondeterministic machine definition is operational: here is the kind of machine that can solve this class of problems. The $\mathcal{N}$ in $\mathcal{NP}$ stands for Nondeterministic. $\mathcal{NP}$ is Nondeterministic Polynomial time.

A Common Misconception

$\mathcal{NP}$ does not stand for "Non-Polynomial." It stands for "Nondeterministic Polynomial." The problems in $\mathcal{NP}$ are not defined by being hard to solve. They are defined by being easy to verify. Some of them, as we will see, are easy to solve too.

The Part That Felt Weird to Me

When I worked through this properly, something bothered me that I didn't see addressed anywhere.

If $\mathcal{NP}$ is the class of problems whose solutions are easy to verify, then what about sorting? Given a sorted list, I can verify it is sorted in $O(n)$ time. Does sorting belong to $\mathcal{NP}$ ?

Yes. It does.

And binary search. And shortest paths. And primality testing. Every problem in $\mathcal{P}$ is also in $\mathcal{NP}$ .

That felt wrong to me when I first saw it. $\mathcal{P}$ contains the "easy" problems. $\mathcal{NP}$ contains the "hard" problems. Shouldn't they be separate?

They are not separate. $\mathcal{P}$ sits inside $\mathcal{NP}$ :

$\mathcal{P} \subseteq \mathcal{NP}$

The reason is simple once you see it. If a problem can be solved in polynomial time, then it can certainly be verified in polynomial time too. The certificate is just the answer itself, and you verify it by solving the problem again. Solving is at least as powerful as verifying. So everything solvable fast is also verifiable fast.

$\mathcal{NP}$ is not the class of hard problems. It is the class of problems with fast verification. Some of those problems are also easy to solve (they live in $\mathcal{P}$ ). Others appear to require exponential time to solve, even though checking a given solution is fast.

The open question is whether those "others" actually require exponential time, or whether polynomial-time solutions for them exist and we just haven't found them yet. Which is exactly the $\mathcal{P}$ vs $\mathcal{NP}$ question.

The Landscape So Far

Here is the picture of what we know for certain:

$\mathcal{P} \subseteq \mathcal{NP}$

$\mathcal{P}$ sits inside $\mathcal{NP}$ . This is proven. Every problem solvable in polynomial time is also verifiable in polynomial time.

What we do not know is whether $\mathcal{P}$ and $\mathcal{NP}$ are actually the same set. The question mark in $\mathcal{P} \stackrel{?}{=} \mathcal{NP}$ is asking whether the containment is proper (they are truly different sets, with problems in $\mathcal{NP}$ that are not in $\mathcal{P}$ ) or whether the two sets are identical.

Almost every researcher in the field believes $\mathcal{P} \neq \mathcal{NP}$ . Sudoku, TSP, graph coloring, and hundreds of other problems feel fundamentally harder to solve than to verify. The intuition is strong. But intuition is not a proof, and no proof exists in either direction.

What We Have Established

$\mathcal{NP}$ is the set of decision problems where every yes instance has a short certificate that can be verified in polynomial time. This is equivalent to the set of problems solvable by a nondeterministic Turing machine in polynomial time. $\mathcal{P}$ sits inside $\mathcal{NP}$ , because anything solvable fast is also verifiable fast. The central question is whether anything else does too.

One More Thing Worth Sitting With

The certificate definition of $\mathcal{NP}$ only requires fast verification for yes instances. For no instances, nothing is guaranteed.

Consider Sudoku. If the puzzle has a valid solution, I can hand you a filled-in grid and you can check it. But if the puzzle has no solution, what certificate could I give you? There is no short piece of evidence that "proves" no solution exists, at least not obviously. You would have to check every possible completion and confirm all of them fail. That is a much harder thing to certify.

This asymmetry gives rise to the class $\mathcal{coNP}$ : the class of problems where no instances have short certificates. We will see that class properly in Phase 3. For now, just notice that $\mathcal{NP}$ is not symmetric. Yes and no are not treated equally.

Up Next

We now have both classes defined: $\mathcal{P}$ (solvable fast) and $\mathcal{NP}$ (verifiable fast). We know $\mathcal{P} \subseteq \mathcal{NP}$ . We know the question is whether that containment is strict.

Post 9 sits with that gap directly. We have used the words "solving" and "verifying" throughout this series, but we have not yet put them side by side and asked: how different are they, really? That is what the next post is about. A direct, honest comparison of what it actually costs to find an answer versus what it costs to check one someone hands you.

See you there. 🚀

Azeon is an open documentation project. If something in this post is unclear or incorrect, open an issue on the GitHub repository linked in the sidebar.

Starting From What We Already Know​

Showing Your Work​

The Rubik's Cube in More Detail​

Problems Inside NP​

The Second Definition of NP​

The Part That Felt Weird to Me​

The Landscape So Far​

One More Thing Worth Sitting With​

Up Next​