What Threads Are and How To Use Them Correctly, Part 1
Aside from "quite cool" and "all the time," that is
This is a post about software engineering, which I don't typically write about.
I do not intend to become a tech Substacker. I intend to continue to write about whatever the hell I want: flossing, absurdist fiction, social dilemmas, bad fiction, and, sometimes, accidental philosophy.
Today, I want to write about programming.
0. Processors don’t care about threads
My friend and I had an argument the other day — about what exactly threads are — and, by the way, it’s an argument I won. But when we challenged one another to provide an excellent, useful explanation of threads to those who work with them but don’t know what they really are, I will confess that his was a million times better than mine.
But I know more about them! Seth! You idiot! And I’m going to wield my vast platform to show the whole world the truth.
For the completely uninformed, here’s a basic explanation of threads:
Threads are a way to run multiple bits of code (multiple programs) at the same time, or at least, what feels like the same time. Processors are very fast, so they can constantly switch between many different threads (or programs), making it seem like they’re all being executed at the same time. Processors can also have multiple cores, letting them run many more threads, and offering some real, true concurrency.
Basically, as a programmer, threads let me run multiple different bits of code at the same time, independently of each other (or, if I want, not).
- me, now
But today we’re interested in what they actually are.
The first thing to understand about threads is that processors do not support them.1 Processors have multiple cores, the intention of which is to enable more threads to run, but those cores all just run whatever program is shoved into them — each core is essentially just a single-core processor, doing single-core processor things.
And single-core processors2, on which you can certainly run many threads, do not concern themselves with threads as a CPU-level concept.
Threads are entirely implemented in code. OS code, to be specific. There is actual normal code, written in C (probably), in your Operating System, which is entirely and completely responsible for multithreading.
This may seem a trivial point, but I think it’s an important one — the goal of this series is to demystify threads, and to do so, I need to demonstrate that they’re things you or I could easily build. Assuming you or I were good programmers. If your name is Seth, you aren’t.
So, to that end, I’m going to teach you a few things about how processors work which, when combined in a certain way3, allow us to implement threading.
1. Some stuff on how processors work
To make things easy, we’re going to think about a simple, single-core, 48MHz4 processor. A simple lil’ guy, running on a circuit board, doing his best to run the code you gave him. We’re not running Windows or Linux here (though the same principles apply).
Important Note: I am not trying to teach you about how to build a processor. I’m trying to teach you about threads. So I’m going to simplify many things here. If you object to the way I’ve presented anything in this section, I will nerd emoji you.
a. We’re electrocuting a crystal and using its spasms to make a clock
A processor can’t do anything without its clock. You’ve heard this term: clock cycles; instructions-per-clock; clock frequency. What do these things mean?
It means that we’ve taken a crystal which has a certain property — when you run a constant DC voltage through it, it produces perfectly consistent pulses5 of electricity on the other end — and then abused the hell out of it. Constant DC voltage in → constant pulsed output out.
We call this a “clock,” because the most intuitive thing we can do with this is keep track of time. But we also use it to run the processor at a fixed rate.
We just hook the crystal’s output up to the input power source of the processor’s thinky bits. Every time the crystal pulses, a surge of electricity courses through the processor’s wiring, being directed every which way by the internal transistors that allow us to perform all kinds of logic.
That’s literally how processors work at a deep level — a surge of electricity courses through it, causing it to perform one “cycle”: read the next instruction; execute it; move the Program Counter forward one so that, during the next cycle, we’ll read the next instruction.
All done by building dizzying layers of abstraction over electricity coursing through wires and transistors, taking different paths as the transistors demand for each cycle; flipping various transistors6 such that the next cycle will play out differently.
Observation: Processors are just state machines. The crystal sends out a pulse; the processor, powered by said pulse, performs a cycle; the pulse ends and the processor doesn’t do anything until the next one.
One pulse, one instruction7. There is no true concurrency here. A processor can only perform one instruction per cycle; threads are how we determine the best instructions to spend time on.
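To see how mechanical this loop is, here’s a toy sketch in C of a one-instruction-per-cycle state machine. The instruction set, the encoding, and every name here are invented for illustration — real ISAs are vastly more complicated — but the shape is the same: fetch, advance, execute, die until the next pulse.

```c
#include <stdint.h>

/* A toy, made-up instruction set — illustrative only, not a real ISA. */
enum { OP_HALT = 0, OP_ADD = 1, OP_LOAD = 2 };

typedef struct {
    uint32_t pc;       /* Program Counter: index of the next instruction */
    uint32_t regs[4];  /* a few scratch registers (more on those shortly) */
    int halted;
} Cpu;

/* One clock pulse = one cycle: fetch the instruction PC points at,
   advance PC, then execute. Nothing here knows what a "thread" is. */
void cycle(Cpu *cpu, const uint32_t *memory) {
    uint32_t instr = memory[cpu->pc];  /* fetch */
    uint32_t op = instr >> 24;
    uint32_t a  = (instr >> 16) & 0xFF;
    uint32_t b  = (instr >> 8)  & 0xFF;
    uint32_t c  = instr & 0xFF;
    cpu->pc++;                         /* point at the next instruction */
    switch (op) {                      /* execute */
    case OP_LOAD: cpu->regs[a] = b; break;                    /* regs[a] = b */
    case OP_ADD:  cpu->regs[a] = cpu->regs[b] + cpu->regs[c]; break;
    case OP_HALT: cpu->halted = 1; break;
    }
}
```

Each call to `cycle` is one crystal pulse. Run it in a loop over a little program and you have a (very bad) processor.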
b. Memory Register? I barely know her!
The next thing to understand about processors, in our journey to understand threads, is registers. These funky fellas are basically just slots within the CPU itself that can hold data, usually 64-bit integers8. Some of them just let you hold data; others do special stuff when you read or write to them.
Computer programs are just a series of instructions that the CPU should execute. The possible instructions you can execute are vast and varied:
Add/multiply/subtract/divide these two numbers
Read from some address in RAM
Write to some address in RAM
Reboot the processor
All sorts of other things
Most of these instructions take in some input (we’ll call the input “arguments” because it’s a fun word) and return some output (we’ll call it “output” because I have run out of fun words). Some don’t, but I don’t care about those.
These arguments are typically provided in the form of registers.
For example, you might add two numbers together by executing the following instructions9:
mov rbx, 50 ; Move the value 50 into register RBX
mov rax, 70 ; Move the value 70 into register RAX
add rax, rbx ; Add RBX and RAX together, putting the result in RAX
Here, RAX and RBX are both “general-purpose registers”: they can hold any integer value. These registers are used to provide input to instructions, as well as to hold the output of instructions. Nifty!
Besides the few general-purpose registers available to our code, there are also shitloads of special-purpose registers.
But we only really need to care about two right now:
PC (Program Counter)
This register holds a value that the processor relies heavily upon: the address of the next instruction10 to execute.
SP (Stack Pointer)
This register holds a value that points to the top of the current stack.
A program needs a stack to function; since threads are essentially all running different “programs”, they all need their own stack.
These two registers are the main ones that we’re going to manipulate in order to implement threading.
Observation: Taken together, the registers (all of them) provide a “snapshot” of the processor’s current execution state. The processor is not alive in-between pulses from the crystal; it essentially dies and comes back to life constantly. It relies entirely on the values contained in these registers to “get back to what it was doing” each time a cycle occurs.
Therefore, if we can save a snapshot of all of the relevant registers, and load them back in later, we can put the processor into exactly the same state as it was in when we took that snapshot — it’ll proceed like it had never been interrupted. It’s like save states in an emulator.
And threads are just save states coupled with a condition11 upon which they will be resumed.
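To make the “save state” idea concrete, here’s a minimal sketch of what such a snapshot might look like in C. The struct layout and names are hypothetical — a real OS keeps something similar inside each thread’s control block, and the actual save/restore is a few lines of assembly touching real registers, not a struct copy.

```c
#include <stdint.h>

/* Hypothetical snapshot of the registers that matter. Layout is illustrative. */
typedef struct {
    uint64_t pc;        /* where this thread will resume executing */
    uint64_t sp;        /* top of this thread's private stack */
    uint64_t gpr[16];   /* general-purpose registers (RAX..R15 on AMD64) */
    uint64_t flags;     /* condition flags, so comparisons survive the swap */
} Snapshot;

/* Stand-in for the live register file; a kernel reads/writes the real thing. */
static Snapshot cpu_registers;

/* "Take a save state" / "load a save state". */
void save_snapshot(Snapshot *out)      { *out = cpu_registers; }
void load_snapshot(const Snapshot *in) { cpu_registers = *in; }
```

Save, let the CPU wander off and do something else entirely, load — and execution picks up exactly where the snapshot left it.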
c. Interrupts
The final piece of the puzzle for implementing threads is interrupts. Interrupts are a feature of almost any processor whereby the code can tell the processor, “when condition X occurs, no matter what code is executing at the current moment, immediately execute [some function] instead.”12
We can use this for some nifty things:
When a button is pressed and the CPU detects that the pin it’s connected to is high, call some button-handling function immediately
Every 128 seconds, call some function
That’s basically the whole category of things you can do with them actually.
But this is useful! Using interrupts, our code can effectively register “callbacks” that the CPU itself will ensure get called. We can respond to stimuli immediately; we can guarantee that some bit of code will get called every so often for sure.
Question: If processors are just running code endlessly, with no support for threads at the CPU level, how can threading work? What if one thread gets into an infinite loop — how does the CPU know to “switch threads”? How do we bring the OS back into play?
Observation: Interrupts can be configured to run a given piece of code at a fixed interval, interrupting whatever code was currently executing. An example of a piece of code one might want to run in a recurring interrupt is a “Thread Scheduler.”
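Putting the pieces together, the body of such a scheduler could be as small as this round-robin sketch. The two-thread setup and all the names are my invention, and a real scheduler also tracks thread states, priorities, and wake conditions — but the save/pick/load skeleton is genuinely the core of it.

```c
#include <stdint.h>

#define NUM_THREADS 2

/* Trimmed-down register snapshot: just the two registers we care about. */
typedef struct { uint64_t pc, sp; } Context;

static Context threads[NUM_THREADS];  /* one saved snapshot per thread */
static Context cpu;                   /* stand-in for the live registers */
static int current = 0;

/* Hypothetical handler the OS wires to a recurring timer interrupt:
   save the running thread's snapshot, pick the next thread round-robin,
   and load its snapshot so the CPU resumes where *it* left off. */
void scheduler_tick(void) {
    threads[current] = cpu;                  /* save state */
    current = (current + 1) % NUM_THREADS;   /* pick next thread */
    cpu = threads[current];                  /* load state */
}
```

Note that the interrupted thread never consented to any of this — the timer fires, the handler runs, and the thread is frozen mid-stride until its turn comes around again.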
2. Be Continued
If you’ve been following along, I hope you’ve already had an “a-ha” moment and feel like you’ve filled in some keystone in your knowledge. If not, no worries: there’s still part two.
If you already knew all of this: leave an angry comment or otherwise punish me with your engagement!
Key Takeaways
Processors don’t manage threads at all
Any given core is really just churning through code, one instruction at a time.
Threads are fully implemented by the Operating System
Threads are how the OS prioritizes and keeps track of work
When you use them well, that is
We’ll expand on this in Part II
Threads are just snapshots of a CPU’s current state
(Plus some other stuff we’ll talk about, but that doesn’t directly relate to the “snapshot” aspect.)
Snapshots can be resumed to put the CPU right back where it was, doing what it was doing, when the snapshot was taken.
These snapshots can be created and resumed by C code that you could write! In your very own home!
Interrupts allow the Operating System to take control of execution
This is what allows the OS to regularly decide which thread is running, whether the threads like it or not.
Next Time on PGB
In the next post, we’ll dive into coroutines and how threads are a natural evolution thereof. We’ll inspect the actual data structure used by Mbed OS to define a thread. And we’ll talk about how to use threads in the way God intended.
I do want to leave you with something more about threads, so here’s the takeaway I am leading towards and which, by the end of this series, you will intuitively grasp:
Threads are ideally not about doing “background work”; they are not about sharing processor time. Threads are a way to ensure the CPU is always doing the most valuable thing for your program, without having to manually yield control between coroutines in your application logic.
Well-designed multithreaded systems use OS primitives such as message queues, semaphores, sockets, etc.
Poorly-designed multithreaded systems use sleep().13
- Me, in the next blog post
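As a tiny preview of that claim, here’s a sketch using a real POSIX semaphore. The worker blocks in `sem_wait` until there is actually something to do, so the OS can spend those cycles on whatever is most valuable; a `sleep()` loop would burn wakeups guessing. The work-item setup is invented for illustration.

```c
#include <pthread.h>
#include <semaphore.h>

/* Invented example state: one "work item," guarded by a semaphore. */
static sem_t work_ready;
static int work_result = 0;

static void *worker(void *arg) {
    (void)arg;
    /* Blocks until another thread posts — no sleep(), no polling loop.
       The OS simply doesn't schedule this thread until the condition holds. */
    sem_wait(&work_ready);
    work_result = 42;  /* "do the work" */
    return NULL;
}
```

The thread that produces work calls `sem_post(&work_ready)`, and the worker wakes at exactly the right moment — the “condition upon which it will be resumed” from earlier, made flesh.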
Thank you!
Okay, look, in 2025, yeah, processors have instructions to make threading implementations more efficient. Hyperthreading is misleadingly named. But despite this, threading is still implemented in the OS code! The processor doesn’t directly manage threads itself; threads are still basically just clever manipulation of the CPU by the OS.
Multi-core processors are not an exception, I’m just using language in a certain rhythmic way here. It’s not like multi-core processors have a bunch of threading-specific instructions; rather, my point is that they compose down to just a collection of single-core processors. The only real processor is a single-core processor.
You might feel a bit tricked — it seems like I said “processors don’t care about threads” and then said “let’s look at the features processors have for threads!”
But that’s not the case. The features of processors that I’m going to talk about have nothing to do with threads, truly! They’re all far more general than that.
It doesn’t matter what the actual frequency is or what specific processor we’re talking about. The important thing is: this is a cheap, simple processor, which you’re going to flash with a single binary and it’ll just run that binary on boot forever. Multiple programs running in parallel? Nothing like that here.
As a matter of fact, these pulses are not at all perfectly consistent and I just lied to you! Every crystal of this kind has some error, meaning that over time it will “drift” from other crystals. This is called clock drift and is the reason every digital clock that isn’t hooked up to the internet to constantly resync itself will necessarily run fast or slow over a long enough time horizon.
Latches
I see you, nerd. I see you about to correct me on this. “Instructions-per-clock!” you say. Well you know what I say? I say this: 🤓
This is a lie.
The eagle-eyed amongst you might have noticed that I'm using AMD64 syntax for an ARM chip. I'm twenty steps ahead of you at all times.
Code is stored in your RAM; just a buncha’ bytes. Whenever the processor is brought to life by a spasm of the crystal, it looks at the PC register, pulls the next instruction from RAM at the address pointed to by PC, and executes it. It then typically increments PC before ending the cycle. Or the code can change PC itself, if it wants (to jump around and shit).
What condition, you ask? Well, one might be, “until 100 milliseconds have passed since I went to sleep”. Another might be “until the mutex or semaphore I am waiting on becomes free.”
Realistically, what this means is “when pin X is high/is low/toggles, set PC to address Y”.
The worst-designed programs of any kind blindly follow dogmatic rules laid out by bloggers
🤓