Live cohort · 10 weeks · Starting June 2026

If your model stopped learning tomorrow, would you know where in the network to look?

Neural networks.
Understood, not
just operated.

Backpropagation is a leaky abstraction. Every engineer who skips the math eventually hits a wall they cannot reason past. This course builds the foundation from scratch — so that wall never stops you.

Reserve your seat → See the curriculum

30 seats per cohort · 10 weeks live · 0 black boxes

z^(l) = W^(l) a^(l−1) + b^(l)
a^(l) = g(z^(l))
δ^(l) = (W^(l+1))^T δ^(l+1) ⊙ g′(z^(l))
∂L/∂W^(l) = δ^(l) (a^(l−1))^T
∂L/∂b^(l) = δ^(l)
W ← W − η · ∂L/∂W

The problem

Most engineers use neural networks.
Few actually understand them.

Courses that skip the derivation

Andrew Ng's courses are excellent. fast.ai is practical. But both make the same trade-off: they skip the hardest derivations to keep you moving. That trade-off leaves a gap. When something breaks at the gradient level, you are guessing, not reasoning.

Frameworks that hide the mechanism

PyTorch computes gradients automatically. That is its strength, and your weakness if you have never derived them yourself. Autograd is not understanding. It is the absence of the need to understand, until suddenly you need to.

The wall everyone eventually hits

Dead ReLUs. Vanishing gradients. A model that trains fine but fails silently. Engineers who understand backpropagation see these coming. Those who don't can spend weeks wondering why their loss curve went flat.
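A minimal NumPy sketch of the first two failure modes, under deliberately simple assumptions: a chain of scalar sigmoid units with weights fixed at 1 (for vanishing gradients), and a single ReLU unit whose bias pushes the pre-activation below zero for every input (for a dead neuron). The depths, bias values, and inputs are illustrative, not taken from a real model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # peaks at 0.25 when z = 0

def chain_gradient(depth, z=0.0, w=1.0):
    """d(output)/d(input) through `depth` stacked scalar sigmoid units.

    Each layer contributes a factor w * sigmoid'(z). With w = 1 that factor
    is at most 0.25, so the product shrinks geometrically with depth.
    For simplicity, every layer is evaluated at the same pre-activation z.
    """
    return (w * sigmoid_grad(z)) ** depth

for depth in (1, 5, 10, 20):
    print(f"depth {depth:2d}: gradient factor {chain_gradient(depth):.3e}")

# Dead ReLU: if w*x + b is negative for every input, ReLU'(z) is 0 everywhere,
# so the parameter gradients are 0 and gradient descent never moves them.
def relu_grads(x, w, b):
    z = w * x + b
    mask = (z > 0).astype(float)  # ReLU'(z): 1 where z > 0, else 0
    dw = np.sum(mask * x)         # dL/dw, assuming dL/da = 1 for each sample
    db = np.sum(mask)             # dL/db under the same assumption
    return float(dw), float(db)

x = np.linspace(-1.0, 1.0, 5)
print("healthy unit:", relu_grads(x, w=1.0, b=0.5))
print("dead unit:   ", relu_grads(x, w=1.0, b=-5.0))
```

The sigmoid chain is the simplest setting in which the shrinking product is visible; real networks add weight matrices and varying pre-activations, but the multiplicative structure is the same.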

Intuition without foundation

You can explain what attention does. But can you explain why the dot product of queries and keys produces meaningful similarity scores? Real understanding goes one level deeper than most courses are willing to go.
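One way to see the answer: q · k = |q| |k| cos θ, so a key pointing the same way as the query scores high, an orthogonal key scores zero, and an opposite key scores negative. A softmax then turns those scores into weights. The 2-D vectors below are made-up examples, and the 1/√d scaling follows the standard scaled dot-product formulation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

q = np.array([1.0, 0.0])       # hypothetical query
keys = np.array([
    [2.0, 0.0],                # same direction as q
    [0.0, 2.0],                # orthogonal to q
    [-2.0, 0.0],               # opposite direction
])

d = q.shape[0]
scores = keys @ q / np.sqrt(d)  # scaled dot products, one per key
weights = softmax(scores)

# The dot product equals |q||k|cos(theta), so alignment drives the score.
cosines = keys @ q / (np.linalg.norm(keys, axis=1) * np.linalg.norm(q))

print("scores: ", scores)
print("cosines:", cosines)
print("weights:", weights)
```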

What changes

You will implement backprop in NumPy before you touch a framework.

Derive every equation by hand. Not follow along — actually derive, on paper, until the chain rule stops feeling abstract
Understand why neurons die, why gradients vanish, and exactly what happens inside your model when they do
Build the full forward and backward pass in raw NumPy. Watch it converge. Know every number in that loop
Understand attention not as a trick but as a mathematical consequence of what came before it
Leave with the kind of knowledge that does not expire when the next framework comes out
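A minimal sketch of that loop, assuming a 2-8-1 network with a tanh hidden layer, a sigmoid output, binary cross-entropy, plain full-batch gradient descent, and the XOR dataset. The sizes, learning rate, seed, and step count are illustrative choices, not the course's actual exercise. Examples are stored as rows here, so the header's δ (a)ᵀ appears as aᵀ δ.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR is not linearly separable, so a hidden layer is required.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

n_hidden = 8                                   # illustrative width
W1 = rng.normal(0.0, 1.0, size=(2, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 1.0, size=(n_hidden, 1))
b2 = np.zeros(1)
lr = 1.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

losses = []
for _ in range(5000):
    # Forward pass: z = aW + b, a = g(z), one row per example.
    Z1 = X @ W1 + b1
    A1 = np.tanh(Z1)
    Z2 = A1 @ W2 + b2
    A2 = sigmoid(Z2)

    # Binary cross-entropy averaged over the batch (eps avoids log(0)).
    eps = 1e-12
    loss = -np.mean(Y * np.log(A2 + eps) + (1 - Y) * np.log(1 - A2 + eps))
    losses.append(loss)

    # Backward pass. For sigmoid + cross-entropy the output error
    # simplifies to (A2 - Y); dividing by the batch size gives the mean.
    d2 = (A2 - Y) / len(X)
    dW2 = A1.T @ d2
    db2 = d2.sum(axis=0)
    d1 = (d2 @ W2.T) * (1.0 - A1 ** 2)         # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ d1
    db1 = d1.sum(axis=0)

    # Gradient descent: W <- W - lr * dL/dW
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
print("predictions:", A2.ravel().round(3))
```

Every number in the loop is visible: the forward activations, the error signals `d1` and `d2`, and each gradient before it is applied.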

Other courses vs. this

Elsewhere	Here
Derivation in the appendix	Derivation is the session
Framework hides the gradient	You compute the gradient yourself
"Trust the autograd"	You are the autograd, in NumPy
Intuition without the proof	Proof first, intuition follows
15% course completion	Live. 30 people. Accountable
Knowledge expires with the API	Math does not have deprecation notices

Curriculum

Ten weeks. Every equation earned.

What is a neural network?

Not the Wikipedia answer. We start from a single neuron, understand what it is computing and why, then stack layers until the structure of a network stops feeling arbitrary and starts feeling inevitable.

2 sessions

Linear regression & logistic regression

These are the smallest possible neural networks. We derive each loss function, write gradient descent from scratch, and build an intuition for learning itself before a single hidden layer appears. Everything that follows builds on this.
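A minimal sketch of gradient descent for linear regression, assuming a mean squared error loss L = (1/n) Σ (w·xᵢ + b − yᵢ)². Differentiating by hand gives ∂L/∂w = (2/n) Σ (ŷᵢ − yᵢ) xᵢ and ∂L/∂b = (2/n) Σ (ŷᵢ − yᵢ). The synthetic data (true slope 3, intercept −1), learning rate, and step count are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data from y = 3x - 1 plus a little noise (illustrative values).
x = rng.uniform(-1.0, 1.0, size=100)
y = 3.0 * x - 1.0 + rng.normal(0.0, 0.1, size=100)

w, b = 0.0, 0.0
lr = 0.1

for _ in range(500):
    y_hat = w * x + b                # a single linear neuron, no activation
    error = y_hat - y
    loss = np.mean(error ** 2)       # mean squared error

    # Gradients derived by hand from the MSE definition.
    dw = 2.0 * np.mean(error * x)
    db = 2.0 * np.mean(error)

    w -= lr * dw
    b -= lr * db

print(f"w = {w:.3f}, b = {b:.3f}, loss = {loss:.4f}")
```

Logistic regression uses the same loop with ŷ = σ(w·x + b) and a cross-entropy loss; for that pairing the gradient again takes the form (ŷ − y) times the input.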

3 sessions

Forward pass & backward pass

We draw the computational graph. We trace exactly how a number flows forward through the network and how an error signal flows back through the same graph. The chain rule stops being a formula and becomes a picture.
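To make the picture concrete, here is a hand-traced graph for a single sigmoid neuron with squared error, L = (σ(w·x + b) − y)². The forward pass computes each intermediate node; the backward pass multiplies local derivatives from the loss back to the parameters, which is all the chain rule does. The input values are arbitrary, and the finite-difference comparison at the end is an added sanity check, not part of the graph itself.

```python
import math

def forward(w, b, x, y):
    # Each line is one node in the computational graph.
    z = w * x + b                   # linear node
    a = 1.0 / (1.0 + math.exp(-z))  # sigmoid node
    d = a - y                       # difference node
    L = d ** 2                      # squared-error node
    return z, a, d, L

def backward(w, b, x, y):
    z, a, d, L = forward(w, b, x, y)
    # Walk the graph in reverse, multiplying local derivatives.
    dL_dd = 2.0 * d                 # dL/dd
    dL_da = dL_dd * 1.0             # dd/da = 1
    dL_dz = dL_da * a * (1.0 - a)   # da/dz = sigmoid'(z) = a(1 - a)
    dL_dw = dL_dz * x               # dz/dw = x
    dL_db = dL_dz * 1.0             # dz/db = 1
    return L, dL_dw, dL_db

w, b, x, y = 0.5, -0.2, 1.5, 1.0    # arbitrary example values
L, dL_dw, dL_db = backward(w, b, x, y)

# Sanity check: central finite difference on w.
h = 1e-6
numeric_dw = (forward(w + h, b, x, y)[3] - forward(w - h, b, x, y)[3]) / (2 * h)
print(f"L = {L:.6f}, dL/dw = {dL_dw:.6f} (numeric {numeric_dw:.6f}), dL/db = {dL_db:.6f}")
```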

2 sessions

Derivation of backpropagation

The centrepiece of the course. We derive the four fundamental equations of backprop line by line: no steps skipped, no "it can be shown that." Then we implement them in NumPy. Then we watch the network train. This is the session most engineers wish they had taken years earlier.
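As a preview of where the derivation lands, here is a sketch of the four equations in the same column-vector notation as the header, assuming sigmoid activations and a squared-error loss L = ½‖a^(L) − y‖², for which the output error is δ^(L) = (a^(L) − y) ⊙ g′(z^(L)). Comparing the result against finite differences is a standard way to confirm a hand-written backward pass. Layer widths and the seed are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def loss(Ws, bs, x, y):
    a = x
    for W, b in zip(Ws, bs):
        a = sigmoid(W @ a + b)
    return 0.5 * np.sum((a - y) ** 2)

def backprop(Ws, bs, x, y):
    # Forward pass: store z and a for every layer (activations[0] is the input).
    activations, zs = [x], []
    for W, b in zip(Ws, bs):
        z = W @ activations[-1] + b
        zs.append(z)
        activations.append(sigmoid(z))

    grads_W = [None] * len(Ws)
    grads_b = [None] * len(bs)

    # (1) Output error: delta = (a_out - y) * g'(z_out)
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    for l in range(len(Ws) - 1, -1, -1):
        # (3) dL/db = delta;  (4) dL/dW = delta (a_prev)^T
        grads_b[l] = delta
        grads_W[l] = delta @ activations[l].T
        if l > 0:
            # (2) Propagate back: delta_prev = W^T delta * g'(z_prev)
            delta = (Ws[l].T @ delta) * sigmoid_prime(zs[l - 1])
    return grads_W, grads_b

rng = np.random.default_rng(1)
sizes = [3, 4, 2]                                  # arbitrary layer widths
Ws = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
bs = [rng.normal(size=(m, 1)) for m in sizes[1:]]
x = rng.normal(size=(3, 1))
y = rng.normal(size=(2, 1))

grads_W, grads_b = backprop(Ws, bs, x, y)

# Central finite differences on every weight entry.
h = 1e-6
max_err = 0.0
for l, W in enumerate(Ws):
    for idx in np.ndindex(W.shape):
        orig = W[idx]
        W[idx] = orig + h
        up = loss(Ws, bs, x, y)
        W[idx] = orig - h
        down = loss(Ws, bs, x, y)
        W[idx] = orig                              # restore the weight
        numeric = (up - down) / (2 * h)
        max_err = max(max_err, abs(numeric - grads_W[l][idx]))

print(f"max |analytic - numeric| over all weights: {max_err:.2e}")
```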

3 sessions

Recurrent neural networks

Sequences introduce a new problem: gradients must travel back through time. We derive backpropagation through time, understand exactly why gradients vanish over long sequences, and see how LSTMs mitigate the problem structurally: not as a magic trick but as a designed response to a specific mathematical failure.
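A small numerical sketch of the failure, assuming a vanilla tanh RNN h_t = tanh(W h_{t−1}) with the input terms set to zero and a random recurrent matrix rescaled to spectral norm 0.9. Backpropagation through time multiplies one Jacobian, diag(1 − h_t²) W, per step, so the norm of ∂h_T/∂h_0 is bounded by 0.9^T. For contrast, along the LSTM's direct cell-state path (c_t = f_t ⊙ c_{t−1} + i_t ⊙ g_t) each step contributes only the forget gate f_t, which can stay close to 1. The hidden size, scale, and gate value are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16                                   # hidden size (illustrative)

# Random recurrent matrix rescaled to spectral norm 0.9.
W = rng.normal(size=(n, n))
W *= 0.9 / np.linalg.norm(W, 2)

def bptt_jacobian_norm(T):
    """Spectral norm of dh_T/dh_0 for h_t = tanh(W h_{t-1}), inputs set to zero."""
    h = rng.normal(size=n)
    J = np.eye(n)
    for _ in range(T):
        h = np.tanh(W @ h)
        # Chain rule through one step: dh_t/dh_{t-1} = diag(1 - h_t^2) W
        J = (np.diag(1.0 - h ** 2) @ W) @ J
    return np.linalg.norm(J, 2)

for T in (1, 10, 50):
    print(f"T = {T:2d}: ||dh_T/dh_0|| = {bptt_jacobian_norm(T):.2e}  (bound 0.9^T = {0.9 ** T:.2e})")

# LSTM direct cell-state path: dc_t/dc_{t-1} = f_t (elementwise).
# With f_t near 1, the product over T steps decays far more slowly.
f = 0.99                                 # hypothetical forget-gate value
for T in (1, 10, 50):
    print(f"T = {T:2d}: cell-state path factor = {f ** T:.3f}")
```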

3 sessions

Transformers

Attention emerged because RNNs had a fundamental limitation. We arrive here having earned that context. Self-attention, positional encoding, multi-head attention and the full encoder architecture are derived from what you already know — not presented as a new black box but as a logical continuation of everything before it.
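For reference, a minimal single-head self-attention sketch in NumPy, following the standard scaled dot-product formulation softmax(QKᵀ/√d_k) V. The projection matrices are random placeholders rather than trained weights, and multi-head attention, masking, and positional encoding are left out.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable softmax
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model); W_q, W_k: (d_model, d_k); W_v: (d_model, d_v).
    Returns the outputs (seq_len, d_v) and the weights (seq_len, seq_len).
    """
    Q = X @ W_q
    K = X @ W_k
    V = X @ W_v
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # every query against every key
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8          # illustrative sizes
X = rng.normal(size=(seq_len, d_model))  # stand-in for token embeddings
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))

out, weights = self_attention(X, W_q, W_k, W_v)
print(out.shape, weights.shape)
print(weights.sum(axis=1))
```

Dividing by √d_k keeps the variance of the scores roughly independent of the key dimension, so the softmax does not saturate as d_k grows.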

3 sessions

Building an LLM from scratch — architecture

We move from understanding transformers to building one. Tokenisation, embedding layers, positional encodings, the full decoder stack — written from scratch in Python. No HuggingFace, no shortcuts. You will have a working language model architecture by the end of this module, and you will understand every parameter in it.
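A sketch of the first pieces of that pipeline, assuming a character-level tokeniser, a randomly initialised embedding table, and the sinusoidal positional encoding from the original Transformer paper (learned positional embeddings are a common alternative). The helper names `encode` and `decode`, the sample text, and `d_model = 16` are illustrative. The causal mask at the end is what lets a decoder-only model predict each token from earlier tokens only.

```python
import numpy as np

text = "hello world"                            # stand-in corpus
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}    # char -> token id
itos = {i: ch for ch, i in stoi.items()}        # token id -> char

def encode(s):
    return [stoi[ch] for ch in s]

def decode(ids):
    return "".join(itos[i] for i in ids)

def sinusoidal_positions(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(same angle)."""
    pos = np.arange(seq_len)[:, None]
    even_idx = np.arange(0, d_model, 2)[None, :]   # the "2i" in the formula
    angles = pos / np.power(10000.0, even_idx / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

rng = np.random.default_rng(0)
d_model = 16                                    # assumes an even width
embedding = rng.normal(0.0, 0.02, size=(len(vocab), d_model))  # one row per token

ids = encode("hello")
x = embedding[ids] + sinusoidal_positions(len(ids), d_model)    # (seq_len, d_model)

# Causal mask: True marks future positions (column > row). Those scores are
# set to -inf before the softmax, so position t attends only to positions <= t.
mask = np.triu(np.ones((len(ids), len(ids)), dtype=bool), k=1)

print(ids, decode(ids), x.shape)
print(mask.astype(int))
```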

3 sessions

Building an LLM from scratch — training

Architecture is only half the story. We implement the training loop, cross-entropy loss over a vocabulary, learning rate scheduling, and gradient clipping. We train a small character-level language model from scratch, watch it learn to generate text, and understand exactly why it does — and what limits it. The course ends with something that works, and a mind that understands why.
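Three of those training-loop pieces in NumPy, as a sketch under stated assumptions: softmax cross-entropy over the vocabulary with its gradient (probabilities minus one-hot targets), clipping by global gradient norm, and linear warmup followed by cosine decay as one common learning-rate schedule. The function names, shapes, and hyperparameters are illustrative, not the course's implementation.

```python
import math
import numpy as np

def cross_entropy(logits, targets):
    """Mean cross-entropy over a batch.

    logits: (batch, vocab_size) raw scores; targets: (batch,) integer token ids.
    Returns the loss and dL/dlogits = (softmax - one_hot) / batch.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    batch = logits.shape[0]
    loss = -log_probs[np.arange(batch), targets].mean()
    grad = np.exp(log_probs)
    grad[np.arange(batch), targets] -= 1.0
    return loss, grad / batch

def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients together if their combined L2 norm exceeds max_norm."""
    total = math.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [g * scale for g in grads], total

def lr_schedule(step, max_lr=3e-4, warmup=100, total_steps=1000):
    """Linear warmup to max_lr, then cosine decay towards zero."""
    if step < warmup:
        return max_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return 0.5 * max_lr * (1.0 + math.cos(math.pi * min(1.0, progress)))

rng = np.random.default_rng(0)
vocab_size, batch = 27, 8                  # e.g. 26 letters plus a space
logits = rng.normal(size=(batch, vocab_size))
targets = rng.integers(0, vocab_size, size=batch)

loss, dlogits = cross_entropy(logits, targets)
(clipped,), norm = clip_by_global_norm([dlogits], max_norm=0.1)
print(f"loss {loss:.3f} (uniform baseline {math.log(vocab_size):.3f}), grad norm {norm:.3f}")
print([round(lr_schedule(s), 6) for s in (0, 99, 100, 550, 999)])
```

An untrained model that spreads probability evenly over the vocabulary scores ln(vocab_size), which makes a useful reference point when reading the first loss values.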

3 sessions

Pricing

Straightforward. No subscriptions.

Recorded access

€79

one time · lifetime access

All session recordings, watch anytime, rewatch forever
All derivation notebooks and worked solutions
Practice problem set with full solutions

Join waitlist

Early bird · first 30 seats

Live cohort

€799 (regular price €999)

per person · cohort 1 only

20 live sessions over 10 weeks
Small cohort — you are known by name, not a number
Real-time Q&A every session
Private Discord for async questions
Weekly problem sets, individually reviewed
All recordings included — watch sessions back anytime
Certificate of completion

Reserve a seat →

Early bird seats are limited. Price returns to €999 once the first 30 seats are filled.

FAQ

Honest answers.

Is this for beginners or experienced engineers?

Primarily for engineers who already write code but have been using ML tools without understanding the mechanics under them. If you have used PyTorch, taken Andrew Ng's course, and still cannot explain why your loss plateaued — this course is for you. Beginners with strong maths are also welcome, but expect to work hard in the first two modules.

What maths do I actually need coming in?

You need to know what a derivative is. You do not need to compute derivatives fluently; you only need to know that a derivative measures how one quantity changes with respect to another. We cover the chain rule from first principles in session one. Linear algebra intuition helps but is not a hard requirement. If you passed high school calculus, you have enough.

Why NumPy and not PyTorch?

Because PyTorch computes the gradients for you. The entire point of this course is for you to compute them yourself — to write the backward pass by hand, watch the weights update, and understand exactly what is happening at every step. Once you have done that, PyTorch becomes a tool you understand rather than one you depend on.

What if I miss a session?

Every session is recorded and shared within hours. The recordings are yours permanently. Missing a session is not ideal, but it will not break your progression. Bring your questions to the next session or drop them in the Discord — they will get answered.

When does cohort 1 start and how are sessions scheduled?

June 2026. Two live sessions per week for 10 weeks, held in the evening (CET). Exact times will be confirmed with waitlist members to make sure they work across time zones. Because every session is recorded, you are never blocked if a session clashes with your schedule.

The gap in your understanding
is not your fault.
But it is yours to close.