Live cohort · 10 weeks · Starting June 2026
If your model stopped learning tomorrow, would you know where in the network to look?

Neural networks.
Understood, not just operated.

Backpropagation is a leaky abstraction. Every engineer who skips the math eventually hits a wall they cannot reason past. This course builds the foundation from scratch — so that wall never stops you.

30 seats per cohort · 10 weeks live · 0 black boxes
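\[
\begin{aligned}
z^{(l)} &= W^{(l)} a^{(l-1)} + b^{(l)} \\
a^{(l)} &= g\left(z^{(l)}\right) \\
\delta^{(l)} &= \left( \left(W^{(l+1)}\right)^{\top} \delta^{(l+1)} \right) \odot g'\left(z^{(l)}\right) \\
\frac{\partial L}{\partial W^{(l)}} &= \delta^{(l)} \left(a^{(l-1)}\right)^{\top} \\
\frac{\partial L}{\partial b^{(l)}} &= \delta^{(l)} \\
W &\leftarrow W - \eta \cdot \frac{\partial L}{\partial W}
\end{aligned}
\]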

The problem

Most engineers use neural networks.
Few actually understand them.

Courses that skip the derivation

Andrew Ng's courses are excellent. fast.ai is practical. But both make a choice: skip the hard parts to keep you moving. That choice leaves a gap. When something breaks at the gradient level, you are guessing, not reasoning.

Frameworks that hide the mechanism

PyTorch computes gradients automatically. That is its strength — and your weakness if you never derived them yourself. Autograd is not understanding. It is the absence of the need to understand, until suddenly you need to.

The wall everyone eventually hits

Dead ReLUs. Vanishing gradients. A model that trains fine but fails silently. Engineers who understand backpropagation see these coming. Those who don't spend weeks wondering why their loss curve went flat.
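
To make "dead ReLU" concrete, here is a minimal sketch rather than course material. It assumes a single ReLU neuron, an upstream gradient of 1 on every example, and a hypothetical helper named relu_neuron_grad. Once the bias is pushed far enough negative, the neuron is inactive on every input, its gradient is exactly zero, and gradient descent can no longer move it.

```python
import numpy as np

def relu_neuron_grad(X, w, b):
    """Return (fraction of inputs where the neuron is active, mean gradient w.r.t. w).

    Assumes an upstream gradient of 1 for every example (illustrative only).
    """
    z = X @ w + b                          # pre-activation for each input row
    active = (z > 0).astype(float)         # ReLU derivative: 1 where z > 0, else 0
    grad_w = (active[:, None] * X).mean(axis=0)
    return active.mean(), grad_w

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))             # synthetic inputs
w = rng.normal(size=4)

for b in (0.0, -100.0):                    # a healthy bias vs. one pushed far negative
    frac, grad_w = relu_neuron_grad(X, w, b)
    print(f"bias={b:7.1f}  active={frac:.2f}  |grad_w|={np.linalg.norm(grad_w):.4f}")
```

With the bias at -100 the active fraction and the gradient norm are both zero: the weight update is zero on every step, so the neuron stays dead.
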

Intuition without foundation

You can explain what attention does. But can you explain why the dot product of queries and keys produces meaningful similarity scores? Real understanding goes one level deeper than most courses are willing to go.
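
One way to start answering that question, as a sketch: the dot product factors into the two vector lengths and the cosine of the angle between them. The symbols d_k (vector dimension) and θ (angle between q and k) are notation introduced here, not taken from the course.

```latex
q \cdot k \;=\; \sum_{i=1}^{d_k} q_i k_i \;=\; \lVert q \rVert \, \lVert k \rVert \cos\theta
```

For vectors of comparable length, the score is largest when the query and key point the same way, near zero when they are orthogonal, and negative when they point in opposite directions.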


What changes

You will implement backprop in NumPy before you touch a framework.

  • Derive every equation by hand. Not follow along — actually derive, on paper, until the chain rule stops feeling abstract
  • Understand why neurons die, why gradients vanish, and exactly what happens inside your model when they do
  • Build the full forward and backward pass in raw NumPy. Watch it converge. Know every number in that loop
  • Understand attention not as a trick but as a mathematical consequence of what came before it
  • Leave with the kind of knowledge that does not expire when the next framework comes out

Other courses vs. this

Elsewhere                         Here
Derivation in the appendix        Derivation is the session
Framework hides the gradient      You compute the gradient yourself
"Trust the autograd"              You are the autograd, in NumPy
Intuition without the proof       Proof first, intuition follows
15% course completion             Live. 30 people. Accountable
Knowledge expires with the API    Math does not have deprecation notices

Curriculum

Ten weeks. Every equation earned.

01

What is a neural network?

Not the Wikipedia answer. We start from a single neuron, understand what it is computing and why, then stack layers until the structure of a network stops feeling arbitrary and starts feeling inevitable.

2 sessions
02

Linear regression & logistic regression

These are the smallest possible neural networks. We derive the loss function, write gradient descent from scratch, and build the intuition for learning itself before a single hidden layer appears. Everything after this module builds on it.

3 sessions
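
As a rough preview of what "gradient descent from scratch" can look like, here is a minimal sketch of logistic regression in NumPy. The synthetic data, the learning rate, the step count and the function name train_logistic_regression are all assumptions made for illustration, not the course's own code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, lr=0.1, steps=500):
    """Batch gradient descent on the mean binary cross-entropy loss."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    losses = []
    for _ in range(steps):
        p = sigmoid(X @ w + b)                       # predicted probabilities
        eps = 1e-12                                  # guards against log(0)
        losses.append(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))
        grad_z = (p - y) / n                         # dL/dz for sigmoid + cross-entropy
        w -= lr * (X.T @ grad_z)                     # dL/dw = X^T (p - y) / n
        b -= lr * grad_z.sum()                       # dL/db = sum(p - y) / n
    return w, b, losses

# Synthetic, linearly separable toy data (an assumption for this example).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w, b, losses = train_logistic_regression(X, y)
accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == y.astype(bool))
print(f"first loss={losses[0]:.3f}  last loss={losses[-1]:.3f}  accuracy={accuracy:.2f}")
```

The whole gradient collapses to (p - y) because the sigmoid derivative cancels against the cross-entropy derivative. That cancellation is the kind of step worth deriving by hand once.
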
03

Forward pass & backward pass

We draw the computational graph. We trace exactly how a number flows forward through the network and how an error signal flows backward through the same graph in reverse. The chain rule stops being a formula and becomes a picture.

2 sessions
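
The smallest version of that picture fits in a few lines. The sketch below assumes a one-weight model with squared error, L = (w·x + b − y)², and a hypothetical function name forward_backward. Every forward step is a node in the graph, and the backward pass visits the same nodes in reverse, multiplying local derivatives.

```python
def forward_backward(w, b, x, y):
    """Forward and backward pass through L = (w * x + b - y) ** 2."""
    # Forward pass: one graph node per line.
    u = w * x          # multiply node
    z = u + b          # add node
    e = z - y          # error node
    L = e ** 2         # loss node

    # Backward pass: same nodes in reverse order, chain rule at each step.
    dL_de = 2 * e
    dL_dz = dL_de * 1.0    # de/dz = 1
    dL_du = dL_dz * 1.0    # dz/du = 1
    dL_db = dL_dz * 1.0    # dz/db = 1
    dL_dw = dL_du * x      # du/dw = x
    return L, dL_dw, dL_db

print(forward_backward(w=2.0, b=1.0, x=3.0, y=4.0))   # (9.0, 18.0, 6.0)
```
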
04

Derivation of backpropagation

The centrepiece of the course. We derive the four fundamental equations of backprop line by line, with no steps skipped and no "it can be shown that." Then we implement them in NumPy. Then we watch the network train. This is the module most engineers say they wish they had taken years earlier.

3 sessions
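
For a sense of scale, the backprop equations fit in a short NumPy sketch. Everything specific here is an assumption chosen for illustration: a two-layer network with a tanh hidden layer and a linear output, a squared-error loss, synthetic data, examples stored as columns, and the helper names forward, loss_fn, backward and numerical_grad. The finite-difference check at the end is one common way to confirm a hand-written backward pass.

```python
import numpy as np

def forward(params, X):
    W1, b1, W2, b2 = params
    z1 = W1 @ X + b1                 # z = W a + b (examples are columns)
    a1 = np.tanh(z1)                 # a = g(z)
    z2 = W2 @ a1 + b2                # linear output layer
    return z1, a1, z2

def loss_fn(params, X, Y):
    z2 = forward(params, X)[2]
    return 0.5 * np.mean(np.sum((z2 - Y) ** 2, axis=0))

def backward(params, X, Y):
    W1, b1, W2, b2 = params
    n = X.shape[1]
    z1, a1, z2 = forward(params, X)
    delta2 = (z2 - Y) / n                         # output error (g' = 1 for a linear output)
    delta1 = (W2.T @ delta2) * (1 - a1 ** 2)      # delta = (W^T delta_next) * g'(z), tanh' = 1 - tanh^2
    return [delta1 @ X.T,                         # dL/dW1 = delta1 a0^T
            delta1.sum(axis=1, keepdims=True),    # dL/db1 (summed over the batch)
            delta2 @ a1.T,                        # dL/dW2 = delta2 a1^T
            delta2.sum(axis=1, keepdims=True)]    # dL/db2

def numerical_grad(params, X, Y, idx, eps=1e-6):
    """Central-difference gradient for params[idx], used only as a check."""
    P, g = params[idx], np.zeros_like(params[idx])
    for i in np.ndindex(P.shape):
        old = P[i]
        P[i] = old + eps; lp = loss_fn(params, X, Y)
        P[i] = old - eps; lm = loss_fn(params, X, Y)
        P[i] = old
        g[i] = (lp - lm) / (2 * eps)
    return g

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 64))                      # synthetic inputs
Y = np.sin(X[:1]) + X[1:] ** 2                    # synthetic targets, shape (1, 64)
params = [rng.normal(scale=0.5, size=(8, 2)), np.zeros((8, 1)),
          rng.normal(scale=0.5, size=(1, 8)), np.zeros((1, 1))]

max_err = max(np.abs(g - numerical_grad(params, X, Y, i)).max()
              for i, g in enumerate(backward(params, X, Y)))
first = loss_fn(params, X, Y)
for _ in range(500):
    grads = backward(params, X, Y)
    params = [p - 0.05 * g for p, g in zip(params, grads)]   # W <- W - eta * dL/dW
last = loss_fn(params, X, Y)
print(f"gradient check max error={max_err:.2e}  loss {first:.3f} -> {last:.3f}")
```

If the analytic and numerical gradients disagree, the bug is in the backward pass, and the four equations tell you which line to inspect.
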
05

Recurrent neural networks

Sequences introduce a new problem: gradients must travel through time. We derive backpropagation through time, understand exactly why gradients vanish over long sequences, and see how LSTMs solve it structurally — not as a magic trick but as a designed solution to a specific mathematical failure.

3 sessions
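
The vanishing can be watched directly. The sketch below assumes a vanilla tanh RNN, h_t = tanh(W h_{t−1} + U x_t), with random weights rescaled so the largest singular value of W is 0.9. The function name and all sizes are illustrative assumptions. It accumulates the Jacobian of h_t with respect to h_0 and records its spectral norm at every step.

```python
import numpy as np

def gradient_norms_through_time(W, U, xs, h0):
    """Spectral norm of d h_t / d h_0 for each time step of a tanh RNN."""
    h = h0
    J = np.eye(len(h0))                    # d h_0 / d h_0
    norms = []
    for x in xs:
        h = np.tanh(W @ h + U @ x)
        # One-step Jacobian: diag(tanh'(z)) @ W, where tanh'(z) = 1 - h^2
        J = (np.diag(1.0 - h ** 2) @ W) @ J
        norms.append(np.linalg.norm(J, 2))
    return norms

rng = np.random.default_rng(0)
n = 16
W = rng.normal(size=(n, n))
W *= 0.9 / np.linalg.norm(W, 2)            # largest singular value set to 0.9 (assumption)
U = rng.normal(size=(n, 4))
xs = rng.normal(size=(50, 4))              # 50 time steps of synthetic input

norms = gradient_norms_through_time(W, U, xs, np.zeros(n))
for t in (1, 10, 25, 50):
    print(f"t={t:2d}  ||dh_t/dh_0||={norms[t - 1]:.3e}  bound 0.9^t={0.9 ** t:.3e}")
```

Because tanh′ never exceeds 1, each step can shrink the norm by at least the factor 0.9, so the gradient reaching step 0 from step t is bounded by 0.9^t. That bound is the mathematical failure the LSTM cell state is designed around.
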
06

Transformers

Attention emerged because RNNs had a fundamental limitation. We arrive here having earned that context. Self-attention, positional encoding, multi-head attention and the full encoder architecture are derived from what you already know — not presented as a new black box but as a logical continuation of everything before it.

3 sessions
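
A single attention head is small enough to read in one sitting. This is a minimal sketch assuming the standard scaled dot-product formulation. The projection matrices Wq, Wk, Wv, the sizes and the random inputs are placeholders for illustration, and masking, multiple heads and positional encoding are left out.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)     # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a sequence X of shape (T, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)             # (T, T): similarity of every query to every key
    weights = softmax(scores, axis=-1)          # each row is a distribution over positions
    return weights @ V, weights                 # output is a weighted average of the values

rng = np.random.default_rng(0)
T, d_model, d_k = 5, 16, 8                      # illustrative sizes
X = rng.normal(size=(T, d_model))
Wq, Wk, Wv = (rng.normal(scale=d_model ** -0.5, size=(d_model, d_k)) for _ in range(3))

out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape, weights.sum(axis=1))           # rows of the weight matrix sum to 1
```
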
07

Building an LLM from scratch — architecture

We move from understanding transformers to building one. Tokenisation, embedding layers, positional encodings, the full decoder stack — written from scratch in Python. No HuggingFace, no shortcuts. You will have a working language model architecture by the end of this module, and you will understand every parameter in it.

3 sessions
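
The input side of such a model can be sketched in plain NumPy. The example below assumes a character-level vocabulary built from a toy string, a randomly initialised embedding table, and sinusoidal positional encodings. The source does not say which positional scheme the course uses, so that choice and every name here are assumptions.

```python
import numpy as np

text = "hello world"                                   # toy corpus (assumption)
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}           # character -> token id
itos = {i: ch for ch, i in stoi.items()}               # token id -> character

def encode(s):
    return [stoi[c] for c in s]

def decode(ids):
    return "".join(itos[i] for i in ids)

def sinusoidal_positions(seq_len, d_model):
    """Sinusoidal positional encodings; d_model is assumed to be even."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                       # even dimensions
    pe[:, 1::2] = np.cos(angles)                       # odd dimensions
    return pe

rng = np.random.default_rng(0)
d_model = 8
E = rng.normal(scale=0.02, size=(len(vocab), d_model)) # embedding table, one row per token

ids = encode("hello")
x = E[ids] + sinusoidal_positions(len(ids), d_model)   # what the first decoder block receives
print(ids, decode(ids), x.shape)
```

An embedding lookup is just row indexing into a matrix, which is why its gradient only touches the rows of the tokens that appeared.
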
08

Building an LLM from scratch — training

Architecture is only half the story. We implement the training loop, cross-entropy loss over a vocabulary, learning rate scheduling, and gradient clipping. We train a small character-level language model from scratch, watch it learn to generate text, and understand exactly why it does — and what limits it. The course ends with something that works, and a mind that understands why.

3 sessions
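
Three of those training-loop pieces are short enough to sketch in isolation. The versions below are assumptions rather than the course's implementation: cross-entropy computed with a log-softmax, clipping by global gradient norm, and a linear warmup followed by cosine decay. The default values (base_lr, warmup) are placeholders.

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean negative log-likelihood and its gradient w.r.t. the logits.

    logits: (n, vocab_size) floats, targets: (n,) integer token ids.
    """
    logits = logits - logits.max(axis=1, keepdims=True)          # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    n = logits.shape[0]
    loss = -log_probs[np.arange(n), targets].mean()
    grad = np.exp(log_probs)                                     # softmax probabilities
    grad[np.arange(n), targets] -= 1.0                           # softmax minus one-hot
    return loss, grad / n

def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients together so their combined L2 norm is at most max_norm."""
    total = np.sqrt(sum(float((g ** 2).sum()) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [g * scale for g in grads], total

def lr_schedule(step, total_steps, base_lr=1e-3, warmup=100):
    """Linear warmup to base_lr, then cosine decay to zero (one common choice)."""
    if step < warmup:
        return base_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return 0.5 * base_lr * (1.0 + np.cos(np.pi * progress))

loss, grad = cross_entropy(np.zeros((3, 5)), np.array([0, 1, 2]))
print(f"uniform logits over 5 tokens: loss={loss:.4f} (log 5 = {np.log(5):.4f})")
```

An untrained model with uniform predictions scores log(vocabulary size), which gives you a baseline to compare the first few training steps against.
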

Pricing

Straightforward. No subscriptions.

Recorded access
€79
one time · lifetime access
  • All session recordings, watch anytime, rewatch forever
  • All derivation notebooks and worked solutions
  • Practice problem set with full solutions

Live cohort
Early bird · first 30 seats
€799 (regular price €999)
per person · cohort 1 only
  • 20 live sessions over 10 weeks
  • Small cohort — you are known by name, not a number
  • Real-time Q&A every session
  • Private Discord for async questions
  • Weekly problem sets, individually reviewed
  • All recordings included — watch sessions back anytime
  • Certificate of completion

Early bird seats are limited. Price returns to €999 once the first 30 seats are filled.


FAQ

Honest answers.

Is this for beginners or experienced engineers?
Primarily for engineers who already write code but have been using ML tools without understanding the mechanics under them. If you have used PyTorch, taken Andrew Ng's course, and still cannot explain why your loss plateaued — this course is for you. Beginners with strong maths are also welcome, but expect to work hard in the first two modules.
What maths do I actually need coming in?
You need to know what a derivative is. Not compute them fluently — just know conceptually that it measures how one thing changes with respect to another. We cover the chain rule from first principles in session one. Linear algebra intuition helps but is not a hard requirement. If you passed high school calculus, you have enough.
Why NumPy and not PyTorch?
Because PyTorch computes the gradients for you. The entire point of this course is for you to compute them yourself — to write the backward pass by hand, watch the weights update, and understand exactly what is happening at every step. Once you have done that, PyTorch becomes a tool you understand rather than one you depend on.
What if I miss a session?
Every session is recorded and shared within hours. The recordings are yours permanently. Missing a session is not ideal, but it will not break your progression. Bring your questions to the next session or drop them in the Discord — they will get answered.
When does cohort 1 start and how are sessions scheduled?
June 2026. Two live sessions per week for 10 weeks, scheduled in the evening CET. Exact times will be confirmed with waitlist members to ensure they work across time zones. Recordings mean you are never blocked even if a session conflicts.

Cohort 1 · June 2026

The gap in your understanding
is not your fault.
But it is yours to close.

Join the waitlist. Waitlist members get first access to seats and can lock in the early bird price before enrolment opens publicly.


No spam. One email when seats open.