NN

A small feed-forward neural network library, dimension-safe by typing. Built on ``Tensor``: a ``Layer`` is an affine map ``x |-> act (W x + b)`` whose input and output widths are tracked by the uninterpreted measures ``in_features`` / ``out_features``. ``forward`` only accepts an input vector whose length matches ``in_features`` and promises an output of length ``out_features`` — so a layer fed the wrong-width activation is a *type* error, not a numpy broadcast surprise. ``Network`` is a typed stack of layers. ``stack`` refuses to append a layer unless its ``in_features`` equals the network's current ``net_out``, and ``predict`` requires the input width to equal ``net_in``. Build a whole MLP and the widths are checked end-to-end at compile time: let l1 = NN.dense_relu 4 8 in -- 4 -> 8 let l2 = NN.dense_relu 8 3 in -- 8 -> 3 (8 must match l1's out) let net = NN.stack (NN.sequential l1) l2 in NN.predict net (Tensor.vzeros 4) -- input width 4 must match net_in "Other requirements on values": layer constructors take *positive* widths, ``make_layer`` requires the bias length to equal the weight matrix's row count, and ``softmax`` requires a non-empty vector. A single affine layer: weights, bias, and an activation tag.
Imports
open Tensor;
Table of Contents

Types

Layer

(type)
A single affine layer: weights, bias, and an activation tag.
type Layer

Network

(type)
An ordered stack of layers.
type Network

Gradients

(type)
Gradients for a network — one (weight, bias) pair per layer. Carries the same input/output widths as the network they were computed from, so they can only be applied back to a shape-compatible network.
type Gradients

Functions

relu

Rectified linear unit. Output elements are non-negative: ``[0, +inf)``.
def relu (v: Vector) : {r : Vector | dim r = dim v && lower r ≥ 0.0}

sigmoid

Logistic sigmoid. Output elements land in ``[0, 1]`` (strictly (0, 1)).
def sigmoid (v: Vector) : {r : Vector | dim r = dim v && (lower r ≥ 0.0 && upper r ≤ 1.0)}

tanh_act

Hyperbolic tangent. Output elements land in ``[-1, 1]``.
def tanh_act (v: Vector) : {r : Vector | dim r = dim v && (lower r ≥ 0.0 - 1.0 && upper r ≤ 1.0)}

softmax

Softmax over a non-empty vector; preserves length and yields a probability distribution — every entry lies in ``[0, 1]`` (and they sum to 1).
def softmax (v: {x : Vector | dim x > 0}) : {r : Vector | dim r = dim v && (lower r ≥ 0.0 && upper r ≤ 1.0)}

dense

Linear layer (identity activation).
def dense (nin: {k : Int | k > 0}) (nout: {k : Int | k > 0}) : {l : Layer | in_features l = nin && out_features l = nout}

dense_relu

def dense_relu (nin: {k : Int | k > 0}) (nout: {k : Int | k > 0}) : {l : Layer | in_features l = nin && out_features l = nout}

dense_sigmoid

def dense_sigmoid (nin: {k : Int | k > 0}) (nout: {k : Int | k > 0}) : {l : Layer | in_features l = nin && out_features l = nout}

dense_tanh

def dense_tanh (nin: {k : Int | k > 0}) (nout: {k : Int | k > 0}) : {l : Layer | in_features l = nin && out_features l = nout}

make_layer

Build a linear layer from explicit weights and bias. The bias length must equal the weight matrix's row count (the output width); the layer then maps ``cols w`` features to ``rows w``.
def make_layer (w: Matrix) (b: {v : Vector | dim v = rows w}) : {l : Layer | in_features l = cols w && out_features l = rows w}

layer_in

def layer_in (l: Layer) : {n : Int | n = in_features l && n ≥ 0}

layer_out

def layer_out (l: Layer) : {n : Int | n = out_features l && n ≥ 0}

forward

Apply one layer: ``act (W x + b)``. The input width must match ``in_features``; the result has width ``out_features``.
def forward (l: Layer) (x: {v : Vector | dim v = in_features l}) : {r : Vector | dim r = out_features l}

sequential

A one-layer network.
def sequential (l: Layer) : {n : Network | net_in n = in_features l && net_out n = out_features l}

stack

Append a layer to a network. The new layer's input width must equal the network's current output width; the extended network keeps its original input width and adopts the new layer's output width.
def stack (net: Network) (l: {x : Layer | in_features x = net_out net}) : {n : Network | net_in n = net_in net && net_out n = out_features l}

net_in_of

def net_in_of (net: Network) : {n : Int | n = net_in net && n ≥ 0}

net_out_of

def net_out_of (net: Network) : {n : Int | n = net_out net && n ≥ 0}

predict

Run the input through every layer in order. The input width must match ``net_in``; the output has width ``net_out``.
def predict (net: Network) (x: {v : Vector | dim v = net_in net}) : {r : Vector | dim r = net_out net}

predict_unchecked

Like ``predict`` but without the static ``dim x == net_in`` precondition (a width mismatch is caught at runtime by numpy). This is the entry point the ``Models`` typeclasses dispatch to: a typeclass method is generic over the model type, so it cannot carry a per-model dimension refinement. Prefer ``predict`` whenever the input width is statically known.
def predict_unchecked (net: Network) (x: Vector) : Vector

predict_scalar

Single scalar forward: output component 0, for a one-unit regression head. Unchecked like ``predict_unchecked``; the ``Models`` ``Regressor`` instance dispatches here.
def predict_scalar (net: Network) (x: Vector) : Float

mse

Mean squared error between a prediction and a target of equal length. This is the loss ``backward`` differentiates.
def mse (pred: Vector) (target: {v : Vector | dim v = dim pred}) : {f : Float | f ≥ 0.0}

cross_entropy

Categorical cross-entropy: ``-sum(target * log(pred))``. ``pred`` should be a probability vector (e.g. a ``softmax`` output); it is clipped away from zero for numerical safety, so the result is always non-negative. Both vectors must be non-empty and share a length.
def cross_entropy (pred: {v : Vector | dim v > 0}) (target: {v : Vector | dim v = dim pred}) : {f : Float | f ≥ 0.0}

softmax_cross_entropy

Softmax cross-entropy computed directly from *logits* (the raw output of a ``linear`` final layer) — numerically stable and equivalent to ``cross_entropy (softmax logits) target`` but without the intermediate clipping. This is the loss ``backward_ce`` differentiates. ``target`` should be a probability/one-hot vector of the same (non-empty) length.
def softmax_cross_entropy (logits: {v : Vector | dim v > 0}) (target: {v : Vector | dim v = dim logits}) : {f : Float | f ≥ 0.0}

backward

Gradients of the MSE loss w.r.t. every parameter of ``net``. The input width must match ``net_in`` and the target width must match ``net_out``; the resulting gradients carry those same widths, which is what lets ``sgd_step`` accept them.
def backward (net: Network) (x: {v : Vector | dim v = net_in net}) (target: {v : Vector | dim v = net_out net}) : {g : Gradients | grad_in g = net_in net && grad_out g = net_out net}

backward_ce

Gradients of the softmax cross-entropy loss — the classification counterpart of ``backward``. The network output is read as logits, so the final layer should be ``linear`` (build it with ``dense``); the softmax is applied inside. ``target`` is a one-hot / probability vector of width ``net_out``. Same shape contract as ``backward``, so the resulting gradients feed straight into ``sgd_step``.
def backward_ce (net: Network) (x: {v : Vector | dim v = net_in net}) (target: {v : Vector | dim v = net_out net}) : {g : Gradients | grad_in g = net_in net && grad_out g = net_out net}

grad_in_of

Runtime witnesses for the gradient widths.
def grad_in_of (g: Gradients) : {n : Int | n = grad_in g && n ≥ 0}

grad_out_of

def grad_out_of (g: Gradients) : {n : Int | n = grad_out g && n ≥ 0}

sgd_step

One gradient-descent update. The gradients must be shape-compatible with the network (``grad_in == net_in`` and ``grad_out == net_out``), and the learning rate must be strictly positive. The updated network keeps the same input/output widths.
def sgd_step (net: Network) (grads: {g : Gradients | grad_in g = net_in net && grad_out g = net_out net}) (lr: {f : Float | f > 0.0}) : {out : Network | net_in out = net_in net && net_out out = net_out net}

train_step

One full training step on a single example: backprop the MSE loss, then apply an SGD update. The widths line up by construction — the input and target must match the network, the learning rate must be positive, and the returned network preserves the shape.
def train_step (net: Network) (x: {v : Vector | dim v = net_in net}) (target: {v : Vector | dim v = net_out net}) (lr: {f : Float | f > 0.0}) : {out : Network | net_in out = net_in net && net_out out = net_out net}

train_step_ce

The classification counterpart of ``train_step``: backprop the softmax cross-entropy loss, then apply an SGD update. Build the network with a ``linear`` final layer (logits). Same width/learning-rate refinements, and the returned network preserves the shape.
def train_step_ce (net: Network) (x: {v : Vector | dim v = net_in net}) (target: {v : Vector | dim v = net_out net}) (lr: {f : Float | f > 0.0}) : {out : Network | net_in out = net_in net && net_out out = net_out net}

Uninterpreted

Functions declared as def f ... = uninterpreted: only their signature is known to the verifier; they have no body.

in_features

uninterpreted
def in_features : (l : Layer) → Int

out_features

uninterpreted
def out_features : (l : Layer) → Int

net_in

uninterpreted
def net_in : (n : Network) → Int

net_out

uninterpreted
def net_out : (n : Network) → Int

grad_in

uninterpreted
def grad_in : (g : Gradients) → Int

grad_out

uninterpreted
def grad_out : (g : Gradients) → Int