Mathematical Notation For Working Creative Technologists

As a self-taught programmer in the creative domain, I am familiar with most paradigms and programming patterns out of personal curiosity, but I lack a formal education in computer science, and especially in its connection to math. One of my biggest strengths is the desire to seek abstraction. For example, if I have to deal with computer graphics, instead of implementing linear algebra or matrix multiplication myself, I look for environments that already support it and align my own algorithms to take advantage of the existing abstractions as building blocks.

However, as an academic I am also curious about the state of the art, and about getting that information first-hand from the original source: papers. Most papers express their algorithmic breakthroughs in mathematical notation, something I used to gloss over while trying to make sense of the surrounding text. I quickly came to realize that the text gives you only the minimum required context, and that understanding the mathematical notation is key to getting the full meaning behind a solution. As I have discussed in a previous article, every language has its own strengths when expressing concepts and conveys certain notions that get lost when translated to another language. I have also explained that languages are composed of syntax and semantics. So this is where we will look next.

My goal with this article is to look at mathematical notation and derive meaning from it, translating it into concepts applicable for professionals in my field.

The Semantics

Semantically I understand mathematical notation because I am familiar with functional programming, especially Haskell. I can read simple Haskell, so the conceptualization of an algorithm as a mathematical formula is pretty clear to me. For example, mathematical formulas abstract lower-level logical constructs like loops and branching, similar to how FP abstracts loops with higher-order functions. Reading mathematical notation is more akin to reading aggregate operations such as sums or mapping functions than to reading step-by-step instructions.

First insight: Mathematical notation expresses relationships as equational statements. Loops are modeled using recursive statements. Many logical structures are abstracted to higher-order operations, often operating on simpler functions. Having a background in FP concepts helps immensely.
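
To make this first insight a bit more tangible, here is a minimal Python sketch of one and the same sum, $\sum_{i = 1}^{n} i^{2}$, read in three ways: as a loop, as a higher-order expression and as a recursive statement. The formula and the function names are my own illustration and are not taken from any paper.

```python
from functools import reduce


def sum_of_squares_loop(n):
    # Imperative reading: walk i from 1 to n and accumulate.
    total = 0
    for i in range(1, n + 1):
        total += i ** 2
    return total


def sum_of_squares_functional(n):
    # Functional reading: map i -> i^2 over the range, then fold with +.
    squares = map(lambda i: i ** 2, range(1, n + 1))
    return reduce(lambda acc, x: acc + x, squares, 0)


def sum_of_squares_recursive(n):
    # Recursive reading: S(0) = 0, S(n) = n^2 + S(n - 1).
    return 0 if n == 0 else n ** 2 + sum_of_squares_recursive(n - 1)
```

All three return the same number for the same n. Only the functional and the recursive version mirror the shape of the formula.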

Another aspect we need to acknowledge, which clarifies much of the mystique around research papers, is that every publication is embedded within its existing field, which has well-defined concepts and shared knowledge. These concepts are often abstracted into single symbols or terms in formulas, because for the expert audience, redefining foundational ideas repeatedly would be unnecessary noise. As a result, mathematical notation in research papers typically expresses algorithmic or theoretical insights at a very high level of abstraction, focusing only on the most fundamental distinctions and relationships.

This brings us to our second insight: Mathematical notation for us programmers is on a higher level of abstraction compared to implementation code. If we want to get a full understanding of it, we need to resolve standard concepts on our own, similar to how we would have to look up the implementation of certain abstracted operations in source code.
For example, in some machine learning papers, concepts such as linear regression are taken as given and just mentioned as one single symbol in the formula. We, however, need to reimplement linear regression on our own, or find a library that can do so.
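
As a rough illustration of how much can hide behind one symbol, here is a minimal sketch of the simplest case, fitting a straight line through points with ordinary least squares. The function name `fit_line` and the restriction to a single input variable are my own assumptions. A paper would typically mean a more general form and a real project would reach for a library.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ~ slope * x + intercept (one variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Points that lie exactly on y = 2x + 1
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

Even this small version needs means, a covariance and a variance, none of which would appear in the formula that merely names the concept.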

As a consequence when we dive into examples I will first map symbols to concepts and operations on their level of abstraction. I will then try to translate them to Haskell code, and finally try to translate them to either JS or Python code, for a more imperative understanding.

The Syntax

I think the more daunting aspect of mathematical formulas is the information-dense representation and the unfamiliar symbology.
Let's see an example. A non-trivial mathematical formula.

$x = \frac{- b \pm \sqrt{b^{2} - 4 a c}}{2 a}$

This really means we have a statement about the value of x, an assignment. Note that x is not used on the right side of the assignment, therefore this is not a function definition.
This is a fundamental aspect to understand in FP as well. Think of functions in functional programming as parametrized assignments, rather than scoped procedures.
We have 3 variables: b, a, c. In the context of a program these could be any variables within our scope. While the semantics of this formula don't imply parameters explicitly, it can become a function if we make a, b and c parameters.
Then we have the operations, and their order. The form chosen in mathematical notation lets us graphically express relationships and order of execution. In this case we solve the top part (the numerator) first, inside which we solve the square root, but before that we solve b to the power of two, minus 4ac. This is exactly how we would do it functionally

sqrt ( (pow b 2) - a * c * 4)

Here we solve from the inside out.

or imperatively as

squared = pow(b, 2)
product = a * c * 4
diff = squared - product
sqr = sqrt(diff)

Here we execute instructions in order from top to bottom.

One interesting observation I had while translating into code is that, as opposed to the mathematical notation, it feels more natural to multiply the variable by the number instead of prefixing the number to the variable. Just a little wink at syntactical conceptualization. It might be just personal preference. Let's ~~solve~~ translate the remaining formula.

-b is plus-minused by our square root result: this results in two outputs, or a tuple of values if you want, one in which we do the operation with + and one in which we do it with -. This is the first instance where the mathematical abstraction needs to be resolved in different ways in code.

Again functionally

( (-b - s) / (a * 2) , (-b + s) / (a * 2) )
where 
	s = sqrt ( (pow b 2) - a * c * 4)

Here we need to abstract the result of the square root to apply it twice with different signs. Be aware I chose the solution with simplified syntax so it is easier to follow. This is not idiomatic Haskell.

And once again imperatively.

...
minus = -1*b - sqr
plus = -1*b + sqr
return (minus / (a*2), plus / (a*2))

Here we continue our sequence of statements taking the result stored in sqr and similar to the functional solution pack the two divergent results into a tuple to be returned.
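
Putting the imperative fragments together, a runnable version could look like the sketch below. The function name `quadratic_roots`, the check for a equal to zero and the use of `cmath.sqrt` are my own additions. `cmath.sqrt` is there so that a negative value under the square root yields a complex number instead of an error.

```python
import cmath


def quadratic_roots(a, b, c):
    """Return both values of the formula as a tuple (minus, plus)."""
    if a == 0:
        raise ValueError("a must be non-zero, otherwise we divide by zero")
    squared = b ** 2
    product = a * c * 4
    diff = squared - product
    # cmath.sqrt also accepts a negative diff and returns a complex number.
    sqr = cmath.sqrt(diff)
    minus = -b - sqr
    plus = -b + sqr
    # The parentheses matter: `/ a * 2` would divide by a and then multiply by 2.
    return (minus / (a * 2), plus / (a * 2))


print(quadratic_roots(1, -3, 2))  # -> ((1+0j), (2+0j))
```

This is the step where a, b and c stop being variables from the surrounding scope and become parameters, which turns the assignment into a function.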

And the beauty of it is that we don't need to do the math ourselves. In other words, we don't need to explicitly calculate the result. We only interpret the mathematical concepts as an order of operations, ergo an algorithm, and translate it into the idiomatic syntax of the target programming language.

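One last thing, now that we have unwrapped this syntactically and semantically: what does it mean conceptually? This is the quadratic formula. x stands for the two solutions of $a x^{2} + b x + c = 0$, one for each sign of $\pm$. They are the points where the curve crosses zero, placed symmetrically around $- \frac{b}{2 a}$.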

This was simple enough, however. Let's look at a few more examples:

$\sin (x) = \sum_{n = 0}^{\infty} \frac{(- 1)^{n} x^{2 n + 1}}{(2 n + 1)!}$

$L_{o} (ω_{o}) = \int_{Ω} f_{r} (ω_{i}, ω_{o}) L_{i} (ω_{i}) \cos θ_{i} \, d ω_{i}$

$p^{'} = q \otimes p \otimes q^{- 1}, q = \cos \frac{θ}{2} + u \sin \frac{θ}{2}$

$\frac{\partial u}{\partial t} + (u \cdot \nabla) u = - \frac{1}{ρ} \nabla p + ν \nabla^{2} u + f$

$S_{d} (x_{i}, ω_{i}; x_{o}, ω_{o}) = \frac{1}{π} F_{t} (η, ω_{i}) R_{d} (| x_{i} - x_{o} |) F_{t} (η, ω_{o})$

$I \approx \frac{1}{N} \sum_{i = 1}^{N} \frac{f (X_{i})}{p (X_{i})}, X_{i} \sim p (x)$

$R_{s} = {| \frac{η_{t} \cos θ_{i} - η_{i} \cos θ_{t}}{η_{t} \cos θ_{i} + η_{i} \cos θ_{t}} |}^{2}$

So now the remaining problem is one of translation. On one hand we have unfamiliar constructs like $\sum$ or $| |$ or $\otimes$, and we need to find out which function or operation they map to. Notice how they graphically structure compound operations and define relationships.
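
As a first taste of such a mapping, here is a minimal sketch of the sine series from the list above, where $\sum$ becomes Python's built-in `sum` over a generator. Since code cannot run up to $\infty$, the sketch stops after a fixed number of terms. The function name `sin_series` and the default of 10 terms are my own assumptions.

```python
import math


def sin_series(x, terms=10):
    """Approximate sin(x) with the first `terms` terms of the series."""
    return sum(
        (-1) ** n * x ** (2 * n + 1) / math.factorial(2 * n + 1)
        for n in range(terms)  # n = 0, 1, ..., terms - 1 instead of infinity
    )


# Compare against the library implementation for a small angle.
difference = abs(sin_series(1.0) - math.sin(1.0))
```

The index below the symbol becomes the start of the range, the index above it becomes the end, and the expression to the right becomes the body of the generator.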

On the other hand some symbols, let's call them variables, refer to pre-existing concepts that are being abstracted. For example $π$ is familiar to all creative coders as Pi, which is approximately 3.14, but more importantly it is used in angle-based and radial operations, or anything implying circular geometry. So this idea is familiar: a variable implying concepts from a target domain. To make sense of more exotic symbols, it is a matter of understanding what $ω$ or $Ω$ or $\nabla$ stand for in their respective domains.
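
The $\sim$ in the estimator $I \approx \frac{1}{N} \sum \frac{f (X_{i})}{p (X_{i})}$ from the list above is such a domain symbol: it reads "is sampled from the distribution". Below is a minimal sketch of how that could translate to code. The function name, its parameters and the test integrand $x^{2}$ on the interval from 0 to 1 are my own assumptions for illustration.

```python
import random


def monte_carlo_integrate(f, sample, pdf, n=100_000, seed=0):
    """Estimate an integral as the average of f(X_i) / p(X_i)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    total = 0.0
    for _ in range(n):
        x = sample(rng)          # X_i ~ p(x): draw one sample
        total += f(x) / pdf(x)   # f(X_i) / p(X_i)
    return total / n             # the 1/N in front of the sum


# Integrate x^2 over [0, 1] with uniform samples, so p(x) = 1.
estimate = monte_carlo_integrate(
    f=lambda x: x * x,
    sample=lambda rng: rng.random(),
    pdf=lambda x: 1.0,
)
```

The exact value of this integral is 1/3, and the estimate should land close to it. How close depends on N and on how well p matches f.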

This is what the rest of the article is all about. I let an LLM generate algorithms in mathematical notation and use them as exercises to be translated and interpreted into programming syntax. Finally, I will move on to actual examples from papers I am personally interested in.

Similar to my other posts, this will be a continuous work in progress and playground that I will add to and extend over time.

LLM Generated Exercise 1: Gamma Correction

$I_{out} = I_{in}^{\frac{1}{γ}}$

The output value is given by taking an input value and raising it to the power of $1 / γ$.

gamma_correction :: Floating a => a -> a -> a -- type signature
gamma_correction i g = i ** (1/g)

def gamma_correction(value, gamma):
	return pow(value, 1/gamma)
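
To see the formula at work on something familiar, here is a small usage sketch that applies it to each channel of an RGB color. I am assuming channel values normalized to the range 0 to 1 and a default gamma of 2.2, which is a commonly used display value. The helper `gamma_correct_pixel` is my own addition.

```python
def gamma_correction(value, gamma):
    # I_out = I_in ^ (1 / gamma)
    return pow(value, 1 / gamma)


def gamma_correct_pixel(rgb, gamma=2.2):
    """Apply gamma correction to every channel of a normalized RGB tuple."""
    return tuple(gamma_correction(channel, gamma) for channel in rgb)


# With gamma = 2 the formula is a plain square root per channel.
corrected = gamma_correct_pixel((0.0, 1.0, 0.25), gamma=2.0)
```

With these inputs, 0 and 1 stay where they are and only the values in between are lifted, which is the visible effect of the correction.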

In this case in both implementations the actual operation translates to one line, so the difference between the two does not become so apparent. Let us try a longer one.

LLM Generated Exercise 2: Subsurface Scattering

$S_{d} (x_{i}, ω_{i}; x_{o}, ω_{o}) = \frac{1}{π} F_{t} (η, ω_{i}) R_{d} (| x_{i} - x_{o} |) F_{t} (η, ω_{o})$

Oh look another intimidating formula with a very familiar algorithm name.
Let's see what we have. We have different abstracted components, so let's separate them.

$1 / π$
$F_{t} (η, ω_{i})$
$R_{d} (| x_{i} - x_{o} |)$
$F_{t} (η, ω_{o})$

All multiplied together. As input we have two pairs (in and out), each with an x and an omega component. The subscripts differentiate between them semantically.
In programming we would use something like x_in, w_in, x_out, w_out. Inside we have yet another variable/constant: $η$, called eta. $F_{t}$ is the Fresnel transmission, an abstraction we need to implement later. And $R_{d}$ is the diffusion profile. Lastly we have $|$ to take care of. It is the norm of a vector: its length without a sign, the magnitude. Therefore the argument to $R_{d}$ is the length of the difference of x_in and x_out, which also implies that they are vectors. It turns out that $η$ is the refractive index of the material, while $x$ and $ω$ are point and direction.

First functionally

import Linear.V3 (V3(..))
import Linear.Metric (norm)

type Vector3 a = V3 a

sub_scattering 
  :: Floating a 
  => a                   -- η (refractive index)
  -> (Vector3 a, Vector3 a)  -- (x_in, w_in)
  -> (Vector3 a, Vector3 a)  -- (x_out, w_out)
  -> a
sub_scattering eta (x_in, w_in) (x_out, w_out) = 
  (1 / pi) * fresnel_trans eta w_in * rd * fresnel_trans eta w_out
  where
    r = norm (x_out - x_in)
    rd = diffusion r

fresnel_trans :: Floating a => a -> Vector3 a -> a
fresnel_trans eta w = undefined -- Fresnel logic implementation (abstracted)

diffusion :: Floating a => a -> a
diffusion r = undefined -- diffusion implementation (abstracted)

Now imperatively.

import math

def fresnel_transmission(eta, cos_theta):
    # Fresnel logic implementation (abstracted)
    raise NotImplementedError

def diffusion_profile(distance):
    # diffusion implementation (abstracted)
    raise NotImplementedError

def subsurface_scattering(eta, x_in, x_out, w_in, w_out):

    # Calculate distance between input and output points
    dx = x_out[0] - x_in[0]
    dy = x_out[1] - x_in[1]
    dz = x_out[2] - x_in[2]
    r = math.sqrt(dx**2 + dy**2 + dz**2)
    
    ft_in = fresnel_transmission(eta, w_in[2])
    ft_out = fresnel_transmission(eta, w_out[2])
    rd = diffusion_profile(r)
    
    return (1 / math.pi) * ft_in * rd * ft_out

Disclaimer: Technically a true diffusion_profile implies an additional $σ_{t r}$ as a parameter, but we were interested in the direct translation and I therefore omitted it.
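
For anyone who wants to run the translation end to end, here is a sketch that fills the two abstracted functions with stand-ins. Both are my own assumptions and not what the formula prescribes: the Fresnel transmission is taken as one minus Schlick's approximation of the reflectance, and the diffusion profile is a plain exponential falloff with a hypothetical `falloff` parameter. I also assume normalized directions and a surface normal pointing along +z, so that the z component of a direction is its $\cos θ$.

```python
import math


def fresnel_transmission(eta, cos_theta):
    # Stand-in: 1 - Schlick's approximation of the Fresnel reflectance,
    # for a material with refractive index eta seen from air.
    r0 = ((eta - 1.0) / (eta + 1.0)) ** 2
    reflectance = r0 + (1.0 - r0) * (1.0 - abs(cos_theta)) ** 5
    return 1.0 - reflectance


def diffusion_profile(distance, falloff=1.0):
    # Stand-in: simple exponential falloff, NOT a physically derived profile.
    return math.exp(-falloff * distance)


def subsurface_scattering(eta, x_in, x_out, w_in, w_out):
    r = math.dist(x_in, x_out)  # |x_i - x_o|, needs Python 3.8+
    ft_in = fresnel_transmission(eta, w_in[2])    # assumes normal = +z
    ft_out = fresnel_transmission(eta, w_out[2])
    return (1 / math.pi) * ft_in * diffusion_profile(r) * ft_out


# Light entering and leaving straight along the normal, 0.5 units apart.
value = subsurface_scattering(
    1.33, (0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 1.0)
)
```

The structure of `subsurface_scattering` stays a direct reading of the formula. Swapping in better physics only means replacing the two stand-in functions.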

I just wanted to show how math relates to code, and how a one line formula expresses abstraction and the sequential nature of programming succinctly.