
Scale and Direction: Understanding Homogeneous Functions

TL;DR: This post collects useful facts about homogeneous functions, i.e., functions satisfying $f(ax) = a^r f(x)$. Key insight: they decompose into spherical behavior times radial scaling. I also connect them to two references in ML.

A function $f$ is (positively) $r$-homogeneous if $f(ax) = a^r f(x)$ for all $a > 0$ and $x \in X$, e.g., $X = \mathbb{R}^n$. We will use $H_r(X, Y)$ to denote such functions from $X$ to $Y$ and $H_r(X)$ for functions from $X$ to $\mathbb{R}$. We will write $H_r$ when context is sufficient and $H$ when we are talking about homogeneous functions in general.

Supposing we work in a normed space with norm $\|\cdot\|$, a fun little substitution is $x = \|x\| \cdot x/\|x\|$, which gives

$$f(x) = \|x\|^r \, f(x/\|x\|).$$

In other words, an $H_r$ function can do interesting stuff on the unit sphere, and then we apply a scale term that is independent of $f$ to get the value at $x$. In particular, if $f \in H_0$ we get $f(x) = f(x/\|x\|)$, i.e., $f$ is constant on rays.
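As a quick sanity check of this decomposition, here is a minimal Python sketch with a toy 2-homogeneous function (my own choice of $f$, just for illustration):

```python
import math

# A toy 2-homogeneous function: f(ax) = a^2 f(x).
def f(x):
    return sum(v * v for v in x)

def norm(x):
    return math.sqrt(sum(v * v for v in x))

x = [3.0, 4.0]
r = 2
# Radial-spherical decomposition: f(x) = ||x||^r * f(x / ||x||).
lhs = f(x)
rhs = norm(x) ** r * f([v / norm(x) for v in x])
print(lhs, rhs)  # both 25.0
```

The right-hand side only evaluates $f$ on the unit sphere; the scale factor $\|x\|^r$ is the same no matter which $f \in H_2$ we pick.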

If we are on $\mathbb{R}_{>0}$, then $f \in H_r$ means $f(x) = kx^r$ for some constant $k$. There is also Euler's theorem, which says that a differentiable $f \in H_r$ satisfies $\langle x, \nabla f(x) \rangle = r f(x)$.
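Euler's relation is easy to verify numerically; below is a sketch using central finite differences on a toy 3-homogeneous polynomial (both the function and the step size are illustrative choices):

```python
# Numerical check of Euler's theorem: for differentiable f in H_r,
# <x, grad f(x)> = r * f(x).
def f(x):
    # 3-homogeneous: f(ax) = a^3 f(x)
    return x[0] ** 3 + x[0] * x[1] ** 2

def grad(f, x, h=1e-6):
    # central finite differences
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

x = [1.5, -2.0]
lhs = sum(xi * gi for xi, gi in zip(x, grad(f, x)))
print(lhs, 3 * f(x))  # approximately equal
```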

Another interesting property: if $x, y \in X$, $a > 0$, and $f \in H_0$ is continuous, then

$$\lim_{a \to \infty} f(ax + y) = f(x).$$

There are various ways to show this, e.g., as $f \in H_0$, we have

$$f(ax + y) = f(x + y/a) \to f(x)$$

as $a$ grows, and continuity does the rest.
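Here is a small numerical illustration of the limit, using a continuous $H_0$ function of my choosing (the first coordinate of $x/\|x\|$):

```python
import math

# f in H_0: constant on rays, e.g. the first coordinate of x / ||x||.
def f(x):
    n = math.sqrt(sum(v * v for v in x))
    return x[0] / n

x = [1.0, 0.0]
y = [5.0, -3.0]
for a in (1.0, 10.0, 1000.0):
    z = [a * xi + yi for xi, yi in zip(x, y)]
    print(a, f(z))  # approaches f(x) = 1.0 as a grows
```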

As building blocks

One may want to construct more complex functions using members of $H$ as building blocks. Here are some ways to do this:

Composition: If $f \in H_0$ and $g$ is some generic function, then $g \circ f(ax) = g(f(ax)) = g(f(x))$, so $g \circ f \in H_0$. However, for $f \circ g(ax) = f \circ g(x)$ we need $g \in H_1 \cup H_0$.

Multiplication¹: If $f \in H_r$ and $g \in H_k$, then $f \cdot g \in H_{r+k}$.
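Both facts can be checked on toy examples; the functions below are hypothetical picks for illustration (an $H_0$ direction function, an arbitrary non-homogeneous $g$, and an $H_2 \times H_1$ product):

```python
import math

def norm(x):
    return math.sqrt(sum(v * v for v in x))

# f0 in H_0 (depends only on direction), g generic: g o f0 stays in H_0.
f0 = lambda x: x[0] / norm(x)
g = lambda t: math.exp(t)  # arbitrary, not homogeneous itself
x = [2.0, 1.0]
ax = [3.0 * v for v in x]
print(g(f0(ax)), g(f0(x)))  # equal: g o f0 is in H_0

# Products add degrees: f2 in H_2 times h1 in H_1 gives H_3.
f2 = lambda x: x[0] ** 2 + x[1] ** 2  # H_2
h1 = lambda x: x[0] + x[1]            # H_1
prod = lambda x: f2(x) * h1(x)
print(prod(ax) / prod(x))  # 3^3 = 27
```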

Activations and Linear Maps: If $A$ is some matrix, $c$ is a positive scalar, and $\sigma$ is ReLU, then for $f(x) = \sigma(Ax)$:

$$f(cx) = \sigma(A(cx)) = \sigma(cAx) = c\,\sigma(Ax) = c f(x),$$

where the last step uses that ReLU commutes with positive scaling. So we have shown that $f$ is positively 1-homogeneous. Accordingly, and as long as we use appropriate activation functions, all NNs without bias² are $H_1$.
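A minimal sketch of this for a bias-free two-layer ReLU network (weights are arbitrary illustrative values):

```python
# A tiny bias-free two-layer ReLU network; with no bias terms every
# layer is positively 1-homogeneous, so the whole network is H_1.
def relu(v):
    return [max(0.0, t) for t in v]

def matvec(A, x):
    return [sum(a * t for a, t in zip(row, x)) for row in A]

A1 = [[1.0, -2.0], [0.5, 3.0]]
A2 = [[2.0, -1.0]]

def net(x):
    return matvec(A2, relu(matvec(A1, x)))[0]

x = [1.0, 2.0]
c = 5.0
print(net([c * t for t in x]), c * net(x))  # equal: net(cx) = c net(x)
```

Adding a bias anywhere breaks the identity, which is why the bias-free assumption matters below.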

Averages: All sorts of averages are also $H_1$. For example, if $c > 0$, the arithmetic mean satisfies $g(cx_1, \ldots, cx_n) = c(x_1 + \cdots + x_n)/n = c\,g(x_1, \ldots, x_n)$.

These building blocks appear throughout deep learning. As a practical example:

Normalization and scale separation: Batch normalization provides a practical example of scale-direction separation. By normalizing inputs to unit variance (after centering), it removes scale information, similar to our decomposition $f(x) = \|x\|^r f(x/\|x\|)$, where scale ($\|x\|^r$) and direction ($f(x/\|x\|)$) are separated. The $\epsilon$ stabilization term and learnable parameters mean BatchNorm is only approximately homogeneous, but it demonstrates how normalization relates to homogeneity in practice.
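To make the approximate scale invariance concrete, here is a stripped-down sketch of the normalization step only (1-D batch, no learnable scale/shift; a simplification, not the full BatchNorm):

```python
import math

# Minimal batch normalization over a 1-D batch: center, then divide by
# the standard deviation (eps for numerical stability). Scaling the
# whole batch by c > 0 leaves the output almost unchanged: the scale
# is stripped away and only "direction" survives.
def batchnorm(xs, eps=1e-5):
    mu = sum(xs) / len(xs)
    var = sum((t - mu) ** 2 for t in xs) / len(xs)
    return [(t - mu) / math.sqrt(var + eps) for t in xs]

xs = [1.0, 2.0, 3.0, 4.0]
scaled = batchnorm([100.0 * t for t in xs])
print(batchnorm(xs))
print(scaled)  # nearly identical: approximately H_0 in the batch
```

The invariance is only approximate because of `eps`; for very small inputs the two outputs drift apart.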

Related work

I haven’t done a particularly deep dive in the references for this, but one work that I enjoyed reading is from Merrill, W., et al.³ Among other things, the authors look into the approximate homogeneity of transformers with respect to their parameters, i.e., $f(x; c\theta) \approx c^k f(x; \theta)$. Transformers without bias terms are shown to be approximately $H_1$.

  1. Fun fact: we cannot define a group over all $H$ functions with multiplication as the operation. That’s because not all multiplicative inverses are well-defined for elements of $H$, e.g., a function that vanishes somewhere has no multiplicative inverse.

  2. A further study of those appears here: Ji, Z. and Telgarsky, M., 2020. Directional convergence and alignment in deep learning. Advances in Neural Information Processing Systems, 33, pp. 17176-17186.

  3. Merrill, W., Ramanujan, V., Goldberg, Y., Schwartz, R. and Smith, N.A., 2021, November. Effects of parameter norm growth during transformer training: Inductive bias from gradient descent. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (pp. 1766-1781).