Implementing a Neural Network in Pure C – Ep.5

Welcome to the fourth episode of this series. In this episode, we will add a level of abstraction to the neural network framework we are building by adding the concept of a neural network to the matrix data structure we built in the last episode.

We’ll start by porting the model definition and allocation functions into the nn.h library we’re building, and we’ll dynamically do everything using arrays, so it’ll be super easy to change the architecture of the network.

Neural Network Definition

Let’s take a look at the XOR model we hard-coded in the last episode:

C
typedef struct {
    Matrix x;

    Matrix w1;
    Matrix b1;
    Matrix a1;

    Matrix w2;
    Matrix b2;
    Matrix a2;

    Matrix y;
} Xor;

First of all, we can get rid of Y, as it is just a copy of the last activation matrix, A_2. Furthermore, the second layer takes its input directly from the activation matrix of the first layer. To be coherent, we can treat the input matrix X as the activation matrix of layer 0, which is the input to the network. Let’s rewrite it then:

C
typedef struct {
    Matrix a0;

    Matrix w1;
    Matrix b1;
    Matrix a1;

    Matrix w2;
    Matrix b2;
    Matrix a2;
} Xor;

Can you see the pattern? We have some recurring matrices, what if we use arrays instead? We can now write the Neural Network data structure directly into the nn.h library, maybe after the matrix parts

C
typedef struct {
    size_t num_layers;
    Matrix *weights;
    Matrix *biases;
    Matrix *activations; // num_layers + 1 (input)
} NeuralNetwork;

Now it’s way more elegant.

Let’s now see how to implement a function to allocate matrices inside the NeuralNetwork data structure.

Neural Network Matrices Allocation

We want this process to be as abstract as possible. It would be great to find a way to represent the architecture in a very simple way. For example, we could start with something like this:

C
size_t archi[] = {2, 2, 1};

The first element is the number of features, i.e. the columns of the input matrix. The remaining numbers are the number of neurons in each layer. In particular, the last number also indicates the number of columns in the output matrices.

So, in this case, we have something like this:

But the cool thing is that we can model any complex network! For example, with:

C
size_t archi = {2, 3, 2, 4, 1};

We get something like this:

C
// Definition
NeuralNetwork nn_alloc(size_t archi[], size_t num_layers);

//Declaration
NeuralNetwork nn_alloc(size_t archi[], size_t num_layers)
{
    size_t input_size = archi[0];
    NeuralNetwork nn;

    nn.num_layers = num_layers;
    nn.weights = malloc(num_layers * sizeof(*nn.weights));
    nn.biases = malloc(num_layers * sizeof(*nn.biases));
    nn.activations = malloc((num_layers + 1) * sizeof(*nn.activations));
    
    nn.activations[0] = mat_alloc(1, input_size);

    for (size_t i = 1; i <= nn.num_layers; i++)
    {
        nn.weights[i - 1] = mat_alloc(nn.activations[i - 1].cols, archi[i]);
        nn.biases[i - 1] = mat_alloc(1, archi[i]);
        nn.activations[i] = mat_alloc(1, archi[i]);
    }

    return nn;
}

The function assigns the number of layers from the input parameter num_layers to nn.num_layers. It then allocates memory for the weights, biases, and activations of the neural network, setting nn.weights, nn.biases, and nn.activations to point to the allocated memory.

The first activation layer is allocated based on the input size, setting nn.activations[0] to a matrix with one row and input_size columns.

Within the for loop, the code iterates over the layers of the neural network. For each layer, it starts from 1 up to num_layers. In each iteration, it allocates memory for the weights of the current layer using mat_alloc, with the number of columns equal to the size of the previous layer’s activations and the number of rows equal to the size of the current layer specified in archi[i]. It assigns the allocated matrix to nn.weights[i - 1].

Similarly, it allocates memory for the biases of the current layer, which is a matrix with one row and a number of columns equal to the size of the current layer specified in archi[i], assigning it to nn.biases[i - 1].

Lastly, it allocates memory for the activations of the current layer, which is also a matrix with one row and a number of columns equal to the size of the current layer specified in archi[i], assigning it to nn.activations[i].

The function then returns the initialized NeuralNetwork structure.

As we did with matrices, let’s write a function to randomize the neural network parameters:

C
// Definition
void nn_rand(NeuralNetwork nn, float min, float max);

// Declaration
void nn_rand(NeuralNetwork nn, float min, float max)
{
    for (size_t i = 0; i < nn.num_layers; i++)
    {
        mat_rand(nn.weights[i], min, max);
        mat_rand(nn.biases[i], min, max);
    }
}

Next, let’s write a function to print the matrix, exactly like we did with the Matrix data structure:

C
// Definition
void nn_print(NeuralNetwork nn, char *name);

// Declaration
void nn_print(NeuralNetwork nn, char *name)
{
    printf("%s: [\n", name);
    for (size_t i = 0; i < nn.num_layers; i++)
    {
        mat_print(nn.weights[i], "weights");
        mat_print(nn.biases[i], "biases");
    }
    printf("]\n");
}

Nice! One last useful macro that we will use to pass the number of layers to the nn_alloc function is the following:

C
#define ARRAY_LEN(arr) (sizeof((arr)) / sizeof((arr)[0]))

Let’s test everything now:

C
#define NN_IMPLEMENTATION
#include "nn.h"
#include "time.h"

int main(void)
{
    size_t archi[] = {2, 2, 1};
    NeuralNetwork nn = nn_alloc(archi, ARRAY_LEN(archi) - 1);
    nn_rand(nn, 0.0f, 1.0f);
    nn_print(nn, "nn");
}

Which outputs:

nn: [
weights: [
    0.000008    0.131538
    0.755605    0.458650
]
biases: [
    0.532767    0.218959
]
weights: [
    0.047045
    0.678865
]
biases: [
    0.679296
]
]

Nice! Now let’s move on with the forwarding process and let’s add for loops to that as well!

Neural Network Forward Propagation

Let’s take a look at how we implemented it in the last episode:

C
void forward(Xor xor) {
    mat_dot(xor.a1, xor.x, xor.w1);
    mat_sum(xor.a1, xor.b1);
    mat_sigf(xor.a1);

    mat_dot(xor.a2, xor.a1, xor.w2);
    mat_sum(xor.a2, xor.b2);
    mat_sigf(xor.a2);

    mat_cpy(xor.y, xor.a2);
}

We decided to not use the last matrix, Y, so it becomes very easy to rewrite it:

C
// Definition
void nn_forward(NeuralNetwork nn);

// Declaration
void nn_forward(NeuralNetwork nn)
{
    for (size_t i = 0; i < nn.num_layers; i++)
    {
        mat_dot(nn.activations[i + 1], nn.activations[i], nn.weights[i]);
        mat_sum(nn.activations[i + 1], nn.biases[i]);
        mat_sigf(nn.activations[i + 1]);
    }
}

Now it’s much better! We are making the framework more elegant at each function we re-implement. Now let’s jump to MSE!

Mean Squared Error

Before implementing the MSE function, let’s take a look at one of the assert we put at the begging of the MSE function in the previous episode:

C
assert(train_out.cols == m.y.cols);

Here, we were accessing the output matrix Y of the model m. If we want to do so in our current framework, it would be something like:

C
assert(train_out.cols == nn.activations[nn.num_layers].cols);

I have to think too much to get to the output matrix, and even accessing the input matrix is a bit counterintuitive:

C
assert(train_in.cols == nn.activations[0].cols);

So, let’s define two macros that can help us in jumping directly where we need:

C
#define NN_INPUT(nn) ((nn).activations[0])
#define NN_OUTPUT(nn) ((nn).activations[(nn).num_layers])

Much better! Let’s now port the MSE function in our framework:

C
// Definition
float nn_mse(NeuralNetwork nn, Matrix train_in, Matrix train_out);

// Declaration
float nn_mse(NeuralNetwork nn, Matrix train_in, Matrix train_out)
{
    assert(train_in.rows == train_out.rows);
    assert(train_in.cols == NN_INPUT(nn).cols);
    assert(train_out.cols == NN_OUTPUT(nn).cols);

    float result = 0.0f;

    for (size_t i = 0; i < train_in.rows; i++)
    {
        Matrix x = mat_row(train_in, i);
        Matrix y = mat_row(train_out, i);
        mat_cpy(NN_INPUT(nn), x);

        nn_forward(nn);

        for (size_t j = 0; j < train_out.cols; j++)
        {
            float diff = MAT_AT(NN_OUTPUT(nn), 0, j) - MAT_AT(y, 0, j);
            result += diff * diff;
        }
    }

    return result / train_in.rows;
} 

It’s not very different from the MSE function from the last episode, so I will not comment on it. Notice how we used the NN_INPUT and NN_OUTPUT we just defined and how elegant it looks!

Let’s port now the gradient descent!

Gradient Descent

As always, we need to implement first the gradient calculation. What we are going to do is to convert all the hard-coded computations using for loops:

C
void nn_finite_diff(NeuralNetwork nn, NeuralNetwork grad, float eps, Matrix train_in, Matrix train_out)
{
    float saved;
    float mse = nn_mse(nn, train_in, train_out);
    for (size_t i = 0; i < nn.num_layers; i++)
    {
        for (size_t j = 0; j < nn.weights[i].rows; j++)
        {
            for (size_t k = 0; k < nn.weights[i].cols; k++)
            {
                saved = MAT_AT(nn.weights[i], j, k);
                MAT_AT(nn.weights[i], j, k) += eps;
                float new_mse = nn_mse(nn, train_in, train_out);
                MAT_AT(grad.weights[i], j, k) = (new_mse - mse) / eps;
                MAT_AT(nn.weights[i], j, k) = saved;
            }
        }
        for (size_t j = 0; j < nn.biases[i].rows; j++)
        {
            for (size_t k = 0; k < nn.biases[i].cols; k++)
            {
                saved = MAT_AT(nn.biases[i], 0, k);
                MAT_AT(nn.biases[i], 0, k) += eps;
                float new_mse = nn_mse(nn, train_in, train_out);
                MAT_AT(grad.biases[i], 0, k) = (new_mse - mse) / eps;
                MAT_AT(nn.biases[i], 0, k) = saved;
            }
        }
    }
}

Unfortunately, having the weights and the biases stored in different arrays, we need to iterate over them saparately.

We can now do the same thing with the actual gradient descent function:

C
void nn_gradient_descent(NeuralNetwork nn, float eps, float rate, Matrix train_in, Matrix train_out, size_t iterations)
{
    for (size_t it = 0; it < iterations; it++)
    {
        NeuralNetwork grad = nn_alloc(nn.archi, nn.num_layers);
        nn_finite_diff(nn, grad, eps, train_in, train_out);

        for (size_t i = 0; i < nn.num_layers; i++)
        {
            for (size_t j = 0; j < nn.weights[i].rows; j++)
            {
                for (size_t k = 0; k < nn.weights[i].cols; k++)
                {
                    MAT_AT(nn.weights[i], j, k) -= rate * MAT_AT(grad.weights[i], j, k);
                }
            }
            for (size_t j = 0; j < nn.biases[i].rows; j++)
            {
                for (size_t k = 0; k < nn.biases[i].cols; k++)
                {
                    MAT_AT(nn.biases[i], j, k) -= rate * MAT_AT(grad.biases[i], j, k);
                }
            }
        }
    }
}

I know, there are a lot of for loops, so let’s try to put some order:

  • The first loop are the gradient descent iterations, the more we do, the better the model will be.
  • The second loop iterates over the network layers, applying the optimization on each of them.
  • The thirds loops iterate over the weights and biases arrays.
  • The fourths loops iterate over the single weights and biases stored in the matrices.

Results

And that’s it! The framework is finished, Let’s try it out!

C
#define NN_IMPLEMENTATION
#include "nn.h"
#include "time.h"

int main(void)
{
    srand(time(NULL));
    size_t archi[] = {2, 2, 1};
    NeuralNetwork nn = nn_alloc(archi, ARRAY_LEN(archi) - 1);
    nn_rand(nn, 0.0f, 1.0f);

    float train_set[] = {
        0, 0, 0,
        0, 1, 1,
        1, 0, 1,
        1, 1, 0
    };

    Matrix train_in = {
        .rows = 4,
        .cols = 2,
        .stride = 3,
        .data = train_set
    };
    
    Matrix train_out = {
        .rows = 4,
        .cols = 1,
        .stride = 3,
        .data = train_set + 2
    };

    printf("MSE BEFORE: %f\n", nn_mse(nn, train_in, train_out));

    nn_gradient_descent(nn, 1e-1, 5e-1, train_in, train_out, 10*1000);

    printf("MSE  AFTER: %f\n", nn_mse(nn, train_in, train_out));

    nn_print(nn, "nn");

    for (size_t i = 0; i < 4; i++)
    {
        mat_cpy(NN_INPUT(nn), mat_row(train_in, i));
        nn_forward(nn);
        printf("%f - %f: %f\n", MAT_AT(NN_INPUT(nn), 0, 0), MAT_AT(NN_INPUT(nn), 0, 1), MAT_AT(NN_OUTPUT(nn), 0, 0));
    }

    return 0;
}

And the final result is…

MSE BEFORE: 0.336299
MSE  AFTER: 0.000729
nn: [
weights: [
    4.338735    6.178598
    4.337761    6.174761
]
biases: [
    -6.928128    -2.661664
]
weights: [
    -9.527683
    8.603272
]
biases: [
    -4.110957
]
]
0.000000 - 0.000000: 0.027687
0.000000 - 1.000000: 0.972845
1.000000 - 0.000000: 0.972853
1.000000 - 1.000000: 0.025998

Let’s now create a different neural network, with just one neuron, and see of it performs! If we did everything right, it shouldn’t be able to learn the XOR logic gate (as we saw in episode 2 and 3). The cool thing about this framework is that is super-easy to change the architecture, we just need to write the architecture that we want. So, 2 inputs and 1 neuron:

C
size_t archi[] = {2, 1};

As simple as that. And the output is:

C
MSE BEFORE: 0.326047
MSE  AFTER: 0.250156
nn: [
weights: [
    -0.000011
    -0.000004
]
biases: [
    -0.049992
]
]
0.000000 - 0.000000: 0.487505
0.000000 - 1.000000: 0.487503
1.000000 - 0.000000: 0.487502
1.000000 - 1.000000: 0.487501

And it doesn’t work! Which we are very happy about!

In the next episode, we will get rid of the finite difference method that we used in the gradient descent, and use instead backpropagation, a more efficient way to compute gradients of a network with respect to its weights. Now that we fully understand the idea behind it, it will be much easier to understand!

Join the ConversationLeave a reply

Your email address will not be published. Required fields are marked *

Comment*

Name*

Website