This weekend, I'm working on MAW 3.9. The single problem results in almost 500 lines of code. This is quite unexpected. The problem is stated as the following:

Write an arbitrary-precision integer arithmetic package. You should use a strategy similar to polynomial arithmetic. Compute the distribution of the digits \(0\) to \(9\) in \(2^{4000}\).

This post is the reflection about this problem.

## Which way to go?

Since the problem states "arbitrary-precision" and "use a strategy similar to polynomial arithmetic", then I can conclude that linked list is the best data structure for this problem. However, the question is how we can construct the linked list to best implement our integer arithmetic operations (i.e. addition, mulitiplication)?

We essentially have two options:

- We put the most significant digit as the the very first data node and
we put the least significant digit as the last data node. For example,
for a number \(123\), we will implement it like
`dummy->1->2->3`. - This is the exactly opposite of the first option. We put the least significant
digit as the very first data node and we put the most significant digit as
the last data node. Again, for \(123\), we will implement is like
`dummy->3->2->1`.

Let's evaluate these two options from two perspective:

- Whether we can easily construct a linked list to represent arbitrary-precision integer?
- Whether the arithmetic operations are essy to implement?

From the first perspective, for option one, each time we add a new digit to the most significant position, we insert
a new node at the very beginning of the list (i.e. right after the header node).
On the other hand, for option two, we append a new node
at the very end of the list. Since we design our `addDigit` with an input of a pointer to node (i.e. to specify
where to add node), these two options work equally well.

From the second perspective, things are different. Take arithmetic addition as an example. When we try to add
two numbers, for option one, we need to walk through the whole list to begin with the very end of the node
because we want to start with unit digit. This makes our routine complex because we need to use a while loop
to walk through the list first. For second option, situation is easier becauuse the number is implemented in the
reverse order in the list. The very first data node is the unit digit and we can directly start with addition
while we move towards the end of the list. If we need to add additional node because of carry (i.e. \(999 + 1\)
will be no longer 3-digit but 4-digit number), we can naturally pass the pointer pointing towards the current node to
the `addDigit` function.

So, we choose option two to implement our integer package.

## Memory leak

Memory leak is a very important issue to pay attention to during the testing phase. We use valgrind to help us detect if there is any leak in our code. You can reference their quick start guide and memory check user manual for the commands and error shooting.

Here are the two mistakes I made (You can check out my commit about memory leak debug):

- Always
`free`the chunk allocated by`malloc`whenever possible.

Take `multiply` function as an example:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 | ```
integerList
multiply(integerList A, integerList B)
{
PtrToNode dummyA = A->NextDigit;
PtrToNode dummyB = B->NextDigit;
integerList tmpR = makeEmpty();
PtrToNode dummyTmpR = tmpR;
integerList R = makeEmpty();
int product, carry = 0;
int i, indent = 0;
while (dummyA != NULL)
{
while (dummyB != NULL)
{
product = dummyA->Digit * dummyB->Digit + carry;
carry = product / 10;
addDigit(product % 10, dummyTmpR);
dummyTmpR = dummyTmpR->NextDigit;
dummyB = dummyB->NextDigit;
}
if (carry > 0)
{
addDigit(carry, dummyTmpR);
dummyTmpR = dummyTmpR->NextDigit;
}
for(i = 0; i < indent; i++)
{
addDigit(0,tmpR);
}
integerList tmp = R; // prevent memory leak
R = add(R, tmpR);
deleteAll(tmp);
indent ++;
carry = 0;
deleteIntegerList(tmpR);
dummyTmpR = tmpR;
dummyA = dummyA->NextDigit;
dummyB = B->NextDigit;
}
deleteAll(tmpR);
return R;
}
``` |

We allocate `tmpR` through `makeEmpty()` in Line[7]. If we don't do anything about it
inside the function, then the memory will be lost because we have no way to reference this
chunk of memory outside the function. Local variable `tmpR` is the only reference to the
memory allocated on the heap. However, once the function is done, the local variable is destroyed
from the stack, and thus, we lose our only reference to the memory chunk. So, we need to free it
before we exit the function (Line[49]).

- Be careful with a function call inside a function call.

This type of leak is much more subtle than the first one. Originally instead of

```
integerList tmp = R;
R = add(R, tmpR);
deleteAll(tmp);
```

I only have `R = add(R, tmpR)`. This cause the leak because of the following reasoning:
Originally, we have `R` points to a list of nodes. When we do `add(R,tmpR)`, we create
a new list of nodes, which hold our addition result. Then we let `R` points towards this newly-created
list. This makes us lose the list of nodes originally pointed by `R`. That's why we introduce `tmp`.

## makeEmpty ?

Originally, I don't have this `makeEmpty` function:

```
integerList
makeEmpty()
{
integerList R = malloc(sizeof(struct Node));
R->NextDigit = NULL; // super important step
return R;
}
```

If you take a look at this function, it seems to be a wrapper around `malloc` operation, which
seems redundant (we could directly call `malloc` directly in the place that `makeEmpty` appears).
However, the key for this routine is `R->NextDigit = NULL;`. This step can be easily omitted. However,
without this step, we don't have fully control on what our newly-allocated empty list (i.e. a list with only
header node) will look like. In other words, our header node will point to somewhere (i.e. `R->NextDigit`) randomly without
our key step. This can cause serious trouble for the following routine debug. For example, we could have `R->NextDigit`
holds some address value that happens to have a node structure there with a value in it. For instance, `dummy->1`.
This can usually happen when you OS try to reuse the memory chunk you previously freed. For example, try the following experiment:

- replace
`makeEmpty`on Line[7] & line[10] in`multiply`function `multiply`works fine with`test_multiply()`solely in the test program.`multiply`won't work if we do`test_intializeInteger()`and`test_add()`before`test_multiply()`because the integer we construct will no longer be`342`in the test case but something like`3425`, where`5`is some value pointed by`R->NextDigit`.

So, always clear out the pointer by setting it to `NULL` whenever we do initialization.