# ballingt

## Reading Error Detecting and Error Correcting Codes

06 Oct 2014

Last week’s Hacker School Paper of the Week was Richard Hamming’s “Error Detecting and Error Correcting Codes." I worked through the paper (not something I do a lot) and wanted to show an example of my process reading it.

I’m probably an “active learner," insofar as that’s a useful label in describing learning styles, whatever those are.1 The way I worked through the paper was by reading a bit for a problem description, trying to solve the problem myself, then learning why my solution was suboptimal or reading the solution to a problem I couldn’t solve. This is like attempting the exercises in a math text before reading the chapter; it helps me care about solutions to problems because I want to compare my solution, or I’ve proved I can’t solve them myself.

Here’s some of that process for the first two sections of the paper. It’s humbling to share rough ideas and even rougher programs that were knocked down pretty hard only a few minutes after I wrote them out.

# Efficient Error Detection

We could obviously detect errors if we sent all the information twice - if the two messages aren’t the same, we’ve got a problem. But we should be able to be more efficient about this with a checksum, and we can mod it to make it a parity bit.

Yep! Looks like we’re on the same page.

# Single Error Correcting Codes

The most naive way I can think of for detecting errors is to send the information three times - if a bit is the same in two of the copies but not the third, that’s an error. This method might have some nice properties about the errors being likely to be independent from each other, but in the problem described we’re dealing only with recovering from a single error, and surely we can do better. Two copies of the information, each with a checksum would be better. We don’t really need the second checksum if we’re assuming just one error. But a series of parity checks should be enough as well - and we might not need as many.

My first try with parity bits was 4 bits, with two different parity checks applying to each bit of data:

``````def parity(bits):
return sum(bits) % 2

def secc4(info):
checks = [parity(bits) for bits in [info[0:2], info[2:4], info[0:4:2], info[1:4:2]]]
return info + checks + parity(info)

secc4([1, 0, 0, 1])
#[1, 0, 0, 1, 1, 1, 1, 1]
``````

This still took as much space as repeating the information, but the idea of combining information from several parity checks to locate the bad bit seems good. Next I could do a tictacttoe-like arrangement:

``````def secc9(info):
assert len(info) == 9
checks = ([parity(info[i:9:3]) for i in range(3)] +
[parity(info[i:i+3]) for i in range(0, 8, 3)])
return info + checks + parity(info)

secc9([1, 0, 0, 1, 0, 0, 1, 1, 1])
#[1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
``````

Now our check bits grow more slowly than the info. The general case:

``````import math

def secc_m(n):
#was stuck on the math for a while, so doing this the slow, straightforward way
if n == 0: return 0
for i in xrange(int(math.ceil(math.sqrt(n))) + 1):
k = 2 * i + 1
m = n - k
if m < 0:
if i == math.ceil(math.sqrt(m)):
return m

def secc(info):
num_rows = math.ceil(math.sqrt(len(info)))
checks = ([parity(info[i:len(info):num_rows]) for i in range(num_rows)] +
[parity(info[i:i+num_rows]) for i in range(0, len(info), num_rows)])
return info + checks + parity(info)

def decode_secc(data):
"""
>>> decode_secc('001100110')
[0, 0, 1, 1]
>>> decode_secc('001100111')
Traceback (most recent call last):
ValueError: bad parity check, but nothing to fix
>>> decode_secc('001000110')
[0, 0, 1, 1]
>>> decode_secc('000100110')
[0, 0, 1, 1]
>>> decode_secc('011100110')
[0, 0, 1, 1]
>>> decode_secc('101100110')
[0, 0, 1, 1]
>>> decode_secc('111100110')
[1, 1, 1, 1]
"""
if isinstance(data, str):
data = [int(x) for x in data]
m = secc_m(len(data))
info = data[:m]
checks = data[m:]
par = data[-1]
if parity(info) == par:
return info
k = len(data) - m
num_rows = (k - 1) / 2
rows = [parity(info[i*num_rows:(i+1)*num_rows]) for i in range(num_rows)]
columns = [parity(info[i:m:num_rows]) for i in range(num_rows)]
row_check = [exp == calc for exp, calc in zip(data[m:m+num_rows], rows)]
col_check = [exp == calc for exp, calc in zip(data[m+num_rows:m+2*num_rows], columns)]
if len([c for c in row_check if not c]) > 1 or len([c for c in col_check if not c]) > 1:
raise ValueError('more than one row or col bad: %r %r' % (row_check, col_check))
if len([c for c in row_check if not c]) != len([c for c in col_check if not c]):
raise ValueError('only one of (row, col) bad: %r %r' % (row_check, col_check))
for i, (r, c) in enumerate((r, c) for r in row_check for c in col_check):
if not r and not c:
info[i] = (info[i] + 1) % 2
if parity(info) == par:
return info
else:
raise ValueError('parity check fails after fixing data: %r' % (info,))
else:
raise ValueError('bad parity check, but nothing to fix')
``````

This tictactoe method needs `2*ceil(sqrt(num_bits))` bits for correcting checks and another bit for parity on either the orig data or the checks. I had trouble solving m given n because math, so iterate through the possible values to check them.