06 — Dice Simulator, Wrap Up

This video finally put the random numbers to bed with just one more false start.

The hardware approach seemed like it would be perfect. You program the counter to repeatedly count from 0 to 5 with virtually no program needed. The problem was that the hardware went on counting after the button press. This problem was greatly improved by slowing the counter down.

HWDice.adx

!MOV     #0x04,0            ; "graphics" for dice faces
!MOV     #0x12,1
!MOV     #0x15,2
!MOV     #0x1b,3
!MOV     #0x37,4
!MOV     #0x3f,5
set button,0x80
    CALL   initialise
mainloop:
    MOV     #0,PORTB        ; all LEDs off
;
; Wait for a button press
;
waitbutton:
    CALL    shake           ; rattle the dice
    BIT     #button,PIND
    JEQ     waitbutton
;
; Get value of dice
;
    CALL    roll            ; roll out dice
;
; Display on LEDs
;
    CALL    showleds
;
; Wait for button release
;
    WAIT    5000            ; half second minimum display
waitup:
    BIT     #button,PIND
    JNE     waitup
    JMP     mainloop
;
; shake -- continue virtual shake of dice
; R1 -- seed
shake:
    RET                     ; happens automatically
;
; roll -- get result of roll
; Inputs:
;   R1 -- dice seed value
; Outputs:
;   R1 -- dice value number [0:5]
roll:
    MOV     TCNT0,R1        ; counter value is all we want
    RET
;
; showleds -- show 1 to 6 LEDs
; Inputs:
;   R1 -- value [0:5], #LEDs to light - 1
; Clobbers R1
showleds:
    MOV      @R1,PORTB
    RET
;
; Initialise dice simulator
;
initialise:
    MOV     #0x3f,DDRB      ; full bright LEDs
    MOV     #0x02,TCCR0A    ; CTC mode
    MOV     #0x05,TCCR0B    ; 15,625 Hz
    MOV     #5,OCR0A        ; upper limit
    RET

If you don't want to use the assembler, here is the code ready to upload to ArdEx.

Loading TCCR0B with #0x01 will run the counter at 16 MHz. If you try that you'll see that the result is far from random.

I wasn't too happy with the diagrams I put together trying to illustrate the problem, but haven't been able to come up with anything better. I'll put them here so you don't have to go back to the video to scratch your head.

Firstly, the 16 MHz problem: Why 16 MHz
counter is not sampled randomly

And now the illustration of why slowing the clock down improves the result: Why 16
kHz is better

The key point is that, when the counter is running too fast, the time the button is pressed has no bearing on what will be read from the counter 30 µs later. The true factor is how ArdEx instructions and the counter clock are synchronised. If they're perfectly in step, it will always return the same value.

Slowing the counter down so that ArdEx instructions go past faster than counter values means that the moment the button is pressed is the deciding factor.

I described this as being related to the Nyquist frequency, but reading that article tells me I should have called it the Nyquist rate. Apologies.

If you look at section 19.9.2 of the 328/P datasheet, you'll see that 1024 is as large as we can make the divider. However we're in a position to slow it down more if we like. We could have the counter count to 11 instead of 5 (i.e. MOV #11,OCR0A), then after roll reads it out of TCNT0. divide the value by 2 (i.e. SR R1). This will take twice as long over each value. Or count to 23 and divide by 4 (two shifts) to take four times as long. This is doing much the same as the hardware divider, but using software.

Other Hardware Approaches

If you have some experience with microcontrollers, you might think that using an interrupt handler would be the way to go. I opted not to include interrupt handling in ArdEx because I doubted the value of it in an interpreted environment, but this would still be a problem even working in the 328/P's native assembly language. The interrupt handler would be triggered the moment the button was pressed, but the counter would continue to run while the return address, etc., was pushed, and the interrupt vector taken, and whatever initial code you had in the handler itself. I think you'd still want a divider of at least 8.

The hardware approach that really would work would be to set up input capture on the button press. It's designed for just this purpose. You program it to wait for a hardware event (e.g. signal at a pin going high when the button is pressed). It waits for that event and, when it detects it, it saves the counter value in a separate place where it won't change. An input event has caused it to capture the counter value.

It's fiddly though. If you look at the block diagram in section 4 of the datasheet you can see that the only input capture, ICP1, is the 16-bit counter TC 1 on pin PB0. That's one of the ThinkerShield LEDs. Not hard to hook it up on a breadboard, and you might enjoy experimenting with it, but it's outside the scope of what I'm up to here, which is mostly about software after all.

The Software Approach

Who would have guessed that the best solution of all would use a lookup table!

SWDice.adx

!MOV     #0x04,0            ; "graphics" for dice faces
!MOV     #0x12,1
!MOV     #0x15,2
!MOV     #0x1b,3
!MOV     #0x37,4
!MOV     #0x3f,5
!MOV     #1,0xf0            ; next dice value for "shake"
!MOV     #2,0xf1
!MOV     #3,0xf2
!MOV     #4,0xf3
!MOV     #5,0xf4
!MOV     #0,0xf5
set button,0x80
;
; Initialise
;
    MOV     #0x3f,DDRB      ; full bright LEDs
mainloop:
    MOV     #0,PORTB        ; all LEDs off
;
; Wait for a button press
;
waitbutton:
    CALL    shake           ; rattle the dice
    BIT     #button,PIND
    JEQ     waitbutton
;
; Get value of dice
;
    CALL    roll            ; roll out dice
;
; Display on LEDs
;
    CALL    showleds
;
; Wait for button release
;
    WAIT    5000            ; half second minimum display
waitup:
    BIT     #button,PIND
    JNE     waitup
    JMP     mainloop
;
; shake -- continue virtual shake of dice
; Inputs:
;   R1 -- seed
shake:
    ADD     #0xf0,R1
    MOV     @R1,R1
    RET
;
; roll -- get result of roll
; Inputs:
;   R1 -- dice seed value
; Outputs:
;   R1 -- dice value number [0:5]
roll:
    RET
;
; showleds -- show 1 to 6 LEDs
; Inputs:
;   R1 -- value [0:5], #LEDs to light - 1
; Clobbers R1
showleds:
    MOV      @R1,PORTB
    RET

Here is the assembled code ready to upload to ArdEx.

This version is very simple and, if you look carefully, you'll see there are 6 instructions updating the counter between checks of the button. That's about 60 µs, which means the counter is updating about 16,600 times per second; slightly faster than the hardware solution with a divider of 1024. It wouldn't be difficult to put the code in-line to eliminate the cost of CALL and RET and get it to 25,000 updates per second. Good enough as it is though. Humans aren't that fast.

There's no reason the lookup table has to be in numerical order. Then again, there's not much benefit in using a different order. The randomness doesn't come from the order of the segments, it comes from the human who stops the "wheel" turning.

In production code, you'd want to bracket shake, roll and the lookup table with clear comments warning other programmers (and yourself) that it is time critical code, and that they should resist any urge to "simplify" it.

Here is the C version of the code as shown in the video.

dice.c

static char nxtface[] = {1,2,3,4,5,0};
static char pattern[] = {0x04, 0x12, 0x15, 0x1b, 0x37, 0x3f};

#define BUTTON,0x80

/*
 * Simulate shaking of a dice.
 */
unsigned char
shake(unsigned char face)
{
        return(nxtface[face]);
}

/*
 * Simulate final position of rolled dice.
 */
unsigned char
roll(unsigned char face)
{
        return(face);
}

/*
 * Display a given number of LEDs
 */
void
showleds(unsigned char face)
{
        *PORTB = pattern[face];
}

void main()
{
        unsigned char face;

        face = 0;
        for (;;)
        {
                *PORTB = 0;
                if (!(*PIND & BUTTON))
                {
                        /*
                         * Still waiting for button.  Shake.
                         */
                        face = shake(face);
                        continue;
                }
                /*
                 * Get final dice value
                 */
                face = roll(face);
                /*
                 * Display on LEDs for a while then wait
                 * for button release.
                 */
                showleds(face);
                do_wait(5000);
                while ((*PIND & BUTTON))
                        ;
        }
}

This code is really conceptual. I haven't compiled it anywhere. To make it work as an Arduino Sketch, you'd have to restructure the main function as a sketch loop. You'd also need to get rid of the asterisks on the PORTB and PIND references. (It makes more sense to me that these are addresses, but I think the Arduino people wanted to hide scary ideas like pointers)

Homework

The video suggested two things you might try your hand at.

First was to modify the program so that the rolled value remains visible between rolls. You shouldn't find this terribly hard. I can think of a couple of approaches that only require minor tweaking, but you might come up with something smarter.

The other thing was to try out my claims about modularity: unplug the ThinkerShield, and wire up a button and 7-segment LED, and change the code so the dice value displays as a decimal digit (or you could use the seven line segments to mimic patterns on a proper dice). 7-segment LED schematic

The diagram shows the connections. Obviously you can hook it up through your breadboard. Whatever suits.

As mentioned in the video, you'll still want a lookup table for the bit patterns. Bits 0-5 will be the usual PORTB pins, but bit 6 needs to switch bit 5 of PORTD. I'll describe two ways you might go about this.

One way is to use two lookup tables, one for PORTB and another for PORTD. This makes it extremely easy, but is a bit extravagant with memory and, worse, writing to all 8 bits of PORTD assumes that nothing else is connected. E.g. the ThinkerShield has the button and buzzer on other PORTD pins. It's against the notion of modularity for code to go writing over things that are none of its business. It can be solved. Maybe you'd like to try. Here's a hint: think about the little ping-pong animation at the bottom of last video's material.

The other way is to use BIT to test bit 6 of the value looked up, and then use BIS or BIC on bit 5 of PORTD according to the result of that test. Not pretty, but not hard.

To test my claims about lessons here being applicable to a high-level language, here is an Arduino Sketch of the very first dice algorithm:

Dice6.ino

void setup()
{
	DDRB = 0x3f;	// full power to all LEDs
}

static const unsigned char pattern[] = {0x04, 0x12, 0x15, 0x1b, 0x37, 0x3f};


void loop()
{
	unsigned char face;

	for (;;)
	{
		face = 6;
		while (face != 0)
		{
			if (PIND &er; 0x80)
			{
				PORTB = pattern[face - 1];
				delay(1000);
				PORTB = 0;
				break;
			}
			face--;
		}
	}
}

If you build this in an Arduino development environment, with compiler optimisation enabled, it'll work with the ThinkerShield, and you'll find that it favours 6 in a similar way to Dice6.

Here are the figures I got:

123456
373631273732NOT optimised
403342383748optimised

It's worth asking why the unoptimised code doesn't show a bias.

One way we could have "fixed" Dice6 would be to add a WAIT 2 to the inner loop. That would have been like adding 20 extra instructions. The extra overhead of two extra instructions at the wrap point would have been hidden until we'd tallied up many thousands of samples.

The unoptimised Sketch doesn't insert a WAIT 2, but it leaves in a bunch of redundant instructions which will dilute the bias in just the same way.

This is something I've struck a few times over the years: you add a delay to some code and it suddenly starts behaving better. The problem hasn't been fixed, it has been hidden!

The reason the optimised code shows less bias than our ArdEx code is that it probably takes more 328/P instructions to do the task. If you recall, Dice6 had about a 50% bias towards 6 because the usual loop took 4 instructions, the wrap took 6. The optimised Sketch looks to have about a 20% bias towards 6. This would happen if, for example, the usual loop took 10 cycles, and the wrapping loop 12.

Conclusion

Were you surprised how interesting this dull problem turned out to be? I hope so. Here are a few points to think about.