rhialto: Me under a waterfall (Default)
[personal profile] rhialto
The SuperPET is a version of the Commodore PET which has been outfitted with an extra 64 kilobytes of memory and a 6809 processor. There is some nice software for it from Waterloo University in Canada (interpreters for various languages, such as Basic, Cobol, Fortran, Pascal and APL, and an 6809 assembler development system). Unfortunately, this software checks for the presence of an undocumented chip, the 6702. Since I wanted to add SuperPET emulation to VICE, something had to be done about emulating this chip.

Over the last months, several people on the cbm-hackers mailing list have helped to work on this issue. We used, over time, various approaches.

The presence of the dongle was checked by writing values to a memory location (which was part of the chip) and reading values back, and checking some complicated properties that must hold. (More about that below).

One approach was to identify the location of the dongle check routine and "disarm" it. This can be done by hooking in the emulation of the JSR opcode, and check if it happens to jump to the offending routine. If so, it doesn't actually do that, but instead just sets the registers to what they should be to "pass", and continue on. This approach is slightly complicated by the fact that there is bank switching involved (the relevant addresses may be the same but if the bank number is different, it isn't jumping to the same code) and that there are different calling methods used. There are different versions of the various language processors too, so this increases the size of the list of addresses/bank numbers to check. Nevertheless, this method works.

A more efficient approach is to patch the actual programs themselves, to simply remove the dongle check. This is also useful because there are SuperPETs in the wild that don't have their 6702 dongle chip any more. In the first SuperPET version, it was on a small separate board and may have gotten lost. At least one 6702 was destroyed for analysis by the visual6502 team. The dongle check routine can be stubbed out, but this procedure is complicated by the fact that there is a routine which checks the dongle check routine!

There were approaches from the other end too. Dave Roberts made a C version of the dongle check routine for analysis. He used a Monte-Carlo simulation to find values to return from the dongle that would pass the check. This method would simply ignore whatever was written to the chip, and simply regurgitate a fixed list when it was read. He was luckier with this than I was, since he found several working sequences, but his program gave me none at all. Therefore I tried a less random and more exhaustive search (with some branch-and-bound tricks of course) but that didn't really get anywhere.

In the mean time, Ruud Baltissen lent me his SuperPET, so I could try experiments on the real thing. This really helped a lot. I re-wrote Dave's dongle checker in Basic and ran it on the SuperPET in normal Commodore mode. This showed that the dongle isn't time sensitive. Also, by logging the values that were written to it and those that were returned, I discovered the pattern that new values were only produced after first an even and then an odd value was written to it. Various people checked that any extra values written to the chip, outside this pattern, didn't make a difference.

Furthermore, even the value of the even number didn't make a difference, only the odd value had an effect on the result. All this had the very fortunate effect of reducing the search space considerably! I did experiments with writing "minimal values" to the chip: a series of repeats of the same smallest odd values: a sequence of 1, and a sequence of (2+1), and of (4+1), (8+1), (16+1), (32+1), (64+1) and (128+1) (with even values in between). I posted the resulting sequences, and people quickly noticed that in principle, each output bit went through a periodic list of values: first n 1 values, then n 0 values (with n being different for each bit position) (except for 2 bit positions which always kept the same value). If the corresponding bit position in the input was an 1, however, then changes occurred: the un-changing output bits would change, and the changing ones would stop changing. This resulted in the theory that there are 8 counter registers in parallel, each responsible for 1 output bit position. The input value would make them count, or not.

This was a nice model, and it worked as long as you kept writing the same odd value into the chip. But if you put in different values, it broke down. William Levak was the first to come up with a model where the output of the counter register wasn't directly given back to the outside world, but it was routed via a flip-flop (a flip-flop remembers a single bit, and can change the value of its bit if you put a 1 on an input line). He also suspected a mechanism to re-set the counter register, if you would write a different odd value to the chip. However, I couldn't quite understand how exactly he meant that, so I simplified the model a bit when I woke up a bit early in the morning and didn't want to get out of bed yet. And after only a small tweak, to my surprise, that passed the dongle check routine! Suddenly I had the solution to the mystery of the internals of the 6702!

Although, not really the complete mystery: only the "how", but not the "why". The chip is doing something with shift registers, the check routine does something strange with multiplications, and remembering several values and checking some of their bits against other values. The connection between the two isn't exactly clear at first sight.

So I would like to present first the 6702 emulation code, then the 6702 verification code. If you see what properties of the chip output are actually being checked, I would like to know!

/* Internal state of the 6702 dongle */
#define DONGLE_MAGIC    (128 + 64 + 16 + 4 + 2) /* = 214 = $D6 = %1101 0110 */
static int shift[8];
static const int leftmost[8] = {
    1 << (6 - 1),       /*   1 The size of each shift register is 6, 3, 7... */
    1 << (3 - 1),       /*   2 and therefore those are also the periods of */
    1 << (7 - 1),       /*   4 the output bits. */
    1 << (8 - 1),       /*   8 */
    1 << (1 - 1),       /*  16 */
    1 << (3 - 1),       /*  32 */
    1 << (5 - 1),       /*  64 */
    1 << (2 - 1),       /* 128 */
};
static int val6702;
static int prevodd;
static int wantodd;

static void reset6702(void)
{
    int i;

    for (i = 0; i < 8; i++) {
        if ((1 << i) & (DONGLE_MAGIC | 1)) {
            shift[i] = leftmost[i];
        } else {
            shift[i] = 0;
        }
    }
    val6702 = DONGLE_MAGIC;
    prevodd = 1;
    wantodd = 0;
}

static inline
BYTE read6702(void)
{
    return val6702;
}

/*
 * Only the first odd value which is written after
 * an even value has an effect.
 *
 * Thanks to Ruud Baltissen, William Levak, Rob Clarke,
 * Kajtar Zsolt and Segher Boessenkool, from cbm-hackers,
 * for their contributions.
 * -Olaf Seibert.
 */
static inline
void write6702(BYTE input)
{
    if ((input & 1) == wantodd) {
        if (wantodd) {
            int i;
            int v = val6702;
            int changed = prevodd ^ input;
            int mask = 0x80;

            /* loop over all 8 output bits / shift registers */
            for (i = 7; i >= 0; i--, mask >>= 1) {
                /* If the input bit changed toggle leftmost bit */
                if (changed & mask) {
                    shift[i] ^= leftmost[i];
                }
                if (shift[i] & 1) {
                    v ^= mask;
                    shift[i] |= leftmost[i] << 1; /* wrap bit around */
                }
                shift[i] >>= 1;
            }

            prevodd = input;
            val6702 = v;
        }
        wantodd ^= 1;
    }
}

// **************************
// ***                    ***
// ***  More global data  ***
// ***                    ***
// **************************

int BigLoops;                  // Internal statistics.

unsigned char A;                      // 6809 accumulator A.
unsigned char B;                      // 6809 accumulator B.
unsigned char X;                      // 6809 register X.
unsigned char Y;                      // 6809 register Y.

unsigned short D;                     // Used within the simulator for the MUL instruction at address 9873.

unsigned char S0, S1, MulHI, MulLO;     // These reflect memory locations "0,S" , "1,S" , "4,S" and "5,S" respectively.

unsigned char StoreA;                 // Stack storage for the A accumulator.
unsigned char StoreX;                 // Temp. storage for the X register.

unsigned char TABLE[ 8 ] = {          // Data values following the "BSR" instruction at address 9895. Indexed as 0 to 7.
  0x10,
  0x80,
  0x22,
  0x00,
  0x40,
  0x00,
  0x04,
  0xA8
};

unsigned char XSTACK[ 7 ];            // Stack storage. Indexed as 0 to 6.

// ************************************
// ***                              ***
// ***  Main function of interest!  ***
// ***                              ***
// ************************************

int X6702( void ) {

  LOOKUP_INDEX = 0;                   // Start at the beginning of the lookup table. This table occurs just after the BSR 
                                      // instruction at address 9895.

  X = 7;                              // 9852: TFR S,X     ; My simulation implementation anyhow!

                                      // Irrelevant instructions here for the purpose of cracking the 6702...
                                      // 9854: LEAS -14,S    ; Allocate working space for subroutine.
                                      // 9856: LDA #$F0    ; }
                                      // 9858: STA 6,S    ;  }
                                      // 985A: DEC 6,S    ;   } Evil way of setting 6,S and 7,S to the
                                      // 985C: STA 7,S    ;  }  address of the 6702 chip ($EFE0).
                                      // 985E: ASL 7,S    ; }

  A = Read6702();                     // 9860: LDA [<$06,S]    ; Get a code value from the 6702. <<<<<<<<<<<<<<<<<<<<<<<<<

  A |= 0x11;                          // 9863: ORA #$11    ; make sure it is odd

  MulHI = A;                             // 9865: STA $04,S    ; for multiplication: first time it will be A squared
  MulLO = A;                             // 9867: STA $05,S    ; note that A is odd, so the product is odd too.

  B = 0x40;                           // 9869: LDB #$40

  do {                                // Big loop starting at address 986B.

    BigLoops++;

    printf("Start big loop (B = 0x%02X).\n",B);

    B <<= 1;                          // 986B: LSLB

    S1 = B;                           // 986C: STB $01,S

    Write6702( B );                    // 986E: STB [<$06,S]    ; Can ignore for the purpose of this simulation.
    //                                 ; the written value is always EVEN ($80 $40 $20 $10 $08 $04 $02)

    A = MulHI; B = MulLO;                  // 9871: LDD $04,S    ; Loads A and B in one go.

    printf("%02x * %02x -> ", A, B);
    D = A*B;                          // 9873: MUL        ; A:B = A * B.
    A = ((D >> 8) & 0xFF);            //            ; A is hi byte of result.
    B = ((D >> 0) & 0xFF);            //            ; B is lo byte of result.
    printf("%02x %02x -> ", A, B);

    MulLO = B;                           // 9874: STB $05,S    ; lo byte is always odd

    A |= MulLO;                          // 9876: ORA $05,S    ; hi |= lo, so hi will now always be odd too.

    MulHI = A;                           // 9878: STA $04,S
    printf("%02x %02x\n", MulHI, MulLO);

    B ^= 0xD7;                        // 987A: EORB #$D7

    Write6702( Read6702() << 1 );    // 987C: ASL [<$06,S]    ; Can ignore for the purpose of this simulation.
    //                                 ; the written value is always EVEN

    B ^= Read6702();                  // 987F: EORB [<$06,S]    ; Get a code value from the 6702. <<<<<<<<<<<<<<<<<<<<<<<<<

    A = MulLO;                           // 9882: LDA $05,S    ; hi | lo from multiplication, so ODD.

    Write6702( A );                   // 9884: STA [<$06,S]    ; Can ignore for the purpose of this simulation.
    //                                 ; the written value is always ODD

    Write6702( Read6702() << 1 );     // 9887: ASL [<$06,S]    ; Can ignore for the purpose of this simulation.
    //                                 ; the written value is always EVEN

    A = Read6702();                   // 988A: LDA [$06,S]    ; Get a code value from the 6702. <<<<<<<<<<<<<<<<<<<<<<<<<

    S0 = A;                           // 988D: STA ,S

    assert( X <= 7 );

    // ; The way I think this works (which seems to have got us all confused) is as follows. I often wondered why 14 bytes
    // ; of stack were allocated for this subroutine but only 8 appear to be used (0,S through 7,S). This leaves 8,S upwards
    // ; either unused - or used by something we have not discovered yet!
    // ; 
    // ; The "STB ,-X" instruction at address 988F pre-decrements register X and stores the value of register B into the
    // ; allocated stack memory. The new (already decremented) value of X is then stored away for later use at address 98B4.
    // ; This code is in the "outer loop".
    // ; 
    // ; The "inner loop" then works 'up' the stack checking the value just stored ***AND*** all the previous values stored
    // ; from the last iterations of the "outer loop". Hence the reason that the instruction "LDA ,X+" at address 98A8 can iterate
    // ; more than once - because multiple values have been stored in this area of the stack from the iteration of the outer
    // ; loop.
    // ; 
    // ; On the first   iteration of the outer loop, one   value  will have been stored and the inner loop will iterate once.
    // ; On the second  iteration of the outer loop, two   values will have been stored and the inner loop will iterate twice.
    // ; On the third   iteration of the outer loop, three values will have been stored and the inner loop will iterate three times.
    // ; On the fourth  iteration of the outer loop, four  values will have been stored and the inner loop will iterate four  times.
    // ; On the fifth   iteration of the outer loop, five  values will have been stored and the inner loop will iterate five  times.
    // ; On the sixth   iteration of the outer loop, six   values will have been stored and the inner loop will iterate six   times.
    // ; On the seventh iteration of the outer loop, seven values will have been stored and the inner loop will iterate seven times.

    XSTACK[ --X ] = B;                // 988F: STB ,-X    ; Store the value of B on the stack and move X down in memory.
    //printf("XSTACK = %02X %02X %02X %02X %02X %02X %02X).\n",XSTACK[0],XSTACK[1],XSTACK[2],XSTACK[3],XSTACK[4],XSTACK[5],XSTACK[6]);

    StoreX = X;                       // 9891: STX $02,S    ; Save the (new) lower value of X for later use.

    B = S1;                           // 9893: LDB $01,S    

                                      // 9895: BSR >$989F    ; Not required for simulation...

    // ; Discontinuity in listing. This address (just following the BSR) is occupied by a look-up table of 8 (?) bytes.
    // ; The called subroutine picks up the return address from the stack and uses it to index the look-up table. The
    // ; called subroutine never returns back to the instruction stream following the BSR.

    Y = 0;                            // 989F: LDY ,S++    ; Well, a hacked version of it! 
                                      //                      ; Start at the beginning of TABLE[]

    do {                              // Small loop starting at address 98A2.

      //printf("Start small loop (S1=0x%02X TABLE[Y]=0x%02X B=0x%02x).\n",S1,TABLE[Y],B);

      assert (A == 0 || A == S0);
      A |= S0;                        // 98A2: ORA ,S        ; same as A = S0 !! because S0 == $00 || S0 == A

      assert( Y <= 7 );

      A &= TABLE[ Y ];                // 98A4: ANDA ,Y    ; Index TABLE[] value.

      StoreA = A;                     // 98A6: PSHS A        ; Save A on the stack at this point (for use at 98AC).

      A = XSTACK[ X++ ];              // 98A8: LDA ,X+    ; Recover a value stored in the stack and bump up the index.    

      A &= TABLE[ Y++ ];              // 98AA: ANDA ,Y+    ; Index the same TABLE[] value as above and bump up the table index for next time.

      printf("Check Y=%d TABLE[Y]=%02X S0=%02X A=0x%02X StoreA=0x%02X.\n",Y-1,TABLE[Y-1],S0,A,StoreA);

      A -= StoreA;                    // 98AC: SUBA ,S+    ; Is the 6702 returning the correct codes?

      printf("%02X -> %02X == %02X <- %02X under mask %02X\n",
          XSTACK[X-1], XSTACK[X-1] & TABLE[Y-1], 
                                S0 & TABLE[Y-1], S0,
          TABLE[Y-1]);
      if( A != 0x00 ) {               // 98AE: BNE >$98BA    ; Branch taken if the answer is no!

        printf("6702 check FAIL 0x%02X.\n",A);

        return A;               // *** NO NO NO NO ***

      } // End if

      S1 <<= 1;                       // 98B0: ASL $01,S

    } while( S1 != 0x00 );            // 98B2: BNE >$98A2
  
    X = StoreX;                       // 98B4: LDX $02,S    ; Recover the 'reduced' value of X indexing into the stack.

    B >>= 2;                          // 98B6: LSRB
                                      // 98B7: LSRB

  } while( B != 0x00 );               // 98B8: BNE >$986B

  // ****************
  // ***          ***
  // ***  WHOOP!  ***
  // ***          ***
  // ****************

  printf("!!! 6702 check PASS !!!\n");

  return A | B;

} // End of function X6702


A possible hint?

Date: 2012-12-30 01:14 am (UTC)
From: (Anonymous)
The mask table appears to indicate when the "same bit" will come back around on each of the counters. The output of the multiply is irrelevant. You can replace the product generated in D with any odd 16-bit value and the check will still succeed. Try it.

--Joe Zbiciak (intvnut at gmail)

Profile

rhialto: Me under a waterfall (Default)
rhialto

February 2024

S M T W T F S
    123
45678910
11121314151617
18192021 222324
2526272829  

Most Popular Tags

Page Summary

Style Credit

Expand Cut Tags

No cut tags
Powered by Dreamwidth Studios