# $Id: 64doc,v 1.8 1994/06/03 19:50:04 jopi Exp $ # # This file is part of Commodore 64 emulator # and Program Development System. # # See README for copyright notice # # This file contains documentation for 6502/6510/8500/8502 instruction set. # # # Written by # John West (john@ucc.gu.uwa.edu.au) # Marko Mäkelä (Marko.Makela@HUT.FI) # # # $Log: 64doc,v $ # Revision 1.8 1994/06/03 19:50:04 jopi # Patchlevel 2 # # Revision 1.7 1994/04/15 13:07:04 jopi # 65xx Register descriptions added # # Revision 1.6 1994/02/18 16:09:36 jopi # # Revision 1.5 1994/01/26 16:08:37 jopi # X64 version 0.2 PL 1 # # Revision 1.4 1993/11/10 01:55:34 jopi # # Revision 1.3 93/06/21 13:37:18 jopi # X64 version 0.2 PL 0 # # Revision 1.2 93/06/21 13:07:15 jopi # *** empty log message *** # # Note: To extract the uuencoded ML programs in this article most easily you may use e.g. "uud" by Edwin Kremer <edwin@zlotty>, which extracts them all at once. Documentation for the NMOS 65xx/85xx Instruction Set 6510 Instructions by Addressing Modes 6502 Registers 6510/8502 Undocumented Commands Register selection for load and store Decimal mode in NMOS 6500 series 6510 features Different CPU types 6510 Instruction Timing How Real Programmers Acknowledge Interrupts Memory Management Autostart Code Notes References 6510 Instructions by Addressing Modes off- ++++++++++ Positive ++++++++++ ---------- Negative ---------- set 00 20 40 60 80 a0 c0 e0 mode +00 BRK JSR RTI RTS NOP* LDY CPY CPX Impl/immed +01 ORA AND EOR ADC STA LDA CMP SBC (indir,x) +02 t t t t NOP*t LDX NOP*t NOP*t ? /immed +03 SLO* RLA* SRE* RRA* SAX* LAX* DCP* ISB* (indir,x) +04 NOP* BIT NOP* NOP* STY LDY CPY CPX Zeropage +05 ORA AND EOR ADC STA LDA CMP SBC Zeropage +06 ASL ROL LSR ROR STX LDX DEC INC Zeropage +07 SLO* RLA* SRE* RRA* SAX* LAX* DCP* ISB* Zeropage +08 PHP PLP PHA PLA DEY TAY INY INX Implied +09 ORA AND EOR ADC NOP* LDA CMP SBC Immediate +0a ASL ROL LSR ROR TXA TAX DEX NOP Accu/impl +0b ANC** ANC** ASR** ARR** ANE** LXA** SBX** SBC* Immediate +0c NOP* BIT JMP JMP () STY LDY CPY CPX Absolute +0d ORA AND EOR ADC STA LDA CMP SBC Absolute +0e ASL ROL LSR ROR STX LDX DEC INC Absolute +0f SLO* RLA* SRE* RRA* SAX* LAX* DCP* ISB* Absolute +10 BPL BMI BVC BVS BCC BCS BNE BEQ Relative +11 ORA AND EOR ADC STA LDA CMP SBC (indir),y +12 t t t t t t t t ? +13 SLO* RLA* SRE* RRA* SHA** LAX* DCP* ISB* (indir),y +14 NOP* NOP* NOP* NOP* STY LDY NOP* NOP* Zeropage,x +15 ORA AND EOR ADC STA LDA CMP SBC Zeropage,x +16 ASL ROL LSR ROR STX y) LDX y) DEC INC Zeropage,x +17 SLO* RLA* SRE* RRA* SAX* y) LAX* y) DCP* ISB* Zeropage,x +18 CLC SEC CLI SEI TYA CLV CLD SED Implied +19 ORA AND EOR ADC STA LDA CMP SBC Absolute,y +1a NOP* NOP* NOP* NOP* TXS TSX NOP* NOP* Implied +1b SLO* RLA* SRE* RRA* SHS** LAS** DCP* ISB* Absolute,y +1c NOP* NOP* NOP* NOP* SHY** LDY NOP* NOP* Absolute,x +1d ORA AND EOR ADC STA LDA CMP SBC Absolute,x +1e ASL ROL LSR ROR SHX**y) LDX y) DEC INC Absolute,x +1f SLO* RLA* SRE* RRA* SHA**y) LAX* y) DCP* ISB* Absolute,x ROR intruction is available on MC650x microprocessors after June, 1976. Legend: t Jams the machine *t Jams very rarely * Undocumented command ** Unusual operation y) indexed using Y instead of X () indirect instead of absolute Note that the NOP instructions do have other addressing modes than the implied addressing. The NOP instruction is just like any other load instruction, except it does not store the result anywhere nor affects the flags. 6502 Registers The NMOS 65xx processors are not ruined with too many registers. In addition to that, the registers are mostly 8-bit. Here is a brief description of each register: PC Program Counter This register points the address from which the next instruction byte (opcode or parameter) will be fetched. Unlike other registers, this one is 16 bits in length. The low and high 8-bit halves of the register are called PCL and PCH, respectively. The Program Counter may be read by pushing its value on the stack. This can be done either by jumping to a subroutine or by causing an interrupt. S Stack pointer The NMOS 65xx processors have 256 bytes of stack memory, ranging from $0100 to $01FF. The S register is a 8-bit offset to the stack page. In other words, whenever anything is being pushed on the stack, it will be stored to the address $0100+S. The Stack pointer can be read and written by transfering its value to or from the index register X (see below) with the TSX and TXS instructions. P Processor status This 8-bit register stores the state of the processor. The bits in this register are called flags. Most of the flags have something to do with arithmetic operations. The P register can be read by pushing it on the stack (with PHP or by causing an interrupt). If you only need to read one flag, you can use the branch instructions. Setting the flags is possible by pulling the P register from stack or by using the flag set or clear instructions. Following is a list of the flags, starting from the 8th bit of the P register (bit 7, value $80): N Negative flag This flag will be set after any arithmetic operations (when any of the registers A, X or Y is being loaded with a value). Generally, the N flag will be copied from the topmost bit of the register being loaded. Note that TXS (Transfer X to S) is not an arithmetic operation. Also note that the BIT instruction affects the Negative flag just like arithmetic operations. Finally, the Negative flag behaves differently in Decimal operations (see description below). V oVerflow flag Like the Negative flag, this flag is intended to be used with 8-bit signed integer numbers. The flag will be affected by addition and subtraction, the instructions PLP, CLV and BIT, and the hardware signal -SO. Note that there is no SEV instruction, even though the MOS engineers loved to use East European abbreviations, like DDR (Deutsche Demokratische Republik vs. Data Direction Register). (The Russian abbreviation for their former trade association COMECON is SEV.) The -SO (Set Overflow) signal is available on some processors, at least the 6502, to set the V flag. This enables response to an I/O activity in equal or less than three clock cycles when using a BVC instruction branching to itself ($50 $FE). The CLV instruction clears the V flag, and the PLP and BIT instructions copy the flag value from the bit 6 of the topmost stack entry or from memory. After a binary addition or subtraction, the V flag will be set on a sign overflow, cleared otherwise. What is a sign overflow? For instance, if you are trying to add 123 and 45 together, the result (168) does not fit in a 8-bit signed integer (upper limit 127 and lower limit -128). Similarly, adding -123 to -45 causes the overflow, just like subtracting -45 from 123 or 123 from -45 would do. Like the N flag, the V flag will not be set as expected in the Decimal mode. Later in this document is a precise operation description. A common misbelief is that the V flag could only be set by arithmetic operations, not cleared. 1 unused flag To the current knowledge, this flag is always 1. B Break flag This flag is used to distinguish software (BRK) interrupts from hardware interrupts (IRQ or NMI). The B flag is always set except when the P register is being pushed on stack when jumping to an interrupt routine to process only a hardware interrupt. The official NMOS 65xx documentation claims that the BRK instruction could only cause a jump to the IRQ vector ($FFFE). However, if an NMI interrupt occurs while executing a BRK instruction, the processor will jump to the NMI vector ($FFFA), and the P register will be pushed on the stack with the B flag set. D Decimal mode flag This flag is used to select the (Binary Coded) Decimal mode for addition and subtraction. In most applications, the flag is zero. The Decimal mode has many oddities, and it operates differently on CMOS processors. See the description of the ADC, SBC and ARR instructions below. I Interrupt disable flag This flag can be used to prevent the processor from jumping to the IRQ handler vector ($FFFE) whenever the hardware line -IRQ is active. The flag will be automatically set after taking an interrupt, so that the processor would not keep jumping to the interrupt routine if the -IRQ signal remains low for several clock cycles. Z Zero flag The Zero flag will be affected in the same cases than the Negative flag. Generally, it will be set if an arithmetic register is being loaded with the value zero, and cleared otherwise. The flag will behave differently in Decimal operations. C Carry flag This flag is used in additions, subtractions, comparisons and bit rotations. In additions and subtractions, it acts as a 9th bit and lets you to chain operations to calculate with bigger than 8-bit numbers. When subtracting, the Carry flag is the negative of Borrow: if an overflow occurs, the flag will be clear, otherwise set. Comparisons are a special case of subtraction: they assume Carry flag set and Decimal flag clear, and do not store the result of the subtraction anywhere. There are four kinds of bit rotations. All of them store the bit that is being rotated off to the Carry flag. The left shifting instructions are ROL and ASL. ROL copies the initial Carry flag to the lowmost bit of the byte; ASL always clears it. Similarly, the ROR and LSR instructions shift to the right. A Accumulator The accumulator is the main register for arithmetic and logic operations. Unlike the index registers X and Y, it has a direct connection to the Arithmetic and Logic Unit (ALU). This is why many operations are only available for the accumulator, not the index registers. X Index register X This is the main register for addressing data with indices. It has a special addressing mode, indexed indirect, which lets you to have a vector table on the zero page. Y Index register Y The Y register has the least operations available. On the other hand, only it has the indirect indexed addressing mode that enables access to any memory place without having to use self-modifying code. 6510/8502 Undocumented Commands -- A brief explanation about what may happen while using don't care states. ANE $8B A = (A | #$EE) & X & #byte same as A = ((A & #$11 & X) | ( #$EE & X)) & #byte In real 6510/8502 the internal parameter #$11 may occasionally be #$10, #$01 or even #$00. This occurs when the video chip starts DMA between the opcode fetch and the parameter fetch of the instruction. The value probably depends on the data that was left on the bus by the VIC-II. LXA $AB C=Lehti: A = X = ANE Alternate: A = X = (A & #byte) TXA and TAX have to be responsible for these. SHA $93,$9F Store (A & X & (ADDR_HI + 1)) SHX $9E Store (X & (ADDR_HI + 1)) SHY $9C Store (Y & (ADDR_HI + 1)) SHS $9B SHA and TXS, where X is replaced by (A & X). Note: The value to be stored is copied also to ADDR_HI if page boundary is crossed. SBX $CB Carry and Decimal flags are ignored but the Carry flag will be set in substraction. This is due to the CMP command, which is executed instead of the real SBC. ARR $6B This instruction first performs an AND between the accumulator and the immediate parameter, then it shifts the accumulator to the right. However, this is not the whole truth. See the description below. Many undocumented commands do not use AND between registers, the CPU just throws the bytes to a bus simultaneously and lets the open-collector drivers perform the AND. I.e. the command called 'SAX', which is in the STORE section (opcodes $A0...$BF), stores the result of (A & X) by this way. More fortunate is its opposite, 'LAX' which just loads a byte simultaneously into both A and X. $6B ARR This instruction seems to be a harmless combination of AND and ROR at first sight, but it turns out that it affects the V flag and also has a special kind of decimal mode. This is because the instruction has inherited some properties of the ADC instruction ($69) in addition to the ROR ($6A). In Binary mode (D flag clear), the instruction effectively does an AND between the accumulator and the immediate parameter, and then shifts the accumulator to the right, copying the C flag to the 8th bit. It sets the Negative and Zero flags just like the ROR would. The ADC code shows up in the Carry and oVerflow flags. The C flag will be copied from the bit 6 of the result (which doesn't seem too logical), and the V flag is the result of an Exclusive OR operation between the bit 6 and the bit 5 of the result. This makes sense, since the V flag will be normally set by an Exclusive OR, too. In Decimal mode (D flag set), the ARR instruction first performs the AND and ROR, just like in Binary mode. The N flag will be copied from the initial C flag, and the Z flag will be set according to the ROR result, as expected. The V flag will be set if the bit 6 of the accumulator changed its state between the AND and the ROR, cleared otherwise. Now comes the funny part. If the low nybble of the AND result, incremented by its lowmost bit, is greater than 5, the low nybble in the ROR result will be incremented by 6. The low nybble may overflow as a consequence of this BCD fixup, but the high nybble won't be adjusted. The high nybble will be BCD fixed in a similar way. If the high nybble of the AND result, incremented by its lowmost bit, is greater than 5, the high nybble in the ROR result will be incremented by 6, and the Carry flag will be set. Otherwise the C flag will be cleared. To help you understand this description, here is a C routine that illustrates the ARR operation in Decimal mode: unsigned A, /* Accumulator */ AL, /* low nybble of accumulator */ AH, /* high nybble of accumulator */ C, /* Carry flag */ Z, /* Zero flag */ V, /* oVerflow flag */ N, /* Negative flag */ t, /* temporary value */ s; /* value to be ARRed with Accumulator */ t = A & s; /* Perform the AND. */ AH = t >> 4; /* Separate the high */ AL = t & 15; /* and low nybbles. */ N = C; /* Set the N and */ Z = !(A = (t >> 1) | (C << 7)); /* Z flags traditionally */ V = (t ^ A) & 64; /* and V flag in a weird way. */ if (AL + (AL & 1) > 5) /* BCD "fixup" for low nybble. */ A = (A & 0xF0) | ((A + 6) & 0xF); if (C = AH + (AH & 1) > 5) /* Set the Carry flag. */ A = (A + 0x60) & 0xFF; /* BCD "fixup" for high nybble. */ $CB SBX X <- (A & X) - Immediate The 'SBX' ($CB) may seem to be very complex operation, even though it is a combination of the subtraction of accumulator and parameter, as in the 'CMP' instruction, and the command 'DEX'. As a result, both A and X are connected to ALU but only the subtraction takes place. Since the comparison logic was used, the result of subtraction should be normally ignored, but the 'DEX' now happily stores to X the value of (A & X) - Immediate. That is why this instruction does not have any decimal mode, and it does not affect the V flag. Also Carry flag will be ignored in the subtraction but set according to the result. Proof: begin 644 vsbx M`0@9$,D'GL(H-#,IJC(U-JS"*#0T*:HR-@```*D`H#V1*Z`_D2N@09$KJ0>% M^QBE^VEZJ+$KH#F1*ZD`2"BI`*(`RP`(:-B@.5$K*4#P`E@`H#VQ*SAI`)$K JD-Z@/[$K:0"1*Y#4J2X@TO\XH$&Q*VD`D2N0Q,;[$+188/_^]_:_OK>V ` end and begin 644 sbx M`0@9$,D'GL(H-#,IJC(U-JS"*#0T*:HR-@```'BI`*!-D2N@3Y$KH%&1*ZD# MA?L8I?M*2)`#J1@LJ3B@29$K:$J0`ZGX+*G8R)$K&/BXJ?2B8\L)AOP(:(7] MV#B@3;$KH$\Q*Z!1\2L(1?SP`0!H1?TIM]#XH$VQ*SAI`)$KD,N@3[$K:0"1 9*Y#!J2X@TO\XH%&Q*VD`D2N0L<;[$))88-#X ` end These test programs show if your machine is compatible with ours regarding the opcode $CB. The first test, vsbx, proves that SBX does not affect the V flag. The latter one, sbx, proves the rest of our theory. The vsbx test tests 33554432 SBX combinations (16777216 different A, X and Immediate combinations, and two different V flag states), and the sbx test doubles that amount (16777216*4 D and C flag combinations). Both tests have run successfully on a C64 and a Vic20. They ought to run on C16, +4 and the PET series as well. The tests stop with BRK, if the opcode $CB does not work as expected. Successful operation ends in RTS. As the tests are very slow, they print dots on the screen while running so that you know that the machine has not jammed. On computers running at 1 MHz, the first test prints approximately one dot every four seconds and a total of 2048 dots, whereas the second one prints half that amount, one dot every seven seconds. If the tests fail on your machine, please let us know your processor's part number and revision. If possible, save the executable (after it has stopped with BRK) under another name and send it to us so that we know at which stage the program stopped. The following program is a Commodore 64 executable that Marko Mäkelä developed when trying to find out how the V flag is affected by SBX. (It was believed that the SBX affects the flag in a weird way, and this program shows how SBX sets the flag differently from SBC.) You may find the subroutine at $C150 useful when researching other undocumented instructions' flags. Run the program in a machine language monitor, as it makes use of the BRK instruction. The result tables will be written on pages $C2 and $C3. begin 644 sbx-c100 M`,%XH`",#L&,$,&,$L&XJ8*B@LL7AOL(:(7\N#BM#L$M$,'M$L$(Q?OP`B@` M:$7\\`,@4,'N#L'0U.X0P=#/SB#0[A+!T,<``````````````)BJ\!>M#L$M L$,'=_\'0":T2P=W_PM`!8,K0Z:T.P2T0P9D`PID`!*T2P9D`PYD`!<C0Y``M ` end Other undocumented instructions usually cause two preceding opcodes being executed. However 'NOP' seems to completely disappear from 'SBC' code $EB. The most difficult to comprehend are the rest of the instructions located on the '$0B' line. All the instructions located at the positive (left) side of this line should rotate either memory or the accumulator, but the addressing mode turns out to be immediate! No problem. Just read the operand, let it be ANDed with the accumulator and finally use accumulator addressing mode for the instructions above them. RELIGION_MODE_ON /* This part of the document is not accurate. You can read it as a fairy tale, but do not count on it when performing your own measurements. */ The rest two instructions on the same line, called 'ANE' and 'LXA' ($8B and $AB respectively) often give quite unpredictable results. However, the most usual operation is to store ((A | #$ee) & X & #$nn) to accumulator. Note that this does not work reliably in a real 64! In the Commodore 128 the opcode $8B uses values 8C, CC, EE, and occasionally 0C and 8E for the OR instead of EE,EF,FE and FF used in the C64. With a C128 running at 2 MHz #$EE is always used. Opcode $AB does not cause this OR taking place on 8502 while 6510 always performs it. Note that this behaviour depends on processor and/or video chip revision. Let's take a closer look at $8B (6510). A <- X & D & (A | VAL) where VAL comes from this table: X high D high D low VAL even even --- $EE (1) even odd --- $EE odd even --- $EE odd odd 0 $EE odd odd not 0 $FE (2) (1) If the bottom 2 bits of A are both 1, then the LSB of the result may be 0. The values of X and D are different every time I run the test. This appears to be very rare. (2) VAL is $FE most of the time. Sometimes it is $EE - it seems to be random, not related to any of the data. This is much more common than (1). In decimal mode, VAL is usually $FE. Two different functions have been discovered for LXA, opcode $AB. One is A = X = ANE (see above) and the other, encountered with 6510 and 8502, is less complicated A = X = (A & #byte). However, according to what is reported, the version altering only the lowest bits of each nybble seems to be more common. What happens, is that $AB loads a value into both A and X, ANDing the low bit of each nybble with the corresponding bit of the old A. However, there are exceptions. Sometimes the low bit is cleared even when A contains a '1', and sometimes other bits are cleared. The exceptions seem random (they change every time I run the test). Oops - that was in decimal mode. Much the same with D=0. What causes the randomness? Probably it is that it is marginal logic levels - when too much wired-anding goes on, some of the signals get very close to the threshold. Perhaps we're seeing some of them step over it. The low bit of each nybble is special, since it has to cope with carry differently (remember decimal mode). We never see a '0' turn into a '1'. Since these instructions are unpredictable, they should not be used. There is still very strange instruction left, the one named SHA/X/Y, which is the only one with only indexed addressing modes. Actually, the commands 'SHA', 'SHX' and 'SHY' are generated by the indexing algorithm. While using indexed addressing, effective address for page boundary crossing is calculated as soon as possible so it does not slow down operation. As a result, in the case of SHA/X/Y, the address and data are processed at the same time making AND between them to take place. Thus, the value to be stored by SAX, for example, is in fact (A & X & (ADDR_HI + 1)). On page boundary crossing the same value is copied also to high byte of the effective address. RELIGION_MODE_OFF Register selection for load and store bit1 bit0 A X Y 0 0 x 0 1 x 1 0 x 1 1 x x So, A and X are selected by bits 1 and 0 respectively, while ~(bit1|bit0) enables Y. Indexing is determined by bit4, even in relative addressing mode, which is one kind of indexing. Lines containing opcodes xxx000x1 (01 and 03) are treated as absolute after the effective address has been loaded into CPU. Zeropage,y and Absolute,y (codes 10x1 x11x) are distinquished by bit5. Decimal mode in NMOS 6500 series Most sources claim that the NMOS 6500 series sets the N, V and Z flags unpredictably when performing addition or subtraction in decimal mode. Of course, this is not true. While testing how the flags are set, I also wanted to see what happens if you use illegal BCD values. ADC works in Decimal mode in a quite complicated way. It is amazing how it can do that all in a single cycle. Here's a C code version of the instruction: unsigned A, /* Accumulator */ AL, /* low nybble of accumulator */ AH, /* high nybble of accumulator */ C, /* Carry flag */ Z, /* Zero flag */ V, /* oVerflow flag */ N, /* Negative flag */ s; /* value to be added to Accumulator */ AL = (A & 15) + (s & 15) + C; /* Calculate the lower nybble. */ AH = (A >> 4) + (s >> 4) + (AL > 15); /* Calculate the upper nybble. */ if (AL > 9) AL += 6; /* BCD fixup for lower nybble. */ Z = ((A + s + C) & 255 != 0); /* Zero flag is set just like in Binary mode. */ /* Negative and Overflow flags are set with the same logic than in Binary mode, but after fixing the lower nybble. */ N = (AH & 8 != 0); V = ((AH << 4) ^ A) & 128 && !((A ^ s) & 128); if (AH > 9) AH += 6; /* BCD fixup for upper nybble. */ /* Carry is the only flag set after fixing the result. */ C = (AH > 15); A = ((AH << 4) | (AL & 15)) & 255; The C flag is set as the quiche eaters expect, but the N and V flags are set after fixing the lower nybble but before fixing the upper one. They use the same logic than binary mode ADC. The Z flag is set before any BCD fixup, so the D flag does not have any influence on it. Proof: The following test program tests all 131072 ADC combinations in Decimal mode, and aborts with BRK if anything breaks this theory. If everything goes well, it ends in RTS. begin 644 dadc M 0@9",D'GL(H-#,IJC(U-JS"*#0T*:HR-@ 'BI&* A/N$_$B@+)$KH(V1 M*Q@(I?PI#X7]I?LI#V7]R0J0 FD%J"D/A?VE^RGP9?PI\ C $) ":0^JL @H ML ?)H) &""@X:5\X!?V%_0AH*3W@ ! ""8"HBD7[$ JE^T7\, 28"4"H**7[ M9?S0!)@) J@8N/BE^V7\V A%_= G:(3]1?W0(.;[T(?F_-"#:$D8\ )88*D= 0&&4KA?NI &4LA?RI.&S[ A% end All programs in this chapter have been successfully tested on a Vic20 and a Commodore 64 and a Commodore 128D in C64 mode. They should run on C16, +4 and on the PET series as well. If not, please report the problem to Marko Mäkelä. Each test in this chapter should run in less than a minute at 1 MHz. SBC is much easier. Just like CMP, its flags are not affected by the D flag. Proof: begin 644 dsbc-cmp-flags M 0@9",D'GL(H-#,IJC(U-JS"*#0T*:HR-@ 'B@ (3[A/RB XH8:66HL2N@ M09$KH$R1*XII::BQ*Z!%D2N@4)$K^#BXI?OE_-@(:(7].+BE^^7\"&A%_? ! 5 .;[T./F_-#?RA"_8!@X&#CEY<7% end The only difference in SBC's operation in decimal mode from binary mode is the result-fixup: unsigned A, /* Accumulator */ AL, /* low nybble of accumulator */ AH, /* high nybble of accumulator */ C, /* Carry flag */ Z, /* Zero flag */ V, /* oVerflow flag */ N, /* Negative flag */ s; /* value to be added to Accumulator */ AL = (A & 15) - (s & 15) - !C; /* Calculate the lower nybble. */ if (AL & 16) AL -= 6; /* BCD fixup for lower nybble. */ AH = (A >> 4) - (s >> 4) - (AL & 16); /* Calculate the upper nybble. */ if (AH & 16) AH -= 6; /* BCD fixup for upper nybble. */ /* The flags are set just like in Binary mode. */ C = (A - s - !C) & 256 != 0; Z = (A - s - !C) & 255 != 0; V = ((A - s - !C) ^ s) & 128 && (A ^ s) & 128; N = (A - s - !C) & 128 != 0; A = ((AH << 4) | (AL & 15)) & 255; Again Z flag is set before any BCD fixup. The N and V flags are set at any time before fixing the high nybble. The C flag may be set in any phase. Decimal subtraction is easier than decimal addition, as you have to make the BCD fixup only when a nybble overflows. In decimal addition, you had to verify if the nybble was greater than 9. The processor has an internal "half carry" flag for the lower nybble, used to trigger the BCD fixup. When calculating with legal BCD values, the lower nybble cannot overflow again when fixing it. So, the processor does not handle overflows while performing the fixup. Similarly, the BCD fixup occurs in the high nybble only if the value overflows, i.e. when the C flag will be cleared. Because SBC's flags are not affected by the Decimal mode flag, you could guess that CMP uses the SBC logic, only setting the C flag first. But the SBX instruction shows that CMP also temporarily clears the D flag, although it is totally unnecessary. The following program, which tests SBC's result and flags, contains the 6502 version of the pseudo code example above. begin 644 dsbc M 0@9",D'GL(H-#,IJC(U-JS"*#0T*:HR-@ 'BI&* A/N$_$B@+)$KH':1 M*S@(I?PI#X7]I?LI#^7]L /I!1@I#ZBE_"GPA?VE^RGP"#CE_2GPL KI7RBP M#ND/.+ )*+ &Z0^P NE?A/T%_87]*+BE^^7\"&BH.+CXI?OE_-@(1?W0FVB$ 8_47]T)3F^]">YOS0FFA)&- $J3C0B%A@ end Obviously the undocumented instructions RRA (ROR+ADC) and ISB (INC+SBC) have inherited also the decimal operation from the official instructions ADC and SBC. The program droradc proves this statement for ROR, and the dincsbc test proves this for ISB. Finally, dincsbc-deccmp proves that ISB's and DCP's (DEC+CMP) flags are not affected by the D flag. begin 644 droradc M`0@9",D'GL(H-#,IJC(U-JS"*#0T*:HR-@```'BI&*``A/N$_$B@+)$KH(V1 M*S@(I?PI#X7]I?LI#V7]R0J0`FD%J"D/A?VE^RGP9?PI\`C`$)`":0^JL`@H ML`?)H)`&""@X:5\X!?V%_0AH*3W@`!`""8"HBD7[$`JE^T7\,`28"4"H**7[ M9?S0!)@)`J@XN/BE^R;\9_S8"$7]T"=HA/U%_=`@YOO0A>;\T(%H21CP`EA@ 2J1T892N%^ZD`92R%_*DX;/L` ` end begin 644 dincsbc M`0@9",D'GL(H-#,IJC(U-JS"*#0T*:HR-@```'BI&*``A/N$_$B@+)$KH':1 M*S@(I?PI#X7]I?LI#^7]L`/I!1@I#ZBE_"GPA?VE^RGP"#CE_2GPL`KI7RBP M#ND/.+`)*+`&Z0^P`NE?A/T%_87]*+BE^^7\"&BH.+CXI?O&_.?\V`A%_="9 ::(3]1?W0DN;[T)SF_-"8:$D8T`2I.-"&6&#\ ` end begin 644 dincsbc-deccmp M`0@9",D'GL(H-#,IJC(U-JS"*#0T*:HR-@```'B@`(3[A/RB`XH8:7>HL2N@ M3Y$KH%R1*XII>ZBQ*Z!3D2N@8)$KBFE_J+$KH%61*Z!BD2OX.+BE^^;\Q_S8 L"&B%_3BXI?OF_,?\"&A%_?`!`.;[T-_F_-#;RA"M8!@X&#CFYL;&Q\?GYP#8 ` end 6510 features o PHP always pushes the Break (B) flag as a `1' to the stack. Jukka Tapanimäki claimed in C=lehti issue 3/89, on page 27 that the processor makes a logical OR between the status register's bit 4 and the bit 8 of the stack pointer register (which is always 1). He did not give any reasons for this argument, and has refused to clarify it afterwards. Well, this was not the only error in his article... o Indirect addressing modes do not handle page boundary crossing at all. When the parameter's low byte is $FF, the effective address wraps around and the CPU fetches high byte from $xx00 instead of $xx00+$0100. E.g. JMP ($01FF) fetches PCL from $01FF and PCH from $0100, and LDA ($FF),Y fetches the base address from $FF and $00. o Indexed zero page addressing modes never fix the page address on crossing the zero page boundary. E.g. LDX #$01 : LDA ($FF,X) loads the effective address from $00 and $01. o The processor always fetches the byte following a relative branch instruction. If the branch is taken, the processor reads then the opcode from the destination address. If page boundary is crossed, it first reads a byte from the old page from a location that is bigger or smaller than the correct address by one page. o If you cross a page boundary in any other indexed mode, the processor reads an incorrect location first, a location that is smaller by one page. o Read-Modify-Write instructions write unmodified data, then modified (so INC effectively does LDX loc;STX loc;INX;STX loc) o -RDY is ignored during writes (This is why you must wait 3 cycles before doing any DMA -- the maximum number of consecutive writes is 3, which occurs during interrupts except -RESET.) o Some undefined opcodes may give really unpredictable results. o All registers except the Program Counter remain unmodified after -RESET. (This is why you must preset D and I flags in the RESET handler.) Different CPU types The Rockwell data booklet 29651N52 (technical information about R65C00 microprocessors, dated October 1984), lists the following differences between NMOS R6502 microprocessor and CMOS R65C00 family: 1. Indexed addressing across page boundary. NMOS: Extra read of invalid address. CMOS: Extra read of last instruction byte. 2. Execution of invalid op codes. NMOS: Some terminate only by reset. Results are undefined. CMOS: All are NOPs (reserved for future use). 3. Jump indirect, operand = XXFF. NMOS: Page address does not increment. CMOS: Page address increments and adds one additional cycle. 4. Read/modify/write instructions at effective address. NMOS: One read and two write cycles. CMOS: Two read and one write cycle. 5. Decimal flag. NMOS: Indeterminate after reset. CMOS: Initialized to binary mode (D=0) after reset and interrupts. 6. Flags after decimal operation. NMOS: Invalid N, V and Z flags. CMOS: Valid flag adds one additional cycle. 7. Interrupt after fetch of BRK instruction. NMOS: Interrupt vector is loaded, BRK vector is ignored. CMOS: BRK is executed, then interrupt is executed. 6510 Instruction Timing The NMOS 6500 series processors always perform at least two reads for each instruction. In addition to the operation code (opcode), they fetch the next byte. This is quite efficient, as most instructions are two or three bytes long. The processors also use a sort of pipelining. If an instruction does not store data in memory on its last cycle, the processor can fetch the opcode of the next instruction while executing the last cycle. For instance, the instruction EOR #$FF truly takes three cycles. On the first cycle, the opcode $49 will be fetched. During the second cycle the processor decodes the opcode and fetches the parameter #$FF. On the third cycle, the processor will perform the operation and store the result to accumulator, but simultaneously it fetches the opcode for the next instruction. This is why the instruction effectively takes only two cycles. The following tables show what happens on the bus while executing different kinds of instructions. Interrupts NMI and IRQ both take 7 cycles. Their timing diagram is much like BRK's (see below). IRQ will be executed only when the I flag is clear. IRQ and BRK both set the I flag, whereas the NMI does not affect its state. The processor will usually wait for the current instruction to complete before executing the interrupt sequence. To process the interrupt before the next instruction, the interrupt must occur before the last cycle of the current instruction. There is one exception to this rule: the BRK instruction. If a hardware interrupt (NMI or IRQ) occurs before the fourth (flags saving) cycle of BRK, the BRK instruction will be skipped, and the processor will jump to the hardware interrupt vector. This sequence will always take 7 cycles. You do not completely lose the BRK interrupt, the B flag will be set in the pushed status register if a BRK instruction gets interrupted. When BRK and IRQ occur at the same time, this does not cause any problems, as your program will consider it as a BRK, and the IRQ would occur again after the processor returned from your BRK routine, unless you cleared the interrupt source in your BRK handler. But the simultaneous occurrence of NMI and BRK is far more fatal. If you do not check the B flag in the NMI routine and subtract two from the return address when needed, the BRK instruction will be skipped. If the NMI and IRQ interrupts overlap each other (one interrupt occurs before fetching the interrupt vector for the other interrupt), the processor will most probably jump to the NMI vector in every case, and then jump to the IRQ vector after processing the first instruction of the NMI handler. This has not been measured yet, but the IRQ is very similar to BRK, and many sources state that the NMI has higher priority than IRQ. However, it might be that the processor takes the interrupt that comes later, i.e. you could lose an NMI interrupt if an IRQ occurred in four cycles after it. After finishing the interrupt sequence, the processor will start to execute the first instruction of the interrupt routine. This proves that the processor uses a sort of pipelining: it finishes the current instruction (or interrupt sequence) while reading the opcode of the next instruction. RESET does not push program counter on stack, and it lasts probably 6 cycles after deactivating the signal. Like NMI, RESET preserves all registers except PC. Instructions accessing the stack BRK # address R/W description --- ------- --- ----------------------------------------------- 1 PC R fetch opcode, increment PC 2 PC R read next instruction byte (and throw it away), increment PC 3 $0100,S W push PCH on stack, decrement S 4 $0100,S W push PCL on stack, decrement S 5 $0100,S W push P on stack (with B flag set), decrement S 6 $FFFE R fetch PCL 7 $FFFF R fetch PCH RTI # address R/W description --- ------- --- ----------------------------------------------- 1 PC R fetch opcode, increment PC 2 PC R read next instruction byte (and throw it away) 3 $0100,S R increment S 4 $0100,S R pull P from stack, increment S 5 $0100,S R pull PCL from stack, increment S 6 $0100,S R pull PCH from stack RTS # address R/W description --- ------- --- ----------------------------------------------- 1 PC R fetch opcode, increment PC 2 PC R read next instruction byte (and throw it away) 3 $0100,S R increment S 4 $0100,S R pull PCL from stack, increment S 5 $0100,S R pull PCH from stack 6 PC R increment PC PHA, PHP # address R/W description --- ------- --- ----------------------------------------------- 1 PC R fetch opcode, increment PC 2 PC R read next instruction byte (and throw it away) 3 $0100,S W push register on stack, decrement S PLA, PLP # address R/W description --- ------- --- ----------------------------------------------- 1 PC R fetch opcode, increment PC 2 PC R read next instruction byte (and throw it away) 3 $0100,S R increment S 4 $0100,S R pull register from stack JSR # address R/W description --- ------- --- ------------------------------------------------- 1 PC R fetch opcode, increment PC 2 PC R fetch low address byte, increment PC 3 $0100,S R internal operation (predecrement S?) 4 $0100,S W push PCH on stack, decrement S 5 $0100,S W push PCL on stack, decrement S 6 PC R copy low address byte to PCL, fetch high address byte to PCH Accumulator or implied addressing # address R/W description --- ------- --- ----------------------------------------------- 1 PC R fetch opcode, increment PC 2 PC R read next instruction byte (and throw it away) Immediate addressing # address R/W description --- ------- --- ------------------------------------------ 1 PC R fetch opcode, increment PC 2 PC R fetch value, increment PC Absolute addressing JMP # address R/W description --- ------- --- ------------------------------------------------- 1 PC R fetch opcode, increment PC 2 PC R fetch low address byte, increment PC 3 PC R copy low address byte to PCL, fetch high address byte to PCH Read instructions (LDA, LDX, LDY, EOR, AND, ORA, ADC, SBC, CMP, BIT, LAX, NOP) # address R/W description --- ------- --- ------------------------------------------ 1 PC R fetch opcode, increment PC 2 PC R fetch low byte of address, increment PC 3 PC R fetch high byte of address, increment PC 4 address R read from effective address Read-Modify-Write instructions (ASL, LSR, ROL, ROR, INC, DEC, SLO, SRE, RLA, RRA, ISB, DCP) # address R/W description --- ------- --- ------------------------------------------ 1 PC R fetch opcode, increment PC 2 PC R fetch low byte of address, increment PC 3 PC R fetch high byte of address, increment PC 4 address R read from effective address 5 address W write the value back to effective address, and do the operation on it 6 address W write the new value to effective address Write instructions (STA, STX, STY, SAX) # address R/W description --- ------- --- ------------------------------------------ 1 PC R fetch opcode, increment PC 2 PC R fetch low byte of address, increment PC 3 PC R fetch high byte of address, increment PC 4 address W write register to effective address Zero page addressing Read instructions (LDA, LDX, LDY, EOR, AND, ORA, ADC, SBC, CMP, BIT, LAX, NOP) # address R/W description --- ------- --- ------------------------------------------ 1 PC R fetch opcode, increment PC 2 PC R fetch address, increment PC 3 address R read from effective address Read-Modify-Write instructions (ASL, LSR, ROL, ROR, INC, DEC, SLO, SRE, RLA, RRA, ISB, DCP) # address R/W description --- ------- --- ------------------------------------------ 1 PC R fetch opcode, increment PC 2 PC R fetch address, increment PC 3 address R read from effective address 4 address W write the value back to effective address, and do the operation on it 5 address W write the new value to effective address Write instructions (STA, STX, STY, SAX) # address R/W description --- ------- --- ------------------------------------------ 1 PC R fetch opcode, increment PC 2 PC R fetch address, increment PC 3 address W write register to effective address Zero page indexed addressing Read instructions (LDA, LDX, LDY, EOR, AND, ORA, ADC, SBC, CMP, BIT, LAX, NOP) # address R/W description --- --------- --- ------------------------------------------ 1 PC R fetch opcode, increment PC 2 PC R fetch address, increment PC 3 address R read from address, add index register to it 4 address+I* R read from effective address Notes: I denotes either index register (X or Y). * The high byte of the effective address is always zero, i.e. page boundary crossings are not handled. Read-Modify-Write instructions (ASL, LSR, ROL, ROR, INC, DEC, SLO, SRE, RLA, RRA, ISB, DCP) # address R/W description --- --------- --- --------------------------------------------- 1 PC R fetch opcode, increment PC 2 PC R fetch address, increment PC 3 address R read from address, add index register X to it 4 address+X* R read from effective address 5 address+X* W write the value back to effective address, and do the operation on it 6 address+X* W write the new value to effective address Note: * The high byte of the effective address is always zero, i.e. page boundary crossings are not handled. Write instructions (STA, STX, STY, SAX) # address R/W description --- --------- --- ------------------------------------------- 1 PC R fetch opcode, increment PC 2 PC R fetch address, increment PC 3 address R read from address, add index register to it 4 address+I* W write to effective address Notes: I denotes either index register (X or Y). * The high byte of the effective address is always zero, i.e. page boundary crossings are not handled. Absolute indexed addressing Read instructions (LDA, LDX, LDY, EOR, AND, ORA, ADC, SBC, CMP, BIT, LAX, LAE, SHS, NOP) # address R/W description --- --------- --- ------------------------------------------ 1 PC R fetch opcode, increment PC 2 PC R fetch low byte of address, increment PC 3 PC R fetch high byte of address, add index register to low address byte, increment PC 4 address+I* R read from effective address, fix the high byte of effective address 5+ address+I R re-read from effective address Notes: I denotes either index register (X or Y). * The high byte of the effective address may be invalid at this time, i.e. it may be smaller by $100. + This cycle will be executed only if the effective address was invalid during cycle #4, i.e. page boundary was crossed. Read-Modify-Write instructions (ASL, LSR, ROL, ROR, INC, DEC, SLO, SRE, RLA, RRA, ISB, DCP) # address R/W description --- --------- --- ------------------------------------------ 1 PC R fetch opcode, increment PC 2 PC R fetch low byte of address, increment PC 3 PC R fetch high byte of address, add index register X to low address byte, increment PC 4 address+X* R read from effective address, fix the high byte of effective address 5 address+X R re-read from effective address 6 address+X W write the value back to effective address, and do the operation on it 7 address+X W write the new value to effective address Notes: * The high byte of the effective address may be invalid at this time, i.e. it may be smaller by $100. Write instructions (STA, STX, STY, SHA, SHX, SHY) # address R/W description --- --------- --- ------------------------------------------ 1 PC R fetch opcode, increment PC 2 PC R fetch low byte of address, increment PC 3 PC R fetch high byte of address, add index register to low address byte, increment PC 4 address+I* R read from effective address, fix the high byte of effective address 5 address+I W write to effective address Notes: I denotes either index register (X or Y). * The high byte of the effective address may be invalid at this time, i.e. it may be smaller by $100. Because the processor cannot undo a write to an invalid address, it always reads from the address first. Relative addressing (BCC, BCS, BNE, BEQ, BPL, BMI, BVC, BVS) # address R/W description --- --------- --- --------------------------------------------- 1 PC R fetch opcode, increment PC 2 PC R fetch operand, increment PC 3 PC R Fetch opcode of next instruction, If branch is taken, add operand to PCL. Otherwise increment PC. 4+ PC* R Fetch opcode of next instruction. Fix PCH. If it did not change, increment PC. 5! PC R Fetch opcode of next instruction, increment PC. Notes: The opcode fetch of the next instruction is included to this diagram for illustration purposes. When determining real execution times, remember to subtract the last cycle. * The high byte of Program Counter (PCH) may be invalid at this time, i.e. it may be smaller or bigger by $100. + If branch is taken, this cycle will be executed. ! If branch occurs to different page, this cycle will be executed. Indexed indirect addressing Read instructions (LDA, ORA, EOR, AND, ADC, CMP, SBC, LAX) # address R/W description --- ----------- --- ------------------------------------------ 1 PC R fetch opcode, increment PC 2 PC R fetch pointer address, increment PC 3 pointer R read from the address, add X to it 4 pointer+X R fetch effective address low 5 pointer+X+1 R fetch effective address high 6 address R read from effective address Note: The effective address is always fetched from zero page, i.e. the zero page boundary crossing is not handled. Read-Modify-Write instructions (SLO, SRE, RLA, RRA, ISB, DCP) # address R/W description --- ----------- --- ------------------------------------------ 1 PC R fetch opcode, increment PC 2 PC R fetch pointer address, increment PC 3 pointer R read from the address, add X to it 4 pointer+X R fetch effective address low 5 pointer+X+1 R fetch effective address high 6 address R read from effective address 7 address W write the value back to effective address, and do the operation on it 8 address W write the new value to effective address Note: The effective address is always fetched from zero page, i.e. the zero page boundary crossing is not handled. Write instructions (STA, SAX) # address R/W description --- ----------- --- ------------------------------------------ 1 PC R fetch opcode, increment PC 2 PC R fetch pointer address, increment PC 3 pointer R read from the address, add X to it 4 pointer+X R fetch effective address low 5 pointer+X+1 R fetch effective address high 6 address W write to effective address Note: The effective address is always fetched from zero page, i.e. the zero page boundary crossing is not handled. Indirect indexed addressing Read instructions (LDA, EOR, AND, ORA, ADC, SBC, CMP) # address R/W description --- ----------- --- ------------------------------------------ 1 PC R fetch opcode, increment PC 2 PC R fetch pointer address, increment PC 3 pointer R fetch effective address low 4 pointer+1 R fetch effective address high, add Y to low byte of effective address 5 address+Y* R read from effective address, fix high byte of effective address 6+ address+Y R read from effective address Notes: The effective address is always fetched from zero page, i.e. the zero page boundary crossing is not handled. * The high byte of the effective address may be invalid at this time, i.e. it may be smaller by $100. + This cycle will be executed only if the effective address was invalid during cycle #5, i.e. page boundary was crossed. Read-Modify-Write instructions (SLO, SRE, RLA, RRA, ISB, DCP) # address R/W description --- ----------- --- ------------------------------------------ 1 PC R fetch opcode, increment PC 2 PC R fetch pointer address, increment PC 3 pointer R fetch effective address low 4 pointer+1 R fetch effective address high, add Y to low byte of effective address 5 address+Y* R read from effective address, fix high byte of effective address 6 address+Y R read from effective address 7 address+Y W write the value back to effective address, and do the operation on it 8 address+Y W write the new value to effective address Notes: The effective address is always fetched from zero page, i.e. the zero page boundary crossing is not handled. * The high byte of the effective address may be invalid at this time, i.e. it may be smaller by $100. Write instructions (STA, SHA) # address R/W description --- ----------- --- ------------------------------------------ 1 PC R fetch opcode, increment PC 2 PC R fetch pointer address, increment PC 3 pointer R fetch effective address low 4 pointer+1 R fetch effective address high, add Y to low byte of effective address 5 address+Y* R read from effective address, fix high byte of effective address 6 address+Y W write to effective address Notes: The effective address is always fetched from zero page, i.e. the zero page boundary crossing is not handled. * The high byte of the effective address may be invalid at this time, i.e. it may be smaller by $100. Absolute indirect addressing (JMP) # address R/W description --- --------- --- ------------------------------------------ 1 PC R fetch opcode, increment PC 2 PC R fetch pointer address low, increment PC 3 PC R fetch pointer address high, increment PC 4 pointer R fetch low address to latch 5 pointer+1* R fetch PCH, copy latch to PCL Note: * The PCH will always be fetched from the same page than PCL, i.e. page boundary crossing is not handled. How Real Programmers Acknowledge Interrupts With RMW instructions: ; beginning of combined raster/timer interrupt routine LSR $D019 ; clear VIC interrupts, read raster interrupt flag to C BCS raster ; jump if VIC caused an interrupt ... ; timer interrupt routine Operational diagram of LSR $D019: # data address R/W --- ---- ------- --- --------------------------------- 1 4E PC R fetch opcode 2 19 PC+1 R fetch address low 3 D0 PC+2 R fetch address high 4 xx $D019 R read memory 5 xx $D019 W write the value back, rotate right 6 xx/2 $D019 W write the new value back The 5th cycle acknowledges the interrupt by writing the same value back. If only raster interrupts are used, the 6th cycle has no effect on the VIC. (It might acknowledge also some other interrupts.) With indexed addressing: ; acknowledge interrupts to both CIAs LDX #$10 LDA $DCFD,X Operational diagram of LDA $DCFD,X: # data address R/W description --- ---- ------- --- --------------------------------- 1 BD PC R fetch opcode 2 FD PC+1 R fetch address low 3 DC PC+2 R fetch address high, add X to address low 4 xx $DC0D R read from address, fix high byte of address 5 yy $DD0D R read from right address ; acknowledge interrupts to CIA 2 LDX #$10 STA $DDFD,X Operational diagram of STA $DDFD,X: # data address R/W description --- ---- ------- --- --------------------------------- 1 9D PC R fetch opcode 2 FD PC+1 R fetch address low 3 DC PC+2 R fetch address high, add X to address low 4 xx $DD0D R read from address, fix high byte of address 5 ac $DE0D W write to right address With branch instructions: ; acknowledge interrupts to CIA 2 LDA #$00 ; clear N flag JMP $DD0A DD0A BPL $DC9D ; branch DC9D BRK ; return You need the following preparations to initialize the CIA registers: LDA #$91 ; argument of BPL STA $DD0B LDA #$10 ; BPL STA $DD0A STA $DD08 ; load the ToD values from the latches LDA $DD0B ; freeze the ToD display LDA #$7F STA $DC0D ; assure that $DC0D is $00 Operational diagram of BPL $DC9D: # data address R/W description --- ---- ------- --- --------------------------------- 1 10 $DD0A R fetch opcode 2 91 $DD0B R fetch argument 3 xx $DD0C R fetch opcode, add argument to PCL 4 yy $DD9D R fetch opcode, fix PCH ( 5 00 $DC9D R fetch opcode ) ; acknowledge interrupts to CIA 1 LSR ; clear N flag JMP $DCFA DCFA BPL $DD0D DD0D BRK ; Again you need to set the ToD registers of CIA 1 and the ; Interrupt Control Register of CIA 2 first. Operational diagram of BPL $DD0D: # data address R/W description --- ---- ------- --- --------------------------------- 1 10 $DCFA R fetch opcode 2 11 $DCFB R fetch argument 3 xx $DCFC R fetch opcode, add argument to PCL 4 yy $DC0D R fetch opcode, fix PCH ( 5 00 $DD0D R fetch opcode ) ; acknowledge interrupts to CIA 2 automagically ; preparations LDA #$7F STA $DD0D ; disable all interrupt sources of CIA2 LDA $DD0E AND #$BE ; ensure that $DD0C remains constant STA $DD0E ; and stop the timer LDA #$FD STA $DD0C ; parameter of BPL LDA #$10 STA $DD0B ; BPL LDA #$40 STA $DD0A ; RTI/parameter of LSR LDA #$46 STA $DD09 ; LSR STA $DD08 ; load the ToD values from the latches LDA $DD0B ; freeze the ToD display LDA #$09 STA $0318 LDA #$DD STA $0319 ; change NMI vector to $DD09 LDA #$FF ; Try changing this instruction's operand STA $DD05 ; (see comment below). LDA #$FF STA $DD04 ; set interrupt frequency to 1/65536 cycles LDA $DD0E AND #$80 ORA #$11 LDX #$81 STX $DD0D ; enable timer interrupt STA $DD0E ; start timer LDA #$00 ; To see that the interrupts really occur, STA $D011 ; use something like this and see how LOOP DEC $D020 ; changing the byte loaded to $DD05 from BNE LOOP ; #$FF to #$0F changes the image. When an NMI occurs, the processor jumps to Kernal code, which jumps to ($0318), which points to the following routine: DD09 LSR $40 ; clear N flag BPL $DD0A ; Note: $DD0A contains RTI. Operational diagram of BPL $DD0A: # data address R/W description --- ---- ------- --- --------------------------------- 1 10 $DD0B R fetch opcode 2 11 $DD0C R fetch argument 3 xx $DD0D R fetch opcode, add argument to PCL 4 40 $DD0A R fetch opcode, (fix PCH) With RTI: ; the fastest possible interrupt handler in the 6500 family ; preparations SEI LDA $01 ; disable ROM and enable I/O AND #$FD ORA #$05 STA $01 LDA #$7F STA $DD0D ; disable CIA 2's all interrupt sources LDA $DD0E AND #$BE ; ensure that $DD0C remains constant STA $DD0E ; and stop the timer LDA #$40 STA $DD0C ; store RTI to $DD0C LDA #$0C STA $FFFA LDA #$DD STA $FFFB ; change NMI vector to $DD0C LDA #$FF ; Try changing this instruction's operand STA $DD05 ; (see comment below). LDA #$FF STA $DD04 ; set interrupt frequency to 1/65536 cycles LDA $DD0E AND #$80 ORA #$11 LDX #$81 STX $DD0D ; enable timer interrupt STA $DD0E ; start timer LDA #$00 ; To see that the interrupts really occur, STA $D011 ; use something like this and see how LOOP DEC $D020 ; changing the byte loaded to $DD05 from BNE LOOP ; #$FF to #$0F changes the image. When an NMI occurs, the processor jumps to Kernal code, which jumps to ($0318), which points to the following routine: DD0C RTI How on earth can this clear the interrupts? Remember, the processor always fetches two successive bytes for each instruction. A little more practical version of this is redirecting the NMI (or IRQ) to your own routine, whose last instruction is JMP $DD0C or JMP $DC0C. If you want to confuse more, change the 0 in the address to a hexadecimal digit different from the one you used when writing the RTI. Or you can combine the latter two methods: DD09 LSR $xx ; xx is any appropriate BCD value 00-59. BPL $DCFC DCFC RTI This example acknowledges interrupts to both CIAs. If you want to confuse the examiners of your code, you can use any of these techniques. Although these examples use no undefined opcodes, they do not necessarily run correctly on CMOS processors. However, the RTI example should run on 65C02 and 65C816, and the latter branch instruction example might work as well. The RMW instruction method has been used in some demos, others were developed by Marko Mäkelä. His favourite is the automagical RTI method, although it does not have any practical applications, except for some time dependent data decryption routines for very complicated copy protections. Memory Management The processor's point of view The Commodore 64 has access to more memory than its processor can directly handle. This is possible by banking the memory. There are five user configurable inputs that affect the banking. Three of them can be controlled by program, and the rest two serve as control lines on the memory expansion port. The 6510 MPU has an integrated I/O port with six I/O lines. This port is accessed through the memory locations 0 and 1. The location 0 is the Data Direction Register for the Peripheral data Register, which is mapped to the other location. When a bit in the DDR is set, the corresponding PR bit controls the state of a corresponding Peripheral line as an output. When it is clear, the state of the Peripheral line is reflected by the Peripheral register. The Peripheral lines are numbered from 0 to 5, and they are mapped to the DDR and PR bits 0 - 5, respectively. The 8502 processor, which is used in the Commodore 128, has seven Peripheral lines in its I/O port. The pin P6 is connected to the ASC/CC key (Caps lock in English versions). The I/O lines have the following functions: Direction Line Function --------- ---- -------- out P5 Cassette motor control. (0 = motor spins) in P4 Cassette sense. (0 = PLAY button depressed) out P3 Cassette write data. out P2 CHAREN out P1 HIRAM out P0 LORAM The default value of the DDR register is $2F, so all lines except Cassette sense are outputs. The default PR value is $37 (Datassette motor stopped, and all three memory management lines high). If you turn any memory management line to input, the external pull-up resistors make it to look like it is outputting logical "1". This is actually why the computer always switches the ROMs in upon startup: Pulling the -RESET line low resets all Peripheral lines to inputs, thus setting all three processor-driven memory management lines to logical "1" level. The two remaining memory management lines are -EXROM and -GAME on the cartridge port. Each line has a pull-up resistor, so the lines are "1" by default. Even though the memory banking has been implemented with a 82S100 Programmable _Logic_ Array, there is only one control line that seems to behave logically at first sight, the -CHAREN line. It is mostly used to choose between I/O address space and the character generator ROM. The following memory map introduces the oddities of -CHAREN and the other memory management lines. It is based on the memory maps in the Commodore 64 Programmer's Reference Guide, pp. 263 - 267, and some errors and inaccuracies have been corrected. The leftmost column of the table contains addresses in hexadecimal notation. The columns aside it introduce all possible memory configurations. The default mode is on the left, and the absolutely most rarely used Ultimax game console configuration is on the right. (Has anybody ever seen any Ultimax games?) Each memory configuration column has one or more four-digit binary numbers as a title. The bits, from left to right, represent the state of the -LORAM, -HIRAM, -GAME and -EXROM lines, respectively. The bits whose state does not matter are marked with "x". For instance, when the Ultimax video game configuration is active (the -GAME line is shorted to ground), the -LORAM and -HIRAM lines have no effect. default 001x Ultimax 1111 101x 1000 011x 00x0 1110 0100 1100 xx01 10000 ---------------------------------------------------------------------- F000 Kernal RAM RAM Kernal RAM Kernal Kernal Kernal ROMH(* E000 ---------------------------------------------------------------------- D000 IO/C IO/C IO/RAM IO/C RAM IO/C IO/C IO/C I/O ---------------------------------------------------------------------- C000 RAM RAM RAM RAM RAM RAM RAM RAM - ---------------------------------------------------------------------- B000 BASIC RAM RAM RAM RAM BASIC ROMH ROMH - A000 ---------------------------------------------------------------------- 9000 RAM RAM RAM RAM RAM ROML RAM ROML ROML(* 8000 ---------------------------------------------------------------------- 7000 6000 RAM RAM RAM RAM RAM RAM RAM RAM - 5000 4000 ---------------------------------------------------------------------- 3000 2000 RAM RAM RAM RAM RAM RAM RAM RAM - 1000 ---------------------------------------------------------------------- 0000 RAM RAM RAM RAM RAM RAM RAM RAM RAM ---------------------------------------------------------------------- *) Internal memory does not respond to write accesses to these areas. Legend: Kernal E000-FFFF Kernal ROM. IO/C D000-DFFF I/O address space or Character generator ROM, selected by -CHAREN. If the CHAREN bit is clear, the character generator ROM will be selected. If it is set, the I/O chips are accessible. IO/RAM D000-DFFF I/O address space or RAM, selected by -CHAREN. If the CHAREN bit is clear, the character generator ROM will be selected. If it is set, the internal RAM is accessible. I/O D000-DFFF I/O address space. The -CHAREN line has no effect. BASIC A000-BFFF BASIC ROM. ROMH A000-BFFF or External ROM with the -ROMH line E000-FFFF connected to its -CS line. ROML 8000-9FFF External ROM with the -ROML line connected to its -CS line. RAM various ranges Commodore 64's internal RAM. - 1000-7FFF and Open address space. A000-CFFF The Commodore 64's memory chips do not detect any memory accesses to this area except the VIC-II's DMA and memory refreshes. NOTE: Whenever the processor tries to write to any ROM area (Kernal, BASIC, CHAROM, ROML, ROMH), the data will get "through the ROM" to the C64's internal RAM. For this reason, you can easily copy data from ROM to RAM, without any bank switching. But implementing external memory expansions without DMA is very hard, as you have to use a 256 byte window on the I/O1 or I/O2 area, like GEORAM, or the Ultimax memory configuration, if you do not want the data to be written both to internal and external RAM. However, this is not true for the Ultimax video game configuration. In that mode, the internal RAM ignores all memory accesses outside the area $0000-$0FFF, unless they are performed by the VIC, and you can write to external memory at $1000-$CFFF and $E000-$FFFF, if any, without changing the contents of the internal RAM. A note concerning the I/O area The I/O area of the Commodore 64 is divided as follows: Address range Owner ------------- ----- D000-D3FF MOS 6567/6569 VIC-II Video Interface Controller D400-D7FF MOS 6581 SID Sound Interface Device D800-DBFF Color RAM (only lower nybbles are connected) DC00-DCFF MOS 6526 CIA Complex Interface Adapter #1 DD00-DDFF MOS 6526 CIA Complex Interface Adapter #2 DE00-DEFF User expansion #1 (-I/O1 on Expansion Port) DF00-DFFF User expansion #2 (-I/O2 on Expansion Port) As you can see, the address ranges for the chips are much larger than required. Because of this, you can access the chips through multiple memory areas. The VIC-II appears in its window every $40 addresses. For instance, the addresses $D040 and $D080 are both mapped to the Sprite 0 X co-ordinate register. The SID has one register selection line less, thus it appears at every $20 bytes. The CIA chips have only 16 registers, so there are 16 copies of each in their memory area. However, you should not use other addresses than those specified by Commodore. For instance, the Commodore 128 mapped its additional I/O chips to this same memory area, and the SID responds only to the addresses D400-D4FF, also when in C64 mode. And the Commodore 65, or the C64DX, which unfortunately did not make its way to the market, could narrow the memory window reserved for its CSG 4567 VIC-III. The video chip The MOS 6567/6569 VIC-II Video Interface Controller has access to only 16 kilobytes at a time. To enable the VIC-II to access the whole 64 kB memory space, the main memory is divided to four banks of 16 kB each. The lines PA0 and PA1 of the second CIA are the inverse of the virtual VIC-II address lines VA14 and VA15, respectively. To select a VIC-II bank other than the default, you must program the CIA lines to output the desired bit pair. For instance, the following code selects the memory area $4000-$7FFF (bank 1) for the video controller: LDA $DD02 ; Data Direction Register A ORA #$03 ; Set pins PA0 and PA1 to outputs STA $DD02 LDA $DD00 AND #$FC ; Mask the lowmost bit pair off ORA #$02 ; Select VIC-II bank 1 (the inverse of binary 01 is 10) STA $DD00 Why should you set the pins to outputs? Hardware RESET resets all I/O lines to inputs, and thanks to the CIA's internal pull-up resistors, the inputs actually output logical high voltage level. So, upon -RESET, the video bank 0 is selected automatically, and older Kernals could leave it uninitialized. Note that the VIC-II always fetches its information from the internal RAM, totally ignoring the memory configuration lines. There is only one exception to this rule: The character generator ROM. Unless the Ultimax mode is selected, VIC-II "sees" character generator ROM in the memory areas 1000-1FFF and 9000-9FFF. If the Ultimax configuration is active, the VIC-II fetches all data from the internal RAM. Accessing the memory places 0 and 1 Although the addresses 0 and 1 of the processor are hard-wired to its on-chip I/O port registers, you can access the memory places 0 and 1. The video chip always reads from RAM (or character generator ROM), so you can use it to read also from 0 and 1. Enable the bit-map screen and set the start address of the graphics screen to 0. Now you can see these two memory locations in the upper left corner. Alternatively, you could set the character generator start address to 0, in which case you would see these locations in @ characters (code 0). Or, you can activate a sprite with start address 0. Whichever method you choose, you can read these locations with sprite collision registers. Define a sprite consisting of only one dot, and move it to read the 8 bits of each byte with the sprite to sprite or sprite to background collision registers. But how can you write to these locations? If you execute the command POKE 53265,59, you will see that the memory place 1 changes its value wildly. If you disable the interrupts (POKE53664,127), it will remain stable. How is this possible? When the processor writes to 0 or 1, it will put the address on the address bus and set the R/-W line to indicate a write cycle, but it does not put the data on the data bus. Thus, it writes "random" data. Of course this data is not truly random. Actually it is something that the video chip left on the bus on its clock half. So, if you want to write a certain value on 0 or 1, you have to make the video chip to read that value just before the store cycle. This requires very accurate timing, but it can be achieved even with a carefully written BASIC program. Just wait the video chip to be in the top or bottom border and the beam to be in the middle of the screen (not in the side borders). At this area, the video chip will always read the last byte of the video bank (by default $3FFF). Now, if you store anything to the I/O port registers 0 or 1 while the video chip is refreshing this screen area, the contents of the memory place $3FFF will be written to the respective memory place (0 or 1). Note that this trick does not work reliably on all computers. You need good RF protection, as the data bus will not be driven at all when the value remains on it. On the C128 in its 2 MHz mode, you can write to the memory places with an easier kludge. Just make sure that the video chip is not performing the memory refresh (as it would slow down to 1 MHz in that case), and use some instruction that reads from a proper memory location before writing to 0 or 1. Indexed zero-page addressing modes are good for it. I tested this trick with LDX#1 followed by STA $FF,X. As you can read from the instruction timing section of this document, the instruction first reads from $FF (the base address) and then writes to 0. The timing can be done with a simple LDA$D012:CMP$D012:BEQ *-3 loop. But in the C128 mode you can relocate the stack page to zero page, so this trick is not really useful. You can also read the memory places 0 and 1 much faster than with sprite collisions. Just make the video chip to read from 0 or 1, and then read from non-connected address space ($DE00-$DFFF on a stock C64; also $D700-$D7FF on C128's). Actually, you can produce a complete map of the video timing on your computer by making a loop that reads from open address space, pausing one frame and one cycle in between. And if you are into copy protections, you could write a program on the open address space. Just remember that there must be a byte on the bus for each clock cycle. These tricks unfortunately do not work reliably on all units. So far I have had the opportunity to try it on three computers, two of which were Commodore 128 DCR's (C128's housed in metal case with a 1571 floppy disk drive, whose controller is integrated on the mother board). One C128DCR drove some of its data bits too heavily to high state. No wonder, since its housing consisted of some newspapers spread on the floor. Autostart Code Although this document concentrates on hardware, there is one thing that you must know about the firmware to get complete control over your computer. As the Commodore 64 always switches the ROMs on upon -RESET, you cannot relocate the RESET vector by writing something in RAM. Instead, you have to use the Autostart code that will be recognized by the KERNAL ROM. If the memory places from $8004 through $8008 contain the PETSCII string 'CBM80' (C3 C2 CD 38 30), the RESET routine jumps to ($8000) and the default NMI handler jumps to ($8002). Some programs that load into RAM take advantage of this and don't let the machine to be reset. You don't have to modify the ROM to get rid of this annoying behaviour. Simply ground the -EXROM line for the beginning of the RESET sequence. Notes See the MCS 6500 Microcomputer Family Programming Manual for less information. References: C64 Memory Maps C64 Programmer's Reference Guide, pp. 262-267 C64 Schematic Diagram 6510 Block Diagram C64 Programmer's Reference Guide, p. 404 Instruction Set C64 Programmer's Reference Guide, pp. 254-255, 416-417 C64/128 Real Programmer's Revenge Guide C=Lehti magazine 4/87 |