AVR Top Octave Tone Generator

After reading the Hackaday article “ASK HACKADAY: HOW DO YOU DIY A TOP-OCTAVE GENERATOR?” (https://hackaday.com/2018/05/24/ask-hackaday-diy-top-octave-generator/), I decided to take up the challenge.

On a standard 16 MHz Arduino Uno, I was able to get 10 outputs running before I ran out of CPU clock cycles.

Switching to a 20 MHz clock, all 12 outputs are operational.

The basic idea of the code is to generate, in real time, a table whose entries each hold the bits to flip (2 bytes) and the delay until the next flip (1 byte). An ISR consumes the table entries and, after each one, sets up a timer interrupt that fires once the next delay has elapsed.

The main loop generates the entries for the table. Its job is to calculate the table entries faster than the ISR can consume them. With a 16 MHz clock, the best we can do is to calculate 10 outputs while staying ahead of the ISR. With a 20 MHz clock, all 12 outputs can be calculated faster than the ISR can read them.

Because the AVR clock is 20 MHz and the delays are in increments of 20 clock cycles, the shortest delay is 1 us. To match the Top Octave Generator divider values for a 2 MHz clock, we can only generate 50% duty cycle outputs for the even divider values. For the odd values, we generate a low time that is one delay unit (1 us) longer than the high time.

Note   2 MHz delay value   1 MHz high delay value   1 MHz low delay value
C8     239                 119                      120
B7     253                 126                      127
A#7    268                 134                      134
A7     284                 142                      142
G#7    301                 150                      151
G7     319                 159                      160
F#7    338                 169                      169
F7     358                 179                      179
E7     379                 189                      190
D#7    402                 201                      201
D7     426                 213                      213
C#7    451                 225                      226
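
The relationship between the three columns of the table can be sketched in C. This is only an illustration: `half_periods_t` and `split_divider` are hypothetical names that do not appear in the original source, and the rule (high time is the divider halved and rounded down, low time is the remainder) is inferred from the table values.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the original source: split a 2 MHz
 * delay value into the two half-period counts used at the 1 MHz rate. */
typedef struct {
    uint8_t high;   /* delay units the output stays high */
    uint8_t low;    /* delay units the output stays low  */
    int8_t  phase;  /* 1 if low is one unit longer than high, else 0 */
} half_periods_t;

static half_periods_t split_divider(uint16_t divider_2mhz)
{
    half_periods_t hp;
    hp.high  = (uint8_t)(divider_2mhz / 2);        /* rounded down */
    hp.low   = (uint8_t)(divider_2mhz - hp.high);  /* remainder    */
    hp.phase = (int8_t)(divider_2mhz & 1);         /* odd divider  */
    return hp;
}
```

For example, `split_divider(239)` gives a high time of 119 and a low time of 120, while `split_divider(268)` gives 134 for both.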

Every time the Output Compare matches the Timer Count, the main loop is interrupted and the ISR runs. This will consume at least one entry in the table before it returns to the main loop.

I added one extra 0 byte at the end of each entry to keep everything on a modulo 4 boundary. Because the table is 256 bytes long (64 entries) and starts on a 256-byte boundary, the pointer wraps automatically at the end of the table when only XL, the low byte of the X pointer, is incremented.
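
A minimal sketch of that entry layout and of the wrap-around, assuming the byte order used by the C version of the main loop. The names `entry_t` and `next_offset` are hypothetical and used only for illustration.

```c
#include <stdint.h>

/* Hypothetical view of one 4-byte table entry */
typedef struct {
    uint8_t tog_d;  /* bits to toggle on PORTD */
    uint8_t tog_b;  /* bits to toggle on PORTB (low 4 bits used) */
    uint8_t delay;  /* delay until the next entry, 0 == 20 clocks */
    uint8_t pad;    /* padding byte that keeps entries on a mod 4 boundary */
} entry_t;

/* Advance an 8-bit table offset by one entry. The 8-bit arithmetic
 * wraps from 252 back to 0, which is the same effect as incrementing
 * only the low byte of the pointer on the AVR. */
static uint8_t next_offset(uint8_t offset)
{
    return (uint8_t)(offset + sizeof(entry_t));
}
```

With 4 bytes per entry, 64 calls to `next_offset` starting from 0 return to 0 without any explicit bounds check.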

The main loop is written in assembly for speed. The C code version of this loop is as follows:

#define NUM_TONES 12

uint8_t tc[12];
uint8_t cnt[12];
int8_t  phase[12];

uint8_t buf[256];

// Load the terminal counts, and
// initialize the counters
tc[0]  = cnt[0]  = 119;  // 119 and 120
tc[1]  = cnt[1]  = 126;  // 126 and 127
tc[2]  = cnt[2]  = 134;
tc[3]  = cnt[3]  = 142;
tc[4]  = cnt[4]  = 150;  // 150 and 151
tc[5]  = cnt[5]  = 159;  // 159 and 160
tc[6]  = cnt[6]  = 169;
tc[7]  = cnt[7]  = 179;
tc[8]  = cnt[8]  = 189;  // 189 and 190
tc[9]  = cnt[9]  = 201;
tc[10] = cnt[10] = 213;
tc[11] = cnt[11] = 225;  // 225 and 226

// If phase == 0, then use the same terminal count
// for the high and low time.
// If phase == 1, then the low time will be
// one cycle longer than the high time.
phase[0] = 1;
phase[1] = 1;
phase[2] = 0;
phase[3] = 0;
phase[4] = 1;
phase[5] = 1;
phase[6] = 0;
phase[7] = 0;
phase[8] = 1;
phase[9] = 0;
phase[10] = 0;
phase[11] = 1;

volatile uint8_t rptr = 0;
uint8_t wptr = 0;
uint8_t min_val = 119;  // Start with tc of the smallest entry
uint16_t prev_tog = 0;  // First table entry has no outputs toggle

while (1) {
  uint8_t  next_min = 0xff;
  uint16_t tog = 0;

  for (int ii = NUM_TONES-1; ii >= 0; ii--) {
    cnt[ii] -= min_val;

    if (cnt[ii] == 0) {
      // This counter has expired
      // Reload the counter
      cnt[ii] = tc[ii];

      // Adjust the next terminal count (if necessary)
      tc[ii] += phase[ii];

      // If the phase is 0, then this doesn't have any effect.
      // Otherwise, this will cause the terminal count to
      // increment or decrement each time the counter expires.
      phase[ii] = -phase[ii];

      /* Toggle this output */
      tog |= (1<<ii);
    }

    // Find the smallest value before counter expiration.
    // The smallest value will be the delay until
    // the next counter expires next pass through the loop.
    if (cnt[ii] < next_min) {
      next_min = cnt[ii];
    }
  }

  // Add entry to buffer
  buf[wptr++] = prev_tog & 0xff;
  buf[wptr++] = prev_tog >> 8;
  buf[wptr++] = min_val-1;  // 0 == smallest delay (20 clocks)
  wptr = (wptr + 1) & 0xff; // Make entry mod4, keep wptr on table

  // The calculated delay must complete before the bits toggle.
  // This delays the toggle by one pass through the loop.
  min_val = next_min;
  prev_tog = tog;

  // Loop here until the buffer has room
  while (rptr == wptr)
    ;
}
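
To check the counting scheme without hardware, the loop can be ported to a host-side simulation. The sketch below is an assumption-laden illustration rather than the author's code: `tone_state_t`, `tone_step`, and `measure_period` are hypothetical names, the ring buffer and ISR are left out, and time is counted in 1 us delay units.

```c
#include <stdint.h>

#define NUM_TONES 12

/* 2 MHz delay values from the table, C8 first */
static const uint16_t dividers[NUM_TONES] = {
    239, 253, 268, 284, 301, 319, 338, 358, 379, 402, 426, 451
};

typedef struct {
    uint8_t tc[NUM_TONES];
    uint8_t cnt[NUM_TONES];
    int8_t  phase[NUM_TONES];
    uint8_t min_val;
} tone_state_t;

static void tone_init(tone_state_t *s)
{
    for (int i = 0; i < NUM_TONES; i++) {
        s->tc[i] = s->cnt[i] = (uint8_t)(dividers[i] / 2);
        s->phase[i] = (int8_t)(dividers[i] & 1);
    }
    s->min_val = s->cnt[0];  /* smallest initial count */
}

/* One pass of the main loop. Returns the mask of outputs that toggle
 * and stores in *elapsed the delay units that pass before they do. */
static uint16_t tone_step(tone_state_t *s, uint8_t *elapsed)
{
    uint8_t  next_min = 0xff;
    uint16_t tog = 0;

    for (int i = NUM_TONES - 1; i >= 0; i--) {
        s->cnt[i] = (uint8_t)(s->cnt[i] - s->min_val);
        if (s->cnt[i] == 0) {
            s->cnt[i]   = s->tc[i];                       /* reload */
            s->tc[i]    = (uint8_t)(s->tc[i] + s->phase[i]);
            s->phase[i] = (int8_t)-s->phase[i];
            tog |= (uint16_t)(1u << i);
        }
        if (s->cnt[i] < next_min)
            next_min = s->cnt[i];
    }
    *elapsed   = s->min_val;
    s->min_val = next_min;
    return tog;
}

/* Full period of one output: time between its 1st and 3rd toggle */
static uint32_t measure_period(int bit)
{
    tone_state_t s;
    uint32_t now = 0;
    uint32_t stamps[3] = {0, 0, 0};
    int seen = 0;

    tone_init(&s);
    while (seen < 3) {
        uint8_t  elapsed;
        uint16_t tog = tone_step(&s, &elapsed);
        now += elapsed;
        if (tog & (1u << bit))
            stamps[seen++] = now;
    }
    return stamps[2] - stamps[0];
}
```

In this model each output's full period, measured in delay units, comes out equal to its 2 MHz delay value from the table, for example 239 for output 0 and 451 for output 11.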

An interrupt service routine reads the table and changes the 8 bits of PORTD and 4 bits of PORTB.

The 20-cycle loop (when the delay == 0) is:

dly0:   LD      r0,X+
        OUT     PIND,r0
        LD      r0,X+
        OUT     PINB,r0
        LD      dly_lsb,X+
        INC     XL
        NOP
        NOP
        NOP
        NOP
        NOP
        NOP
        NOP
        NOP
        TST     dly_lsb
        BREQ    dly0

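If we need a 40-cycle loop (delay == 1), the code after the 20-cycle loop tests for it and branches to a block of NOPs that pads the pass out to 40 cycles: 19 cycles for the loop body with the final BREQ not taken, 3 for the CPI and the taken BREQ, 16 for the NOPs, and 2 for the RJMP.

        CPI     dly_lsb,1
        BREQ    dly1
;;; (the delay == 2 test and the timer setup sit here in the full listing)

dly1:   NOP
        NOP
        NOP
        NOP
        NOP
        NOP
        NOP
        NOP
        NOP
        NOP
        NOP
        NOP
        NOP
        NOP
        NOP
        NOP
        RJMP    dly0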

We do the same thing for a 60-cycle loop (delay == 2). But if the delay is 80 cycles or larger (delay > 2), then we set up the timer to generate an interrupt after the correct number of cycles has passed, and then we return from the ISR. This allows the main loop to get some work done instead of just burning cycles with NOPs.

;;; Convert delay to number of clock cycles
        LDI     tmp,20
        MUL     dly_lsb,tmp
        MOVW    dly_lsb,r0

;;; Compensate for delay in prologue and epilogue of ISR
        SUBI    dly_lsb,30
        SBC     dly_msb,c_zero

;;; Update Output Compare for next delay
        LDS     tcnt_l,TCNT1L
        LDS     tcnt_h,TCNT1H
        ADD     tcnt_l,dly_lsb
        ADC     tcnt_h,dly_msb
        STS     OCR1AH,tcnt_h
        STS     OCR1AL,tcnt_l

;;; Restore Status register
        POP r0
        OUT 0x3f,r0
        RETI
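
The arithmetic of this epilogue can be written out in C. This is a sketch of the calculation only, assuming the constants 20 and 30 from the listing above; `next_compare` is a hypothetical name, and the function only applies to delay values greater than 2, since smaller delays are handled by the NOP loops.

```c
#include <stdint.h>

#define CYCLES_PER_UNIT 20u  /* clock cycles per delay unit */
#define ISR_OVERHEAD    30u  /* prologue + epilogue compensation */

/* Compute the next Output Compare value from the current timer count
 * and the delay byte read from the table. 16-bit arithmetic wraps in
 * the same way as the ADD/ADC pair on the AVR. */
static uint16_t next_compare(uint16_t tcnt, uint8_t delay)
{
    uint16_t cycles = (uint16_t)(delay * CYCLES_PER_UNIT - ISR_OVERHEAD);
    return (uint16_t)(tcnt + cycles);
}
```

For instance, with the timer at 1000 and a delay byte of 3, the next compare value is 1030; near the top of the timer range the result simply wraps past 0xFFFF.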

The prologue to the ISR has to compensate for the possibility of the interrupt occurring during either a 1-cycle or a 2-cycle instruction (the main loop uses no instructions of 3 or more cycles). It does that by comparing the Output Compare value to the Timer Count inside the ISR. If the difference is one more than expected, then the interrupt happened during a 2-cycle instruction.

The prologue of the ISR does the compensation:

;;; Save Status register
isr:    IN      r0,0x3f
        PUSH    r0
;;; Compare current Timer Count to Output Compare value
        LDS     tmp,TCNT1L
        LDS     r0,OCR1AL
        SUB     tmp,r0
        SUBI    tmp,12        ; Delta for 1-cc instruction
;;; If the interrupt happened on a 2-cc instruction, branch
        BRNE    dly0

;;; The interrupt happened on a 1-cc instruction
;;; The untaken branch (1 cc) plus two NOPs takes 3 cc, one more
;;; than the taken branch (2 cc), which equalizes the delay
        NOP
        NOP

dly0:
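
The equalization can be checked with a small cycle-count model. This sketch assumes the per-instruction cycle counts annotated in the full listing and treats `delta` as the TCNT1L minus OCR1AL difference the prologue computes (12 for a 1-cycle instruction, 13 for a 2-cycle instruction). The function names are hypothetical, and the exact point within the LDS at which the timer is sampled is glossed over.

```c
#include <stdint.h>

/* Cycles executed after the timer has been sampled, up to label dly0 */
static uint8_t cycles_after_sample(uint8_t delta)
{
    /* LDS r0,OCR1AL (2) + SUB (1) + SUBI (1) */
    uint8_t cycles = 2 + 1 + 1;

    if ((uint8_t)(delta - 12) != 0)
        cycles += 2;        /* BRNE taken */
    else
        cycles += 1 + 2;    /* BRNE not taken, then two NOPs */
    return cycles;
}

/* Total cycles from the compare match to dly0 in this model */
static uint8_t cycles_to_dly0(uint8_t delta)
{
    return (uint8_t)(delta + cycles_after_sample(delta));
}
```

Both cases arrive at `dly0` after the same number of cycles in this model, which is the point of the extra padding on the 1-cycle path.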

The smallest device that can be used must have these features:

  1. At least one 16-bit Timer
  2. At least 512 bytes of SRAM (256-byte table plus 36 bytes for cnt[], tc[], and phase[])
  3. Supports a 20 MHz processor clock
  4. Has at least 12 outputs for pin toggling

The smallest part I was able to find that meets these criteria was the ATtiny816, which costs $0.50 at 5K-unit pricing (or $0.90 for Qty. 1 in an SOIC20 package, according to DigiKey).

Here is the Arduino wrapper and gcc assembly source for the 20MHz 12-output version:

extern "C" {
  // function prototypes
  void tstart();
}

void setup() {
  /* Turn off timer0 interrupt */
  TIMSK0 = 0;

  tstart();
}

void loop() {
}

;;; tone_loop_20.S
;;;
;;;  Created: 6/5/2018 11:38:12 AM
;;;   Author: aprimatic
;;;
;;; Copyright 2018 APWizardry LLC
;;;
;;; Redistribution and use in source and binary forms, with or
;;; without modification, are permitted provided that the following
;;; conditions are met:
;;;
;;; 1. Redistributions of source code must retain the above
;;; copyright notice, this list of conditions and the following
;;; disclaimer.
;;;
;;; 2. Redistributions in binary form must reproduce the above
;;; copyright notice, this list of conditions and the following
;;; disclaimer in the documentation and/or other materials provided
;;; with the distribution.
;;;
;;; 3. Neither the name of the copyright holder nor the names of
;;; its contributors may be used to endorse or promote products
;;; derived from this software without specific prior written
;;; permission.
;;;
;;; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
;;; CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
;;; INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
;;; MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
;;; DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
;;; CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
;;; SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
;;; NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
;;; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
;;; HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
;;; CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
;;; OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
;;; EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

#define __SFR_OFFSET 0
  
#include <avr/io.h>

;;; Registers that aren't used with immediate modes
#define togD      r2
#define togB      r3
#define prev_togD r4
#define prev_togB r5
#define tcnt_l    r6
#define tcnt_h    r7
#define cnt       r8
#define tc        r9
#define c_zero    r10

;;; Registers that are used with immediate modes
#define tmp       r16
#define phase     r17
#define dly_lsb   r18
#define dly_msb   r19
#define min_val   r20
#define next_min  r21

;;; Pointers to arrays
#define p_cnt   0x100
#define p_tc    0x110
#define p_phase 0x120

;;; Pointer to 256-byte buffer
#define p_buf   0x200

.section .text
.global TIMER1_COMPA_vect

;;; Timer 1 Output Compare Interrupt Service Routine
;;; Preserves: Flags
;;; Modifies:
;;; r0, r1, tcnt_l(r6), tcnt_h(r7), tmp(r16),
;;; dly_lsb(r18), dly_msb(r19), XL(r26)
TIMER1_COMPA_vect:
;;; PUSH FLAGS
        IN      r0,0x3f                 ; 1
        PUSH    r0                      ; 2
;;; Read TCNT1
        LDS     tmp,TCNT1L              ; 2
;;; Compensate for 2-cycle instructions delaying interrupt for 1cc
        LDS     r0,OCR1AL               ; 2
;;; Subtract OCRA
        SUB     tmp,r0                  ; 1
;;; Subtract elapsed time to enter ISR
        SUBI    tmp,12                  ; 1
;;; If we were interrupted on a 2cc instruction, branch
        BRNE    dly0                    ; 2

;;; We were interrupted on a 1cc instruction
;;; Untaken BRNE (1cc) + two NOPs = 3cc, one more than the taken BRNE (2cc)
        NOP                             ; 1-1
        NOP                             ; 1
                                        ;---
                                        ; 11/12
;;; This is the 20cc loop if dly_val == 0
dly0:   LD      r0,X+                   ; 2
        OUT     PIND,r0                 ; 1
        LD      r0,X+                   ; 2
        OUT     PINB,r0                 ; 1
        LD      dly_lsb,X+              ; 2
        INC     XL                      ; 1
        NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        TST     dly_lsb                 ; 1
        BREQ    dly0                    ; 2
                                        ;---
                                        ; 20
;;; This is the 40cc loop if dly_val == 1
        CPI     dly_lsb,1               ; 1-1
        BREQ    dly1                    ; 2
                                        ;---
                                        ; 2
;;; This is the 60cc loop if dly_val == 2
        CPI     dly_lsb,2               ; 1-1
        BREQ    dly2                    ; 2
                                        ;---
                                        ; 2
;;; Multiply delay by 20
        LDI     tmp,20                  ; 1-1
        MUL     dly_lsb,tmp             ; 2
        MOVW    dly_lsb,r0              ; 1
;;; Adjust delay
        SUBI    dly_lsb,30              ; 1
        SBC     dly_msb,c_zero          ; 1
;;; Get current timer value
        LDS     tcnt_l,TCNT1L           ; 2
        LDS     tcnt_h,TCNT1H           ; 2
;;; Add adjusted delay to current timer value
        ADD     tcnt_l,dly_lsb          ; 1
        ADC     tcnt_h,dly_msb          ; 1
;;; Set up next Output Compare
        STS     OCR1AH,tcnt_h           ; 2
        STS     OCR1AL,tcnt_l           ; 2

        POP     r0                      ; 2
        OUT     0x3f,r0                 ; 1
        RETI                            ; 4
                                        ;---
                                        ; 22
;;; These extra cycles keep dly1 and dly2 on 20cc boundaries
dly2:   NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
                                        ;---
                                        ; 18
;;; Fall through
dly1:   NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        NOP                             ; 1
        RJMP    dly0                    ; 2
                                        ;---
                                        ; 18
.global tstart
;;; Start of code
tstart: CLI
        CLR     c_zero
;;; Set PORTD to all outputs
        LDI     tmp,0xff
        OUT     DDRD,tmp
;;; Set PORTB to outputs on bottom 4 bits
        LDI     tmp,0x0f
        OUT     DDRB,tmp
        
;;; Initialize tc
;;; Initialize cnt
        LDI     tmp,119         ; C8
        STS     p_tc,tmp
        STS     p_cnt,tmp
        LDI     tmp,126         ; B7
        STS     p_tc+1,tmp
        STS     p_cnt+1,tmp
        LDI     tmp,134         ; A#7
        STS     p_tc+2,tmp
        STS     p_cnt+2,tmp
        LDI     tmp,142         ; A7
        STS     p_tc+3,tmp
        STS     p_cnt+3,tmp
        LDI     tmp,150         ; G#7
        STS     p_tc+4,tmp
        STS     p_cnt+4,tmp
        LDI     tmp,159         ; G7
        STS     p_tc+5,tmp
        STS     p_cnt+5,tmp
        LDI     tmp,169         ; F#7
        STS     p_tc+6,tmp
        STS     p_cnt+6,tmp
        LDI     tmp,179         ; F7
        STS     p_tc+7,tmp
        STS     p_cnt+7,tmp
        LDI     tmp,189         ; E7
        STS     p_tc+8,tmp
        STS     p_cnt+8,tmp
        LDI     tmp,201         ; D#7
        STS     p_tc+9,tmp
        STS     p_cnt+9,tmp
        LDI     tmp,213         ; D7
        STS     p_tc+10,tmp
        STS     p_cnt+10,tmp
        LDI     tmp,225         ; C#7
        STS     p_tc+11,tmp
        STS     p_cnt+11,tmp
        
;;; Initialize phases
        LDI     tmp,1
        STS     p_phase,tmp
        LDI     tmp,1
        STS     p_phase+1,tmp
        LDI     tmp,0
        STS     p_phase+2,tmp
        LDI     tmp,0
        STS     p_phase+3,tmp
        LDI     tmp,1
        STS     p_phase+4,tmp
        LDI     tmp,1
        STS     p_phase+5,tmp
        LDI     tmp,0
        STS     p_phase+6,tmp
        LDI     tmp,0
        STS     p_phase+7,tmp
        LDI     tmp,1
        STS     p_phase+8,tmp
        LDI     tmp,0
        STS     p_phase+9,tmp
        LDI     tmp,0
        STS     p_phase+10,tmp
        LDI     tmp,1
        STS     p_phase+11,tmp

;;; Set up rptr register
        LDI     XL,lo8(p_buf)
        LDI     XH,hi8(p_buf)
                
;;; Set up wptr register
        LDI     ZL,lo8(p_buf)
        LDI     ZH,hi8(p_buf)
                
;;; Set up initial min_val
; uint8_t min_val = 119;
        LDI     tmp,119
        MOV     min_val,tmp

;;; Start with no toggle
        CLR     prev_togB
        CLR     prev_togD

;;; Set up Timer1
;;; Initial TCNT = 0
        STS     TCNT1H,c_zero
        STS     TCNT1L,c_zero
;;; Initial OCR1A = 0x0800
;;; This allows the main loop to get a few dozen entries ahead of 
;;; the ISR
        LDI     tmp,8
        STS     OCR1AH,tmp
        STS     OCR1AL,c_zero

;;; Normal Port Operation
;;; clkIO/1 (no prescaling)
        CLR     tmp
        STS     TCCR1A,tmp
        LDI     tmp,1<<CS10
        STS     TCCR1B,tmp

;;; Enable OCR1A Interrupt
        LDI     tmp,1<<OCIE1A
        STS     TIMSK1,tmp

;;; Enable Interrupts
        SEI

;;; Set up MSB of Y register (doesn't ever change)
        LDI     YH,hi8(p_cnt)

;;; Main loop
; while (1) {
;   next_min = 255;
lp0:    LDI     next_min,255
;   tog = 0;
        CLR     togD

        LDI     YL,11
;   for(int ii = 11; ii >= 0; ii--) {
;   cnt[ii] -= min_val;
lp1:    LDD     cnt,Y+(p_cnt&0xff)
        SUB     cnt,min_val
        CLC
;   if (cnt[ii] == 0) {
        BRNE    ar1

;     cnt[ii] = tc[ii];
        LDD     tc,Y+(p_tc&0xff)
        MOV     cnt,tc
        LDD     phase,Y+(p_phase&0xff)
;     tc[ii] += phase[ii];
        ADD     tc,phase
        STD     Y+(p_tc&0xff),tc
;     phase[ii] = -phase[ii];
        NEG     phase
        STD     Y+(p_phase&0xff),phase
;     tog |= (1<<ii);
        SEC

ar1:    ROL     togD
        ROL     togB
;   }
        STD     Y+(p_cnt&0xff),cnt

;   if (cnt[ii] < next_min) {
        CP      cnt,next_min
        BRSH    ar2

; next_min = cnt[ii];
        MOV     next_min,cnt
;   }
ar2:    SUBI    YL,1
        BRCC    lp1
; }
;;; Store toggle bits and delay into table
; buf[wptr++] = prev_tog & 0xff;
        ST      Z+,prev_togD
; buf[wptr++] = prev_tog >> 8;
        ST      Z+,prev_togB
; buf[wptr++] = min_val - 1;
        DEC     min_val
        ST      Z+,min_val
; wptr = (wptr + 1) & 0xff;
        INC     ZL

; min_val = next_min;
        MOV     min_val,next_min
; prev_tog = tog;
        MOVW    prev_togD,togD

; while (rptr == wptr)
;   ;
lp2:    CP      XL,ZL
        BREQ    lp2
;;; Go back to top
; }
        RJMP    lp0

Edit: This article was the subject of an Ask Hackaday Answered article! (https://hackaday.com/2018/08/22/ask-hackaday-answered-the-tale-of-the-top-octave-generator)

18 thoughts on “AVR Top Octave Tone Generator”

    1. Hi Davide,

      The dividers could not be integers and still meet the tuning accuracy with a 200 kHz clock. Take B7 for example, which has the ideal frequency of 7902 Hz.

      With a 2 MHz clock, B7 needs a divider of 253 to achieve the frequency 7905 Hz. This has an error of less than 1 cent.

      With a 200 kHz clock, the divider would need to be 25.3, which is hard to do in either code or hardware. Rounding it to 25 would generate a frequency of 8000 Hz, which is sharp by 21 cents. This would sound bad, especially since other notes could be off by -18 to 25 cents from ideal, using 200 kHz as a base clock.
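
The cent figures in this reply can be reproduced with a short calculation. The sketch below is illustrative only: the function names are hypothetical, the ideal frequency of 7902.13 Hz is an assumed equal-temperament value, and a short series is used for the logarithm so that the example has no library dependencies.

```c
/* ln(x) via the atanh series; adequate for ratios close to 1 */
static double ln_near_one(double x)
{
    double y = (x - 1.0) / (x + 1.0);
    double y2 = y * y;
    double term = y;
    double sum = 0.0;

    for (int k = 1; k < 20; k += 2) {
        sum += term / k;   /* y + y^3/3 + y^5/5 + ... */
        term *= y2;
    }
    return 2.0 * sum;
}

/* Tuning error in cents: 1200 * log2(actual / ideal) */
static double cents_error(double actual_hz, double ideal_hz)
{
    const double LN2 = 0.69314718055994531;
    return 1200.0 * ln_near_one(actual_hz / ideal_hz) / LN2;
}
```

Calling `cents_error(2000000.0 / 253.0, 7902.13)` gives a result below 1 cent, while `cents_error(8000.0, 7902.13)` gives a little over 21 cents.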

  1. What made you decide to set the unit delay to 1 us?
    Would a faster architecture (e.g. the Due) allow for a finer time grid, or
    is that pushing the limits too far?

    1. To emulate the Top Octave Generator, the fastest update rate necessary is one loop every 500 ns (2 MHz). This rate made the loop too small for the 8-bit AVR, but by doubling it to 1 us, the 8-bit AVR could handle it.

      I imagine a faster processor can potentially handle a 500 ns update rate, but there is nothing to be gained in emulating the Top Octave Generator any faster than 2 MHz, since that was the original base frequency of the component. If you’d like to go beyond the accuracy of the Top Octave Generator, then maybe higher update rates could help you achieve that.

      Regards,

      Ag Primatic

  2. Could you verify that your TOG is generating notes C#7 (2217.3 Hz) through C8 (4184.1 Hz)? I based my efforts on a TOG IC like the MK50242, which generates C#8 (4434.6 Hz) through C9 (8368.2 Hz) from a 2 MHz clock. No wonder it was so **** difficult.

    1. Hi Alan, I need to build the TOG under discussion, and I would be interested in seeing the code and the complete schematics of the project you created, in order to replace the original MK50240P generator in my Crumar Organizer 2. Buying an MK50240P on eBay costs more than what I paid for the organ.
      My email: giovenuti@libero.it.
      Thank you
      Gioacchino Venuti

  3. After a couple of nights trying to wrap my brain around your elegant TOG algorithm, I came to the conclusion that this technique coupled with an ATtiny816 would be a perfect match for an upcoming tone generator project. One thing that I don’t quite get is how the toggle bits are used by the ISR. Shouldn’t they be exclusive ORed with the current state of the PORTD and PORTB GPIOs? Please explain.

    Thanks for the obviously significant effort and creativity that went into this solution for resurrecting and surpassing the capability of the obsolete MK50240.

    – Ken

    1. Hi Ken,

      Glad you enjoyed the article and the code. Hope it is useful to you for your upcoming project.

      Writing a ‘1’ to a bit in the PINB or PIND registers on an AVR toggles the output without requiring an XOR instruction. Please see Section 11.2.2 of the ATmega328P data sheet for a longer description.

      Regards,

      Ag Primatic
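
The toggle-on-write behaviour described in the reply above can be modelled on a host machine. This is only a model of the effect, not AVR code, and `write_pin_register` is a hypothetical name.

```c
#include <stdint.h>

/* Host-side model: writing `value` to PINx flips every PORTx bit that
 * is set in `value` and leaves the other bits unchanged. */
static uint8_t write_pin_register(uint8_t port, uint8_t value)
{
    return (uint8_t)(port ^ value);
}
```

In this model, writing 0x06 while the port holds 0x0A leaves 0x0C, and writing 0x00 changes nothing, which is why a table entry with no toggle bits set is harmless.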

      1. Ag,
        Mea culpa. I should have read the data sheet more closely.

        A couple more questions if you don’t mind. My application requires generating frequencies from F6 down to D#3 but hopefully at the 1 MHz interrupt resolution. Consequently, I will need to use 16-bit register accesses for the Timer/Counter, Ring Buffer, ISR and Main Loop. The Main Loop should be fine, but I’m concerned about the ISR. What would be the limitations on the cumulative clock cycle delays of the Prologue and Epilogue sequences? I’m not clear about the significance of the subtraction by 12 in the Prologue and 30 in the Epilogue. Also, I’m hoping to avoid a 16-bit multiplication in the Epilogue.
        Thanks again,
        – Ken

        1. Hi Ken,

          No problem about reading the datasheet. It’s 452 pages, so I wouldn’t expect you to have it memorized! 🙂

          It’s been a while since I wrote the code, but here’s my recollection on what the constants in the prologue and epilogue are:

          The 12 in the prologue is the number of clock cycles it takes to interrupt a 1-cycle instruction (7 cycles for the interrupt + 5 cycles of instructions at the beginning of the ISR to read the timer). It’s used to decide if we were interrupted on a 1-cycle or 2-cycle instruction, and the delay through the ISR prologue is adjusted accordingly. This wouldn’t change with the modifications you are suggesting.

          To calculate when the next ISR should occur, ideally I would add 20*delay_value to current timer and put this into the output compare register. However, the prologue and the epilogue take some time to execute, so this value needs to be adjusted downward to compensate for the time these two blocks of code take. The 30 in the epilogue was the number of cycles that I had to subtract from the next output compare value so that the next interrupt occurred at the correct time.

          Regards,

          Ag

  4. Hi Ag,

    I saw in the comments on the Hackaday post that you came up with a version that runs at 16 MHz by adding two 8-bit timers.

    Would you mind sharing the code for that?

    Many thanks,

    David

    1. Hi David,

      Here is the code that runs at 16 MHz using two 8-bit timers:

      ;;; tone_loop_20_d.S
      ;;;
      ;;;  Created: 6/5/2018 11:38:12 AM
      ;;;   Author: aprimatic
      ;;;
      ;;; Copyright 2018 APWizardry LLC
      ;;;
      ;;; Redistribution and use in source and binary forms, with or without
      ;;; modification, are permitted provided that the following conditions
      ;;; are met:
      ;;;
      ;;; 1. Redistributions of source code must retain the above copyright
      ;;; notice, this list of conditions and the following disclaimer.
      ;;;
      ;;; 2. Redistributions in binary form must reproduce the above
      ;;; copyright notice, this list of conditions and the following
      ;;; disclaimer in the documentation and/or other materials provided
      ;;; with the distribution.
      ;;;
      ;;; 3. Neither the name of the copyright holder nor the names of its
      ;;; contributors may be used to endorse or promote products derived
      ;;; from this software without specific prior written permission.
      ;;;
      ;;; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
      ;;; "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
      ;;; LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
      ;;; FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
      ;;; COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
      ;;; INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
      ;;; (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
      ;;; SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
      ;;; HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
      ;;; STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
      ;;; ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
      ;;; OF THE POSSIBILITY OF SUCH DAMAGE.
      
      
      ;;; Pin PortD PortB Note
      ;;;  2   PD0        A#7
      ;;;  3   PD1        A7
      ;;;  4   PD2        G#7
      ;;;  5   PD3        G7
      ;;;  6   PD4        F#7
      ;;; 11   PD5        F7
      ;;; 12   OC0A       C8
      ;;; 13   PD7        E7
      ;;; 14        PB0   D#7
      ;;; 15        PB1   D7
      ;;; 16        PB2   C#7
      ;;; 17        OC2A  B7
      
      #define __SFR_OFFSET 0
        
      #include <avr/io.h>
      
      ;;; Registers that aren't used with immediate modes
      #define togD      r2
      #define togB      r3
      #define prev_togD r4
      #define prev_togB r5
      #define tcnt_l    r6
      #define tcnt_h    r7
      #define cnt       r8
      #define tc        r9
      #define c_zero    r10
      
      ;;; Registers that are used with immediate modes
      #define tmp       r16
      #define phase     r17
      #define dly_lsb   r18
      #define dly_msb   r19
      #define min_val   r20
      #define next_min  r21
      	
      ;;; Pointers to arrays
      #define p_cnt   0x100
      #define p_tc    0x110
      #define p_phase 0x120
      
      ;;; Pointer to 256-byte buffer
      #define p_buf   0x200
      
      .section .text
      .global TIMER1_COMPA_vect
      
      
      ;;; Timer 1 Output Compare Interrupt Service Routine
      ;;; Preserves: Flags
      ;;; Modifies:
      ;;; r0, r1, tcnt_l(r6), tcnt_h(r7), tmp(r16),
      ;;; dly_lsb(r18), dly_msb(r19), XL(r26)
      TIMER1_COMPA_vect:
      ;;; PUSH FLAGS
      	IN	r0,0x3f                 ; 1
      	PUSH	r0                      ; 2
      ;;; Read TCNT1
      	LDS	tmp,TCNT1L		; 2
      ;;; Compensate for 2-cycle instructions delaying interrupt for 1cc
      	LDS	r0,OCR1AL		; 2
      ;;; Subtract OCRA
      	SUB	tmp,r0			; 1
      ;;; Subtract elapsed time to enter ISR
      	SUBI	tmp,12			; 1
      ;;; If we were interrupted on a 2cc instruction, branch
      	BRNE	dly0			; 2
      
      ;;; We were interrupted on a 1cc instruction
      ;;; Untaken BRNE (1cc) + two NOPs = 3cc, one more than the taken BRNE (2cc)
              NOP                             ; 1-1
              NOP                             ; 1
      					;---
      					; 11/12
      
      ;;; This is the 16cc loop if dly_val == 0
      dly0:	LD      r0,X+                   ; 2
      	OUT     PIND,r0                 ; 1
      	LD      r0,X+                   ; 2
      	OUT     PINB,r0                 ; 1
      	LD      dly_lsb,X+              ; 2
              INC     XL                      ; 1
              NOP                             ; 1
              NOP                             ; 1
              NOP                             ; 1
              NOP                             ; 1
              TST     dly_lsb                 ; 1
      	BREQ    dly0                    ; 2
      					;---
      					; 16
      ;;; This is the 32cc loop if dly_val == 1
              CPI     dly_lsb,1               ; 1-1
              BREQ    dly1                    ; 2
                                              ;---
                                              ; 2
      ; This is the 48cc loop if dly_val == 2
              CPI     dly_lsb,2               ; 1-1
              BREQ    dly2                    ; 2
                                              ;---
                                              ; 2
      ;;; Multiply delay by 16
              LDI     tmp,16                  ; 1-1
              MUL     dly_lsb,tmp             ; 2
              MOVW    dly_lsb,r0              ; 1
      ;;; Adjust delay
      	SUBI	dly_lsb,30		; 1
      	SBC	dly_msb,c_zero		; 1
      ;;; Get current timer value
      	LDS	tcnt_l,TCNT1L		; 2
      	LDS	tcnt_h,TCNT1H		; 2
      ;;; Add adjusted delay to current timer value
      	ADD	tcnt_l,dly_lsb		; 1
      	ADC	tcnt_h,dly_msb		; 1
      ;;; Set up next Output Compare
      	STS	OCR1AH,tcnt_h		; 2
      	STS	OCR1AL,tcnt_l		; 2
      
      	POP	r0			; 2
      	OUT	0x3f,r0			; 1
      	RETI				; 4
      					;---
      					; 22
      ; These extra cycles keep dly1 and dly2 on 16cc boundaries
      dly2:   NOP                             ; 1
      	NOP                             ; 1
      	NOP                             ; 1
      	NOP                             ; 1
      	NOP                             ; 1
      	NOP                             ; 1
      	NOP                             ; 1
      	NOP                             ; 1
      	NOP                             ; 1
      	NOP                             ; 1
      	NOP                             ; 1
      	NOP                             ; 1
              NOP                             ; 1
              NOP                             ; 1
                                              ;---
                                              ; 14
      
      ;;; Fall through
      dly1:   NOP                             ; 1
      	NOP                             ; 1
      	NOP                             ; 1
      	NOP                             ; 1
      	NOP                             ; 1
      	NOP                             ; 1
      	NOP                             ; 1
      	NOP                             ; 1
      	NOP                             ; 1
      	NOP                             ; 1
              NOP                             ; 1
              NOP                             ; 1
      	RJMP	dly0                    ; 2
                                              ;---
                                              ; 14
      
      .global tstart
      tstart: CLI
      	CLR     c_zero
      ;;; Set PORTD to all outputs
      	LDI	tmp,0xff
      	OUT	DDRD,tmp
      ;;; Set PORTB to outputs on bottom 4 bits
      	LDI	tmp,0x0f
      	OUT	DDRB,tmp
      	
      ;;; Initialize tc
      ;;; Initialize cnt
      	LDI     tmp,134		; A#7 on PD0
      	STS     p_tc+0,tmp
      	STS     p_cnt+0,tmp
      	LDI     tmp,142		; A7 on PD1
      	STS     p_tc+1,tmp
      	STS     p_cnt+1,tmp
      	LDI     tmp,150		; G#7 on PD2
      	STS     p_tc+2,tmp
      	STS     p_cnt+2,tmp
      	LDI     tmp,159		; G7 on PD3
      	STS     p_tc+3,tmp
      	STS     p_cnt+3,tmp
      	LDI     tmp,169		; F#7 on PD4
      	STS     p_tc+4,tmp
      	STS     p_cnt+4,tmp
      	LDI     tmp,179		; F7 on PD5
      	STS     p_tc+5,tmp
      	STS     p_cnt+5,tmp
      	LDI     tmp,189		; E7 on PD7
      	STS     p_tc+6,tmp
      	STS     p_cnt+6,tmp
      	LDI     tmp,201		; D#7 on PB0
      	STS     p_tc+7,tmp
      	STS     p_cnt+7,tmp
      	LDI     tmp,213		; D7 on PB1
      	STS     p_tc+8,tmp
      	STS     p_cnt+8,tmp
      	LDI     tmp,225		; C#7 on PB2
      	STS     p_tc+9,tmp
      	STS     p_cnt+9,tmp
      	
      ;;; Initialize phases
      	LDI     tmp,0
      	STS     p_phase+0,tmp
      	LDI     tmp,0
      	STS     p_phase+1,tmp
      	LDI     tmp,1
      	STS     p_phase+2,tmp
      	LDI     tmp,1
      	STS     p_phase+3,tmp
      	LDI     tmp,0
      	STS     p_phase+4,tmp
      	LDI     tmp,0
      	STS     p_phase+5,tmp
      	LDI     tmp,1
      	STS     p_phase+6,tmp
      	LDI     tmp,0
      	STS     p_phase+7,tmp
      	LDI     tmp,0
      	STS     p_phase+8,tmp
      	LDI     tmp,1
      	STS     p_phase+9,tmp
      
      ;;; Set up rptr register
      	LDI     XL,lo8(p_buf)
      	LDI     XH,hi8(p_buf)
      		
      ;;; Set up wptr register
      	LDI     ZL,lo8(p_buf)
      	LDI     ZH,hi8(p_buf)
      		
      ;;; Set up initial min_val
      ; uint8_t min_val = 134;
      	LDI     tmp,134
      	MOV     min_val,tmp
      
      ;;; Start with no toggle
      	CLR	prev_togB
      	CLR	prev_togD
      
      ;;; Set up Timer0 for C8 note generation
      ;;; Toggle OC0A on PD6
       	CLR	tmp
       	OUT	TCCR0B,tmp
       	OUT	TIFR0,tmp
       	STS	TIMSK0,tmp
      	
       	LDI	tmp,238		; C8
       	OUT	OCR0A,tmp
      
       	LDI	tmp,(3<
      

      Regards,

      Ag

  5. I am planning a string synth and your solution would be nice for my project. I know some Arduino basics but have no idea how to upload this code to an Arduino Uno. Could anyone explain it to me?

    1. Hi Iker,
      This wouldn’t really be the way to implement a String synth, since the output consists of 12 square waves (odd harmonics only), and string synths are typically based on sawtooth waves (even and odd harmonics). Plus, there is no attack/decay/sustain/release or detuning capabilities with this tone generator.
      The top octave tone generator is meant for things like electronic organs, where each key is tied directly to one of the note outputs of the tone generator or a divider for lower octaves.
      But if you want to upload this code to an Arduino Uno, you need to create a new project in the Arduino IDE, copy the C code into the blank .ino file, and copy the .S code into another file in the same directory. Then you can build the project and upload it to an Uno. The result will be 12 output pins generating square waves at the proper frequencies for an equally-tempered chromatic scale.

      Regards,
      Ag Primatic
