Using the Digital Discovery to look at the Zynq boot sequence

Introduction

When developing new FPGA boards, it's important to know the specifications of the hardware on the board and to see the timing of its signals. The Digital Discovery provides a high-speed logic analyzer that lets you visualize and analyze the signals traveling through the board. While developing a new Zynq board, the speed of the QSPI transactions in the boot sequence was not clearly specified. This project uses the Digital Discovery to capture the boot sequence and determine that timing.

Inventory

  • Digital Discovery
  • Zynq board with flash
    • Note: This document was written using a Zybo Z7 of a revision earlier than D.0.
  • SOIC clip if available
  • Wires

Figure 1. SOIC clip.

When deciding how to tackle this problem, two of the smaller instrumentation devices that have a logic analyzer were considered: the Analog Discovery 2 and the Digital Discovery. The Digital Discovery was chosen for two reasons. First, QSPI transactions can take place at high clock speeds, over 100 MHz, so an adequate sample rate is very important. Second, the Digital Discovery's 512 MB of DDR memory lets it perform very large acquisitions.

Step 1: Connecting the Digital Discovery

The following connections are required:

QSPI signal   QSPI/clip pin   Digital Discovery pin
cs            7               DIO0
clk           16              DIO1
d0            15              DIO2
d1            8               DIO3
d2            9               DIO4
d3            1               DIO5
gnd           10              gnd

Figure 2. Connections

Make sure to check signal integrity and crosstalk when using cables like these. In some cases a signal wire needs to be twisted together with a GND wire (in this case it was the blue cs wire).

Step 2: QSPI script

A custom interpreter translates the QSPI signals into data. It is enabled by adding a “Custom” channel to the Logic instrument in WaveForms. Below is the JavaScript code that interprets the QSPI signals.

// rgData: input, raw digital sample array
// rgValue: output, decoded data array
// rgFlag: output, decoded flag array

var c = rgData.length; // c = number of raw samples
var pClock = false; // previous clock signal level
var iStart = 0;     // used to keep track of the word start index
var cByte = 0;      // byte count per transmission
var cBits = 0;      // nibble counter (four data bits are sampled per clock)
var bValue = 0;     // value variable
var fCmd = true;    // command-phase flag; reset with select but not otherwise used below

for(var i = 0; i < c; i++){ // for each sample
    var s = rgData[i]; // current sample
    var fSelect = 1&(s>>0); // DIO0 is the select signal
    var fClock = 1&(s>>1); // DIO1 is the clock signal
    var fData = 1&(s>>2); // DIO2 is d0 (single-bit SPI data, not used by this quad decoder)
    var fData4 = 0xF&(s>>2); // DIO2-5 carry d0-d3
    
    if(fSelect != 0){ // select active low
        // while select inactive reset our counters/variables
        iStart = i+1; // select might become active with next sample
        cByte = 0;
        cBits = 0;
        bValue = 0;
        pClock = false;
        fCmd = true;
        continue;
    }
    if(pClock == 0 && fClock != 0){ // sample on clock rising edge
        bValue <<= 4; // four data bits per clock, most significant nibble first
        bValue |= fData4;

        cBits++;
        if(cBits==2){ // two nibbles make a byte: store it
            cByte++;
            // store rgValue/Flag from word start index to current sample position
            for(var j = iStart; j < i; j++){
                // Flag change will be visible on plot even when data remains constant.
                // This is useful when consecutive bytes have equal values.
                rgFlag[j] = cByte;
                rgValue[j] = bValue;
            }
            iStart = i+1; // next word might start after this sample
            cBits = 0;  // reset nibble count for the next byte
            bValue = 0; // reset value variable
        }
    }
    pClock = fClock; // previous clock level
}

In parallel with this interpreter, a standard SPI interpreter can be used to see the instructions that are not sent via QSPI, for example the first read instruction.
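
To check the decoding logic without a capture, the same nibble-assembly approach can be sketched as a standalone function and run on synthetic samples, for example in Node.js. The names decodeQuadBytes and makeQuadSamples are hypothetical helpers introduced only for this illustration, and the bit layout assumes the wiring from Step 1 (cs on DIO0, clk on DIO1, d0-d3 on DIO2-5). This is a minimal sketch and not part of WaveForms.

```javascript
// Hypothetical helper: same nibble-assembly idea as the interpreter above,
// but returning the decoded bytes instead of filling rgValue/rgFlag.
function decodeQuadBytes(samples) {
    var bytes = [];
    var prevClock = 0;
    var nibbles = 0;
    var value = 0;
    for (var i = 0; i < samples.length; i++) {
        var s = samples[i];
        var select = s & 1;          // bit 0: cs, active low
        var clock = (s >> 1) & 1;    // bit 1: clk
        var data4 = (s >> 2) & 0xF;  // bits 2-5: d0-d3
        if (select !== 0) {          // deselected: reset the decoder state
            prevClock = 0;
            nibbles = 0;
            value = 0;
            continue;
        }
        if (prevClock === 0 && clock !== 0) { // rising clock edge
            value = (value << 4) | data4;     // most significant nibble first
            nibbles++;
            if (nibbles === 2) {              // two nibbles make one byte
                bytes.push(value);
                nibbles = 0;
                value = 0;
            }
        }
        prevClock = clock;
    }
    return bytes;
}

// Hypothetical helper: build synthetic samples that clock out the given
// bytes on four data lines, one low and one high clock sample per nibble.
function makeQuadSamples(bytes) {
    var samples = [1];                        // cs high (idle)
    for (var i = 0; i < bytes.length; i++) {
        var nibblePair = [bytes[i] >> 4, bytes[i] & 0xF];
        for (var n = 0; n < 2; n++) {
            var d = nibblePair[n] << 2;
            samples.push(d);                  // cs low, clk low
            samples.push(d | 2);              // cs low, clk high
        }
    }
    samples.push(1);                          // cs high again
    return samples;
}

var decoded = decodeQuadBytes(makeQuadSamples([0x66, 0x55, 0x99, 0xAA]));
console.log(decoded.map(function (b) { return "0x" + b.toString(16); }).join(" "));
```

Feeding the decoder synthetic data like this makes it easier to tell a wiring or signal integrity problem from a script problem.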

Step 3: Trigger and acquisition

Although the maximum QSPI clock frequency is about 100 MHz, a maximum of 25 MHz is used while booting, and the entire boot transfer takes about 700 ms. Capturing it therefore requires both a large number of samples and a decent sample rate, and this is where the Digital Discovery comes in handy: 268 million samples at 200 MHz translate into a frame of about 1.3 seconds.
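
The arithmetic behind these numbers can be sketched as follows. The variable names are made up for this illustration, and the values are the ones quoted above, not measured limits of the instrument.

```javascript
// Values quoted in the text above (assumptions for this sketch).
var sampleRateHz = 200e6;    // logic analyzer sample rate
var sampleCount = 268e6;     // samples in one acquisition
var bootClockHz = 25e6;      // fastest QSPI clock seen while booting
var bootTransferS = 0.7;     // approximate length of the boot transfer

// Length of one acquisition frame in seconds.
var frameSeconds = sampleCount / sampleRateHz;

// How many samples fall into one period of the fastest boot clock.
var samplesPerClock = sampleRateHz / bootClockHz;

// Samples needed to cover the whole boot transfer at this sample rate.
var samplesForBoot = Math.round(bootTransferS * sampleRateHz);

console.log(frameSeconds.toFixed(2) + " s frame, " +
            samplesPerClock + " samples per clock, " +
            samplesForBoot + " samples for the boot transfer");
```

With 8 samples per clock period at 25 MHz, each clock edge is resolved comfortably, and the 700 ms transfer fits in the frame with room to spare.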

The acquisition itself is quite demanding: it uses a lot of the PC's memory (16 GB) and the data takes a long time to process.

The trigger is set on the falling edge of the CS signal.

Below is the entire QSPI transaction captured by WaveForms.

Figure 3. Full transaction.

Notice the short pause near the left end of the acquisition; that is where the clock frequency changes from 5.4 MHz to 25 MHz.
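
One way to estimate a clock frequency from raw samples is to measure the spacing between rising edges. The sketch below does this on synthetic data. The functions estimateClockHz and makeClock are hypothetical helpers, the clock is assumed to be on bit 1 (DIO1) as in Step 1, and the 200 MHz sample rate is the one assumed in this step. The synthetic clocks use 37 and 8 samples per period, which at 200 MHz correspond to roughly 5.405 MHz and 25 MHz.

```javascript
// Hypothetical helper: average clock frequency between the first and
// last rising edge found on the given bit of the sample array.
function estimateClockHz(samples, clockBit, sampleRateHz) {
    var first = -1;
    var last = -1;
    var edges = 0;
    var prev = samples.length > 0 ? (samples[0] >> clockBit) & 1 : 0;
    for (var i = 1; i < samples.length; i++) {
        var level = (samples[i] >> clockBit) & 1;
        if (prev === 0 && level === 1) {      // rising edge
            if (first < 0) first = i;
            last = i;
            edges++;
        }
        prev = level;
    }
    if (edges < 2) return NaN;                // not enough edges to measure
    return sampleRateHz * (edges - 1) / (last - first);
}

// Hypothetical helper: synthetic clock on bit 1 with the given number of
// samples per period (low for the first half, high for the rest).
function makeClock(samplesPerPeriod, periods) {
    var samples = [];
    var lowSamples = Math.floor(samplesPerPeriod / 2);
    for (var p = 0; p < periods; p++) {
        for (var k = 0; k < samplesPerPeriod; k++) {
            samples.push(k < lowSamples ? 0 : 2);
        }
    }
    return samples;
}

var slowHz = estimateClockHz(makeClock(37, 10), 1, 200e6);
var fastHz = estimateClockHz(makeClock(8, 10), 1, 200e6);
console.log((slowHz / 1e6).toFixed(3) + " MHz, " + (fastHz / 1e6).toFixed(3) + " MHz");
```

Averaging over many edges reduces the effect of the one-sample quantization on any single period.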

Step 4: Boot transfers

Two documents need to be read in order to understand what the data transfers represent: the Zynq TRM and the flash memory's datasheet.

The instructions sent from the Zynq to the flash memory are always sent via SPI using D0. The first instruction sent is 0x03 0x00 0x00 0x20, which means SPI READ from address 0x20. The reply, 0x66 0x55 0x99 0xaa, is also received via SPI, using D1. The flash read instruction is explained on page 85 of the datasheet.
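
The first transfer can be unpacked with a few lines of code. In this sketch, parseReadCommand and littleEndianWord are hypothetical helpers, and reading the reply as a little-endian 32-bit word is an assumption about how the header word is stored in flash. Under that assumption the reply reads as 0xaa995566, which appears to correspond to the width detection word described in the TRM.

```javascript
// Hypothetical helper: split a 4-byte SPI command into its opcode and
// 24-bit address (most significant address byte first).
function parseReadCommand(bytes) {
    return {
        opcode: bytes[0],
        address: (bytes[1] << 16) | (bytes[2] << 8) | bytes[3]
    };
}

// Hypothetical helper: combine 4 reply bytes into an unsigned 32-bit
// word, assuming the first byte received is the least significant one.
function littleEndianWord(bytes) {
    return ((bytes[3] << 24) | (bytes[2] << 16) | (bytes[1] << 8) | bytes[0]) >>> 0;
}

var cmd = parseReadCommand([0x03, 0x00, 0x00, 0x20]);
var word = littleEndianWord([0x66, 0x55, 0x99, 0xaa]);

console.log("opcode 0x" + cmd.opcode.toString(16) +
            ", address 0x" + cmd.address.toString(16) +
            ", reply word 0x" + word.toString(16));
```

The `>>> 0` at the end keeps the word unsigned, since JavaScript bitwise operators otherwise return signed 32-bit values.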

Figure 4. First transfer.

Pages 170 and 179 of the Zynq TRM explain what that reply means. In short, that set of bytes tells the Zynq that the memory is QSPI capable. It is also important to observe that, at this point, the SPI clock frequency is 5.405 MHz, which is a relatively low speed.

From this point on, since it has been determined that the memory supports QSPI, data will be read back on all 4 data lines. For instance, the next instruction is 0x6b followed by a 3-byte address. 0x6b is a quad read instruction, and the response appears on the QSPI interpreter after 8 “dummy” clock periods.

Figure 5. Second transfer.

In this case, the address is 0x1d and 7 bytes are read. The first three bytes come from addresses 0x1d, 0x1e and 0x1f, which are part of an interrupt table. The remaining 4 bytes come from address 0x20 and are the same bytes that were returned by the first SPI read.
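
As a rough cross-check against the capture, the number of clock periods in such a quad read can be estimated. This sketch assumes the opcode and the 3-byte address are each sent one bit per clock on D0, followed by 8 dummy clocks, with the data returned four bits per clock. The function quadReadClocks is a hypothetical helper written for this illustration.

```javascript
// Hypothetical helper: estimated clock periods for one 0x6b quad read,
// under the assumptions stated above.
function quadReadClocks(dataBytes) {
    var opcodeClocks = 8;             // 1 opcode byte, one bit per clock
    var addressClocks = 24;           // 3 address bytes, one bit per clock
    var dummyClocks = 8;              // dummy clock periods before the data
    var dataClocks = dataBytes * 2;   // four bits per clock, two clocks per byte
    return opcodeClocks + addressClocks + dummyClocks + dataClocks;
}

// Addresses covered by a 7-byte read starting at 0x1d.
var startAddress = 0x1d;
var byteCount = 7;
var addresses = [];
for (var i = 0; i < byteCount; i++) {
    addresses.push(startAddress + i);
}

var clocks = quadReadClocks(byteCount);
console.log(clocks + " clocks, addresses 0x" + addresses[0].toString(16) +
            " to 0x" + addresses[byteCount - 1].toString(16));
```

The last address, 0x23, is the final byte of the 4-byte word at 0x20, which is consistent with the transfer ending on the same bytes as the first SPI read.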

The Zynq then continues to read bytes, incrementing the address until it reaches 0x45, which is the end of the BootROM header.

Unfortunately, because we do not have access to the BootROM code, the rest of the boot sequence is less transparent. At some point the FSBL (first stage boot loader) begins to run, most likely where the SPI clock frequency changes to 25 MHz, 84 ms after the boot process started, as seen below.

Figure 6. FSBL start.

The FSBL then reads the boot image and processes the partitions it contains, including the .bit file, which configures the Zynq's PL, and the .elf file, which runs on the ARM processor.

More details on the boot image and boot process can be found in this user guide.