STM32F4: Generating parallel signals with the FSMC

STM32F4: Generating parallel signals with the FSMC

The goal: The memory controller can be used to generate a "generic" 16-bit parallel data stream with clock. Address generation will be disregarded, as well as other control signals dedicated to memory chips.

It must be noted that the STM32F40x and STM32F41x have the FSMC (static memories), while theSTM32F42x and STM32F43x have the FMC (static and dynamic memories). The differences between the two concern the support of SDRAM (dynamic RAM), address and data write FIFOs (both data and address, instead of data only for FSMC, and 16-word long instead of 2-word long only for FSMC), and the 32-bit wide data bus for FMC (See [1]).

Set pins (1st attempt)

Only data bus FSMC_D[15:0] and clock FSMC_CLK will be used (set as alternate function). The other pins are set as standard GPIOs (general purpose output).

FSMC is alternate function 12 according to the datasheet (See "Table 9. Alternate function mapping" in [2]).

/* PD: 0, 1, 3, 8, 9, 10, 14, 15 -> alternate function (0b10) */
GPIOD->MODER = 0xA56A559A;
GPIOD->AFR[0] = 0xCCCCCCCC; /* FSMC = AF12 (0xC) */
GPIOD->AFR[1] = 0xCCCCCCCC;
/* PE: 7, 8, 9, 10, 11, 12, 13, 14, 15 -> alternate function (0b10) */
GPIOE->MODER = 0xAAAA9555;
GPIOE->AFR[0] = 0xCCCCCCCC;
GPIOE->AFR[1] = 0xCCCCCCCC;

FSMC setup/init (1st attempt)

Be careful of the the wicked register map documentation of the FSMC block:

This is very misleading, since all other table are ordered as found in memory, but not here.

/* PSRAM, synchronous (burst), non-multiplexed */
/* control register */
FSMC_Bank1->BTCR[0] = FSMC_BCR1_CBURSTRW | FSMC_BCR1_WAITPOL | FSMC_BCR1_BURSTEN | FSMC_BCR1_MWID_0 | FSMC_BCR1_WREN | FSMC_BCR1_MTYP_0 /* PSRAM */ | FSMC_BCR1_MBKEN;
/* timing register */
FSMC_Bank1->BTCR[1] = FSMC_BTR1_CLKDIV_1 /* div 3 */ ;

It is noticable that the timing are all set to 0, except the clock.

Result (1st attempt)

The code writing to the FSMC is using an array and simulate a sequencial memory request, in order to take advantage of the burst mode.

volatile uint16_t* fsmc = (uint16_t*)0x60000000;

for(uint32_t i=0; i<(sizeof(bitstream_bin)/2); i++) {
        uint16_t w = ((uint16_t*)bitstream_bin)[i];
        fsmc[i] = w;
}

The clock is ~54MHz, but the maximum clock is HCLK/2 = 168/2=84MHz. Unfortunately, my oscilloscope is too slow for this.

At least, 4 clock cycles are required to write one data. Data latency (DATLAT lowest value is 2). There is one cyle to give the address, two cyle of latency, one cyle for give the data.

At max FSMC speed (~84MHz), after dividing the clock by 4, the 16-bit parallel transmission would only be ~20MHz.

Bursts are possible up to 32 bits (two 16-bit data words). When using this feature, two data words are send for each address, hence more data is sent, but the clock is hard to use: 3 ticks for the (empty) address, 1 tick for the first data, 1 tick for the second data (5 cycles for 2 data, ~30MHz max).

Set pins (2nd attempt)

/* PD: 0, 1, 8, 9, 10, 14, 15 -> alternate function (0b10) */
GPIOD->MODER = 0xA56A555A;
GPIOD->AFR[0] = 0xCCCCCCCC; /* FSMC = AF12 (0xC) */
GPIOD->AFR[1] = 0xCCCCCCCC;
/* PE: 7, 8, 9, 10, 11, 12, 13, 14, 15 -> alternate function (0b10) */
GPIOE->MODER = 0xAAAA9555;
GPIOE->AFR[0] = 0xCCCCCCCC;
GPIOE->AFR[1] = 0xCCCCCCCC;
/* PB: 7 -> AF */
GPIOB->MODER = 0x55551555;
GPIOB->AFR[0] = 0xCCCCCCCC;
GPIOB->AFR[1] = 0xCCCCCCCC;

FSMC setup/init (2nd attempt)

/* NOR flash, asynchronous, multiplexed */
/* control register */
FSMC_Bank1->BTCR[0] = FSMC_BCR1_WREN | FSMC_BCR1_FACCEN | FSMC_BCR1_MWID_0 /* 16-bit */ | FSMC_BCR1_MTYP_1 /* NOR flash */ | FSMC_BCR1_MUXEN | FSMC_BCR1_MBKEN;
/* timing register */
FSMC_Bank1->BTCR[1] = FSMC_BTR1_CLKDIV_0 | FSMC_BTR1_DATAST_0 | FSMC_BTR1_ADDHLD_0 | FSMC_BTR1_ADDSET_1;

Result (2nd attempt)

We want to use the NADV signal as a new clock CLK.

volatile uint16_t* fsmc = (uint16_t*)0x60000000;
uint16_t w[] = {
        0xFFFF, 0x0000, 0xFFFF, 0x0000,
        0xFFFF, 0x0000, 0xFFFF, 0x0000};

for(uint32_t i=0;i<8;i++) {
        fsmc[0] = w[i];
}

We write to the same address in order to force a new memory transaction and cycle NADV.

The problem is that the data bus is updated after the positive edge of the NADV "clock". This issue can be overcome by multiplexing the address and data bus and put the data value as address. The ADDSET value is also increased in order to have a more balanced clock (ADDSET=3).

for(uint32_t i=0;i<8;i++) {
        uint16_t v = w[i];
        fsmc[v] = v;
}

Unfortately, the overall clock speed decreased because the address "trick".

Conclusion

A "nice looking" 16-bit parallel signal with clock can be generated at approx. 16MHz using the memory controller (FSMC) in asynchronous NOR Flash mode. 20MHz can be achieved with an external clock divider (div 4) in synchronous PSRAM mode. If the clock edge can be aligned with the data edge, 27MHz is possible from SRAM.

Note: the FMC (Flexible Memory Controller, also supporting SDRAM) in SDRAM mode can generate a synchronous burst of one data per clock. In this case, 84MHz is possible in theory. I haven‘t the hardware to test it.

时间: 2024-08-29 01:19:41

STM32F4: Generating parallel signals with the FSMC的相关文章

Flexible implementation of a system management mode (SMM) in a processor

A?system?management?mode?(SMM) of operating a processor includes only a basic set of hardwired hooks or mechanisms in the processor for supporting SMM. Most of SMM functionality, such as the processing actions performed when entering and exiting SMM,

STM32F4 SPI with DMA

STM32F4 SPI with DMA A few people have requested code, so I thought I’d post the code showing how I’ve configured my GPIO, timer, SPI, DMA and NVIC modules, along with some explanation of how the system works.Note that I’m using the STM32F4 Standard

Method and apparatus for an atomic operation in a parallel computing environment

A method and apparatus for a?atomic?operation?is described. A method comprises receiving a first program unit in a parallel computing environment, the first program unit including a memory update?operation?to be performed atomically, the memory updat

One-wire Demo on the STM32F4 Discovery Board

One-wire Demo on the STM32F4 Discovery Board Some of the devs at work were struggling to get their software talking to a Dallas 1-wire device.  I remember doing 1-wire comms back in the 1990s, but I hadn't done any 1-wire lately and all of my old cod

STM32F4 External event -- WFE 待机模式

The STM32F4xx are able to handle external or internal events in order to wake up the core (WFE). The wakeup event can be generated either by: (I've removed normal external interrupt mode details) or configuring an external or internal EXTI line in ev

Parallel file system processing

A treewalk for splitting a file directory is disclosed for parallel execution of work items over a filesystem. The given work item is assigned to a worker. Thereafter, a request is sent to split the file directory to share a portion of the file direc

System and method for parallel execution of memory transactions using multiple memory models, including SSO, TSO, PSO and RMO

A data processor supports the use of multiple memory models by computer programs. At a device external to a data processor, such as a memory controller, memory transactions requests are received from the data processor. Each memory transaction reques

Generate stabilized PWM signals

A standard technique for generating analog voltages using µCs is to use a PWM output and filter the signal with a simple RC filter (Figure 1). The voltage of the PWM signal is directly proportional to the µC's supply voltage, so it is not necessarily

大量数据快速插入方法探究[nologging+parallel+append]

大量数据快速插入方法探究 快速插入千万级别的数据,无非就是nologging+parallel+append. 1     环境搭建 构建一个千万级别的源表,向一个空表insert操作. 参考指标:insert动作完成的实际时间. SQL> drop table test_emp cascadeconstraints purge; Table dropped. SQL> create table test_emp as select *from emp; Table created. SQL&