Many of the Hercules controllers have DMA support. It's a key mechanism to get high throughput with minimal processor load.
However, to play well with your memory and cache, you have to configure the RAM that the DMA touches correctly. Otherwise, your DMA data can end up sitting in cache instead of memory, and you don't want that.
The easy way is to disable cache (in HALCoGen, on the R5-MPU-PMU tab, or in your code via a _cacheDisable_() call).
This ensures that DMA-generated data is always available in RAM. But the performance hit for such a naive workaround is high. And it's unnecessary. We can leave cache on, and reserve a part of the memory with the correct cache settings for DMA.
This blog is inspired by an issue I had with the example_sci_dma project from HALCoGen.
I tried out that example and it wasn't working for me. I created a post on e2e for that, and we came to a resolution that I'm documenting below.
This is not something I've invented. The HALCoGen example_mibspiDma explains how to do this. But if you happen to try the sci example before the mibspi one, this post can be useful for you.
What's happening in the project if you follow HALCoGen's Help instructions?
The variables to hold SCI transmit and receive data are in the RAM area. If you look in HALCoGen, this area (Region 3) is configured as write-back. That means it's cached: what the CPU writes to the buffers can stay in the cache for a while, so the DMA, which works on the RAM itself, isn't guaranteed to see it.
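To see why write-back bites here, a toy model helps. The sketch below is not Hercules code and doesn't touch any HALCoGen API: it models one byte of RAM with a one-entry cache in front of it, and every name in it (toy_cache_t, cpu_write, dma_read, cache_clean) is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: one RAM byte with a one-entry cache in front of it. */
typedef enum { WRITE_BACK, WRITE_THROUGH } policy_t;

typedef struct {
    policy_t policy;
    uint8_t  ram;    /* what a DMA controller would read */
    uint8_t  line;   /* what the CPU reads on a cache hit */
    bool     valid;  /* the cache line holds data */
    bool     dirty;  /* the cache line is newer than RAM */
} toy_cache_t;

static void toy_init(toy_cache_t *c, policy_t policy)
{
    c->policy = policy;
    c->ram = 0u;
    c->line = 0u;
    c->valid = false;
    c->dirty = false;
}

/* CPU store: always lands in the cache; reaches RAM only with write-through. */
static void cpu_write(toy_cache_t *c, uint8_t value)
{
    c->line = value;
    c->valid = true;
    if (c->policy == WRITE_THROUGH) {
        c->ram = value;
        c->dirty = false;
    } else {
        c->dirty = true;
    }
}

/* DMA read: bypasses the cache and looks at RAM only. */
static uint8_t dma_read(const toy_cache_t *c)
{
    return c->ram;
}

/* Cache clean: the extra step a write-back region needs before DMA reads. */
static void cache_clean(toy_cache_t *c)
{
    if (c->valid && c->dirty) {
        c->ram = c->line;
        c->dirty = false;
    }
}
```

With WRITE_BACK, dma_read() keeps returning the old value after cpu_write() until cache_clean() runs. With WRITE_THROUGH it returns the new value right away, which is the behavior we want for the DMA buffers.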
The linker allows us to create specific memory areas, and HALCoGen lets us give such an area its own cache settings. We'll reserve a specific area of the RAM with a write-through instead of a write-back setting. Our DMA-relevant variables will be placed in that area, and we're good.
There are three things to do:
- reserve a RAM region with the right cache config
- change your linker command file to give the linker instructions to manage that region
- tell your code to place the DMA relevant variables in that area
Let's start...
Reserve a RAM region with the Right Cache Settings
When you look at the code of HALCoGen example_sci_dma (go to HALCoGen help -> Examples -> example_mibspiDma.c), you can see that there are two variables relevant for DMA:
uint8_t TX_DATA[size] = {0};
uint8_t RX_DATA[size] = {0};
These two 100-byte variables together consume 200 bytes. We could do with a memory region of 200 bytes. That would just fit the two variables we're taking into consideration here. (Take care to adjust for memory start address alignment if you make your region an exact fit.)
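About that exact fit: as far as I know, the Cortex-R MPU only accepts power-of-two region sizes, starting at 32 bytes, so a 200-byte region wouldn't be an option as such. Assuming that holds for your device, this little sketch rounds a byte count up to the next usable region size. The helper name and the MPU_MIN_REGION_BYTES constant are mine, not HALCoGen's.

```c
#include <stdint.h>

/* Assumption: the smallest region the MPU supports is 32 bytes. */
#define MPU_MIN_REGION_BYTES 32u

/*
 * Smallest power-of-two region size (at least 32 bytes) that can hold
 * `bytes` bytes. Returns 0 if no 32-bit power of two is large enough.
 */
static uint32_t mpu_region_size_for(uint32_t bytes)
{
    uint32_t size = MPU_MIN_REGION_BYTES;

    while (size < bytes) {
        if (size == 0x80000000u) {
            return 0u;   /* the next doubling would overflow */
        }
        size <<= 1;
    }
    return size;
}
```

For the two buffers here, mpu_region_size_for(200) gives 256 bytes, so the tightest region would be 256 bytes, not 200.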
Because I have further plans with this code (and because the Hercules example_mibspiDMA does it the same way), I'm reserving 4 KB of memory for DMA-related variables.
In HALCoGen, navigate to the R5-MPU-PMU tab, and activate region 15.
The higher regions have precedence over the lower ones. Generic RAM is in Region 3, and we leave it as-is. We carve a little piece of that region away for Region 15, and alter the cache settings there.
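The precedence rule is easy to picture as a lookup that walks the regions from the highest number down and stops at the first hit. The sketch below is a host-side illustration, not MPU driver code. The region_t type and region_for() are made-up names, and Region 3's base of 0x08000000 and 512 KB size are my reading of the linker file and of where the region ends.

```c
#include <stdint.h>

/* Illustrative description of one MPU region (not a HALCoGen type). */
typedef struct {
    uint32_t    base;
    uint32_t    size;    /* in bytes; 0 means the region is disabled */
    const char *policy;
} region_t;

/* Assumed layout: Region 3 covers all RAM, Region 15 overlays the last 4 KB. */
static const region_t example_regions[16] = {
    [3]  = { 0x08000000u, 0x00080000u, "write-back" },
    [15] = { 0x0807F000u, 0x00001000u, "write-through" },
};

/*
 * Index of the region that decides the attributes for `addr`:
 * the highest-numbered enabled region containing it, or -1 if none does.
 */
static int region_for(const region_t *regions, int count, uint32_t addr)
{
    for (int i = count - 1; i >= 0; --i) {
        if (regions[i].size != 0u &&
            addr >= regions[i].base &&
            (addr - regions[i].base) < regions[i].size) {
            return i;
        }
    }
    return -1;
}
```

An address such as 0x0807F000 sits in both regions, and the lookup returns 15, so the write-through setting wins. Anything below 0x0807F000 still resolves to Region 3.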
Enable Region 15, and reserve the last 4 KB of RAM for the DMA variables. Region 3 ends at 0x0807FFFF.
To fully understand how Region 3 is used, have a look at your linker command file generated by HALCoGen. Both Stacks and RAM are allocated there.
If we reserve 4 KB (0x1000) at the very end of Region 3, our Region 15 starts at 0x0807F000.
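If you pick a different size, the start address changes with it, and it has to stay aligned. Here's a small sketch of that arithmetic, assuming the usual MPU rule that a region's size is a power of two and its base is a multiple of its size. Both helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Start address of a `size`-byte region whose last byte is `last_addr`. */
static uint32_t region_start_from_end(uint32_t last_addr, uint32_t size)
{
    return last_addr - size + 1u;
}

/*
 * Assumed MPU constraint: the size is a power of two and the base
 * address is a multiple of the size.
 */
static bool is_valid_mpu_region(uint32_t base, uint32_t size)
{
    bool power_of_two = (size != 0u) && ((size & (size - 1u)) == 0u);

    return power_of_two && ((base & (size - 1u)) == 0u);
}
```

region_start_from_end(0x0807FFFF, 0x1000) gives 0x0807F000, and is_valid_mpu_region(0x0807F000, 0x1000) holds, so the 4 KB choice is fine on both counts.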
Save your HALCoGen project, and re-generate the code.
Alter the Linker command file
We'll have to reduce the RAM area (we're stealing 4 KB away from it) and create a new specific area that we'll call SHAREDRAM.
The original RAM definition (for the TMS570LC43xx) is:
RAM (RW) : origin=0x08001500 length=0x0007EB00
We'll take away 4 KB (hex: 0x1000): 0x0007EB00 - 0x1000 = 0x0007DB00.
That results in a RAM definition as:
RAM (RW) : origin=0x08001500 length=0x0007DB00
We'll now define the SHAREDRAM region and place it just after the RAM region, at the area that we've just taken away:
SHAREDRAM (RW) : origin=0x0807F000 length=0x00001000
That's sufficient info for the linker to reserve the space, and the HALCoGen settings take care that this space has write-through settings.
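It's easy to slip up on the hex arithmetic, so here is a quick host-side sanity check of the two MEMORY entries. It's a sketch: mem_range_t, range_end() and carve_tail() are names I made up, and the numbers are the TMS570LC43xx values from above.

```c
#include <stdint.h>

/* One MEMORY entry of the linker command file: origin and length. */
typedef struct {
    uint32_t origin;
    uint32_t length;
} mem_range_t;

/* First address past the end of the range. */
static uint32_t range_end(mem_range_t r)
{
    return r.origin + r.length;
}

/*
 * Shrink `ram` by `bytes` and return the range that was cut off its end.
 * If `bytes` is larger than the range, nothing changes and a zero-length
 * range is returned.
 */
static mem_range_t carve_tail(mem_range_t *ram, uint32_t bytes)
{
    mem_range_t tail = { 0u, 0u };

    if (bytes <= ram->length) {
        ram->length -= bytes;
        tail.origin = range_end(*ram);
        tail.length = bytes;
    }
    return tail;
}
```

Starting from RAM at origin 0x08001500 with length 0x0007EB00, carving off 0x1000 leaves a RAM length of 0x0007DB00 and returns a range at 0x0807F000, which matches the two lines in the linker file. The new range ends at 0x08080000, exactly where the original RAM ended, so nothing overlaps and nothing is lost.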
Now, we have to define and name a memory section, so that we can configure our source code to place specific variables in the new area. We do that in the SECTIONS clause of the Linker command file:
/* USER CODE BEGIN (6) */
.sharedRAM : {} > SHAREDRAM
/* USER CODE END */
That's it for the linker file. Now over to the code...
Tell our code to place the DMA variables in the correct memory area
There's a pragma for that: SET_DATA_SECTION().
All variables declared after this pragma will be placed in the memory section passed as parameter.
And that's the behavior we need.
// all data before the first SET_DATA_SECTION pragma will reside in the default location for that type of data
#pragma SET_DATA_SECTION(".sharedRAM")
// everything declared from here on will reside in memory region SHAREDRAM
uint8_t TX_DATA[size] = {0};
uint8_t RX_DATA[size] = {0};
#pragma SET_DATA_SECTION()
// all data after a SET_DATA_SECTION pragma without parameters will reside in the default location for that type of data
And from this moment on, your program will work. For the TMS570LC43x LaunchPad fanboys, here's the source for the linker file and source file:
HL_sys_link.cmd:
/*----------------------------------------------------------------------------*/
/* sys_link.cmd */
/* */
/*----------------------------------------------------------------------------*/
/* USER CODE BEGIN (0) */
/* USER CODE END */
/*----------------------------------------------------------------------------*/
/* Linker Settings */
--retain="*(.intvecs)"
/* USER CODE BEGIN (1) */
/* USER CODE END */
/*----------------------------------------------------------------------------*/
/* Memory Map */
MEMORY
{
/* USER CODE BEGIN (2) */
#if 0
/* USER CODE END */
VECTORS (X) : origin=0x00000000 length=0x00000020
FLASH0 (RX) : origin=0x00000020 length=0x001FFFE0
FLASH1 (RX) : origin=0x00200000 length=0x00200000
STACKS (RW) : origin=0x08000000 length=0x00001500
RAM (RW) : origin=0x08001500 length=0x0007EB00
/* USER CODE BEGIN (3) */
#endif
VECTORS (X) : origin=0x00000000 length=0x00000020 vfill = 0xffffffff
FLASH0 (RX) : origin=0x00000020 length=0x001FFFE0 vfill = 0xffffffff
FLASH1 (RX) : origin=0x00200000 length=0x00200000 vfill = 0xffffffff
STACKS (RW) : origin=0x08000000 length=0x00001500
RAM (RW) : origin=0x08001500 length=0x0007DB00
SHAREDRAM (RW) : origin=0x0807F000 length=0x00001000
ECC_VEC (R) : origin=0xf0400000 length=0x4 ECC={ input_range=VECTORS }
ECC_FLA0 (R) : origin=0xf0400000 + 0x4 length=0x3FFFC ECC={ input_range=FLASH0 }
ECC_FLA1 (R) : origin=0xf0440000 length=0x40000 ECC={ input_range=FLASH1 }
/* USER CODE END */
}
/* USER CODE BEGIN (4) */
ECC
{
algo_name : address_mask = 0xfffffff8
hamming_mask = R4
parity_mask = 0x0c
mirroring = F021
}
/* USER CODE END */
/*----------------------------------------------------------------------------*/
/* Section Configuration */
SECTIONS
{
/* USER CODE BEGIN (5) */
/* USER CODE END */
.intvecs : {} > VECTORS
.text align(8) : {} > FLASH0 | FLASH1
.const align(8) : {} > FLASH0 | FLASH1
.cinit align(8) : {} > FLASH0 | FLASH1
.pinit align(8) : {} > FLASH0 | FLASH1
.bss : {} > RAM
.data : {} > RAM
.sysmem : {} > RAM
/* USER CODE BEGIN (6) */
.sharedRAM : {} > SHAREDRAM
/* USER CODE END */
}
/* USER CODE BEGIN (7) */
/* USER CODE END */
/*----------------------------------------------------------------------------*/
/* Misc */
/* USER CODE BEGIN (8) */
/* USER CODE END */
/*----------------------------------------------------------------------------*/
HL_sys_main.c:
/** @file HL_sys_main.c
* @brief Application main file
* @date 28.Aug.2015
* @version 04.05.01
...*/
/*
* Copyright (C) 2009-2015 Texas Instruments Incorporated - www.ti.com
...*
*/
/* USER CODE BEGIN (0) */
/* USER CODE END */
/* Include Files */
#include "HL_sys_common.h"
/* USER CODE BEGIN (1) */
#include "HL_sys_dma.h"
#include "HL_sci.h"
//#include "stdio.h"
/* USER CODE END */
/** @fn void main(void)
* @brief Application main function
* @note This function is empty by default.
*
* This function is called after startup.
* The user can use this function to implement the application.
*/
/* USER CODE BEGIN (2) */
#define size 100
/* External connection (SCI3 TX -> SCI4 RX) is needed in case LOOPBACKMODE is defined as 0 */
#define LOOPBACKMODE 1
/* Tx and Rx data buffer */
/* edit jc 20160102: put these variables in write-through memory, so that writes go through the cache to memory */
#pragma SET_DATA_SECTION(".sharedRAM")
uint8_t TX_DATA[size] = {0};
uint8_t RX_DATA[size] = {0};
#pragma SET_DATA_SECTION()
/* Addresses of SCI 8-bit TX/Rx data */
#if ((__little_endian__ == 1) || (__LITTLE_ENDIAN__ == 1))
#define SCI3_TX_ADDR ((uint32_t)(&(sciREG3->TD)))
#define SCI3_RX_ADDR ((uint32_t)(&(sciREG3->RD)))
#define SCI4_TX_ADDR ((uint32_t)(&(sciREG4->TD)))
#define SCI4_RX_ADDR ((uint32_t)(&(sciREG4->RD)))
#else
#define SCI3_TX_ADDR ((uint32_t)(&(sciREG3->TD)) + 3)
#define SCI3_RX_ADDR ((uint32_t)(&(sciREG3->RD)) + 3)
#define SCI4_TX_ADDR ((uint32_t)(&(sciREG4->TD)) + 3)
#define SCI4_RX_ADDR ((uint32_t)(&(sciREG4->RD)) + 3)
#endif
#define DMA_SCI3_TX DMA_REQ31
#define DMA_SCI3_RX DMA_REQ30
#define DMA_SCI4_TX DMA_REQ43
#define DMA_SCI4_RX DMA_REQ42
#define SCI_SET_TX_DMA (1<<16)
#define SCI_SET_RX_DMA (1<<17)
#define SCI_SET_RX_DMA_ALL (1<<18)
/* USER CODE END */
void main(void)
{
/* USER CODE BEGIN (3) */
uint32 sciTxData, sciRxData;
int i;
g_dmaCTRL g_dmaCTRLPKT1, g_dmaCTRLPKT2;
/*Load source data*/
for(i=0; i<size; i++)
{
TX_DATA[i] = i;
}
/*Initialize SCI*/
sciInit();
#if LOOPBACKMODE == 1
/* Enable SCI loopback */
sciEnableLoopback(sciREG3, Digital_Lbk);
while
{
} /* Wait */
/*Assign DMA request SCI3 transmit to Channel 0*/
dmaReqAssign(DMA_CH0, DMA_SCI3_TX);
/*Assign DMA request SCI3 receive to Channel 1*/
dmaReqAssign(DMA_CH1, DMA_SCI3_RX);
sciTxData = SCI3_TX_ADDR;
sciRxData = SCI3_RX_ADDR;
#else
while
{
} /* Wait */
/*Assign DMA request SCI3 transmit to Channel 0*/
dmaReqAssign(DMA_CH0, DMA_SCI3_TX);
/*Assign DMA request SCI4 receive to Channel 1*/
dmaReqAssign(DMA_CH1, DMA_SCI4_RX);
sciTxData = SCI3_TX_ADDR;
sciRxData = SCI4_RX_ADDR;
#endif
/*Configure control packet for Channel 0*/
g_dmaCTRLPKT1.SADD = (uint32_t)TX_DATA; /* source address */
g_dmaCTRLPKT1.DADD = sciTxData; /* destination address */
g_dmaCTRLPKT1.CHCTRL = 0; /* channel control */
g_dmaCTRLPKT1.FRCNT = size; /* frame count */
g_dmaCTRLPKT1.ELCNT = 1; /* element count */
g_dmaCTRLPKT1.ELDOFFSET = 0; /* element destination offset */
g_dmaCTRLPKT1.ELSOFFSET = 0; /* element source offset */
g_dmaCTRLPKT1.FRDOFFSET = 0; /* frame destination offset */
g_dmaCTRLPKT1.FRSOFFSET = 0; /* frame source offset */
g_dmaCTRLPKT1.PORTASGN = PORTA_READ_PORTB_WRITE;
g_dmaCTRLPKT1.RDSIZE = ACCESS_8_BIT; /* read size */
g_dmaCTRLPKT1.WRSIZE = ACCESS_8_BIT; /* write size */
g_dmaCTRLPKT1.TTYPE = FRAME_TRANSFER; /* transfer type */
g_dmaCTRLPKT1.ADDMODERD = ADDR_INC1; /* address mode read */
g_dmaCTRLPKT1.ADDMODEWR = ADDR_FIXED; /* address mode write */
g_dmaCTRLPKT1.AUTOINIT = AUTOINIT_OFF; /* autoinit */
/*Configure control packet for Channel 1*/
g_dmaCTRLPKT2.SADD = sciRxData; /* source address */
g_dmaCTRLPKT2.DADD = (uint32_t)RX_DATA; /* destination address */
g_dmaCTRLPKT2.CHCTRL = 0; /* channel control */
g_dmaCTRLPKT2.FRCNT = size; /* frame count */
g_dmaCTRLPKT2.ELCNT = 1; /* element count */
g_dmaCTRLPKT2.ELDOFFSET = 0; /* element destination offset */
g_dmaCTRLPKT2.ELSOFFSET = 0; /* element source offset */
g_dmaCTRLPKT2.FRDOFFSET = 0; /* frame destination offset */
g_dmaCTRLPKT2.FRSOFFSET = 0; /* frame source offset */
g_dmaCTRLPKT2.PORTASGN = PORTB_READ_PORTA_WRITE;
g_dmaCTRLPKT2.RDSIZE = ACCESS_8_BIT; /* read size */
g_dmaCTRLPKT2.WRSIZE = ACCESS_8_BIT; /* write size */
g_dmaCTRLPKT2.TTYPE = FRAME_TRANSFER; /* transfer type */
g_dmaCTRLPKT2.ADDMODERD = ADDR_FIXED; /* address mode read */
g_dmaCTRLPKT2.ADDMODEWR = ADDR_INC1; /* address mode write */
g_dmaCTRLPKT2.AUTOINIT = AUTOINIT_OFF; /* autoinit */
/*Set control packet for channel 0 and 1*/
dmaSetCtrlPacket(DMA_CH0, g_dmaCTRLPKT1);
dmaSetCtrlPacket(DMA_CH1, g_dmaCTRLPKT2);
/*Set dma channel 0 and 1 to trigger on hardware request*/
dmaSetChEnable(DMA_CH0, DMA_HW);
dmaSetChEnable(DMA_CH1, DMA_HW);
/*Enable DMA*/
dmaEnable();
#if LOOPBACKMODE == 1
/*Enable SCI3 Transmit and Receive DMA Request*/
sciREG3->SETINT |= SCI_SET_TX_DMA | SCI_SET_RX_DMA | SCI_SET_RX_DMA_ALL;
#else
/*Enable SCI3 Transmit and SCI4 Receive DMA Request*/
sciREG3->SETINT |= SCI_SET_TX_DMA;
sciREG4->SETINT |= SCI_SET_RX_DMA | SCI_SET_RX_DMA_ALL;
#endif
while(dmaGetInterruptStatus(DMA_CH1, BTC) != TRUE);
for(i=0; i<size; i++)
{
if(RX_DATA[i] != TX_DATA[i])
{
break;
}
}
if(i<size)
{
while(1); // FAIL: if your program loops here, the arrays are not identical
}
else
{
while(1); // SUCCESS: if your program loops here, the arrays are identical
}
/* USER CODE END */
}
/* USER CODE BEGIN (4) */
/* USER CODE END */
You can verify that the variables are in the expected memory area by looking at the expressions view:
For the hardcore memory map aficionados, here are some details of the .map file:
MEMORY CONFIGURATION
name origin length used unused attr fill
---------------------- -------- --------- -------- -------- ---- --------
...
STACKS 08000000 00001500 00000000 00001500 RW
RAM 08001500 0007db00 00000060 0007daa0 RW
SHAREDRAM 0807f000 00001000 000000c8 00000f38 RW
SEGMENT ALLOCATION MAP
run origin load origin length init length attrs members
---------- ----------- ---------- ----------- ----- -------
0807f000 0807f000 000000c8 00000000 rw-
0807f000 0807f000 000000c8 00000000 rw- .sharedRAM
...
SECTION ALLOCATION MAP
output attributes/
section page origin length input sections
-------- ---- ---------- ---------- ----------------
.sharedRAM
* 0 0807f000 000000c8 UNINITIALIZED
0807f000 000000c8 HL_sys_main.obj (.sharedRAM)
...
LINKER GENERATED COPY TABLES
__TI_cinit_table @ 00004f60 records: 3, size/record: 8, table size: 24
.data: load addr=00004f28, load size=00000011 bytes, run addr=08001550, run size=00000010 bytes, compression=rle
.sharedRAM: load addr=00004f48, load size=00000009 bytes, run addr=0807f000, run size=000000c8 bytes, compression=rle
.bss: load addr=00004f58, load size=00000008 bytes, run addr=08001500, run size=00000050 bytes, compression=zero_init
...
GLOBAL SYMBOLS: SORTED ALPHABETICALLY BY Name
address name
------- ----
0807f064 RX_DATA
0807f000 TX_DATA
...
GLOBAL SYMBOLS: SORTED BY Symbol Address
address name
------- ----
0807f000 TX_DATA
0807f064 RX_DATA
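If you'd rather check the placement with numbers than by eye, the test is just a range comparison. This is a host-side sketch with the SHAREDRAM origin and length from the linker file hard-coded; in_sharedram() is a hypothetical helper, not something HALCoGen generates.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the SHAREDRAM entry in the linker command file. */
#define SHAREDRAM_ORIGIN 0x0807F000u
#define SHAREDRAM_LENGTH 0x00001000u

/* True if the `len` bytes starting at `addr` lie entirely inside SHAREDRAM. */
static bool in_sharedram(uint32_t addr, uint32_t len)
{
    if (addr < SHAREDRAM_ORIGIN) {
        return false;
    }

    uint32_t offset = addr - SHAREDRAM_ORIGIN;

    return (offset <= SHAREDRAM_LENGTH) && (len <= SHAREDRAM_LENGTH - offset);
}
```

Fed with the symbol addresses from the map, in_sharedram(0x0807F000, 100) and in_sharedram(0x0807F064, 100) both hold: TX_DATA and RX_DATA sit back to back, 0x64 (100) bytes apart, and together they account for the 0xC8 bytes the map reports as used. An address in plain RAM, such as the .data run address 0x08001550, fails the check.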
The project for TMS570LC43x LaunchPad LX2 is attached: