Software Transactional Memory

From Texas Instruments Embedded Processors Wiki

Jump to: navigation, search
Translate this page to   

TexasInstrumentsLogo.png Wiki Note
Software Transactional Memory
Author(s):Jürgen Mathes Telecom Infrastructure Field Applications
SPRWIKI13958 November 2011
Abstract
This document describes how to use the Multicore Navigator on TMS320C66x (KeyStone) devices to support Software transactional memory. First the concept of transactional memory is explained. Then this concept is mapped onto the available HW resource available on the KeyStone architecture. A proof of concept example is given for the TMS320C6678 device, but the ideas are applicable to other KeyStone devices as well.

Contents


What is transactional memory?

Transactional memory is used maintain coherency between multiple masters accessing shared memory. Typically special load and store instructions are available to construct atomic accesses to shared memory. In fact the atomicity is "constructed" by checking coherency at store time. Software transactional memory is a term that is used when a runtime library provides transactional memory semantics with HW support from the underlying architecture. In case of the KeyStone architecture the Multicore Navigator is being used to support the transactional memory mechanism.

See also Wikipedia definition


How to use the Multicore Navigator

Let's first have a look at the overall architecture of the TMS320C6678. As you can see in Figure 1 below the Multicore Navigator is an independent instance consisting of two major sub-functions. One of them is the queue manager. You can also see the 8 individual cores and the 4MB of shared memory (marked with red circles). To implement Software transactional memory we're going to use the standard load and store instructions available on the c66x cores. Since those do not guarantee atomicity or coherence when data is accessed in shared memory by multiple cores we need flags in addition to help achieve that. Those flags can be implemented using the queue manager.

TMS320C6678 BD.JPG
Figure 1 - TMS320C6678 Functional Block Diagram


What is the Multicore Navigator?

The Multicore Navigator consists of multiple sub components (see Figure 2). We'll concentrate on the queue manager sub-system (QMSS). See Multicore Navigator User Guide for details.

The ‘Queue Manager’ is a hardware maintaining 8192 linked lists managing atomic access for all cores. There are a few special purpose queues and general purpose queues. We'll focus on the general purpose queues. Queues are used to hold pointers to packets. A packet is the logical grouping of a descriptor and the payload. Queuing of packets onto a packet queue is accomplished by writing a pointer to a descriptor into a specific address within the queue manager module. Packets may be queued either onto the head or the tail of the queue. All pushes are atomic. De-queuing (popping) is accomplished by reading the head packet pointer from the queue manager.

Descriptors are small memory areas that describe the packet of data to be transferred or just accessed. We will only use the information if a queue is empty or how many packets are in the queue.

QMSS.JPG
Figure 2 - TMS320C6678 Queue Manager Sub-System (QMSS)


How do we implement flags using the QMSS?

The flag is the information if the queue is empty or not. So pushing a packet onto the queue sets the flag atomically. Popping resets the flag. To mimic the load link, store link and commit link logic we need to implement two flags. The load link flag is a counting semaphore. The commit link a plain semaphore. The Commit flag ensures the atomic write-back. The Link flag manages the reloading of the date in case the Commit failed. The counting can be achieved by pushing multiple packets onto the queue. The QMSS HW will give back the packet count of a queue if the status is being checked.


Example pseudo code

One of the standard examples for a lock free programming is a global counter that is being updated locally on each core. There is a complete proof of concept code example running on the TMS320C6678 KeyStone Multicore DSP available in Appendix A. Here’s a pseudo code showing the mechanism:

while(counter < 100){
 
	// load link
	// use queue manager to flag read access for each core
	// get a local copy of the global counter to work on
	local_count = Load_link((unsigned int *)&counter);
	local_count++;
 
	// commit and store link
	// use queue manager to ensure atomic write back
	if(Commit_link()){
 
		counter = local_count;
		counted++;
 
		// push flag back to commit queue and signal re-loading of updated shared variable
		Restore_link();
	}
}

The kernel shown above runs on 7 cores. All of the cores will try to update the global counter. The load link, store link and commit link logic implemented ensures non-blocking atomic access to the shared variable. On core zero the 'administrative' tasks (e.g. set up MPAX) are per performed. This is just to keep things clean and easy. One could of course run the kernel on all 8 cores as well.

Details of the mechanism

There are actually two flags needed to implement the scheme (Per shared memory location. This could be a single variable or also a whole buffer). Each flag is implemented using a queue and a descriptor. The flag is set by pushing a descriptor to the queue. The flag is reset by popping a descriptor from the queue.

The Load_link function is implemented like this:

inline unsigned int Load_link(unsigned int *address, unsigned int num)
{
	unsigned int value;
 
	// if another core is in commit mark load as invalid
	if (Check_queue(num)) Pop_descriptor(num+1);
	// load shared variable
	value = (unsigned int)*address;
	return value;
}

During an update of the shared variable the queue monitoring the link status is popped. This means the load is being marked as invalid because the data is old.


The Commit_link function is implemented like this:

inline unsigned int Commit_link(unsigned int num, unsigned int desc)
{
	unsigned int value=0;
 
	if(Pop_descriptor(num)){ // pop flag from commit queue and check on load link
		if(!Check_queue(num+1)){
			// shared variable is valid
			value=1;
		}
		else {
			value=0;
			// need to restore Commit Flag because shared variable was old
			Push_descriptor(num,desc);
		}
	}
 
	return value;
}

The commit link needs to check on both queues. The first queue managing the atomic access and the second looking at the validity of the loaded data. In case there's already an update ongoing the the commit is not allowed. In case no update is ongoing we need to check that the loaded data is valid. In case the data is valid we return a 1 to indicate that. In case the data is old we return 0 and also need to set the commit flag back to 1.


The Restore link function is implemented like this:

inline Restore_link(unsigned int num, unsigned int desc)
{
	// restore Commit and Link flag
	Push_descriptor(num+1,desc+0x10);
	Push_descriptor(num,desc);
}

After the shared variable has been updated the commit and link flag need to be restored. This is done by pushing the respective descriptors to their queues.


Proof of concept code


Conclusions

Because of its non-blocking nature Software Transactional Memory enables lock free programming which will allow to concurrently update of shared data without the need for critical sections protected by operating system managed locks. Another area where lock free programming is used for is LIFO stacks and FIFO queues. This is natively supported through the queue manager.

Since lock free-programming provides some significant advantages to real-time systems like avoiding priority inversion and deadlocks this is an interesting domain especially in our multicore environment.


References

  1. TMS320C6678 Multicore Fixed and Floating-Point Digital Signal Processor Data Manual
  2. KeyStone Architecture Multicore Navigator User Guide (SPRUGR9)
  3. KeyStone Architecture Multicore Shared Memory Controller User Guide (SPRUGW7)


See also

  1. Keystone Device Architecture on TI's Embedded Processors Wiki
  2. TMS320C6678 Lite Evaluation Module
  3. Multicore Mix Blog


Appendix A. Proof of concept example code

The Example Code can be downloaded from here:

E2e.jpg
  • For technical support on MultiCore devices, please post your questions in the C6000 MultiCore Forum
  • For questions related to the BIOS MultiCore SDK (MCSDK), please use the BIOS Forum

Please post only comments related to the article Software Transactional Memory here.

Hyperlink blue.png Links
ARM Microcontroller MCU ARM Processor Digital Media Processor Digital Signal Processing Microcontroller MCU Multi Core Processor
Ultra Low Power DSP 8 bit Microcontroller MCU 16 bit Microcontroller MCU 32 bit Microcontroller MCU

Leave a Comment
Personal tools
Namespaces
Variants
Actions
Navigation
Print/export
Toolbox