Power Efficient System using OMAP35x
THIS PAGE IS BEING UPDATED. CONTENTS ARE NOT YET COMPLETE.
- 1 Introduction
- 2 Hardware Features for Power Management
- 3 Programmer's View
- 4 Power Management Frameworks in Linux
- 5 Towards Power-efficient System
- 6 References
Power consumption - crudely measured as battery life - is a critical design goal for any portable system. The OMAP35x devices include power management techniques that enable them to deliver high performance while consuming much less power than is traditionally associated with these performance levels.
Unless these techniques are complemented by equally power-efficient software, the benefits won't be visible.
In this article, we begin with a brief look at the hardware features related to power management that are available in the OMAP35x devices. This is followed by a quick introduction to the power management frameworks available in the Linux kernel and how they have been adapted for the OMAP35x devices.
To leverage the power savings built into the kernel, applications need to be equally disciplined to ensure that savings are actually realized. Generic guidelines that would help applications in achieving the desired savings are also discussed here.
NOTE: Most of the discussion applies to any flavor of Linux on the OMAP35x. Since the Linux PSP package v1.0.x from TI differs from the community Linux, some parts may not apply as-is. The contents will be updated (shortly) to bring out specific differences and provide information relevant to the community Linux.
Hardware Features for Power Management
Once the system is powered-on, power is consumed while the device is busy with useful data processing as well as non-useful idle loops. The OMAP35x devices include specific features to optimize the power consumption in both these scenarios.
Active Power Management
- Dynamic Voltage and Frequency Scaling
- Adaptive Voltage Scaling
- Dynamic Power Switching
Idle Power Management
- Static Leakage Management
A Clock Domain refers to a group of modules that are fed the same 'gated' clock signal. If all modules in the clock domain are inactive, the clock signal can be cut to lower the power consumption.
HW Supervised Control
The transition of clock domains between the ON and INACTIVE states can be effected through software as well as through HW supervision. HW supervision eliminates software overhead and further increases power savings through much finer-grained control of the clocks than a software-only implementation allows, particularly during active processing.
The OMAP35x device provides mechanisms for gating clocks at various levels: from the individual device, through intermediate levels of the clock tree, up to the full clock domain.
At the device level, there are two types of clocks: functional and interface. The functional clock is required for the internal working of the device. The interface clock is required for interfacing with the OCP bus. This allows the OCP bus to be turned off while leaving the device functional.
A Power Domain refers to a section of the device that is controlled by a single power switch. These sections have independent power rails.
A power domain can contain one or more clock domains.
A Voltage Domain refers to a group of modules that draw power from the same voltage regulator.
A voltage domain can contain one or more power domains. All modules in a voltage domain get the same voltage.
The voltage regulator can be either an SMPS (Switched Mode Power Supply) or an LDO (Low Dropout Regulator).
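The containment relationships described above (a voltage domain holds power domains, which hold clock domains, which group modules) can be sketched as a small data model. This is only an illustration: the class names, the example module names and the `can_switch_off()` rule are assumptions made for the sketch, not taken from the OMAP35x documentation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Module:
    name: str
    active: bool = False


@dataclass
class ClockDomain:
    name: str
    modules: List[Module] = field(default_factory=list)

    def can_gate_clock(self) -> bool:
        # The shared clock may be cut only when every module is inactive.
        return all(not m.active for m in self.modules)


@dataclass
class PowerDomain:
    name: str
    clock_domains: List[ClockDomain] = field(default_factory=list)

    def can_switch_off(self) -> bool:
        # Assumed rule for this sketch: a power domain is a candidate for a
        # low-power state once all of its clock domains can be gated.
        return all(cd.can_gate_clock() for cd in self.clock_domains)


@dataclass
class VoltageDomain:
    name: str
    voltage: float  # volts, shared by every module in the domain
    power_domains: List[PowerDomain] = field(default_factory=list)


# Illustrative names only; they are not real OMAP35x domain names.
uart = Module("uart", active=False)
i2c = Module("i2c", active=True)
per_clk = ClockDomain("per_clk", [uart, i2c])
per_pwr = PowerDomain("per_pwr", [per_clk])
vdd = VoltageDomain("vdd_example", 1.2, [per_pwr])

print(per_clk.can_gate_clock())  # i2c is still active at this point
i2c.active = False
print(per_clk.can_gate_clock())  # every module is now inactive
```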
Scalable Voltage Domains
The voltage domains VDD1 and VDD2 can be independently scaled. There are 5 pre-defined OPPs (operating performance points) for VDD1 and 3 OPPs for VDD2.
Dynamic Voltage and Frequency Scaling frameworks can choose between these OPPs based on the expected performance/power requirements of the active task.
Adaptive Voltage Scaling (AVS)
Variations in the silicon manufacturing process cause performance and/or power consumption to differ from one device to another. AVS, implemented with SmartReflex(TM) technology, tends to narrow this difference.
Dedicated SmartReflex hardware implements a feedback loop - without processor intervention - which optimizes the voltage levels to account for differences in the manufacturing process, temperature and silicon degradation.
Power Management Frameworks in Linux
As the name suggests, cpuidle is executed when the Linux kernel enters idle processing.
This framework allows the system to transition between the different idle states (C-states). Each idle state is characterized by:
- Power consumed while system is "in" the state
- Latency to enter and exit the state
- Amount of time the system is expected to "be" in the state
|C-state||Definition|
|C0||MPU WFI + Core active + No tick suppression|
|C1||MPU WFI + Core active + Tick suppression|
|C2||MPU CSWR + Core active + Tick suppression|
|C3||MPU CSWR + Core CSWR + Tick suppression|
|C4||MPU Off + Core active + Tick suppression|
|C5||MPU Off + Core RET + Tick suppression|
|C6||MPU Off + Core Off + Tick suppression|
- As the system transitions to deeper states, more power is saved, but at the expense of higher system latency.
- The amount of time the system is expected to stay in the state also increases with the deeper C-states. This is necessary to justify the cost associated with entering and exiting the state.
The cpuidle framework consists of:
- A governor that decides the target C-state of the system
- A driver that actually transitions the system to the target C-state specified by the governor.
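The governor's decision can be sketched as follows. This is a minimal illustration of the selection idea, not the kernel's governor: the function name and the latency and residency figures are placeholders invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CState:
    name: str
    exit_latency_us: int      # time needed to enter and leave the state
    target_residency_us: int  # minimum idle time that justifies entry


# Placeholder numbers for illustration; not measured OMAP35x values.
C_STATES: List[CState] = [
    CState("C0", 1, 1),
    CState("C1", 10, 30),
    CState("C2", 100, 300),
    CState("C3", 1000, 5000),
]


def select_c_state(states, expected_idle_us, max_latency_us):
    """Return the deepest state that fits both the predicted idle time
    and the latency constraint; fall back to the shallowest state."""
    chosen = states[0]
    for state in states:  # ordered from shallowest to deepest
        if state.exit_latency_us > max_latency_us:
            continue  # would violate the latency constraint
        if state.target_residency_us > expected_idle_us:
            continue  # idle period too short to pay for entry and exit
        chosen = state
    return chosen


print(select_c_state(C_STATES, expected_idle_us=400, max_latency_us=10000).name)
```

A short predicted idle period or a tight latency constraint both push the choice toward the shallower states, which is the trade-off the list above describes.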
To set a constraint on the interrupt latency:
# echo -n <state> > /sys/power/cpuidle_deepest_state
If the system can perform necessary tasks at lower voltage, the power savings are quite evident from this equation:
$P = V^2 / R$
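As a worked illustration, assume the resistance R stays the same and compare the power P_1 at voltage V_1 with the power P_2 at a lower voltage V_2. The 10% reduction is an arbitrary example figure, not an OMAP35x operating point.

```latex
\[
\frac{P_2}{P_1} = \frac{V_2^2 / R}{V_1^2 / R} = \left(\frac{V_2}{V_1}\right)^2,
\qquad
V_2 = 0.9\,V_1 \;\Rightarrow\; \frac{P_2}{P_1} = 0.9^2 = 0.81
\]
```

Under this model, lowering the voltage by 10% reduces the power by about 19%, because the saving grows with the square of the voltage.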
DVFS has been implemented using the cpufreq framework. This framework allows the system to transition between discrete frequencies (P-states) depending upon the active load.
Each P-state is characterized by:
- CPU frequency
- Voltage required by the CPU.
The pre-defined VDD1 OPPs for OMAP35x have been chosen as the P-states.
|OPP||ARM Frequency (MHz)||DSP Frequency (MHz)||VDD1 (Volts)|
The cpufreq framework consists of:
- A governor that decides the target P-state of the system
- A driver that actually transitions the system to the target P-state specified by the governor.
The ondemand governor is used to choose the P-states based on the active processing load.
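A simplified, ondemand-style policy might look like the sketch below. It is loosely modelled on the idea of jumping to the highest frequency under heavy load and otherwise scaling down; the real governor has more tunables. The frequency list is a placeholder, and `next_frequency` is a hypothetical helper name.

```python
from typing import Sequence

# Placeholder frequencies in MHz; the real P-states come from the VDD1 OPP table.
FREQS_MHZ = (125, 250, 500, 600)


def next_frequency(load_pct: float, current_mhz: int,
                   freqs: Sequence[int] = FREQS_MHZ,
                   up_threshold: float = 80.0) -> int:
    """Pick the next CPU frequency from the load measured at current_mhz."""
    if load_pct > up_threshold:
        # Heavy load: jump straight to the highest frequency.
        return max(freqs)
    # Otherwise pick the lowest frequency that would keep the same amount
    # of work at or below the threshold.
    needed_mhz = current_mhz * load_pct / up_threshold
    for f in sorted(freqs):
        if f >= needed_mhz:
            return f
    return max(freqs)


print(next_frequency(load_pct=20, current_mhz=600))
```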
To add a constraint on the VDD1 OPP:
# echo -n <opp> > /sys/power/vdd1_opp_value
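An application that needs to set these constraints programmatically could wrap the same sysfs writes in a small helper. The sketch below assumes the two paths quoted in this article exist on the target kernel and that the process has permission to write to them; `write_constraint` is a hypothetical helper name.

```python
from pathlib import Path

# Paths as given in the text; they exist only on kernels that provide them.
CPUIDLE_DEEPEST_STATE = "/sys/power/cpuidle_deepest_state"
VDD1_OPP_VALUE = "/sys/power/vdd1_opp_value"


def write_constraint(path: str, value: int) -> None:
    """Write an integer to a sysfs-style file, like `echo -n <value> > path`."""
    if value < 0:
        raise ValueError("constraint value must be non-negative")
    Path(path).write_text(str(value))
```

On the target, a call such as `write_constraint(VDD1_OPP_VALUE, 3)` would correspond to the `echo` command above with `<opp>` set to 3.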
Constraint Framework in PSP Linux package
While the overall intent of a power-efficient system is to conserve power aggressively, this cannot be done at the cost of the end-user experience.
A running system comprises multiple threads of execution, each with its own requirements on performance and acceptable latency. An arbitration mechanism is required to ensure that the system is performing in a 'state' that is acceptable to all running processes. For example:
- The system should not transition to deeper C-states while a large file is being transmitted over USB.
- The system should not transition to lower P-states during a video playback.
The Constraint Framework is a mechanism for device drivers and applications to specify their requirements in terms of acceptable interrupt latency and frequency. These constraints are one of the inputs provided to the respective governors when the decision on the next C-state or P-state is made.
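The arbitration idea can be sketched as follows: each client registers its own requirement, and the strictest one is what the governors see. The `ConstraintSet` class and its method names are hypothetical and do not reflect the PSP package's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ConstraintSet:
    """Collects per-client requirements and reports the binding one."""
    latency_us: Dict[str, int] = field(default_factory=dict)
    min_opp: Dict[str, int] = field(default_factory=dict)

    def set_latency(self, client: str, max_latency_us: int) -> None:
        self.latency_us[client] = max_latency_us

    def set_min_opp(self, client: str, opp: int) -> None:
        self.min_opp[client] = opp

    def release(self, client: str) -> None:
        # Drop every constraint held by this client.
        self.latency_us.pop(client, None)
        self.min_opp.pop(client, None)

    def effective_latency_us(self) -> Optional[int]:
        # The strictest (smallest) latency request wins; None means unconstrained.
        return min(self.latency_us.values(), default=None)

    def effective_min_opp(self) -> Optional[int]:
        # The highest requested OPP wins; None means unconstrained.
        return max(self.min_opp.values(), default=None)


constraints = ConstraintSet()
constraints.set_latency("usb", 200)   # USB transfer tolerates little latency
constraints.set_min_opp("video", 3)   # video playback needs a higher OPP
print(constraints.effective_latency_us(), constraints.effective_min_opp())
```

Once a client releases its constraint, the next strictest request becomes binding, so the system can drop to deeper C-states or lower P-states as soon as nothing forbids it.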
Towards Power-efficient System
- Acquire and release clocks based upon the requirement: clk_get(), clk_put()
- Control clocks based on the activity to allow frequent ‘drops’ into the idle state e.g.
- Enable clock only when there is a pending interrupt. Else keep disabled.
- Disable clock after a period of inactivity.
- Express constraints, if any, so that system doesn’t enter a power state not supported by the driver.
- Implement the suspend() and resume() functions and register them with the Linux Driver Model (LDM).
- Implement context save and restore mechanism.
- Specify the constraints, both interrupt latency and OPP.
- Avoid polling. Use events to initiate actions.
- Polling unnecessarily wakes the CPU from an idle state, consuming power, but in most iterations it does not result in useful work.
- Whenever the accuracy of timers isn't critical, deferrable timers can be used.
- Open a device only when necessary. Otherwise, the kernel will assume it is in use. In the extreme case, when all other devices in a power domain are inactive, the open() would prevent the domain from transitioning to a power-saving state.
- Define "QoS" levels based on the available power (e.g. battery).
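The "avoid polling" guideline can be illustrated with a worker that blocks until an event arrives rather than waking on a timer to check for work. This is a user-space sketch using Python's standard `queue` and `threading` modules; the event names are made up for the example.

```python
import queue
import threading


def consumer(events: queue.Queue, handled: list) -> None:
    """Block until an event arrives instead of polling on a timer."""
    while True:
        item = events.get()   # blocks until an item is available
        if item is None:      # sentinel: stop the worker
            break
        handled.append(item.upper())


events: queue.Queue = queue.Queue()
handled: list = []
worker = threading.Thread(target=consumer, args=(events, handled))
worker.start()

# Hypothetical event names; the worker runs only when one is posted.
events.put("button-press")
events.put("usb-plug")
events.put(None)
worker.join()
print(handled)
```

Between events the worker thread is asleep, so it does not keep the CPU out of its idle states the way a periodic polling loop would.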
Specific use cases
In a real-life system, applications cannot be started synchronously. If many applications use their own timers to perform specific actions, the CPU is less likely to enter a deeper idle state. Frequent entry into and exit from an idle state also increases power consumption.
Grouping timers across the system helps reduce the number of wake-ups from an idle state. Indirectly, this allows the system to enter deeper idle states and stay in them for longer. See g_timeout_add_seconds().
See the Optimizing IO Power Consumption article.
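The effect of grouping can be sketched by deferring each timer to the next multiple of a coarse granularity, so that timers which are close together expire in the same wake-up. The timer values and the `align_up` helper are illustrative assumptions; the sketch does not reproduce how GLib or the kernel implement their grouping.

```python
import math
from typing import Iterable, Set


def wakeups(expiries_s: Iterable[float]) -> Set[float]:
    """Distinct instants at which the CPU must wake up."""
    return set(expiries_s)


def align_up(expiries_s: Iterable[float], granularity_s: float = 1.0) -> list:
    """Defer each expiry to the next multiple of granularity_s so that
    timers that are close together fire in the same wake-up."""
    return [math.ceil(t / granularity_s) * granularity_s for t in expiries_s]


# Hypothetical timers from unrelated applications, in seconds.
timers = [0.2, 0.7, 0.9, 1.1, 1.6, 2.3]
print(len(wakeups(timers)))            # one wake-up per timer
print(len(wakeups(align_up(timers))))  # timers share wake-ups after alignment
```

The cost is precision: each timer may fire up to one granularity interval late, which is acceptable only when exact timing is not critical.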