Enhanced Power Modules Increase System Reliability

While having power when expected is a key aspect of reliability for any electrical or electronic system, the robustness built into power modules and subsystems is the culmination of many years of study of the physics of failure and of applying those lessons to power design, often through extensive trial and error. Even a good design is not worth the paper it is printed on if it cannot be built consistently and affordably, so tying reliable design practices to the qualification and high-volume manufacture of end solutions is essential to understanding how critical system performance depends on so many aspects of power solution engineering.

How do power solutions impact system reliability?

Power solutions impact system reliability in many ways, some more obvious than others. To a first order, a system generally needs to start up to be considered functioning, and since no electronic or electrical system works without power, simply being able to turn on is the core measure of reliability and, certainly, what many will think of first. Beyond just turning on, a system’s performance can be tied to the quality of its power. In other words, many characteristics and specifications for each system voltage rail have to be met for the power to be of acceptable quality, so that the power supply does not inhibit the performance of the load. Power quality can be associated with how well the voltage is regulated (against variations in input voltage or output load), what transients or load steps can be accommodated without making the power supply unstable or exceeding acceptable limits, how quickly or smoothly the output voltage rises at startup, and what safety regulations/standards must be met to obtain whatever reports/certifications are necessary to legally ship the product.
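To make the regulation part of "power quality" concrete, the sketch below computes load and line regulation as a percentage of nominal and checks measured readings against a tolerance band. It is a minimal illustration only: the `RailSpec` class, the 2 % static and 5 % transient bands, and the sample readings are all hypothetical, not values from any particular specification.

```python
from dataclasses import dataclass


@dataclass
class RailSpec:
    """Hypothetical acceptance limits for one voltage rail (values are illustrative)."""
    v_nominal: float          # nominal output voltage, V
    static_tol_pct: float     # allowed steady-state deviation, % of nominal
    transient_tol_pct: float  # allowed peak deviation during a load step, % of nominal


def load_regulation_pct(v_no_load: float, v_full_load: float, v_nominal: float) -> float:
    """Output voltage change from no load to full load, as a percentage of nominal."""
    return 100.0 * (v_no_load - v_full_load) / v_nominal


def line_regulation_pct(v_out_vin_max: float, v_out_vin_min: float, v_nominal: float) -> float:
    """Output voltage change across the input voltage range, as a percentage of nominal."""
    return 100.0 * abs(v_out_vin_max - v_out_vin_min) / v_nominal


def rail_passes(spec: RailSpec, static_readings: list[float], transient_peaks: list[float]) -> bool:
    """True if every steady-state reading and every load-step peak stays inside its band."""
    def inside(v: float, tol_pct: float) -> bool:
        return abs(v - spec.v_nominal) <= spec.v_nominal * tol_pct / 100.0

    return (all(inside(v, spec.static_tol_pct) for v in static_readings)
            and all(inside(v, spec.transient_tol_pct) for v in transient_peaks))


spec = RailSpec(v_nominal=5.0, static_tol_pct=2.0, transient_tol_pct=5.0)
print(round(load_regulation_pct(5.02, 4.97, spec.v_nominal), 3))  # -> 1.0
print(rail_passes(spec, [5.02, 4.97, 5.01], [5.18, 4.80]))       # -> True
print(rail_passes(spec, [5.02, 4.97], [4.70]))                   # -> False
```

Keeping separate static and transient bands reflects the distinction drawn above between regulation and the size of load step a supply can absorb without exceeding acceptable limits.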

As hinted above, regulation requirements can apply to the input as well as to the output. Even if the output side is handled satisfactorily, noise reflected back onto the input can impact other devices that share the same line or bus. If this cross-interference is scaled across many units and systems, the effects can even be detrimental to the reliability or stability of the utility grid. Requirements for power factor correction (PFC) in AC/DC power supplies and for maximum total harmonic distortion (THD) levels exist to address this phenomenon, even though they are unrelated to end-system performance.
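As a rough illustration of how THD and power factor are related, the sketch below computes current THD from harmonic RMS values and then the true power factor as the product of the displacement factor and the distortion factor, assuming a sinusoidal line voltage. The harmonic amplitudes and the 10° phase shift are made-up example values, not measurements.

```python
import math


def thd(rms_by_order: list[float]) -> float:
    """Total harmonic distortion as a ratio.

    rms_by_order[0] is the fundamental's RMS value; rms_by_order[1:] are harmonics 2, 3, 4, ...
    """
    fundamental, *harmonics = rms_by_order
    return math.sqrt(sum(h * h for h in harmonics)) / fundamental


def true_power_factor(thd_ratio: float, displacement_rad: float) -> float:
    """PF = displacement factor x distortion factor, assuming a sinusoidal line voltage."""
    return math.cos(displacement_rad) / math.sqrt(1.0 + thd_ratio ** 2)


# Illustrative input current: fundamental 1.0 A, 0.3 A of 3rd and 0.4 A of 5th harmonic.
current = [1.0, 0.0, 0.3, 0.0, 0.4]
i_thd = thd(current)
print(f"THD = {i_thd:.1%}")  # -> THD = 50.0%
print(f"PF  = {true_power_factor(i_thd, math.radians(10)):.3f}")
```

Even with zero phase shift, 50 % current THD on its own limits the power factor to about 0.89, which is why PFC stages focus on shaping the input current waveform and not only on its phase.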

Because every load must be physically connected to its power source, many electromechanical components sit between power supplies and their loads. These components tend to be common points of failure and, therefore, bottlenecks in optimizing system reliability. Connectors, wire harnesses, wires, and solder joints are often the first culprits to investigate when performing a failure analysis of power solutions. Parts that physically move, such as switches and fans, also fall into this category.

Filter components are the next major items on the list of concerns in the power bill of materials (BOM), namely, capacitors, inductors, and transformers. A capacitor’s reliability is often at the mercy of its electrolyte, which in many types is a liquid that can evaporate or outgas over time as a function of temperature and electrical stress (e.g., ripple current). Magnetic components can be complex and/or hand-assembled structures that introduce reliability weaknesses in addition to those related to temperature and electrical stress (e.g., core saturation).
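The link between temperature, ripple, and electrolyte wear-out is often approximated with the widely quoted "10-degree rule" for liquid-electrolyte capacitors: expected life roughly doubles for every 10 °C the core runs below its rated temperature. The sketch below applies that rule of thumb, adding ripple self-heating as I²·ESR times a thermal resistance. All part values are hypothetical, and a manufacturer's own life formula should take precedence where one is published.

```python
def electrolytic_life_hours(rated_life_h: float, rated_temp_c: float, ambient_c: float,
                            ripple_a_rms: float = 0.0, esr_ohm: float = 0.0,
                            r_th_c_per_w: float = 0.0) -> float:
    """Rule-of-thumb life estimate for a liquid-electrolyte capacitor.

    Life doubles for every 10 degC the core runs below its rated temperature.
    Core temperature = ambient + ripple self-heating (I_rms^2 * ESR * thermal resistance).
    """
    self_heating_c = ripple_a_rms ** 2 * esr_ohm * r_th_c_per_w
    core_c = ambient_c + self_heating_c
    return rated_life_h * 2.0 ** ((rated_temp_c - core_c) / 10.0)


# Hypothetical part: rated 2000 h at 105 degC, operated at 65 degC ambient.
print(electrolytic_life_hours(2000, 105, 65))  # -> 32000.0
# Same part carrying 1 A RMS ripple, ESR 0.1 ohm, 50 degC/W to ambient (5 degC self-heating).
print(round(electrolytic_life_hours(2000, 105, 65, 1.0, 0.1, 50)))  # -> 22627
```

Just 5 °C of ripple self-heating costs roughly 30 % of the estimated life in this example, which is why ripple current appears alongside temperature as a stress factor.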

Reviewing all these items, which cover most of the focus areas where power supplies and systems intersect, is also a great first step in mitigating the risks each one carries. Beyond simply listing them, this exercise gives good hints about where to focus design and qualification efforts in the perpetual pursuit of improved system reliability. System reliability can be characterized in many different ways, typically with statistical models that predict life or failure from the failure statistics of the reliability bottlenecks summarized above (see “mean time between failures” (MTBF) and “mean time to failure” (MTTF)) [1].
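One of the simplest such statistical models is a parts-count style prediction: treat the unit as a series system in which any part failure fails the whole unit, assume constant failure rates, add the rates, and invert the sum to get MTBF. The sketch below does exactly that. The FIT values in `PART_FIT` are placeholders chosen for illustration, not data from any handbook or supplier.

```python
import math

# Hypothetical per-part failure rates in FIT (failures per 1e9 device-hours).
# Real values would come from a prediction handbook, supplier data, or field returns.
PART_FIT = {
    "connector": 50.0,
    "fan": 2000.0,
    "electrolytic_cap": 300.0,
    "transformer": 100.0,
    "mosfet": 40.0,
}


def system_mtbf_hours(bom: dict[str, int]) -> float:
    """Series model: any single part failure fails the unit, so failure rates add."""
    total_fit = sum(PART_FIT[part] * qty for part, qty in bom.items())
    return 1e9 / total_fit


def survival_probability(t_hours: float, mtbf_hours: float) -> float:
    """Probability of surviving t_hours under a constant failure rate (exponential model)."""
    return math.exp(-t_hours / mtbf_hours)


bom = {"connector": 2, "fan": 1, "electrolytic_cap": 4, "transformer": 1, "mosfet": 4}
fanless = {part: qty for part, qty in bom.items() if part != "fan"}

print(round(system_mtbf_hours(bom)))      # -> 280899
print(round(system_mtbf_hours(fanless)))  # -> 641026
print(f"{survival_probability(8760, system_mtbf_hours(bom)):.3f}")  # one year of continuous use
```

With these placeholder numbers, the single fan accounts for more than half of the predicted failure rate, echoing the point about moving parts. Note that MTBF under this model describes a constant random-failure rate during useful life; it is not an expected lifetime and does not capture wear-out.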

What are key aspects of power solution design and manufacturing methodologies that drive reliability?

Given all the considerations that go into a typical power supply, it seems almost miraculous that such robust subsystems, with large custom BOMs, numerous electromechanical components, and high power density, can be built safely and repeatedly at high volumes while maintaining very high manufacturing yields. Even relatively simple-looking power solutions are the result of many years of painstaking effort spent learning from mistakes and optimizing the process, all the way from raw material procurement to end-of-life (EOL) disposal/recycling.

In the design phase, most experienced power designers refer to some kind of derating guidelines: documents that recommend how much margin to leave when applying a particular component or designing a system. In other words, what fraction of an item’s rated maximum for any particular figure of merit (FOM) should be used when calculating maximum stresses and expected performance? This applies to every FOM, from a component’s voltage/current rating to the ambient operating temperature or physical attributes such as mechanical stress. Often, safe operating limits are determined by a mixture of FOMs; for example, a semiconductor’s maximum junction temperature limit relates directly to the channel current it conducts. Derating guidelines and standards tend to be based on extensive experience and data for the specific components, materials, and applications under consideration. Following them allows a new designer to make informed decisions guided by far more years of experience than they may have personally.
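A derating review can be thought of as a table of worst-case stresses checked against derated limits. The sketch below shows one way to structure that check. The 80 % voltage and 70 % current fractions, the 25 °C junction-temperature margin, and the part names are hypothetical examples; real numbers belong to whichever guideline or standard a design follows. Temperature is derated by a fixed margin here because a percentage of a Celsius value has no physical meaning.

```python
from dataclasses import dataclass

# Hypothetical derating rules; real values come from a company guideline or standard.
DERATE_FRACTION = {"voltage": 0.80, "current": 0.70}  # use at most this fraction of the rating
TJ_MARGIN_C = 25.0  # keep junction temperature this far below its absolute maximum


@dataclass
class Stress:
    part: str
    fom: str        # "voltage", "current", or "junction_temp_c"
    rated: float    # datasheet absolute maximum
    applied: float  # worst-case calculated stress in the design


def allowed(s: Stress) -> float:
    """Derated limit for one figure of merit."""
    if s.fom == "junction_temp_c":
        # Temperature is derated by a fixed margin, not a percentage of degC.
        return s.rated - TJ_MARGIN_C
    return s.rated * DERATE_FRACTION[s.fom]


def derating_violations(stresses: list[Stress]) -> list[tuple[str, str, float, float]]:
    """(part, fom, applied, allowed) for every stress that exceeds its derated limit."""
    return [(s.part, s.fom, s.applied, allowed(s)) for s in stresses if s.applied > allowed(s)]


design = [
    Stress("C12 bulk capacitor", "voltage", rated=50.0, applied=42.0),
    Stress("Q1 MOSFET", "current", rated=20.0, applied=12.0),
    Stress("Q1 MOSFET", "junction_temp_c", rated=150.0, applied=118.0),
]
print(derating_violations(design))
# -> [('C12 bulk capacitor', 'voltage', 42.0, 40.0)]
```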

The historical basis of and research into the physics of failure are a bit beyond the scope of this blog, but it is worth briefly introducing the kind of analyses designers can expect to encounter. The statistically based models referenced in the previous section commonly implement some variant of the Arrhenius equation [2], which relates temperature to the rate of chemical reactions. This provides a way to predict when the accumulated thermal stress on a component over its life will be enough to degrade it past a threshold that defines failure. Combining these predictors of performance and failure with the experience and data behind the design guidelines mentioned above yields process documents that reputable power design resources carefully develop, maintain, and follow, often within a quality management system (QMS) or another framework of process control and data capture. A quick way to gauge how reputable a specific resource is lies in the standards, certifications, and processes it uses; see the ISO 9001 [3] reference as a universally accepted example.
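In its commonly used reliability form, the Arrhenius model gives an acceleration factor between a stress temperature and a use temperature, AF = exp[(Ea/k)(1/T_use − 1/T_stress)], with temperatures in kelvin and k the Boltzmann constant. The sketch below evaluates it. The 0.7 eV activation energy and the 55 °C/125 °C pair are illustrative assumptions only; the activation energy depends on the specific failure mechanism being modeled.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant, eV/K


def arrhenius_af(ea_ev: float, t_use_c: float, t_stress_c: float) -> float:
    """Acceleration factor between a stress temperature and a use temperature.

    AF = exp[(Ea / k) * (1/T_use - 1/T_stress)], with temperatures in kelvin.
    """
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k))


# Hypothetical mechanism with Ea = 0.7 eV; use at 55 degC, test at 125 degC.
af = arrhenius_af(0.7, 55.0, 125.0)
test_hours = 1000.0
print(f"AF = {af:.1f}; {test_hours:.0f} h at 125 degC ~ {af * test_hours / 8760:.1f} years at 55 degC")
```

Under these assumptions, the factor comes out in the high 70s, so about six weeks of chamber time stands in for several years of field use. The result is very sensitive to the assumed activation energy.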

Since magnetics are both integral to the design and operation of the power solution and a common point of failure due to manual processes and/or complex structures (not to mention their role in electromagnetic compatibility, or EMC), serious power designers put considerable engineering effort into optimizing the materials and the electrical and mechanical design of transformers and inductors. Transformer construction is not only critical for performance but may also be the first line of defense in making the power solution (and, therefore, the end system) safe, by providing galvanic isolation that protects users and/or loads from unsafe voltages. These components also tend to be the largest (in size and mass) in the system. So, on top of all the factors just outlined, they must be secured so that they stay locked in position beyond the maximum shock and vibration stresses the system is expected to experience.
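A first-pass check on securing a heavy magnetic component is to compare the inertial force at the specified peak shock (F = m·a) with the holding capacity of its retention method, including a safety margin. The sketch below does only that quasi-static calculation. The 250-gram mass, the 50 g shock level, the holding forces, and the 2× safety factor are hypothetical. The calculation also ignores resonant amplification, which is something vibration testing has to capture.

```python
G0 = 9.80665  # standard gravity, m/s^2


def shock_force_n(mass_kg: float, peak_g: float) -> float:
    """Quasi-static inertial force on a part during a shock pulse: F = m * a."""
    return mass_kg * peak_g * G0


def mounting_ok(mass_kg: float, peak_g: float, holding_force_n: float,
                safety_factor: float = 2.0) -> bool:
    """True if the retention (adhesive, clips, screws, solder) holds with the given margin."""
    return holding_force_n >= safety_factor * shock_force_n(mass_kg, peak_g)


# Hypothetical 250-gram transformer against a 50 g peak shock requirement.
print(round(shock_force_n(0.25, 50), 1))             # -> 122.6
print(mounting_ok(0.25, 50, holding_force_n=200.0))  # -> False
print(mounting_ok(0.25, 50, holding_force_n=300.0))  # -> True
```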

The testing and qualification framework used to assess a design’s performance and ultimate limits contributes greatly to the success or failure of a deployed solution. Most system designers perform some level of design verification testing (DVT, a.k.a. bench testing), but putting the unit under test (UUT) through a carefully crafted experiment that methodically pushes on the levers of failure (especially to the breaking point) is what separates an acceptable design from an excellent, highly reliable one. Such tests fall into the “highly accelerated” category in that they compress the stresses a system may see over its lifetime to bring out failure mechanisms within a reasonable development timeframe (i.e., weeks/months of testing to represent years of use). These tests may also apply electrical, thermal, and mechanical stresses in a lab environment that are not feasible, or even safe, to apply as simple bench tests. Examples in this family include the highly accelerated life test (HALT) [4], which pushes a design to failure during the design phase; highly accelerated stress screening (HASS), which applies screening stresses to production units to catch manufacturing defects; and the highly accelerated stress audit (HASA) [5], which assesses ongoing reliability by regularly sampling units from production and subjecting them to accelerated stress testing.
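When an accelerated test is used quantitatively (HALT itself is usually qualitative, aimed at finding failure modes and margins), test hours are multiplied by an acceleration factor to get equivalent field hours. A common way to state what a failure-free test demonstrates is a one-sided lower confidence bound on MTBF. For a time-terminated test with zero failures and a constant failure rate, the chi-square expression reduces to T / −ln(1 − C). The sketch below applies that result. The unit count, test duration, acceleration factor of 50, and 90 % confidence are hypothetical plan values.

```python
import math


def mtbf_lower_bound_zero_failures(equiv_hours: float, confidence: float) -> float:
    """One-sided lower confidence bound on MTBF for a time-terminated test with no failures.

    Assumes a constant failure rate; with zero failures, chi-square(2 dof) reduces to
    -2*ln(1 - C), so the bound is T / -ln(1 - C).
    """
    return equiv_hours / -math.log(1.0 - confidence)


def required_test_hours_per_unit(target_mtbf_h: float, confidence: float,
                                 units: int, acceleration_factor: float) -> float:
    """Chamber hours each unit must run, failure-free, to demonstrate target_mtbf_h."""
    equiv_hours_needed = target_mtbf_h * -math.log(1.0 - confidence)
    return equiv_hours_needed / (units * acceleration_factor)


# Hypothetical plan: 10 units, 1000 h each, acceleration factor of 50, 90 % confidence.
equiv = 10 * 1000.0 * 50
print(round(mtbf_lower_bound_zero_failures(equiv, 0.90)))           # -> 217147
print(round(required_test_hours_per_unit(500_000, 0.90, 10, 50)))   # -> 2303
```

The result is only as good as the acceleration factor, which must be valid for the failure mechanism being exercised; stressing a design beyond that regime can introduce failure modes that never occur in the field.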

How are modern power solutions enhancing system reliability?

There has already been much discussion of the key points of failure that tie power solutions to overall system reliability, and of the general approaches power solution experts take to characterize such bottlenecks and design around them based on the physics of failure. It is worth delving a little deeper into the specifics of power supply design and manufacturing, including methodologies at the cutting edge of assembly fabrication.

As electromechanical components have been identified as the most common points of weakness in a system, an obvious solution may be to simply eliminate them, but that is easier said than done. Knowing that a fan is bulky and prone to mechanical failure does not mean it can simply be removed; components must still be kept cool enough to remain in their target operating region. However, other thermal mitigation techniques (e.g., heat sinking/spreading), combined with intelligent power management that reduces consumption in the first place, may make the difference between a system that requires forced air from a fan and one that can be cooled by natural convection and radiation. In fact, anything that improves the overall conversion efficiency of a power solution, such as wide-bandgap (WBG) semiconductors or reduced capacitor ripple current, adds to this value proposition, since higher efficiency means less power dissipated as heat for the same output power.
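The relationship between efficiency and heat is P_loss = P_out·(1/η − 1). It is not a simple inverse proportion, but small efficiency gains near the top of the range cut dissipation sharply. The sketch below shows this for a hypothetical 100 W converter and also converts the loss into a temperature rise through an assumed 4 °C/W heatsink-to-still-air thermal resistance, which is an illustrative number only.

```python
def dissipated_w(p_out_w: float, efficiency: float) -> float:
    """Power lost as heat in the converter: P_in - P_out = P_out * (1/eta - 1)."""
    return p_out_w * (1.0 / efficiency - 1.0)


def temp_rise_c(p_loss_w: float, r_th_c_per_w: float) -> float:
    """Steady-state temperature rise over ambient for a given thermal resistance."""
    return p_loss_w * r_th_c_per_w


# Hypothetical 100 W converter on a heatsink with 4 degC/W to still air.
for eta in (0.90, 0.95):
    loss = dissipated_w(100.0, eta)
    print(f"eta = {eta:.0%}: {loss:.2f} W lost, ~{temp_rise_c(loss, 4.0):.0f} degC rise")
# -> eta = 90%: 11.11 W lost, ~44 degC rise
# -> eta = 95%: 5.26 W lost, ~21 degC rise
```

Going from 90 % to 95 % efficiency roughly halves the heat, which can be the margin that allows natural convection to replace a fan.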

Replacing through-hole (TH) pins with surface-mount (SM) pads is a growing trend in power solutions and is just one of many advanced packaging innovations of recent years. While SM technology itself is nothing new, the ability to transfer high current and heat loads more directly from packaged lead frames and semiconductor components to external printed circuit boards (PCBs) and heatsinks/heat spreaders has advanced greatly, particularly with the help of three-dimensional power packaging (3DPP®) techniques [6]. Cleaner attach methods also lend themselves to stricter soldering workmanship standards, such as IPC-A-610 [7]. This matters because solder joints are a very frequent cause of heartburn for both quality managers and those responsible for debugging intermittent field issues.

The earlier discussion of magnetics highlighted the reliability cost of manual, hand-wired construction, and turning as much of it as possible into automated processes is the natural response. Modern packaging and heterogeneous integration techniques take this a step further with planar magnetics, in which the windings are formed by carefully controlled, repeatably placed traces on a PCB (often the same one that carries the other system components) and the magnetic core material is then enclosed around those traces. This brings a range of advantages, from reduced size to very complex geometries with tight tolerances, while replacing a less reliable component and driving economies of scale and manufacturing automation. Planar magnetics can also facilitate better conformal coating of printed circuit assemblies (PCAs) and/or hermetic sealing to protect against environmental factors that undermine system reliability, such as dust, humidity, and airborne conductive particles.

Modern enhancements to manufacturing processes and to the documentation around component tracking contribute greatly to a somewhat counterintuitive result: power solutions keep improving system reliability even as assemblies grow more complex. Improved traceability and a global focus on humane sourcing of raw materials make it possible not only to trace the lot/date code of a single component in a power supply back to its source but even to trace its raw material back to the country or mine it came from, all by looking up the power supply’s serial number. This level of traceability is the product of many years of evolution in paperwork and database capture, from papers in a file cabinet to a unique digital ID that can follow an assembly from part kitting to final packaging/test/shipment and even into the field. These documentation processes also contribute to the quality of reworked components/PCAs, not just those that flow smoothly from one end of the assembly line to the other. Following the path a unit took used to be a real investigative mystery, and the tracking information critical to finding the root cause of a failure, and to driving continuous improvement in system reliability, was often lost. How else is one to know that a board went through an unsupported thermal profile in the wrong solder reflow oven as a short-term convenience for the manufacturing line operator?
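The kind of record that makes those questions answerable can be sketched as a per-serial-number history: component lot codes keyed by reference designator, plus an ordered list of process steps with the equipment and recipe used at each. The sketch below is a hypothetical data model, not any particular manufacturing execution system. The station names, oven IDs, profile IDs, and the `APPROVED_PROFILES` qualification table are all invented for illustration. It answers two typical questions: which units saw an unqualified reflow profile, and which units contain parts from a given lot.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessStep:
    station: str    # e.g. "kitting", "reflow", "rework", "final_test"
    equipment: str  # e.g. an oven or tester ID
    recipe: str     # e.g. a thermal profile or test program ID


@dataclass
class UnitRecord:
    serial: str
    component_lots: dict[str, str] = field(default_factory=dict)  # reference designator -> lot/date code
    history: list[ProcessStep] = field(default_factory=list)       # in process order


# Hypothetical qualification table: thermal profiles each oven is approved to run.
APPROVED_PROFILES = {"OVEN-A": {"PROFILE-01"}, "OVEN-B": {"PROFILE-02"}}


def unapproved_reflow(units: list[UnitRecord]) -> list[str]:
    """Serials of units that saw a reflow profile not qualified for that oven."""
    return [
        u.serial
        for u in units
        if any(
            step.station == "reflow"
            and step.recipe not in APPROVED_PROFILES.get(step.equipment, set())
            for step in u.history
        )
    ]


def units_with_lot(units: list[UnitRecord], lot_code: str) -> list[str]:
    """Serials of units built with any component from the given lot."""
    return [u.serial for u in units if lot_code in u.component_lots.values()]


units = [
    UnitRecord("SN0001", {"C12": "LOT-2231"},
               [ProcessStep("kitting", "KIT-1", "-"), ProcessStep("reflow", "OVEN-A", "PROFILE-01")]),
    UnitRecord("SN0002", {"C12": "LOT-2240"},
               [ProcessStep("kitting", "KIT-1", "-"), ProcessStep("reflow", "OVEN-B", "PROFILE-01")]),
]
print(unapproved_reflow(units))           # -> ['SN0002']
print(units_with_lot(units, "LOT-2231"))  # -> ['SN0001']
```

Because rework is just another step in the same history list, reworked units stay as traceable as those that flowed straight through the line.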

Conclusion

Any system is only as strong as its weakest point, and power supplies present plenty of potential bottlenecks to overcome in maximizing system reliability. After all, power solutions can be as challenging, in terms of part count and design complexity, as any other subsystem or component, and can even rival the overall system design itself. Recognizing and internalizing this allows designers to turn these weaknesses into strengths and build some of the most robust, reliable systems on the planet (or even beyond it, in space applications, where there is no warranty return for repair).

The physics of failure in power solutions has been thoroughly studied over many decades, so designers now have extensive tools and guidelines at their fingertips. These resources can be specific to certain industries and applications, but even when tailored to one market or vertical, they can still be very useful and inexpensive to apply to another. For example, the derating guidelines and accelerated life tests in a standard like IPC-9592B [8] were created with larger computer and telecommunications systems in mind, but they can also help in creating a reliable and affordable consumer product. Standards like MIL-HDBK-217 [9] and MIL-HDBK-338B [10] were created for military use cases but can also be leveraged for other high-reliability applications.

NOTE: Pay attention to the dates on some of these documents, since some can be quite old, and do some digging into what tends to be used in your particular application. Also, check the references and background information behind the guidance being leveraged to ensure the data and assumptions make sense for the application at hand.

Luckily, today’s designers have drop-in power solutions that are highly reliable, come in user-friendly form factors, and are compatible with a variety of manufacturing processes. Advanced packaging and 3DPP® techniques give system designers the advantages and sturdiness of integrated power solutions while letting them take advantage of state-of-the-art (SOTA), commercial off-the-shelf (COTS) power subsystems.

References

[1] Wikipedia contributors, "Mean time between failures," Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/w/index.php?title=Mean_time_between_failures&oldid=1128168769 (accessed March 6, 2023).

[2] Wikipedia contributors, "Arrhenius equation," Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/w/index.php?title=Arrhenius_equation&oldid=1123333780 (accessed March 6, 2023).

[3] Wikipedia contributors, "ISO 9000," Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/w/index.php?title=ISO_9000&oldid=1143191589 (accessed March 6, 2023).

[4] Wikipedia contributors, "Highly accelerated life test," Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/w/index.php?title=Highly_accelerated_life_test&oldid=1114242261 (accessed March 6, 2023).

[5] Wikipedia contributors, "Highly accelerated stress audit," Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/w/index.php?title=Highly_accelerated_stress_audit&oldid=959928574 (accessed March 6, 2023).

[6] "Introducing RECOM 3D Power Packaging® (3DPP)," RECOM Blog, February 26, 2021, https://recom-power.com/en/company/newsroom/blog/rec-n-introducing-recom-3d-power-packaging-(3dpp)-145.html (accessed January 23, 2023).

[7] IPC-A-610 Development Team, "IPC-A-610 - Revision F - Standard with Amendment 1: Acceptability of Electronic Assemblies," IPC, Bannockburn, IL, May 9, 2016. Available: https://shop.ipc.org/ipc-a-610/ipc-a-610-standard-amendments/Revision-f/english.

[8] Power Conversion Devices Standard Subcommittee (9-82), "IPC-9592B: Requirements for Power Conversion Devices for the Computer and Telecommunications Industries," IPC, Bannockburn, IL, November 2012, p. 20. Available: https://shop.ipc.org/ipc-9592/ipc-9592-standard-only.

[9] "MIL-HDBK-217F: Military Handbook, Reliability Prediction of Electronic Equipment," US Department of Defense, December 2, 1991.

[10] "MIL-HDBK-338B: Military Handbook, Electronic Reliability Design Handbook," US Department of Defense, October 1, 1998.