Reliability and Energy Engineering White Paper
IT Critical Facility Energy Engineering Challenges …A Senior AEE Member’s View
Prepared for: 31ST WORLD ENERGY ENGINEERING CONGRESS, October 1-3, 2008, Gaylord National Convention Center, Washington, DC
August 15, 2008
Prepared by: Joseph E. Greenawalt, PE (TX & OH), Consulting Engineering Manager
ABSTRACT
Engineering of critical facilities requires extremely talented teams of engineers and engineering managers. Most of the existing Data Centers were engineered initially and have evolved unilaterally by operational changes into facilities that are targeted today as large energy consumers. Facilities construction projects that are designed by teams of licensed engineering professionals comply with statutes and codes, and achieve a balance of safety, security, reliability and cost effectiveness. Too often designers give much less consideration to operations, maintainability, constructability and sustainability. It is usually the clients, within the bounds of statutes, who make the final decisions on how the above are integrated into final design including the construction contracts. Also, given the innate creativity of engineers, there are many possible solutions in all sections of the drawings and specifications. Adding differing site conditions (climate, available utilities, physical threats, site specific vulnerabilities, environmental stewardship, statutes, site labor pool, etc.), a universal critical facility design may not be the best option for most clients and locations. The lowest life-cycle-cost design or the least cost of ownership design is different at every location. I strongly advocate: 1) reserving the title of Engineer for licensed engineering professionals, 2) engineering or reengineering critical facilities by teams of licensed engineering professionals, and 3) intensively engineered systems integration during every critical system or critical facility modification. This paper will discuss critical facilities reliability and energy engineering challenges and current practices.
INTRODUCTION
First, let it be clear that there will not be reductions in energy use by the IT industry. This is worth repeating. There will not be reductions in energy use by the IT industry. The feasible energy efficiency improvements cannot keep up with the growth of the industry. It is true that there are Data Centers that may be able to reduce their energy use by 50%, but their energy demand will likely grow beyond the savings before the measures are fully implemented. Growth is one of the many challenges facing the industry, but it is not new. One study shows that during the 4-year period 2002 to 2006, computing capacity in existing Data Centers grew by over 3300% while critical load energy grew by about 550% 1. That is a calculated 600% improvement in energy efficiency in 4 years; what other industry has approached that level of efficiency improvement? The objective now is to operate critical facilities more efficiently and to keep up with growing demands for service reliability and capacity.
Everything about the IT Industry seems to be about helping businesses and industries to reduce costs, improve communications, improve productivity, improve quality, and generally support better business operations through better information exchange.
Critical Facilities Must Be Reliable
Data Center critical facilities are committed to a high level of reliability for their critical loads. To achieve high levels of reliability, people and equipment have been added. Though these additions are required, they are not considered part of the critical IT load. The add-on energy consumption to ensure safety, security and reliability has grown to significance, even matching or exceeding the critical load. Safety and security are achieved by housing the critical load in facilities with people to manage the critical and equipment loads. The people, support equipment and facilities require more energy.
Excess heat energy must be removed and the people must be supported.
Imagine that you drive an old panel van and now you decide to apply some of the reliability practices to it. How would you provide a second power source? How would you provide back-up cooling? How would you provide a 4 millisecond change of drivers or change of power source? Imagine the space it would take and the additional cost to run two engines, provide an additional driver, and cooling for the second engine. You wouldn’t do these things to that old van but you could somehow. It would cost dearly to have a team of engineers design these changes because of the many operations and safety considerations, and the general liability for the design. I can only see it being done on “Junkyard Wars” 2. Similar changes have been made in existing critical facilities. Perhaps this frivolous example could be a useful exercise for getting critical operations teams to understand that there are systems effects to every modification no matter how simple.
In contrast to the enterprise server in a company closet, mission critical Data Centers can serve millions of people 24/7/365 3 anywhere in the world. For some mission critical Data Centers, like Mission Control in Houston, lives may be lost when the center fails. Mission critical facilities demand extraordinary measures to ensure reliability, including duplicate or shadow facilities poised to assume the mission in an event.
Core Reliability Challenges
Critical facility reliability has been driven by failures: doing what can be done to make sure that “it” doesn't happen again. Since Critical Load (IT equipment) reliabilities have been improved remarkably, the remaining core reliability challenges are people, electricity and cooling; all are necessary to support the critical mission and Critical Load. Core Reliability Challenges (Figure 1) 4 focus much of the current reliability improvement efforts on getting control over external factors such as utility power.
As critical loads continue to grow, a consequent issue is meeting the cooling requirements reliably. Primarily because of reliability improvements in infrastructure, facilities and equipment (N+1 to 2N redundancies) 5, people (not infrastructure, facilities or equipment) continue to be the least reliable component in a critical system’s reliability model.
Most of the existing Data Centers were engineered initially and have evolved unilaterally by operational changes into facilities that are targeted today as large energy consumers. Critical facilities of the future should come from a continuous design-build process engineered by teams of licensed engineering professionals, with intentional compliance with statutes and codes and intense consideration given to safety, security, reliability and cost effectiveness. Adding or replacing a critical system component involves more than planning a space for it and getting power to it. Critical facilities should be engineered or reengineered to apply Systems Engineering 6 and Reliability Engineering 7.
Modeling Reliability and Energy Use
Reliability modeling 8 is the best way to manage system designs and consequences to ensure that reliability and availability levels are met or exceeded. The advantage of reliability modeling is that there doesn’t have to be an actual failure to ensure that a failure doesn't happen. Some off-the-shelf software 9 can also calculate cost of ownership for the life of the system and can recalculate costs for the changes that may be considered, for example, to improve reliability or to add critical equipment. Root Cause Analysis 10 and Single Point of Failure Analysis 11 have been useful for ensuring system redundancies, but they have contributed greatly to additional energy consumption, from operating system components outside of their designed efficiency parameters and from operating extra equipment. Solutions can be engineered to meet changing Data Center requirements. And with engineering comes adherence to statutes and codes, and intense consideration given to safety, security, reliability and cost effectiveness. Existing solutions to the reliability challenges consume extra resources and energy to ensure maximum equipment availability 24/7/365.
Reliability and Energy Engineering Challenges
Energy conservation and efficiency objectives are opportunities to reduce and control operating costs, reduce wear and tear on critical infrastructure systems and preserve system capacities for contingencies and future growth. Prior to discussing the reliability and energy engineering challenges in more detail, it is important to agree on what is meant by energy engineering, energy conservation and energy efficiency in this paper. Energy Engineering seeks to achieve feasible energy conservation, energy efficiencies and least cost of ownership. It applies efficient energy technologies to the design processes to satisfy the design requirements at minimum cost or consumption of resources for the design life. Energy conservation seeks to expend only the resources necessary to achieve business objectives. Energy efficiency seeks to leverage assets and resources to safely and reliably achieve business objectives at minimum costs.
Energy Purchases
There are typically several components to energy purchases: cost per unit of commodity, transportation, demand/storage/reserve, business overhead and emissions. The commodity component is the competitive acquisition of definite quantities of energy. The transportation or delivery component gets the commodity from where it is purchased to where it will be consumed. The energy demand component may also be satisfied by on-site or off-site storage or reserves. There are costs associated with each of these components whether purchasing fuel oil, natural gas or electricity. Data Centers should command the most favored electrical rates from electric utility companies due to Data Centers’ relatively constant demands 24/7/365. The low risk nature of the loads and the ability to operate independent of the electricity grid during power contingencies make Data Centers ideal electricity customers.
Engineered measures that shift energy consumption to off-peak periods, such as thermal storage, and decrease on-peak consumption could earn valuable incentives 12. Conversely, engineered measures to decrease energy use off-peak could adversely impact the favored electrical rate status and incur offsetting demand charges through the purchased electricity rates.
Energy Consumption
Electricity is the energy form required by Data Centers and the electrical energy is converted 13 to heat or other forms of energy when it is consumed 14.
A typical electric heater is designed to receive power from a typical 15 Amp protected receptacle 15. The maximum energy the typical electric heater can consume from a dedicated circuit is about 1,700 Watts or 1.7 kW (the product of voltage and amperage: 115 Volts x 15 Amperes = 1,725 Watts).
Consider the portable electric heater for a moment. It is nearly 100% efficient as it delivers heat reliably to the space around it. What are some of the system design considerations for designing, selecting or using them? Safety? Power Source? Flammables? Temperature Controls? Others? Electricity use by IT equipment in the Data Center produces heat nearly as efficiently as the electric heater, except that heating the space around it is a consequence of accomplishing mission critical functions. Unlike the electric heater, IT equipment thermostats only shut equipment down to prevent its self-destruction. The challenge in a Data Center with the electric heater is usually not tripping the breaker, but rather the additional energy consumed from other sources when it is operating in a cooled or air conditioned space. The energy consumed by the heater or the IT equipment must be removed as heat by energy-driven cooling equipment. Cooling IT equipment is a mission necessity; cooling a 1500 Watt portable electric heater is an unnecessary waste.
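The heater arithmetic above can be checked with a short sketch. The 115 V and 15 A figures come from the text. The cooling overhead of 0.4 kW of electricity per kW of heat removed is an assumption, borrowed from the ratio implied by this paper's UPS example (20 kW to cool 50 kW). The function names are illustrative only.

```python
# Sketch of the portable-heater arithmetic; names are illustrative, not from the paper.

CIRCUIT_VOLTS = 115.0  # from the text
BREAKER_AMPS = 15.0    # from the text
# Assumption: electricity needed to remove 1 kW of heat, taken from the
# 50 kW -> 20 kW ratio in the paper's UPS example.
COOLING_KW_PER_KW_HEAT = 0.4


def max_circuit_watts(volts: float, amps: float) -> float:
    """Maximum power a resistive load can draw from the circuit (P = V x I)."""
    return volts * amps


def total_kw_in_cooled_space(heater_kw: float, cooling_ratio: float) -> float:
    """Heater power plus the extra electricity used to remove its heat."""
    return heater_kw * (1.0 + cooling_ratio)


if __name__ == "__main__":
    print(max_circuit_watts(CIRCUIT_VOLTS, BREAKER_AMPS))
    print(total_kw_in_cooled_space(1.5, COOLING_KW_PER_KW_HEAT))
```

Under these assumptions, the 1500 Watt heater in a cooled space costs noticeably more electricity than its nameplate suggests, which is the point the paragraph makes.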
Reliability Decisions Have Energy Penalties
Electrical Loads
Figure 2 provides a basic reference to the electricity costs and consumptions 16.
Energy numbers reflect the difference between power out and power in. Percent reflects the percent of IT Critical Equipment Load. Cost reflects costs at $0.10 per kWh. For a 1MW critical load, these costs would be for one hour. Miscellaneous Loads would be a number that makes the sum of the numbers above it equal the Total Electrical Load measured. All loads should be measured simultaneously.
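The bookkeeping described for Figure 2 can be sketched as follows. The $0.10 per kWh rate and the rule for Miscellaneous Loads come from the text; the load names and values in the usage example are hypothetical. The DCiE function uses the commonly cited definition (IT equipment power divided by total facility power), which is an assumption here because the paper does not spell the formula out.

```python
# Sketch of the Figure 2 bookkeeping; all load values used with it are hypothetical.

RATE_PER_KWH = 0.10  # $/kWh, the rate stated for Figure 2


def load_table(it_load_kw, measured_loads_kw, total_load_kw, hours=1.0):
    """Return {name: (kW, percent of IT load, cost in $)} including Miscellaneous.

    Miscellaneous is the remainder that makes the itemized loads sum to the
    measured total, as described in the text.
    """
    loads = {"IT Critical Equipment": it_load_kw, **measured_loads_kw}
    loads["Miscellaneous"] = total_load_kw - sum(loads.values())
    return {
        name: (kw, 100.0 * kw / it_load_kw, kw * hours * RATE_PER_KWH)
        for name, kw in loads.items()
    }


def dcie_percent(it_load_kw, total_load_kw):
    """DCiE as commonly defined: IT equipment power / total facility power."""
    return 100.0 * it_load_kw / total_load_kw


if __name__ == "__main__":
    # Hypothetical simultaneous measurements for a 1 MW critical load.
    table = load_table(
        1000.0,
        {"UPS losses": 50.0, "Cooling": 500.0, "Lighting": 20.0},
        total_load_kw=1600.0,
    )
    for name, (kw, pct, cost) in table.items():
        print(f"{name:22s} {kw:8.1f} kW {pct:6.1f}% ${cost:7.2f}/h")
    print(f"DCiE: {dcie_percent(1000.0, 1600.0):.1f}%")
```

Expressing every load as a percent of the IT load, rather than of the total, keeps the comparison stable when the critical load itself grows.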
Figure 3 shows simplified measurements 17 at two sets of points from which DCiE 18 energy efficiencies can be calculated with accuracy close to (2-3% higher than) Figure 2. A challenge in Data Center engineering or reengineering is to understand how the electrical loads will be experienced and how to match those loads to the best efficiencies available. For existing Data Centers, the load data in Figure 2 would provide the data necessary to model the Data Center and make intelligent choices in selecting equipment replacements or retrofits. Improving the operational efficiencies of any of the critical load and support systems’ components could provide ripple savings from other system components. It is recommended that load/loss curves be acquired for all critical equipment to help in making intelligent choices.
UPS Systems
Critical facility 2N reliability scenarios require two UPS 19 modules operating in parallel with less than 50% load on each. Modern efficient UPS systems use additional power equal to about 5% of their rated power to charge batteries and condition and monitor electrical power. This additional power draw (about 5% of rated power) is nearly constant over the range of allowable UPS loads (See Figures 4 and 5) 20.
This fully loaded UPS system is about 95% efficient, while a UPS at 40% load would be about 89% efficient. Older and less efficient UPS systems use additional power equal to about 10% of their rated power to charge batteries and condition and monitor electrical power; fully loaded they would be about 91% efficient, while at 40% load they would be about 80% efficient. The 2N requirement to have two UPS modules operating in parallel with less than 50% load on each doubles the losses from 5% to 10%. On a 500 kW UPS system, this loss typically increases from 25 kW to 50 kW (to operate the additional UPS system to ensure reliability) and requires about 7 tons more cooling. Thus cooling 50 kW adds 20 kW to the non-critical electrical load.
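The efficiency figures above follow from treating the UPS loss as a fixed fraction of rated power, so that efficiency is output divided by output plus fixed loss. The sketch below reproduces them under that reading. The function names are illustrative, and the 3.517 kW per ton conversion is the standard definition of a ton of refrigeration rather than a figure from the paper.

```python
# Sketch of the fixed-loss UPS model implied by the text; names are illustrative.

KW_PER_TON = 3.517  # 1 ton of refrigeration = 12,000 Btu/h, about 3.517 kW


def ups_efficiency(load_fraction, loss_fraction_of_rating):
    """Efficiency = output / (output + fixed loss), both as fractions of rating."""
    return load_fraction / (load_fraction + loss_fraction_of_rating)


def ups_loss_kw(rated_kw, loss_fraction_of_rating, modules=1):
    """Total fixed loss for a number of identical modules running in parallel."""
    return rated_kw * loss_fraction_of_rating * modules


if __name__ == "__main__":
    for label, loss in (("modern", 0.05), ("older", 0.10)):
        for load in (1.0, 0.4):
            pct = round(100 * ups_efficiency(load, loss), 1)
            print(f"{label} UPS at {load:.0%} load: {pct}% efficient")
    # 2N: a second 500 kW module doubles the fixed loss.
    extra_loss_kw = ups_loss_kw(500, 0.05, modules=2) - ups_loss_kw(500, 0.05)
    print(f"extra loss {extra_loss_kw:.0f} kW, "
          f"about {extra_loss_kw / KW_PER_TON:.1f} tons of added cooling")
```

Because the loss is fixed, efficiency falls as load falls, which is why splitting a load across two lightly loaded 2N modules carries an energy penalty.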
During battery charging, batteries may release highly flammable or corrosive gas that must be monitored and removed by exhaust fans to ensure the safety and reliability of the battery systems. The make-up air for the air exhausted from the battery room may also need to be conditioned (heated or cooled) to provide proper temperatures for reliable battery operations. The safety monitor, exhaust fans and make-up air conditioning (heating or cooling) also add to the non-critical electrical load.
KVM and PDU
The KVM 21 switch allows a number of computers or servers to be monitored and managed at a single station. This highly efficient solution removes the need for a keyboard, video monitor and mouse for each computer or server, and consequently saves the energy and space those superseded components would require. The PDU 22 monitors and distributes power, typically from two power sources. If one power source fails or does not meet power quality requirements, the PDU is programmed to switch the power within 4 milliseconds to an alternate power source. The power consumption of KVM Switches, KVM controlled devices and PDUs (combined) could be as high as 2.5% of the critical load. They also add to the cooling load and non-critical electrical load.
Emergency or Standby Generators
Emergency or standby generators consume about 150% more energy than their power output to provide electricity during contingency and test operations. This additional energy use is primarily due to the inefficiencies of the combustion engine driving the generator and is generally wasted in engine cooling systems and through the exhaust system. Once the engine is started, it must be run until the engine oil and exhaust systems have purged all condensation resulting from the combustion gases. Failure to accomplish this reliability procedure (purging all condensation) will contribute to early engine failures due to acid corrosion 23.
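As a rough illustration of the generator figure above, the sketch below reads "about 150% more energy than their power output" as fuel energy input equal to 2.5 times the electrical output. That reading is an assumption, and the function names are illustrative.

```python
# Sketch of the generator energy balance under the stated reading of "150% more".

def generator_fuel_energy_kwh(electrical_output_kwh, extra_fraction=1.5):
    """Fuel energy consumed for a given electrical output."""
    return electrical_output_kwh * (1.0 + extra_fraction)


def generator_efficiency(extra_fraction=1.5):
    """Electrical output divided by fuel energy input."""
    return 1.0 / (1.0 + extra_fraction)


if __name__ == "__main__":
    # A one-hour test run at a hypothetical 1,000 kW output.
    print(generator_fuel_energy_kwh(1000.0))
    print(generator_efficiency())
```

On this reading, a little less than half of the fuel energy reaches the load, and the rest leaves through the engine cooling and exhaust systems.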
Additional fuel premiums and delivery charges may be incurred for small deliveries to maintain the minimum level of emergency generator run time required.
Lighting
Lighting is required only when people need to perform a function. This energy use principle may be applied to personal computer systems, printers and task lighting. The energy supplied for these purposes must also be removed as heat. Avoided electricity consumption equals avoided cooling and electricity costs. The most cost effective lighting energy savings come from shutting lights off when not needed. The next best is to control lights automatically through sensors and the BMS 24. Lighting efficiency retrofits in many applications can reduce lighting energy up to 40% 25 and improve lighting levels. LED retrofit kits 26 for exit lights not only save energy and operating costs but practically eliminate life safety issues with burned out bulbs in exit signs.
Cooling Loads 27
CRAC and ERAC
Computer room air conditioning (CRAC) units and electrical room air conditioning (ERAC) units move air through filters to clean it and through cooling coils to extract heat. The air flow is created by electrically powered fans which draw heated air from the room through the air conditioning units and push it into a cooling air plenum. The plenum delivers the higher pressure cooling air to lower pressure points in the plenum. Air flow designs generally treat air as an incompressible fluid. Constant velocity designs are common in ducted applications because every change in air direction or speed generates (static pressure) losses that must be compensated by more fan horsepower. Constant velocity unobstructed air flows require the least amount of fan energy. The amount of resistance to airflow (static pressure) is measured in inches of water gauge (inches wg) or water column inches (wc”). The system resistance depends upon three factors:
Too many CRAC Units immediately generate mega-losses at their output by ramming high velocity air into a perpendicular and practically infinite obstruction (the subfloor). Most of the velocity pressure that could be used for static regains 31 is immediately used up. Perforated tile air diffusers follow the same scheme as they quite effectively obstruct air flows and generate significant changes in direction and speed.
Energy conservation is achieved by lower velocity unobstructed air flows. Figure 6 illustrates a typical raised floor air flow design 32. The key to engineering energy efficiency is achieving necessary air flows at the lowest feasible static pressure (losses). All along the way that the air moves, it incurs losses.
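The link between static pressure and fan energy can be illustrated with the standard fan power relationship in inch-pound units, bhp = cfm x inches wg / (6356 x efficiency). This formula is general engineering practice rather than something taken from the paper, and the airflow, pressure and efficiency values below are hypothetical.

```python
# Standard I-P fan power relationship; not taken from the paper.

def fan_brake_horsepower(airflow_cfm, pressure_in_wg, fan_efficiency):
    """Brake horsepower = cfm x inches wg / (6356 x fan efficiency)."""
    if not 0.0 < fan_efficiency <= 1.0:
        raise ValueError("fan_efficiency must be in (0, 1]")
    return airflow_cfm * pressure_in_wg / (6356.0 * fan_efficiency)


if __name__ == "__main__":
    # Hypothetical CRAC fan: same airflow delivered against two system pressures.
    for pressure in (1.5, 1.0):
        bhp = fan_brake_horsepower(12000, pressure, 0.6)
        print(f"{pressure} in. wg -> {bhp:.2f} bhp")
```

At a fixed airflow and fan efficiency, fan power scales directly with the pressure the fan must overcome, so every avoidable loss in the air path is paid for continuously in fan energy.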
Figure 7 illustrates typical situations in existing raised floors driving CRAC inefficiencies 33. Friction losses occur in the air conditioning unit's filter and coils, then at the outlet into the plenum or raised floor. Air leaks, cables, conduits, piping, structural members and surface friction also create losses in the movement of the cooling air to its intended load and can severely reduce the amount of air available to cool critical loads. This is especially true for raised floors that are less than the height shown. CRAC and ERAC units use significant energy to move air. Additional fan energy and air pressure may be required to overcome system losses to move cooling air to cool the critical loads.
ERAC units are used in critical equipment rooms such as substation, switchgear and UPS rooms. Their primary purpose is to provide clean conditioned air to ensure critical equipment reliability. Dust, humidity and subsequent corrosion from outside air ventilation can contribute to early and unscheduled demise of the critical infrastructure equipment. This is also a concern for IT equipment on the raised floors. Corrosion is a significant cause of intermittent failures in data mechanical connections. Intermittent failures are very difficult and time consuming to resolve. Dust reduction and humidity control are additional reliability measures that have proven effective. CRAC and ERAC units also add to the cooling load and to the non-critical electrical load.
Chilled Water System
Cooling adds over 50% of the critical load to the non-critical electrical load. A challenge in Data Center engineering or reengineering is to understand how the cooling loads will be experienced and how to match those loads to the best system efficiencies available. Chillers should be targets for energy efficiency improvements since they can add 40% of the critical load to the non-critical electrical load. The main challenge with chillers is operating them at their best design efficiency.
Modern chillers can remain at near-peak efficiency and remain reliable at reduced chilled water flows when loads are less than the design peak. Modern chillers can operate at peak efficiency over a wide range of loads. If a chiller is over 15 years old or operates on CFC 34 refrigerants such as R11 35, a replacement chiller is a sound investment as it will significantly reduce energy losses 24/7/365 and eliminate the use of an ozone depleting refrigerant.
Pumping Systems
Pumps provide the work to move water through a piping system with enough energy to overcome system losses. The design challenge is to eliminate or reduce system losses. Energy conservation is achieved by lower velocity unobstructed water flows.
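The effect of system losses on pump energy can be illustrated with two standard relationships in US units: brake horsepower = gpm x head (ft) x specific gravity / (3960 x efficiency), and the pump affinity law under which power varies with the cube of speed. Neither formula is taken from the paper, and the flows, heads and efficiency below are hypothetical. The cube law holds for a system dominated by friction losses and overstates the savings where static lift is significant.

```python
# Standard pump relationships in US units; not taken from the paper.

def pump_brake_horsepower(flow_gpm, head_ft, pump_efficiency, specific_gravity=1.0):
    """Brake horsepower = gpm x head (ft) x specific gravity / (3960 x efficiency)."""
    if not 0.0 < pump_efficiency <= 1.0:
        raise ValueError("pump_efficiency must be in (0, 1]")
    return flow_gpm * head_ft * specific_gravity / (3960.0 * pump_efficiency)


def affinity_power_fraction(speed_fraction):
    """Affinity law: power varies with the cube of pump speed (friction-only system)."""
    return speed_fraction ** 3


if __name__ == "__main__":
    # Hypothetical chilled water loop: same flow, two levels of system head.
    for head_ft in (80.0, 60.0):
        print(f"{head_ft} ft -> {pump_brake_horsepower(1200, head_ft, 0.75):.1f} bhp")
    # Running a pump at 80% speed, as a variable speed drive would allow.
    print(f"power at 80% speed: {affinity_power_fraction(0.8):.0%} of full speed")
```

Lower head at the same flow reduces pump power in direct proportion, and modest speed reductions produce disproportionately large power savings.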
Figure 8 shows a typical pumping system designed for reliability and flexible operations. The engineering challenge is to design a pumping system that is inherently balanced 36 and can move water efficiently to meet a wide range of flow and pressure requirements. VFDs 37 on pumps reduce energy costs and improve reliability. Pumps and VFDs also add to the non-critical electrical load.
Chilled Water Pumps
Primary Chilled Water Pumps move water through filters for cleaning and chiller evaporators for chilling, then move chilled water through strainers and through cooling coils to extract heat. Secondary Chilled Water Pumps are often used to boost water flow through CRAC units and cooling coils, and sometimes Return Pumps are used to boost the return water flow to the chiller system. Thermal energy losses or gains are controlled or reduced by proper application and installation of piping insulation. Leaks are also correctable energy and water losses.
Condenser Water Pumps
Condenser Water Pumps, sometimes called Cooling Tower Pumps, draw water from the cooling tower basin(s) through strainer(s) and pump it through chiller condensers to extract heat from the compressed refrigerant. The water leaves the chiller under pressure to flow back to the cooling tower’s headers or collection basins, which uniformly distribute water over the evaporative heat exchangers (tower fill) and down to the basin. Pumping energy is required to draw water from the basin, overcome strainer and piping losses and losses through the chiller, and lift the water up and over the tower.
Cooling Towers