|
Characterizing the Effect of Thermal Stress on Reliability
|
|
|
- We leverage the Continuous System Telemetry Harness (CSTH) to quantify the thermal stress experienced by computer chips.
- CSTH:
  - Efficient infrastructure for collecting and analyzing time series data
  - Advanced pattern recognition tools for reliability surveillance, e.g. the Multivariate State Estimation Technique (MSET)
- Quantification of thermal stress can guide reliability testing at the manufacturing stage.
- Using CSTH, we can obtain a real-time metric for evaluating the remaining useful life of components. See our Length-of-Curve (LOC) metric.
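
As a rough illustration of how a curve-length metric could capture thermal cycling, the sketch below measures the arc length of a piecewise-linear temperature trace. This is an assumed definition chosen for illustration, not necessarily the LOC metric referred to above; the function name and the sample data are hypothetical, and a real metric would need to scale the time and temperature axes consistently.

```python
import math

def length_of_curve(times, temps):
    """Arc length of a piecewise-linear temperature trace.

    Assumed definition for illustration only: the sum of straight-line
    segment lengths between consecutive (time, temperature) samples.
    """
    if len(times) != len(temps):
        raise ValueError("times and temps must have the same length")
    total = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]      # elapsed time between samples
        d_temp = temps[i] - temps[i - 1]  # temperature change between samples
        total += math.hypot(dt, d_temp)
    return total

# Hypothetical traces: a steady component and one that cycles by 10 degrees.
steady = length_of_curve([0, 1, 2, 3], [60, 60, 60, 60])
cycling = length_of_curve([0, 1, 2, 3], [60, 70, 60, 70])
```

Under this definition a trace with frequent or large temperature swings accumulates a longer curve than a steady trace over the same interval, which is the kind of behavior a thermal stress metric is meant to expose.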
|
|
|
|
|
Optimizing the Thermal Profile
|
|
|
- Thermal management techniques typically trade off performance to lower temperature.
- Systems currently on the market consider only the maximum temperature reached.
- The frequency and magnitude of thermal hot spots and temperature gradients can be minimized cost-effectively through real-time monitoring (i.e. CSTH) and by adapting to changes in workload and cooling dynamics. See the Temperature Aware Scheduling technique we have developed.
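
To make the idea of adapting to thermal conditions concrete, here is a minimal sketch of one simple temperature-aware placement policy: assign the next job to the coolest idle core. It is a generic heuristic under assumed inputs, not the Temperature Aware Scheduling technique itself; `pick_core`, `core_temps`, and `busy` are hypothetical names.

```python
def pick_core(core_temps, busy):
    """Return the index of the coolest idle core, or None if all are busy.

    core_temps: latest temperature reading per core (hypothetical input,
                e.g. supplied by a telemetry layer).
    busy:       one boolean per core, True if the core is running a job.
    """
    idle = [i for i in range(len(core_temps)) if not busy[i]]
    if not idle:
        return None
    # Prefer the coolest idle core to avoid feeding an existing hot spot.
    return min(idle, key=lambda i: core_temps[i])

# Core 3 is the coolest but busy, so the job goes to core 1.
chosen = pick_core([70, 55, 62, 50], [False, False, False, True])
```

A policy like this spreads heat across the die instead of reacting only when a maximum temperature threshold is crossed.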
|
|
|
|
|
Estimating On-Die Temperature Accurately
|
|
|
- Estimates below the actual temperature delay the activation of thermal management, which increases packaging costs and degrades reliability.
- Estimates above the actual temperature trigger thermal management too early, which degrades performance.
- Our Accurate Temperature Estimation technique eliminates thermal sensor noise. It also provides temperature estimates at various locations on the chip based on temperature readings from a limited number of thermal sensors.
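
The two steps described above, noise reduction and estimation at unsensed locations, can be sketched with standard stand-in methods. The example below uses an exponential moving average for smoothing and inverse-distance weighting for spatial estimation. Both are swapped-in generic techniques, not the Accurate Temperature Estimation technique; all names, coordinates, and values are hypothetical, and smoothing only attenuates noise rather than eliminating it.

```python
def smooth(readings, alpha=0.2):
    """Exponential moving average over a sequence of sensor readings.

    A generic smoothing filter used here as a stand-in; alpha is an
    assumed tuning parameter in (0, 1].
    """
    if not readings:
        return []
    out = [readings[0]]
    for r in readings[1:]:
        out.append(alpha * r + (1 - alpha) * out[-1])
    return out

def estimate_at(point, sensors):
    """Inverse-distance-weighted temperature estimate at an (x, y) point.

    sensors: non-empty list of ((x, y), temperature) pairs, one per
    thermal sensor (hypothetical layout).
    """
    num = 0.0
    den = 0.0
    for (sx, sy), temp in sensors:
        d2 = (point[0] - sx) ** 2 + (point[1] - sy) ** 2
        if d2 == 0:
            return temp  # the point coincides with a sensor
        weight = 1.0 / d2
        num += weight * temp
        den += weight
    return num / den

# Two hypothetical sensors; the midpoint is weighted equally by both.
sensors = [((0.0, 0.0), 60.0), ((2.0, 0.0), 80.0)]
midpoint = estimate_at((1.0, 0.0), sensors)
```

Inverse-distance weighting ignores the chip's actual heat flow, so it should be read only as a placeholder for whatever thermal model a real estimator would use.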
|