Events: Data- och informationsteknik
http://www.chalmers.se/sv/om-chalmers/kalendarium
Current events at Chalmers University of Technology
Thu, 30 Nov 2017 17:15:49 +0100

https://www.chalmers.se/sv/institutioner/cse/kalendarium/Sidor/Thesis-Defence-Alirad-Malek.aspx
Alirad Malek, Data- och informationsteknik
<p>EB, lecture hall, EDIT trappa C, D and H, EDIT</p><p>Advances on Adaptive Fault-Tolerant System Components: Micro-processors, NoCs, and DRAM</p><br />The adverse effects of technology scaling on the reliability of digital circuits have made the use of fault tolerance techniques more necessary in modern computing systems. Digital designers continuously search for efficient techniques to improve reliability while keeping the imposed overheads low. However, unpredictable changes in system conditions, e.g. available resources, working environment or reliability requirements, can have a significant impact on the efficiency of a fault-handling mechanism. <br /><br />In light of this problem, adaptive fault tolerance (AFT) techniques have emerged as a flexible and more efficient way to maintain the reliability level by adjusting to new system conditions. Aside from this primary application of AFT techniques, this thesis suggests that adding adaptability to hardware components provides the means to achieve a better trade-off between achieved reliability and incurred overheads. On this account, hardware adaptability is explored on three main components of a multi-core system, namely micro-processors, Networks-on-Chip (NoCs) and main memories. <br /><br />In the first part of this thesis, a reliable micro-processor array architecture is studied which can adapt to permanent faults. The architecture supports a mix of coarse- and/or fine-grain reconfiguration.
To this end, the micro-processor is divided into smaller substitutable units (SUs) which are connected to each other using reconfigurable interconnects. Then, a design-space exploration of such an adaptive micro-processor array is presented to find the best trade-off between reliability and its overheads, considering different granularities of SUs and reconfiguration options. Briefly, the results reveal that the combination of fine- and coarse-grain reconfiguration offers up to 3× more fault tolerance with the same overhead compared to simple processor-level redundancy. <br /><br />The second part of this thesis presents RQNoC, a service-oriented NoC that can adapt to permanent faults. Network resources are characterized based on the particular service they support and, when faulty, they can be bypassed through two options for redirection: service merging (SMerge) and/or service detouring (SDetour). While SDetour keeps lanes of different services isolated, at the cost of longer paths, SMerge trades service isolation for shorter paths and higher connectivity. Different RQNoC configurations are implemented and evaluated in terms of network performance, implementation results and reliability. <br /><br />Concisely, the evaluation results show that, compared to the baseline network, SMerge maintains at least 90% of the network connectivity even in the presence of 32 permanent network faults, which is more than double that of SDetour, but imposes 51% more area, 27% more power and a 9% slower clock. Finally, the last part of this thesis presents a fault-tolerant scheme for DRAM memories that enables a trade-off between DRAM capacity and fault tolerance. We introduce Odd-ECC DRAM mapping, a novel mechanism to dynamically select Error-Correcting Codes (ECCs) of different strengths and overheads for each allocated page of a program in main memory.
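As a minimal sketch of the per-page idea (scheme names and overhead numbers below are illustrative assumptions, not taken from the thesis), selecting a different ECC strength per page amounts to picking the cheapest code that still meets each page's protection requirement:

```python
# Illustrative sketch of per-page ECC selection in the spirit of Odd-ECC.
# Scheme names and overhead figures are made up for the example.

# scheme name -> (correctable bit errors, capacity overhead as a fraction)
ECC_SCHEMES = {
    "none":     (0, 0.000),
    "sec-ded":  (1, 0.125),   # e.g. 8 check bits per 64 data bits
    "chipkill": (4, 0.250),   # stronger code, larger overhead (illustrative)
}

def pick_scheme(required_correctable_bits):
    """Choose the cheapest scheme that meets the page's protection need."""
    for name, (t, _) in sorted(ECC_SCHEMES.items(), key=lambda kv: kv[1][1]):
        if t >= required_correctable_bits:
            return name
    raise ValueError("no scheme strong enough")

def capacity_overhead(pages):
    """Average ECC capacity overhead over allocated pages.
    pages: list of required correctable bits, one entry per page."""
    total = sum(ECC_SCHEMES[pick_scheme(r)][1] for r in pages)
    return total / len(pages)

# A program whose pages mostly need weak protection pays far less capacity
# than a flat scheme that applies the strongest code to every page:
mixed = [0, 0, 0, 1, 1, 4]   # per-page requirements (hypothetical)
avg = capacity_overhead(mixed)   # ~0.083, vs 0.250 for flat "chipkill"
```

The point of the sketch is only the trade-off itself: a flat scheme must be sized for the most demanding page, while per-page selection pays the strong-code overhead only where it is needed.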
Odd-ECC is applied to memory systems that use conventional 2D as well as 3D-stacked DRAMs and is evaluated using various applications. Our experiments show that compared to flat memory protection schemes, Odd-ECC reduces ECC capacity overheads by up to 39% while achieving the same Mean Time to Failure (MTTF).

https://www.chalmers.se/sv/institutioner/cse/kalendarium/Sidor/Thesis-Defence-Jacob-Lidman.aspx
Jacob Lidman, Data- och informationsteknik
<p>EA, lecture hall, EDIT trappa C, D and H, EDIT</p><p>Program Analysis for Performance and Reliability</p>The increased demand for computing power has led designers to put an ever-increasing number of cores on processor dies. This advance has been made possible through miniaturization and efficiency improvements in the underlying semiconductor technology. As a by-product, however, the resulting computer systems are more vulnerable to interference. This has made reliability a first-order concern, which is addressed in both software and hardware through some form of redundancy. Redundancy is, however, detrimental to performance, leading to more resources spent re-computing. Efficient use of hardware requires software that can take advantage of the computer system. Compilers are responsible for translating high-level source code into efficient machine code. Transformations in the compiler can improve the performance and/or reliability of the software. Prior to applying such a transformation, the compiler needs to verify the legality and benefit of the optimization through program analysis. This thesis develops program analyses for reasoning about performance and reliability properties and shows how these synthesize information that could not be made available by previous approaches.<br /><br />First, I present an analysis based on abstract interpretation to determine the impact of a finite number of faults.
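As a toy illustration of this style of reasoning (a sketch only, not the thesis's actual framework), an interval abstract domain can soundly bound a computation's result when an input may have been perturbed by a fault of known magnitude:

```python
# Toy interval abstract interpretation: soundly over-approximate the set of
# results y = x*x + x can take when at most one fault may perturb the input x.
# The program, domain, and fault model here are illustrative assumptions.

class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        ps = [self.lo * other.lo, self.lo * other.hi,
              self.hi * other.lo, self.hi * other.hi]
        return Interval(min(ps), max(ps))

    def join(self, other):
        # Least upper bound: covers the behaviours of both scenarios.
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

def analyse(fault_magnitude):
    x_ok = Interval(3, 3)                                   # fault-free: x = 3
    x_bad = Interval(3 - fault_magnitude, 3 + fault_magnitude)  # perturbed x
    x = x_ok.join(x_bad)      # "at most one fault": merge both scenarios
    y = x * x + x             # abstractly execute y = x*x + x
    return (y.lo, y.hi)
```

With no fault, `analyse(0)` collapses to the exact result (12, 12); with a perturbation of up to ±1, `analyse(1)` returns the sound bound (6, 20). Soundness by construction means the true result always lies inside the returned interval, which is what makes this domain usable for deducing fault susceptibility.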
An analysis based on abstract interpretation guarantees logical soundness by construction, and I evaluate its applicability by deducing the fault susceptibility of kernels and how a program optimization affects reliability.<br /><br />Second, I present the fuzzy program analysis framework and show that it admits a sound approximation in the abstract interpretation framework. Fuzzy sets allow non-binary membership and, by extension, a qualitative static program analysis that can perform common-case analyses. Furthermore, this framework admits a dynamic analysis based on fuzzy control theory that refines the result of the static analysis online. Using the framework, I show improvements on a code motion algorithm and several classical program analyses that target performance properties.<br /><br />Third, I present an analysis based on geometric programming for deciding the minimal number of redundant executions of a program statement while maintaining a reliability threshold. Often a fixed number of redundant executions per statement is employed throughout the whole program. To minimize performance overhead, I exploit the fact that some statements are naturally more reliable, and more costly, than others. Using the analysis, I show improvements in reliability and performance overhead due to the use of a redundancy level that is tailored to each statement individually.<p></p>

https://www.chalmers.se/sv/styrkeomraden/ikt/kalendarium/Sidor/Initiative-Seminar-2018-Digitalisation.aspx
Initiative seminar on Digitalisation: Security & Privacy | Machine Intelligence
<p>RunAn, conference hall, Kårhuset, Campus Johanneberg</p><p>Save the date: 15 March 2018</p><span class="text-normal page-content">At next year’s initiative seminar on Digitalisation we focus on the two themes Security/Privacy and Machine Intelligence. More information will follow.</span>