NSF CNS-1527051
CAR: Comprehensive Algorithmic Resilience
Resources
Hardware and Testbed
Server type-1 (2 units)
Dell PowerEdge R420
2 x Intel® Xeon® CPU E5-2430
96 GB DDR3 memory
Mellanox Technologies MT27500 Family [RDMA-Enabled NIC]
CentOS release 6.9
Server type-2 (3 units)
Dell PowerEdge R430
2 x Intel® Xeon® CPU E5-2620 v4
32 GB DDR3 memory
Mellanox Technologies MT27500 Family [RDMA-Enabled NIC]
CentOS release 6.9
Server type-3 (9 units)
Dell PowerEdge R610
2 x Intel® Xeon® CPU L5630
32 GB DDR3 memory
Mellanox Technologies MT27500 Family [RDMA-Enabled NIC]
CentOS release 6.9
RDMA-enabled switch type-1 (1 unit)
Mellanox SX1012X Open Ethernet Switch
12 QSFP ports, 40/56GbE
Lowest latency: 220 ns for 40GbE
RDMA-enabled switch type-2 (1 unit)
Mellanox MSX1024B-1BFS-RF SwitchX-2 Open Ethernet Switch
48 SFP+ ports, 10GbE
12 QSFP ports, 40/56GbE
Lowest latency: 270 ns for 10GbE
Lowest latency: 220 ns for 40GbE
Software and Artifacts
REX uses adaptive incremental checkpointing (AIC), derived by:
modifying Berkeley Lab Checkpoint/Restart for Linux (BLCR) version 0.8.2 to handle incremental checkpointing; and
integrating a page-aligned delta compressor (Xdelta3-PA), a modified version of Xdelta3 (version 3.0y).
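As a rough illustration of page-aligned incremental checkpointing in general, the minimal Python sketch below stores only the pages that changed since the previous checkpoint and rebuilds the current image from them. It is not the BLCR or Xdelta3-PA code: the 4096-byte page size, the XOR-plus-zlib delta encoding, and the function names make_incremental and restore are assumptions chosen for illustration, and both memory images are assumed to have the same length.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size, for illustration only


def split_pages(image: bytes, page_size: int = PAGE_SIZE) -> list:
    """Cut a memory image into page-aligned chunks."""
    return [image[i:i + page_size] for i in range(0, len(image), page_size)]


def make_incremental(prev: bytes, curr: bytes, page_size: int = PAGE_SIZE) -> dict:
    """Return {page_index: compressed XOR delta} for pages that differ.

    Hypothetical helper: unchanged pages are skipped entirely, so the
    checkpoint size tracks the number of dirty pages, not the image size.
    """
    if len(prev) != len(curr):
        raise ValueError("this sketch assumes equal-length images")
    delta = {}
    pages = zip(split_pages(prev, page_size), split_pages(curr, page_size))
    for idx, (old, new) in enumerate(pages):
        if old == new:
            continue  # clean page: nothing to store
        xored = bytes(a ^ b for a, b in zip(old, new))
        delta[idx] = zlib.compress(xored)  # XOR of similar pages is mostly zeros
    return delta


def restore(prev: bytes, delta: dict, page_size: int = PAGE_SIZE) -> bytes:
    """Rebuild the current image from the previous image plus the delta."""
    pages = split_pages(prev, page_size)
    for idx, blob in delta.items():
        xored = zlib.decompress(blob)
        pages[idx] = bytes(a ^ b for a, b in zip(pages[idx], xored))
    return b"".join(pages)


if __name__ == "__main__":
    base = bytes(4 * PAGE_SIZE)      # previous checkpoint: four zeroed pages
    current = bytearray(base)
    current[5000] = 7                # dirty a single byte inside page 1
    inc = make_incremental(base, bytes(current))
    print(sorted(inc))               # indices of the pages that were stored
    print(restore(base, inc) == bytes(current))
```

Comparing and encoding at page granularity keeps each stored delta aligned with the unit the operating system uses to track modified memory, which is the general motivation for a page-aligned compressor.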
301 E. Lewis St., University of Louisiana at Lafayette, Lafayette, LA 70503