
SRE Foundation Certification (CSREF)

About Certification

GSDC's SRE (Site Reliability Engineer) Foundation Certification is a roadmap to the principles and practices that allow an organization to scale critical services reliably and economically. The course content covers the evolution of SRE and its future direction, and equips participants with the practices, methods, and tools to engage people across the organization who are involved in reliability and stability, illustrated through real-life scenarios and case stories. On completing the certification, participants will have tangible takeaways to apply back in the office, such as understanding, setting, and tracking Service Level Objectives (SLOs).

  • The certification draws on key SRE sources, engagement with thought leaders in the SRE space, and work with organizations embracing SRE to extract real-life best practices. It aims to spread knowledge of the key principles and practices needed to start SRE adoption.

 

Objectives

The objectives of the SRE Foundation include a deep understanding of:

  1. The history of SRE and its emergence at Google
  2. The inter-relationship of SRE with DevOps and other popular frameworks
  3. The underlying principles behind SRE
  4. Service Level Objectives (SLOs) and their user focus
  5. Service Level Indicators (SLIs) and the modern monitoring landscape
  6. Error budgets and the associated error budget policies
  7. Toil and its effect on an organization's productivity
  8. Some practical steps that can help to eliminate toil
  9. Observability as something to indicate the health of a service
  10. SRE tools, automation techniques and the importance of security
  11. Anti-fragility, our approach to failure and failure testing
  12. The organizational impact that introducing SRE brings
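
To make the relationship between SLIs, SLOs, and error budgets (items 4 to 6 above) concrete, here is a minimal Python sketch. It is not taken from the course material: the class name `SloWindow`, its fields, and the request counts are hypothetical, and it assumes a simple request-based availability SLI measured over one window.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Request counts for one SLO measurement window (hypothetical data)."""
    total_requests: int
    failed_requests: int
    slo_target: float  # e.g. 0.999 means 99.9% of requests should succeed

    def sli(self) -> float:
        """Availability SLI: fraction of requests that succeeded."""
        if self.total_requests == 0:
            return 1.0  # assumption: no traffic counts as fully available
        good = self.total_requests - self.failed_requests
        return good / self.total_requests

    def error_budget(self) -> float:
        """Number of failed requests the SLO tolerates in this window."""
        return (1.0 - self.slo_target) * self.total_requests

    def budget_remaining(self) -> float:
        """Unspent error budget; a negative value means it is overspent."""
        return self.error_budget() - self.failed_requests

    def slo_met(self) -> bool:
        """True when the measured SLI is at or above the SLO target."""
        return self.sli() >= self.slo_target


if __name__ == "__main__":
    window = SloWindow(total_requests=1_000_000,
                       failed_requests=400,
                       slo_target=0.999)
    print(f"SLI: {window.sli():.4%}")
    print(f"Error budget (requests): {window.error_budget():.0f}")
    print(f"Budget remaining: {window.budget_remaining():.0f}")
    print(f"SLO met: {window.slo_met()}")
```

An error budget policy would then decide what happens when `budget_remaining()` approaches zero or goes negative, for example pausing feature releases until reliability recovers.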

 

Target Audience


Anyone starting or leading a move towards increased reliability

Anyone interested in modern IT leadership and organizational change approaches

Business Managers

Business Stakeholders

IT Team Leaders

System Integrators

Change Agents

Consultants

DevOps Practitioners

IT Directors, IT Managers

Tool Providers

Product Owners

Scrum Masters

Software Engineers

Site Reliability Engineers

 

Benefits

After the completion of this certification, participants will be able to:

Communicate with other engineers, product owners, and customers and come up with targets and measures.

Introduce error budgets in order to measure risk, balance availability and feature development.

Automate tasks that require a human operator to work manually.

Understand the systems and their connectivity.

Discover the problems early to reduce the cost of failure.

 

Pre-requisites

There are no formal pre-requisites for the SRE Foundation Certification, but IT experience and working knowledge of DevOps are recommended.

 

Examination

Ensure that you have filled in your basic details.
The exam consists of 40 multiple-choice questions.
Candidates need to score at least 65% of the total marks (i.e., 26 out of 40) to pass.
The exam lasts 90 minutes.
Candidates should select only one answer for each multiple-choice question.
There is no negative marking in this examination.
If a participant does not achieve the passing score, they are granted a second attempt at no additional cost. The re-examination can be taken up to 30 days from the date of the first attempt.

 


 

Exam Syllabus

1. SRE Overview

  • Introduction
  • The Production Environment From the Viewpoint of an SRE
    • Exercise: Mapping Your Production Environment

2. Principles of SRE

  • Embracing Risk
    • Managing Risk
    • Measuring Service Risk
    • Risk Tolerance of Services
    • Motivation for Error Budgets
  • Service-Level Objectives
    • Service Level Terminology
    • Indicators in Practice
    • Objectives in Practice
    • Agreements in Practice
  • Eliminating Toil
  • Monitoring Distributed Systems
    • Why Monitor?
    • The Four Golden Signals
    • Worrying About Your Tail
    • Choosing an Appropriate Resolution for Measurements
    • As Simple as Possible, No Simpler
    • Tying These Principles Together
    • Monitoring for the Long Term
  • The Evolution of Automation
    • The Value of Automation
    • The Value for SRE
    • Use Cases for Automation
    • Automate Yourself Out of a Job
    • Soothing the Pain: Applying Automation to Cluster Turnups
    • Borg: Birth of the Warehouse-Scale Computer
    • Reliability is the Fundamental Feature
  • Release Engineering
    • The Role of a Release Engineer
    • Philosophy
    • Continuous Build and Deployment
    • Configuration Management
  • Simplicity
    • System Stability Versus Agility
    • The Virtue of Boring
    • I Won't Give Up My Code!
    • The "Negative Lines of Code" Metric
    • Minimal APIs
    • Modularity
    • Release Simplicity

3. Practices of SRE

  • Practical Alerting
  • Being On-Call
  • Effective Troubleshooting
  • Emergency Response
  • Managing Incidents
  • Postmortem Culture: Learning from Failure
  • Tracking Outages
  • Testing for Reliability
  • Software Engineering in SRE
  • Load Balancing at the Front End
  • Load Balancing in the Datacenter
  • Handling Overload
  • Addressing Cascading Failures
  • Managing Critical State: Distributed Consensus for Reliability
  • Distributed Periodic Scheduling with Cron
  • Data Processing Pipelines
  • Data Integrity: What You Read Is What You Wrote
  • Reliable Product Launches at Scale

4. Management in SRE

  • Accelerating SREs to On-Call and Beyond
  • Dealing with Interrupts
  • Embedding an SRE to Recover from Operational Overload
  • Communication and Collaboration in SRE
  • The Evolving SRE Engagement Model



295 Turnpike Rd block 519, Westborough, MA 01581, USA
Hohenstieglen 6, 8152 Glattbrugg, Switzerland +41 41444851189