Service Level Expectations
The infrastructure team does not have any formal agreement or contract regarding the availability of its different services. However, we do try our best to keep services running, and as a result, you can have some expectations as to what we will do to that end.
Primary Business Hours
Fedora Infrastructure is a community team, involving volunteers as well as people employed by Red Hat to work on Fedora. However, despite the help of volunteers, primary business hours are mostly aligned with the work schedule of Red Hat. Normal hours are Monday through Friday, 1400 UTC to 2300 UTC, with US national holidays and a two-week end-of-year closure affecting staffing and response times.
Support outside of primary business hours is provided on call and depends on the availability of staff.
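As a rough illustration of the primary hours described above, the following sketch checks whether a given moment falls inside the Monday through Friday, 1400 UTC to 2300 UTC window. The function name `is_primary_business_hours` is hypothetical, the end of the window is treated as exclusive (an assumption), and US national holidays and the end-of-year closure are not modeled.

```python
from datetime import datetime, time, timezone

# Primary hours as stated in this document: Monday through Friday, 1400 to 2300 UTC.
PRIMARY_START = time(14, 0)
PRIMARY_END = time(23, 0)  # treated as exclusive (assumption)


def is_primary_business_hours(moment: datetime) -> bool:
    """Return True if `moment` falls within primary business hours.

    Holidays and the end-of-year closure are NOT taken into account.
    """
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    utc = moment.astimezone(timezone.utc)
    is_weekday = utc.weekday() < 5  # Monday is 0, Friday is 4
    return is_weekday and PRIMARY_START <= utc.time() < PRIMARY_END
```

A reporter could use a check like this to decide whether to expect a primary-hours response or an on-call one. A real implementation would also need a holiday calendar.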
Roles and Responsibilities
Fedora Infrastructure to Community
-
To have staff present and available in appropriate IRC channels to answer questions during primary hours.
-
To have particular staff on-call during off hours so that community members can contact them about Critical and Important services.
-
Interact with community members with respect and courtesy.
-
Work with community members to get accurate and thorough documentation of incidents, problems, or feature requests.
-
Resolve reported problems as soon as acknowledged if possible.
-
Clearly communicate estimated resolution times.
-
Move items which cannot be resolved within a reasonable time to future feature requests, or close them out.
Community Members to Fedora Infrastructure
-
Provide full and detailed reports of the problem or requested service.
-
Provide clear and complete contact information and times when available.
-
Leave alternative contacts who can also be available in case of vacation or other emergencies.
-
When contacted by Fedora IT, respond within 5 business days.
Fedora Infrastructure to Fedora Infrastructure
-
Have a clear schedule of reachable hours.
-
Set and take regular vacation time to be rested.
-
Rotate through days on-call in IRC and tickets.
-
If adding a new service, be available outside of normal business hours to help debug problems.
-
Follow procedures and checklists when adding or updating services.
-
Help with regular audits of the documentation.
Definition of Service Priorities
The general design of service priorities is that of concentric circles, where items rely on services in their own circle or a circle below them.
-
Critical services are ones which Fedora Infrastructure will work on 24x7 with a 52 week coverage if an unplanned outage occurs. Services will be configured to be highly available with an estimated planned/unplanned uptime of 95%. Response time should be within 1 hour.
-
Important services are ones which Fedora Infrastructure will work to be available 24x7 with a 50 week coverage. Response times during primary hours should be 1 hour, and outside of it should be 4 hours.
-
Normal services are ones which Fedora Infrastructure will work to keep available during primary work hours. Problems outside of these hours will be looked at as people are available. The services may be available outside of these hours but are of a lower priority than important services.
-
Low priority services are ones which are not critical or important for the primary function of Fedora Infrastructure. They will be worked on and looked at during primary business hours. Response times should be within 1 business day.
-
Third Party services are ones which Fedora Infrastructure has outsourced tools and services to. Uptimes, service hours, and coverage are dictated by the third party. Depending on the type of problem, Fedora Infrastructure will act as an intermediary, or in the case of tools like retrace and COPR, direct the user to talk with the service owners.
-
Deprecated services are ones which Fedora Infrastructure is no longer putting resources into. This may be because the project has completed its mission, the upstream software is dead, or the original reasons for the product no longer apply. Problems with these services will be looked at during primary business hours. Responses may be mostly "Will Not Fix".
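The response-time targets stated in the list above can be summarized as a small lookup table. This is only a sketch: the names `Priority`, `RESPONSE_TARGETS`, and `response_target` are hypothetical, `None` marks a priority for which the text states no response-time target, and "1 business day" is approximated as 24 hours without any weekend or holiday logic.

```python
from datetime import timedelta
from enum import Enum
from typing import Optional


class Priority(Enum):
    CRITICAL = "critical"
    IMPORTANT = "important"
    NORMAL = "normal"
    LOW = "low"
    THIRD_PARTY = "third_party"
    DEPRECATED = "deprecated"


# Each entry is (target during primary hours, target outside primary hours).
# None means this document states no response-time target for that case.
RESPONSE_TARGETS = {
    Priority.CRITICAL: (timedelta(hours=1), timedelta(hours=1)),
    Priority.IMPORTANT: (timedelta(hours=1), timedelta(hours=4)),
    Priority.NORMAL: (None, None),
    # "1 business day" approximated as 24 hours (assumption).
    Priority.LOW: (timedelta(days=1), timedelta(days=1)),
    Priority.THIRD_PARTY: (None, None),
    Priority.DEPRECATED: (None, None),
}


def response_target(priority: Priority, in_primary_hours: bool) -> Optional[timedelta]:
    """Return the stated response-time target, or None if none is stated."""
    primary, off_hours = RESPONSE_TARGETS[priority]
    return primary if in_primary_hours else off_hours
```

For example, `response_target(Priority.IMPORTANT, in_primary_hours=False)` yields a four-hour target, matching the Important entry above.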
Limitations on Support
-
Some services that are associated with Fedora are provided by third parties. Changes and outages which affect them are outside the control of Fedora Infrastructure.
-
Fedora Infrastructure will prioritize issues and requests that affect multiple people or teams over those that affect a smaller group or an individual.
-
Fedora Infrastructure has limited budget and hours. Requests and features will be prioritized to fit within those.
-
Fedora Infrastructure is bound by the laws and regulations of the United States of America. This means that certain requests, changes and problems are outside the ability of members to deal with.
Glossary
-
Planned outage: A planned outage is one that is announced sufficiently ahead of time to allow most users to plan around it.
-
Unplanned outage: An outage that occurs suddenly without proper allowance for users to plan around it.
-
Scheduled outage: An outage which has been scheduled to occur, but may not have been announced with enough time for users to plan around it.
-
High Availability: Systems are available during specified operating hours with any unplanned outages 'masked' by other tools.
-
Continuous Operations: Systems are available 24 hours a day, 7 days a week, with no scheduled outages. Unplanned outages are possible during this time.
-
Continuous Availability: Systems or applications are available 24x7 with no planned or unplanned outages. This is a combination of high availability and continuous operations.
-
Level of availability:
Percentage | Max outage time per day |
---|---|
90% | 144.0 minutes |
95% | 72.0 minutes |
99% | 14.4 minutes |
99.9% | 1.4 minutes |
-
Committed Hours of Availability: Hours that an organization will have staff available to help deal with issues with systems, services, and applications. Also known as "Regular Business Hours"
-
Outage Hours: Total number of hours of outage considered normal for calculating achieved availability.
-
Response Time: The time between the user's notification of the problem and when the help desk will begin to work on that problem.
-
Resolution Update: The frequency of updates to tickets.
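The per-day figures in the "Level of availability" entry of the glossary follow from simple arithmetic: a day has 1440 minutes, and the allowed outage is the share of the day not covered by the availability percentage. A minimal sketch, with the hypothetical function name `max_outage_minutes_per_day`:

```python
MINUTES_PER_DAY = 24 * 60  # 1440


def max_outage_minutes_per_day(availability_percent: float) -> float:
    """Return the maximum outage per day, in minutes, for a given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return MINUTES_PER_DAY * (100 - availability_percent) / 100


if __name__ == "__main__":
    # Print the outage budget for the availability levels listed in the glossary.
    for level in (90, 95, 99, 99.9):
        print(f"{level}%: {max_outage_minutes_per_day(level):.1f} minutes")
```

Under this calculation, the 95% uptime estimate given for Critical services corresponds to 72 minutes of outage per day.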
Estimated Time of Resolution:
By priority Levels:
-
Emergency: Problems which are site wide, and affect the core functions of the project.
-
Urgent: Problems which affect multiple functions and groups in the project.
-
Normal: Problems which prevent a single user from performing needed duties.
-
Low: A request for service, instruction, or information that has no immediate impact on services.