Remote Systems Engineer III
- Location: , NC
- Start Date: 2/9/2024
- Job ID: 24-00091
- Posting Date: 2/9/2024
- Job Type: Contract
What does a day look like?
Sample Day 1:
The day starts with a quick run through email to see if any of the teams you partner with have questions about the monitoring & observability you guided them in setting up. From there it's on to the daily Scrum call to give the team a quick update on where you stand. It's now 9:15 AM EST and you're ready to work on your tasks, which all relate to adding observability to Product X by integrating it into Datadog (tagging, base agent installation, integration setup, etc.). After lunch you have a meeting with the team that owns Product X, where you run them through the basics of Datadog while showing them their product data (from QA). After that you're back to getting the JMX integration configured and working in the QA environment. Once done, you look at the out-of-the-box dashboard for JMX and notice it is missing some key information, so you copy it and add a few metrics. Now it's almost the end of the day, so you take a quick peek at the tasks you have coming up. It looks like, with JMX done, you'll move on to adding tracing to the application tomorrow, then work with the team next week to configure their monitors (and validate their ServiceNow alert mapping & routing).
Sample Day 2:
The day starts with a quick run through email to see if any of the teams you partner with have questions about the monitoring & observability you guided them in setting up. From there it's on to the daily Scrum call to give the team a quick update on where you stand. It's now 9:15 AM EST and you're ready to work on your tasks, which are focused on how we can monitor Apache Tomcat. You research online what the key metrics / events / logs / traces to watch for are. Then you peek at a few instances already in our observability tool to see if that information "holds water". As you look, you keep the four golden signals in mind to ensure you have coverage across them.
Responsibilities:
• Implement and enhance monitoring of the hardware & software across our ecosystem.
o Developing and improving instrumentations/integrations.
o Providing guidance on monitoring best practices.
o Providing guidance on monitoring specific hardware & software items (key points to monitor).
• Implement and enhance observability of hardware & software across our ecosystem.
o Developing and improving instrumentation.
o Providing guidance on key areas to observe.
o Educating teams on how observability tools work.
• Ensure our internal customers have the best monitoring & observability possible, helping them raise the quality, reliability & availability of corporate IT infrastructure.
• Scripting / Infrastructure as Code for monitoring & observability implementations & enhancements.
Requirements:
• Engineering degree or equivalent experience and familiarity with engineering best practices.
• Working knowledge of how hardware & software interact in a corporate retail environment.
• Deeper knowledge in one or more of the following domains of hardware/software:
o Application Servers (IIS, Tomcat, WebSphere, JBoss, etc.)
o Containerization (Kubernetes, VMware, etc.)
o Database (SQL Server, Postgres, DB2, Oracle, etc.)
o Message Bus (IBM MQ, Kafka, ActiveMQ, RabbitMQ)
o Networking (Cisco ACI, F5 Load Balancers, Firewalls, etc.)
o Operating Systems (Red Hat, Windows, etc.)
o Programming (Java, .NET, Python, etc.)
o Storage Devices
o Web Servers (Apache, NGINX, etc.)
• Familiarity with the Agile Scrum process.
• Ability to interact with a variety of personalities and technical skill levels across multiple product & platform teams.
• Proficient in developing and maintaining technical documentation.
Nice to haves:
• Experience with:
o ServiceNow Event Management / Service Operations Workspace
• Knowledge of the Google Site Reliability Engineering (SRE) model.
• Experience with Infrastructure as Code / Configuration Management tools:
• Skills in troubleshooting production environments (this is not a day-to-day responsibility of this role, but the experience will prove valuable as we build the tools those teams use).
• Strong ownership attitude / track record of taking responsibility.