DevOps Tools – Troubleshooting Guide

Topic: Jenkins, From Zero To Hero: Become a DevOps Jenkins Master
Source: jenkins/centos7/Dockerfile
Error: Step 2/7 : RUN yum -y install openssh-server
---> Running in 7e18f84a2754
CentOS Linux 8 - AppStream 391 B/s | 38 B 00:00
Error: Failed to download metadata for repo 'appstream': Cannot prepare internal mirrorlist: No URLs in mirrorlist
ERROR: Service 'remote_host' failed to build: The command '/bin/sh -c yum -y install openssh-server' returned a non-zero code: 1
Solution: FROM centos:centos7
Remark: The log shows the build ran on a CentOS Linux 8 image. CentOS Linux 8 reached end of life when the project moved to CentOS Stream, and its mirrorlist no longer returns any URLs. Pinning the base image to centos:centos7 avoids the failure.
Topic: Jenkins, From Zero To Hero: Become a DevOps Jenkins Master
Source: jenkins\docker-compose.yml
user: root
Error: jenkins | touch: cannot touch '/var/jenkins_home/copy_reference_file.log': Permission denied
Solution: docker exec -it jenkins bash
chown -R jenkins:root /var/jenkins_home/*
Remark: In docker-compose.yml, change user: root to #user: root.
Topic: Jenkins, From Zero To Hero: Become a DevOps Jenkins Master
Source: Jenkins server > Build with Parameters > Console Output
Error: [SSH] executing...
ERROR: Failed to authenticate with public key
com.jcraft.jsch.JSchException: invalid privatekey: [B@95a8cd7
Solution: Add Credentials > Kind: Username with password
Remark: Use this if Kind: SSH Username with private key does not work for you.
Topic: Jenkins, From Zero To Hero: Become a DevOps Jenkins Master
Source: Jenkins server > Build with Parameters > Console Output
Error: /tmp/script.sh $MYSQL_HOST $MYSQL_PASSWORD $DATABASE_NAME $AWS_SECRET_KEY $BUCKET_NAME
bash: line 6: /tmp/script.sh: Permission denied
Solution: In docker-compose.yml:
volumes:
  - "$PWD/aws-s3.sh:/tmp/script.sh"
Then, on the host:
chmod +x aws-s3.sh
Remark: Mounting the script as a volume keeps it outside the Docker container, so it survives container rebuilds, but remember to make it executable on the host.
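A bind-mounted file keeps the mode bits it has on the host, which is why the chmod has to happen there. The effect of the fix can be seen without Docker in a minimal sketch like the one below. It assumes a throwaway file in a temporary directory, and the script body is a placeholder, not the real aws-s3.sh.

```shell
# Work in a temporary directory so nothing on the host is touched.
workdir=$(mktemp -d)
script="$workdir/aws-s3.sh"

# Placeholder body (hypothetical; stands in for the real backup script).
printf '#!/bin/sh\necho "backup placeholder"\n' > "$script"

chmod 644 "$script"                          # the failing state: no execute bit
mode_before=$(ls -l "$script" | cut -c1-4)   # file type plus owner bits

chmod +x "$script"                           # the fix from the entry above
mode_after=$(ls -l "$script" | cut -c1-4)

echo "before: $mode_before"
echo "after:  $mode_after"

rm -rf "$workdir"
```

The fourth character of the mode string is the owner's execute bit. It should read x after the fix.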
Topic: Jenkins, From Zero To Hero: Become a DevOps Jenkins Master
Source: jenkins > docker-compose build
Error: /bin/sh: 1: python: not found
ERROR: Service 'jenkins' failed to build: The command '/bin/sh -c curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py && python get-pip.py --user && python -m pip install --user ansible' returned a non-zero code: 127
Solution: # Dockerfile
USER root
RUN apt-get update && apt-get install -y python3 python3-pip
RUN curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py && \
    python3 get-pip.py --user && \
    python3 -m pip install ansible
Remark: The tutorial videos are based on Python 2, and their pip commands for installing Ansible are outdated.
Topic: GitLab, CI/CD Getting Started
Source: GitLab > (your project) > CI/CD > Pipelines
Error: Preparing environment
Running on devops…
ERROR: Job failed: prepare environment: exit status 1. Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information
Solution: Comment out the last 3 lines of /home/gitlab-runner/.bash_logout:
# ~/.bash_logout: executed by bash(1) when login shell exits.
# when leaving the console clear the screen to increase privacy
#if [ "$SHLVL" = 1 ]; then
#[ -x /usr/bin/clear_console ] && /usr/bin/clear_console -q
#fi
Remark: Deleting the file works too.
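The same edit can be scripted rather than made by hand. The following is a minimal sketch, assuming the file has the stock layout shown above. It works on a copy in a temporary directory, and the sed range (from the line starting with "if " to the line "fi") is an assumption about that layout, so check the result before pointing it at the real /home/gitlab-runner/.bash_logout.

```shell
# Work on a copy; the real file is /home/gitlab-runner/.bash_logout.
workdir=$(mktemp -d)
logout_file="$workdir/.bash_logout"

cat > "$logout_file" <<'EOF'
# ~/.bash_logout: executed by bash(1) when login shell exits.
# when leaving the console clear the screen to increase privacy
if [ "$SHLVL" = 1 ]; then
    [ -x /usr/bin/clear_console ] && /usr/bin/clear_console -q
fi
EOF

# Prefix every line from the "if" line through the closing "fi" with "#".
# -i.bak edits in place and keeps the original as .bash_logout.bak.
sed -i.bak '/^if /,/^fi$/ s/^/#/' "$logout_file"

# Count what is left: lines that are still active, and lines that are comments.
active_lines=$(grep -c -v '^#' "$logout_file" || true)
commented_lines=$(grep -c '^#' "$logout_file" || true)

echo "active lines left: $active_lines"
cat "$logout_file"

rm -rf "$workdir"
```

Keeping the .bak copy makes it easy to restore the original if the runner still fails for another reason.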
Topic: GitLab, CI/CD chmod: unrecognized option '-----BEGIN'
Source: GitLab > (your project) > CI/CD > Pipelines > Deployment stage
Error: $ chmod og= $ID_RSA
chmod: unrecognized option '-----BEGIN'
Try 'chmod --help' for more information.
Cleaning up project directory and file based variables
ERROR: Job failed: exit code 1
Solution: CI/CD > Variables > Update variable
Change the Type drop-down to File
Remark: chmod must be applied to a file, not to the file's content. With Type set to File, $ID_RSA holds the path to a file containing the key instead of the key text itself.
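The difference between the two variable types can be reproduced locally. This is a minimal sketch under stated assumptions: the key text is a placeholder (not a real key), and the temporary file only imitates what the runner does for a File-type variable.

```shell
workdir=$(mktemp -d)

# Placeholder text standing in for a private key (not a real key).
key_text='-----BEGIN PLACEHOLDER KEY-----
not-a-real-key
-----END PLACEHOLDER KEY-----'

# Type "Variable": $ID_RSA holds the key text itself, so the unquoted
# expansion hands "-----BEGIN" to chmod as if it were an option.
ID_RSA=$key_text
if chmod og= $ID_RSA 2>/dev/null; then
    variable_type="ok"
else
    variable_type="failed"
fi

# Type "File": the value is written to a file and $ID_RSA holds its path.
ID_RSA="$workdir/ID_RSA"
printf '%s\n' "$key_text" > "$ID_RSA"
chmod 644 "$ID_RSA"
chmod og= "$ID_RSA"                          # same command as in the pipeline
file_mode=$(ls -l "$ID_RSA" | cut -c1-10)

echo "Variable type: chmod $variable_type"
echo "File type:     mode is now $file_mode"

rm -rf "$workdir"
```

chmod og= clears every permission for group and others, which is what ssh expects of a private key file.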
Topic: GitLab, CI/CD Load key "/builds/.username./my-proj.tmp/ID_RSA": invalid format
Source: GitLab > (your project) > CI/CD > Pipelines > Deployment stage
Error: $ ssh -p2288 -i $ID_RSA -o StrictHostKeyChecking=no $SERVER_USER@$SERVER_IP "docker login -u gitlab-ci-token -p $CI_BUILD_TOKEN $CI_REGISTRY"
Warning: Permanently added '[[MASKED]]:2288' (ECDSA) to the list of known hosts.
Load key "/builds/username/proj.tmp/ID_RSA": invalid format
Solution: user@server:~/.ssh$ ssh-keygen -p -m pem -f id_rsa
Remark: The key must be in PEM format (-----BEGIN RSA PRIVATE KEY-----) rather than OpenSSH format (-----BEGIN OPENSSH PRIVATE KEY-----).
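Before converting, it can help to confirm which format a key file is in. Here is a minimal sketch with a hypothetical helper, key_format, that looks only at the first line of the file. The two sample files contain nothing but the header lines quoted in the remark, so they are placeholders rather than usable keys.

```shell
# Report the container format of a private key from its first line.
# key_format is a hypothetical helper, not part of OpenSSH.
key_format() {
    case "$(head -n 1 "$1")" in
        "-----BEGIN OPENSSH PRIVATE KEY-----") echo "openssh" ;;
        "-----BEGIN RSA PRIVATE KEY-----")     echo "pem" ;;
        *)                                     echo "unknown" ;;
    esac
}

# Placeholder files holding only the header lines.
workdir=$(mktemp -d)
printf '%s\n' '-----BEGIN OPENSSH PRIVATE KEY-----' > "$workdir/id_rsa_new"
printf '%s\n' '-----BEGIN RSA PRIVATE KEY-----' > "$workdir/id_rsa_pem"

format_new=$(key_format "$workdir/id_rsa_new")
format_pem=$(key_format "$workdir/id_rsa_pem")

echo "id_rsa_new: $format_new"
echo "id_rsa_pem: $format_pem"

rm -rf "$workdir"
```

A key reported as openssh is the one to convert with the ssh-keygen command from the solution. That command rewrites the file in place, so keep a backup first.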
Topic: GitLab, CI/CD Permission denied (publickey,password)
Source: GitLab > (your project) > CI/CD > Pipelines > Deployment stage
Error: $ ssh -p2288 -i $ID_RSA -o StrictHostKeyChecking=no $SERVER_USER@$SERVER_IP "docker login -u gitlab-ci-token -p $CI_BUILD_TOKEN $CI_REGISTRY"
Warning: Permanently added '[[MASKED]]:2288' (ECDSA) to the list of known hosts.
Permission denied, please try again.
Permission denied, please try again.
user@[MASKED]: Permission denied (publickey,password).
Solution: user@server:~/.ssh$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Remark: The key pair was generated on the deployment server itself. Add its public key to that server's authorized_keys so the server accepts the matching RSA private key stored in the GitLab CI/CD variable.
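Running the cat command more than once appends the same key again each time. A slightly more defensive variant is sketched below, assuming a hypothetical helper named authorize_key and a placeholder public key. It adds the key only if that exact line is missing, then sets the permissions sshd normally expects. The demonstration uses a temporary directory in place of the real ~/.ssh.

```shell
# Append a public key to authorized_keys only if that exact line is not there yet.
# authorize_key is a hypothetical helper: $1 = public key file, $2 = .ssh directory.
authorize_key() {
    pub_key_file=$1
    ssh_dir=$2
    mkdir -p "$ssh_dir"
    touch "$ssh_dir/authorized_keys"
    # -F fixed string, -x whole line, -f read the pattern from the key file
    if ! grep -qxF -f "$pub_key_file" "$ssh_dir/authorized_keys"; then
        cat "$pub_key_file" >> "$ssh_dir/authorized_keys"
    fi
    chmod 700 "$ssh_dir"
    chmod 600 "$ssh_dir/authorized_keys"
}

# Demonstration in a temporary directory with a placeholder key (not a real one).
workdir=$(mktemp -d)
printf '%s\n' 'ssh-rsa AAAAPLACEHOLDER user@server' > "$workdir/id_rsa.pub"

authorize_key "$workdir/id_rsa.pub" "$workdir/.ssh"
authorize_key "$workdir/id_rsa.pub" "$workdir/.ssh"   # second run adds nothing

key_lines=$(grep -c 'ssh-rsa' "$workdir/.ssh/authorized_keys")
file_mode=$(ls -l "$workdir/.ssh/authorized_keys" | cut -c1-10)
dir_mode=$(ls -ld "$workdir/.ssh" | cut -c1-10)

echo "keys in authorized_keys: $key_lines"
echo "authorized_keys mode:    $file_mode"
echo ".ssh mode:               $dir_mode"

rm -rf "$workdir"
```

On the real server the call would be authorize_key ~/.ssh/id_rsa.pub ~/.ssh.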
Topic: GitLab, CI/CD stages: publish (passed) but deploy (did not happen)
Source: GitLab > (your project) > CI/CD > Pipelines > under Stages
Error: The subsequent deploy stage did not run even after the publish stage passed.
Solution: In .gitlab-ci.yml:
only:
  #- master
  - main
Remark: In March 2021, GitLab announced that the default branch for new projects would change from master to main. A job restricted to master never runs in a project whose default branch is main.
Topic: GitLab, Install a runner in CentOS 7
Source: $ gitlab-runner install --user=gitlab-runner --working-directory=/home/gitlab-runner
Error: FATAL: flag provided but not defined: -user
Solution: $ sudo yum install gitlab-runner
Remark: The GitLab Runner installation instructions used here work for Ubuntu but not for CentOS 7.
Topic: GitLab, Install a runner in CentOS 8 Stream
Source: $ gitlab-runner install --user=gitlab-runner --working-directory=/home/gitlab-runner
Error: sudo: gitlab-runner: command not found
Solution: $ whereis gitlab-runner
gitlab-runner: /usr/local/bin/gitlab-runner
$ sudo visudo
Modify: Defaults secure_path = /sbin:/bin:/usr/sbin:/usr/bin
To: Defaults secure_path = /sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin
Remark: sudo visudo is the dedicated command for editing /etc/sudoers. Under sudo, commands are looked up in secure_path rather than in the user's PATH, so /usr/local/bin must be listed there for sudo to find gitlab-runner.
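Whether a directory is covered by a given secure_path value can be checked before editing /etc/sudoers. This is a minimal sketch with a hypothetical helper, path_contains, applied to the two values from the solution above.

```shell
# Return success if directory $2 is one of the colon-separated entries in $1.
# path_contains is a hypothetical helper, not a standard command.
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

before="/sbin:/bin:/usr/sbin:/usr/bin"
after="/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin"

if path_contains "$before" /usr/local/bin; then found_before="yes"; else found_before="no"; fi
if path_contains "$after" /usr/local/bin; then found_after="yes"; else found_after="no"; fi

echo "/usr/local/bin in original secure_path: $found_before"
echo "/usr/local/bin in modified secure_path: $found_after"
```

Wrapping both the list and the directory in colons keeps a partial match such as /usr/local from counting as /usr/local/bin.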
Topic: Limited access (non-root) user: AWS API Error
Source: EC2 Dashboard
Error: [AWS EC2]
Instances (running) x API Error | Dedicated Hosts x API Error
Instances x API Error | Key pairs x API Error
$ terraform apply
Error: Error launching source instance: UnauthorizedOperation: You are not authorized to perform this operation. Encoded authorization failure message: bgJ7KtftIXVield6dlQyqxtJ
Solution: Delete and re-create the (non-root) user, then re-assign the required permissions policies.
Remark: A stackoverflow.com answer mentioned that the keys get revoked when AWS detects that the access key/secret key has been exposed or compromised.
Topic: AWS Application Load Balancer – 503 Service Temporarily Unavailable
Source: Terraform (main.tf) > AWS (EC2/ALB) > Browser
Error: http://terraform-asg-example-15464xxxxx.us-east-2.elb.amazonaws.com/
503 Service Temporarily Unavailable
Solution: # Find the aws_autoscaling_group resource in Terraform main.tf and add the target_group_arns line
resource "aws_autoscaling_group" "example" {
  target_group_arns = [aws_lb_target_group.asg.arn]
  ...
}

[Manual fix]
Go to AWS EC2 > click Target Groups > click affected group > Register targets > Check Instance ID (e.g. EC2) > Include as pending below > Register pending targets

Go to the browser and retry http://terraform-asg-example-15464xxxxx.us-east-2.elb.amazonaws.com/
Remark: The target group is created but contains no EC2 instances, hence HTTP Error 503 is returned.
Topic: Prometheus – err="opening storage failed: mmap files
Source: Prometheus on Docker > docker-compose up
Error: prometheus | level=error ts=2022-07-20T13:20:57.653Z caller=main.go:787 err="opening storage failed: mmap files, file: data/chunks_head/000476: mmap: invalid argument
Solution: [List all volumes]
$ docker volume ls -q
prometheus_prometheus-data
(the prometheus prefix is the Compose project name and prometheus-data is the volume)
[Delete prometheus volume only]
$ docker volume rm prometheus_prometheus-data
Remark: Deleting only data/chunks_head/000476 is another possible solution. However, because Prometheus runs in Docker and fails to start, $ docker exec -it prometheus bash returns Error: No such container: prometheus, so the file cannot be reached that way. Removing the volume loses historical data, but the configuration remains intact.
Topic: http port [8000] – port is already bound. Splunk needs to use this port.
Source: CentOS 8 | Splunk v9
Error: $ sudo /opt/splunk/bin/./splunk start
Checking http port [8000]: not available
ERROR: http port [8000] - port is already bound. Splunk needs to use this port.
Solution: $ sudo dnf install nmap
$ sudo nmap localhost
Not shown: 992 closed ports
PORT STATE SERVICE
22/tcp open ssh
8000/tcp open http-alt
Remark: netstat -an | grep 8000; fuser -k 8000/tcp; lsof -i TCP:8000; grep -rnw '/etc/httpd/conf.d/' -e '8000'. None of these commands found the PID associated with port 8000. Only nmap showed the port as open, reporting it as http-alt (alternate HTTP).
Topic: Remove 'already committed' .vscode directory from Git repository
Source: GitLab
Error: Many existing projects have committed and uploaded the sftp.json file, which stores server login details.
Solution: Create a .gitignore that keeps hidden folders and files from being pushed by adding:
.*
!/.gitignore

The step above is preventive. To remove a .vscode folder that is already in the Git repo, run the following in Git Bash (replace myFolder with the folder name, here .vscode):
git rm -r --cached myFolder
Finally, commit and push all changes.
Remark: SFTP is a useful plugin for Visual Studio Code, but its sftp.json file gets pushed to Git together with the rest of the project files if no .gitignore is in place.

This post is not the end, for we will continue to add more troubleshooting guides as we continue our exploration with DevOps tools.

Last updated 30 Jan 2023
