When people make technological mistakes

We often think vendors are perfect. They have backups. They have redundancy. They have experts who know exactly how to deploy solutions without fail. And then we see that they are no better than the rest of us.
Let’s look at some recent examples.
In the small-to-medium-sized business (SMB) space, StorageCraft has long been a trusted backup software vendor. One of the first to make image backups easy, it was used and recommended by many managed service providers. Following the acquisition of StorageCraft by Arcserve in March 2021, there were no immediate major changes to how the company ran.
Then, last month, many backups in the cloud were permanently lost. As Blocks and Files reported, “During the recently planned maintenance window, a series of redundant servers containing critical metadata was decommissioned over time. As a result, some metadata was compromised, and critical connections between the storage environment and our DRaaS cloud (Cloud Services) were disconnected. Engineers were unable to restore the required connections between the metadata and the storage system, making the data unusable. This means that partners cannot replicate or fail over machines in our data center.”
As of April 16, the status report stated: “All affected machines are now enabled and recovery points are increasing. The throttle is off and the uploads are working as usual. The time to replicate data will depend on each customer’s upload bandwidth and data volume.”
That doesn’t help if you had an older backup that you wanted to keep in your cloud repository.
Next up is Atlassian. On April 4, approximately 400 Atlassian Cloud customers experienced a total outage across their Atlassian products. As the company noted on its site:
“One of our standalone apps for Jira Service Management and Jira Software, called ‘Insight – Asset Management,’ was fully integrated into our products as native functionality. Because of this, we had to deactivate the standalone legacy app on customer sites where it was installed. Our engineering teams planned to use an existing script to deactivate instances of this standalone application. However, two crucial problems arose:
“Communication gap. First, there was a communication gap between the team that requested the deactivation and the team that performed the deactivation. Instead of providing the ID of the app that was intended to be marked for deactivation, the team provided the identification of the entire cloud site where the apps were to be deactivated.
“Defective script. Second, the script we used provided both the ‘mark for deletion’ capability used in day-to-day operations (where recovery is desirable) and the ‘permanently delete’ capability required to permanently remove data where necessary for compliance reasons. The script was executed with the wrong execution mode and the wrong ID list. It resulted in the inappropriate deletion of sites for around 400 customers.”
While these incidents may not have directly affected you, it makes sense to treat them as lessons.
First of all, always review (in your contract with a vendor or in the licensing terms) what the vendor's responsibilities are and what remedies you have if a problem occurs. In both cases, StorageCraft and Atlassian will abide by the terms they have agreed to. If you are a larger client, you can negotiate the terms of the contract and the remedies available. If you are a smaller client, the end-user license agreement and the terms it contains govern what the vendor will do. If you rely on a vendor and its services, plan on something going wrong at some point. The key is to review how vendors handle their mistakes rather than their successes.
Will they compensate you for the value of your loss? Will they take extraordinary actions to restore you in whole or in part? Often, how quickly they deal with what happened is more important than how they handle your data.
In both cases, the fault was human error. I still remember the time I was working on a DOS computer and accidentally typed del *.* at the root of drive C rather than in my intended subdirectory. It’s a lesson that stays with me to this day. Whenever I delete anything, I pause and ask whether I have a backup in case I make a mistake. I check where I am performing the action. I ask myself whether I am deleting the correct item.
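That pause-and-check habit can also be built into a cleanup script. What follows is only a minimal sketch, not a recommended tool: the function names plan_delete and delete_files and the allowed_root parameter are hypothetical, chosen for illustration. It refuses to work at the top of the allowed tree, lists what would be removed, and deletes nothing until the caller explicitly confirms.

```python
from pathlib import Path


def plan_delete(target_dir: Path, allowed_root: Path, pattern: str = "*") -> list[Path]:
    """Return the files that would be deleted, refusing unsafe locations."""
    target = target_dir.resolve()
    root = allowed_root.resolve()
    # Refuse to operate on the allowed root itself or on anything outside it.
    if target == root or root not in target.parents:
        raise ValueError(f"refusing to delete in {target}: not a subdirectory of {root}")
    return sorted(p for p in target.glob(pattern) if p.is_file())


def delete_files(files: list[Path], confirmed: bool = False) -> int:
    """Delete the planned files only when the caller has explicitly confirmed."""
    if not confirmed:
        return 0  # dry run: nothing is removed
    for path in files:
        path.unlink()
    return len(files)
```

Calling plan_delete first and reading its result is the scripted equivalent of asking "am I in the right place, and is this the right item?" before pressing Enter. It does not replace having a backup.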
Whether you are a single user or handle a network of computers (on-premises or in the cloud), always have a full backup. Consider that there are various ways you can recover data after a problem. From full backups to simple copies of directories, be flexible about how you recover data.
Then, if you are an IT administrator, urge your team to double-check your scripts. Often, we reuse scripts and do not audit them to ensure that they still perform as intended. Reading the details of the Atlassian failure is painful. It is clear that the teams did not communicate well and ended up deleting information they did not intend to delete. Communicating and planning a major change to your infrastructure is critical to your success.
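One way to audit a reused script is to make its dangerous paths hard to reach by accident. The sketch below is an illustration under stated assumptions, not Atlassian's actual script: the names Mode, Target, and build_plan, and the "app" and "site" labels, are all hypothetical. It defaults to the recoverable mode, rejects IDs of the wrong kind, and requires a separate explicit flag before a permanent delete is even planned.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    MARK_FOR_DELETION = "mark"       # recoverable, the everyday default
    PERMANENT_DELETE = "permanent"   # irreversible, for compliance cases only


@dataclass(frozen=True)
class Target:
    kind: str  # hypothetical labels: "app" or "site"
    id: str


def build_plan(targets, expected_kind, mode=Mode.MARK_FOR_DELETION, allow_permanent=False):
    """Validate a deletion request and return the actions that would run."""
    # Catch the "site IDs supplied where app IDs were expected" class of mistake.
    wrong = [t for t in targets if t.kind != expected_kind]
    if wrong:
        raise ValueError(f"{len(wrong)} target(s) are not of kind {expected_kind!r}")
    # Permanent deletion needs a second, deliberate opt-in from the caller.
    if mode is Mode.PERMANENT_DELETE and not allow_permanent:
        raise PermissionError("permanent delete requires allow_permanent=True")
    return [(mode.value, t.kind, t.id) for t in targets]
```

The function only returns a plan. Having a second person review that plan before anything executes gives the requesting team and the executing team a shared artifact to check against each other.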
That applies to communication from vendors, too. I am a Microsoft 365 user and often rely on two different platforms to keep track of issues. The Microsoft 365 status Twitter account allows me to receive alerts when there are problems. (You can download the Twitter app and set it up to send a push notification when there is a status change.) Or you can set up notifications from the message center to make sure you are up to date. For any vendors you use regularly, check whether they have communication channels that will keep you informed.
Remember that technology is driven by human decisions and people make mistakes. Do not assume that mistakes will not happen. Plan what you will do when vendors make mistakes. After all, they are just people.
Copyright © 2022 IDG Communications, Inc.