WinterNode - Notice history

All systems operational

Notice history

Sep 2024

No notices reported this month

Aug 2024

MC.LON3 Unavailable
  • Resolved

    Dear Customers of MC.LON3,

    We are pleased to inform you that the MySQL database has been successfully migrated to MC.LON5.

    If you are using our MySQL Databases, you will need to take the following actions:

    • Reset your MySQL passwords by visiting the Databases page.

    • Reconfigure any plugins or mods to use the new MySQL database host, and update the password to the newly assigned one from the step above. The passwords that were used on MC.LON3 no longer work.

    Your IPs have changed, but your ports remain the same. If you are using a WinterNode-provided subdomain, you will need to re-create it so that it points to the new IP and port combination assigned to your server. If you are using your own domain, please ensure your DNS records are updated accordingly. Additionally, for customers with a Dedicated IP, we’ve added extra dedicated IP ports for your use.
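
    Once your DNS records are updated, one way to confirm the change has taken effect is to check that your hostname resolves to the new IP. The following is a minimal Python sketch, not an official WinterNode tool: the function name `resolves_to`, the hostname `play.example.com`, and the address `203.0.113.10` are placeholders chosen for illustration. It only checks the addresses returned by the system resolver; it does not check SRV records or ports.

```python
import socket


def resolves_to(hostname, expected_ip, resolver=socket.gethostbyname_ex):
    """Return True if the IPv4 addresses for `hostname` include `expected_ip`.

    `resolver` defaults to the system resolver and can be swapped out
    (for example, in tests) with any callable that returns a
    (name, aliases, addresses) tuple.
    """
    try:
        _name, _aliases, addresses = resolver(hostname)
    except OSError:
        # Covers lookup failures such as an unknown or unreachable host.
        return False
    return expected_ip in addresses


# Example usage (placeholder values; substitute your own domain and new IP):
#   resolves_to("play.example.com", "203.0.113.10")
```

    If the check returns False shortly after a DNS change, the old record may still be cached; retrying after the record's TTL has elapsed is usually enough.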

    As part of our effort to resolve this issue, we will automatically apply a 50% credit of your service renewal amount to your account within the next 24 hours. No action is required on your part, and the credit will be applied to your next invoice, adjusting any automatic payments.

    We are also happy to announce that MC.LON5 boasts improved hardware, offering better performance for your services.

    We truly appreciate your patience and understanding as we worked through this incident over the past few days.

    All access has been restored. If you need any assistance, our Support Team is happy to assist you in our Discord Server.

  • Update

    We have successfully switched over the connection of MC.LON3 to MC.LON5. Server files have been restored, and you may be able to see them within the File Manager (not SFTP at the moment, since we still need to make changes on the admin side), assuming the node is not currently in maintenance mode.

    Please do not interact with your server yet. We still need to re-allocate the IPs (unfortunately, you will have new IPs assigned to you) and restore the MySQL database. You will also need to delete and re-create your subdomains so that they point to the new IPs.

    We're crossing our fingers 🤞 and knocking on wood that we are in the home stretch! As always, we will keep you posted.

  • Update

    We have subscribed all customers of MC.LON3 to email updates for this incident.

    We are currently restoring MC.LON3 customer data from the backup taken on August 28th at 2 AM Central Time to our new MC.LON5 node. We were also fortunate enough to slowly grab a MySQL backup in between the constant unexpected restarts of MC.LON3. However, we would like to remind customers that MySQL backups are not taken.

    At this time, we are working with our Panel Provider to restore customer access through our new instance.

    Compensation in the form of account credit is already planned for those customers who are affected by this incident.

    We appreciate your patience and support as we get through this incident together.

    The MC.LON3 machine is still offline as of this update. If it does come back online, we do not recommend making any changes, as access is being switched over to MC.LON5 and any data on MC.LON3 will not be transferred over.

  • Identified

    Bringing this incident up to speed...

    Throughout the night and morning of August 29th, we observed numerous machine restarts. Because the unexpected reboots happen so quickly, it is difficult to diagnose the node or disable services from starting. This issue is still occurring at this time.

    Around 9AM Pacific Time today, we used every communication channel with our provider to express our frustration and to provide more information to have our service provider re-evaluate the issue as NOT software related.

    At around 11:59 AM, our provider's hardware diagnosis flagged the test of rebooting back into the Customer OS as "DOWN!!!!". When they attempted to swap to another spare server, they discovered SMART errors on both drives. You can read more about what SMART errors are in this article provided by Seagate - https://www.seagate.com/support/kb/my-system-reported-a-smart-error-on-the-drive-184619en

    Within a few minutes of receiving this notice, our team made the decision to have the intervention team attempt to replace one drive so we can assess the situation and at least bring the Operating System back online. We also asked, in a separate ticket, for the RAM hardware to be replaced. That request was closed because the drive replacement request was still underway.

    We have just received the following communication regarding the drive replacement request.

    Date 2024-08-29 21:38:05 BST (UTC +01:00), Component replacement:

    After deep troubleshooting, the smart errors in the disks have been caused by raiser card,

    Replaced the raiser card, tested multiple times the disk in the server and no errors have been shown,

    sent back to rescue customer

    ping ok

    ipmi ok

    However, at this time, we are still observing unexpected reboots/machine down alerts. We are still discussing our options internally.

    We will keep our customers updated throughout this incident. We apologize for the inconvenience; however, this issue is not under our direct control.

    We recommend subscribing to this incident through email - https://status.winternode.com/cm0fynyjy00271jjf1rhsvohj/subscribe/email

  • Investigating

    Following the previous incident, an automated alert has been triggered by our monitoring system at 11:21 PM Pacific Time, August 28th and our team has been notified. We are currently investigating this incident and monitoring our communications with our service provider to ensure intervention takes place.

MC.LON3 Unavailable
  • Resolved

    At 7:20 AM today, we received a response back from our service provider that they still believe this is a software related issue and that we need to consult a "Linux professional" for further assistance.

    We have applied the latest patches to the node to reduce the likelihood of it still being a weird software bug. However, further intervention may be required on our end.

    Since the last reboot detected at 4:46 PM Pacific Time, August 25th, 2024, we have not received any new notifications of unexpected reboots or issues with customers' services. For this reason, we will be closing this incident and evaluating our options going forward.

  • Monitoring

    The technical support team of our service provider is only available from 8 AM to 6 PM, Monday to Friday. While we have tried other avenues, we cannot access the team who handles interventions unless one is active. However, our ticket was prioritized and should hopefully be answered at the beginning of their shift, in about 3 hours from now. Since our last update, a few more unexpected reboots occurred, but the node has been stable for the last few hours.

    We will still await the response from our service provider to ensure we can reduce the likelihood of another long incident.

  • Update

    We have requested more information from our service provider about this issue. It appears that a reboot of the machine has re-established network connectivity and allows customers to turn on their servers. However, we ask that customers on MC.LON3 refrain from turning on their server if they expect it to stay online for a while, to avoid data loss from non-graceful shutdowns.

    We will continue to monitor the situation from our side and relay any additional updates as they arise. We appreciate your patience.

  • Investigating

    Another automated alert was triggered by our monitoring system at 2:55 PM. An intervention request was automatically opened and subsequently closed at 3:05 PM with the test results and a note that the issue is believed to be "software related and cannot be fixed by the DC technicians".

    For context, when the service provider closed their intervention at 1:43 PM Pacific Time, they had replaced the hardware with a machine that has "recently passed our extensive preparation checks"/"known-working spare server" and moved our drives over from the previous machine.

    Our team will now take a look at the IPMI and investigate why this might be the case.

  • Monitoring

    Our service provider has concluded their intervention as of 1:43 PM Pacific Time. Since then, we have made the necessary changes on our side to bring networking back to the machine. We will be monitoring for any additional issues. All customers on MC.LON3 are encouraged to power on their server and notify our Support Team if they face any issues starting it.

    Further information about this incident will be provided in the "Resolved" Incident Update.

  • Update

    As of 11:33 AM Pacific Time, we are still waiting for intervention by the service provider. Our monitoring system has been reporting brief moments of up and down statuses for ping requests. We are unable to remotely check the server status until the intervention has concluded. We apologize for the inconvenience.

  • Investigating

    An automated alert has been triggered by our monitoring system and our team has been notified. We are currently investigating this incident on our side and with the provider.

Jul 2024

Undetermined problem - details to follow
  • Resolved

    We experienced an issue with our network distribution switch providing connectivity to our Chicago Infrastructure. After restarting the switch, connectivity was restored to all nodes.

    If you are still experiencing issues, please let us know through Live Chat or Discord.

  • Investigating

    We have detected a problem related to WinterNode.com Chicago services and are actively investigating.

    More details to follow.
