Greetings,
Sometimes in game development you find yourself in situations where things don’t go according to the design or plan you have in mind. In the past when we’ve hit such situations, we’ve done our best to remain transparent and take some time to explain what our intent was, what went awry, and what our next steps are. We’d very much like to carry that forward with Season of Discovery, where it’s especially important to be open and honest. Season of Discovery is, by its nature, highly experimental, and we were bound to hit some snags along the way, but we think it’s a best practice to simply explain what happened and learn from it.
So, we are going to really dig deep into what has been happening with the Battle for Ashenvale, but a word of warning: this is going to be an extremely long, extremely dense post. I’ll be taking time to explain all of the major systems at play, so if you aren’t in for the technical details, you may want to skip to the end for info on what exactly we are doing now.
The Intent
The intent of the Battle for Ashenvale was fairly simple: use a compelling carrot on a stick (Warsong Gulch reputation and a mount) to get people into Ashenvale so they could fight. In that respect, it was a success. As Ashenvale is a hotly contested zone that houses several quest hubs as well as our “end-game” raid for the level 25 band, some conflict was assured here, but we really wanted Ashenvale to be a non-stop hotbed for PvP. The event itself has a decidedly PvE slant, which is fine, but in the lead-up to and aftermath of each battle, there is essentially non-stop PvP throughout the zone and we couldn’t be more pleased by how that aspect of this has played out.
How The Battle for Ashenvale works under the hood
Looking at the Battle itself though, it might help to prime the rest of this post with a look at how it functions under the hood.
- The Battle for Ashenvale cycle starts with a lead-up phase.
- During this lead-up, there is a counter that runs behind the scenes and adds up how many PvP and PvE kills occur in the zone.
- When the counter reaches a certain number, the battle begins.
- This is represented on-screen with a percentage counter UI element in the top-middle of your HUD.
- This element is replaced with an objective tracker showing the progress of the battle when it starts.
- Once the battle begins, 3 lieutenants spawn for each faction at various points on the map (Keepers of the Grove for the Alliance, Blademasters for the Horde), along with a general for each faction: a Farseer for the Horde and a Priestess of the Moon for the Alliance. Both generals are invulnerable and do massive damage until all of their lieutenants are killed.
The objectives are simply to kill the opposing lieutenants and then kill the enemy general. Once this is done, the battle ends and you return to the lead-up phase and the cycle repeats.
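For readers who think in code, the cycle described above can be sketched as a small state machine. This is only an illustration of the rules as stated in this post, not the actual server code: the class names, the `kills_to_start` threshold, and the method names are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Phase(Enum):
    LEAD_UP = auto()
    BATTLE = auto()


@dataclass
class Faction:
    lieutenants_alive: int = 3  # the post describes 3 lieutenants per faction

    @property
    def general_vulnerable(self) -> bool:
        # The general stays invulnerable until every lieutenant is dead.
        return self.lieutenants_alive == 0


@dataclass
class AshenvaleCycle:
    kills_to_start: int  # hypothetical threshold; the real number is not public
    phase: Phase = Phase.LEAD_UP
    kills: int = 0
    factions: dict = field(default_factory=dict)

    @property
    def percent(self) -> int:
        """Value for the on-screen percentage counter during the lead-up."""
        return min(100, 100 * self.kills // self.kills_to_start)

    def record_kill(self) -> None:
        """Count one PvP or PvE kill in the zone; only matters in the lead-up."""
        if self.phase is not Phase.LEAD_UP:
            return
        self.kills += 1
        if self.kills >= self.kills_to_start:
            self.phase = Phase.BATTLE
            self.factions = {"Alliance": Faction(), "Horde": Faction()}

    def kill_lieutenant(self, faction: str) -> None:
        target = self.factions[faction]
        target.lieutenants_alive = max(0, target.lieutenants_alive - 1)

    def kill_general(self, faction: str) -> bool:
        """Return True if the kill landed, which ends the battle."""
        if self.phase is not Phase.BATTLE:
            return False
        if not self.factions[faction].general_vulnerable:
            return False
        # Battle over: go back to the lead-up phase and start counting again.
        self.phase, self.kills, self.factions = Phase.LEAD_UP, 0, {}
        return True
```

In this sketch, calling `kill_general("Horde")` before all three Horde lieutenants are down simply returns `False`, mirroring the invulnerability rule.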
What Went Wrong
On a realm with a single layer, there are very few points of failure with this system. We had a few minor issues in the very early days of Season of Discovery with the base functionality of this system and for the most part those were quickly resolved. The issues we are seeing now are the result of a conflict between multiple different systems that are meant to load balance server populations as well as unintended interactions between some fixes we made earlier this week with the battle.
To help organize let’s break these issues down into the main symptoms and then we’ll dive into the root causes:
- Being removed from your current layer mid-battle.
- Progress Counter Resetting back to 0 or low numbers during the lead-up phase.
- Battles becoming stalled and the objective tracker not responding.
Being removed from your current layer mid-battle
Layering is a system that we’ve had in place since the launch of WoW Classic in 2019 to allow many times more players onto a single realm than would otherwise be permissible without severe performance degradation. Layering essentially creates fully featured copies of the entirety of Eastern Kingdoms and Kalimdor. The number of layers on a realm depends on the peak number of players logged in recently. Historically, when realms start up during weekly maintenance we start with a nominal number of layers—generally only a few. As we move closer to peak play times in your region however, additional layers would spin up dynamically as more and more people log in and existing layers fill.
The layer manager is a system that controls layers, spins them up as needed, and decides which layer players logging in will be assigned to. The layer manager will try not to move you to a new layer unless certain thresholds are crossed. In normal times outside of major events and launch windows, this almost never happens.
When the world begins to fill on an individual layer and that layer begins reaching a critical mass however, the system may at times try to load balance the population on that individual layer to an existing layer or a brand new layer if all others are full, which at times means moving people onto a relatively empty layer. The layer manager system does have some safeguards to prevent you from too many jarring layering incidents, such as preferring NOT to move people that are in a group with each other, or that are in combat. Ideally, what will happen when a group is formed is that a re-evaluation will happen when people leave and join, and the layer manager will attempt to get everyone onto the same layer and keep them there until they all log out.
The snag we run into with the Battle for Ashenvale, however, is that a great many people are joining groups, often using external tools like Discord to coordinate. When a player leader organizes several full raids to all congregate on one layer, that layer may hit a threshold and start trying not to add more people. For a time it will continue to allow more people to join so as not to break up groups, but there is a point when it reaches its absolute maximum, and on the next evaluation the manager will forcibly move people off of the layer to balance the load and prevent a crash or other severe performance drops. This means that even players who were grouped, and would normally be protected from being moved, may also be moved as an emergency measure. This has been one cause for players unceremoniously being booted to a different layer; the layer simply got too full and stopped caring whether you were grouped or in combat.
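A rough way to picture the two thresholds is the sketch below. It assumes a "soft cap" where the manager starts moving unprotected players and a "hard cap" where protections are ignored. Both names and all numbers are hypothetical, and the real layer manager certainly weighs more factors than this.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    grouped: bool = False
    in_combat: bool = False


def players_to_move(players: list[Player], soft_cap: int, hard_cap: int) -> list[Player]:
    """Pick who gets moved off a layer on one evaluation pass.

    soft_cap and hard_cap are illustrative thresholds, with soft_cap <= hard_cap.
    """
    excess = len(players) - soft_cap
    if excess <= 0:
        return []  # layer is healthy, nobody moves

    # Prefer moving players who are neither grouped nor in combat.
    unprotected = [p for p in players if not (p.grouped or p.in_combat)]
    moved = unprotected[:excess]

    remaining = len(players) - len(moved)
    if remaining > hard_cap:
        # Emergency: protections are ignored until the layer is under the hard cap.
        protected = [p for p in players if p.grouped or p.in_combat]
        moved += protected[: remaining - hard_cap]
    return moved
```

With a soft cap of 6 and a hard cap of 8, a layer holding 7 grouped players moves nobody, while a layer holding 10 grouped players moves 2 of them despite the grouping.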
Going back to what I said a moment ago about layers spinning up as we move into peak time, one key detail is that in the past, layers would never spin down or “retire” unless we restarted the realms completely. This means that at peak times, there may be 5, 6, 7 or even 10 (or more) layers going at once on the largest realms. At off-peak times however there would still be that same high number of layers and several days after weekly restarts, there would be so many layers going that late at night or early morning the world would feel very empty. This means even if several thousand people—what would normally be a “large” realm in earlier versions of WoW—are logged in at 8:00 AM but there are 10 layers going from the previous day’s peak, each layer only has several hundred people (when it could handle several thousand) and it could feel pretty lonely.
For Season of Discovery, we decided to try something new and allow layers to automatically retire when populations dropped to “collapse” the number of total layers and make the world feel more active and lively at all hours of the day and night. We hoped to avoid those “dead” periods where the realm felt more empty than it actually was.
The unfortunate side effect of this is that every time we would retire a layer, it would suddenly cause players to change layers as the layer manager re-balanced and redistributed folks. We’ve never engaged this system before because we knew this would feel jarring, and in a game mode like Hardcore for example, being unexpectedly moved to a new layer when yours is retired could be deadly! But, like many other aspects of Season of Discovery, this was a risk we consciously took to see if we could improve the experience for those that prefer to play at off-peak times.
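As a loose illustration of why retiring a layer forces layer changes, consider the sketch below. The function name, the `min_population` threshold, and the "send each player to the emptiest remaining layer" policy are all assumptions made for the example, not a description of the real redistribution logic.

```python
def retire_emptiest_layer(layers: dict[str, list[str]], min_population: int) -> list[str]:
    """Retire the least populated layer if it is below min_population.

    Returns the players who had to change layers. Mutates `layers` in place.
    """
    if len(layers) < 2:
        return []  # never retire the last remaining layer

    emptiest = min(layers, key=lambda name: len(layers[name]))
    if len(layers[emptiest]) >= min_population:
        return []  # every layer is busy enough to keep

    displaced = layers.pop(emptiest)
    for player in displaced:
        # Hypothetical policy: send each player to the emptiest remaining layer.
        target = min(layers, key=lambda name: len(layers[name]))
        layers[target].append(player)
    return displaced
```

Every player in the returned list experiences exactly the kind of sudden move described above, regardless of what they were doing at the time.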
As you’ve likely deduced by now, both of these extremes (too many and too few people attempting to play) could cause players to suddenly be moved. This included people in groups, people trying to join groups, as well as people actively engaged in battle in Ashenvale.
As of yesterday, we have already made a change that should cause the layer manager to try harder to prevent players from layering if they are a) in a group, and b) in combat. This is not to say it will never happen. When a layer reaches a certain level, we have to split people up or it will crash. That is unfortunate but currently it’s unavoidable.
The next change we are about to make starting tomorrow is to stop retiring layers when the population dips down during offpeak. This means that you won’t be suddenly moved because your layer is being retired, but it also does mean that if you are playing very late at night or early in the morning, the world may feel sparsely populated.
Progress resetting during the ramp-up period
The next major issue involves the ramp-up period percentage counter suddenly dropping from a high percentage to a low one. The cause for this is simple; the progress tracker represents an aggregate of progress across every participating layer. This means that when you finish a battle and start making progress and other layers also finish their battles (or new layers spin up), they all join the “pool” of layers contributing to the total percentage progress.
So, if your battle ends and your layer managed to get back to 20% towards the next battle, and then another layer entered the ramp-up phase, they would start at 0%, which would drop the total progress across all layers down several percentage points as the “pool” and thus the total required number of kills just got larger. This was technically intended behavior, but it was very confusing. You’d be questing, PvPing, or gathering in Ashenvale when you’d see the counter go from 70 or 80 percent suddenly down to 20 or 30 percent as other layers had their battles wrap up and they jumped into the pool.
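The drop is easy to reproduce with a toy calculation. The sketch below assumes the displayed value is simply total kills across the pool divided by the total kills required for the pool. The function name and every number are made up for illustration.

```python
def aggregate_percent(kills_by_layer: dict[str, int], required_total: int) -> float:
    """Realm-wide lead-up progress across every layer currently in the pool."""
    total_kills = sum(kills_by_layer.values())
    return min(100.0, 100.0 * total_kills / required_total)


# One layer in the pool with 700 of a hypothetical 1000 required kills.
pool = {"layer1": 700}
before = aggregate_percent(pool, required_total=1000)

# A second layer finishes its battle and joins the pool with zero kills,
# which also raises the total required (1250 here is a made-up figure).
pool["layer2"] = 0
after = aggregate_percent(pool, required_total=1250)

print(f"{before:.0f}% -> {after:.0f}%")
```

With these made-up inputs the counter falls from 70% to 56% even though nobody on the first layer lost any kills. Each further layer joining at 0 pushes it lower still.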
The original intent of this was to make it so that all battles started at the same time across every layer on a realm. This was to try and prevent players from hopping layer to layer to attempt to do the event multiple times back-to-back and place additional load on the layer manager. While that was somewhat successful, the trade off was just too much confusion as the most diligent layers who finished their battle early would see their progress drop potentially multiple times during the following ramp-up phase.
To help with this, we made an adjustment yesterday to never allow the aggregated progress % to drop. New layers who joined the pool of layers that were in the ramp-up phase would just have their total progress updated to match whatever current % progress value is displayed. In a vacuum this change makes sense and seems like it would help prevent that confusion, but this is really the change that led to the most pronounced issues.
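In code terms, that adjustment amounts to a clamp on the displayed value plus a "catch-up" for layers joining the pool. The sketch below is a guess at the shape of it: the class, its method names, and the use of a plain average across layers are all assumptions.

```python
class RealmProgress:
    """Realm-wide lead-up counter that is never allowed to move backwards."""

    def __init__(self) -> None:
        self.displayed = 0.0  # percent shown in the HUD
        self.layer_percent: dict[str, float] = {}

    def join(self, layer: str) -> None:
        # A layer entering the lead-up adopts the displayed value instead of 0%.
        self.layer_percent[layer] = self.displayed

    def report(self, layer: str, percent: float) -> float:
        """Record a layer's own progress and return the value to display."""
        self.layer_percent[layer] = percent
        mean = sum(self.layer_percent.values()) / len(self.layer_percent)
        # Clamp: the displayed value can rise or hold, but never drop.
        self.displayed = max(self.displayed, min(100.0, mean))
        return self.displayed
```

Notice the side effect: a layer that joins late inherits progress it did not earn, so the pool as a whole reaches 100% sooner than it would have before the change.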
Battles becoming stalled and objective trackers not responding
The change mentioned above created a situation where multiple layers could finish their battle, enter ramp-up again and start making progress very rapidly towards their next battle. However if one or more layers did not finish their battle before the other layers in the ramp-up period finished said ramp up period, it would cause any in-progress battles on that realm that were lagging behind to completely break. This is because the system that controls when a new battle starts would see that the ramp-up was finished and send the signal to all layers to start a new battle, including those with a battle already in progress.
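A very simplified model of that failure is sketched below. The state names and the `broadcast_battle_start` function are hypothetical; the point is only that the start signal is sent to every layer without checking what each layer is currently doing.

```python
from enum import Enum, auto


class LayerState(Enum):
    LEAD_UP = auto()
    BATTLE = auto()
    BROKEN = auto()  # battle interrupted; objective tracker stops responding


def broadcast_battle_start(layers: dict[str, LayerState]) -> dict[str, LayerState]:
    """Model the realm-wide 'start a new battle' signal described in the post."""
    result = {}
    for name, state in layers.items():
        if state is LayerState.LEAD_UP:
            result[name] = LayerState.BATTLE  # the normal, intended case
        else:
            # The signal also reaches layers that are still mid-battle,
            # restarting the event underneath them and breaking it.
            result[name] = LayerState.BROKEN
    return result


realm = {
    "fast1": LayerState.LEAD_UP,
    "fast2": LayerState.LEAD_UP,
    "slow": LayerState.BATTLE,
}
print(broadcast_battle_start(realm))
```

Here the two layers that finished early start their next battle normally, while the layer that was still fighting ends up broken.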
In addition to breaking any in-progress battle, this would likely also cause some layers on your realm to enter a veritable death-loop of never being able to finish a battle before a new one started, breaking their current battle all over again. This was even further exacerbated when a layer automatically retired itself, because then the total number of kills needed would go down and the progress percentage toward the next battle would suddenly jump up, making it even more likely that a new battle would start when some layers already had one going.
There’s one last detail to mention before discussing what the path forward is, and that is how additional layers impact the total progress towards the next battle. Using very fake numbers, let’s say that if you have a single layer, you need 1000 creature or player kills to trigger a battle. The way we had it set up was that if a second layer spun up, you’d need 25% more kills, so it would go to 1250 needed total between both layers. A third layer would bring the total needed across all layers to 1562 (1250*1.25), a fourth made that number go to about 1953, and so on. When we originally designed the system, we initially made each layer require more kills per layer on a 1:1 basis, so using the numbers above as an example, with 5 layers you’d need 5000 kills (5 layers x 1000 kills needed per layer). We decided not to do this later in development because we had been operating under the assumption that layers never really spun down outside of planned restarts, thus increasing the kill requirement on a 1:1 basis would cause battles to take an extremely long time to trigger during off-peak times because there were many layers and relatively few players around to make progress during the ramp-up period.
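The two scaling rules can be compared side by side with a few lines of code. This uses the same fake base of 1000 kills and compounds 25% per additional layer. The function names are invented for the example, and truncating to a whole number of kills is an assumption.

```python
def compounding_requirement(layers: int, base: int = 1000, growth: float = 1.25) -> int:
    """Total kills needed across all layers when each extra layer adds 25%."""
    return int(base * growth ** (layers - 1))  # truncation is an assumption


def linear_requirement(layers: int, base: int = 1000) -> int:
    """Total kills needed when every layer needs the full base amount (1:1)."""
    return base * layers


# Print both requirements for 1 through 5 layers.
for n in range(1, 6):
    print(n, compounding_requirement(n), linear_requirement(n))
```

With these inputs, five layers need 2441 kills under the compounding rule versus 5000 under the 1:1 rule, which is why the compounding version triggers battles so much faster when many layers are active.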
All of these many words bring us to the crux of it: the system was not really designed with the idea that a) we’d ever be retiring layers other than during restarts and b) that we would get to a point where a battle could still be going on a layer when the next battle began. This is also an example of a system that could have really benefitted from a proper PTR stress test and we are going to keep that in mind for the future.
TLDR and the path forward
The next set of changes we have on deck are:
- With a hotfix tomorrow we will no longer be dynamically retiring layers. This means that late at night or early in the morning there will be more layers than are needed and the world may feel more empty.
- With a hotfix tonight we are increasing the number of kills that are needed for each layer that exists to move the overall realm percentage up.
Within an hour or so from the time this is posted, most layers should have “healed” themselves, but we will be watching it closely.
Both of the changes together do also mean that battles will spin up FAR slower during offpeak times and noticeably slower during peak times. The trade-off is that it’s now far less likely that one layer will lag far enough behind to break if the percentage counter fills up before their battle completes. We believe it should now be a few hours between battles, at least, even during peak. We need this buffer to prevent these stall-outs from happening.
This solution is not perfect though and we may consider a further redesign of this system down the line, but that’s likely too large of a change for a hotfix right now. The priority now is to stabilize things with this most recent set of changes.
Thanks so much if you read this entire post. I understand if you have questions about this as this is a lot to grok. Suffice it to say that the ultimate issue here is that we had a collision of a few different game design and server engineering systems that were both meant to enhance the player experience, but ultimately ended up damaging said experience when they clashed. We took a chance by not having a PTR. Some things in Season of Discovery were a success due to the lack of a PTR, but this was likely less successful.
Lastly, thank you so much for playing Season of Discovery with us. This season has been a true labor of love for the WoW Classic Team and while it’s certainly been interesting to support, we can’t wait to take the lessons we’ve learned forward to make future phases even more awesome!