In the Garage: Discord AMA on Forza Race Regulations (FRR) with the Simulation Team
11 October 2024 - Turn 10 Studios
Catch up with our Forza Motorsport developers’ latest work on Forza Race Regulations (FRR) and go behind the scenes of this feature!
The following article is adapted from a presentation and AMA (“ask me anything”) hosted in the Forza Official Discord on September 26, 2024. The featured guests are T10 LoungeToy and T10 OG. These remarks, explanations, and responses to Discord audience questions have been edited for length, clarity, and accuracy.
Intro
(Speakers) T10 LoungeToy and T10 OG: We’re both co-leads for the simulation team at Turn 10 Studios. Some of the areas under our umbrella include artificial intelligence (AI), the penalty system (also known as FRR), and all the physics in the game, such as vehicle dynamics and how tracks show up. We also study the tuning and mapping of all the possible input methods: gamepad controllers, wheels, and other peripherals such as the Xbox Adaptive Controller (did you know you can even play the game with a guitar?). Force feedback also ties directly into our physics work. We’re responsible for car balancing and car division management, too: which cars go where, how they actually feel out of the box, what the tuning limits are, and that sort of thing.
T10 LoungeToy: That said, today’s presentation will be focused on FRR. FRR is one feature that is a big deal for us; we’ve invested in it for years, since 2017 all the way through now. FRR has gone through many different iterations, but I wanted to give you a high-level overview about where it is today, and how we think about FRR, and hopefully give you a good general understanding of where we are with the current release.
Hopefully we can do more of these AMA sessions in the future and dive deeper into topics like AI, physics, input, car balance, car division management, or any other area of the game anyone is interested in hearing about. We’re hoping to shed some light on the inner workings here at Turn 10 and how we’re thinking about our systems, to be transparent about where things are, and, of course, to help you understand where we’re going in the future.
What is FRR?
T10 OG: Wait, does everyone know what FRR is?
T10 LoungeToy: Good point. FRR stands for Forza Race Regulations, and it’s Motorsport’s adjudication system, used for identifying, measuring, and assigning fault for collisions, as well as for off-track penalties.
The collision system is based on a machine learning model, which is trained on labeled collision data and used to assign fault. Interestingly, the off-track penalties are calculated with a different technology, and we’ll explain that later. But the collision data that makes up the collision system is mostly generated by players like you. Every single collision in the game is tracked and sent to us, and that makes up the majority of the data we use to inform FRR. Additionally, some data is contributed from our internal playtests and from experts on the team, like T10 Raceboy, to target certain types of collisions. In a lot of ways, T10 Raceboy is how we judge our FRR (we’ll explain that later, too!).
Sometimes we get literally over a million collisions a day, right? Someone actually has to go through every single incident that we want to put into the FRR model and label it manually. That’s a very time-consuming process, and we’ll show you the tool we use to do that. FRR’s fault determination, and its improvement, truly depend on all of this manual data labeling.
Alright, let’s talk about our time penalty systems. We have time penalties for collisions and time penalties for track cutting/going off-track, and they have some differences.
Time penalties for collisions are based on a severity system. That severity system is mostly influenced by the speed delta (basically, the relative velocity at impact). If one player is going 200 miles an hour and another player is going 20 miles an hour, that’s a huge speed delta, and that’s going to have a major impact on the time penalty given to the player. Additionally, we have a tool called a “speed calculator” or “speed computer,” which has an expectation of what your speed should be at any point, based on our AI driving as fast as it can with that particular car or tune. If you’re over that expected speed, that also affects the time penalty. If the other player loses a place or loses time from the collision, that is also factored into the severity calculation.
For off-track penalties, we use the aforementioned speed computer, which knows how fast any car should be going around any corner or through any part of the track. If you cut the track, or get pushed off-track for any reason, the speed computer compares your speed or time to the expected speed or time. Penalties are based on any time gained or any speed over the expected speed there.
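To make the description above more concrete, here is a minimal Python sketch of how such factors could be combined into a severity level. Everything in it is an assumption for illustration: the `Collision` fields, the weights, and the cut-offs are hypothetical, and Turn 10 has not published how FRR actually weights these inputs. It only shows the idea that a bigger speed delta, more speed above the speed computer’s expectation, and more places or time lost by the other driver all push severity up. The score is bucketed into four levels because the labeling tool shown later in this article rates severity out of 4.

```python
from dataclasses import dataclass

# Hypothetical weights, NOT Turn 10's real values. They only illustrate that
# each factor described in the article raises the severity score.
SPEED_DELTA_WEIGHT = 0.02    # per mph of closing speed at impact
OVER_EXPECTED_WEIGHT = 0.05  # per mph above the speed computer's expectation
POSITION_LOSS_WEIGHT = 0.5   # per place the other driver lost
TIME_LOSS_WEIGHT = 0.25      # per second the other driver lost

# Hypothetical upper bounds for severity levels 1, 2, and 3; anything above is 4.
LEVEL_CUTOFFS = (1.0, 2.0, 3.5)


@dataclass
class Collision:
    speed_delta_mph: float        # relative speed between the two cars at impact
    over_expected_mph: float      # speed above the expected speed (<= 0 if not over)
    positions_lost_by_other: int  # places the other driver lost
    time_lost_by_other_s: float   # seconds the other driver lost


def severity_score(c: Collision) -> float:
    """Combine the factors into one non-negative score (hypothetical weighting)."""
    return (
        SPEED_DELTA_WEIGHT * c.speed_delta_mph
        + OVER_EXPECTED_WEIGHT * max(c.over_expected_mph, 0.0)
        + POSITION_LOSS_WEIGHT * c.positions_lost_by_other
        + TIME_LOSS_WEIGHT * c.time_lost_by_other_s
    )


def severity_level(score: float) -> int:
    """Bucket a score into one of four severity levels."""
    for level, upper in enumerate(LEVEL_CUTOFFS, start=1):
        if score < upper:
            return level
    return 4


# 200 mph vs 20 mph, and the other driver loses one place and two seconds.
hard_hit = Collision(speed_delta_mph=180, over_expected_mph=0,
                     positions_lost_by_other=1, time_lost_by_other_s=2.0)
# A light 5 mph tap with no lost places or time.
light_tap = Collision(speed_delta_mph=5, over_expected_mph=0,
                      positions_lost_by_other=0, time_lost_by_other_s=0.0)

print(severity_level(severity_score(hard_hit)))   # 4
print(severity_level(severity_score(light_tap)))  # 1
```

A weighted sum is just the simplest way to show several factors contributing at once; the real system could combine them very differently.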
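A minimal sketch of that off-track comparison, assuming the speed computer can supply an expected time for a track segment and an expected speed at the point where the car rejoins. The function name, the km/h units, the `SECONDS_PER_KMH_OVER` conversion, and the choice to take the larger of the two advantages are all hypothetical; the article only says penalties are based on time gained or speed over the expected speed.

```python
# Hypothetical conversion factor: seconds of penalty per km/h carried
# above the speed computer's expected speed. Not a published FRR value.
SECONDS_PER_KMH_OVER = 0.02


def off_track_penalty_s(
    actual_segment_time_s: float,
    expected_segment_time_s: float,
    exit_speed_kmh: float,
    expected_exit_speed_kmh: float,
) -> float:
    """Return a penalty in seconds for an off-track excursion.

    Both advantages are clamped at zero, so a driver who loses time and
    rejoins slower than expected (e.g. after being pushed wide) gets no penalty.
    """
    time_gained_s = max(0.0, expected_segment_time_s - actual_segment_time_s)
    speed_excess_kmh = max(0.0, exit_speed_kmh - expected_exit_speed_kmh)
    # Use whichever advantage is larger, so neither kind of gain goes unpenalized.
    return max(time_gained_s, SECONDS_PER_KMH_OVER * speed_excess_kmh)


# A corner cut that saves 0.8 s and carries 2 km/h extra.
print(round(off_track_penalty_s(9.2, 10.0, 150.0, 148.0), 2))  # 0.8
# Pushed wide: lost time and rejoined slowly, so no penalty.
print(off_track_penalty_s(11.0, 10.0, 140.0, 148.0))  # 0.0
```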
These are the basics of how FRR works, how we generate data for it and tune it, and how time penalties are given.
Now that you know the basics, let’s move on to the advanced course! The portion of FRR that detects fault for collisions currently has an estimated accuracy of around 89% (up from 80% at launch). Here’s a graph so you can see how this has changed over the past year.
(On the horizontal axis, each sequential month is represented by the first letter in its name.)
When we launched, 4 out of 5 collisions were, in our estimation, adjudicated correctly, which also means that 1 out of 5 was incorrectly adjudicated. You, as a player, could have been incorrectly given fault for a collision (false positive), or you could have been incorrectly given no fault for a collision (false negative). Both cases are still considered incorrect FRR rulings.
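Both kinds of error can be counted by comparing FRR’s ruling for each driver against a human label. The sketch below assumes each ruling is a pair of booleans (`predicted_at_fault`, `labeled_at_fault`); that representation and the function name are hypothetical. The sample is invented to roughly match the launch figures quoted in this article (80% accurate, about 4% false positives); in this toy data the remaining 16% are false negatives purely as a consequence of that arithmetic, not a reported number.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def adjudication_metrics(rulings: Iterable[Tuple[bool, bool]]) -> Dict[str, float]:
    """Accuracy and error rates from (predicted_at_fault, labeled_at_fault) pairs."""
    counts = Counter()
    for predicted_at_fault, labeled_at_fault in rulings:
        if predicted_at_fault and not labeled_at_fault:
            counts["false_positive"] += 1  # blamed, but not actually at fault
        elif labeled_at_fault and not predicted_at_fault:
            counts["false_negative"] += 1  # at fault, but not blamed
        else:
            counts["correct"] += 1         # ruling agrees with the label
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no rulings to score")
    return {key: counts[key] / total
            for key in ("correct", "false_positive", "false_negative")}


# Invented sample of 100 rulings.
sample = (
    [(True, True)] * 40      # correctly blamed
    + [(False, False)] * 40  # correctly cleared
    + [(True, False)] * 4    # false positives
    + [(False, True)] * 16   # false negatives
)
print(adjudication_metrics(sample))
# {'correct': 0.8, 'false_positive': 0.04, 'false_negative': 0.16}
```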
As of the most recent release, we are around 89% accuracy overall. Now I just want to make it clear: we’re not done. We’re heavily investing in this area, and we’ll talk later about our current targets and when we hope to get there. The thing that I really wanted to make clear here is, we learned a lot about how to improve the model over time. And when we launched, we were not quite at the place we hoped we’d be. And we did not really understand how much time it would take for us to improve the model. But after a lot of learnings in the last few months, we have, I think, a really good plan in place to improve.
We believe the biggest issues that players have with FRR right now are these false positives that show up in very specific racing incidents. As an example, I think that the classic “meme’d” FRR issue right now is when someone gets rear-ended and gets a 4- or 4.5-second penalty. It’s not their fault, because someone just ran into the back of them, but now they’re like, “hey, why did I get a penalty?” And this situation is exactly what we’re trying to solve right now.
So, a very interesting thing happened over all of the releases. In this next graph (below), you can see a spike around March (2024).
(On the horizontal axis, each sequential month is represented by the first letter in its name.)
As of March, even though the overall accuracy of the model was getting better (per the first chart we showed, which was for overall accuracy), specific false positives were getting much worse, occurring as many as 10 times per 100 collisions a player encounters (compared to launch, when we had a false positive rate of about 4%, or 4 times out of 100).
Consequently, there was a huge outcry around FRR in March, so we responded and changed our thinking around what was most important to improve. We set our sights on reducing false positives as our main priority. As you can see, since then we have both increased overall accuracy (as shown in the previous chart) and significantly decreased false positives, which are currently at their lowest rate ever.
We have a custom-built tool that visualizes every incident in the game. So anytime there’s a collision (I mentioned earlier that we sometimes have well over a million collisions every day), we have a way to visualize exactly what that collision looks like. People like T10 Raceboy can use the tool to look at an incident and label who’s at fault and how severe the collision was. This is kind of a first for us: showing some of our fundamental dev tools, which really don’t showcase awesome graphics.
So, for the first time, here is our FRR labeling tool. Like I said, it’s not pretty, but when we get any collision sent to us via FRR, it looks like this example clip, and we can play it back and forth:
(Video: an example clip played back in the FRR labeling tool; see the original blog post to watch it.)
- The tool shows two cars racing in a top-down view. Each has a colored bubble that changes between green, yellow, and red, indicating the car’s speed relative to the expected speed from the speed computer.
- If the car goes into a corner too hot, the bubble’s color will change from green (“within the expected speed”) to yellow (“not quite over the expected speed”) to red (“beyond the expected speed”).
- The blue bars coming out of the car are an indication of steering input. At the start of the clip, the red car has a lot of steering input to the left, while the green car has a little bit of steering input to the left, as they’re navigating this corner.
- Lastly, the arrows at the front of the cars indicate how much throttle they have at any given time. The green car looks to be at nearly full throttle, while the red car has maybe one-quarter to one-half throttle there. When a car brakes or decelerates for any reason, those arrows point toward the rear instead.
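The bubble colors can be read as a simple comparison between a car’s current speed and the speed computer’s expected speed at that spot. This sketch assumes a ratio-based rule with a hypothetical `YELLOW_BAND` of 95% of the expected speed; the tool’s real thresholds aren’t described in this article.

```python
from enum import Enum


class Bubble(Enum):
    GREEN = "within the expected speed"
    YELLOW = "not quite over the expected speed"
    RED = "beyond the expected speed"


# Hypothetical: speeds at or above 95% of the expected speed (but not over it)
# count as "not quite over".
YELLOW_BAND = 0.95


def bubble_color(speed: float, expected_speed: float) -> Bubble:
    """Classify a car's speed against the speed computer's expectation."""
    if expected_speed <= 0:
        raise ValueError("expected_speed must be positive")
    ratio = speed / expected_speed
    if ratio > 1.0:
        return Bubble.RED
    if ratio >= YELLOW_BAND:
        return Bubble.YELLOW
    return Bubble.GREEN


print(bubble_color(100, 120).name)  # GREEN
print(bubble_color(118, 120).name)  # YELLOW
print(bubble_color(125, 120).name)  # RED
```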
As you may be able to see in the top right corner of this interface, our devs can come in here and label this incident. In this case, we have labeled the red car as being at fault, with a severity rated a “2” out of 4 severity levels.
One other thing to note: you may see the green car jittering around a little. What you’re seeing there may be latency. Latency is taken into consideration every time we label the data.
What’s Next for FRR
T10 LoungeToy: The biggest thing we’ve done in the last, I think, six months is really come to grips with what it takes to make the model better (I’ll talk about that in just a second, because it takes a lot of effort), and establish rough goals for where we believe it needs to be so we can have the best FRR adjudication system.
We’re currently at around 89% overall accuracy, and we believe we need to get to at least 95% overall accuracy to be in a really good place. But more importantly, we also have a 99% goal. The 99% goal is about what we call “specific accuracy”: accuracy on specific types of incidents. Our goal is to make sure FRR gets those braindead-simple incidents right 99% of the time.
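Why track specific accuracy separately? An overall average can look healthy while one type of incident lags well behind. The sketch below groups rulings by a hypothetical incident category and computes accuracy per category; the category names and counts are invented for illustration, chosen so the overall figure comes out at 89%.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def accuracy_by_category(rulings: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Per-category accuracy from (category, ruling_was_correct) pairs."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, was_correct in rulings:
        total[category] += 1
        correct[category] += int(was_correct)
    return {category: correct[category] / total[category] for category in total}


def overall_accuracy(rulings: List[Tuple[str, bool]]) -> float:
    """Share of all rulings that were correct, regardless of category."""
    if not rulings:
        raise ValueError("no rulings to score")
    return sum(was_correct for _, was_correct in rulings) / len(rulings)


# Invented data: 1,000 rulings, 89% correct overall.
rulings = (
    [("front_to_rear", True)] * 80 + [("front_to_rear", False)] * 20
    + [("side_by_side", True)] * 810 + [("side_by_side", False)] * 90
)
print(overall_accuracy(rulings))      # 0.89
print(accuracy_by_category(rulings))  # {'front_to_rear': 0.8, 'side_by_side': 0.9}
```

In this toy data the front-to-rear category sits at 80%, far from a 99% target, even though the headline number is 89%.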
T10 OG: “Braindead-simple incidents” meaning, my mom could look at the incident, she knows nothing about racing, but she would know who’s at fault.
T10 LoungeToy: Before we go into the specifics of how we’re getting to 95-99%, there are a ton of related features to FRR that we are iterating on. Some of you may know that we have some new ghosting features that will be showing up in multiplayer, both as an option within Free Play, as well as part of our Featured Multiplayer series. Situations such as returning to track, or going really slowly on track, or really high-speed ramming incidents, will automatically be ghosted in the near future. So that’s going to have a big impact on how FRR is functioning and how people are perceiving the overall accuracy of FRR. Cleaner racing through some of the ghosting technology we are working on should have a pretty significant impact here.
Of course, matchmaking with cleaner racers should make for cleaner racing. We have a lot of work to do here, and that’s related to our safety rating and skill ratings, of course. Interestingly, when we recently shipped multi-class as a function of our Featured Multiplayer, we saw really, really good and clean racing across the board there. And that has a lot to do with the race starts and how the field really does get spread out a lot more before Turn 1 in multi-class racing. It cleans up a lot of the racing. And this wasn’t intended. This is just a result of having this style of racing and this type of race start. So that also has a big impact on things like FRR and people’s perception of it.
On lobby size: I think our average lobby size across the day right now is 11 or 12, and that’s been pretty consistent since launch. We can look at lobby size and push it up or down depending on how people are feeling. Of course, smaller lobbies can potentially mean less fun racing, but also potentially cleaner racing, right? So lobby size has a big effect on the perception of FRR. And of course, we have a lot of work to do on our safety and skill ratings in terms of how they’re showing up, including people’s safety rating after their first couple of races and their overall safety rating. We’re looking at a lot of stuff here.
Lastly, while we actually do have racing etiquette guidelines that really govern how we think about assigning fault for FRR, we’ve never published those and we currently don’t really give any of our players a good idea of like, “hey, you want to join multiplayer? Here’s how you should think about other players on the track.” We are figuring out ways that we can promote better racing through understanding what’s expected of players.
Going back to our FRR goals timeline: we’re hoping to get to 95% accuracy by the end of this year. That is a huge and ambitious goal, because you can see how long it’s taken us to go from 80 to 89%. It’s basically taken us a year.
One of the main issues here is that the labeling process is incredibly manual. Up until this point, we’ve basically had one person doing most of the labeling for us. He’s in the audience here, T10 Raceboy, and he’s a limited resource, right? He does a bunch of stuff for the team and for Motorsport overall. What we realized is that to get to 95%, we need to bring on a host of additional resources to help label all of this data, because, like I said, we have well over a million of these coming in every day.
We have added a number of additional full-time people to really attack this problem, and they’re reviewing up to 1,000 or so incidents per week at this point. Going forward, we expect them to label anywhere between 2,000 and 3,000 a week. It’s a huge part of how the overall accuracy of this model gets better. So now we have well over four times the resources dedicated to labeling this data and making it better. They just started, and their work has been having an effect since September. I think that’ll really dramatically help us improve this model.
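Some quick arithmetic with the figures quoted in this article shows how small the hand-labeled slice is. This assumes about 1,000,000 collisions a day; since the article says “well over a million,” the real fractions are smaller still. The constant and function names are illustrative only.

```python
DAILY_COLLISIONS = 1_000_000  # lower bound: "well over a million" per day
DAYS_PER_WEEK = 7


def labeled_fraction(weekly_labels: int, daily_collisions: int = DAILY_COLLISIONS) -> float:
    """Fraction of a week's collisions that can be hand-labeled."""
    return weekly_labels / (daily_collisions * DAYS_PER_WEEK)


# Current rate (about 1,000/week), then the expected 2,000-3,000/week range.
for weekly_labels in (1_000, 2_000, 3_000):
    share = labeled_fraction(weekly_labels)
    print(f"{weekly_labels:>5} labels/week -> at most {share:.3%} of collisions")
```

This prints roughly 0.014%, 0.029%, and 0.043%, which is one reason being able to choose *which* incidents get labeled matters so much.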
The second change here is about the different models that actually drive FRR. Without going into too much detail, we’re actually on our third model of FRR. The most important thing to understand is there are new data points that the model is now tracking that might help us both get better accuracy, and also filter down to specific types of incidents.
In our first model, we didn’t have data that tracked where on the cars the collisions were happening. We couldn’t really tell whether contact was made with the front, side, or back of a car, so we couldn’t reliably tell that one car had rear-ended another. That wasn’t part of the initial data set. We initially believed that all the other inputs (throttle, brakes, steering inputs, expected speed) were the right inputs to solve for. It turns out it’s easier for us to filter down to the types of collisions we want to improve if we track additional data.
So now I think with Model 2, we added a bunch of data. Same thing with Model 3. We now know the specific portions of the cars that are colliding. What that allows us to do is to filter out for only those types of collisions going forward. So we can say something like “today, we’re only going to look at front-to-rear collisions from last night and just label those.” And that’s a huge change for us because we can now go in and attack “specific accuracy” and go, “hey, we know people are having some issues with getting rear-ended and having false positives there.” We can now filter down to those specific types of incidents and make those better and have a focused effort there.
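Here is a minimal sketch of that kind of filter, assuming each collision record carries the contact region on each car and a timestamp. The `CollisionRecord` fields, the `Region` values, and the “last night” window are hypothetical; the article doesn’t describe FRR’s actual data schema.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Iterable, List


class Region(Enum):
    FRONT = "front"
    SIDE = "side"
    REAR = "rear"


@dataclass(frozen=True)
class CollisionRecord:
    collision_id: int
    occurred_at: datetime
    region_car_a: Region  # where car A made contact
    region_car_b: Region  # where car B made contact


def is_front_to_rear(record: CollisionRecord) -> bool:
    """True when one car's front met the other car's rear, in either order."""
    return {record.region_car_a, record.region_car_b} == {Region.FRONT, Region.REAR}


def labeling_queue(records: Iterable[CollisionRecord],
                   start: datetime, end: datetime) -> List[int]:
    """IDs of front-to-rear collisions in [start, end), ready for manual labeling."""
    return [
        r.collision_id
        for r in records
        if start <= r.occurred_at < end and is_front_to_rear(r)
    ]


records = [
    CollisionRecord(1, datetime(2024, 9, 25, 22, 15), Region.FRONT, Region.REAR),
    CollisionRecord(2, datetime(2024, 9, 25, 23, 40), Region.SIDE, Region.SIDE),
    CollisionRecord(3, datetime(2024, 9, 26, 1, 5), Region.REAR, Region.FRONT),
    CollisionRecord(4, datetime(2024, 9, 26, 14, 0), Region.FRONT, Region.REAR),  # outside window
]
last_night = (datetime(2024, 9, 25, 18, 0), datetime(2024, 9, 26, 6, 0))
print(labeling_queue(records, *last_night))  # [1, 3]
```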
So the short version is that the overall accuracy number may matter less going forward, partly because of those related features I mentioned, but more importantly because of the “specific accuracy” we were just talking about. If we get to 99% on the specific types of incidents we’re looking at, and people aren’t complaining about the braindead-simple ones T10 OG was talking about, I think we’re going to be in the right place.
We have a bunch more people and resources assigned to the labeling portion of this. And I think we have the right tools and technology now to get us to that 95-99% accuracy overall with FRR.