New Triaging Process Proposal

The current triaging workflow we have at Blender was created late in 2019. Sergey and I got together once again to revisit this process and adapt it to the current team dynamic.

At the core of that proposal was:

  • Separate Reports, Bugs and Known Issues:
    • Reports: Issues/tickets created by users that have yet to be confirmed, simplified, or reproduced.
    • Bugs: Issues the team plans to work on (in the coming 6 months).
    • Known issues: Limitations of the software that wouldn’t be addressed in the near future.
  • Classification:
    • Isolate developers from the continuous influx of reports.
    • Have module owners and members assess final priorities/known issues.

What worked well:

  • The triaging team grew into an essential role of flagging regressions and other high priority issues the modules should pay attention to.
  • Modules were not distracted anymore by all the new daily reports which sometimes had incomplete information or were already fixed.

What didn’t work well:

  • Some modules never got to classify their issues (see metrics).
  • Some tasks marked as bugs were not really looked at in the expected time-frame.
  • Known issues were never re-visited by modules and were just adding noise to the issue tracker.
  • High priority issues were not tackled for months, contradicting the very notion of what a priority is. In many cases, a severity level would have been more accurate.

Proposal

At the core of this proposal the idea is to:

  1. More clearly communicate whether users should expect a fix, and when.
  2. Simplify the process by removing the classification step.
  3. Empower the triaging team even further - the moment a triager confirms an issue, it is already considered a bug (modules can still edit an issue afterwards though).

In practical terms this means:

  • Remove the “Classification” step in the triaging process.
  • Rename “Priority” to “Severity”.
  • Remove “Report” type (leaving only bug).
  • Archive all Known Issues.
  • Encourage modules to use milestones.
    • Milestones help to communicate planning and sense of urgency for issues.
    • A high severity issue with no milestone should get the module's attention ASAP, to assess how important it is and when it will be tackled.

The proposed severities and their definitions are:

  • Unbreak Now! : Stop everything to fix it (e.g., data corruption).
  • High: Fix as soon as possible (e.g., regressions).
  • Medium: Planned as part of development.
  • Low: Good to have, but not planned (good target for contributors).

Metrics

[Graph: Unclassified Reports over time]

Pain Points

The main point of dispute is whether known issues should be:

  • Archived.
  • Left open.
  • Added to a “Won’t be fixed/Never” milestone.
  • Removed as a type.

At the moment I (Dalai) am more inclined to keep them as a type (for metrics), and close/archive them.

Feedback

I would love to hear from module owners, members, the triaging team as well as the community.

Overall, I agree with the points made. I think the Known Issue question is a difficult area to handle. Speaking in concrete numbers, of the current 6571 open issues, 1141 of them are tagged with “Known Issue”, making its usefulness as a queue or even current baseline state rather dubious. On the other hand, I think having the label and having the ability to keep an issue open for a long time can help with communication and eventual prioritization / planning of issues that may otherwise go missing.

I don’t think removing it as a label entirely is a good idea. Intuitively, “Known Issue” sounds like bugs or limitations due to the current architecture of various areas that would take a concerted development effort to fix. So maybe the guidance around them should be to rename and clean up the original post, so that teams can more easily deduplicate issues into them and get a better sense of which large areas need to be tackled by the modules.

Known issues need to be handled case-by-case IMHO. For example, known issues about the old simulation systems (which number in the hundreds now, I think) can be closed, because the module's plans don’t involve fixing them, but replacing them with new systems. But there can also be known issues about, for example, Multires, which has no final plans or active maintainers; it would be good to keep those open for interested parties, and so that they can be easily found and evaluated by the module in the future.

So I think modules can decide whether a known issue can be closed based on whether fixing it is “not right now” or “never, not in our designs anymore”.

I’m totally with you, except for three parts:

Rename “Priority” to “Severity”.

I don’t see any actual value added by renaming it. Instead, it could give issue readers the impression that severity Medium is actually quite bad (which it often isn’t, as it only means “Planned as part of development.”). Priority, on the other hand, gives a better impression of when it may get fixed (i.e. where the report sits on the to-do stack).

Archive all Known Issues.

This I personally find quite tricky. Yes, it adds noise and clutter to the tracker. However, like Priority Low bugs, they may get fixed eventually. And even if not, they’re there and not forgotten, and so easier to search for and reference if necessary. An archived known issue appears to me to be resolved (either by a fix or by circumstances such as a newer architecture). That said, the usefulness of older reports (the oldest known issue is 11 years old) is quite questionable.

Thus, instead of closing them directly (which only hides the problem), I propose leaving them in the tracker for at least the support period of the nearest LTS version they can be referenced against, up to a maximum of 5 years (after which I think they are no longer really useful).

More clearly communicate whether users should expect a fix, and when.

In my opinion this could easily backfire and result in a public perception of lots of missed deadlines. Instead, the prioritization system could be described in general terms in the bug report template.

I do like renaming “Priority” to “Severity”, because there are high priority bugs that don’t really get an immediate fix (for example, some of them require a systematic approach that changes a bigger part of the code). Alternatively, we could add a separate tag for this and keep the priority tag.

I’m not sure if archiving all known issues is the right approach. Having an open, confirmed known issue seems nicer than “This is a known issue that can be improved; closing and adding it to the pile of 90,000 other closed issues”. So maybe leave known issues where they are and re-evaluate them at a later date? Or maybe someone can provide a good argument for closing all known issues.

As for “Encourage modules to use milestones”: it seems good, but I believe the way milestones were used in the past conflicted with how the release manager evaluates whether a version of Blender is ready for release.
For example, the triaging team was adding 4.2 bugs to the 4.2 milestone. Some of them were small enough that the fix could be delayed to a corrective release, while others were bad enough that they should be fixed before 4.2.0 was released. Without investigating each report, it was hard to tell which category each report fell into, which impacted the judgement on whether or not Blender 4.2 was ready for release. So either the rules need to be clear, or some other changes need to happen at the same time to align with the release managers' workflow (e.g. a label could be added to signify that a version of Blender should not be released until this issue is fixed. In theory Unbreak Now! meets this criterion, but it’s used very infrequently and maybe that should change).
Obviously I’m not the release manager, so what I’m saying could be very different from what the release managers perceive.

As brought up by @Bujus_Krachus in the previous comment, medium severity might seem too bad to the average reporter. Just naming it “Normal severity” might be better.


As for everything else mentioned, it seems alright.

While I generally agree that triaging/bug processing could be simplified a bit, I can’t say I agree with much of this proposal.

  1. More clearly communicate whether users should expect a fix, and when.

This is already possible with the current system (priority + milestones); I’m not really convinced that the proposed changes are going to improve anything here. IMHO the main issue regarding this point is the lack of discipline in the dev team regarding management of the tracker.

  3. Empower the triaging team even further - the moment a triager confirms an issue, it is already considered a bug (modules can still edit an issue afterwards though).

Generally seems fine to me.

  • Remove the “Classification” step in the triaging process.

I do not understand that point. You can remove it from the doc, it will still happen anyway! Maybe you could rather rename it as the ‘Investigation’ step, but there is still need from module team to check reports, see what is actually happening, if this is actually a bug, if they are assigned to the correct module, decide how critical they are, decide who should work on it, and so on.

  • Rename “Priority” to “Severity”.

Not against it, but I also do not really see the point, besides change for the sake of change. Or maybe aligning with industry standards like CVE?

  • Remove “Report” type (leaving only bug).

Agreed.

  • Archive all Known Issues.

Very strongly disagree. This type is for issues that cannot be handled as bugs, because they require a lot of work, most likely some UI/UX and/or technical design, etc. They need to be kept around for at least two reasons:

  • Information: Both for users hitting the issue again and again, and for the triaging team.
  • Reference for future design & development, when a project will be started in the affected area.

If you really want to add milestones to everything (which I don’t think we should do, see point below), then adding a new ‘Undecided’ (or similar) milestone would be necessary. Definitely not a ‘Won’t fix’ one: issues that we know won’t be fixed should indeed be archived. But I think this mostly concerns deprecated and unmaintained areas of code.

  • Encourage modules to use milestones.
    • Milestones help to communicate planning and sense of urgency for issues.
    • A high severity issue with no milestone should get the module's attention ASAP, to assess how important it is and when it will be tackled.

While not really against it, I’m not sure how this is going to help much…
On one hand, previous releases have shown us that a lot of ‘milestoned’ tasks were not handled properly, leading to having tens of them still listed a few days before the release itself.
On the other hand, fixes for the vast majority of bugs do not have a predictable schedule. They tend to happen ‘on rainy days’ or during bugfix sprints, i.e. when the devs ‘have time for it’, which does not happen often.
The only bugs I would really expect to see milestoned would be:

  • High priority (or severity) ones.
  • Some bugs in modules or features being actively developed, e.g. currently GPv3, animation, extensions, brush assets, Cycles, Core, etc. Essentially when the module team knows that they are important enough to plan and dedicate time to solve them soon.

The proposed severities and their definitions are:

  • Unbreak Now! : Stop everything to fix it (e.g., data corruption).
  • High: Fix as soon as possible (e.g., regressions).
  • Medium: Planned as part of development.
  • Low: Good to have, but not planned (good target for contributors).

I would remove Unbreak Now! actually. It’s very rarely used, and I don’t think it’s effectively different from High in real dev life. They are both ‘fix ASAP’ IMHO.
Also agree that Normal feels better than Medium.
And aren’t the current names also part of an alignment with the CVE standard (or at least heavily inspired by it)?

To improve the efficiency and effectiveness of getting things done in the proposed Blender triaging workflow, consider the following recommendations:

  1. Implement clear deadlines:
    • Assign specific timeframes to each severity level
    • Set up automatic reminders for issues approaching their deadlines
  2. Establish accountability measures:
    • Designate responsible individuals or teams for each severity level
    • Implement regular progress reviews for open issues
  3. Enhance communication channels:
    • Create a dedicated communication platform for triagers and module owners
    • Set up regular sync meetings to discuss high-priority issues
  4. Provide training and documentation:
    • Develop comprehensive guidelines for the new triaging process
    • Conduct training sessions for all team members involved in the workflow
  5. Implement a feedback loop:
    • Regularly collect feedback from users, triagers, and developers
    • Use this feedback to continuously refine the process
  6. Utilize automation:
    • Implement automated issue assignment based on severity and module
    • Use AI-powered tools to assist in initial issue categorization
  7. Establish performance metrics:
    • Define key performance indicators (KPIs) for the triaging process
    • Regularly review and report on these metrics to identify areas for improvement
  8. Create a triage escalation process:
    • Define clear steps for escalating issues that are not being addressed
    • Establish a review board for addressing consistently delayed high-severity issues
  9. Implement a reward system:
    • Recognize and reward team members who consistently meet or exceed expectations
    • This can help motivate the team to stay on top of issue resolution
  10. Conduct regular process audits:
    • Schedule periodic reviews of the entire triaging workflow
    • Identify bottlenecks and inefficiencies, and implement solutions

By implementing these additional measures, the Blender team can further enhance their triaging workflow, ensuring that issues are addressed more efficiently and effectively.

Having thought about this a bit now, I am mostly with @mont29 here actually (thanks for reading my mind and typing it out almost exactly like I would have), except for:

I would remove Unbreak Now! actually.

I would keep this. I still think there is a difference between “ASAP” (which I would connect to High) and “Now” (which really means you drop everything unless your mother is dying at the moment…)

I also think this step will happen eventually (and should be part of the dev team's regular management of the tracker), but if I understood correctly, the initial triaging judgement (“what is actually happening, if this is actually a bug, if they are assigned to the correct module, decide how critical they are”) is trusted enough that it doesn’t need to be written down as a formal step for the module to do again (of course, modules might still correct triaging and change things if appropriate).

The mentioned graph with the unfortunate, steadily rising number of Unclassified Reports was based solely on the type still being Report even though the issue was already labeled Confirmed. Since we all agree that removing that type distinction is good, the graph will go away (it wasn’t a good measure of whether “a dev has actually looked at an issue”). Now, how can we still make sure this (“a dev has actually looked at an issue”) is happening, which is of course mandatory to actually get rid of issues/bugs? I think this can only be achieved by counting on the dev team’s discipline regarding the tracker: reserving more time for this on a regular basis alongside projects, or whatever works best to motivate a dev to do this.

I think if we use milestones to “help to communicate planning and sense of urgency for issues”, this can go hand in hand with the release management process. The way Triaging was using milestones to indicate that an issue was introduced during the 4.2 dev cycle probably needs tweaking then (as the fact it was introduced in 4.2 didn’t always reflect the urgency of getting it fixed before the 4.2 release). If we as Triaging want this information in order to more easily compile a list of appropriate fixes for the Release Notes, we should be using something else.

I think if we stress this “new/increased” importance of milestones with everyone, it could actually work, and would therefore help the (release) process.

I agree in general with this proposal.

From my perspective as a triager, I think that the known issue report definition could be modified. Currently it practically covers “won’t fix” reports, such as those for modules that are EOL. These could probably be closed. However, reports for unmaintained modules should be kept, since the module status may change in the future.

In any case, it would be good to be able to tell, whether the report was closed as a result of new policy, a fix, or some other means.

From my perspective as a developer, I use the known issue type mainly to communicate to others that the issue is more complex than it looks and needs some coordination and planning to resolve. For VSE, most known issues were fixed without looking at these reports, and in some cases I forgot that they had even been reported. So if these were closed, I wouldn’t mind so much.

Would this mean that a report with medium priority (assuming it is the default) and no milestone attached would mean “Eventually this could/would be fixed”? I think that in practice, users would still wonder whether a milestone will be attached to the report, and hope would be lost with passing time.

Back in the Phabricator days, if I did not fix an issue within a month, the chances of it being fixed went down drastically. Now, if I am not tagged in a new report, there is a good chance that I may overlook it, especially if I am working on some complex task. But sometimes I wonder what users think of issues that stay open for a long time, and whether we could communicate to them the chance or timeframe of a fix.

For VSE issues I sometimes explicitly commented in the report whether this is simple fix or would need more work. I did this as a part of report classification, where I did reproduce it and looked at what actually went wrong. But I have never specified any timeframe.

For release planning, milestones do make sense, even though I rarely used them. To me, adding them to PRs makes more sense than adding them to reports.

@Nurb2Kea Your steps would be great for running a team that does purely maintenance and aims to get the number of bugs closer to 0. But I am not sure that is the goal of this proposal.

Is this an AI suggestion? Most of it exists in one form or another, and some could be improved upon, but I don’t see a clear connection to what is proposed here…

We will also discuss this in tomorrow’s meeting.

Remove “Report” type (leaving only bug).

Not sure how it is supposed to work in practice.
Will it skip the bug confirmation process, so that all reports are considered bugs, mixing them together?

Known issues: Limitations of the software that wouldn’t be addressed in the near future.

Information about limitations forms a knowledge database. Not sure it could be archived in actively developed software.

We discussed this topic in the triaging meeting. The general notes were:

  • Archiving all known issues doesn’t seem like a good idea. It still needs further discussion before any changes are made.
  • These changes can be implemented in phases. For example implementing the non-controversial changes in the near future, while discussion surrounding the controversial changes continues.
  • Some changes are destructive and we can’t go back if we change our mind, so we should be certain we want them before making the change.

OK, my proposed plan of action then is:

  • Rename the Priority label to Severity.
  • Report / Bug
    • Investigate a way to batch assign labels without triggering (email) notifications (to convert Reports to Bugs).
    • Update Bug template to use Bug label.
    • Archive the Report label.
  • Check if any change is required on the bot.
  • Update documentation (remove mention of classification).
  • Update metrics at least in regard to priority/severity.
  • Leave decisions about known issues for a later time.

I will bring this up on tomorrow’s bf-admin meeting.

Was it decided if Priority Medium or Priority Normal will be used? I’m still in favour of Priority Normal.

Otherwise for me personally, the plan sounds alright.

Normal has some shortcomings, but for simplicity's sake we may as well go with it.

Out of curiosity, since discussion around changing the name may come up again in the future: what shortcomings does Normal have?

All but the metrics changes are done now.

“Normal” implies that the other statuses are “not normal”. It also implies that, yes, this is expected (it is normal). Neither of these applies to bug priorities/severities.