Upcoming New Developer Documentation Platform - Replacing Wiki

If the current workflow takes too much time and overhead to manage, then the above seems to simply create a different form of time and overhead. One can imagine that after several months, the drive will be filled with images etc. that aren’t used at all. Once that number approaches several hundred images, it will be impossible for an admin to act as the janitor.

Several hundred images is nothing, we can have that amount added every day and not worry about it.

I am not exactly sure why a potential drive.blender.org needs to be public for everyone with a Blender ID. If we limit uploads to trusted people (committers, documentation writers…) we keep all the benefits of a small git repo, plus a trusted media platform for uploading images, blend files and videos.

We want anyone to be able to contribute changes to docs, not just trusted people.

People will have to either have commit access or go via a PR anyway to contribute to the docs, so there is already a form of access restriction. I would rather give people permissions on the drive than moderate unwanted content. But that’s something we can still decide later. I just wanted to raise the point that it doesn’t necessarily need to be public in order for us to work.

I’ll also stop now as I think we have probably now gone through the pros and cons in quite a bit of detail.
There are just a few things I want to respond to.

To me it seems like you want to simplify stuff, but then to do the simplification you wanted, you introduce another complex step, so it doesn’t end up that much simpler.
If you require people to use a file host, then I personally think it is not much different from having a separate repo. But I guess this is the point that we disagree on.

We probably want to impose restrictions regardless of whether we go with a separate file host or not, no?
For example, we probably don’t want our videos to be 120 fps HDR 4K videos encoded in a lossless format. It’s probably also easier to detect when people are uploading huge files if we host them in the repo as well.
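As a rough illustration of that point, a size check could live right in the repo, for example as a simple pre-commit hook. The 10 MB limit, the hook placement (client hook vs. server hook vs. CI), and the paths are assumptions for the sketch, not anything decided here:

```sh
#!/bin/sh
# Hypothetical pre-commit hook: refuse to commit staged files above a size limit.
# The 10 MB threshold is a placeholder; a server-side hook or CI check would work too.
limit=$((10 * 1024 * 1024))

git diff --cached --name-only --diff-filter=AM | while IFS= read -r path; do
    # Size of the staged blob for this path (0 if it cannot be resolved).
    size=$(git cat-file -s ":$path" 2>/dev/null || echo 0)
    if [ "$size" -gt "$limit" ]; then
        echo "error: $path is larger than 10 MB; please shrink it or host it elsewhere" >&2
        exit 1
    fi
done
```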

If you want to simply clone the repo without any of the binary files, that is possible with git-lfs as well. By default, git-lfs also only downloads the binary files needed for the current checkout, not their full history.
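For example (the repository URL and image path below are placeholders), skipping the binary downloads on clone looks roughly like this:

```sh
# Clone without downloading any LFS-managed binaries; only the small pointer
# files are checked out. (Repository URL and image path are placeholders.)
GIT_LFS_SKIP_SMUDGE=1 git clone https://projects.blender.org/example/developer-docs.git
cd developer-docs

# Later, fetch just the media files you actually need:
git lfs pull --include="media/images/my_image.png"
```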

Eeehhhh. I think that would be a regression in the current workflow. Now you are forcing people to keep track of files in two different places instead of just one. Which is what you wanted to avoid in the first place.

My experience at a big tech company is that the idea of keeping developer docs in the same directories and source control system as the code doesn’t work out that well in practice (at least in the part of the company that I can see). In practice, what happens is this:

  • Detailed usage documentation goes into large comments in the .h file for the API in question. Pretty much everything you need to understand the capabilities, restrictions, and performance of the API functions is documented in a very large comment block there.
  • “Design Documents” are usually just Google Docs. They represent the design at a point in time, and not necessarily the current state of the implemented software.

This is not ideal of course, and the workflow that Brecht et al. are hoping for seems better. We have the capability (and even official encouragement) to follow something like that workflow, and yet we don’t. Why? Here are a couple of my thoughts:

The main point is that “design” goes through many stages:

  • Idea of the functionality wanted and one or more ideas of the technical approach to be used.
  • Choices about the API and/or user interface.
  • Initial choices about data structures and some primitive helper functionality to be implemented.
  • Starting to implement. Discovering that some algorithm development is needed, and working out the math for that.
  • Discovering bugs, missing functionality, and performance problems in the initial code. Iterate on some of the previous stages until a satisfactory MVP is achieved.
  • Release. Get feedback and bug reports. Iterate on some of the previous stages again.

At what point does one have “developer docs” that are ready for committing to a repository? You could say “all of them”, to fit the proposed workflow. But in my experience, it is hard to keep a living, well-organized document up to date while going through the above stages. It is also difficult to do that while keeping a record of alternatives tried that didn’t work out. It can be done, but it is a lot of work. What I personally do, on large projects (like Bevel, or Exact Boolean), is write a running Google doc that is kind of a stream of consciousness - notes to myself, and sometimes to a close collaborator. I could do this in markdown and submit it with each commit to my private fork, but would feel slightly embarrassed about doing so.

The other point is that something like a Google doc has much less friction for adding pictures and collaborating with others via side comment threads. I would encourage us here to try to find the most frictionless way possible to add pictures – ideally, a drag-and-drop or a simple “insert picture” dialog where the user just picks a .png file. (Even better: something like Google’s built-in Drawing insertion tool, for diagrams that need vector art with labels.)

4 Likes

The difference is in the number of branches and pull requests you have to juggle.

Restrictions on “maximum size of a single file on the server” vs. “maximum size of all files in the repository that will be duplicated on the computers of all developers” are quite different things.

No, you are not keeping track of the files on drive.blender.org. You put the link in the markdown file, and that’s the only place you keep track of it. Git LFS actually works the same way, there’s some hash in the repo and as a user you don’t manage the files in LFS.
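To illustrate what that looks like in practice (the path, hash, and size below are made up, and the hash is shortened), the repo itself only contains a tiny pointer file for each LFS-managed image:

```sh
# Show the raw blob git stores for an LFS-tracked file: just a pointer.
$ git cat-file blob HEAD:media/images/my_image.png
version https://git-lfs.github.com/spec/v1
oid sha256:4c5ad27c8ec1b2f0...
size 182044
```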

I think you are mostly talking about different types of developer docs, which we already aren’t putting in the wiki for the most part?

Above I suggested we use Gitea issues for design docs while working on a project. Besides the specific platform, I agree that these are living documents that we should be able to discard, and not worry about making them presentable.

Personally I’m actually skeptical about extending the docs under Features in any direction, compared to putting such details in code comments. There are some aspects we need to document better, mainly things that affect many developers and where it’s worth writing good documentation to help them get started. For example, DNA and versioning.

But when it comes to more specific areas of Blender, the value of documenting code like this goes down, and often it would be better to invest time in writing comments or making the implementation as clear as possible. What I think should be in the developer docs are things that can either not be expressed or discovered well in code comments.

3 Likes

What I mean is that instead of having all files in the repo, now you have to manually upload and update the links after you have tested out your changes locally and want to create a PR.

So instead of simply doing git add . && git commit && git push to upload your changes, you now have to also:

  1. manually navigate to the drive.blender.org home page
  2. click on “upload files”, navigate to where you saved your files locally, and select them
  3. manually replace every link in the markdown files with the URLs from the drive.blender.org website

For doing edits it gets even more tedious because with git-lfs you could just replace the media file in the repo and commit and push. Now you have to do the “upload and replace urls” work described above every time.
(Which to me is not conceptually much more work than having to push to a different repo, but still more tedious as there is more manual work involved imo)
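For comparison, the git-lfs edit flow referred to above is just the normal git commands; the file paths and commit message here are placeholders:

```sh
# Replace the tracked image and commit as usual; git-lfs handles the upload
# transparently on push. (Paths and message are placeholders.)
cp ~/Downloads/my_image_v2.png media/images/my_image.png
git add media/images/my_image.png
git commit -m "Docs: update my_image.png"
git push
```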

I’m a bit confused about what you are trying to prove here.
You are correct that git lfs uses hashes and then does a switcheroo with the file on disk. But the end user just uses a path to the file and doesn’t have to care about the hashes or that it is managed by git-lfs as this part is supposed to be transparent to them.
If you host files on drive.blender.org now the users actually have to care about hashes and what hash they are linking to.

I.e. instead of having this URL when using git-lfs:
../media/images/my_image.png
You have this instead:
drive.blender.org/125ac4b53b823bf623.png

The difference here is that with git-lfs the user does not have to update the URL when editing or updating the image. With drive.blender.org they have to, because we can’t allow users to edit or delete the files they have uploaded.
It also makes it harder to figure out what the image is supposed to be by just looking at the URLs, as the filenames of the pictures will probably not be human readable.

2 Likes

@brecht

One thing I’d like some feedback on is what people think about moving the docs into the main Blender repository.

For developer documentation, I’m very much in favor of moving them into the main Blender repo.

A recent case study of why I think that’s a good idea:

We recently merged hierarchical bone collections, which included plenty of Doxygen doc comments for functions, etc. But there is a higher-level design of how the hierarchy is laid out in memory and its invariants that didn’t have any obvious place to go. (It could have been a massive comment right smack in the middle of bArmature in DNA, but that didn’t seem appropriate.) Importantly, this higher-level design is just as relevant to understanding and working with the code as any of our Doxygen comments.

But because it’s not in the same repo, documenting this important aspect of the code became a separate step that we had to make a note to do later. And in my experience, making documentation a separate step rather than intermixing it with writing code makes it a lot less likely to happen.

Moreover, intermixing documenting with coding also significantly increases the chances that existing documentation is updated along with relevant code changes.

In short, I think that writing/maintaining code and writing/maintaining the documentation for that code (whether it be doc comments or higher-level design documentation) should be viewed as part of a single, continuous process. And it’s a lot easier to psychologically buy into that and make it a habit if the code and documentation are in the same repository.

I think the only thing I’m ambivalent about is including documentation that’s not directly related to the code base or Blender’s design. For example, although I view build instructions as directly relevant (if I clone a repo, I should be able to figure out how to build it from the contents in the repo itself), I do not view communication channels or organizational development processes as relevant.

@ZedDB

I think this will make PRs and commits a lot more noisy. Not just the amount but the content/discussions in the PRs as well.

I certainly understand the desire to avoid more noise in PR discussions, particularly of the bikeshedding variety. And I agree that it’s even easier to bikeshed prose than code. So I’m very sympathetic to your desire to avoid that.

However, I think it’s worth the benefits. Documentation is much less useful if it’s out of date or otherwise inaccurate. By allowing documentation changes to accompany code changes, we can more easily ensure that relevant documentation additions/updates actually get made, and double-check that they’re an accurate reflection of the code changes.

(I’m also just generally of the mind that code and its documentation are deeply interrelated, and should always accurately reflect each other. If they don’t, I view it as a kind of bug. (Edit: there are some situations where it makes sense for things to go temporarily out of sync; I phrased that too strongly.) So to me, ideally dev documentation changes should accompany code changes in a PR. But I won’t push that point.)

1 Like

Yes, I think I am. I guess I am a bit fuzzy on what you meant above by saying “code documentation” is part of the developer docs that you were discussing putting into the same repository as the code. That phrase can cover a lot of levels of explanation. Some of which clearly belongs in the code as comments, some of which clearly takes a lot of words and pictures to explain (a “research-paper” like explanation of what is going on big-picture-wise). I’m not sure what is in between that gets committed as markdown as part of the regular commit cycle.

By the way, the “interleaving code and comments” idea brought up by Jacques and Brecht reminds me of “Literate Programming”, which interested readers may want to look up. Long ago I worked on Knuth’s TeX system, which was written this way. I think it is a great way to write code that is meant to be understood, but we are way too far along with Blender to try to do exactly that, IMO.

My personal take:

  • Most developer documentation doesn’t involve actual image files. Most will be prose, code snippets, and diagrams (which can be done with mermaid diagrams on the new platform). Therefore in practice I, as a developer, will only very occasionally be interacting with the file store for images.
  • With a sufficiently straightforward file store, on the few occasions I do need to add images, it’s basically a drag-n-drop in my web browser.
  • By contrast, I’ve had a hell of a time getting Git LFS to work in the past (including the recent past, trying to add things to Blender’s user manual). And when Git LFS goes wrong, it’s a pain to troubleshoot. I would much rather interact with an easy file store than Git LFS.

Having said that, I actually agree that the image files for documentation should ideally be stored in the same repo as the documentation itself, wherever that is. I just think Git LFS is awful enough that I would prefer a separate file store if Git LFS is the only other alternative. And as Brecht noted, technically Git LFS doesn’t actually store files in the same repo anyway.

2 Likes

git-lfs stores a local cache of the large files on your local computer, which means a Blender git checkout would suddenly be multiple gigs large and contain lots of media I’d hardly ever be interested in while coding.

Having the images stored in a permanent online location (drive.blender.org for example) would mean the images are only downloaded while viewing the specific document they are linked from (by the web browser displaying the document).

It’s maybe slightly inconvenient that the developer docs would not be complete without internet access, but imo that beats having a humongous footprint for every local checkout.

The non-human-readable filenames (hashes) are a bit of a turnoff for me though.

1 Like

I’m curious exactly what issues you ran into.
While working on the studio-pipeline repo, Nick and I did run into some issues. But most, if not all, of them were solved either by following the instructions on how to initialize git lfs on the computer (you only need to do it once per user account) or by tweaking some git config option.
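For reference, the one-time per-user initialization mentioned here is a single command that wires the git-lfs filters into your global git config:

```sh
# One-time setup per user account.
git lfs install
```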

Of course, LFS is not a flawless piece of software. But I think it is good enough for us to leverage it.

That it stores the data outside of the actual git repo is not the problem. With git lfs this becomes transparent to the user. If we use a solution like drive.blender.org, then this will not be transparent, as the user will have to manually upload and link the files. With LFS, they can work as if the files were on their local drive.

Right, that is why I am suggesting not to merge the repositories and to keep them separate. Mainly because I think that the majority of people cloning the repo will not use the docs. They will most likely just build Blender and/or work with the source code itself. (And if they want to read something from the docs, they will probably go to the docs website.)

I don’t think many people will use the repo to view the docs. They will go to the developer.blender.org/docs website instead. Which is part of my point.
People actually using the docs in the docs repo will most likely have it cloned because they are going to work on them. Therefore I think it is best to have the files at least appear to be hosted in the same repo to the end user.

To me, having the media files hosted in a place where they do not act as if they were local files makes it very similar to having different repos. For example, if we hosted all media files in an svn repo instead and then referred to them in the git repo with the relevant svn URL, that would be practically the same to me as hosting the files on drive.blender.org.

My point is that if we are going to have this split workflow regardless, then why not just have a separate docs repo and host all files in the same place? This way, when you are actually working on the doc files, you don’t have a split workflow. You only have a split workflow when working on both code and the docs. (However, with the proposed “drive” solution you would have this split workflow no matter what.)

In my humble opinion, having the docs as part of the main repository would be a good idea. In case the changes to the docs become too big for a PR, splitting them into a separate PR is easy, and that is already what we do for code anyway.

For media, I don’t have a strong opinion. I think that whatever we choose as a solution is fine. More than 95% of the docs is just plain markdown anyway. I would even go as far as to say that for technical documentation, avoiding the use of images or videos is a plus. If I create some diagram and then upload it as an image, it’s no longer editable. So using markdown extensions to draw the diagram instead would be better.
This changes of course when talking about the manual, where media is often required.

If the documentation was guaranteed to only be text, then I think it would be OK to have it as part of the main repository. I’m not keen on adding GIT-LFS to the main repository.

GIT-LFS just isn’t as well supported as GIT is; I’ve run into problems with it in the past that were a hassle to solve. Currently GIT-LFS doesn’t support security tokens well, for example, meaning I need to physically press my security token multiple times when pushing to a GIT repo that uses GIT-LFS (it’s a known problem).

I’m curious exactly what issues you ran into.

This latest time, I was able to clone, check out, and commit things fine (or at least that seemed to be the case), but then issues started popping up when I tried to push, which caused the LFS-managed binary files to not get pushed properly. I don’t recall the specifics anymore, and I never actually solved the issue. It’s still on my todo list to figure out the next time I need to make changes to the user manual.

But to be clear, this has been my repeated experience with git LFS over the years: it has footguns that I somehow manage to keep running into. And I have at least basic knowledge of how git works and its data model. And I’m familiar with the idea behind how git LFS works. So I’m not totally ignorant.

Git LFS certainly has its use cases, though. So I’m not arguing to never use it. But I see it as something to use only when you really need it. So when other solutions are acceptable, I would prefer avoiding it.

Also, I have this preference even if we put the developer docs in a separate repo. For example, as noted by @filedescriptor, developer docs are unlikely to/shouldn’t have a lot of binary files anyway, so just directly committing the few we do have into git proper would be preferable, IMO, if we go with a separate repo.

(Side note: git also has partial clones, which, when used with --filter=blob:none, more or less amount to a git-LFS-like system, but better and actually properly integrated with git. Except that the large binary blobs still accumulate with subsequent pulls/fetches, and as far as I know there’s no easy way to delete/gc those accumulated blobs (certainly not automatically). I’m hoping they add such features in the future, and thus make git LFS obsolete.)
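For the curious, such a blobless partial clone looks like this (the URL is a placeholder); file contents are then fetched on demand, e.g. when a commit is checked out:

```sh
# Partial clone that skips all file contents (blobs) up front and downloads
# them lazily as needed. (Repository URL is a placeholder.)
git clone --filter=blob:none https://projects.blender.org/example/developer-docs.git
```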

1 Like

Same experience here, ran into various issues with Git LFS getting into strange states for the user manual and developer docs. It’s especially bad when adding LFS to a repository when it was not there before, and going back in history.

I would definitely not want to risk adding it to the main Blender code repository.

1 Like

Migration Update:

1 Like