
Add ProGit v1 redirects #1915

Open: wants to merge 64 commits into gh-pages


dscho
Member

@dscho dscho commented Nov 6, 2024

Changes

With this PR, the original ProGit v1 links (like https://git-scm.com/book/en/Getting-Started-Git-Basics) will no longer show the 404 page, but instead redirect to the corresponding v2 section.

Context

It was reported in #1782 that the ProGit v1 URLs stopped working a while ago. Before the Hugo/Pagefind switch, this resulted in a 500 page; after that switch, they are still broken, only now a 404 page is shown instead.

I mulled over what to do for a while, because we cannot have wildcard routes in a static site.
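Because a static host cannot match a pattern such as `/book/en/*`, every old URL needs a physical page of its own that forwards the browser. The Ruby sketch below illustrates that idea with a meta-refresh page. `redirect_page`, `redirect_files` and the `<old path>.html` output layout are hypothetical; this is not the mechanism this PR uses to generate its redirect pages.

```ruby
require "cgi"

# Hypothetical helper: render a minimal static page that forwards to `target`.
# A static host cannot route a wildcard like "/book/en/*", so each old URL
# needs one of these pages.
def redirect_page(target)
  href = CGI.escapeHTML(target)
  <<~HTML
    <!DOCTYPE html>
    <html>
    <head>
      <meta charset="utf-8">
      <link rel="canonical" href="#{href}">
      <meta http-equiv="refresh" content="0; url=#{href}">
    </head>
    </html>
  HTML
end

# Enumerate the redirects explicitly: output file name => page contents.
# The "<old path>.html" layout assumes the host serves .html-less URLs
# from the matching .html file.
def redirect_files(mapping)
  mapping.to_h { |old_path, new_path| ["#{old_path}.html", redirect_page(new_path)] }
end

files = redirect_files({
  "book/en/Git-Tools-Submodules" => "/book/en/v2/Git-Tools-Submodules",
})
files.each { |name, html| puts name, html }
```

The cost of this approach is that the list of old URLs has to be known up front, which is why the v1 tables of contents matter.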

Eventually, I suggested resurrecting the previously-working URLs by inspecting the original progit repository.

In the end, I figured out a way that was much more convenient: obtain the table of contents from the Internet Archive and then use common sense to manually map the v1 URLs to the v2 URLs.

I did not want to leave potential contributors with a lot of uncertainty about whether this is even workable, so I investigated a bit and even came up with that manual v1 <-> v2 mapping.

After that, I did not want to invest even more effort myself, hoping that other people would run with the idea. But nobody bit. So today, I thought: "How hard could this be?". And to my surprise, it was not hard at all.

Note that only the first 5 commits of this PR are interesting with regard to review; the remainder of the commits were created by a manually triggered update-book workflow run where I specifically forced a complete rebuild (the second parameter of the getPendingBookUpdates() call reflects the force-rebuild input parameter).

For convenience, I deployed this to my fork. You can verify that the ProGit v1 URLs work, e.g. by picking a link from the original ProGit v1 table of contents in the Internet Archive, editing it so that it points to my forked site instead, and then verifying that it redirects as intended. For example:

I also tried to make sure that the translated pages work as well. Example:

This fixes #1782

@dscho
Member Author

dscho commented Nov 7, 2024

I just updated this by rebasing onto the current gh-pages revision (skipping the book updates), then letting another update-book workflow run update it. Here is the diff (spoiler alert: it essentially just updates the eBook download URLs).

@LemmingAvalanche
Contributor

I was asked to review this PR, a task I’m not qualified for (no Ruby knowledge etc.). So I’ll probably just try to read through the code.

@LemmingAvalanche
Contributor

I wanted to test some redirects but I had the same local-only problem as last time I tried to serve the page locally: got infinite redirects when navigating.

@LemmingAvalanche
Contributor

I compared the Chapter 4 entries of book_v1.yml (since v1 and v2 diverge a lot there) to Dscho’s initial table:

#1782 (comment)

The mapping in that file is the same as the table.

@LemmingAvalanche
Contributor

I was a bit confused by all the one-commit merge commits for the localized book versions. But from reading the Git mailing list I know that Dscho knows his merge commits. So there’s a reason for those.

LGTM

@dscho
Member Author

dscho commented Nov 11, 2024

> I wanted to test some redirects but I had the same local-only problem as last time I tried to serve the page locally: got infinite redirects when navigating.

@LemmingAvalanche how do you serve the local site? Via hugo server? That won't work, due to the .html-less pages that GitHub Pages supports (and which we must use for backwards compatibility with the Rails app version of git-scm.com).

Could you try serving it via script/serve-public.js instead?
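For illustration, here is a Ruby sketch of how a local server can answer .html-less URLs the way the production host does, by trying a few candidate files in order. `resolve` is hypothetical and is not a port of `script/serve-public.js`, whose actual logic may differ.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical resolver: map a request path to a file under public_dir,
# trying the exact file, then "<path>.html", then "<path>/index.html".
def resolve(public_dir, request_path)
  # Drop any query string or fragment, then the leading slashes.
  rel = request_path.split(/[?#]/, 2).first.to_s.sub(%r{\A/+}, "")
  return nil if rel.split("/").include?("..") # refuse to escape public_dir

  candidates = [rel, "#{rel}.html", File.join(rel, "index.html")]
  candidates
    .map { |c| File.join(public_dir, c) }
    .find { |path| File.file?(path) }
end

Dir.mktmpdir do |dir|
  page = File.join(dir, "book/en/v2/Git-Tools-Submodules.html")
  FileUtils.mkdir_p(File.dirname(page))
  File.write(page, "<html></html>")
  FileUtils.touch(File.join(dir, "index.html"))

  p resolve(dir, "/book/en/v2/Git-Tools-Submodules") == page # prints true
  p resolve(dir, "/") == File.join(dir, "index.html")        # prints true
  p resolve(dir, "/../etc/passwd")                             # prints nil
end
```

The `..` check matters because the request path is joined onto a local directory.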

@dscho
Member Author

dscho commented Nov 11, 2024

> I was a bit confused by all the one-commit merge commits for the localized book versions. But from reading the Git mailing list I know that Dscho knows his merge commits. So there’s a reason for those.

@LemmingAvalanche you mean commits like 4b00169? These are a side effect of a matrix of jobs taking care of updating the individual translations of the book. The changes made by those jobs all have to be merged together. (They could be merged via a single octopus merge, but my experience is that octopus merges baffle users and Git commands alike; therefore, I avoid them.)

@dscho
Member Author

dscho commented Nov 11, 2024

Replying to this thread more visibly:

> Uh oh. The $v1_to_v2 data is not even used. Which means that https://dscho.github.io/git-scm.com/book/en/Git-Tools-Submodules redirects to https://dscho.github.io/git-scm.com/book/en/v2/GitHub-Summary (i.e. 6.6 -> 6.6, when it should redirect to https://dscho.github.io/git-scm.com/book/en/v2/Git-Tools-Submodules instead, i.e. to 7.11)!
>
> Will fix immediately.

I fixed this, force-pushed, then let the update-book.yml workflow do its job. The combined result is here.

You can see that this fixed the English links starting here.

You can verify that the Git-Tools-Submodules section I mentioned in the quoted message is now correctly aliased.

Or just direct your web browser to https://dscho.github.io/git-scm.com/book/en/Git-Tools-Submodules or to https://dscho.github.io/git-scm.com/book/en/v1/Git-Tools-Submodules to see that it works now.
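The symptom described above can be modeled in a few lines: if the v1 -> v2 table is never consulted, every v1 section number simply maps to the same v2 number. The section numbers and slugs come from the comment; the data layout and the `v2_slug` helper are assumptions, not the code in `script/book.rb`.

```ruby
# Hypothetical excerpt of the v2 table of contents, keyed by section number.
V2_SECTIONS = {
  "6.6"  => "GitHub-Summary",
  "7.11" => "Git-Tools-Submodules",
}.freeze

# Hypothetical excerpt of the v1 -> v2 mapping: "Git Tools - Submodules"
# was section 6.6 in v1 and is section 7.11 in v2.
V1_TO_V2 = { "6.6" => "7.11" }.freeze

# Sections missing from the mapping fall back to the same number, which is
# what happens for *every* section when the mapping is ignored.
def v2_slug(v1_section, v1_to_v2)
  V2_SECTIONS.fetch(v1_to_v2.fetch(v1_section, v1_section))
end

puts v2_slug("6.6", {})       # mapping ignored: GitHub-Summary
puts v2_slug("6.6", V1_TO_V2) # mapping used:    Git-Tools-Submodules
```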

@dscho
Member Author

dscho commented Nov 11, 2024

Will have to re-fix, hold on a sec.

In 2014, the second edition of the ProGit book was started, in 2016 this
edition became the default, and in 2020 all v1 URLs were redirected to
the landing page of v2.

At some stage, the v1 links stopped working, returning a "500 Internal
server error".

After switching the git-scm.com site to a static Hugo-generated site,
those links now return a "404 That page doesn't exist".

This is far from an ideal situation.

Unfortunately, in a static site, there is no way to install wildcard
routes that would redirect to a better URL, therefore we have to
enumerate all of the URLs that we want to redirect.

Internet Archive to the rescue!

This script downloads the tables of contents of the ProGit v1 book and
its translations, and determines the URLs that had been active back
then, maps them to the v2 equivalents (on a best effort basis), and then
writes out a YAML file with that information.

Signed-off-by: Johannes Schindelin <[email protected]>
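The last step described in that commit message, turning the collected v1 URLs into a best-effort v1 -> v2 mapping and writing it out as YAML, might look roughly like this. Downloading the archived tables of contents is left out; the YAML layout, the `v1_redirects` helper and the `Old-Slug`/`New-Slug` placeholders are hypothetical, and the real `book_v1.yml` may be structured differently.

```ruby
require "yaml"

# Hypothetical best-effort mapping for one language: slugs listed in
# `overrides` get an explicit v2 target; all others keep their slug.
def v1_redirects(language, v1_slugs, overrides = {})
  v1_slugs.to_h do |slug|
    ["book/#{language}/#{slug}", "book/#{language}/v2/#{overrides.fetch(slug, slug)}"]
  end
end

# In the real script the slugs would come from the archived tables of
# contents; here they are hard-coded for illustration.
redirects = {
  "en" => v1_redirects("en", ["Git-Tools-Submodules"]),
  # Placeholder slugs, purely to show an override in action.
  "de" => v1_redirects("de", ["Old-Slug"], { "Old-Slug" => "New-Slug" }),
}
puts redirects.to_yaml
```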
This file was generated by running script/extract-book-v1-urls.rb.

Signed-off-by: Johannes Schindelin <[email protected]>
The `File.write(path, content)` form is quite a bit more readable than
the long form.

Signed-off-by: Johannes Schindelin <[email protected]>
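The commit message does not show which long form was replaced; assuming it was an explicit `File.open` with a block, the two variants compare as follows. Both truncate the file and write the content, but `File.write` does it in a single call. The helper names are made up for this comparison.

```ruby
require "tmpdir"

# Long form (assumed): open a handle explicitly and write through it.
def write_long(path, content)
  File.open(path, "w") { |file| file.write(content) }
end

# Short form: File.write opens, writes and closes the file in one call.
def write_short(path, content)
  File.write(path, content)
end

Dir.mktmpdir do |dir|
  a = File.join(dir, "long.yml")
  b = File.join(dir, "short.yml")
  write_long(a, "en: {}\n")
  write_short(b, "en: {}\n")
  puts File.read(a) == File.read(b) # prints true
end
```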
The default language of the ProGit book is English, therefore
https://git-scm.com/book should redirect to the table of contents of the
English version of that book.

This means that the `/external/book/content/book/en/_index.html` file
needs to be part of the sparse checkout; otherwise, the workflow run
would not be able to update it (should it ever become necessary).

This was not a problem so far because that file remained unchanged (and
is likely to remain so for quite some time yet).

Signed-off-by: Johannes Schindelin <[email protected]>
Once upon a time, the first edition of the ProGit book was available
on Git's home page, and the sun was shining. Then, one day in 2014, the
sun shone brighter and work began on the second edition of
this book. At some stage, this became the default when directing web
browsers to https://git-scm.com/book, and a few years later, 2020 or so,
the links that formerly led to the first edition would redirect to v2.

Then, one day, clouds moved across the sky and the redirects from v1 to
v2 stopped working and instead a "500 Internal server error" page was
shown.

Time went by and nobody really knew how to fix it (or more likely,
wasn't in the mood, or wanted other people to fix it).

Finally, in the fall of 2024, git-scm.com was switched to a static web
site, generated using Hugo, and local development became much easier.
Naturally, the v1-to-v2 redirects were no longer in place, and the v1
links therefore no longer showed a 500, but a 404.

Still, nobody knew how to fix it, or wasn't in the mood, or wanted other
people to fix it for them.

Until now. Now is the day when we resurrect the v1-to-v2 redirects, in
even more glory than ever before, for now we redirect to the v2 sections
that correspond to the v1 sections (as far as possible, that is)!

Only one (slight) fly in the ointment: URLs to v1 sections of the book
which contain anchors will keep those anchors as-is, and not translate
them to the corresponding new anchors. Example:
Git-Basics-Getting-a-Git-Repository#Cloning-an-Existing-Repository
should redirect to v2/Git-Basics-Getting-a-Git-Repository#_git_cloning,
but does not. It redirects to that page but still tries to find the
anchor `#Cloning-an-Existing-Repository`.

Alas, this is where I do not know how to fix it, or ain't in the mood,
or want other people to fix it for themselves.

This commit addresses git#1782

Signed-off-by: Johannes Schindelin <[email protected]>
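The anchor limitation described in that commit message can be modeled as follows: the page part of the URL is mapped, but the fragment is passed through verbatim, so `#Cloning-an-Existing-Repository` survives instead of becoming `#_git_cloning`. `PATH_MAP` and `redirect_target` are hypothetical; they only reproduce the described behavior and are not part of this PR.

```ruby
require "uri"

# Hypothetical v1 -> v2 path mapping (the page-level part, which works).
PATH_MAP = {
  "/book/en/Git-Basics-Getting-a-Git-Repository" =>
    "/book/en/v2/Git-Basics-Getting-a-Git-Repository",
}.freeze

# The page is mapped, but the fragment is carried over unchanged rather
# than being translated to its v2 equivalent.
def redirect_target(v1_url)
  uri = URI.parse(v1_url)
  path = PATH_MAP.fetch(uri.path, uri.path)
  uri.fragment ? "#{path}##{uri.fragment}" : path
end

puts redirect_target("https://git-scm.com/book/en/Git-Basics-Getting-a-Git-Repository#Cloning-an-Existing-Repository")
```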
dscho added 11 commits November 11, 2024 14:45, each with the message "Updated via the `update-book.yml` GitHub workflow."
dscho added 25 commits November 11, 2024 14:45
@dscho
Member Author

dscho commented Nov 11, 2024

> Will have to re-fix, hold on a sec.

Fixed. Still had this left-over from an internal iteration: https://github.com/git/git-scm.com/compare/6e803d156d69500d0cf099771078e46934d4cd3e..8d9bfac33af659438239db9f9926f5da5181717d

@LemmingAvalanche
Contributor

> (They could be merged via a single octopus merge, but my experience is that octopus merges baffle users and Git commands alike, therefore I avoid them.)

@dscho A sound assessment. :P

@dscho
Member Author

dscho commented Nov 14, 2024

> > I wanted to test some redirects but I had the same local-only problem as last time I tried to serve the page locally: got infinite redirects when navigating.
>
> @LemmingAvalanche how do you serve the local site? Via hugo --serve? That won't work, due to the .html-less pages that GitHub supports (and which we must use for backwards compatibility with the Rails app version of git-scm.com).
>
> Could you try serving it via script/serve-public.js instead?

@LemmingAvalanche did you have any luck with that yet?

@LemmingAvalanche
Contributor

> did you have any luck with that yet?

@dscho I’ll try this evening

@LemmingAvalanche
Contributor

> I’ll try this evening

This setup works fine.

hugo
node script/serve-public.js

I’m writing a longer reply now.

Successfully merging this pull request may close these issues.

500 Internal server error when visiting link for Pro Git v1 book