I have a custom 404 page, as I felt that was the right thing to do. Then a few days ago I realised it should maybe be disallowed in the robots.txt file so that search engines would not crawl the page.
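For context, the line I added looks something like this (I'm assuming the 404 page lives at /404.html here; the actual filename on my site may differ):

--
User-agent: *
Disallow: /404.html
--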

Now, I just got an email from Google saying:

--
Page indexing issues detected on feel-good.today
To the owner of feel-good.today:
Search Console has identified that your site is affected by 1 Page indexing issue(s):
Top critical issues
Critical issues prevent your page or feature from appearing in Search results. The following critical issues were found on your site:
Submitted URL seems to be a Soft 404
We recommend that you fix these issues when possible to enable the best experience and coverage in Google Search.
--

The 404 page has been there for some time, so it seems that my asking for it to be disallowed in robots.txt has in fact caused the page to be crawled. The simple answer is to take that line out of the robots.txt file, but I'm just wondering why this is happening.

Can anyone help clarify?

I've also asked for the thanks.html page to be disallowed, as it is just the page someone sees after submitting a web form, i.e. a "Thanks for submitting the form, I'll be in touch soon" message. This page has not flagged an error (yet!).
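So the robots.txt file currently looks something like this (again, /404.html is a stand-in for whatever my 404 page is actually called):

--
User-agent: *
Disallow: /404.html
Disallow: /thanks.html
--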

Google's help page says: "If the rendered page is blank, nearly blank, or the content has an error message, it could be that your page references many resources that can't be loaded (images, scripts, and other non-textual elements), which can be interpreted as a soft 404. Reasons that resources can't be loaded include blocked resources (blocked by robots.txt), having too many resources on a page, various server errors, or slow loading or very large resources."

I read that as: "We found the page, but can't load it, maybe because it has been disallowed by the robots.txt file."

Which is exactly right: it has been blocked by the robots.txt file! This page is only pointed to by the control panel of my web hosting, i.e. no page links to it, so my understanding was that no search engine would crawl to it. However, if Google looks at all HTM/HTML files in a directory, it might actually find the page even if it's not linked from any other page; hence I disallowed any bot from looking at that file.
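For what it's worth, this is how I've been checking what the server actually returns for that page (curl -I just prints the response headers, including the HTTP status code; I'm using my assumed /404.html path again):

--
curl -I https://feel-good.today/404.html
--

From what I've read, if a page that's meant to be an error page comes back as 200 OK rather than 404 Not Found, that by itself can be treated as a soft 404.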

Anyone?