I generate web pages from a template held in a database.
Do I need to have meta tags for keywords and description? Are these pages cached and searchable?
Thanks...
www.bricksandbrass.co.uk
Hi Simon ....
My understanding is, no, dynamically generated pages are not crawled by spiders. But it depends on how your pages are created. For example, if they are inside a frameset, put the meta tags/descriptions inside these. You can list the links inside the NOFRAMES tags and these can be crawled.
Some resources:
WebMedic (right at the bottom): http://www.northernwebs.com/set/setsimjr.html
SearchEngineWatch: http://www.searchenginewatch.com/webmasters/index.html
Web Promote: http://webpromote.com/
cfn ... Jen
Jen Worden
Web Developer
www.meadoworks.com
Jen wrote:
> My understanding is, no, dynamically generated pages are not crawled by spiders. But it depends on how your pages are created.
What it actually depends on is the URL. The search engines have no way of knowing whether the page was dynamically created or just pulled off the hard disc - by the time they receive it, it's all just plain HTML.
What they do look at is the URL linked to. If there are query parameters in it, e.g. http://www.example.com/readarticle.cgi?article=123 , they are much less likely to crawl it (although some will anyway). For this reason many people use various URL-transforming techniques to have a publicly-accessible URL that looks like http://www.example.com/articles.cgi/123 instead, for pages that ought to be indexed. I don't know what approaches are available on your server, Simon, but typically mod_rewrite or CGI PATH_INFO are used.
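As a sketch of the mod_rewrite approach - assuming an Apache server and a hypothetical readarticle.cgi script, not Simon's actual setup - a rule like this maps a clean public URL back to the query-string form internally, so spiders only ever see the clean URL:

```apache
# Hypothetical rule: a request for /articles/123 is rewritten internally
# to /readarticle.cgi?article=123; the browser/spider never sees the "?".
RewriteEngine On
RewriteRule ^articles/([0-9]+)$ /readarticle.cgi?article=$1 [L]
```

The same idea works with PATH_INFO: link to /readarticle.cgi/123 and have the script read the trailing path segment instead of a query parameter.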
meta-keywords may help a little in search engine indexing, but not very much these days - too many people abused it (content="sex, sex, sex, real estate, Xara X, Pamela Anderson..."), so many engines just ignore it now. meta-description is still helpful for users to see in results.
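For reference, the two tags in question look like this (the content values are placeholders - substitute your own summary and terms):

```html
<meta name="keywords" content="your, comma, separated, keywords">
<meta name="description" content="A one- or two-sentence summary that search engines may show in their results.">
```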
Jen's right about the noframes content too - if your pages aren't reachable by users without frames, they won't be crawlable by (most) search engines either.
I do use frames, but I do have a robots.htm which lists all the files which I want crawled.
Most of the site is static, but things like the bibliography, events, glossary and directory of companies are in MySQL, with the pages generated using PHP; essentially I have a template table, and the script which builds the page gets the template and does a find/replace where content is to go.
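The template-plus-find/replace step described above could be sketched like this (Python here purely as illustration; the [CONTENT] marker and template text are assumptions, not the actual scheme used on the site):

```python
# Minimal sketch of a find/replace templating step: fetch a template,
# then substitute generated content into a placeholder marker.
# The marker name "[CONTENT]" is an assumption for illustration.

def build_page(template: str, content: str) -> str:
    """Replace the placeholder marker with the generated content."""
    return template.replace("[CONTENT]", content)

template = "<html><body>[CONTENT]</body></html>"
page = build_page(template, "<h1>Glossary</h1>")
print(page)  # -> <html><body><h1>Glossary</h1></body></html>
```

In the real setup the template would come from the MySQL template table and the content from the bibliography/events/glossary queries.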
With the directory, I do have some pages which are specially for the search spiders - these are not in the navigation although a human visitor will get pointed in the right direction.
So I think all is OK - unless the spiders have given up on the keywords anyway!
Thanks.
www.bricksandbrass.co.uk
Simon, this came up on evolt today (thelist) and looked like it might be just what you were looking for:
http://spider-food.net/dynamic-page-optimization.html
cfn ... Jen
Jen Worden
Web Developer
www.meadoworks.com
Jen
That covers it - although if Google and Hotbot are beginning to trace through dynamic pages...
www.bricksandbrass.co.uk
Simon,
I'd be *very* wary of using a 'robots.htm' file like that. Search engines are always on the lookout for 'cheating' techniques intended to make spiders behave differently to people, and your empty links might look just like that. Google sometimes punishes what it sees as 'cheating' with a zero PageRank, which you probably don't want.
Why not just put the links to all your pages as a sitemap in the <noframes> section of index1? Not only will this placate any cheating-detection algorithms, but it'll mean non-frames or non-JavaScript users will at least be able to read your pages, instead of just getting a link to a page telling them to get lost.
I'm muddled now!
I am using the robots.htm file in the way I believed was normal practice, i.e. to give the path to all the htm files that a spider should visit. Is this correct?
And on the links pages, they are visitable by a human user and they contain a bit of text, plus a redirect to the database php script which is the entry to that bit of the site. Is this ok?
Thanks to everyone on this.
www.bricksandbrass.co.uk
Hi Simon ...
I think you mean a robots.txt file (as opposed to an .htm file), yes? In which case all you need is an open "invitation", since its actual function is the opposite - listing which directories/files you don't want crawled.
Syntax:
# All robots will spider the domain
User-agent: *
Disallow:
# Disallow directory /cgi-bin/
User-agent: *
Disallow: /cgi-bin/
# Disallow directory /i/
User-agent: *
Disallow: /i/
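To sanity-check a robots.txt, Python's standard library can tell you what a compliant spider would do with a given set of rules - a sketch, using the example rules above with a made-up domain:

```python
# Check what a well-behaved spider may fetch under a given robots.txt,
# using Python's standard-library parser (illustration only).
from urllib import robotparser

rules = [
    "User-agent: *",
    "Disallow: /cgi-bin/",
    "Disallow: /i/",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("*", "http://www.example.com/index.html"))  # True (allowed)
print(rp.can_fetch("*", "http://www.example.com/cgi-bin/x"))   # False (disallowed)
```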
Where you have text on the links pages, I think you've already ensured that they are crawlable (?!)
cfn ... Jen
Jen Worden
Web Developer
www.meadoworks.com