TYPO3: Robots.txt
First of all: I’m very busy at the moment, so it’s been a little quiet lately, but I’ll try to keep up.
I’ve done some research and I think this is a pretty good example of how to use robots.txt on a TYPO3 website:
User-Agent: *
Allow: / # Allow bot to enter
Disallow: /fileadmin/website/notimportant/ # Exclude only folders with no
# link from frontend, like
# templates, css, js.
Disallow: /t3lib/ # Nothing to see here
Disallow: /typo3/ # Nothing to see here
Disallow: /typo3conf/ # Nothing to see here
Disallow: /typo3temp/ # Nothing to see here
Disallow: /*?id=* # Disable non-realurl
Disallow: /*&type=98 # Disable print pages
Sitemap: http://www.example.tld/sitemap.xml # Your Sitemap
Sitemap: http://www.example.tld/rss.xml # Your RSS Feed
You’ll notice the uploads folder is allowed, since it contains files that are linked from the frontend. The same applies to files in fileadmin. If you don’t use RealURL, remember to remove the "Disallow: /*?id=*" line: without RealURL every page URL contains ?id=, so the bot won’t index much.
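To check which URLs the rules above would block, here is a small Python sketch. It is not part of TYPO3 or of any crawler. It assumes the wildcard and longest-match behaviour described in RFC 9309 (the rule with the longest matching pattern wins, and Allow wins a tie), and real bots may differ. The names RULES, pattern_to_regex and is_allowed are made up for this example, and so are the test paths.

```python
import re

# Rules copied from the robots.txt above, as (directive, pattern) pairs.
RULES = [
    ("allow", "/"),
    ("disallow", "/fileadmin/website/notimportant/"),
    ("disallow", "/t3lib/"),
    ("disallow", "/typo3/"),
    ("disallow", "/typo3conf/"),
    ("disallow", "/typo3temp/"),
    ("disallow", "/*?id=*"),
    ("disallow", "/*&type=98"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Everything else is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Return True if the path (including query string) may be crawled.

    Assumption: longest matching pattern wins, Allow wins ties,
    and a path that matches no rule is allowed.
    """
    best_len, allowed = -1, True
    for directive, pattern in rules:
        # re.match anchors at the start of the path, like a robots.txt prefix.
        if pattern_to_regex(pattern).match(path):
            is_allow = directive == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


if __name__ == "__main__":
    for path in [
        "/about-us/",
        "/uploads/pics/logo.png",
        "/typo3/index.php",
        "/index.php?id=12",
        "/about-us/?no_cache=1&type=98",
        "/about-us/?type=98",
    ]:
        print(path, "allowed" if is_allowed(path) else "blocked")
```

Under these assumptions the last path stays allowed: the print rule is written with "&type=98", so it only matches when type is not the first query parameter. The same goes for "/*?id=*", which only matches when id comes directly after the question mark.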
If anyone knows a better setup, please don’t hesitate to comment!

Maarten,
After a few years, our TYPO3 robots.txt file looks very much the same. You’ve done a good job here.
Michael