Our Liferay CMS has about 70,000 visible pages that should be indexed by Google or other search engines via sitemap.xml.
According to Google (http://www.google.com/support/webmasters/bin/answer.py?answer=183668) and sitemaps.org (http://www.sitemaps.org/de/protocol.php#index), a single sitemap file may contain at most 50,000 URLs and must not be larger than 10 MB. Our site has more pages than that, and the generated sitemap is larger than 10 MB.
So an improvement would be:
- count all pages that belong in the sitemap (visible and with sitemap include enabled)
- if the count is > 50,000, create a sitemap index file and split the generated sitemap into two (or more) files
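
The two steps above could be sketched roughly as follows. This is only an illustration in plain Java, not Liferay's actual sitemap code: the class name `SitemapSplitter`, its methods, and the `sitemap-N.xml` file naming are all hypothetical, and the list of URLs is assumed to have been filtered for visibility and sitemap inclusion already. It splits by URL count only; a check against the 10 MB limit would still have to be added.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Liferay: splits a URL list into
// sitemap-sized chunks and builds the matching sitemap index.
class SitemapSplitter {

    // Per-file URL limit from the sitemaps.org protocol.
    static final int MAX_URLS_PER_SITEMAP = 50_000;

    private static final String NS =
        "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Splits the URLs into consecutive chunks of at most maxPerFile entries.
    static List<List<String>> partition(List<String> urls, int maxPerFile) {
        if (maxPerFile <= 0) {
            throw new IllegalArgumentException("maxPerFile must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < urls.size(); i += maxPerFile) {
            int end = Math.min(i + maxPerFile, urls.size());
            chunks.add(new ArrayList<>(urls.subList(i, end)));
        }
        return chunks;
    }

    // Builds one <urlset> document for a single chunk.
    static String buildSitemap(List<String> urls) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<urlset xmlns=\"").append(NS).append("\">\n");
        for (String url : urls) {
            sb.append("  <url><loc>").append(escape(url)).append("</loc></url>\n");
        }
        sb.append("</urlset>\n");
        return sb.toString();
    }

    // Builds the <sitemapindex> document that points at the chunk files.
    // Assumed naming scheme: <baseUrl>-1.xml, <baseUrl>-2.xml, ...
    static String buildIndex(String baseUrl, int fileCount) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<sitemapindex xmlns=\"").append(NS).append("\">\n");
        for (int i = 1; i <= fileCount; i++) {
            sb.append("  <sitemap><loc>")
              .append(escape(baseUrl + "-" + i + ".xml"))
              .append("</loc></sitemap>\n");
        }
        sb.append("</sitemapindex>\n");
        return sb.toString();
    }

    // Entity-escapes the characters that are not allowed raw in XML text.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }
}
```

With about 70,000 URLs, `partition(urls, MAX_URLS_PER_SITEMAP)` would give two chunks (50,000 and 20,000 entries), and `buildIndex` would then list two sitemap files. If there are 50,000 URLs or fewer, the single sitemap could be served as before without an index.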