| PHP/HTML/CSS/JavaScript and Servers Pick up tips and exchange info on the technical side of having a web presence. |
Hey guys,
I had a quick question. I currently have a URL that is www.domain.com/category.html. I want to change that to www.domain.com/category.htm because of how the site is getting redeveloped (notice the difference between ".html" and ".htm"). Does Google view those as two different URLs (i.e. duplicate content)? Do I lose all the link equity I had to the previous URL? I have read about 301 redirects. Is that an applicable technique I can use to get around this? Would I have to 301 redirect every page on the site? Any thoughts would be very much appreciated!
|
|||
|
category.html and category.htm will be viewed by Google as two separate pages. If the site is already established using .html extensions, then a 301 redirect would be the best option. If you have a lot of pages then the easiest way to redirect all the files would be to wildcard them (*.html > *.htm).
Usually I'd suggest using an .htaccess file to do this, but the fact that you want to change .html to .htm suggests that you're probably moving your site to a Microsoft server. If so, I'm not sure of the best way to do this, though I'm sure someone on here is familiar with MS servers. If not, and you're using Linux/BSD, I can help.
__________________
Lawrence |
| The Following User Says Thank You to Lawrence For This Useful Post: | ||
biz817 (12-09-2008) | ||
|
|||
|
For reference, if you're using an Apache server and have mod_rewrite switched on, this is one of the ways you can do it.
Add this code to your .htaccess file in the web root (you may need to create the file if it doesn't already exist). If the first two lines are already present in the file, then you only need to add the last line to the end of any existing rewrite rules. Code:
RewriteEngine on
RewriteBase /
RewriteRule ^(.*)\.html$ /$1.htm [R=301,L]
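If you want to sanity-check which paths a rule of this kind would catch before touching the server, here is a rough Python sketch. It only imitates the pattern matching, not Apache itself. The function name `redirect_target` is made up for illustration, and the pattern assumes the dot before "html" is escaped and the match is anchored at both ends.

```python
import re

# Assumed pattern: anchored, with a literal dot before "html".
# In .htaccess context the path is matched without its leading slash.
RULE = re.compile(r"^(.*)\.html$")

def redirect_target(path):
    """Return the path a 301 would point to, or None if the rule does not apply."""
    relative = path[1:] if path.startswith("/") else path
    match = RULE.match(relative)
    if match is None:
        return None
    return "/" + match.group(1) + ".htm"

print(redirect_target("/category.html"))
print(redirect_target("/category.htm"))
```

Escaping the dot matters: an unescaped `.` matches any character, so a pattern like `(.*).html$` would also catch a path such as `/xhtml`.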
__________________
Lawrence |
|
|||
|
Thanks again, Lawrence. I had a separate but related question. We have the site http://www.officechairs.org. This was our first site and it is on the Volusion platform. We accidentally made the mistake of switching the URLs from underscores to hyphens and back (the system rejected the hyphen change after two days, not sure why). In either case, it seems that some, if not all, of the hyphenated URLs were indexed. Clearly we have a duplicate content issue in that case and want to get rid of all hyphenated URLs from the Google index. We have access to a robots.txt file via Volusion. Would we need to block every single URL that has a hyphen manually in the robots.txt file? Is there a way to create a generic rule for this? Is blocking a whole bunch of URLs looked down upon by Google? Sorry for all the questions.
|
|
|||
|
If the hyphenated URLs no longer exist and there are no links pointing to them, then they will eventually be removed from the index anyway. In theory, robots.txt should only help if you still have links pointing to the hyphenated URLs; otherwise the bots have no way of finding the pages by spidering the site.
Google doesn't recognise regular expressions in robots.txt files, but it does recognise * as a wildcard, so you should be able to use something like *-*.html to pattern match hyphenated URLs. Google doesn't seem to mind lots of pages being excluded from being spidered; it's in their interest if their index isn't full of login, registration and other non-useful pages. A far better way to achieve your goal, though, is to open a Google Webmaster Tools account and then remove the unwanted URLs with their URL Removal tool.
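To get a feel for what a wildcard pattern like that would and wouldn't catch, here is a small Python sketch that approximates Google-style matching (`*` for any run of characters, an optional trailing `$` for end of URL, otherwise a prefix match). The helper names and sample paths are invented for illustration, and this is an approximation of the matching rules, not Google's actual parser.

```python
import re

def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex (approximation)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece and let every * stand for "any characters".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_blocked(path, pattern):
    """True if the path would be matched by the Disallow pattern."""
    return robots_pattern_to_regex(pattern).match(path) is not None

for path in ["/office-chairs.html", "/office_chairs.html"]:
    print(path, is_blocked(path, "/*-*.html"))
```

Bear in mind that a pattern this broad blocks every .html URL containing a hyphen, so it is only safe if none of the URLs you want to keep have hyphens in them.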
__________________
Lawrence |
|
|||
|
Hey Lawrence,
Hope you had a good weekend. I had a couple of questions regarding Google indexing. Our site, officechairs.org, is ranked 12th in Google for the keyword "office chairs". While there is no prefix shown before the "www" in the search results, I assume it is "http" and not "https". However, when I type "site:officechairs.org" into Google, the first result is the https version of the homepage (https://officechairs.org). I am unable to find the http version of the homepage. This seems odd, given that most other sites have the http version come up when I run the same site: command in Google. You will also notice that a bunch of indexed pages include the https prefix, which is concerning. Do you have any thoughts on what might be going on here and how we should address it?
As a follow-up, our guess is that we should block Google from spidering the https URLs. I have done some googling on the issue, and Google suggests that an https version of the robots.txt would need to be created, as in the following link: http://www.google.com/support/webmas...n&answer=35302
The problem is that Volusion only allows you to have one robots.txt file. I was able to find a couple of alternative solutions that use a single file, but they suggest different syntax:
http://forums.monstersmallbusiness.com/lofiversion/index.php/t17200.html (second post down, from "KennyE"; suggested syntax: Disallow:/https:/)
http://www.vodahost.com/vodatalk/sea...gle-index.html (fifth post down, from Karen Mac; suggested syntax: Disallow: https:/)
Any thoughts here would be appreciated. Your insight has been very helpful!
Thanks,
Sachin
|
|||
|
The source of the problem is your ecommerce software: it doesn't kick you back to non-SSL mode when it should. To see how easy it is to get stuck in SSL, click "My Account" from your homepage, then click the "login to my account" button. This will change the URL to an https:// one. If you now click the "home" link, you'll see that it remains https:// instead of reverting to http:// like it should. All the "Browse by Category" links on the homepage are now https://, though the "Featured Items" and left-hand side menu links are http://. What is happening is that the bot gets onto an https:// page (either via internal navigation or by a direct external link) and then spiders all available links, some of which are https:// while others are http://. This is why you're getting mixed URL prefixes in your SERPs.
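One likely mechanism, and this is an assumption since the store's templates aren't shown here, is that some links are written as relative paths and so inherit whatever scheme the current page has, while others are hard-coded with http://. A rough Python sketch of how a crawler would resolve both kinds from an https:// page (example.com and the paths are placeholders, not the real store URLs):

```python
from urllib.parse import urljoin, urlsplit

def resolve_links(page_url, hrefs):
    """Resolve each href against the page it was found on, as a crawler would."""
    return [urljoin(page_url, href) for href in hrefs]

def count_schemes(urls):
    """Count how many resolved URLs use each scheme (http, https, ...)."""
    counts = {}
    for url in urls:
        scheme = urlsplit(url).scheme
        counts[scheme] = counts.get(scheme, 0) + 1
    return counts

# Hypothetical links found on a page that was reached over SSL.
page = "https://www.example.com/"
hrefs = ["/category.htm", "http://www.example.com/featured.htm"]
resolved = resolve_links(page, hrefs)
print(resolved)
print(count_schemes(resolved))
```

The relative link comes out as https:// and the hard-coded one stays http://, which would produce exactly the kind of mixed prefixes described above.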
This might be due to an error in your store's configuration settings. Double check everything to do with sessions, SSL etc. in your store's admin. If it turns out that it's a flaw in the Volusion software, then one option is to delete all the https pages from Google's index, create an XML sitemap (you can download free software to do this if you're not sure how to), then submit the sitemap manually to Google.
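If you'd rather script the sitemap than download a tool, a minimal Python sketch along these lines would do for a small site. It follows the basic sitemaps.org layout (a `urlset` of `url`/`loc` entries) and skips anything that isn't http://, so the https duplicates stay out. The function name and sample URLs are placeholders, and you would still need to add an XML declaration and save the output as a file before submitting it.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return a sitemap XML string containing only the http:// URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        if not url.startswith("http://"):
            continue  # leave out https:// (and any other scheme) duplicates
        loc = ET.SubElement(ET.SubElement(urlset, "url"), "loc")
        loc.text = url
    return ET.tostring(urlset, encoding="unicode")

# Placeholder URLs; substitute the real page list for your store.
print(build_sitemap([
    "http://www.example.com/",
    "https://www.example.com/",
    "http://www.example.com/category.htm",
]))
```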
Creating entries in your robots.txt file and similar "fixes" is just putting a band aid on the problem. The real problem is your software not switching out of SSL when it should. Fix this, get your site reindexed and all should be well.
__________________
Lawrence |