Shopping Cart Forum

#1
12-09-2008, 09:14 PM
biz817
Junior Member
Join Date: Sep 2008
Posts: 4
Thanks: 1
Thanked 0 Times in 0 Posts
Switching URL extensions

Hey guys,

I have a quick question. I currently have a URL that is www.domain.com/category.html. I want to change that to www.domain.com/category.htm because of how the site is being redeveloped (notice the difference between ".html" and ".htm"). Does Google view those as two different URLs (i.e. duplicate content)? Do I lose all the link equity I had built up for the previous URL? I have read about 301 redirects. Is that an applicable technique I can use to get around this? Would I have to 301 redirect every page on the site?

Any thoughts would be very much appreciated!
#2
12-09-2008, 09:32 PM
Lawrence
Forum Admin
Join Date: Aug 2007
Location: Kenley, Surrey, UK
Posts: 1,347
Thanks: 3
Thanked 87 Times in 87 Posts

category.html and category.htm will be viewed by Google as two separate pages. If the site is already established using .html extensions, then a 301 redirect would be the best option. If you have a lot of pages then the easiest way to redirect all the files would be to wildcard them (*.html > *.htm).

Usually I'd suggest using an .htaccess file to do this, but the fact that you want to change .html to .htm suggests that you may be moving your site to a Microsoft server. If so, I'm not sure of the best way to do this, though I'm sure someone on here is familiar with MS servers. If not, and you're using Linux/BSD, I can help.
__________________
Lawrence
The Following User Says Thank You to Lawrence For This Useful Post:
biz817 (12-09-2008)
#3
12-09-2008, 11:57 PM
biz817

Thanks, Lawrence! Your reply was very helpful. I'm not sure about the server stuff, but I think this will definitely help me start the process. Thanks again!
#4
13-09-2008, 08:11 PM
Lawrence
Apache .htaccess rewrite rule for html to htm 301 redirect

For reference, if you're using an Apache server and have mod_rewrite switched on, this is one of the ways you can do it.

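Add this code to the .htaccess file in your web root (you may need to create one if it doesn't already exist). If the RewriteEngine and RewriteBase lines are already present in the file, then you only need to add the RewriteRule line to the end of any existing rewrite rules. The dot before "html" is escaped and the pattern is anchored with ^ so that only paths that really end in ".html" are redirected.
Code:
 
# Enable the rewrite engine and set the URL base to the web root.
RewriteEngine on
RewriteBase /
# Permanently (301) redirect any path ending in .html to the same path ending in .htm.
RewriteRule ^(.*)\.html$ /$1.htm [R=301,L]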
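
For illustration only, the mapping this kind of rule is meant to perform can be modelled in a few lines of Python. This is a sketch of the pattern matching, not of Apache itself, and the function name html_to_htm is made up for the example. In a regular expression an unescaped dot matches any character, so the sketch uses an escaped dot to match a literal ".html".

```python
import re

# Model of the redirect mapping: "<anything>.html" -> "/<anything>.htm".
# In .htaccess context the pattern is matched against the path without
# its leading slash, so the sketch strips it before matching.
HTML_SUFFIX = re.compile(r"^(.*)\.html$")


def html_to_htm(path):
    """Return the redirect target for a .html path, or None if no redirect applies."""
    match = HTML_SUFFIX.match(path.lstrip("/"))
    if match is None:
        return None
    return "/" + match.group(1) + ".htm"


print(html_to_htm("/category.html"))     # /category.htm
print(html_to_htm("/shop/chairs.html"))  # /shop/chairs.htm
print(html_to_htm("/category.htm"))      # None
```

Paths that already end in .htm are left alone, so the redirect cannot loop.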
__________________
Lawrence
#5
15-09-2008, 06:31 PM
biz817

Thanks again, Lawrence. I have a separate but related question. We have the site http://www.officechairs.org. This was our first site and it is on the Volusion platform. We made the mistake of switching the URLs from underscores to hyphens and back (the system rejected the hyphen change after two days, not sure why). In any case, it seems that some, if not all, of the hyphenated URLs were indexed. Clearly we have a duplicate content issue and want to get rid of all the hyphenated URLs from the Google index. We have access to a robots.txt file via Volusion. Would we need to block every single hyphenated URL manually in the robots.txt file? Is there a way to write a generic rule for this? Is blocking a whole bunch of URLs looked down upon by Google? Sorry for all the questions.
#6
15-09-2008, 07:00 PM
Lawrence

If the hyphenated URLs no longer exist and there are no links pointing to them, then they will eventually be removed from the index anyway. robots.txt should in theory only help if you still have links pointing to the hyphenated URLs; otherwise the bots have no way of finding the pages by spidering the site.

Google doesn't recognise regular expressions in robots.txt files, but it does recognise * as a wildcard, so you should be able to use something like *-*.html to pattern-match hyphenated URLs. Google doesn't seem to mind lots of pages being excluded from spidering; it's in their interest if their index isn't full of login, registration and other non-useful pages.
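
As a rough way to check which URLs a wildcard rule would catch before putting it live, the matching can be approximated in Python. This is a sketch that assumes the rule is written with a leading slash, as in Disallow: /*-*.html, that * matches any run of characters, and that a trailing $ pins the match to the end of the path. The function name robots_pattern_matches and the sample paths are invented for the example, and it is not Google's actual matcher.

```python
import re


def robots_pattern_matches(pattern, path):
    """Return True if a robots.txt path pattern matches the given URL path.

    '*' stands for any run of characters and a trailing '$' pins the match
    to the end of the path; otherwise the pattern only has to match a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal text and turn each '*' into the regex '.*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


rule = "/*-*.html"
for path in ["/office-chairs.html", "/office_chairs.html", "/mesh-chairs/index.html"]:
    print(path, robots_pattern_matches(rule, path))
```

With this rule the first and third paths match and the second does not. The third one matches because the hyphen sits in the directory name, so a rule like this can also catch pages you did not mean to block.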

A far better way to achieve your goal, though, is to open a Google Webmaster Tools account and remove the unwanted URLs with their URL Removal tool.
__________________
Lawrence
#7
22-09-2008, 07:47 PM
biz817
https indexing

Hey Lawrence,

Hope you had a good weekend. I have a couple of questions regarding Google indexing. Our site, officechairs.org, is ranked 12th in Google for the keyword "office chairs". While no protocol is shown before the "www" in the search results, I assume it is "http" and not "https". However, when I type "site:officechairs.org" into Google, the first result is the https version of the homepage (https://officechairs.org). I am unable to find the http version of the homepage. This seems odd, given that most other sites show the http version when I run the same site: command in Google. You will also notice that a number of indexed pages use the https prefix, which is concerning. Do you have any thoughts on what might be going on here and how we should address it?

As a follow-up to the problem, our guess is that we should block Google from spidering the https version of the site. I have done some googling on the issue, and Google suggests that an https version of robots.txt would need to be created, as described at the following link:

http://www.google.com/support/webmas...n&answer=35302

The problem is that Volusion only allows you to have one robots.txt file. I was able to find a couple of alternative solutions that work within a single file, but they suggest different syntax:

http://forums.monstersmallbusiness.com/lofiversion/index.php/t17200.html
(second post down titled "KennyE" - suggested syntax: Disallow:/https:/)

http://www.vodahost.com/vodatalk/sea...gle-index.html
(fifth post down from Karen Mac - suggested syntax: Disallow: https:/)

Any thoughts here would be appreciated. Your insight has been very helpful!

Thanks,
Sachin
#8
23-09-2008, 12:05 AM
Lawrence

The source of the problem is your ecommerce software: it doesn't kick you back to non-SSL mode when it should. To see how easy it is to get stuck in SSL, click "My Account" from your homepage, then click the "login to my account" button. This will change the URL to an https:// one. If you now click the "home" link, you'll see that it remains https:// instead of reverting to http:// as it should. All the "Browse by Category" links on the homepage are now https://, though the "featured Items" and left-hand side menu links are http://. What is happening is that the bot gets onto an https:// page (either via internal navigation or by a direct external link), then spiders all available links, some of which are https:// while others are http://. This is why you're getting mixed URL prefixes in your SERPs.

This might be due to an error in your store's configuration settings. Double check everything to do with sessions, SSL etc in your store's admin.

If it turns out that it's a flaw in the Volusion software, then one option is to delete all the https pages from Google's index, create an XML sitemap (you can download free software to do this if you're not sure how), then submit the sitemap manually to Google.

Quote:
Originally Posted by biz817
I have done some googling on the issue and google suggested that an https version of the robots.txt would need to be created as in the following link:

http://www.google.com/support/webmas...n&answer=35302

The problem is Volusion only allows you to have one robots.txt file.
What Google are telling you to do here is impossible on most hosting accounts. Most hosting configurations use one set of files for a site. What appears on an http:// page is exactly the same file as what appears on an https:// one. robots.txt will be the same file whether it's https:// or http://. What you can do is switch robots.txt files dynamically on the server depending on whether the request is SSL or Non SSL. Your site is on a MS IIS server which I have limited experience with, so I can't give you specific instructions on how to achieve this in your case (Linux/Apache is easy though).
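
To make the idea of switching robots.txt by protocol concrete, here is a minimal sketch written as a small Python WSGI application. It is an illustration of the technique only, not something that can be dropped onto an IIS or Volusion site, and the names HTTP_ROBOTS, HTTPS_ROBOTS, robots_body and app are all made up for the example. The policy shown (allow everything over http, block everything over https) is an assumption about what you would want.

```python
# Two robots.txt bodies: one served over plain http, one over https.
HTTP_ROBOTS = "User-agent: *\nDisallow:\n"      # allow everything
HTTPS_ROBOTS = "User-agent: *\nDisallow: /\n"   # block everything


def robots_body(scheme):
    """Pick the robots.txt text for the scheme the request arrived on."""
    return HTTPS_ROBOTS if scheme == "https" else HTTP_ROBOTS


def app(environ, start_response):
    """WSGI entry point that answers every request with the matching robots.txt."""
    body = robots_body(environ.get("wsgi.url_scheme", "http")).encode("utf-8")
    headers = [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]


print(robots_body("https"))
print(robots_body("http"))
```

The same decision (is this request SSL or not?) is what a server-side rewrite rule or script would make on any other platform.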

Quote:
Originally Posted by biz817
I was able to find a couple of alternative solutions in one file but they suggest using different syntax:

http://forums.monstersmallbusiness.com/lofiversion/index.php/t17200.html
(second post down titled "KennyE" - suggested syntax: Disallow:/https:/)
This won't work because it tells the bots not to spider anything under a directory called /https:/. What you want to block isn't in a directory; https:// is the protocol prefix of the URL, not part of the path that robots.txt rules are matched against.

Quote:
Originally Posted by biz817
http://www.vodahost.com/vodatalk/sea...gle-index.html
(fifth post down from Karen Mac - suggested syntax: Disallow: https:/)
I haven't seen this done before, and Googling it turned up very little information. It could be an undocumented feature of the Google bot, although I'm surprised it wasn't mentioned in your Google support link above. It may also be a feature that was added after the Google post was written. It's well worth a try.

Creating entries in your robots.txt file and similar "fixes" is just putting a band aid on the problem. The real problem is your software not switching out of SSL when it should. Fix this, get your site reindexed and all should be well.
__________________
Lawrence

Tags
.htaccess, url rewrite

