SEO Hardcore

Search Engine Optimization

May 28th, 2008

SharePoint 2007 301 redirects

john in MOSS, caching

Each time you request a Site Collection (http://domain/) or a Site (http://domain/foo/) of your Publishing Site you get redirected to http://domain/Pages/<WelcomePage>.aspx. SharePoint 2007 uses the 302 status code (temporarily moved) for this purpose. Surprisingly, even WSS uses a 302 to redirect a root url to default.aspx. In comparison, ASP.NET renders the default page internally when the root url is requested: there is no redirect in this situation.

The whole issue with the 302 headers is that search spiders which don’t follow temporarily moved pages never crawl the redirected locations. While that’s not really an issue for intranet environments, it has a major impact on indexing the content of Internet-facing web sites and making them searchable using a search engine.

Looking for an answer, I researched the SharePoint runtime: the SPHttpHandler, SPRequestModule and PublishingHttpModule classes. None of these gave me a clear answer, but I did notice multiple references in the code to the Redirect method, which uses the 302 header as well.

To solve the issue I have designed a custom redirect HttpModule which uses 301 headers instead.

The requirements

The module must redirect all requests for a Site Collection or Site. Urls of these requests might, but don’t have to, contain a trailing slash (/). Furthermore, the module must distinguish a WSS request from a Publishing Site / Publishing Web request. The module also has to be aware of Variations if the Site Collection uses them.

The work

First of all we create a new HttpModule. As we want the redirect to take place as soon as possible, we hook it up to the BeginRequest event. Furthermore, we want the module to be the first one to interact with the request. As we use an external assembly, we need to define it as the first element in the httpModules section of web.config.

using System;
using System.Web;

namespace Imtech.SharePoint.Enhancement.HttpModules
{
    public class RedirectModule : IHttpModule
    {
        #region IHttpModule Members

        public void Dispose()
        { }

        public void Init(HttpApplication context)
        {
            // Hook in as early as possible in the request pipeline
            context.BeginRequest +=
                new EventHandler(context_BeginRequest);
        }

        void context_BeginRequest(object sender, EventArgs e)
        {
            HttpApplication app = (HttpApplication)sender;
            string requestUrl = app.Request.Url.ToString();
        }

        #endregion
    }
}

Because we will need the Request url later on in quite a few places I have decided to store it in a separate variable.

The first requirement states that the module should redirect only requests for Site Collections and Sites. If the requirement hadn’t said that the trailing slash is optional, you could solve it using a simple if (requestUrl.EndsWith("/")). In our situation we have to use a Regular Expression to figure out whether we need to redirect the url or not.

Regex regEx =
    new Regex(@"^https?://.*(?<itemUrl>/[^/]+\.[^/\.]+)$");
if (regEx.IsMatch(requestUrl))
    return;

if (!requestUrl.EndsWith("/",
    StringComparison.CurrentCulture))
    requestUrl += "/";

If the url matches the regular expression, it is a page request and should be passed along the request pipeline unaltered. Later in the module we will combine the request url with the page url. As the trailing slash is optional, I have decided to add it if it’s not present, just to be sure that combining the parts of the destination url produces a correct result.
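To make the effect of the expression concrete, here is a small stand-alone sketch that runs the same pattern against a few made-up urls. The class and method names (UrlClassifier, IsPageRequest, WithTrailingSlash) exist only for this illustration and are not part of the module.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlClassifier
{
    // Same pattern as in the module: the last path segment must contain a dot
    // (a name plus an extension) for the url to count as a page or file request.
    private static readonly Regex ItemUrl =
        new Regex(@"^https?://.*(?<itemUrl>/[^/]+\.[^/\.]+)$");

    public static bool IsPageRequest(string requestUrl)
    {
        return ItemUrl.IsMatch(requestUrl);
    }

    // Appends the trailing slash only when it is missing.
    public static string WithTrailingSlash(string requestUrl)
    {
        return requestUrl.EndsWith("/", StringComparison.Ordinal)
            ? requestUrl
            : requestUrl + "/";
    }
}
```

With this, IsPageRequest("http://domain/Pages/default.aspx") is true, while "http://domain/foo" and "http://domain/foo/" are both false and would go on to the redirect logic. Keep in mind that Request.Url includes the query string, so the pattern sees that as well; a url such as "http://domain/foo/?file=a.b" is treated as a page request.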

The next requirement is distinction between WSS and Publishing Site requests.

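string destinationUrl = String.Empty;

SPSecurity.RunWithElevatedPrivileges(delegate()
{
    try
    {
        using (SPSite site = new SPSite(requestUrl))
        {
            using (SPWeb web = site.OpenWeb())
            {
                if (PublishingWeb.IsPublishingWeb(web))
                {
                    // Publishing Web: redirect to its Welcome Page
                    PublishingWeb publishingWeb =
                        PublishingWeb.GetPublishingWeb(web);
                    destinationUrl = String.Concat(requestUrl,
                        publishingWeb.DefaultPage.Url);
                }
                else
                    // WSS site: redirect to default.aspx
                    destinationUrl = String.Concat(requestUrl,
                        "default.aspx");
            }
        }
    }
    catch { } // e.g. a list url: destinationUrl stays empty, so no redirect
});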

Based on the request url we create a new instance of SPSite and then open the requested web. As this can already fail (for example when a list url is passed in), I have decided to catch the thrown exception to avoid turning the request into an error message. The distinction itself is quite straightforward and makes use of the IsPublishingWeb method. One important thing: because we are very likely to use the module for anonymous users, we need to run the code with elevated privileges, as the IsPublishingWeb method requires some extra permissions in order to run.

Our last requirement was making the redirect module aware of Variations if used by the Site Collection. Depending on the requirements defined by your customer, you might need to implement the standard SharePoint Variation logic, which chooses the variation based on the User Agent language settings. Unfortunately most users are not aware of the existence and usage possibilities of the language settings, so most of our customers choose to load the Dutch variation by default. If your customer requires the standard SharePoint approach, you would need to implement the logic from the VariationRootLanding User Control in the ControlTemplates directory. I will focus on the scenario we’re using.

PublishingWeb publishingWeb = PublishingWeb.GetPublishingWeb(web);
if (publishingWeb.DefaultPage.Url.EndsWith("/VariationRoot.aspx",
    StringComparison.CurrentCultureIgnoreCase))
{
    string defaultPage = String.Empty;
    using (SPWeb nlWeb = site.OpenWeb("nl"))
    {
        defaultPage =
            PublishingWeb.GetPublishingWeb(nlWeb).DefaultPage.Url;
    }

    destinationUrl = String.Concat(requestUrl, "nl/", defaultPage);
}
else
    destinationUrl =
        String.Concat(requestUrl, publishingWeb.DefaultPage.Url);

In most scenarios the variation redirect takes place at the Site Collection level. The default page of the root web is then set to Pages/VariationRoot.aspx. Knowing this, we can check whether we need to use the variation redirect or not. The rest is quite straightforward: we obtain the Dutch site and its Welcome Page.
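If you do need something closer to the language-based approach, the browser's language preferences are available through app.Request.UserLanguages. The following is only a rough sketch of picking a variation label from them. It is not the logic of the VariationRootLanding control, and the VariationPicker class, the list of available labels and the fallback label are all assumptions made for the example. It also assumes that the labels are plain language codes such as "nl" or "en" and that the browser lists its languages in order of preference.

```csharp
using System;
using System.Collections.Generic;

public static class VariationPicker
{
    // userLanguages: values as returned by HttpRequest.UserLanguages,
    // e.g. "nl-NL" or "en;q=0.8" (null when the header is missing).
    // labels: hypothetical variation labels present in the Site Collection.
    public static string Pick(string[] userLanguages,
        ICollection<string> labels, string fallback)
    {
        if (userLanguages == null)
            return fallback;

        foreach (string entry in userLanguages)
        {
            // Strip the quality value (";q=0.8") and the region ("-NL").
            string language = entry.Split(';')[0].Trim();
            int dash = language.IndexOf('-');
            if (dash > 0)
                language = language.Substring(0, dash);

            foreach (string label in labels)
            {
                if (String.Equals(label, language,
                    StringComparison.OrdinalIgnoreCase))
                    return label;
            }
        }

        // None of the requested languages has a variation.
        return fallback;
    }
}
```

The returned label could then replace the hard-coded "nl" in the snippet above, for example in site.OpenWeb(label).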

The last part is the redirect itself using the 301 header:

if (!String.IsNullOrEmpty(destinationUrl))
{
    app.Response.AddHeader("Location", destinationUrl);
    app.Response.StatusCode = 301;
}

The destination url might be empty if an exception has occurred during the request processing. We will therefore redirect only if a destination url has been set by our module.
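Note that this code sets the status code and the Location header but lets the rest of the pipeline keep running. Whether that matters depends on what the later modules and handlers do with the response. If you would rather stop processing right after the redirect, one possible variation, assuming nothing else needs to run for these requests, is to call CompleteRequest. The RedirectHelper class and its method name are made up for this sketch.

```csharp
using System.Web;

public static class RedirectHelper
{
    public static void RedirectPermanently(HttpApplication app,
        string destinationUrl)
    {
        HttpResponse response = app.Response;
        // Drop anything that may already be buffered for this request.
        response.Clear();
        response.StatusCode = 301;
        response.StatusDescription = "Moved Permanently";
        response.AddHeader("Location", destinationUrl);
        // Skip the remaining pipeline events and go straight to EndRequest.
        app.CompleteRequest();
    }
}
```

In the module this would be called as RedirectHelper.RedirectPermanently(app, destinationUrl) inside the check for a non-empty destination url.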

Putting it all together:

using System;
using System.Text.RegularExpressions;
using System.Web;
using Microsoft.SharePoint;
using Microsoft.SharePoint.Publishing;

namespace Imtech.SharePoint.Enhancement.HttpModules
{
    public class RedirectModule : IHttpModule
    {
        #region IHttpModule Members

        public void Dispose()
        { }

        public void Init(HttpApplication context)
        {
            context.BeginRequest +=
                new EventHandler(context_BeginRequest);
        }

        void context_BeginRequest(object sender, EventArgs e)
        {
            HttpApplication app = (HttpApplication)sender;
            string requestUrl = app.Request.Url.ToString();

            // Page requests pass through unaltered
            Regex regEx =
                new Regex(@"^https?://.*(?<itemUrl>/[^/]+\.[^/\.]+)$");
            if (regEx.IsMatch(requestUrl))
                return;

            if (!requestUrl.EndsWith("/",
                StringComparison.CurrentCulture))
                requestUrl += "/";

            string destinationUrl = String.Empty;

            SPSecurity.RunWithElevatedPrivileges(delegate()
            {
                try
                {
                    using (SPSite site = new SPSite(requestUrl))
                    {
                        using (SPWeb web = site.OpenWeb())
                        {
                            if (PublishingWeb.IsPublishingWeb(web))
                            {
                                PublishingWeb publishingWeb =
                                    PublishingWeb.GetPublishingWeb(web);
                                if (publishingWeb.DefaultPage.Url.
                                    EndsWith("/VariationRoot.aspx",
                                    StringComparison.CurrentCultureIgnoreCase))
                                {
                                    // Variation root: use the Dutch variation
                                    string defaultPage = String.Empty;
                                    using (SPWeb nlWeb = site.OpenWeb("nl"))
                                    {
                                        defaultPage =
                                            PublishingWeb.GetPublishingWeb(nlWeb)
                                            .DefaultPage.Url;
                                    }

                                    destinationUrl = String.Concat(requestUrl,
                                        "nl/", defaultPage);
                                }
                                else
                                    destinationUrl = String.Concat(requestUrl,
                                        publishingWeb.DefaultPage.Url);
                            }
                            else
                                destinationUrl = String.Concat(requestUrl, "default.aspx");
                        }
                    }
                }
                catch { } // not a site url: destinationUrl stays empty
            });

            if (!String.IsNullOrEmpty(destinationUrl))
            {
                app.Response.AddHeader("Location", destinationUrl);
                app.Response.StatusCode = 301;
            }
        }

        #endregion
    }
}

To see it working build the project, copy the assembly to the bin directory of your web application and add the following element to the httpModules section of the web.config:

<add name="ImtechRedirectModule"
    type="Imtech.SharePoint.Enhancement.HttpModules.RedirectModule" />

Summary

Redirects using the 302 header can be a serious issue on Internet-facing web sites when it comes to indexing the content of a web site. Using custom HttpModules to overrule the standard behavior of SharePoint is a flexible solution for this challenge.
The example above should work well enough in most scenarios. Depending on the requirements of your customers, you might need to extend it with some extra functionality, for example support for the standard Variations logic. Custom HttpModules prove the extensibility and flexibility of SharePoint 2007 and the way it can be made to fit various requirements and scenarios.

Published Jan 21 2008, 10:46 AM by Waldek Mastykarz Filed under: Web Content Management, SharePoint customization, Accessibility, SharePoint Best Practices

August 28th, 2007

MS .NET 2.0 Caching

john in caching, spidering

I found some really good information about caching. There were a lot of significant changes introduced in .NET 2.0, and the excerpt below, from the 15 Seconds website, lays it out better than I can.

    “…the ASP.NET 1.x Cache API does not allow you to invalidate an item in the cache when data in a SQL Server database changes. This is a very common capability most applications will require. ASP.NET 2.0 addresses this by providing the database triggered cache invalidation capability to ensure that the items in the cache are kept up-to-date with the changes in the database. You can accomplish this using any one of the following methods.
    • Declarative Output caching - This is similar to declarative output caching in ASP.NET 1.x, wherein you configure caching by specifying the OutputCache directive and their related attributes.
    • Programmatic Output caching - In this method, you will use the SqlCacheDependency object programmatically to specify the items to be cached and set their attributes.
    • Cache API - In this option, you will use the static methods of the Cache class such as Insert, Remove, Add and so on to add or remove items from the ASP.NET cache, while still using the SqlCacheDependency object to trigger the cache invalidation.
    Another important caching feature in ASP.NET 2.0 is the ability to create custom cache dependencies, which is not possible with ASP.NET 1.x Cache API. To accomplish this, you need to inherit from the CacheDependency class. Since the CacheDependency is a sealed class in ASP.NET 1.x, you can’t inherit and extend it. However, in ASP.NET 2.0, this is no longer the case. You can inherit from CacheDependency class and create your own custom cache dependencies. This opens up a world of opportunities where you can roll your own custom cache dependencies required for a particular class of applications. For example, you can create a StockPriceCacheDependency class that automatically invalidates the cached data when the stock price changes.”
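As a rough idea of what the Cache API option from the excerpt looks like in code, here is a minimal sketch. The ProductCache class, the "products" cache key, and the "Northwind" and "Products" names are placeholders. It assumes that the database entry is defined in the sqlCacheDependency section of web.config and that the database and table have been enabled for change notifications, otherwise the constructor throws.

```csharp
using System.Web;
using System.Web.Caching;

public static class ProductCache
{
    public static void Store(object products)
    {
        // "Northwind" must match a database entry in web.config,
        // "Products" is the table to watch; both are placeholder names.
        SqlCacheDependency dependency =
            new SqlCacheDependency("Northwind", "Products");

        // The cached item is dropped when the table changes.
        HttpRuntime.Cache.Insert("products", products, dependency);
    }
}
```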
August 28th, 2007

View State Performance

john in caching, spidering

I was learning about how to optimize the View State in MS applications like MOSS. While investigating moving the tag lower in the source code, so that a spider can get to relevant content without crossing the view state, I realized that once moved the state would fail to load and break the page. I then found a really cool article in the MSDN library that addresses the function, uses and optimization of View State. First, it is not always relevant: a View State field may exist even in a stateless page. Although the tag will not bloat the way it does when it is actually used, it is unnecessary there and should be removed by turning View State off. Also, there are several other places to store this state besides the page, and they are resoundingly efficient compared to its home in your source code. Below is an excerpt of the data from the article.

                        ViewState   Session    Application   Cache      Globals
Number of Hits          3322        184883     18476         20117      16723
Requests / sec.         27.64       153.8      153.74        167.84     140.32
Total KB rec’d          68016.63    22923.25   22977.13      25074.51   20798
Percentage Difference   N/A         456.44%    456.22%       507.24%    407.6%

The column labeled ViewState represents the state being stored in the page source and serves as the baseline. As you can see, the performance increases are amazing. I recommend reading the entire article referenced above.
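For reference, both ideas can be sketched with standard ASP.NET 2.0 page members. This is only an illustration and not necessarily the setup the article measured: the two base class names are made up, and the second one assumes session state is enabled for the application.

```csharp
using System;
using System.Web.UI;

// For pages that never rely on postback state: turn View State off.
public class StatelessPage : Page
{
    protected override void OnInit(EventArgs e)
    {
        EnableViewState = false;
        base.OnInit(e);
    }
}

// For pages that do need it: keep the state in session state on the
// server instead of in the hidden __VIEWSTATE field in the page source.
public class SessionStatePage : Page
{
    protected override PageStatePersister PageStatePersister
    {
        get { return new SessionPageStatePersister(this); }
    }
}
```

A page would opt in by inheriting from one of these classes instead of from Page directly.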
