Archive for the 'search engines' Category

Site Router

Friday, September 21st, 2007

A new project is under development, called “SiteRouter”. SiteRouter is a system for managing RSS and Sitemaps, among other things - judging by the name, one can easily imagine some of the features this software will hopefully have. SiteRouter is being developed as an easily pluggable system - this means that with a few simple adjustments anyone will be able to integrate it into an existing project or one still in development. The first version is being written in PHP 4, but if it passes its tests, an ASP.NET version will follow right after the first stable release. MySQL is the database of choice, but there are plans for PostgreSQL as well. At the moment I am looking into releasing it under the GPL.

SiteRouter will have a plugin system some day, which will make it easier to integrate, develop and upgrade support for other databases and action modules. As for the visual part, it currently uses some icons that I found on the net, but these will be replaced in the near future.

I am still not sure when the first public version will be available, but I hope it will reach that stage at some point in October.

Google updates Trends and Analytics

Wednesday, May 30th, 2007

This month, Google has released quite a number of updates to its services: Google Trends, Google Analytics and Google Maps.

Google Trends has introduced a new area - “Hot Trends”, which displays the hottest trends in search or, the way I understand it, the phrases that have received a sudden increase in searches. Besides hotness (right now from “medium” to “on fire”), the results also display related searches, the peak time and the locations where the search is most used. During intensive search engine optimization this could be quite an important criterion to look at.
[Image: Trend Hotness]

Google Analytics has launched a completely redesigned interface, based entirely on Ajax. They are still maintaining the older one for a couple of weeks, but users are advised to start their “migration” to the new one. I found the new interface quite unstable at times, crashing Firefox completely, but since it must be a “beta” release, I hope they will improve the service very quickly. No webmaster or search engine optimizer who uses Google Analytics will want to keep using a buggy product. Besides some crashes (something to do with Firefox extensions?), the new interface is quite different and one has to get used to it, so I confess that most of the time I still use the old one, while trying to push myself into learning the new one. I hope to be able to “move” to the new one during the next couple of days.

I am still fuming about Google Zeitgeist and the absence of Portugal from the results. By the way, at this very moment they have 2 Irelands listed in the menu links, which should come as quite a surprise to the United Kingdom government. I have to email someone at Google and tell them to stop ignoring Portuguese optimizers. =O)

p.s. I forgot to mention the Google Maps update, which now includes 360° Street View. As usual, this update is for US-based maps only (at the moment it is available in Denver, Las Vegas, Miami, New York and San Francisco), though more may follow in the near future. Some people on the net have already reported that it helped them avoid traffic, so it sounds like a good update, but since the Portuguese map arrived more than a year after its American and British counterparts, I am not expecting any “important” news for me in the next months.

Jakob Nielsen on sitemaps in 2000

Thursday, May 24th, 2007

Today, while browsing an old book called “Designing Web Usability”, written by Jakob Nielsen back in 2000 (some 7 years ago), I found a reference to the idea of distributing site content to the search engines in the form of sitemaps. On page 238 there is a small subtitle, “Integrating Sites and Search Engines”, where he discusses the idea of integrating sites more closely with search engines. The obstacle to implementing it, as seen at the time of writing, was agreeing on a standardized method for encoding the user’s query terms. Right now we know that the search engines can agree on the way of crawling websites, but at the same time we know that they use a lot of their own “meta-extensions” to crawling, like Google’s “no-follow” or Yahoo’s “no-content”, for example. While the user’s query terms are still very far from being interpreted by the sites (at the moment of writing it seems that only Yahoo pays attention to the meta content tag), the search engines have already taken the first step towards better cooperation with website administrators and towards ensuring better quality and better search results.

Interestingly, the governments of Arizona, California, Utah and Virginia recently announced they would use Sitemaps on their web sites. Of course it is quite a publicity stunt, supporting Google in the first place (the creators of the sitemap protocol), but at the same time it is quite a recognition of the Sitemap protocol. First Yahoo, then Microsoft and Ask.com recognized it, and now we have some “enterprises” from the public sector coming out in its support. Way to go, Jakob, I am looking forward to your next book =O)

Yahoo’s robots-nocontent

Friday, May 4th, 2007

Yahoo has announced that they have introduced a new way of marking extraneous content: a CSS class called “robots-nocontent”. When I first read about it, I could not believe my own eyes - so I read it again and again … A CSS class that serves a single search engine sounds weird, doesn’t it? Now that is one kind of a CSS class - I can imagine hundreds of pages all over the internet filled with the robots-nocontent class, making it all less and less relevant and more ridiculous. Yahoo is obviously making fun of us, but no, it was not announced on April Fools’ Day.

Yahoo has provided some examples of robots-nocontent usage:

<div class="robots-nocontent">
Hehe, this is extraneous content, please ignore me
</div>
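To make concrete what such a class asks of a crawler, here is a minimal Python sketch of a text extractor that skips anything inside an element carrying the robots-nocontent class. This is only an illustration under my own assumptions, not Yahoo’s actual implementation: the names `NoContentFilter` and `indexable_text` are hypothetical, and the markup is assumed to be reasonably well-formed.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class NoContentFilter(HTMLParser):
    """Collects page text, skipping elements marked robots-nocontent (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while we are inside a robots-nocontent element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # The class attribute may hold several space-separated class names.
        classes = (dict(attrs).get("class") or "").split()
        if self.skip_depth or "robots-nocontent" in classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def indexable_text(html):
    """Return the text a crawler honoring robots-nocontent might keep."""
    parser = NoContentFilter()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = ('<p>Main article</p>'
        '<div class="footer robots-nocontent">Ignore <b>me</b></div>'
        '<p>More text</p>')
print(indexable_text(page))
```

Note that the extractor has to read presentational class names to decide what counts as content.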

Personally, I hope that people won’t step into this rather disappointing game of using CSS classes to mark their content. In my opinion, it is an absolutely inappropriate way of coding a web page. Mixing something intended purely for presentation with something that only concerns content-processing algorithms (bots) is quite a “hack”, which could no doubt lead to some serious problems in the future (imagine people starting to use CSS instead of robots.txt for redirection and indexing). There must be another way to avoid this, and by the way - why should web designers define the way the search engines index their sites? It is like being lazy, and if Yahoo is lazy about it, then I do not see why web designers should be excited about it.

I do not wish to see pages full of class attributes holding half a dozen classes each; something like this is surely to be avoided:

class="footer classical myspace robots-nocontent"

Final thought: avoid it; it does not make any sense at all to mix things that were separated on purpose and by design. The presentational layer (CSS) should not provide functionality that even its own functional layer (XHTML) does not provide by design.

Google Webmaster Tools

Friday, April 20th, 2007

Google has made quite a lot of changes to the Google Webmaster Tools console recently. In just a couple of days, things such as “Page Analysis Link Keyword” and the “Content Removal Tool” have appeared there.

First, the Page Analysis link keywords and phrases functionality now shows the name variations of the links. When external links were first implemented in Google Webmaster Tools, only the names of the external links were shown, with no details of how those names were written or how the anchor text varied. Now there is a new part on the Page analysis page under Statistics with much more information about the text of links to your site and, as usual, all this information is available for download in CSV format. Link text is one of the most important criteria for search engines, and every webmaster likes to know what text other pages use when referring to their site. The Google Webmaster console gives exact information about the top 100 links to your page, so this is information you won’t want to miss.

Secondly, there is a new “Content Removal Tool”, which comes in line with a whole sequence of changes related to the robots.txt file. The Google Webmaster console has added functionality to remove content without altering a line in robots.txt: with the help of the new interface, you can remove anything from selected files to whole directories of your site from the index. I think this is a very good improvement, but as the web is not equal to Google (Yahoo, Altavista, Ask and MSN are still around), every reasonable webmaster will still need to exclude all those files and directories manually in robots.txt. But if you don’t have access to your server and need to remove some file or directory from Google, just go to your Google Webmaster console, select URL Removals -> New Removal Request, enter its name, and you are done.
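Since the manual robots.txt route is still needed for the other engines, it is worth checking that the rules really block what you intend. Below is a small Python sketch using the standard library’s `urllib.robotparser`. The rules, the `www.mysite.com` URLs and the helper name `is_blocked` are made up for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content excluding one directory and one file.
RULES = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/old-page.html
"""


def is_blocked(robots_txt, url, agent="*"):
    """Return True if the given robots.txt text disallows `url` for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


for url in ("http://www.mysite.com/private/report.html",
            "http://www.mysite.com/drafts/old-page.html",
            "http://www.mysite.com/index.html"):
    print(url, "blocked" if is_blocked(RULES, url) else "allowed")
```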

Google Webmaster Tools started very slowly, with very little functionality available, but without a shadow of a doubt Google has improved it quite a lot in recent months (first of all with exact links to your site), and it is exciting to see where they are going to take it next. With Yahoo’s SiteExplorer appearing on the horizon this is quite a powerful move, and I know quite a number of people who have “switched” from SiteExplorer to the Google Webmaster Tools console in the last 3-4 months.

Google sitemaps final version

Monday, April 16th, 2007

A couple of days ago, without any big announcement, Sitemaps.org released the final specification of the sitemap protocol. Version 1.0 of the sitemap protocol, besides being supported by the big 3 search engines (Google, Yahoo and Microsoft), is now also supported by Ask.com. I hope that the rest of the search world will join this effort to help search engines recognize website structure.

The sitemaps.org site has also been updated and now publishes the information in 18 (eighteen) languages, which is quite a surprise for me. The Portuguese language was not forgotten, so I am kind of happy, although I am still complaining about Google Zeitgeist. At the same time, the front page in all languages still refers to version 0.9 of the sitemap protocol, which is already an outdated specification =O)
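For readers who have never looked inside a sitemap file, here is a rough Python sketch that builds a minimal one with the standard library. The URLs and the helper name `build_sitemap` are placeholders of mine; only `<loc>` and the optional `<lastmod>` are emitted, and the namespace URI used by the sitemaps.org schema itself contains “0.9”.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by the sitemaps.org schema.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (loc, lastmod) pairs; lastmod may be None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages of an example site.
print(build_sitemap([
    ("http://www.mysite.com/", "2007-04-16"),
    ("http://www.mysite.com/about", None),
]))
```

ElementTree takes care of escaping characters such as `&` in URLs, which is easy to get wrong when writing the file by hand.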

Another notable fact is that search engine bots now support a sitemap reference in the world-famous robots.txt file. The syntax is quite simple:
Sitemap: http://www.mysite.com/sitemap.xml

This is quite an interesting and very smart way of publishing a sitemap - you no longer have to submit your sitemap to each and every search engine; when they visit your site, they will find out about the sitemap from robots.txt, which saves a great deal of time. No more manual sitemap submission is the most important change made to the way the sitemap protocol is handled.
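As a rough illustration of what a bot does with that line, the Python sketch below pulls every Sitemap reference out of a robots.txt body. The function name `sitemap_urls` and the sample file are my own; an actual crawler will of course be more thorough.

```python
def sitemap_urls(robots_txt):
    """Return the URLs named by Sitemap: lines in a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and whitespace
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


# Hypothetical robots.txt for the example site used above.
sample = """\
User-agent: *
Disallow: /private/
Sitemap: http://www.mysite.com/sitemap.xml
"""
print(sitemap_urls(sample))
```

Splitting on the first colon only matters here, because the URL itself contains one after `http`.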

Top 10 Drupal modules

Thursday, March 22nd, 2007

1. Akismet - how can you build a site that lets users leave comments and not have this plugin? There are thousands of spam bots around, leaving hundreds of stupid and sometimes offensive comments wherever they can. Akismet is a perfect plugin to stop them from doing it on your Drupal site. Originally created for WordPress, Akismet is absolutely essential for any community-driven site.

2. Category - allows you to structure your site and organize content with categories, which is quite useful for SEO. Categories and containers can be created as nodes, and content can be assigned to the categories. The Category module will vastly improve your site navigation, turning it into a more tree-like hierarchy.

3. XML Sitemap - generates a dynamic sitemap to keep search engines well informed about changes in your site structure. At the moment of writing only Google and Yahoo provide services that use this information directly, while MSN is already working on a similar solution and has announced that it is joining the sitemaps standard. For any webmaster this is a must-have module.

4. Nodewords - also known as “Meta Tags”, a module which gives you control over meta tags and their content. I have seen a lot of Drupal-based sites completely free of meta information. From the site description to keywords and Geo tags, all of that can be controlled with Nodewords. A good site should not have an empty <head> section =O)

5. Page Title - lets you customize every page title the way you wish. It is a very important factor for SEO, and even if you do not care much about that, adjusting the page title to match the content you are providing is just as important for usability. A lot of the time the title of the page is not _exactly_ the same as the one you use for your heading; for example, when giving a broader view of the page content, you might choose to skip some words while adding others - for all those purposes and more, I need the “Page Title” module in every Drupal installation.

6. Path Redirect - imagine that you are moving some of your pages from one location to another. All the links that the search engines have indexed, and that your partners have placed on their sites, are going to break if you don’t do something about it. You can ask all the sites that link to you to change their links, but first - it will take some time, and second - some of them won’t be able to do that. And what will you do about the search engines? Waiting for Google or Yahoo to reindex your links will take a very long time, and in the meantime your potential users and customers will be badly disappointed. Path Redirect solves this problem.

7. Views - no modern Drupal site is created without this module. It is essentially a smart query builder that, given enough information, can build the proper query, execute it, and display the results. The Views module gives you flexibility that older versions of Drupal were completely incapable of. If you want to sort your content differently, if you need to display a block with the 5 most recent posts of some particular type, or if you need to provide ‘unread forum posts’, Views can do it. A lot of other modern Drupal modules also depend on the Views module.

8. Update Status - if you wish to keep track of your module versions, this is the best way of doing it. Update Status can automatically check for new versions of installed modules and notify you in the administration panel right after you log in. Having a lot of modules in a Drupal installation obliges you to check for updates very regularly, and that means visiting dozens of pages every couple of weeks, which is not much fun. Update Status was created exactly to help solve this problem. This module is only available for Drupal 5 and later.

9. TinyMCE - the module that you probably can’t live without. Having someone responsible for the content who does not understand XHTML would be a disaster without this module, and in so many cases people have no idea what XHTML is. I believe it is a shame that Drupal does not have a default editor with image uploading; it’s hard to find any other CMS which lacks this functionality. TinyMCE solves the problems with inserting images by providing a nice, usable interface. One word of caution - consult this TinyMCE compatibility chart before you really start using it.

10. PathAuto - a module for automatically generating path aliases for all possible types of content. When a lot of content appears almost every day, no one will be able to invent a new URL for every post. The PathAuto module handles these cases, generating path aliases based on the content of the page.
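To give an idea of the kind of transformation PathAuto performs, here is a hedged Python sketch that turns a post title into a path alias. It is not the module’s actual code (PathAuto is written in PHP and is configurable through patterns); the function name `path_alias` and the `content` prefix are assumptions of mine.

```python
import re
import unicodedata


def path_alias(title, prefix="content"):
    """Turn a post title into a lowercase, hyphen-separated path alias."""
    # Strip accents by decomposing characters and dropping non-ASCII marks.
    ascii_title = (unicodedata.normalize("NFKD", title)
                   .encode("ascii", "ignore")
                   .decode("ascii"))
    # Collapse every run of other characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    return f"{prefix}/{slug}"


print(path_alias("Top 10 Drupal modules", prefix="blog"))
print(path_alias("Olá, Mundo!"))
```

A real implementation would also have to deal with two posts producing the same alias, which this sketch ignores.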

There are some other modules worth mentioning; they all depend on the project being implemented, but may become quite popular with time, such as Adsense, Flash Video (until the new <video> tag becomes available in HTML, it’s a nice way to have videos), Video module (an alternative), Events (a lot of communities have events =O)), Pdf View (there are so many times when you might need things in PDF format),