Monthly Archives: April 2012

How to use the Google WMT crawl data – once you fix the date format!

The crawl data that you can download from Google’s Webmaster Central Tool would be much more useful IF you could arrange the data by date. You can’t pivot or sort data if the data is [first of all] in American date format, and 2ndly some is in date format, some in text format and some just throws an error.

So, how do you cleanse this data?

Step 1 – download your crawl data

Click on the “Download” option inside in Google’s webmaster tools interface

Step 2 – Open excel and import your data as “From Text”

Here is the advice from the Excel help file

Importing text files
You can use Excel to import data from a text file into a worksheet. (On the Data tab, in the Get External Data group, click From Text.) The Text Import Wizard examines the text file that you are importing and helps you ensure that the data is imported in the way that you want.

There are two ways to import data from a text file by using Excel: You can open the text file in Excel (which does not make a connection to the text file), or you can import the text file as an external data range (which does make a connection to the text file).

There are two commonly used text file formats:
Delimited text files (.txt), in which the TAB character (ASCII character code 009) usually separates each field of text.
Comma-separated values (CSV) text files (.csv), in which the comma character (,) usually separates each field of text.

You can also change the separator character that is used in both delimited and .csv text files. This may be necessary to make sure that the import or export operation works the way that you want it to.

You will be importing a .csv

Step 3 – Use the Text Import wizard

Use the Text Import Wizard. I have never had to make any changes on the first step.

On step 2 of 3, you should choose COMMA and not Tab.

Here is the important bit. On step 3 of 3 you can choose what is the format for the import. So, in the “Data Preview” pane, scroll to the right. Select the Date column and choose your preferred date format (see arrow).

Excel's Text Import Wizard - choosing data type

Excel's Text Import Wizard - choosing data type

Click Finish. Select the location you want your new table.

You would think that you would now be finished! No, Google also give us 2 date format


Standarding the year format

I am sure that there must be an easier way than this, please share if you know. You can’t just format cells to date. So, insert some new columns to the right of the date column.

  1. Hightlight the Date column. Select “Text to Columns” on the “Data” tab
  2. In the wizard, choose to seperate by “other”, enter a slash “/”.
  3. Now use the good old fashioned “Find and replace”. Replace “12” with “2012”, then find all the accidentally created “202012” and replace with “2012”. And repeat if you still have 2011 data in there
  4. Then just Concatenate the 3 columns back together using this formula structure (assume in columns 1,2 and 3 and use the text insert of a slash inbetween quotation marks)
  5. =CONCATENATE(a1,”/”,a2,”/”,a3)

  6. Double click on the bottom right hand corner of the cell to copy all the way down
  7. Copy the resultant column, “Paste Special” next to it as “Values only”
  8. Delete the rest of the working columns

You are now ready to actually use your own data. Google please help us!

However, there are still bad data and formats in the data. So, I always prefer server side tools.

How does Google work?

Its a start. Google shares everything, but still doesn’t tell you much in absolutely detail, as normal. They are making SEO accessible to the masses, by explaining to the lay-man about crawling and indexation on the basics of retrieving and recalling web pages. Shame he didn’t go into personalisation and how social signals are influencing that.

Without further-ado, here is the latest video presented by Matt Cutts.