What is Log Analysis? How to Do Log Analysis for Technical SEO?
Log analysis is the line-by-line examination of the log file that records every request received by the server.
Log files let us make the most accurate assessment for our on-page technical SEO work.
In this way, we can be absolutely sure of how, when, and how often search engines crawl our site and pages.
Besides revealing crawl problems, log analysis helps us identify our most crawled pages, decide how we should configure the site architecture, and see which response codes our important pages return and when.
The log file is kept on the server. You can ask your IT team for access to these files, or you can reach them via FTP or cPanel through your hosting provider.
In the cPanel file manager, the log records are usually kept in a folder named “logs”.
A sample log file contains the following fields:
- IP address sending the request to the server
- Timestamp of when the request was made
- Type of request (method)
- The server's response code to the request
- Identity of the client sending the request (user-agent)
- The page being requested (URL)
- The address to which the request is sent (Host)
A sample log file line:
66.249.70.69 - - [06/Jan/2021:15:13:01 -0400] "GET /file/log-analysis HTTP/1.1" 200 278 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
From this line we can read the IP sending the request, the timestamp, the request type, the server's response code, the bytes downloaded, and the identity of the user agent sending the request.
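To work with thousands of such lines programmatically, each one can be split into its fields. Below is a minimal Python sketch for the Apache combined format shown above; the regular expression and field names are our own illustration, not part of any specific tool.

```python
import re

# Minimal pattern for the Apache combined log format shown above.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

line = ('66.249.70.69 - - [06/Jan/2021:15:13:01 -0400] '
        '"GET /file/log-analysis HTTP/1.1" 200 278 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')

match = LOG_PATTERN.match(line)
if match:
    record = match.groupdict()
    print(record["ip"], record["timestamp"], record["status"], record["user_agent"])
```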
Log files commonly come in five formats: Apache, Amazon Elastic Load Balancing, W3C, HAProxy, and JSON.
How is Log Analysis Used for SEO?
1. Understanding whether the crawl budget is being used well.
Crawl budget: the number of pages Google will crawl when it visits our site.
2. Are important pages not being crawled because of problems in the website architecture?
For websites with bad site architecture, important pages may not be crawled or they may be crawled less than other pages.
Thanks to log analysis, we can identify them and determine a roadmap for our on-site SEO work.
3. Do the pages on the site return a 200 status code to search engines for every request?
Even in situations that Google Search Console or crawler tools do not show us, such as the servers becoming overloaded, important pages in particular should be able to return a 200 response code for every request.
4. Which bots visit our website the most?
Considering that Google is the largest and most used search engine in the world, we will usually want our website to be crawled mostly by Google's bots.
5. Are there any orphan pages on the site?
Pages that are not linked from any other page are called orphan pages; by discovering them we can improve internal linking (see the sketch after this list).
6. Are pages containing parameters using resources inefficiently?
In particular, filters and their parameters on sites such as e-commerce and hotel reservation sites may consume the crawl budget unnecessarily.
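As a rough illustration of the orphan-page check in item 5, the sketch below compares the URLs found by an internal-link crawl with the URLs that appear in the logs; both sets are hypothetical placeholders.

```python
# Hypothetical inputs: URLs discovered by crawling internal links,
# and URLs that bots actually requested according to the logs.
crawled_urls = {"/", "/products", "/products/shoes", "/blog"}
log_urls = {"/", "/products", "/old-campaign", "/blog"}

# Pages hit by bots but unreachable through internal links: orphan-page candidates.
orphan_candidates = log_urls - crawled_urls

# Pages in the site structure that never show up in the logs:
# candidates for stronger internal linking so they get crawled.
never_crawled = crawled_urls - log_urls

print("Orphan candidates:", orphan_candidates)   # {'/old-campaign'}
print("Not crawled by bots:", never_crawled)     # {'/products/shoes'}
```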
How Is Log Analysis Done?
First of all, we may need to verify the IP addresses, because we need to block fake or malicious bots from our site.
For this, we can check each incoming IP address.
To check an IP address, we can use the reverse lookup tool at https://mxtoolbox.com/SuperTool.aspx.
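The same check can also be scripted. The sketch below follows the commonly recommended two-step verification for Googlebot (reverse lookup, then forward-confirm) using Python's standard socket module; it needs network access and is only a minimal illustration.

```python
import socket

def is_verified_googlebot(ip: str) -> bool:
    """Reverse-resolve the IP, check the domain, then confirm it resolves back."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)  # e.g. crawl-66-249-70-69.googlebot.com
    except socket.herror:
        return False
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        # The hostname must resolve back to the original IP address.
        return socket.gethostbyname(hostname) == ip
    except socket.gaierror:
        return False

print(is_verified_googlebot("66.249.70.69"))
```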
We need to use log analysis tools to make sense of thousands of lines of files.
Tools we can use for log analysis:
- Screaming Frog Log File Analyzer
- OnCrawl Log File Analyzer
- SEMrush Log File Analyzer
- Google Search Console Crawl Stats
How to Do Log Analysis with SEMrush Log File Analyzer?
With the SEMrush log analysis tool, we can only analyze Googlebot activity.
Here we can easily find the pages most crawled by the bots, their crawl frequency, and their most recent crawl times.
How to Do Log Analysis with Screaming Frog Log File Analyzer?
Screaming Frog Log File Analyzer allows us to perform a more comprehensive log analysis.
With this tool, we can perform log analysis for medium and large sites.
In the first tab, the summary section gives us a quick view of the analysis data.
If we want to draw an inference from this summary data:
For example, suppose the site has 1,000 unique URLs, and our log analysis shows that around 10 unique URLs are crawled per day.
Accordingly, we can say that it would take roughly 100 days to crawl all the pages on the website.
Note: In practice the figure rarely works out this neatly; it varies depending on how often low-value pages and important pages are each crawled.
In the section analyzed as errors, we can see how many resources are consuming our crawl budget while returning error response codes.
We can use other tabs and bot type filters to refine this data.
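Outside the tool, a similar error summary can be approximated directly from parsed log records. The sketch below assumes a list of dictionaries like the one produced by the parsing example earlier; the records themselves are made up for illustration.

```python
from collections import Counter

# Hypothetical parsed log records (see the parsing sketch earlier).
records = [
    {"url": "/products", "status": "200", "user_agent": "Googlebot"},
    {"url": "/old-page", "status": "404", "user_agent": "Googlebot"},
    {"url": "/checkout", "status": "500", "user_agent": "Googlebot"},
    {"url": "/old-page", "status": "404", "user_agent": "Googlebot"},
]

# How many bot requests ended in an error and therefore wasted crawl budget?
error_counts = Counter(
    r["status"]
    for r in records
    if "Googlebot" in r["user_agent"] and not r["status"].startswith("2")
)
print(error_counts)  # Counter({'404': 2, '500': 1})
```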
In the Event tab, we can see which user agents have made how many requests to which of our pages, along with the response codes.
On the URL tab, when we sort events in descending order, we can find the most crawled and the least crawled pages.
In this section, you can discover pages that are important but rarely crawled and build an internal linking strategy from your highly crawled pages.
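The same most-crawled / least-crawled view can be reproduced with a simple counter over parsed log records; the sample data below is again a hypothetical placeholder.

```python
from collections import Counter

# Hypothetical URLs requested by bots, taken from parsed log records.
requested_urls = [
    "/", "/", "/", "/",
    "/products", "/products",
    "/important-landing-page",
]

crawl_counts = Counter(requested_urls)

print("Most crawled:", crawl_counts.most_common(2))
print("Least crawled:", crawl_counts.most_common()[-2:])
# Most crawled: [('/', 4), ('/products', 2)]
# Least crawled: [('/products', 2), ('/important-landing-page', 1)]
```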
We can analyze page sizes by sorting the URL tab by average bytes.
This analysis gives us information on how efficiently our crawl budget is used.
We can identify pages that are very large and, by consuming the most resources during crawling, may be preventing other pages from being crawled.
If you have old and outdated pages that are still being crawled, you may consider updating and improving them, or take other actions to avoid wasting resources.
Large JavaScript files will also reduce the efficiency of the crawl budget.
To detect them quickly, we can filter by JavaScript on the URL tab, find the files that consume the most resources, and consider optimizing them.
Filters such as JavaScript, HTML, CSS, and Image can be used for the file types here.
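Outside the tool, average response size per URL can be estimated from the bytes field of the log and then filtered by file extension. The sketch below uses made-up records and a simple `.js` check standing in for the file-type filter.

```python
from collections import defaultdict

# Hypothetical parsed records: URL and downloaded bytes per request.
records = [
    {"url": "/static/app.js", "bytes": 900_000},
    {"url": "/static/app.js", "bytes": 910_000},
    {"url": "/products", "bytes": 45_000},
    {"url": "/styles/main.css", "bytes": 120_000},
]

totals = defaultdict(lambda: [0, 0])  # url -> [total_bytes, request_count]
for r in records:
    totals[r["url"]][0] += r["bytes"]
    totals[r["url"]][1] += 1

avg_bytes = {url: total / count for url, (total, count) in totals.items()}

# Rank the heaviest resources first and flag the JavaScript files.
for url, size in sorted(avg_bytes.items(), key=lambda item: item[1], reverse=True):
    marker = "  <- JavaScript" if url.endswith(".js") else ""
    print(f"{url}: {size:,.0f} bytes on average{marker}")
```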
In the response code section, we can see which response codes were returned to requests, and how often, within the analyzed log file range (for example, one month).
For example, we might see that a page crawled by Googlebot 100 times in a month returned a 200 response code 90 times and a 4xx or 5xx response code 10 times.
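A per-URL breakdown like the one in this example can be built with a nested counter over parsed records; the requests below are placeholders.

```python
from collections import Counter, defaultdict

# Hypothetical Googlebot requests over the analyzed period.
records = [
    {"url": "/category/shoes", "status": "200"},
    {"url": "/category/shoes", "status": "200"},
    {"url": "/category/shoes", "status": "503"},
    {"url": "/blog/post-1", "status": "404"},
]

status_by_url = defaultdict(Counter)
for r in records:
    status_by_url[r["url"]][r["status"]] += 1

for url, statuses in status_by_url.items():
    print(url, dict(statuses))
# /category/shoes {'200': 2, '503': 1}
# /blog/post-1 {'404': 1}
```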
In the Directories section, this tool shows our site structure as folders.
In this way, we can see which categories and subfolders are crawled more and which are crawled less.
The analysis here guides our internal linking so that we can achieve a balance between categories.
Thus, we can improve our site structure.
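The same directory view can be approximated by grouping the logged URLs on their first path segment; the paths below are made up for illustration.

```python
from collections import Counter

# Hypothetical URLs requested by bots, taken from the logs.
log_urls = [
    "/blog/post-1", "/blog/post-2", "/blog/post-3",
    "/products/shoes", "/products/bags",
    "/about",
]

def top_directory(url: str) -> str:
    """Return the first path segment, e.g. '/blog/post-1' -> '/blog'."""
    first_segment = url.strip("/").split("/")[0]
    return "/" + first_segment if first_segment else "/"

directory_counts = Counter(top_directory(url) for url in log_urls)
print(directory_counts)  # Counter({'/blog': 3, '/products': 2, '/about': 1})
```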
How to Use Google Search Console Crawl Stats for Log Analysis?
We can also use Google Search Console crawl stats reports for log analysis.
In Search Console, we can see the percentage distribution of the response codes that pages return to bots.
We can also see request percentages broken down by the file types on the pages.
For example, HTML requests might account for 56% and image requests for 4%.
We can also expand this analysis to see which pages are involved.
Thus, we get information about our pages returning 4xx response codes.
We can see which pages were crawled for the first time versus re-crawled, and which user agent visited the site more.
In summary, we can improve our SEO performance with log analysis.
You can also get ahead of your competitors by using Log Analysis to improve your technical SEO performance.