Last Updated January 7th, 2020 at 11:35 am

Today we want to explain how to track the indexing of sensitive PDFs and block the Google robot's access to them. No idea what we are talking about? Let's give an example.

Once upon a time, there was a client who sold courses over the internet. Because the site was poorly programmed, search robots indexed internal PDFs containing the course lessons. What is the problem? Customer sales went down: the PDFs were meant to be delivered to users after purchase, and the intention (obviously) was for them to pay for them. Instead, anyone could Google the PDFs and download them for free.

From this example we can draw three conclusions:

PDFs can rank well in search results.
Poor site programming can lead to decreased sales.
Be careful with rank-tracking tools: if we don't check which URLs they are registering, we could believe we are on the right track when in reality we are throwing money away.

Not everything is Big Data when it comes to blocking PDFs
With Search Console we can get a lot of information.
Within "Search traffic", under "Search analytics", we can filter pages whose URLs contain "pdf": this gives us a notion of the traffic reaching those pages from the search engine. Here we must decide whether we want to block all PDFs or whether we are interested in keeping some indexed: for example, in the case of the course vendor, a free sample could be offered, such as a summary or an introduction.

[Infographic: search traffic reaching PDF pages]

The software ping pong: bouncing the Google robot

To prevent indexing, Google provides several methods:

A meta robots noindex tag placed in the <head> section of the site's HTML code. If we do not have access to the server, it can almost always be applied from the CMS that manages the site, which is clearly an advantage. The problem is that it does not work for PDFs, because they have no HTML code.
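For reference, on an ordinary HTML page the tag described above looks like this (a minimal fragment; it goes inside the page's <head> and only affects the page that carries it, which is exactly why it cannot help with a standalone PDF):

```html
<head>
  <!-- Tell crawlers not to index this page -->
  <meta name="robots" content="noindex">
</head>
```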
URL removal in Search Console. This method hides the URLs temporarily, but we do not recommend it as a fix for the root problem.

[Screenshot: URL removal in Search Console]

The robots.txt file. It is simple to apply: you only need FTP access to the site's server. Inside Search Console there is a tool to test the changes and then download the final robots.txt file to upload to the root of the site. Simply adding the line "Disallow: /*.pdf$" blocks the crawler's access to every PDF.

[Screenshot: robots.txt editor]

The tester is useful to verify the rule against any of the PDFs found previously:

[Screenshot: PDF URL reported as disallowed by the tester]

Conclusion

It is highly recommended that you check Search Console weekly for pages receiving traffic due to unwanted indexing. Google dedicates a large amount of resources to improving its little robot: every day it scans the content of the millions of websites that exist. It's part of our job to make sure you're on the right track.
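As a sanity check before uploading, the matching behind a rule like "Disallow: /*.pdf$" can be sketched in a few lines of Python. This is a simplified matcher, not a full robots.txt parser: the "*" wildcard and the "$" end anchor are Google extensions to the original spec (and Python's built-in urllib.robotparser does not handle them), and the function names below are our own.

```python
import re

def rule_to_regex(rule: str) -> "re.Pattern":
    """Translate a Google-style Disallow rule ('*' wildcard, '$' anchor) to a regex."""
    pattern = re.escape(rule).replace(r"\*", ".*")
    # A trailing '$' in the rule anchors the match to the end of the URL path.
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"
    return re.compile(pattern)

def is_blocked(path: str, disallow_rules: list) -> bool:
    """Return True if any Disallow rule matches from the start of the URL path."""
    return any(rule_to_regex(r).match(path) for r in disallow_rules)

# The rule suggested in the article: block every URL ending in .pdf
rules = ["/*.pdf$"]

print(is_blocked("/courses/lesson-1.pdf", rules))   # True: the PDF is blocked
print(is_blocked("/courses/lesson-1.html", rules))  # False: normal pages stay crawlable
```

Note that robots rules are matched as prefixes from the start of the path, which is why the sketch uses `re.match` rather than `re.search`; the Search Console tester remains the authoritative check.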