There are plenty of good reasons you might need to find every URL on a website, but your exact goal will determine what you're searching for. For instance, you may want to:
Identify every indexed URL to analyze issues like cannibalization or index bloat
Collect current and historic URLs Google has seen, especially for site migrations
Find all 404 URLs to recover from post-migration errors
In each scenario, a single tool won't give you everything you need. Unfortunately, Google Search Console isn't exhaustive, and a "site:example.com" search is limited and hard to extract data from.
In this post, I'll walk you through some tools to build your URL list before deduplicating the data using a spreadsheet or Jupyter Notebook, depending on your site's size.
Old sitemaps and crawl exports
If you're looking for URLs that recently disappeared from the live site, there's a chance someone on your team may have saved a sitemap file or a crawl export before the changes were made. If you haven't already, check for these files; they can often provide what you need. But if you're reading this, you probably didn't get so lucky.
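If you do turn up an old sitemap file, extracting its URLs takes only a few lines. Here's a minimal Python sketch, assuming a standard sitemap.xml saved locally (the file path is whatever you recovered):

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace defined at sitemaps.org
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def urls_from_sitemap(path):
    """Return every <loc> URL from a saved sitemap.xml file."""
    tree = ET.parse(path)
    return [loc.text.strip() for loc in tree.getroot().findall("sm:url/sm:loc", NS)]
```

Note that a sitemap index file points to child sitemaps rather than pages, so you'd run this once per child sitemap in that case.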
Archive.org
Archive.org is a valuable tool for SEO tasks, funded by donations. If you search for a domain and select the "URLs" option, you can access up to 10,000 listed URLs.
However, there are a few limitations:
URL limit: You can only retrieve up to 10,000 URLs, which is insufficient for larger sites.
Quality: Many URLs may be malformed or reference resource files (e.g., images or scripts).
No export option: There isn't a built-in way to export the list.
To bypass the lack of an export button, use a browser scraping plugin like Dataminer.io. However, these limitations mean Archive.org may not provide a complete solution for larger sites. Also, Archive.org doesn't indicate whether Google indexed a URL, but if Archive.org saw it, there's a good chance Google did, too.
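Another way around the missing export button is to query Archive.org's CDX API, the public endpoint behind the Wayback Machine's URL listing, directly. A minimal Python sketch (the `collapse` and `fl` parameters shown are one reasonable configuration, not the only one):

```python
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"

def cdx_query_url(domain, limit=10000):
    """Build a CDX API query that lists unique captured URLs for a domain."""
    params = {
        "url": f"{domain}/*",   # match every path on the domain
        "output": "text",
        "fl": "original",       # return only the original-URL column
        "collapse": "urlkey",   # one row per unique URL
        "limit": limit,
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"

def fetch_archived_urls(domain):
    """Fetch the list of archived URLs (requires network access)."""
    with urlopen(cdx_query_url(domain)) as resp:
        return resp.read().decode("utf-8").splitlines()
```

The same quality caveat applies: expect resource files and malformed URLs in the output, so plan to filter the list afterward.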
Moz Pro
While you'd typically use a link index to find external sites linking to you, these tools also discover URLs on your own site in the process.
How to use it:
Export your inbound links in Moz Pro to get a quick and easy list of target URLs from your site. If you're dealing with a large website, consider using the Moz API to export data beyond what's manageable in Excel or Google Sheets.
It's important to note that Moz Pro doesn't confirm whether URLs are indexed or discovered by Google. However, since most sites apply the same robots.txt rules to Moz's bots as they do to Google's, this method generally works well as a proxy for Googlebot's discoverability.
Google Search Console
Google Search Console offers several valuable sources for building your list of URLs.
Links reports:
Much like Moz Pro, the Links section provides exportable lists of target URLs. Unfortunately, these exports are capped at 1,000 URLs each. You can apply filters for specific pages, but since filters don't carry over to the export, you might need to rely on browser scraping tools, which are limited to 500 filtered URLs at a time. Not ideal.
Performance → Search Results:
This export gives you a list of pages receiving search impressions. While the export is limited, you can use the Google Search Console API for larger datasets. There are also free Google Sheets plugins that simplify pulling more extensive data.
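The API route works by paging through the `searchAnalytics.query` method with the `page` dimension, stepping `startRow` in increments of the per-request maximum of 25,000 rows. A minimal sketch of the request bodies, assuming an authenticated client from google-api-python-client handles the actual calls (auth not shown):

```python
def page_query_payloads(start_date, end_date, row_limit=25000):
    """Yield successive searchAnalytics.query request bodies.

    The caller passes each body to the authenticated service and stops
    once a response comes back with fewer than row_limit rows.
    """
    start_row = 0
    while True:
        yield {
            "startDate": start_date,    # e.g. "2024-01-01"
            "endDate": end_date,
            "dimensions": ["page"],     # one row per URL
            "rowLimit": row_limit,      # 25,000 is the API's per-request cap
            "startRow": start_row,
        }
        start_row += row_limit
```

Each payload would go to `service.searchanalytics().query(siteUrl=..., body=...)`; collect the `page` value from every returned row.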
Indexing → Pages report:
This section provides exports filtered by issue type, though these are also limited in scope.
Google Analytics
The Engagement → Pages and Screens default report in GA4 is an excellent source for collecting URLs, with a generous limit of 100,000 URLs.
Even better, you can apply filters to create different URL lists, effectively surpassing the 100k limit. For example, if you want to export only blog URLs, follow these steps:
Step 1: Add a segment to the report
Step 2: Click "Create a new segment."
Step 3: Define the segment with a narrower URL pattern, such as URLs containing /blog/
Note: URLs found in Google Analytics may not be discoverable by Googlebot or indexed by Google, but they still offer valuable insights.
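If you'd rather script this than click through the UI, the GA4 Data API's `runReport` method accepts the same kind of path filter. A minimal sketch of the request body (the metric choice and 100,000-row limit are assumptions matching the report above):

```python
def blog_pages_report_body(start_date, end_date, path_prefix="/blog/"):
    """Request body for the GA4 Data API runReport method, restricted
    to pagePath values under a given prefix."""
    return {
        "dateRanges": [{"startDate": start_date, "endDate": end_date}],
        "dimensions": [{"name": "pagePath"}],          # one row per path
        "metrics": [{"name": "screenPageViews"}],
        "dimensionFilter": {
            "filter": {
                "fieldName": "pagePath",
                "stringFilter": {"matchType": "BEGINS_WITH", "value": path_prefix},
            }
        },
        "limit": 100000,
    }
```

Varying `path_prefix` across requests gives you the separate filtered lists described above, one section of the site at a time.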
Server log files
Server or CDN log files are perhaps the ultimate tool at your disposal. These logs capture an exhaustive list of every URL path requested by users, Googlebot, or other bots during the recorded period.
Considerations:
Data size: Log files can be huge, so many sites only retain the last two weeks of data.
Complexity: Analyzing log files can be challenging, but various tools are available to simplify the process.
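For a small site, a short script may be all the tooling you need. This sketch parses the common Combined Log Format and records which user agents requested each path; note that the user-agent string alone can be spoofed, so genuine Googlebot verification requires a reverse-DNS check not shown here:

```python
import re
from collections import defaultdict

# Combined Log Format:
# host ident user [time] "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def paths_by_agent(lines):
    """Map each requested path to the set of agent classes that hit it."""
    seen = defaultdict(set)
    for line in lines:
        m = LOG_LINE.match(line)
        if m:
            agent = "googlebot" if "Googlebot" in m.group("agent") else "other"
            seen[m.group("path")].add(agent)
    return seen
```

The keys of the returned mapping are exactly the URL-path list this section is after; the agent sets let you separate pages Googlebot requested from pages only humans visited.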
Combine, and good luck
Once you've gathered URLs from all these sources, it's time to combine them. If your site is small enough, use Excel; for larger datasets, use tools like Google Sheets or a Jupyter Notebook. Make sure all URLs are consistently formatted, then deduplicate the list.
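In a notebook, the formatting-and-deduplication step might look like this minimal sketch. The normalization choices here, such as dropping fragments and trailing slashes and lowercasing the host, are assumptions you should adapt to how your site actually treats those variants:

```python
from urllib.parse import urlsplit, urlunsplit

def normalize(url):
    """Canonicalize a URL: lowercase scheme and host, drop the fragment,
    and strip a trailing slash from non-root paths."""
    parts = urlsplit(url.strip())
    path = parts.path
    if len(path) > 1 and path.endswith("/"):
        path = path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       path, parts.query, ""))

def deduplicate(urls):
    """Normalize every URL and return the unique results, sorted."""
    return sorted({normalize(u) for u in urls if u.strip()})
```

Feed it the concatenated lists from every source above and the output is your final, deduplicated inventory.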
And voilà: you now have a comprehensive list of current, old, and archived URLs. Good luck!