Hi,
I created a recipe to scrape the data from a Crunchbase search that has 22k results.
Crunchbase shows 50 rows per page, hence 22,000/50 = 440 pages.
I selected my rows and columns, the easy Nav finder found the “Next” on the page, and I set it going.
After trying it once or twice and not getting the output I was expecting, I watched the log file (see below) and noticed that it would progress until it reached pageId=20 and then restart at 2.
So it's not a case of the Nav/Next page not working (it goes from 1 to 20 OK), and it's not happening at the rollover from page 9 to 10 (1 to 2 digits) or page 99 to 100 (2 to 3 digits), so I am perplexed as to why DataMiner is doing this.
So now I have burnt through a bunch of the pages-per-month allowance in my subscription re-scraping the same pages, and I still have not got the data I need.
(Trying to post this, I hit a restriction of only 2 URLs per post, so I have replaced https://www.crunchbase.com/ with … below.)
Has anyone seen this behaviour before, and do you know how to fix it, please?
Thanks
Ian
Here is the output from the log:
Scraping logs
Page: “…discover/saved/sg-private-companies/efc04efd-e14e-45df-99c2-5dd2089f4499”
22:03:15 Not Started
22:03:25 Scrape Requested.
22:03:25 Scraping Data: Crunchbase Sg Private Company
22:03:25 Scraped Page.
22:03:27 Waiting…
Page: “…/discover/saved/sg-private-companies/efc04efd-e14e-45df-99c2-5dd2089f4499?pageId=2_a_d94ecad4-dc36-4bfa-9f28-461c9b1244b4”
22:03:37 Scraped Page.
22:03:39 Waiting…
Page: “…/discover/saved/sg-private-companies/efc04efd-e14e-45df-99c2-5dd2089f4499?pageId=3_a_00a1b342-5f4d-4dbc-0687-dc4c5ef900bc”
22:03:52 Scraped Page.
22:03:53 Waiting…
Page: “…/discover/saved/sg-private-companies/efc04efd-e14e-45df-99c2-5dd2089f4499?pageId=4_a_d75aee19-b425-1a27-eaa9-18cd45c821b8”
22:04:07 Scraped Page.
22:04:08 Waiting…
Page: “…/discover/saved/sg-private-companies/efc04efd-e14e-45df-99c2-5dd2089f4499?pageId=5_a_9cc14b9c-da70-2da3-3643-9056ea285e07”
22:04:21 Scraped Page.
22:04:23 Waiting…
Page: “…/discover/saved/sg-private-companies/efc04efd-e14e-45df-99c2-5dd2089f4499?pageId=6_a_61aadb9c-a60d-45d9-976b-cdcf71700604”
22:04:36 Scraped Page.
22:04:39 Waiting…
Page: “…/discover/saved/sg-private-companies/efc04efd-e14e-45df-99c2-5dd2089f4499?pageId=7_a_abb252ed-9deb-4f84-87c3-42b2e59597f4”
22:04:52 Scraped Page.
22:04:55 Waiting…
Page: “…/discover/saved/sg-private-companies/efc04efd-e14e-45df-99c2-5dd2089f4499?pageId=8_a_0096880d-1427-ada2-c5c4-6996f2f6639d”
22:05:06 Scraped Page.
22:05:08 Waiting…
Page: “…/discover/saved/sg-private-companies/efc04efd-e14e-45df-99c2-5dd2089f4499?pageId=9_a_eff4a653-b90d-4b02-a551-abebd10f5b4e”
22:05:20 Scraped Page.
22:05:21 Waiting…
Page: “…/discover/saved/sg-private-companies/efc04efd-e14e-45df-99c2-5dd2089f4499?pageId=10_a_9b49dc21-99a4-4ba2-bb6d-d7295004f5d3”
22:05:35 Scraped Page.
22:05:36 Waiting…
Page: “…/discover/saved/sg-private-companies/efc04efd-e14e-45df-99c2-5dd2089f4499?pageId=11_a_78dfd53e-4519-4bde-a6e8-f39e92c58ceb”
22:05:46 Scraped Page.
22:05:48 Waiting…
Page: “…/discover/saved/sg-private-companies/efc04efd-e14e-45df-99c2-5dd2089f4499?pageId=12_a_aedfe7d0-0348-4d5c-ba93-743b36a6f71a”
22:06:00 Scraped Page.
22:06:01 Waiting…
Page: “…/discover/saved/sg-private-companies/efc04efd-e14e-45df-99c2-5dd2089f4499?pageId=13_a_114bcec0-35d8-462a-88e7-d66055abcef6”
22:06:11 Scraped Page.
22:06:13 Waiting…
Page: “…/discover/saved/sg-private-companies/efc04efd-e14e-45df-99c2-5dd2089f4499?pageId=14_a_cdfe812d-8f2e-47ad-bdc4-f2e391dcdb96”
22:06:23 Scraped Page.
22:06:24 Waiting…
Page: “…/discover/saved/sg-private-companies/efc04efd-e14e-45df-99c2-5dd2089f4499?pageId=15_a_f4d737cf-a282-42b5-89ae-5f4aec6cb6b0”
22:06:38 Scraped Page.
22:06:40 Waiting…
Page: “…/discover/saved/sg-private-companies/efc04efd-e14e-45df-99c2-5dd2089f4499?pageId=16_a_8df40707-365a-4121-97a8-feec5c631bfd”
22:06:51 Scraped Page.
22:06:53 Waiting…
Page: “…/discover/saved/sg-private-companies/efc04efd-e14e-45df-99c2-5dd2089f4499?pageId=17_a_6385c794-4a95-40e0-a90b-667fe7917985”
22:07:04 Scraped Page.
22:07:07 Waiting…
Page: “…/discover/saved/sg-private-companies/efc04efd-e14e-45df-99c2-5dd2089f4499?pageId=18_a_085714a3-e5a4-408e-9601-94e3324528da”
22:07:18 Scraped Page.
22:07:20 Waiting…
Page: “…/discover/saved/sg-private-companies/efc04efd-e14e-45df-99c2-5dd2089f4499?pageId=19_a_34f18b86-9aa2-456c-94df-1731ca4dcb99”
22:07:32 Scraped Page.
22:07:33 Waiting…
Page: “…/discover/saved/sg-private-companies/efc04efd-e14e-45df-99c2-5dd2089f4499?pageId=20_a_cb46bb4c-7023-4ae2-9f17-48c8f7aa5788”
22:07:48 Scraped Page.
22:07:49 Waiting…
Page: “…/discover/saved/sg-private-companies/efc04efd-e14e-45df-99c2-5dd2089f4499”
22:08:02 Scraped Page.
22:08:03 Waiting…
Page: “…/discover/saved/sg-private-companies/efc04efd-e14e-45df-99c2-5dd2089f4499?pageId=2_a_d94ecad4-dc36-4bfa-9f28-461c9b1244b4”
22:08:16 Scraped Page.
22:08:18 Waiting…
Page: “…/discover/saved/sg-private-companies/efc04efd-e14e-45df-99c2-5dd2089f4499?pageId=3_a_00a1b342-5f4d-4dbc-0687-dc4c5ef900bc”
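
In case it helps anyone check their own log for the same loop, here is a rough Python sketch that pulls the page number out of each "Page:" line and reports the first point where the sequence goes backwards. It is not part of DataMiner. It assumes the log has been pasted as plain text, that the page number is the digits right after "pageId=", and that a URL with no pageId is page 1. The function names and the shortened sample URLs are made up for illustration.

```python
import re

# Leading page number in "pageId=<n>_a_<uuid>", as it appears in the log above.
PAGE_ID = re.compile(r"pageId=(\d+)_")

def page_sequence(log_text):
    """Return the page number for every 'Page:' line.

    Assumption: a URL without a pageId parameter is the first page (page 1).
    """
    pages = []
    for line in log_text.splitlines():
        if not line.lstrip().startswith("Page:"):
            continue  # skip timestamped status lines
        match = PAGE_ID.search(line)
        pages.append(int(match.group(1)) if match else 1)
    return pages

def first_restart(pages):
    """Return (index, previous_page, next_page) for the first backwards jump, or None."""
    for i in range(1, len(pages)):
        if pages[i] <= pages[i - 1]:
            return i, pages[i - 1], pages[i]
    return None

# Hypothetical, shortened sample in the same shape as the real log.
sample = """Page: “…/discover/saved/x?pageId=19_a_aaa”
22:07:32 Scraped Page.
Page: “…/discover/saved/x?pageId=20_a_bbb”
22:07:48 Scraped Page.
Page: “…/discover/saved/x”
22:08:02 Scraped Page.
Page: “…/discover/saved/x?pageId=2_a_ccc”
"""

print(first_restart(page_sequence(sample)))  # prints (2, 20, 1)
```

On the sample it reports the jump from page 20 back to page 1, which is the same pattern as in my log: after pageId=20 the scraper lands on the URL with no pageId and then carries on to pageId=2 again.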