LinkedIn Webscraping Tools
Jump to Code/Solution
If you want to skip the article and go straight to the code, head to the LinkedIn Scraping Toolkit Jupyter notebook or its repository on GitHub.
Background
If you use LinkedIn you might have seen tips when you log in saying things like 'profile views matter', followed by suggestions for improving your profile to make it stand out more.
Well... one good way to get more profile views is to actually view others' profiles. Have you ever logged on to LinkedIn and seen that someone you didn't know had viewed your profile? When this happens I often view that person's profile in return, out of curiosity. Sometimes it is a hiring manager or a recruiter, but I've also had students and even other consultants. At any rate, a profile view has the potential to spark curiosity.
So, several months ago, it occurred to me that if I were looking for a job in a specific location and specialization, it might be useful to have some code that 'views' all the local people who might have the power to hire me. If my tagline said something like 'looking for work as x' (where x is whatever you are interested in), then those people might get in touch if they are (1) actively recruiting, or (2) considering the idea of hiring someone. I suppose the effect would be stronger if they were existing connections on LinkedIn.
Cracking LinkedIn's Shell
So, that is what I have attempted to do in my Jupyter Notebook.
Finding Relevant Local Companies
My first problem was finding a way to target a search at companies local to one city. That is not possible with a free (non-premium) LinkedIn account, so I looked at Glassdoor and Companies House (in the UK), but then decided LinkedIn job postings would serve as a great source of local employers, and you can definitely search jobs by location!
So, in the notebook, I get local jobs then pick off the companies posting those jobs.
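As a rough illustration of the "pick off the companies" step, here is a minimal Python sketch. It assumes the job postings have already been scraped into a list of dictionaries with a "company" key, and that companies.csv holds one company name per row with no header. Both of those layouts, and the function names, are my assumptions for the example and not necessarily what the notebook does.

```python
import csv


def unique_companies(job_postings):
    """Return company names in first-seen order, ignoring case and blanks."""
    seen = set()
    companies = []
    for posting in job_postings:
        # Hypothetical record layout: each posting is a dict with a "company" key.
        name = (posting.get("company") or "").strip()
        key = name.lower()
        if name and key not in seen:
            seen.add(key)
            companies.append(name)
    return companies


def save_companies(companies, path="companies.csv"):
    """Write one company name per row (assumed layout, no header row)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for name in companies:
            writer.writerow([name])
```

Calling unique_companies on the scraped postings and passing the result to save_companies gives a de-duplicated list of local employers to work through in the next step.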
(I just want to say at this point that I initially tried going down the API route for data, but it looks like LinkedIn locked down API access pretty hard in 2016 and again in 2018. As far as I could tell, I can only grab my own profile data from the API, but if anyone knows more, by all means email me at the address below.)
Exposing their People
Next, I needed to open up those companies and reveal their people who matched certain criteria (e.g. "Analytics Chief"). I did this by manipulating the search results URLs and looping through the paginated results for each company and each search term. I was a little nervous about busting LinkedIn's throttle limits, so I added sleep statements to make the bot feel a bit more like a real person browsing the site. For anyone who came up in the paginated search results, and who was in my area of interest, I viewed their profile.
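The looping described above can be sketched roughly as follows. Note that the base URL and the query parameter names (company, keywords, page) are placeholders I made up for illustration. They are not LinkedIn's real search URL format, which you would need to copy from your own browser's address bar. The randomised pause is also just one way of doing the sleep statements.

```python
import random
import time
from urllib.parse import urlencode

# Hypothetical placeholder, NOT the real LinkedIn search endpoint.
SEARCH_BASE = "https://example.com/search/people/"


def build_search_url(company_id, keywords, page):
    """Build one paginated search URL (parameter names are assumptions)."""
    params = {"company": company_id, "keywords": keywords, "page": page}
    return SEARCH_BASE + "?" + urlencode(params)


def iter_search_urls(company_ids, search_terms, max_pages):
    """Yield a URL for every company, search term and results page."""
    for company_id in company_ids:
        for term in search_terms:
            for page in range(1, max_pages + 1):
                yield build_search_url(company_id, term, page)


def polite_pause(min_seconds=2.0, max_seconds=6.0, sleep=time.sleep):
    """Sleep for a random interval so requests are not evenly spaced."""
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

In use, you would loop over iter_search_urls(...), load each URL in the browser, call polite_pause() and then collect the profile links from the page.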
As I mention in the GitHub README file, the code is a bit volatile for reasons I currently can't explain. I added sleep statements to give the pages a chance to load properly, and also had to maximise the browser window so that it is wide enough for elements to be fully visible and loaded on the page. I also added scroll-down code so that the JavaScript always has a chance to load the page before Selenium attempts to grab the HTML.
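Those three measures (maximise, sleep, scroll) can be wrapped up in one helper along these lines. This is a simplified sketch and not the notebook's exact code: the function name, the number of scroll steps and the wait times are all my own choices for the example. The driver argument is expected to be a Selenium WebDriver instance.

```python
import time


def load_page_html(driver, url, settle_seconds=3.0, scroll_steps=3):
    """Open a page, scroll to the bottom a few times, and return its HTML.

    `driver` is assumed to be a Selenium WebDriver (e.g. Chrome or Firefox).
    """
    # A wide window helps elements be fully visible and rendered.
    driver.maximize_window()
    driver.get(url)
    time.sleep(settle_seconds)  # give the initial page a chance to load

    for _ in range(scroll_steps):
        # Scrolling triggers the JavaScript that lazy-loads further content.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(settle_seconds)

    return driver.page_source
```

With Selenium installed, you would create the driver once (for example with webdriver.Chrome()), log in, and then call load_page_html(driver, url) for each page. Fixed sleeps are simple but fragile. Selenium's explicit waits (WebDriverWait) are an alternative worth trying if pages still hang.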
Despite these measures, I still see it crashing when a page hangs for too long at certain points in the code. However, when this happens I can stop it, modify the companies.csv file, and restart. Hopefully you can use the notebook to your advantage, and if you do, you will see what I mean and be able to work with it.
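If editing companies.csv by hand gets tedious, one possible alternative is to have the code rewrite the file after each company is finished, so a restart picks up where the crash happened. This is a sketch of that idea and not what the notebook currently does. It assumes companies.csv holds one company name per row with no header, and handle_company is a hypothetical function that does the searching and profile viewing for one company.

```python
import csv


def read_companies(path):
    """Read company names, one per row (assumed layout, no header)."""
    with open(path, newline="", encoding="utf-8") as f:
        return [row[0] for row in csv.reader(f) if row]


def write_companies(path, companies):
    """Overwrite the file with the companies still left to process."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows([name] for name in companies)


def process_with_checkpoint(path, handle_company):
    """Process companies in order, saving progress after each success.

    If handle_company raises (e.g. a page hangs), the file still lists the
    failed company first, so re-running resumes from that company.
    """
    remaining = read_companies(path)
    while remaining:
        handle_company(remaining[0])  # may raise; the file is left untouched
        remaining = remaining[1:]
        write_companies(path, remaining)
```

The trade-off is that the file is emptied as you go, so keep a copy of the full list if you want to run the whole thing again later.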