A few days ago, on the #indieweb Freenode channel[1], one of the users asked if we knew an indieweb-friendly way of getting data out of LinkedIn. I hadn't been paying attention to recent LinkedIn news, though I had heard a few things, such as that they are struggling to prevent data scraping: the note mentioned that they consider it a problem that employers keep an eye on changes in LinkedIn profiles via third parties. This can indeed be an issue, but there is a way to manage it within LinkedIn: your public profile settings[2].
In my case, this had been set to visible to everyone for years, and at the time I set it up (again: years ago), it was working as intended. But a few days ago, to my surprise, visiting my profile while logged out resulted in this:
$ wget -O- https://www.linkedin.com/in/petermolnareu
--2018-01-14 10:26:12--  https://www.linkedin.com/in/petermolnareu
Resolving www.linkedin.com (www.linkedin.com)... 22.214.171.124, 2620:109:c00c:104::b93f:9001
Connecting to www.linkedin.com (www.linkedin.com)|126.96.36.199|:443... connected.
HTTP request sent, awaiting response... 999 Request denied
2018-01-14 10:26:12 ERROR 999: Request denied.
Despite the settings, there is no public profile for logged out users.
I'd like to understand what is going on, because so far, this looks like a fat lie from LinkedIn. Hopefully it's just a bug.
I tried setting referrers and user agents and used different IP addresses; still nothing. It turns out I can't type today: I managed to mistype https://google.com, and the referrer ended up as https:/google.com. So, following the notes on HN, setting a referrer to Google sometimes works. After a few failures it will lock you out again, referrer or not. This is even uglier than if it were a proper authwall for everyone.
curl 'https://www.linkedin.com/in/petermolnareu' \
  -e 'https://google.com/' \
  -H 'accept-encoding: text' \
  -H 'accept-language: en-US,en;q=0.9,' \
  -H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
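To compare the logged-out responses with and without a referrer, without dumping the whole page each time, something like the following could be used. It is only a rough sketch: the function names `classify_status` and `fetch_status` are made up for illustration, the "public"/"denied" labels are my own shorthand, and it assumes curl reports the non-standard 999 status the same way wget did above.

```shell
#!/bin/sh
# Sketch only: function names and labels are hypothetical, not anything
# LinkedIn documents. 999 is the status seen in the wget output above.

# Map an HTTP status code to a short label.
classify_status() {
    case "$1" in
        200) echo "public" ;;   # profile page was served
        999) echo "denied" ;;   # "Request denied", as in the wget run
        *)   echo "other" ;;    # redirects, other errors, no response
    esac
}

# Print only the HTTP status code for URL ($1); the body is discarded.
# If a second argument is given, it is sent as the referrer.
fetch_status() {
    if [ -n "$2" ]; then
        curl -s -o /dev/null -w '%{http_code}' -e "$2" "$1"
    else
        curl -s -o /dev/null -w '%{http_code}' "$1"
    fi
}
```

Running `classify_status "$(fetch_status 'https://www.linkedin.com/in/petermolnareu')"` and then the same with `'https://google.com/'` as the second argument shows whether the referrer makes a difference. Keep the number of attempts low, since repeated failures seem to trigger the lockout regardless.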