I have done a lot of scraping in the past. Cookies are a pain, and this is a really elegant solution. Of course the biggest problem is that everything interesting is hidden away behind JavaScript these days, and then you have to resort to Selenium and the whole thing just spirals out of control. But I'm looking forward to giving this a shot for non-JavaScript content in the future.
Do you mean JavaScript? I have never run into content hidden by Java, but many pages load content dynamically using JavaScript.
I have found it's quite easy to snoop on those JavaScript API requests using the Network tab of Chrome DevTools, then copy the network request as a curl command for bash scripts or as JavaScript for browser extensions.
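If you'd rather replay the captured request from Python instead of bash, the same idea looks roughly like this; a minimal sketch, where the endpoint, headers and cookie are placeholders for whatever the Network tab actually shows:

    import requests  # pip install requests

    # Hypothetical values copied out of the Network tab for one XHR/fetch request.
    url = "https://example.com/api/v1/items?page=1"
    headers = {
        "Accept": "application/json",
        "X-Requested-With": "XMLHttpRequest",
        "Cookie": "session=PASTE_FROM_DEVTOOLS",
    }

    resp = requests.get(url, headers=headers)
    resp.raise_for_status()
    print(resp.json())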
Tongue in cheek: You'd never know - servers running Java code generating HTML pages have probably conditionally not-rendered many pieces of HTML that you've never come across in your browsing :)
The term "everything interesting" is of course subjective. What is interesting to person A might not be interesting to person B. I never use Selenium and I generally have no problem acessing "everything interesting". The simplest example is reading and submitting HN comments. Presumably we all find this interesting enough. Javascript is neither required to read, vote nor submit to HN.
What if the phrase "everything interesting" were replaced with specific examples and questions? Something like, "I cannot access X without JavaScript. How do I access X without using JavaScript?"
It's possible that people might disagree on the definition of "works". For example, perhaps web developers might be biased toward a definition that puts them in control instead of the user. If I can retrieve information from a server with HTTP requests, then the website "works" for me. As a user, I certainly do not need to use JavaScript to make HTTP requests. Nor do I need to use a particular client.
One could argue that even HN does not "work" completely without JavaScript. For example, the script at https://news.ycombinator.com/hn.js will not run.
I once used Selenium to run JavaScript in the webpage to steal a few dynamic tokens required by the site's API, then reused them in my more well-trodden python-requests workflow.
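Roughly what that looked like, as a sketch; the global token name and API endpoint here are made up, and the page is assumed to expose the token somewhere reachable from JS:

    from selenium import webdriver
    import requests

    driver = webdriver.Chrome()
    driver.get("https://example.com/app")

    # Hypothetical: the page keeps a CSRF/session token on a global.
    token = driver.execute_script("return window.__csrfToken")
    cookies = {c["name"]: c["value"] for c in driver.get_cookies()}
    driver.quit()

    # Reuse the token and cookies in the plain python-requests workflow.
    resp = requests.get(
        "https://example.com/api/data",      # made-up endpoint
        headers={"X-CSRF-Token": token},
        cookies=cookies,
    )
    print(resp.status_code, resp.text[:200])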
Using the tools at hand is often the best approach. That said, I've spent most of the last 13 years of my career automating browsers. For years, I used Selenium with a variety of libraries. After switching to Puppeteer/Playwright, I have zero interest in going back lol. Playwright actually has first-party Python support. (Puppeteer has a Python port called Pyppeteer, but it's no longer maintained and the author recommends using Playwright.)
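For anyone curious, the first-party Python support (sync API) looks roughly like this; the URL is just a placeholder:

    from playwright.sync_api import sync_playwright  # pip install playwright && playwright install

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto("https://example.com")  # placeholder URL
        print(page.title())
        # Cookies for the whole browser context, handy for the scraping use case above.
        print(page.context.cookies())
        browser.close()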
Feels like you could just read Chrome's cookies from the file (and filter out the ones you need by site, of course), so you don't need to bother running Chrome in debugging mode?
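Something like this is what I had in mind, as a rough sketch assuming the default Linux profile path; the catch is that the useful payload sits in the encrypted_value column and needs an OS-specific key to decrypt (and Chrome may keep the file locked while running), which is probably why one would reach for a helper library or the debugging port instead:

    import sqlite3
    from pathlib import Path

    # Default Chrome profile path on Linux; adjust for your OS/profile (assumption).
    db = Path.home() / ".config/google-chrome/Default/Cookies"

    conn = sqlite3.connect(db)
    rows = conn.execute(
        "SELECT host_key, name, value, encrypted_value FROM cookies "
        "WHERE host_key LIKE ?", ("%example.com",)
    ).fetchall()
    for host, name, value, encrypted in rows:
        # 'value' is usually empty; the real cookie is in 'encrypted_value',
        # encrypted with a key held by the OS keyring / DPAPI.
        print(host, name, value, len(encrypted))
    conn.close()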
Thanks for the link. I know yt-dlp does, but from your link I found another library (https://github.com/n8henrie/pycookiecheat) that can do that, and it seems more popular than browser_cookie3. (browser_cookie3 worked totally fine the last time I tried it.)
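Both end up being one-liners in practice; a sketch with a placeholder domain/URL:

    import browser_cookie3                     # pip install browser-cookie3
    import requests
    from pycookiecheat import chrome_cookies   # pip install pycookiecheat

    # browser_cookie3 returns a CookieJar that requests accepts directly.
    cj = browser_cookie3.chrome(domain_name="example.com")
    print(requests.get("https://example.com/account", cookies=cj).status_code)

    # pycookiecheat returns a plain dict of cookie name -> value for a URL.
    cookies = chrome_cookies("https://example.com")
    print(requests.get("https://example.com/account", cookies=cookies).status_code)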
One could argue that cookies need to be more securely stored than passwords, because they can allow an attacker to bypass passwords and all other authentication factors.
> Tired of copy pasting cURL commands from chrome to your terminal ?
FYI for anyone who does this, mitmproxy is usually a better option for this type of stuff. Not sure about Chrome, but especially with Firefox, you have no way of getting the full raw request on anything with a request body like POST. You have to Copy Request Headers, then Copy POST Data. With mitmproxy or similar you can just get the full request at once. You can also inject headers like X-Forwarded-For into all or specific requests.
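Header injection is a tiny addon script, for example (the host filter and IP here are made up):

    # inject.py - run with: mitmdump -s inject.py
    from mitmproxy import http

    def request(flow: http.HTTPFlow) -> None:
        # Inject into every request, or gate it on a specific host (assumption).
        if flow.request.pretty_host.endswith("example.com"):
            flow.request.headers["X-Forwarded-For"] = "203.0.113.7"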
Repo has a single open pull request where someone (by their own admission) just fed the code into ChatGPT then pasted the results into a pull request without actually testing it.
Honestly thought it was a troll because of April Fools or something, but the guy actually has like 10% useful suggestions in the PR.
Will probably close that one tho.
Woah! I didn't realise it was 1st April when I raised that PR.
Sorry if it isn't of much use (as can be expected of straight GPT responses).
I didn't mean to come out as a troll.
I agree; I should've at least tested the change before raising a PR, but otherwise I would've just dropped the idea, so it felt best to share my findings at least.
I've had kind of the same problem in the past. I built a Chrome extension that generates a cookie-jar text file, because it turns out most relevant tracking or session cookies are on external domains or OAuth provider domains. [1]
You just need to copy/paste the generated text content into a cookies.txt and you're set, so it worked for my workflow in the terminal.
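That Netscape-format cookies.txt also plugs straight into Python's standard library if you want to go beyond curl; a sketch, with placeholder file name and URL:

    import requests
    from http.cookiejar import MozillaCookieJar

    cj = MozillaCookieJar("cookies.txt")   # the file the extension generated
    cj.load(ignore_discard=True, ignore_expires=True)

    # requests accepts any CookieJar, so the jar can be reused across calls.
    resp = requests.get("https://example.com/private", cookies=cj)
    print(resp.status_code)

curl -b cookies.txt reads the same format, so one file covers both workflows.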
NB. You cannot enable remote debugging if using Chrome in Guest mode on ChromeOS. Why is left as a question for the reader.
A more universal solution, one that does not require enabling websockets, is a localhost-bound forward proxy where HTTP traffic, including cookies, is saved in log files. No need to copy/paste with a GUI or mouse. Can use standard UNIX utilities to work with log files.
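A minimal sketch of the idea with just the standard library; it handles plain-HTTP GETs only (HTTPS would need CONNECT tunnelling or a MITM proxy, which is omitted), and the port and log file name are arbitrary:

    # logging_proxy.py - rough sketch of a localhost-bound forward proxy that
    # logs request lines and Cookie headers to a file.
    import http.server
    import urllib.request

    LOG_PATH = "proxy.log"  # arbitrary log file name

    class LoggingProxy(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            # For a forward proxy, clients send the absolute URL in the request line.
            with open(LOG_PATH, "a") as log:
                log.write("GET " + self.path + "\n")
                cookie = self.headers.get("Cookie")
                if cookie:
                    log.write("Cookie: " + cookie + "\n")
            # Relay the request upstream and pass the body back to the client.
            try:
                with urllib.request.urlopen(self.path) as upstream:
                    body = upstream.read()
                    status = upstream.status
                self.send_response(status)
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)
            except Exception:
                self.send_error(502)

    if __name__ == "__main__":
        # Bound to localhost only, as described above.
        http.server.HTTPServer(("127.0.0.1", 8080), LoggingProxy).serve_forever()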
I used to use an extension called "Get Cookies.txt" to do this, then several more by similar names with the same codebase. I looked at the code myself and didn't see anything suspicious, but they each got removed from the Chrome Web Store for security violations.
edit: JavaScript not Java