Did you bulk download the arXiv metadata, PDF, and/or LaTeX files?
I am trying to figure out how much space is required for just the most recent version of each PDF.
I can find mentions of the total size of their S3 bucket, but it is unclear whether that also includes older versions of the PDFs.
I also wonder whether the Kaggle dataset is kept up to date, since it lists only 1.7M articles instead of the 2.4M I read about elsewhere.
Edit: I just found the answers to my question here: https://info.arxiv.org/help/bulk_data_s3.html