Fetching Tweets in Laravel With Python’s Social Networking Services Scraper
How to fetch data from Twitter without paying for a Twitter developer account
If you landed here looking for a guide on scraping tweets with Python, please check out Martin Beck’s guide below.
Introduction
I recently experimented with Laravel Actions and built basic commands to scrape posts from Twitter, Reddit, and a static website. You can find the open-source repository at the bottom.
Twitter announced that it would stop providing free access to its API from February 9th, 2023. See “Twitter remove free API access in the latest money-making quest”.
I applied for a developer account and was denied. The basic tier costs $100 per month, which I think is ridiculous. Thanks, Elon.
I couldn’t find a PHP library that lets you scrape data without an API key. If you know of one, please leave a comment, or create one.
Why Python?
Python has an elegant library for scraping social networking services (SNS), called snscrape. It was initially created by JustAnotherArchivist, and allows you to scrape data from the most popular social media sites.
The following services are currently available:
- Facebook: user profiles, groups, and communities (aka visitor posts)
- Instagram: user profiles, hashtags, and locations
- Mastodon: user profiles and toots (single or thread)
- Reddit: users, subreddits, and searches (via Pushshift)
- Telegram: channels
- Twitter: users, user profiles, hashtags, searches (live tweets, top tweets, and users), tweets (single or surrounding thread), list posts, communities, and trends
- VKontakte: user profiles
- Weibo (Sina Weibo): user profiles
Integrating Python with Laravel
I decided on a simple approach since this was just an experiment, but it should be sufficient for most use cases.
Overview of the flow
- Install the dependencies on the server
- Trigger a shell command via PHP to execute snscrape
- Save the results to a static file
- Parse the results to make sense of them
Install the dependencies
I recommend using Laravel Sail for this, but you should be able to configure it on any server environment. You only need Python 3 and the snscrape package.
Add the following instructions to the standard Dockerfile provided by Laravel Sail.
RUN apt-get update \
    && apt-get install -y python3-pip \
    && pip3 install snscrape
The shell command
This might make most developers cringe, as ideally, you don’t want to allow anything running on your server to execute raw shell commands.
But sometimes, you need to compromise. Consider this approach carefully, especially when implementing it in a production environment.
I created a simple static utility class for this, inspired by Bertug Korucu’s post about executing shell commands in Laravel.
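namespace App\Utilities;

use Exception;
use Symfony\Component\Process\Process;

class ShellCommand
{
    /**
     * Run a raw shell command and return everything it printed.
     */
    public static function execute(string $cmd): string
    {
        $process = Process::fromShellCommandline($cmd);

        // Collect stdout and stderr as the process emits them.
        $processOutput = '';
        $captureOutput = function ($type, $line) use (&$processOutput) {
            $processOutput .= $line;
        };

        // A null timeout disables Symfony's default 60-second limit.
        $process->setTimeout(null)->run($captureOutput);

        // Any non-zero exit code is treated as a failure.
        if ($process->getExitCode()) {
            $exception = new Exception($cmd.' - '.$processOutput);
            report($exception);
            throw $exception;
        }

        return $processOutput;
    }
}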
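Because the utility hands a raw string to the shell, any value that is not hard-coded should be escaped before it is interpolated. Here is a minimal sketch of that idea in plain PHP. The helper name `buildSnscrapeCommand()` and its parameters are hypothetical and not part of the repository; only `escapeshellarg()` and `sprintf()` are standard PHP functions.

```php
<?php

declare(strict_types=1);

/**
 * Build an snscrape command line with every dynamic value shell-escaped.
 * Hypothetical helper: the name and signature are illustrative only.
 */
function buildSnscrapeCommand(string $query, int $maxResults, string $outputPath): string
{
    return sprintf(
        'snscrape --jsonl --max-results %d twitter-search %s >> %s',
        $maxResults,                 // %d always renders an integer
        escapeshellarg($query),      // passed to the shell as one quoted argument
        escapeshellarg($outputPath)  // same for the file the results are appended to
    );
}

echo buildSnscrapeCommand('#php #laravel since:2023-04-01', 100, '/tmp/tweets.jsonl'), PHP_EOL;
```

On a Unix shell, `escapeshellarg()` wraps the value in single quotes, so characters such as `;`, `|`, or `$()` inside a search term are treated as text rather than as shell syntax. The resulting string can be passed to `ShellCommand::execute()` unchanged.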
Fetching the tweets
From this point, it was simple to relay the command via the shell to fetch the relevant tweets and save them to a static file to be processed.
I started by experimenting with calling the command directly.
~/path/to/project > ./vendor/bin/sail shell
sail@guid:/var/www/html$ snscrape \
    --jsonl \
    --progress \
    --max-results 1 \
    twitter-search "#php #laravel since:2023-04-01" \
    | python3 -m json.tool
Integrating the command with Laravel
For some reason, Twitter’s developers decided not to allow ordering results with advanced search. This forced me to fetch the results for the current month in recursive passes, lowering the minimum number of likes on each pass, so that they end up grouped from most popular to least popular.
The results of each run are appended in sequence to a static log file in storage before being processed.
Note: I delayed each request by 60 seconds to prevent rate limiting.
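To make the stepping easier to follow, here is a small standalone sketch that lists the thresholds in the order they are requested. It assumes the same rules as the `fetchTweets()` method shown next: start at 200 likes, step down by 100 while above 100, then by 10. The function name `minFavesSchedule()` is hypothetical and only exists for illustration.

```php
<?php

declare(strict_types=1);

/**
 * List the min_faves thresholds in the order they are requested:
 * steps of 100 above 100 likes, steps of 10 from there down.
 *
 * @return int[]
 */
function minFavesSchedule(int $start = 200): array
{
    $schedule = [];
    $minFaves = $start;

    while (true) {
        $schedule[] = $minFaves;

        // Mirrors the recursion guard: stop once the threshold is 1 or lower.
        if ($minFaves <= 1) {
            break;
        }

        $minFaves -= $minFaves > 100 ? 100 : 10;
    }

    return $schedule;
}

echo implode(', ', minFavesSchedule()), PHP_EOL;
// prints: 200, 100, 90, 80, 70, 60, 50, 40, 30, 20, 10, 0
```

With the default starting point that is 12 requests, so the 60-second delay between them adds roughly 11 minutes to a full run.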
namespace App\Actions\Scrapers;

use App\Utilities\ShellCommand;
use Illuminate\Support\Collection;
use Illuminate\Support\Facades\Storage;
// ...

class FetchTwitterPosts
{
    // ...

    private function fetchTweets(int $minFaves = 200): Collection
    {
        $this->command->info(sprintf(
            'Fetching tweets with minimum likes of %d',
            $minFaves
        ));

        $filepathLogAbs = Storage::path($this->filepathLog);

        // Search the current month, filtered by a minimum number of likes.
        $commandArgs = [
            'snscrape',
            '--jsonl',
            '--progress',
            '--max-results 100',
            sprintf(
                'twitter-search "#php #laravel since:%s until:%s min_faves:%d"',
                now()->startOfMonth()->format('Y-m-d'),
                now()->endOfMonth()->format('Y-m-d'),
                $minFaves
            ),
        ];

        // Append the JSONL output of this run to the log file.
        $command = sprintf(
            '%s >> %s',
            implode(' ', $commandArgs),
            $filepathLogAbs
        );

        $this->command->info(sprintf('Executing: %s', $command));

        ShellCommand::execute($command);

        // Lower the threshold and fetch again until it drops to 1 or below.
        if ($minFaves > 1) {
            sleep(60);

            if ($minFaves > 100) {
                $minFaves -= 100;
            } else {
                $minFaves -= 10;
            }

            return $this->fetchTweets($minFaves);
        }

        // Last pass: decode every non-empty JSONL line in the log file.
        $tweets = collect();
        $lines = Storage::get($this->filepathLog);

        foreach (explode("\n", $lines) as $line) {
            if (empty($line)) {
                continue;
            }

            $tweets->push(json_decode($line, true));
        }

        return $tweets;
    }

    // ...
}
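One thing to keep in mind when processing the log: a search with `min_faves:100` should also match tweets that were already returned for `min_faves:200`, so the same tweet can appear in the file more than once unless duplicates are removed somewhere in the elided parts of the class. Below is a minimal sketch of parsing the JSONL and keeping one entry per tweet in plain PHP. It assumes every JSON object has an `id` field, as snscrape's tweet output does at the time of writing; the function name `parseUniqueTweets()` and the sample data are made up for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Decode JSONL text and keep one entry per tweet id.
 * Assumes each JSON object carries an "id" field; lines that are blank,
 * not valid JSON, or missing an id are skipped.
 *
 * @return array<int, array<string, mixed>>
 */
function parseUniqueTweets(string $jsonl): array
{
    $tweets = [];

    foreach (preg_split('/\R/', $jsonl) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }

        $tweet = json_decode($line, true);
        if (! is_array($tweet) || ! isset($tweet['id'])) {
            continue;
        }

        // Keyed by id: a tweet seen in an earlier pass is not overwritten.
        $tweets[$tweet['id']] ??= $tweet;
    }

    return array_values($tweets);
}

// Made-up sample: the first tweet shows up in two passes.
$sample = <<<'JSONL'
{"id": 1, "likeCount": 250}
{"id": 2, "likeCount": 120}
{"id": 1, "likeCount": 250}
JSONL;

echo count(parseUniqueTweets($sample)), PHP_EOL; // prints 2
```

Inside the action, the returned array can be wrapped with `collect()` to get the same `Collection` type that `fetchTweets()` returns.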
Check out and feel free to contribute to the repository below.