Practical Guide: How to Build a Web Scraper with Laravel Actions

In this walkthrough, we’ll build a simple web scraper from scratch

Hendrik Prinsloo
6 min read · Mar 26, 2023
Generated by Mage with Stable Diffusion 2.1

It is possible to scrape and parse almost any kind of data that is available online. This sample extracts data from JSON, which is probably the simplest format to work with. Packages exist to help parse other formats such as HTML, XML, PDF, CSV, Excel, and even images with the help of AI.

In the Netherlands, there is a website (https://www.verlorenofgevonden.nl) that tracks lost and impounded bicycles. In this project, we will fetch the relevant data from this site so that we can run some analytics on it.

Screenshot of the website

Getting started

If you are familiar with this part, you can skip it. We’ll just follow the official documentation to create a Laravel project with Docker. I’m doing this on macOS, so please refer to the documentation, as the instructions might differ for Linux or Windows.

Create the project

Navigate to your project directory and create a new project. It might take a while as it will pull and build the containers locally.

curl -s "https://laravel.build/laravel-scraper" | bash

Start the environment

Navigate to the new directory and start the environment. It is also advisable to create a shell alias for Sail; see the docs.

cd laravel-scraper && sail up -d --build

# If you get a response like "command not found", create the alias as
# advised in the docs, or call the script directly: ./vendor/bin/sail up -d --build

Once the application’s Docker containers have been started, you can access the application in your web browser at http://localhost. You should see the default landing page.

Default landing page

Dependencies

We’ll use a few basic dependencies to help us get started.

Laravel Actions

For this project, I decided to experiment with Laravel Actions. It is not required but seems like the right tool for the job.

sail composer require lorisleiva/laravel-actions

Guzzle HTTP client

Laravel provides an expressive, minimal API around the Guzzle HTTP client, allowing you to quickly make outgoing HTTP requests to communicate with other web applications.

sail composer require guzzlehttp/guzzle

Laravel Pint (optional)

Laravel Pint is an opinionated PHP code-style fixer for minimalists. Pint is built on top of PHP-CS-Fixer and makes it simple to ensure that your code style stays clean and consistent.

sail composer require laravel/pint --dev

The core logic

Fetching data from a website is quite simple. You can follow some basic steps to apply the approach to your use case.

  1. Browse the site manually.
  2. Inspect the network log to identify how the website fetches the data. Ideally, you will find some API requests to target directly.
  3. Replicate the requests manually with a tool like Postman to figure out the flow, parameters, and shape of the data.
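
Once the request has been replicated manually, it helps to write its parameters down in code. The following is a minimal sketch in plain PHP that rebuilds the search URL from the endpoint and query parameters used by the scraper later in this guide. The `buildSearchUrl` helper is a hypothetical name introduced for illustration, and the parameter names are simply what was observed on the site at the time of writing, so they may change.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: rebuilds the search URL from the query parameters
// observed in the browser's network log. The endpoint is not a documented
// API, so treat every parameter here as an observation, not a contract.
function buildSearchUrl(string $dateFrom, string $dateTo, int $from = 0): string
{
    $query = http_build_query([
        'site' => 'nl',
        'q' => 'fietsendepot',
        'date_from' => $dateFrom, // formatted as d-m-Y
        'date_to' => $dateTo,     // formatted as d-m-Y
        'from' => $from,          // pagination offset
    ]);

    return 'https://verlorenofgevonden.nl/scripts/ez.php?' . $query;
}

echo buildSearchUrl('01-03-2023', '26-03-2023'), PHP_EOL;
```

Pasting the printed URL into Postman or a browser is a quick way to confirm the shape of the response before writing any parsing code.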

The schema

For the sake of this guide, we’ll keep the schema simple by just creating a single Bicycle model for the data. Generate the model and migration in your terminal with the following command.

sail artisan make:model Bicycle --migration

File: app/Models/Bicycle.php

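<?php

namespace App\Models;

use Illuminate\Database\Eloquent\Model;

class Bicycle extends Model
{
    // Attributes that may be mass-assigned.
    protected $fillable = [
        'object_number',
        'type',
        'sub_type',
        'brand',
        'color',
        'description',
        'city',
        'storage_location',
        'registered_at',
    ];

    // Cast the registration date to a date-time instance.
    protected $casts = [
        'registered_at' => 'datetime',
    ];
}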

File: database/migrations/create_bicycles_table.php

<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        Schema::create('bicycles', function (Blueprint $table) {
            $table->id();
            // The unique object number is the key used when upserting.
            $table->string('object_number', 32)->unique();
            $table->string('type', 32);
            $table->string('sub_type', 32);
            $table->string('brand', 32);
            $table->string('color', 32);
            $table->text('description');
            $table->string('city', 32);
            $table->string('storage_location', 64);
            $table->dateTime('registered_at');
            $table->timestamps();
        });
    }

    public function down(): void
    {
        Schema::dropIfExists('bicycles');
    }
};

Run the migrations

sail artisan migrate

The scraper

We’ll create a single class with Laravel Actions to keep the required logic contained. The sample command will focus on fetching the most basic data. You can always extend this functionality and go as deep as you need to.

For example, you could extract the unique “frame number” from the description to check how many times a single bicycle has been impounded.
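
As a rough illustration of that idea, here is a minimal sketch in plain PHP. The `extractFrameNumber` helper is hypothetical, and the pattern rests on assumptions: that the description labels the number with the Dutch word "Framenummer" and that the number contains at least one digit. The sample descriptions are made up, so check real data before relying on this.

```php
<?php

declare(strict_types=1);

// Hypothetical helper. The "Framenummer" label and the allowed characters
// are assumptions; inspect real descriptions and adjust the pattern.
function extractFrameNumber(string $description): ?string
{
    // The lookahead requires at least one digit, so a phrase such as
    // "Framenummer onbekend" is not mistaken for a frame number.
    $pattern = '/framenummer[:\s]+((?=[A-Z0-9-]*\d)[A-Z0-9-]+)/i';

    if (preg_match($pattern, $description, $matches) === 1) {
        return strtoupper($matches[1]);
    }

    return null;
}

// Made-up descriptions, for illustration only.
var_dump(extractFrameNumber('Zwarte herenfiets. Framenummer: wbk1234567'));
var_dump(extractFrameNumber('Framenummer onbekend'));
```

Storing the extracted value in its own column would make it possible to count repeat impounds with a simple grouped query.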

File: app/Actions/Scrapers/FetchBicycles.php

<?php

namespace App\Actions\Scrapers;

use App\Models\Bicycle;
use Illuminate\Console\Command;
use Illuminate\Support\Carbon;
use Illuminate\Support\Facades\Http;
use Lorisleiva\Actions\Concerns\AsAction;

class FetchBicycles
{
    use AsAction;

    public string $commandSignature = 'scraper:fetch-bicycles';

    public string $commandDescription = 'Fetch impounded bicycles from www.verlorenofgevonden.nl';

    public function asCommand(Command $command): void
    {
        $this->handle($command);
    }

    public function handle(Command $command): void
    {
        // Fetch everything registered during the past month.
        $this->fetch($command, now()->subMonth(), now());
    }

    private function fetch(
        Command $command,
        Carbon $dateFrom,
        Carbon $dateTo,
        int $from = 0
    ): void {
        // Request one page of results, starting at offset $from.
        $response = Http::acceptJson()->get('https://verlorenofgevonden.nl/scripts/ez.php', [
            'site' => 'nl',
            'q' => 'fietsendepot',
            'date_from' => $dateFrom->format('d-m-Y'),
            'date_to' => $dateTo->format('d-m-Y'),
            'timestamp' => now()->timestamp,
            'from' => $from,
        ]);

        // An empty page means there is nothing left to process.
        $hits = collect($response->json('hits.hits'));
        if ($hits->isEmpty()) {
            $command->info('Done processing');

            return;
        }

        // Map each hit to a row matching the bicycles table.
        $upserts = collect();
        foreach ($hits as $hit) {
            $registeredAt = Carbon::parse(data_get($hit, '_source.RegistrationDate'));
            $upserts->push([
                'object_number' => data_get($hit, '_source.ObjectNumber'),
                'type' => data_get($hit, '_source.Category'),
                'sub_type' => data_get($hit, '_source.SubCategory'),
                'brand' => data_get($hit, '_source.Brand'),
                'color' => data_get($hit, '_source.Color'),
                'description' => data_get($hit, '_source.Description'),
                'city' => data_get($hit, '_source.City'),
                'storage_location' => data_get($hit, '_source.StorageLocation.Name'),
                'registered_at' => $registeredAt,
            ]);
        }

        // Insert new rows and update existing ones, matched on object_number.
        Bicycle::upsert($upserts->toArray(), ['object_number'], [
            'type',
            'sub_type',
            'brand',
            'color',
            'description',
            'city',
            'storage_location',
            'registered_at',
        ]);

        // Recurse to fetch the next page, using the running total as the offset.
        $total = $from + $upserts->count();
        $command->info(sprintf('Processed %d results', $total));
        $this->fetch($command, $dateFrom, $dateTo, $total);
    }
}
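
The `fetch` method above pages through the results by calling itself with a new offset, so the call stack grows by one frame per page. For larger date ranges, a loop may be easier to reason about. The following is a minimal, framework-free sketch of the same offset pagination. The `fetchAll` helper, its `$fetchPage` callable and the `$maxPages` safety cap are all hypothetical names introduced here, and the fake data source stands in for the real HTTP call.

```php
<?php

declare(strict_types=1);

/**
 * Walks an offset-paginated source until it returns an empty page.
 *
 * $fetchPage is a hypothetical callable: it receives the current offset
 * and returns the list of hits for that page, or an empty array when
 * there is nothing left.
 */
function fetchAll(callable $fetchPage, int $maxPages = 1000): Generator
{
    $offset = 0;

    // The page cap guards against a source that never returns an empty page.
    for ($page = 0; $page < $maxPages; $page++) {
        $hits = $fetchPage($offset);

        if ($hits === []) {
            return; // no more results
        }

        foreach ($hits as $hit) {
            yield $hit;
        }

        $offset += count($hits);
    }
}

// Fake data source for illustration: five records, served two at a time.
$records = ['a', 'b', 'c', 'd', 'e'];
$fakeFetch = fn (int $offset): array => array_slice($records, $offset, 2);

foreach (fetchAll($fakeFetch) as $hit) {
    echo $hit, PHP_EOL;
}
```

In the action, the callable would wrap the `Http` request and return the `hits.hits` array, while the upsert would run once per page inside the consuming loop.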

File: app/Console/Kernel.php

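<?php

namespace App\Console;

use App\Actions\Scrapers\FetchBicycles;
use Illuminate\Foundation\Console\Kernel as ConsoleKernel;

class Kernel extends ConsoleKernel
{
    // Register the action so it is available as an Artisan command.
    protected $commands = [
        FetchBicycles::class,
    ];

    // ...
}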

Run the command

sail artisan scraper:fetch-bicycles

Sample of running the command

Sample of the data

Deeper analysis of historical data

After my bicycle was impounded for the second time in a few weeks, I was inspired to check the stats by scraping data from the site. To be fair, I parked my bicycle illegally on both occasions.

The findings were quite interesting, and I generated the following charts from the scraped data. Note that these charts were generated on 2022-10-01.

Impounds by city per quarter since 2018
Amsterdam versus grand total since 2021
Impounds by city per month
Estimated penalty fees of impounds since 2018
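
Counts like those behind the first chart can be derived from the scraped rows with a simple grouping step. The sketch below is one possible approach in plain PHP and not necessarily how the charts above were produced. The `impoundsPerQuarter` helper is a hypothetical name, and the sample rows are made up for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Counts records per city and per quarter.
 *
 * @param array<int, array{city: string, registered_at: string}> $rows
 * @return array<string, array<string, int>>
 */
function impoundsPerQuarter(array $rows): array
{
    $counts = [];

    foreach ($rows as $row) {
        $date = new DateTimeImmutable($row['registered_at']);

        // Months 1-3 map to Q1, 4-6 to Q2, and so on.
        $quarter = intdiv((int) $date->format('n') - 1, 3) + 1;
        $key = sprintf('%s-Q%d', $date->format('Y'), $quarter);

        $counts[$row['city']][$key] = ($counts[$row['city']][$key] ?? 0) + 1;
    }

    return $counts;
}

// Made-up sample rows, for illustration only.
$rows = [
    ['city' => 'Amsterdam', 'registered_at' => '2022-01-15'],
    ['city' => 'Amsterdam', 'registered_at' => '2022-03-30'],
    ['city' => 'Utrecht', 'registered_at' => '2022-04-02'],
];

print_r(impoundsPerQuarter($rows));
```

With the data in a database, the same grouping could also be expressed as a single grouped query. The in-memory version is shown here only because it runs without any setup.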

Idea: Bicycle auction app

I believe there is a gap in the market in the Netherlands with impounded bicycles. The website helps the community to find their lost or impounded bicycles and allows you to have them delivered, for a fee that covers the penalty and the delivery.

Bicycle depot in Amsterdam (see the source)

The bicycle depots in the Netherlands are filling up with thousands of bicycles. Many people never claim their impounded bicycles, causing a constant build-up of unwanted and unclaimed bikes.

Someone should create a simple application to put these abandoned bicycles up for auction. Fetching the data is quite simple, as shown in this guide. To close the loop, allow the owners to set a minimum bid and/or a deadline. After a successful bid, the owner could post the keys to the buyer, and the standard delivery process could be used to deliver the bicycle to the new owner.
