End-to-end configuration
Sometimes it is necessary to integrate various stages of Querido Diário, especially when developing frontend solutions that require data that is not yet in production.
Since QD uses podman to manage its infrastructure, orchestrating the stages consists of adjusting environment variables and ensuring that the containers are communicating properly.
Attention
This documentation has been tested primarily on Linux. Windows (WSL) and macOS environments may behave differently (contributions are welcome!).
Starting the setup with data processing
Since the resources (database, search engine, etc.) used by QD locally will be shared by different stages, the data processing project is a good starting point for setup as it uses several of these resources in its processing tasks.
To set it up, we will follow these steps:
Install podman (if not already installed);
Run:
make build
Run (FULL_PROJECT=true allows additional ports to be opened by the pod that will be created):
FULL_PROJECT=true make setup
Tip
If this is your first time running the project, the environment variables will be
created in the envvars file, at the root of the repository. There you will find,
for example, the credentials for the object storage (Minio).
Warning
When running the make setup command, a bind: address already in use error
may appear. In this case, check if another service on your computer is using the
indicated port and stop it (or modify the environment variables and/or Makefile if
you know what you’re doing).
If the port is being used by podman itself due to an execution error, you can
terminate the program with the command (using port 8000 as an example)
sudo kill -9 $(sudo lsof -t -i:8000).
After terminating, run make setup again.
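To spot port conflicts before running make setup, a small check like the one below can help. It is only a sketch and not part of the QD tooling: the port list is taken from the addresses mentioned in this guide (8000 for the backend, 8080 for the API, 9000 for Minio, 9200 for Opensearch), and the helper names port_in_use and busy_ports are hypothetical. Your envvars file or Makefile may map other ports, so adjust the list accordingly.

```python
import socket

# Ports mentioned in this guide. Assumption: the pod may publish other
# ports too, depending on your envvars file and Makefile.
PORTS = [8000, 8080, 9000, 9200]


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when the connection succeeds, i.e. the port is taken.
        return sock.connect_ex((host, port)) == 0


def busy_ports(ports, host: str = "127.0.0.1"):
    """Return the subset of ports that are already in use on host."""
    return [port for port in ports if port_in_use(port, host)]


if __name__ == "__main__":
    busy = busy_ports(PORTS)
    if busy:
        print("Ports already in use:", ", ".join(str(p) for p in busy))
    else:
        print("All checked ports are free.")
```

If the script reports a busy port, stop the service that holds it (or change the port mapping) before running make setup again.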
Now the pod has been created, and within it, several resources like Opensearch, Postgres, and Minio are running. However, they are still “empty.” Let’s populate them using the spiders repository.
Generating data with spiders
We want to run spiders to populate our database with official gazettes for processing. To do this, we just need to configure the following in the spiders repository:
Copy data_collection/.local.env to data_collection/.env;
Set up the development environment and run spiders normally, as indicated in its documentation.
Processing the scraped documents
Now that we have the object storage and some Postgres database tables populated, let’s run the main data processing pipeline to generate TXT files and populate the search engine:
Run in the data processing repository:
make re-run
Note
Note that make re-run is executed here, and not make run, because
make run executes make setup. If make setup is run, all resources
are destroyed and rebuilt, discarding the data scraped in the previous step.
Now we have files, tables, and indices populated. We can enable the API.
Enabling the API
Run, in the API repository:
make build
Run:
make re-run
Note
For the same reason as in data processing, make re-run is being executed here,
and not make run.
With the API available, we can start the local backend.
Enabling the backend
To handle Querido Diário: Technologies in Education, the backend needs to be set up as follows:
Set up the development environment as indicated in the documentation;
Create a superuser account as requested.
With the backend available, the frontend that uses local API and backend can also be configured.
Enabling the frontend
Finally, we’ve reached the other end of the QD architecture, the frontend! Here we’ll do the following:
Set up the development environment as indicated in the documentation;
Apply this patch in the repository:
./src/app/constants.ts
- export const API = 'https://api.queridodiario.ok.org.br';
+ export const API = 'http://localhost:8080';
./src/app/services/utils/index.ts
- export const educationApi = 'https://backend-api.api.queridodiario.ok.org.br/';
+ export const educationApi = 'http://localhost:8000/api/';
Done! Now the entire environment is configured 🎉
Environment usage tips
Some useful ways to use the development environment:
Want to access the postgres database to see official gazette records, backend users, etc.?
Run make shell-database in the querido-diario-data-processing repository and you’ll be in the psql command line.
Want to access the search engine to see textual indices of gazettes and thematic excerpts?
Run:
curl -k -u admin:admin -X GET "localhost:9200/_cat/indices?v&pretty=true"
Other endpoints will work similarly according to the Opensearch documentation.
Want to access the file storage to see where the gazette files were downloaded?
Go to localhost:9000 with the credentials found in the envvars file of the querido-diario-data-processing repository.
Want to download more gazette files and process them?
Run another scrapy crawl in the querido-diario repository and then run make re-run in querido-diario-data-processing again.
In the frontend, live reload is enabled, but not in the API and backend. How to check changes?
In the API, run make re-run again. In the backend, run:
python -m cli setup --pod-name querido-diario
How to access the API documentation?
Go to 0.0.0.0:8080/docs.
Tip
If 0.0.0.0 doesn’t work, use localhost:8080/docs.
How to access the backend admin panel?
Go to 0.0.0.0:8000/api/admin with the superuser credentials created earlier.
Tip
If 0.0.0.0 doesn’t work, use localhost:8000/api/admin.
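For the search engine tip above, the same index listing can be requested from a script instead of curl. The sketch below uses only the Python standard library and assumes the values shown in the curl example (localhost, port 9200, user admin, password admin, plain HTTP, which is what curl uses when no scheme is given). The function names build_indices_request and list_indices are hypothetical and exist only for this illustration.

```python
import base64
import ssl
import urllib.request


def build_indices_request(host="localhost", port=9200,
                          user="admin", password="admin", scheme="http"):
    """Build a GET request for _cat/indices with HTTP basic authentication."""
    url = f"{scheme}://{host}:{port}/_cat/indices?v&pretty=true"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def list_indices(request, timeout=5.0):
    """Send the request and return the plain-text index table."""
    # Rough equivalent of curl -k: skip certificate verification when the
    # scheme is https. Only acceptable for a local development environment.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
        return response.read().decode("utf-8")
```

With the pod running, print(list_indices(build_indices_request())) should show the same table as the curl command. If your envvars file defines different credentials or ports, pass them as arguments.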