Commit b4a780cf authored by benoit-cty's avatar benoit-cty

Installation doc

parent 80de9a0f
Branches master
# Tweet Archiveur
> This project aims at storing the tweets of all members of the French Parliament.
> This project aims at storing tweets in a database, but you can also use it without a database.
The goal is to apply NLP to tweets to get an idea of the topics they cover.
- Input: Twitter user IDs in a CSV file
- Output: a database of tweets and hashtags
## Install
Our own goal is to store the tweets of all members of the French Parliament to get an idea of the trending topics.
But you can use the project for other purposes, with other accounts.
## How to install the package
TODO: push it to PyPI when the following is done:
- Rename "nom" to "name" in users
......@@ -16,7 +21,7 @@ TODO : push it to Pipy when :
`pip install tweetarchiveur`
## How to use
## How to use the package in your project
There are two classes:
- A Scrapper() to use the Twitter API
......@@ -51,6 +56,49 @@ del scrapper
2021-03-22 10:22:03,915 - tweet-archiveur INFO Done scrapping, we got 400 tweets from 2 tweetos.
## How we use it
We fetch the tweets of the 577 members of the French Parliament every 8 hours and store them in a PostgreSQL database.
We then explore them with Apache Superset.
### How we deploy it
Prepare the environment :
```sh
git clone https://github.com/leximpact/tweet-archiveur.git
cd tweet-archiveur
cp docker/docker.env .env
```
Edit the _.env_ file to suit your needs.
Run the application:
```sh
docker-compose up -d
```
To view what's going on :
```sh
docker logs tweet-archiveur_tweet_archiveur_1 -f
```
The script _archiveur.py_ uses the package to get the parliament accounts from https://github.com/regardscitoyens/twitter-parlementaires
Its parameters are read from a _.env_ file.
It is launched by the _entrypoint.sh_ script every 8 hours.
To stop it :
```sh
docker-compose down
```
The data is kept in a Docker volume. To remove it as well:
```sh
docker-compose down -v
```
## What to do with it ?
- Most used hashtag (per period, per person)
......
......@@ -6,8 +6,8 @@ title: Tweet Archiveur
keywords: fastai
sidebar: home_sidebar
summary: "This project aims at storing the tweets of all members of the French Parliament."
description: "This project aims at storing the tweets of all members of the French Parliament."
summary: "This project aims at storing tweets in a database, but you can also use it without a database."
description: "This project aims at storing tweets in a database, but you can also use it without a database."
nb_path: "notebooks/index.ipynb"
---
<!--
......@@ -31,14 +31,19 @@ nb_path: "notebooks/index.ipynb"
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The goal is to apply NLP to tweets to get an idea of the topics they cover.</p>
<ul>
<li>Input: Twitter user IDs in a CSV file</li>
<li>Output: a database of tweets and hashtags</li>
</ul>
<p>Our own goal is to store the tweets of all members of the French Parliament to get an idea of the trending topics.</p>
<p>But you can use the project for other purposes, with other accounts.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="Install">Install<a class="anchor-link" href="#Install"> </a></h2><p>TODO: push it to PyPI when the following is done:</p>
<h2 id="How-to-install-the-package">How to install the package<a class="anchor-link" href="#How-to-install-the-package"> </a></h2><p>TODO: push it to PyPI when the following is done:</p>
<ul>
<li>Rename "nom" to name in users</li>
<li>reactivate unit tests (<a href="https://docs.github.com/en/actions/guides/creating-postgresql-service-containers">https://docs.github.com/en/actions/guides/creating-postgresql-service-containers</a>)</li>
......@@ -60,7 +65,7 @@ nb_path: "notebooks/index.ipynb"
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="How-to-use">How to use<a class="anchor-link" href="#How-to-use"> </a></h2>
<h2 id="How-to-use-the-package-in-your-project">How to use the package in your project<a class="anchor-link" href="#How-to-use-the-package-in-your-project"> </a></h2>
</div>
</div>
</div>
......@@ -82,11 +87,11 @@ nb_path: "notebooks/index.ipynb"
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="kn">from</span> <span class="nn">tweet_archiveur.scrapper</span> <span class="kn">import</span> <span class="n">Scrapper</span>
<span class="kn">from</span> <span class="nn">tweet_archiveur.database</span> <span class="kn">import</span> <span class="n">Database</span>
<div class=" highlight hl-ipython3"><pre><span></span><span class="kn">from</span> <span class="nn">tweet_archiveur.scrapper</span> <span class="k">import</span> <span class="n">Scrapper</span>
<span class="kn">from</span> <span class="nn">tweet_archiveur.database</span> <span class="k">import</span> <span class="n">Database</span>
<span class="c1"># Force some variable outside Docker</span>
<span class="kn">from</span> <span class="nn">os</span> <span class="kn">import</span> <span class="n">environ</span>
<span class="kn">from</span> <span class="nn">os</span> <span class="k">import</span> <span class="n">environ</span>
<span class="n">environ</span><span class="p">[</span><span class="s2">&quot;DATABASE_PORT&quot;</span><span class="p">]</span> <span class="o">=</span> <span class="s1">&#39;8479&#39;</span>
<span class="n">environ</span><span class="p">[</span><span class="s2">&quot;DATABASE_HOST&quot;</span><span class="p">]</span> <span class="o">=</span> <span class="s1">&#39;localhost&#39;</span>
<span class="n">environ</span><span class="p">[</span><span class="s2">&quot;DATABASE_USER&quot;</span><span class="p">]</span> <span class="o">=</span> <span class="s1">&#39;tweet_archiveur_user&#39;</span>
......@@ -128,6 +133,35 @@ nb_path: "notebooks/index.ipynb"
</div>
{% endraw %}
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="How-we-use-it">How we use it<a class="anchor-link" href="#How-we-use-it"> </a></h2><p>We fetch the tweets of the 577 members of the French Parliament every 8 hours and store them in a PostgreSQL database.</p>
<p>We then explore them with Apache Superset.</p>
<h3 id="How-we-deploy-it">How we deploy it<a class="anchor-link" href="#How-we-deploy-it"> </a></h3><p>Prepare the environment :</p>
<div class="highlight"><pre><span></span>git clone https://github.com/leximpact/tweet-archiveur.git
<span class="nb">cd</span> tweet-archiveur
cp docker/docker.env .env
</pre></div>
<p>Edit the <em>.env</em> file to suit your needs.</p>
<p>Run the application:</p>
<div class="highlight"><pre><span></span>docker-compose up -d
</pre></div>
<p>To view what's going on :</p>
<div class="highlight"><pre><span></span>docker logs tweet-archiveur_tweet_archiveur_1 -f
</pre></div>
<p>The script <em>archiveur.py</em> uses the package to get the parliament accounts from <a href="https://github.com/regardscitoyens/twitter-parlementaires">https://github.com/regardscitoyens/twitter-parlementaires</a></p>
<p>Its parameters are read from a <em>.env</em> file.</p>
<p>It is launched by the <em>entrypoint.sh</em> script every 8 hours.</p>
<p>To stop it :</p>
<div class="highlight"><pre><span></span>docker-compose down
</pre></div>
<p>The data is kept in a Docker volume. To remove it as well:</p>
<div class="highlight"><pre><span></span>docker-compose down -v
</pre></div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="What-to-do-with-it-?">What to do with it ?<a class="anchor-link" href="#What-to-do-with-it-?"> </a></h2>
......
%% Cell type:code id: tags:
``` python
#hide
```
%% Cell type:markdown id: tags:
# Tweet Archiveur
> This project aims at storing the tweets of all members of the French Parliament.
> This project aims at storing tweets in a database, but you can also use it without a database.
%% Cell type:markdown id: tags:
The goal is to apply NLP to tweets to get an idea of the topics they cover.
- Input: Twitter user IDs in a CSV file
- Output: a database of tweets and hashtags
Our own goal is to store the tweets of all members of the French Parliament to get an idea of the trending topics.
But you can use the project for other purposes, with other accounts.
%% Cell type:markdown id: tags:
## Install
## How to install the package
TODO: push it to PyPI when the following is done:
- Rename "nom" to "name" in users
- Reactivate unit tests (https://docs.github.com/en/actions/guides/creating-postgresql-service-containers)
- Make the scrapper a class
- Switch to SQLAlchemy
- Flake8
- Documentation
%% Cell type:markdown id: tags:
`pip install tweetarchiveur`
%% Cell type:markdown id: tags:
## How to use
## How to use the package in your project
%% Cell type:markdown id: tags:
There are two classes:
- A Scrapper() to use the Twitter API
- A Database() to store tweets and hashtags
%% Cell type:code id: tags:
``` python
from tweet_archiveur.scrapper import Scrapper
from tweet_archiveur.database import Database
# Force some variables when running outside Docker
from os import environ
environ["DATABASE_PORT"] = '8479'
environ["DATABASE_HOST"] = 'localhost'
environ["DATABASE_USER"] = 'tweet_archiveur_user'
environ["DATABASE_PASS"] = '1234leximpact'
environ["DATABASE_NAME"] = 'tweet_archiveur'
# Load the accounts to follow from the sample CSV file
scrapper = Scrapper()
df_users = scrapper.get_users_accounts('../tests/sample-users.csv')
users_id = df_users.twitter_id.tolist()
# Create the tables if needed, then store the accounts
database = Database()
database.create_tables_if_not_exist()
database.insert_twitter_users(df_users)
# Fetch and store the tweets of the first two accounts only
scrapper.get_all_tweet_and_store_them(database, users_id[0:2])
del database
del scrapper
```
%% Output
2021-03-22 10:21:59,837 - tweet-archiveur INFO Scrapper ready
2021-03-22 10:21:59,841 - tweet-archiveur INFO Loading database module...
2021-03-22 10:21:59,842 - tweet-archiveur DEBUG DEBUG : connect(user=tweet_archiveur_user, password=XXXX, host=localhost, port=8479, database=tweet_archiveur, url=None)
2021-03-22 10:22:03,915 - tweet-archiveur INFO Done scrapping, we got 400 tweets from 2 tweetos.
%% Cell type:markdown id: tags:
## How we use it
We fetch the tweets of the 577 members of the French Parliament every 8 hours and store them in a PostgreSQL database.
We then explore them with Apache Superset.
### How we deploy it
Prepare the environment :
```sh
git clone https://github.com/leximpact/tweet-archiveur.git
cd tweet-archiveur
cp docker/docker.env .env
```
Edit the _.env_ file to suit your needs.
Run the application:
```sh
docker-compose up -d
```
To view what's going on :
```sh
docker logs tweet-archiveur_tweet_archiveur_1 -f
```
The script _archiveur.py_ uses the package to get the parliament accounts from https://github.com/regardscitoyens/twitter-parlementaires
Its parameters are read from a _.env_ file.
It is launched by the _entrypoint.sh_ script every 8 hours.
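
As an illustration of how such parameters can be consumed, here is a minimal sketch. It assumes the values from _.env_ end up as environment variables inside the container, which the usage example suggests by setting them through `os.environ`. The variable names come from that example; the helper `load_database_settings` and the choice to treat all five variables as required are hypothetical and not part of the package.

```python
import os

# Variable names taken from the usage example above; treating all of them
# as required is an assumption made for this sketch.
REQUIRED_VARS = (
    "DATABASE_HOST",
    "DATABASE_PORT",
    "DATABASE_USER",
    "DATABASE_PASS",
    "DATABASE_NAME",
)


def load_database_settings(env=None):
    """Return the database settings found in `env` (defaults to os.environ).

    Hypothetical helper: raises KeyError naming every missing variable.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if name not in env]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED_VARS}
    # The port arrives as text; convert it once so callers get an int.
    settings["DATABASE_PORT"] = int(settings["DATABASE_PORT"])
    return settings
```

Checking every variable up front gives one clear error message instead of a connection failure later in the run.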
To stop it :
```sh
docker-compose down
```
The data is kept in a Docker volume. To remove it as well:
```sh
docker-compose down -v
```
%% Cell type:markdown id: tags:
## What to do with it ?
%% Cell type:markdown id: tags:
- Most used hashtags (per period, per person)
- Most/least active users
- Timeline of
- NLP Topic detection
- Word cloud
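
The first idea in the list can be sketched in a few lines of plain Python. This is only an illustration: the sample tweets are invented, the function `most_used_hashtags` is hypothetical, and since the project already stores hashtags in the database, a real analysis would more likely be an SQL query or a Superset chart.

```python
import re
from collections import Counter

# A hashtag is taken to be "#" followed by word characters (letters, digits, "_").
HASHTAG_RE = re.compile(r"#(\w+)")


def most_used_hashtags(texts, top=3):
    """Return the `top` most frequent hashtags, compared case-insensitively."""
    counts = Counter()
    for text in texts:
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(text))
    return counts.most_common(top)


# Invented sample data, for illustration only.
sample_tweets = [
    "Débat sur le #budget ce matin #PLF2022",
    "Le #Budget doit être voté",
    "Réunion de commission #climat",
]
print(most_used_hashtags(sample_tweets, top=1))  # [('budget', 2)]
```

Restricting `texts` to one author or one date range before calling the function gives the per-person and per-period variants.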
%% Cell type:markdown id: tags:
# Annexes
Exit codes:
- 1: Unknown error when storing tweets
- 2: Unknown error when getting tweets
- 3: Failed more than 3 consecutive times
- 4: No env
%% Cell type:markdown id: tags:
If any step fails, no tweet is saved.
Status code 429 ('Too Many Requests') is returned when you exceed the maximum number of requests allowed.
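
One common way to cope with rate limiting is to wait and retry, giving up after too many consecutive failures. The sketch below shows that pattern and is not the package's actual code: `RateLimited` and `fetch_with_retry` are hypothetical names, and the limit of 3 and the exit code 3 are borrowed from the list of exit codes above.

```python
import time


class RateLimited(Exception):
    """Hypothetical exception standing in for an HTTP 429 response."""


def fetch_with_retry(fetch, max_failures=3, wait_seconds=1.0, sleep=time.sleep):
    """Call `fetch()` until it succeeds, pausing after each RateLimited error.

    Raises SystemExit(3) once more than `max_failures` consecutive
    attempts have failed.
    """
    failures = 0
    while True:
        try:
            return fetch()
        except RateLimited:
            failures += 1
            if failures > max_failures:
                raise SystemExit(3)
            # Exponential backoff: wait twice as long after each failure.
            sleep(wait_seconds * 2 ** (failures - 1))
```

The `sleep` parameter exists only so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.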
%% Cell type:code id: tags:
``` python
```
......