Installation doc

Commit b4a780cf authored 4 years ago by benoit-cty
--- a/README.md
+++ b/README.md
 # Tweet Archiveur
-> This project aim at storing the tweet of all members of the French Parliament.
+> This project aims to store tweets in a database, but you can also use it without one.


-The goal is to use tweets to get an idea of the topics of the tweets using NLP.
+- Input: Twitter user IDs in a CSV file
+- Output: a database of tweets and hashtags

-## Install
+Our goal is to store the tweets of all members of the French Parliament to get an idea of the trending topics.
+
+But you can use the project for other purposes and with other accounts.
+
+## How to install the package

 TODO: push it to PyPI when:
 - Rename "nom" to name in users
@@ -16,7 +21,7 @@ TODO : push it to Pipy when :

 `pip install tweetarchiveur`

-## How to use
+## How to use the package in your project

 There are two classes:
 - A Scrapper() to use the Twitter API
@@ -51,6 +56,49 @@ del scrapper
    2021-03-22 10:22:03,915 -  tweet-archiveur INFO     Done scrapping, we got 400 tweets from 2 tweetos.


+## How we use it
+
+Every 8 hours, we fetch the tweets of the 577 members of the French Parliament and store them in a PostgreSQL database.
+
+We then explore them with Apache Superset.
+
+### How we deploy it
+
+Prepare the environment:
+```sh
+git clone https://github.com/leximpact/tweet-archiveur.git
+cd tweet-archiveur
+cp docker/docker.env .env
+```
+
+Edit _.env_ to suit your needs.
+
+Run the application:
+```sh
+docker-compose up -d
+```
+
+To follow the logs:
+```sh
+docker logs tweet-archiveur_tweet_archiveur_1 -f
+```
+
+The _archiveur.py_ script uses the package to fetch the Parliament members' accounts from https://github.com/regardscitoyens/twitter-parlementaires
+
+The parameters are read from the _.env_ file.
+
+It is launched by the _entrypoint.sh_ script every 8 hours.
+
+To stop it:
+```sh
+docker-compose down
+```
+
+The data is kept in a Docker volume. To delete it along with the containers:
+```sh
+docker-compose down -v
+```
+
 ## What to do with it ?

 - Most used hashtag (per period, per person)

--- a/docs/index.html
+++ b/docs/index.html
@@ -6,8 +6,8 @@ title: Tweet Archiveur
 keywords: fastai
 sidebar: home_sidebar

-summary: "This project aim at storing the tweet of all members of the French Parliament."
-description: "This project aim at storing the tweet of all members of the French Parliament."
+summary: "This project aims to store tweets in a database, but you can also use it without one."
+description: "This project aims to store tweets in a database, but you can also use it without one."
 nb_path: "notebooks/index.ipynb"
 ---
 <!--
@@ -31,14 +31,19 @@ nb_path: "notebooks/index.ipynb"

 <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
 <div class="text_cell_render border-box-sizing rendered_html">
-<p>The goal is to use tweets to get an idea of the topics of the tweets using NLP.</p>
+<ul>
+<li>Input: Twitter user IDs in a CSV file</li>
+<li>Output: a database of tweets and hashtags</li>
+</ul>
+<p>Our goal is to store the tweets of all members of the French Parliament to get an idea of the trending topics.</p>
+<p>But you can use the project for other purposes and with other accounts.</p>

 </div>
 </div>
 </div>
 <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
 <div class="text_cell_render border-box-sizing rendered_html">
-<h2 id="Install">Install<a class="anchor-link" href="#Install"> </a></h2><p>TODO : push it to Pipy when :</p>
+<h2 id="How-to-install-the-package">How to install the package<a class="anchor-link" href="#How-to-install-the-package"> </a></h2><p>TODO: push it to PyPI when:</p>
 <ul>
 <li>Rename "nom" to name in users</li>
 <li>reactivate unit tests (<a href="https://docs.github.com/en/actions/guides/creating-postgresql-service-containers">https://docs.github.com/en/actions/guides/creating-postgresql-service-containers</a>)</li>
@@ -60,7 +65,7 @@ nb_path: "notebooks/index.ipynb"
 </div>
 <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
 <div class="text_cell_render border-box-sizing rendered_html">
-<h2 id="How-to-use">How to use<a class="anchor-link" href="#How-to-use"> </a></h2>
+<h2 id="How-to-use-the-package-in-your-project">How to use the package in your project<a class="anchor-link" href="#How-to-use-the-package-in-your-project"> </a></h2>
 </div>
 </div>
 </div>
@@ -82,11 +87,11 @@ nb_path: "notebooks/index.ipynb"

 <div class="inner_cell">
    <div class="input_area">
-<div class=" highlight hl-ipython3"><pre><span></span><span class="kn">from</span> <span class="nn">tweet_archiveur.scrapper</span> <span class="kn">import</span> <span class="n">Scrapper</span>
-<span class="kn">from</span> <span class="nn">tweet_archiveur.database</span> <span class="kn">import</span> <span class="n">Database</span>
+<div class=" highlight hl-ipython3"><pre><span></span><span class="kn">from</span> <span class="nn">tweet_archiveur.scrapper</span> <span class="k">import</span> <span class="n">Scrapper</span>
+<span class="kn">from</span> <span class="nn">tweet_archiveur.database</span> <span class="k">import</span> <span class="n">Database</span>

 <span class="c1"># Force some variable outside Docker</span>
-<span class="kn">from</span> <span class="nn">os</span> <span class="kn">import</span> <span class="n">environ</span>
+<span class="kn">from</span> <span class="nn">os</span> <span class="k">import</span> <span class="n">environ</span>
 <span class="n">environ</span><span class="p">[</span><span class="s2">&quot;DATABASE_PORT&quot;</span><span class="p">]</span> <span class="o">=</span> <span class="s1">&#39;8479&#39;</span>
 <span class="n">environ</span><span class="p">[</span><span class="s2">&quot;DATABASE_HOST&quot;</span><span class="p">]</span> <span class="o">=</span> <span class="s1">&#39;localhost&#39;</span>
 <span class="n">environ</span><span class="p">[</span><span class="s2">&quot;DATABASE_USER&quot;</span><span class="p">]</span> <span class="o">=</span> <span class="s1">&#39;tweet_archiveur_user&#39;</span>
@@ -128,6 +133,35 @@ nb_path: "notebooks/index.ipynb"
 </div>
    {% endraw %}

+<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
+<div class="text_cell_render border-box-sizing rendered_html">
+<h2 id="How-we-use-it">How we use it<a class="anchor-link" href="#How-we-use-it"> </a></h2><p>Every 8 hours, we fetch the tweets of the 577 members of the French Parliament and store them in a PostgreSQL database.</p>
+<p>We then explore them with Apache Superset.</p>
+<h3 id="How-we-deploy-it">How we deploy it<a class="anchor-link" href="#How-we-deploy-it"> </a></h3><p>Prepare the environment:</p>
+<div class="highlight"><pre><span></span>git clone https://github.com/leximpact/tweet-archiveur.git
+<span class="nb">cd</span> tweet-archiveur
+cp docker/docker.env .env
+</pre></div>
+<p>Edit <em>.env</em> to suit your needs.</p>
+<p>Run the application:</p>
+<div class="highlight"><pre><span></span>docker-compose up -d
+</pre></div>
+<p>To follow the logs:</p>
+<div class="highlight"><pre><span></span>docker logs tweet-archiveur_tweet_archiveur_1 -f
+</pre></div>
+<p>The <em>archiveur.py</em> script uses the package to fetch the Parliament members' accounts from <a href="https://github.com/regardscitoyens/twitter-parlementaires">https://github.com/regardscitoyens/twitter-parlementaires</a></p>
+<p>The parameters are read from the <em>.env</em> file.</p>
+<p>It is launched by the <em>entrypoint.sh</em> script every 8 hours.</p>
+<p>To stop it:</p>
+<div class="highlight"><pre><span></span>docker-compose down
+</pre></div>
+<p>The data is kept in a Docker volume. To delete it along with the containers:</p>
+<div class="highlight"><pre><span></span>docker-compose down -v
+</pre></div>
+
+</div>
+</div>
+</div>
 <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
 <div class="text_cell_render border-box-sizing rendered_html">
 <h2 id="What-to-do-with-it-?">What to do with it ?<a class="anchor-link" href="#What-to-do-with-it-?"> </a></h2>

--- a/notebooks/index.ipynb
+++ b/notebooks/index.ipynb
@@ -15,21 +15,26 @@
   "source": [
    "# Tweet Archiveur\n",
    "\n",
-    "> This project aim at storing the tweet of all members of the French Parliament."
+    "> This project aims to store tweets in a database, but you can also use it without one."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "The goal is to use tweets to get an idea of the topics of the tweets using NLP."
+    "- Input: Twitter user IDs in a CSV file\n",
+    "- Output: a database of tweets and hashtags\n",
+    "\n",
+    "Our goal is to store the tweets of all members of the French Parliament to get an idea of the trending topics.\n",
+    "\n",
+    "But you can use the project for other purposes and with other accounts."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "## Install\n",
+    "## How to install the package\n",
    "\n",
    "TODO: push it to PyPI when:\n",
    "- Rename \"nom\" to name in users\n",
@@ -51,7 +56,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "## How to use"
+    "## How to use the package in your project"
   ]
  },
  {
@@ -102,6 +107,54 @@
    "del scrapper"
   ]
  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## How we use it\n",
+    "\n",
+    "Every 8 hours, we fetch the tweets of the 577 members of the French Parliament and store them in a PostgreSQL database.\n",
+    "\n",
+    "We then explore them with Apache Superset.\n",
+    "\n",
+    "### How we deploy it\n",
+    "\n",
+    "Prepare the environment:\n",
+    "```sh\n",
+    "git clone https://github.com/leximpact/tweet-archiveur.git\n",
+    "cd tweet-archiveur\n",
+    "cp docker/docker.env .env\n",
+    "```\n",
+    "\n",
+    "Edit _.env_ to suit your needs.\n",
+    "\n",
+    "Run the application:\n",
+    "```sh\n",
+    "docker-compose up -d\n",
+    "```\n",
+    "\n",
+    "To follow the logs:\n",
+    "```sh\n",
+    "docker logs tweet-archiveur_tweet_archiveur_1 -f\n",
+    "```\n",
+    "\n",
+    "The _archiveur.py_ script uses the package to fetch the Parliament members' accounts from https://github.com/regardscitoyens/twitter-parlementaires\n",
+    "\n",
+    "The parameters are read from the _.env_ file.\n",
+    "\n",
+    "It is launched by the _entrypoint.sh_ script every 8 hours.\n",
+    "\n",
+    "To stop it:\n",
+    "```sh\n",
+    "docker-compose down\n",
+    "```\n",
+    "\n",
+    "The data is kept in a Docker volume. To delete it along with the containers:\n",
+    "```sh\n",
+    "docker-compose down -v\n",
+    "```"
+   ]
+  },
  {
   "cell_type": "markdown",
   "metadata": {},

 %% Cell type:code id: tags:

 ``` python
 #hide
 ```

 %% Cell type:markdown id: tags:

 # Tweet Archiveur

-> This project aim at storing the tweet of all members of the French Parliament.
+> This project aims to store tweets in a database, but you can also use it without one.

 %% Cell type:markdown id: tags:

-The goal is to use tweets to get an idea of the topics of the tweets using NLP.
+- Input: Twitter user IDs in a CSV file
+- Output: a database of tweets and hashtags
+
+Our goal is to store the tweets of all members of the French Parliament to get an idea of the trending topics.
+
+But you can use the project for other purposes and with other accounts.
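The input side can be illustrated with a minimal, standard-library sketch. It assumes the CSV has a header row with a `twitter_id` column, as suggested by `df_users.twitter_id` in the usage example below. The helper `read_user_ids` is hypothetical and not part of the package, which loads the file with `Scrapper.get_users_accounts()` and returns a DataFrame.

```python
import csv
import io
from typing import List, TextIO


def read_user_ids(source: TextIO) -> List[str]:
    """Return the non-empty values of the `twitter_id` column.

    Hypothetical helper: the package itself uses Scrapper.get_users_accounts().
    """
    reader = csv.DictReader(source)  # uses the header row as keys
    ids = []
    for row in reader:
        value = (row.get("twitter_id") or "").strip()
        if value:  # skip rows without an ID
            ids.append(value)
    return ids


# In-memory example; the names and IDs are made up.
sample = io.StringIO("name,twitter_id\nAlice,12345\nBob,67890\nCarol,\n")
print(read_user_ids(sample))  # ['12345', '67890']
```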

 %% Cell type:markdown id: tags:

-## Install
+## How to install the package

 TODO: push it to PyPI when:
 - Rename "nom" to name in users
 - reactivate unit tests (https://docs.github.com/en/actions/guides/creating-postgresql-service-containers)
 - Make Scrapper a class
 - Switch to SQL Alchemy
 - Flake8
 - Documentation

 %% Cell type:markdown id: tags:

 `pip install tweetarchiveur`

 %% Cell type:markdown id: tags:

-## How to use
+## How to use the package in your project

 %% Cell type:markdown id: tags:

 There are two classes:
 - A Scrapper() to use the Twitter API
 - A Database() to store tweets and hashtags in it

 %% Cell type:code id: tags:

 ``` python
 from tweet_archiveur.scrapper import Scrapper
 from tweet_archiveur.database import Database

 # Force some variable outside Docker
 from os import environ
 environ["DATABASE_PORT"] = '8479'
 environ["DATABASE_HOST"] = 'localhost'
 environ["DATABASE_USER"] = 'tweet_archiveur_user'
 environ["DATABASE_PASS"] = '1234leximpact'
 environ["DATABASE_NAME"] = 'tweet_archiveur'

 scrapper = Scrapper()
 df_users = scrapper.get_users_accounts('../tests/sample-users.csv')
 users_id = df_users.twitter_id.tolist()
 database = Database()
 database.create_tables_if_not_exist()
 database.insert_twitter_users(df_users)
 scrapper.get_all_tweet_and_store_them(database, users_id[0:2])
 del database
 del scrapper
 ```

 %% Output

    2021-03-22 10:21:59,837 -  tweet-archiveur INFO     Scrapper ready
    2021-03-22 10:21:59,841 -  tweet-archiveur INFO     Loading database module...
    2021-03-22 10:21:59,842 -  tweet-archiveur DEBUG    DEBUG : connect(user=tweet_archiveur_user, password=XXXX, host=localhost, port=8479, database=tweet_archiveur, url=None)
    2021-03-22 10:22:03,915 -  tweet-archiveur INFO     Done scrapping, we got 400 tweets from 2 tweetos.

 %% Cell type:markdown id: tags:

+## How we use it
+
+Every 8 hours, we fetch the tweets of the 577 members of the French Parliament and store them in a PostgreSQL database.
+
+We then explore them with Apache Superset.
+
+### How we deploy it
+
+Prepare the environment:
+```sh
+git clone https://github.com/leximpact/tweet-archiveur.git
+cd tweet-archiveur
+cp docker/docker.env .env
+```
+
+Edit _.env_ to suit your needs.
+
+Run the application:
+```sh
+docker-compose up -d
+```
+
+To follow the logs:
+```sh
+docker logs tweet-archiveur_tweet_archiveur_1 -f
+```
+
+The _archiveur.py_ script uses the package to fetch the Parliament members' accounts from https://github.com/regardscitoyens/twitter-parlementaires
+
+The parameters are read from the _.env_ file.
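Reading parameters from _.env_ can be sketched as follows. This is an assumption about the file format (plain `KEY=value` lines, `#` comments, optional surrounding quotes), not the project's actual loader; the `python-dotenv` library covers more edge cases. The variable names in the example are the ones the usage example sets by hand outside Docker, and `parse_env` / `load_env` are hypothetical names.

```python
import os
from typing import Dict, Iterable, MutableMapping


def parse_env(lines: Iterable[str]) -> Dict[str, str]:
    """Parse KEY=value lines; blank lines, comments and malformed lines are ignored."""
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env(path: str, environ: MutableMapping[str, str] = os.environ) -> None:
    """Copy parsed values into environ, keeping variables that are already set."""
    with open(path, encoding="utf-8") as handle:
        for key, value in parse_env(handle).items():
            environ.setdefault(key, value)


example = ["# database settings", "DATABASE_HOST=localhost", "DATABASE_PORT='8479'", ""]
print(parse_env(example))  # {'DATABASE_HOST': 'localhost', 'DATABASE_PORT': '8479'}
```

Using `setdefault` means values already exported in the shell or injected by Docker take precedence over the file, which is one reasonable policy among several.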
+
+It is launched by the _entrypoint.sh_ script every 8 hours.
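The 8-hour cycle is driven by the shell script _entrypoint.sh_. Purely as an illustration of the same idea, here is a Python loop; `run_periodically` and its parameters are hypothetical, and `sleep` is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Optional

EIGHT_HOURS = 8 * 60 * 60  # seconds


def run_periodically(
    job: Callable[[], None],
    interval: float = EIGHT_HOURS,
    max_runs: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run job, then wait `interval` seconds; stop after max_runs (None = forever).

    The wait starts when a run ends, so the time between two starts is
    the job duration plus the interval. Returns the number of completed runs.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        job()
        runs += 1
        if max_runs is not None and runs >= max_runs:
            break  # no need to sleep after the last run
        sleep(interval)
    return runs
```

In production a scheduler (cron, a container restart policy, or the existing shell loop) is usually more robust than a long-lived Python process.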
+
+To stop it:
+```sh
+docker-compose down
+```
+
+The data is kept in a Docker volume. To delete it along with the containers:
+```sh
+docker-compose down -v
+```
+
+%% Cell type:markdown id: tags:
+
 ## What to do with it ?

 %% Cell type:markdown id: tags:

 - Most used hashtag (per period, per person)
 - Most/least active users
 - Timeline of
 - NLP Topic detection
 - Word cloud
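The first idea, most used hashtags per period and per person, can be sketched with `collections.Counter`. The record layout `(user, day, hashtag)` is an assumption: it stands for rows already read from the database, whose real schema is not shown here. Hashtags are compared case-insensitively, and the user names are made up.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (user, day the tweet was posted, hashtag).
Record = Tuple[str, date, str]


def top_hashtags(records: Iterable[Record], n: int = 3) -> List[Tuple[str, int]]:
    """Most used hashtags overall, case-insensitive."""
    return Counter(tag.lower() for _, _, tag in records).most_common(n)


def top_hashtag_per_user(records: Iterable[Record]) -> Dict[str, str]:
    """Most used hashtag for each user."""
    per_user: Dict[str, Counter] = defaultdict(Counter)
    for user, _, tag in records:
        per_user[user][tag.lower()] += 1
    return {user: counts.most_common(1)[0][0] for user, counts in per_user.items()}


def top_hashtag_per_month(records: Iterable[Record]) -> Dict[str, str]:
    """Most used hashtag for each month, keyed as 'YYYY-MM'."""
    per_month: Dict[str, Counter] = defaultdict(Counter)
    for _, day, tag in records:
        per_month[day.strftime("%Y-%m")][tag.lower()] += 1
    return {month: counts.most_common(1)[0][0] for month, counts in per_month.items()}


records = [
    ("deputy_a", date(2021, 3, 1), "Climat"),
    ("deputy_a", date(2021, 3, 2), "climat"),
    ("deputy_b", date(2021, 3, 2), "budget"),
    ("deputy_b", date(2021, 4, 5), "budget"),
    ("deputy_b", date(2021, 4, 6), "climat"),
]
print(top_hashtags(records, 2))  # [('climat', 3), ('budget', 2)]
```

With the data in PostgreSQL, the same counts could instead be computed with a `GROUP BY` query, which avoids loading every row into Python.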

 %% Cell type:markdown id: tags:

 # Annexes

 Exit codes:
 - 1: Unknown error when storing tweets
 - 2: Unknown error getting tweets
 - 3: Failed more than 3 consecutive times
 - 4: No env
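For code that launches or monitors the archiver, the exit codes above could be kept in one place, for instance as an `IntEnum`. The class and member names below are hypothetical; only the numeric values and their meanings come from the list. Because `IntEnum` members are integers, `sys.exit(ExitCode.NO_ENV)` would end a process with status 4.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    """Exit codes from the list above; the names are illustrative."""

    STORE_TWEETS_ERROR = 1  # unknown error when storing tweets
    GET_TWEETS_ERROR = 2    # unknown error getting tweets
    TOO_MANY_FAILURES = 3   # failed more than 3 consecutive times
    NO_ENV = 4              # no env


def describe(code: int) -> str:
    """Map a process exit status to a readable name, or 'UNKNOWN'."""
    try:
        return ExitCode(code).name
    except ValueError:  # not one of the documented codes
        return "UNKNOWN"


print(describe(3))  # TOO_MANY_FAILURES
```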

 %% Cell type:markdown id: tags:

 If any step fails, no tweets are saved.
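This all-or-nothing behaviour is what a single database transaction provides. The sketch below uses the standard-library `sqlite3` module as a stand-in for PostgreSQL, and the `tweet` table is hypothetical; it only shows the principle that a failed batch leaves no partial rows behind.

```python
import sqlite3
from typing import Iterable, Tuple


def store_tweets(conn: sqlite3.Connection, tweets: Iterable[Tuple[int, str]]) -> None:
    """Insert all tweets in one transaction: either every row is saved or none."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany("INSERT INTO tweet (id, text) VALUES (?, ?)", tweets)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweet (id INTEGER PRIMARY KEY, text TEXT NOT NULL)")
store_tweets(conn, [(1, "first"), (2, "second")])

try:
    # The duplicate id 2 makes the batch fail, so id 3 is rolled back too.
    store_tweets(conn, [(3, "third"), (2, "duplicate")])
except sqlite3.IntegrityError:
    pass

print(conn.execute("SELECT COUNT(*) FROM tweet").fetchone()[0])  # 2
```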

 Status code 429 ('Too Many Requests') is returned when you exceed the maximum number of requests allowed.
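A common way to handle 429 responses is to retry with exponential backoff and give up after repeated failures, which echoes exit code 3 ("failed more than 3 consecutive times"). This is a sketch under assumptions: `TooManyRequests` is a hypothetical exception that the fetch function would raise on HTTP 429, and the delays are illustrative. A real client could instead wait until the rate-limit reset time reported by the API.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TooManyRequests(Exception):
    """Hypothetical error raised by the fetch function on HTTP 429."""


def call_with_backoff(
    fetch: Callable[[], T],
    max_consecutive_failures: int = 3,
    base_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry fetch on 429, doubling the delay after each failure.

    Re-raises once more than max_consecutive_failures attempts in a row
    have failed; a caller could then exit with code 3.
    """
    failures = 0
    while True:
        try:
            return fetch()
        except TooManyRequests:
            failures += 1
            if failures > max_consecutive_failures:
                raise
            sleep(base_delay * 2 ** (failures - 1))  # 60 s, 120 s, 240 s, ...
```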

 %% Cell type:code id: tags:

 ``` python
 ```