doc

Commit d9bc140f authored 1 month ago by Benoît Courty
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
+# Contributing to the project
+
+
+<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->
+
+## Prerequisites
+
+Install python3-venv, curl, make and git before installing Poetry:
+
+    sudo apt-get install -y curl make git python3-venv
+
+### Poetry
+
+``` bash
+curl -sSL https://install.python-poetry.org | python3 -
+```
+
+Add the following line to your .bashrc:
+
+``` bash
+export PATH="$HOME/.local/bin:$PATH"
+```
+
+#### Specify the Python version for Poetry: Python 3.10
+
+If your Python version is \> 3.10, you can use pyenv to set the Python
+version used locally in this directory:
+
+    pyenv local 3.10
+
+    poetry env use 3.10
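
To confirm that the interpreter Poetry picked matches the 3.10 requirement stated above, you can compare `sys.version_info` from inside the environment. This is only an illustrative sketch; the helper name `is_expected_python` is hypothetical and not part of the project.

```python
import sys


def is_expected_python(version_info=sys.version_info, expected=(3, 10)):
    """Return True when the major.minor part of version_info equals expected."""
    return tuple(version_info[:2]) == tuple(expected)


if __name__ == "__main__":
    # Launch with `poetry run python <this file>` to check the Poetry environment.
    status = "OK" if is_expected_python() else "unexpected version"
    print(sys.version.split()[0], status)
```

Only the major and minor numbers are compared, so any 3.10.x patch release passes the check.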
+
+#### Installing the dependencies
+
+``` bash
+poetry config virtualenvs.in-project true
+poetry install
+```
+
+`poetry config virtualenvs.in-project true` installs the environment as
+a subfolder of the project rather than in your home directory.
+This is recommended so that VSCode can find the environment.
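
With the in-project setting, Poetry creates the environment in a `.venv` folder at the project root. The sketch below checks for that folder from Python. It assumes the environment contains a `pyvenv.cfg` file, as virtual environments normally do; the helper name `in_project_venv` is hypothetical.

```python
from pathlib import Path


def in_project_venv(project_root):
    """Return <project_root>/.venv if it looks like a virtualenv, else None."""
    venv = Path(project_root) / ".venv"
    # Assumption: a virtual environment carries a pyvenv.cfg file at its root.
    return venv if (venv / "pyvenv.cfg").is_file() else None
```

Calling `in_project_venv(".")` from the repository root after `poetry install` should return the `.venv` path when the setting was applied before the environment was created.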
+
+To develop the pipeline, additional packages are required:
+
+``` bash
+poetry install --extras "pipeline"
+```
+
+#### Debug Poetry
+
+To remove an environment, see
+https://python-poetry.org/docs/managing-environments/
+
+    poetry env list
+    poetry env remove 3.7
+
+To clean everything up:
+
+    rm poetry.lock 
+    poetry env list
+    poetry env remove leximpact-prepare-data-0Rkp9wuO-py3.8
+    poetry cache clear --all pypi
+    poetry env use -vvv 3.8
+    poetry install
+
+To display the dependency tree:
+
+     poetry show --tree 
+
+### If the installation fails:
+
+    rm poetry.lock
+
+To remove an environment, see
+https://python-poetry.org/docs/managing-environments/
+
+# How to develop
+
+## Secure link to the ERFS-FPR
+
+To run local code against the protected data hosted on the server, mount it over SSHFS:
+
+    sudo mkdir -p /mnt/data-in /mnt/data-out
+    sudo chown $USER:$USER /mnt/data-*
+    sshfs ysabell:/data/private-data/input /mnt/data-in
+    sshfs ysabell:/data/private-data/output /mnt/data-out
+
+Run these commands as your local `$USER`. The `ysabell` host must be defined in your local `~/.ssh/config`.
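
Before running code that reads the protected data, it can help to verify that both SSHFS mounts are active. The sketch below uses the two mount points from the commands above; the helper name `missing_mounts` is hypothetical and the check is not part of the project.

```python
import os

# Mount points taken from the sshfs commands above.
MOUNT_POINTS = ("/mnt/data-in", "/mnt/data-out")


def missing_mounts(paths=MOUNT_POINTS, ismount=os.path.ismount):
    """Return the paths that are not currently mount points."""
    return [path for path in paths if not ismount(path)]


if __name__ == "__main__":
    for path in missing_mounts():
        print(f"{path} is not mounted")
```

The `ismount` parameter exists only so the function can be exercised without real mounts.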
+
+## Create symlink
+
+``` python
+!ln -s ../leximpact_prepare_data
+!cd analyses && ln -s ../../leximpact_prepare_data
+!cd extractions_base_des_impots && ln -s ../../leximpact_prepare_data
+!cd retraitement_erfs-fpr && ln -s ../../leximpact_prepare_data
+```
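
The notebook cells above create one relative symlink to the package in the notebook folder and in each of three subfolders. The same operation can be sketched in plain Python, which makes it repeatable without failing on existing links. The function name `link_package` is hypothetical; the layout (package one level above the notebook folder) is inferred from the `ln -s` targets.

```python
import os
from pathlib import Path

# Subfolders named in the notebook cells above.
SUBDIRS = ("analyses", "extractions_base_des_impots", "retraitement_erfs-fpr")


def link_package(notebook_dir, package="leximpact_prepare_data", subdirs=SUBDIRS):
    """Create relative symlinks to the package in notebook_dir and its subfolders."""
    notebook_dir = Path(notebook_dir)
    # Pairs of (folder receiving the link, relative target of the link).
    targets = [(notebook_dir, Path("..") / package)]
    targets += [(notebook_dir / s, Path("..") / ".." / package) for s in subdirs]
    created = []
    for folder, target in targets:
        link = folder / package
        # Skip missing folders and anything already present under that name.
        if folder.is_dir() and not link.is_symlink() and not link.exists():
            os.symlink(target, link)
            created.append(link)
    return created
```

The function returns the links it created, so a second call on the same folder returns an empty list.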
+
+## Update packages to the latest version
+
+``` bash
+poetry update
+```
+
+## Jupyter
+
+The first time, and after adding a library:
+
+`poetry run python -m ipykernel install --name leximpact-prepare-data-kernel --user`
+
+### Launch jupyter
+
+``` bash
+poetry run jupyter lab
+```
+
+## Check style
+
+``` bash
+make precommit
+```
+
+### Update precommit
+
+Run this from time to time to stay up to date:
+
+``` bash
+poetry run pre-commit autoupdate
+```
+
+## NBDev
+
+Run pre-commit before converting notebooks:
+`poetry run pre-commit run --all-files`
+
+Build the library from the notebooks: `poetry run nbdev_build_lib`
+
+Build the docs from the notebooks: `poetry run nbdev_build_docs`
+
+Re-run pre-commit: `poetry run pre-commit run --all-files`
+
+``` python
+# Automatically format the code (see the precommit entry in the Makefile for details)
+!make precommit
+```
+
+``` python
+# Build docs from notebook
+#!poetry run nbdev_build_docs
+!cd .. && make docs
+```
+
+# How we build the docs
+
+The documentation is available at
+https://documentation.leximpact.dev/leximpact_prepare_data/
+
+It is built with [NBDev](https://github.com/fastai/nbdev) in the GitLab
+CI.
+
+We do it like this:
+
+- Use the Poetry env as the default environment.
+- Use a separate venv to remove notebook outputs, because
+  `--clear-output` does not work with nbconvert \< 6, which is needed by
+  other dependencies. We remove the outputs to avoid publishing
+  sensitive data. We still have to find a better way to publish outputs
+  without sensitive data.
+- Build the docs with `poetry run nbdev_docs`.
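
To illustrate what removing notebook outputs amounts to, here is a minimal sketch that clears the outputs of code cells in a notebook loaded as JSON. It is not the method used by the CI, which relies on nbconvert's `--clear-output`; the function name `strip_outputs` is hypothetical, and the notebook is assumed to follow the nbformat 4 layout with a top-level `cells` list.

```python
import json


def strip_outputs(notebook):
    """Return a copy of a notebook dict with code-cell outputs removed."""
    cleaned = json.loads(json.dumps(notebook))  # deep copy of plain JSON data
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned
```

A file can be processed by reading it with `json.load`, passing the result to `strip_outputs`, and writing the returned dict back with `json.dump`.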
+
+Then we copy the docs via `scp` to our server and serve them with Nginx.
+
+*Since NBDev v2 the doc is a pure static site.*
+
+*After upgrading NBDev, do not forget to upgrade Quarto* with:
+`curl -LO https://www.quarto.org/download/latest/quarto-linux-amd64.deb && dpkg -i quarto-linux-amd64.deb`
+
+The CI also pushes the docs to a branch. To do this, we need a token from
+https://git.leximpact.dev/admin/users/project_18_bot/impersonation_tokens
+stored in the CI variable `API_TOKEN`.
+
+## Testing the docs locally
+
+`poetry run nbdev_preview`
+
+## Anaconda on CASD
+
+### Building the package
+
+    docker run -i -t -v $PWD:/src continuumio/miniconda3 /bin/bash
+    cd /src
+    python3 gitlab-ci/src/get_pypi_info.py -p leximpact-prepare-data
+    conda install -y conda-build anaconda-client
+    conda config --set anaconda_upload yes
+    conda build -c conda-forge -c leximpact -c openfisca .conda
+
+To upload:
+
+    anaconda login
+    anaconda upload \
+        /opt/conda/conda-bld/noarch/leximpact-prepare-data-0.0.8-py_0.tar.bz2 \
+        /opt/conda/conda-bld/noarch/leximpact-prepare-data-casd-0.0.8-py_0.tar.bz2 \
+        /opt/conda/conda-bld/noarch/leximpact-prepare-data-dev-0.0.8-py_0.tar.bz2
+
+### Local testing
+
+Install the package in a clean environment:
+
+    mkdir -p casd-test
+    cd casd-test
+    git clone https://git.leximpact.dev/leximpact/simulateur-socio-fiscal/budget/leximpact-prepare-data.git
+    rm -r  ./conda-env
+    conda create  --prefix ./conda-env python=3.8
+    conda activate ./conda-env
+    conda config --add channels conda-forge
+    conda config --set channel_priority strict
+    conda install -c conda-forge -c openfisca -c leximpact leximpact-prepare-data-casd
+    ipython kernel install --user --name=prepare-data-conda-env
+
+To check that everything worked:
+
+    jupyter lab
+
+Then open the file
+`leximpact-prepare-data/notebook/extractions_base_des_impots/test_install.ipynb`
+and run it.
+
+To leave the environment:
+
+    conda deactivate
+
+# Initializing the ERFS-FPR database
+
+We receive SAS files about households (ménages) from INSEE.
+
+However, our processing needs tax households (foyers fiscaux).
+
+To go from households to tax households we use [OpenFisca
+France Data](https://github.com/openfisca/openfisca-france-data).
+
+The continuous integration of OpenFisca France Data performs this processing.
+Its output is available on the server in `/mnt/data-out/leximpact/erfs-fpr/`,
+which gives us the `openfisca_erfs_fpr_2021.h5` file that
+we use in the next step.
+
+If you ever need to redo it by hand:
+
+``` shell
+git clone git@git.leximpact.dev:benjello/openfisca-france-data.git
+cd openfisca-france-data/
+python3 -m venv .venv
+source .venv/bin/activate
+make install
+cp /mnt/data-out/openfisca-france-data/openfisca_survey_manager_config-after-build-collection.ini ~/.config/openfisca-survey-manager/config.ini
+cp /mnt/data-out/data_collections/bilal/erfs_fpr.json ./erfs_fpr.json
+nano ~/.config/openfisca-survey-manager/config.ini
+nano /home/jupyter-benoit/openfisca-france-data/erfs_fpr.json
+cp /mnt/data-out/erfs_fpr_2021.h5 /home/jupyter-benoit/openfisca-france-data/erfs_fpr_2021.h5
+build-erfs-fpr -y 2021
+```
+
+The `build-erfs-fpr` script runs the code in
+`openfisca_france_data.erfs_fpr.input_data_builder:main`.
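
The `module:function` notation is the usual way Python console scripts name their entry point: the part before the colon is imported, and the part after it is looked up on that module. The sketch below shows that resolution with the standard library only. The helper name `load_entry_point` is hypothetical and this is not how the packaging tools implement it internally.

```python
from importlib import import_module


def load_entry_point(spec):
    """Resolve a 'package.module:attribute' string to the object it names."""
    module_name, _, attr_path = spec.partition(":")
    obj = import_module(module_name)
    # Walk dotted attributes after the colon; an empty path returns the module.
    for name in filter(None, attr_path.split(".")):
        obj = getattr(obj, name)
    return obj
```

With `openfisca-france-data` installed, `load_entry_point("openfisca_france_data.erfs_fpr.input_data_builder:main")` would return the function that the `build-erfs-fpr` command calls.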