Commit ca419f7f
authored 1 year ago by Benoît Courty

Fix prix_par_carburant_annee_hectolitre

parent c79889a2
1 merge request: !1 Update 2023

Showing 2 changed files with 335 additions and 2594 deletions:
notebook_gouv/prix_carburant_gouv.ipynb (240 additions, 71 deletions)
notebook_gouv/prix_par_carburant_annee_hectolitre.csv (95 additions, 2523 deletions)

notebook_gouv/prix_carburant_gouv.ipynb (+240 −71) @ ca419f7f
...
...
@@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "code",
-"execution_count": 3,
+"execution_count": 2,
"id": "d60999c6-2ae5-430b-934c-a95d309a496c",
"metadata": {
"tags": []
...
...
@@ -26,7 +26,7 @@
},
{
"cell_type": "code",
-"execution_count": 4,
+"execution_count": 3,
"id": "523b21af-b7db-4fba-9ba5-831774c8e699",
"metadata": {},
"outputs": [],
...
...
@@ -1414,7 +1414,7 @@
},
{
"cell_type": "code",
-"execution_count": 33,
+"execution_count": 4,
"id": "238df7cb-b1bb-41f3-a447-05c625c46bc8",
"metadata": {
"tags": []
...
...
@@ -1502,7 +1502,7 @@
"max 2023.000000 2.270000"
]
},
-"execution_count": 33,
+"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
...
...
@@ -1524,7 +1524,7 @@
},
{
"cell_type": "code",
-"execution_count": 34,
+"execution_count": 5,
"id": "428ffee1-3ef4-4738-a447-7e5958435042",
"metadata": {
"tags": []
...
...
@@ -1669,7 +1669,7 @@
"[32786 rows x 5 columns]"
]
},
-"execution_count": 34,
+"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
...
...
@@ -1709,7 +1709,7 @@
},
{
"cell_type": "code",
-"execution_count": 35,
+"execution_count": 6,
"id": "3985c3d7-f4d0-43b9-a9db-394cd83534ea",
"metadata": {
"tags": []
...
...
@@ -1749,7 +1749,7 @@
},
{
"cell_type": "code",
-"execution_count": 36,
+"execution_count": 7,
"id": "8a712431-90ff-42bb-9449-3f89bbaf2a15",
"metadata": {
"tags": []
...
...
@@ -1819,7 +1819,7 @@
"30263 75 GPLc 2023 12 0.99"
]
},
-"execution_count": 36,
+"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
...
...
@@ -1838,7 +1838,7 @@
},
{
"cell_type": "code",
-"execution_count": 37,
+"execution_count": 8,
"id": "0803570e-3b2c-4f0d-bc8b-aa34a3f6dfa6",
"metadata": {
"tags": []
...
...
@@ -1904,7 +1904,7 @@
"2521 75 GPLc 2023 1.00"
]
},
-"execution_count": 37,
+"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
...
...
@@ -1924,7 +1924,7 @@
},
{
"cell_type": "code",
-"execution_count": 38,
+"execution_count": 9,
"id": "c48fc388-00cb-4c5d-a373-29be65b2559e",
"metadata": {
"tags": []
...
...
@@ -1994,7 +1994,7 @@
"30263 75 GPLc 2023 12 99.0"
]
},
-"execution_count": 38,
+"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
...
...
@@ -2012,7 +2012,7 @@
},
{
"cell_type": "code",
-"execution_count": 39,
+"execution_count": 10,
"id": "6d54c6f0-0d79-4292-b13b-f1ada4a621a6",
"metadata": {
"tags": []
...
...
@@ -2078,7 +2078,7 @@
"2521 75 GPLc 2023 100.0"
]
},
-"execution_count": 39,
+"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
...
...
@@ -2097,7 +2097,7 @@
},
{
"cell_type": "code",
-"execution_count": 40,
+"execution_count": 11,
"id": "9d0cc5e4-1054-4efc-aa17-b23b0a46b2e0",
"metadata": {
"tags": []
...
...
@@ -2159,7 +2159,7 @@
"93 SP98 2023 1.972"
]
},
-"execution_count": 40,
+"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
...
...
@@ -2174,7 +2174,7 @@
},
{
"cell_type": "code",
-"execution_count": 41,
+"execution_count": 12,
"id": "11c180d5-6c67-4a13-8b75-7f13dfd80712",
"metadata": {
"tags": []
...
...
@@ -2240,7 +2240,7 @@
"1127 SP98 2023 12 1.991"
]
},
-"execution_count": 41,
+"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
...
...
@@ -2255,7 +2255,7 @@
},
{
"cell_type": "code",
-"execution_count": 42,
+"execution_count": 18,
"id": "058ae67f-3dd9-40cc-8215-f7a633028329",
"metadata": {
"tags": []
...
...
@@ -2290,6 +2290,62 @@
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>82</td>\n",
" <td>Gazole</td>\n",
" <td>2007</td>\n",
" <td>110.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>82</td>\n",
" <td>Gazole</td>\n",
" <td>2008</td>\n",
" <td>128.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>82</td>\n",
" <td>Gazole</td>\n",
" <td>2009</td>\n",
" <td>101.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>82</td>\n",
" <td>Gazole</td>\n",
" <td>2010</td>\n",
" <td>116.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>82</td>\n",
" <td>Gazole</td>\n",
" <td>2011</td>\n",
" <td>135.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2517</th>\n",
" <td>75</td>\n",
" <td>GPLc</td>\n",
" <td>2019</td>\n",
" <td>87.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2518</th>\n",
" <td>75</td>\n",
" <td>GPLc</td>\n",
" <td>2020</td>\n",
" <td>87.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2519</th>\n",
" <td>75</td>\n",
" <td>GPLc</td>\n",
...
...
@@ -2312,16 +2368,27 @@
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>2522 rows × 4 columns</p>\n",
"</div>"
],
"text/plain": [
" region carburant annee prix_moyen_par_hectolitre\n",
"0 82 Gazole 2007 110.0\n",
"1 82 Gazole 2008 128.0\n",
"2 82 Gazole 2009 101.0\n",
"3 82 Gazole 2010 116.0\n",
"4 82 Gazole 2011 135.0\n",
"... ... ... ... ...\n",
"2517 75 GPLc 2019 87.0\n",
"2518 75 GPLc 2020 87.0\n",
"2519 75 GPLc 2021 87.0\n",
"2520 75 GPLc 2022 86.0\n",
-"2521 75 GPLc 2023 100.0"
+"2521 75 GPLc 2023 100.0\n",
"\n",
"[2522 rows x 4 columns]"
]
},
-"execution_count": 42,
+"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
...
...
@@ -2331,13 +2398,115 @@
"df_ann_hecto = pd.read_csv(\"prix_annuel_carburants_par_regions_litre.csv\", sep=\",\")\n",
"df_ann_hecto[\"prix_moyen_par_hectolitre\"] = round(df_ann_hecto['prix_moyen_par_litre'] * 100,2)\n",
"df_ann_hecto.drop([\"prix_moyen_par_litre\"], inplace=True, axis=1)\n",
"df_ann_hecto"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "cf7934b5-ec63-41f6-b439-1eede448e103",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"array(['region', 'carburant', 'annee', 'prix_moyen_par_hectolitre'],\n",
" dtype=object)"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_ann_hecto.columns.values"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "581f9f82-9820-41a2-bed7-513c40626fb2",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead tr th {\n",
" text-align: left;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr>\n",
" <th></th>\n",
" <th>annee</th>\n",
" <th>carburant</th>\n",
" <th>prix_moyen_par_hectolitre</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>91</th>\n",
" <td>2023</td>\n",
" <td>Gazole</td>\n",
" <td>185.54</td>\n",
" </tr>\n",
" <tr>\n",
" <th>92</th>\n",
" <td>2023</td>\n",
" <td>SP95</td>\n",
" <td>193.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>93</th>\n",
" <td>2023</td>\n",
" <td>SP98</td>\n",
" <td>197.23</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" annee carburant prix_moyen_par_hectolitre\n",
"91 2023 Gazole 185.54\n",
"92 2023 SP95 193.00\n",
"93 2023 SP98 197.23"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Drop the region\n",
"df_ann_hecto = df_ann_hecto.groupby(['annee', 'carburant']).agg({'prix_moyen_par_hectolitre': ['mean']}, as_index=False).round(2)\n",
"df_ann_hecto.reset_index(inplace=True)\n",
"df_ann_hecto.columns = [[\"annee\", \"carburant\", \"prix_moyen_par_hectolitre\"]]\n",
"\n",
"df_ann_hecto.to_csv(r'prix_par_carburant_annee_hectolitre.csv',index = False, header=True)\n",
"df_ann_hecto.tail(3)"
]
},
{
"cell_type": "code",
-"execution_count": 43,
+"execution_count": 21,
"id": "ad5b8499-9eaf-4a8f-84c2-2237b94ab818",
"metadata": {
"tags": []
...
...
@@ -2356,49 +2525,49 @@
" vertical-align: top;\n",
" }\n",
"\n",
-" .dataframe thead th {\n",
-" text-align: right;\n",
+" .dataframe thead tr th {\n",
+" text-align: left;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
-" <tr style=\"text-align: right;\">\n",
+" <tr>\n",
" <th></th>\n",
-" <th>carburant</th>\n",
" <th>annee</th>\n",
-" <th>prix_moyen_par_litre</th>\n",
+" <th>carburant</th>\n",
+" <th>prix_moyen_par_hectolitre</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
-" <td>E10</td>\n",
-" <td>2009</td>\n",
-" <td>1.240</td>\n",
+" <td>2007</td>\n",
+" <td>E85</td>\n",
+" <td>82.24</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
-" <td>E10</td>\n",
-" <td>2010</td>\n",
-" <td>1.360</td>\n",
+" <td>2007</td>\n",
+" <td>GPLc</td>\n",
+" <td>73.27</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
-" <td>E10</td>\n",
-" <td>2011</td>\n",
-" <td>1.519</td>\n",
+" <td>2007</td>\n",
+" <td>Gazole</td>\n",
+" <td>110.09</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
-" <td>E10</td>\n",
-" <td>2012</td>\n",
-" <td>1.573</td>\n",
+" <td>2007</td>\n",
+" <td>SP95</td>\n",
+" <td>129.48</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
-" <td>E10</td>\n",
-" <td>2013</td>\n",
-" <td>1.556</td>\n",
+" <td>2008</td>\n",
+" <td>E85</td>\n",
+" <td>85.82</td>\n",
" <tr>\n",
" <th>...</th>\n",
...
...
@@ -2408,33 +2577,33 @@
" </tr>\n",
" <tr>\n",
" <th>89</th>\n",
-" <td>SP98</td>\n",
-" <td>2019</td>\n",
-" <td>1.582</td>\n",
+" <td>2023</td>\n",
+" <td>E85</td>\n",
+" <td>115.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>90</th>\n",
-" <td>SP98</td>\n",
-" <td>2020</td>\n",
-" <td>1.436</td>\n",
+" <td>2023</td>\n",
+" <td>GPLc</td>\n",
+" <td>102.69</td>\n",
" </tr>\n",
" <tr>\n",
" <th>91</th>\n",
-" <td>SP98</td>\n",
-" <td>2021</td>\n",
-" <td>1.621</td>\n",
+" <td>2023</td>\n",
+" <td>Gazole</td>\n",
+" <td>185.54</td>\n",
" </tr>\n",
" <tr>\n",
" <th>92</th>\n",
-" <td>SP98</td>\n",
-" <td>2022</td>\n",
-" <td>1.856</td>\n",
+" <td>2023</td>\n",
+" <td>SP95</td>\n",
+" <td>193.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>93</th>\n",
-" <td>SP98</td>\n",
" <td>2023</td>\n",
-" <td>1.972</td>\n",
+" <td>SP98</td>\n",
+" <td>197.23</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
...
...
@@ -2442,29 +2611,29 @@
"</div>"
],
"text/plain": [
-" carburant annee prix_moyen_par_litre\n",
-"0 E10 2009 1.240\n",
-"1 E10 2010 1.360\n",
-"2 E10 2011 1.519\n",
-"3 E10 2012 1.573\n",
-"4 E10 2013 1.556\n",
+" annee carburant prix_moyen_par_hectolitre\n",
+"0 2007 E85 82.24\n",
+"1 2007 GPLc 73.27\n",
+"2 2007 Gazole 110.09\n",
+"3 2007 SP95 129.48\n",
+"4 2008 E85 85.82\n",
".. ... ... ...\n",
-"89 SP98 2019 1.582\n",
-"90 SP98 2020 1.436\n",
-"91 SP98 2021 1.621\n",
-"92 SP98 2022 1.856\n",
-"93 SP98 2023 1.972\n",
+"89 2023 E85 115.00\n",
+"90 2023 GPLc 102.69\n",
+"91 2023 Gazole 185.54\n",
+"92 2023 SP95 193.00\n",
+"93 2023 SP98 197.23\n",
"\n",
"[94 rows x 3 columns]"
]
},
-"execution_count": 43,
+"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
-"df_ann"
+"df_ann_hecto"
]
},
{
...
...
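The cell added by this commit converts regional prices to euros per hectolitre, then averages them across regions to get one price per year and fuel. A minimal sketch of the same steps on invented numbers follows. It uses named aggregation so the result keeps flat column names; that is a choice made for this sketch, not what the notebook does.

```python
import pandas as pd

# Invented sample: two regions, two fuels, one year.
df = pd.DataFrame({
    "region": ["75", "76", "75", "76"],
    "carburant": ["Gazole", "Gazole", "SP95", "SP95"],
    "annee": [2023, 2023, 2023, 2023],
    "prix_moyen_par_litre": [1.85, 1.87, 1.92, 1.94],
})

# Euros per litre -> euros per hectolitre (100 litres).
df["prix_moyen_par_hectolitre"] = (df["prix_moyen_par_litre"] * 100).round(2)

# Average over regions; named aggregation avoids MultiIndex columns.
national = (
    df.groupby(["annee", "carburant"], as_index=False)
      .agg(prix_moyen_par_hectolitre=("prix_moyen_par_hectolitre", "mean"))
      .round(2)
)
print(national)
```

With flat columns, the later `columns = [...]` reassignment and the `reset_index` call are not needed before writing the CSV.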
%% Cell type:code id:d60999c6-2ae5-430b-934c-a95d309a496c tags:
```python
import zipfile
import os
import xml.etree.ElementTree as ET
import csv
import time
from urllib.request import urlretrieve
from datetime import date
from calendar import monthrange
from tqdm import tqdm
import pandas as pd
import requests
import json
from retrying import retry
```
%% Cell type:code id:523b21af-b7db-4fba-9ba5-831774c8e699 tags:
```python
START_DATE = 2007
END_DATE = 2023
```
%% Cell type:code id:bbf067e2-95d6-4375-93f2-41ef842893b0 tags:
```python
# Download the yearly data archives from the government site.
def recuperation_xml(date_debut, date_fin):
    for date in tqdm(range(date_debut, date_fin + 1, 1)):
        directory_to_extract_to = os.path.join("unzip_file")
        path_to_zip_file = os.path.join("zip_file", f"PrixCarburants_annuel_{date}.zip")
        url = f"https://donnees.roulez-eco.fr/opendata/annee/{date}"
        print(url)
        urlretrieve(url, path_to_zip_file)
        with zipfile.ZipFile(path_to_zip_file, 'r') as zip_ref:
            zip_ref.extractall(directory_to_extract_to)


recuperation_xml(START_DATE, END_DATE)
```
%% Output
0%| | 0/17 [00:00<?, ?it/s]
https://donnees.roulez-eco.fr/opendata/annee/2007
6%|▌ | 1/17 [00:01<00:27, 1.71s/it]
https://donnees.roulez-eco.fr/opendata/annee/2008
12%|█▏ | 2/17 [00:04<00:31, 2.13s/it]
https://donnees.roulez-eco.fr/opendata/annee/2009
18%|█▊ | 3/17 [00:06<00:30, 2.18s/it]
https://donnees.roulez-eco.fr/opendata/annee/2010
24%|██▎ | 4/17 [00:08<00:30, 2.33s/it]
https://donnees.roulez-eco.fr/opendata/annee/2011
29%|██▉ | 5/17 [00:10<00:25, 2.10s/it]
https://donnees.roulez-eco.fr/opendata/annee/2012
35%|███▌ | 6/17 [00:12<00:24, 2.18s/it]
https://donnees.roulez-eco.fr/opendata/annee/2013
41%|████ | 7/17 [00:15<00:24, 2.41s/it]
https://donnees.roulez-eco.fr/opendata/annee/2014
47%|████▋ | 8/17 [00:17<00:20, 2.25s/it]
https://donnees.roulez-eco.fr/opendata/annee/2015
53%|█████▎ | 9/17 [00:20<00:18, 2.35s/it]
https://donnees.roulez-eco.fr/opendata/annee/2016
59%|█████▉ | 10/17 [00:23<00:18, 2.61s/it]
https://donnees.roulez-eco.fr/opendata/annee/2017
65%|██████▍ | 11/17 [00:26<00:16, 2.70s/it]
https://donnees.roulez-eco.fr/opendata/annee/2018
71%|███████ | 12/17 [00:28<00:12, 2.60s/it]
https://donnees.roulez-eco.fr/opendata/annee/2019
76%|███████▋ | 13/17 [00:31<00:10, 2.72s/it]
https://donnees.roulez-eco.fr/opendata/annee/2020
82%|████████▏ | 14/17 [00:34<00:07, 2.63s/it]
https://donnees.roulez-eco.fr/opendata/annee/2021
88%|████████▊ | 15/17 [00:36<00:05, 2.60s/it]
https://donnees.roulez-eco.fr/opendata/annee/2022
94%|█████████▍| 16/17 [00:40<00:02, 2.97s/it]
https://donnees.roulez-eco.fr/opendata/annee/2023
100%|██████████| 17/17 [00:43<00:00, 2.56s/it]
%% Cell type:code id:6c27528f-fbbd-4c34-86fe-a904c8181f77 tags:
```python
# Use the adresse.data.gouv.fr API to go from latitude and longitude to the citycode
@retry(stop_max_attempt_number=5, wait_fixed=2500)
def citycode_from_lat_long(longitude, latitude):
    url = f"https://api-adresse.data.gouv.fr/reverse/?lon={longitude}&lat={latitude}"
    response = requests.get(url)
    contenu = response.json()
    features = contenu['features']
    if len(features) == 0:
        return None
    else:
        citycode = contenu['features'][0]['properties']['citycode']
        return citycode
```
%% Cell type:code id:d67ca228-3db6-446b-bcac-f1efafd129f6 tags:
```python
# Go from the citycode to the department code
def code_departement_from_citycode(citycode):
    # Overseas codes start with 97 or above and use three characters
    if citycode[:2] >= '97':
        code_departement = citycode[:3]
    else:
        code_departement = citycode[:2]
    return code_departement
```
%% Cell type:code id:e8b5e2f4-2095-4c8f-a11d-d11de4cff76c tags:
```python
# Go from the postal code to the department code
def code_departement_from_code_postal(code_postal):
    if code_postal == '99999':
        return None
    elif code_postal[:2] >= '97':
        code_departement = code_postal[:3]
    elif code_postal[:3] in ["200", "201"]:
        code_departement = "2A"
    elif code_postal[:3] in ["202", "206"]:
        code_departement = "2B"
    else:
        code_departement = code_postal[:2]
    return code_departement
```
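A quick check of the postal-code rule above, as a minimal sketch. The function body is repeated so that the block runs on its own, and the sample postal codes are illustrative values, not taken from the dataset.

```python
def code_departement_from_code_postal(code_postal):
    """Same rule as the cell above, repeated so this block is self-contained."""
    if code_postal == '99999':
        return None
    if code_postal[:2] >= '97':
        return code_postal[:3]   # overseas: three-character code
    if code_postal[:3] in ("200", "201"):
        return "2A"              # Corse-du-Sud
    if code_postal[:3] in ("202", "206"):
        return "2B"              # Haute-Corse
    return code_postal[:2]


# Illustrative postal codes (assumed, not read from the XML files).
exemples = ["75001", "97400", "20000", "20200", "99999"]
resultats = {cp: code_departement_from_code_postal(cp) for cp in exemples}
print(resultats)
```

The comparison `>= '97'` is a string comparison, which works here because postal codes are fixed-width digit strings.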
%% Cell type:code id:64a0d8fc-649a-4710-839e-416706a5f712 tags:
```python
# Go from the department code to the region code using INSEE's "API Métadonnées - V1"
# Documentation: INSEE "API nomenclatures géographiques"
# Caution: the key must be renewed every 7 days...
# The API is limited to 30 requests per minute
cache_code_region_from_code_departement = {}


@retry(stop_max_attempt_number=5, wait_fixed=2000)
def code_region_from_code_departement(code_departement, date):
    if cache_code_region_from_code_departement.get(code_departement):
        if cache_code_region_from_code_departement.get(code_departement).get(date):
            return cache_code_region_from_code_departement.get(code_departement).get(date)
        else:
            cache_code_region_from_code_departement[code_departement][date] = None
    else:
        cache_code_region_from_code_departement[code_departement] = {}
    # Cache miss: call the INSEE API
    headers = {
        'Accept': 'application/json',
        'Authorization': 'Bearer 64011ad9-a729-3fc1-bcfe-93521808e51a',  # The change is here
    }
    params = {
        'date': date,
    }
    url = f'https://api.insee.fr/metadonnees/V1/geo/departement/{code_departement}/ascendants'
    response = requests.get(url, params=params, headers=headers)
    if response.status_code != 200:
        error = f"code_region_from_code_departement - Warning : code retour {response.status_code}, {response.text} retrying..."
        print(error)
        # Raising lets the @retry decorator try again
        raise Exception(error)
    contenu = response.json()
    # The API is limited to 30 requests per minute
    time.sleep(2.1)
    if isinstance(contenu, dict):
        print(contenu)
    cache_code_region_from_code_departement[code_departement][date] = contenu[0]['code']
    return cache_code_region_from_code_departement[code_departement][date]
```
%% Cell type:code id:0f05b801-1601-4d78-858e-30fdadf4608a tags:
```python
code_region_from_code_departement("21", "2023-01-01")
```
%% Output
'27'
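The caching logic above boils down to a two-level dictionary keyed by department code, then by date. Here is a minimal sketch of that pattern, with a hypothetical `fetch_region` stub standing in for the INSEE request (the stub, its lookup table and the `region_with_cache` name are assumptions made for illustration, not part of the notebook):

```python
cache = {}
calls = []  # records every simulated remote call


def fetch_region(code_departement, date):
    """Hypothetical stand-in for the INSEE request."""
    calls.append((code_departement, date))
    return {"21": "27", "33": "75"}.get(code_departement)


def region_with_cache(code_departement, date):
    # First level: department code; second level: date.
    par_date = cache.setdefault(code_departement, {})
    if par_date.get(date) is None:
        par_date[date] = fetch_region(code_departement, date)
    return par_date[date]


print(region_with_cache("21", "2023-01-01"))
print(region_with_cache("21", "2023-01-01"))  # served from the cache
print(len(calls))
```

Keying by date as well as by department matters because the region a department belongs to can differ from one reference date to another.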
%% Cell type:code id:c5f67bd6-5cf9-4e09-a587-f4b2454f4618 tags:
```python
# The APIs are fairly fragile; 500 or 502 errors happen from time to time.
# When that happens, delete the year that was being processed from "prix_by_region",
# then restart the loop from that year.
def debug_if_error_500(date_debut, date_fin):
    for region, prix_by_carburant in prix_by_region.items():
        for carburant, prix_by_annee in prix_by_carburant.items():
            for annee in range(date_debut, date_fin + 1):
                if annee in prix_by_annee:
                    del prix_by_annee[annee]

# debug_if_error_500(2007,2007)
```
%% Cell type:code id:14979ff2-770a-4a6c-8780-13a76a98512a tags:
```python
# tree = ET.parse('unzip_file/PrixCarburants_annuel_2021.xml')
# pdv_liste = tree.getroot()
```
%% Cell type:code id:bb42e6c2-f9e8-49da-a372-88b9b869993b tags:
```python
citycode_lat_long = {}
```
%% Cell type:code id:4d1a148e-db02-42b5-b4b9-35c1ab57d924 tags:
```python
prix_by_region = {}
```
%% Cell type:code id:2cd9550a-5c9b-4787-a372-d4f8309eaf9d tags:
```python
# Processing time: about 5 minutes per year.
# Main loop, which reads the data from the XML files,
# finds the region code of each station,
# extracts the relevant data, including the price per day, per fuel, per station
# (only the days on which a price changed are recorded, so a price has to be created for the days without a change),
# averages all stations per day,
# then averages the price of each fuel per region, per month and per year.
for annee in range(START_DATE, END_DATE + 1):
    print(annee)
    tree = ET.parse(f'unzip_file/PrixCarburants_annuel_{annee}.xml')
    pdv_liste = tree.getroot()
    date = f'{annee}-01-01'
    region = {}
    for pdv in tqdm(pdv_liste):
        longitude = pdv.attrib.get('longitude')
        latitude = pdv.attrib.get('latitude')
        citycode = None
        if latitude and longitude:
            lat_long = f"{latitude},{longitude}"
            citycode = citycode_lat_long.get(lat_long)
            if citycode is None:
                # Coordinates are stored multiplied by 100000
                citycode = citycode_from_lat_long(float(longitude) / 100000, float(latitude) / 100000)
                if citycode is not None:
                    citycode_lat_long[lat_long] = citycode
        code_departement = (
            code_departement_from_code_postal(pdv.attrib['cp'])
            if citycode is None
            else code_departement_from_citycode(citycode)
        )
        if code_departement is None:
            print('code_departement is None')
            continue
        code_region = region.get(code_departement)
        if code_region is None:
            code_region = code_region_from_code_departement(code_departement, date)
            region[code_departement] = code_region
        for prix_element in pdv:
            if prix_element.tag != 'prix':
                continue
            if prix_element.attrib.get('maj') is None:
                continue
            if prix_element.attrib.get('nom') is None:
                continue
            if prix_element.attrib.get('valeur') is None:
                continue
            prix_by_carburant = prix_by_region.setdefault(code_region, {})
            # prix_by_carburant = prix_by_region.get(code_region)
            # if prix_by_carburant is None:
            #     prix_by_carburant = prix_by_region[code_region] = {}
            # The "maj" timestamp separates date and time with either "T" or a space
            if 'T' in prix_element.attrib['maj']:
                date_prix = prix_element.attrib['maj'].split('T')[0]
            else:
                date_prix = prix_element.attrib['maj'].split(' ')[0]
            annee_prix, mois_prix, jour_prix = date_prix.split('-')
            annee_prix, mois_prix, jour_prix = int(annee_prix), int(mois_prix), int(jour_prix)
            prix_by_annee = prix_by_carburant.setdefault(prix_element.attrib['nom'], {})
            prix_by_mois = prix_by_annee.setdefault(annee_prix, {})
            prix_by_jour = prix_by_mois.setdefault(mois_prix, {})
            prix_by_station = prix_by_jour.setdefault(jour_prix, {})
            prix_by_station[pdv.attrib['id']] = prix_element.attrib['valeur']

    # Fill in the days without a price change with the last known price of each station
    for region, prix_by_carburant in prix_by_region.items():
        stations = set()
        prix_by_carburant = prix_by_region[region]
        for carburant, prix_by_annee in prix_by_carburant.items():
            dernier_prix_par_station = {}
            prix_by_mois = prix_by_annee.setdefault(annee, {})
            for mois in range(1, 13):
                prix_by_jour = prix_by_mois.setdefault(mois, {})
                dernier_jour = monthrange(annee, mois)[1]
                for jour in range(1, dernier_jour + 1):
                    prix_by_station = prix_by_jour.setdefault(jour, {})
                    stations = stations.union(prix_by_station.keys())
                    for station in stations:
                        prix = prix_by_station.get(station)
                        if prix is None:
                            prix_by_station[station] = dernier_prix_par_station.get(station)
                        else:
                            dernier_prix_par_station[station] = prix

    # Daily average over all stations
    for region, prix_by_carburant in prix_by_region.items():
        for carburant, prix_by_annee in prix_by_carburant.items():
            prix_by_mois = prix_by_annee.setdefault(annee, {})
            for mois, prix_by_jour in prix_by_mois.items():
                for jour, prix_by_station in prix_by_jour.items():
                    count = 0
                    total = 0
                    for station, prix in prix_by_station.items():
                        if prix is not None:
                            total += float(prix)
                            count += 1
                    prix_by_jour[jour] = round(total / count, 2) if count > 0 else None

    # Monthly and yearly averages of the daily averages
    for region, prix_by_carburant in prix_by_region.items():
        for carburant, prix_by_annee in prix_by_carburant.items():
            prix_by_mois = prix_by_annee[annee]
            count_annee = 0
            total_annee = 0
            for mois, prix_by_jour in prix_by_mois.items():
                count_mois = 0
                total_mois = 0
                for jour, prix in prix_by_jour.items():
                    if prix is not None:
                        count_mois += 1
                        total_mois += prix
                        count_annee += 1
                        total_annee += prix
                if count_mois == 0:
                    prix_by_mois[mois] = None
                else:
                    prix_by_mois[mois] = round(total_mois / count_mois, 2)
            if count_annee == 0:
                prix_by_mois['moyenne'] = None
            else:
                prix_by_mois['moyenne'] = round(total_annee / count_annee, 2)
```
%% Output
2007
100%|██████████| 7904/7904 [12:25<00:00, 10.61it/s]
2008
100%|██████████| 8394/8394 [05:08<00:00, 27.18it/s]
2009
100%|██████████| 9387/9387 [05:14<00:00, 29.85it/s]
2010
100%|██████████| 10130/10130 [05:17<00:00, 31.86it/s]
2011
100%|██████████| 10001/10001 [04:29<00:00, 37.09it/s]
2012
100%|██████████| 10256/10256 [04:32<00:00, 37.59it/s]
2013
100%|██████████| 10807/10807 [04:26<00:00, 40.50it/s]
code_departement is None
2014
100%|██████████| 11064/11064 [04:15<00:00, 43.24it/s]
2015
100%|██████████| 12333/12333 [04:45<00:00, 43.15it/s]
2016
100%|██████████| 12391/12391 [04:52<00:00, 42.33it/s]
2017
100%|██████████| 12559/12559 [06:51<00:00, 30.52it/s]
2018
100%|██████████| 12785/12785 [07:37<00:00, 27.96it/s]
2019
100%|██████████| 12969/12969 [05:42<00:00, 37.89it/s]
2020
100%|██████████| 13188/13188 [05:32<00:00, 39.63it/s]
2021
100%|██████████| 13386/13386 [05:03<00:00, 44.13it/s]
2022
100%|██████████| 13645/13645 [05:37<00:00, 40.39it/s]
2023
100%|██████████| 13755/13755 [04:46<00:00, 47.93it/s]
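The source files only record a price on the days it changed, so the loop above carries each station's last known price forward before averaging. Below is a minimal sketch of that fill-then-average step for a single fuel and month. The helper name `remplir_et_moyenner`, the station ids and the prices are invented for illustration, and the sketch skips the per-region and per-year bookkeeping of the real loop.

```python
def remplir_et_moyenner(prix_by_jour, nb_jours):
    """Carry each station's last known price forward, then average per day."""
    dernier_prix = {}   # station id -> last known price
    moyennes = {}       # day -> average over stations, or None if no price yet
    for jour in range(1, nb_jours + 1):
        # Stations that changed their price today overwrite their previous value.
        dernier_prix.update(prix_by_jour.get(jour, {}))
        if dernier_prix:
            moyennes[jour] = round(sum(dernier_prix.values()) / len(dernier_prix), 2)
        else:
            moyennes[jour] = None
    return moyennes


# Invented data: prices are only present on the days they changed.
prix_by_jour = {
    2: {"station_a": 1.50},
    4: {"station_a": 1.60, "station_b": 1.30},
}
moyennes = remplir_et_moyenner(prix_by_jour, 5)
print(moyennes)
```

Day 1 has no average because no station has reported a price yet, which mirrors the `None` values the notebook stores in that case.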
%% Cell type:code id:a57cc7e5-fbd2-4d3f-bdec-0d0e56118478 tags:
```python
with open("cache_code_region_from_code_departement.json", "w") as outfile:
    outfile.write(json.dumps(cache_code_region_from_code_departement, indent=4))
with open("prix_by_region.json", "w") as outfile:
    outfile.write(json.dumps(prix_by_region, indent=4))
```
%% Cell type:code id:0f26bf8a-397d-4522-8409-f9f4681ce870 tags:
```python
# Flatten the "prix_by_region" dictionary.
liste_prix_mensuel = []
liste_prix_annuel = []
for region, prix_by_carburant in prix_by_region.items():
    for carburant, prix_by_annee in prix_by_carburant.items():
        for annee, prix_by_mois in prix_by_annee.items():
            for mois, prix in prix_by_mois.items():
                # Note: this test is a no-op, so the 'moyenne' rows are kept;
                # later cells rely on them to build the yearly table.
                if prix_by_mois.values == 'moyenne':
                    pass
                prix_region_mensuel = {
                    "region": region,
                    "carburant": carburant,
                    "annee": annee,
                    "mois": mois,
                    "prix_moyen": prix,
                }
                liste_prix_mensuel.append(prix_region_mensuel)
```
%% Cell type:code id:9f4a3d64-7221-4cb4-82a4-b33622fdedcc tags:
```python
prix_region_mensuel
```
%% Output
{'region': '75',
'carburant': 'GPLc',
'annee': 2023,
'mois': 'moyenne',
'prix_moyen': 1.0}
%% Cell type:code id:6e6ab168-50df-48b1-995f-f31813e23dda tags:
```python
with open("liste_prix_mensuel_region.json", "w") as outfile:
    outfile.write(json.dumps(liste_prix_mensuel, indent=4))
```
%% Cell type:code id:5aa8e3f5-45e0-482c-8510-5fb7e8d79edc tags:
```python
df_prix_region_litre = pd.DataFrame.from_dict(liste_prix_mensuel)
df_prix_region_litre
```
%% Output
region carburant annee mois prix_moyen
0 82 Gazole 2007 1 1020.41
1 82 Gazole 2007 2 1026.61
2 82 Gazole 2007 3 1042.85
3 82 Gazole 2007 4 1070.66
4 82 Gazole 2007 5 1077.74
... ... ... ... ... ...
32781 75 GPLc 2023 5 1.01
32782 75 GPLc 2023 10 0.99
32783 75 GPLc 2023 11 0.99
32784 75 GPLc 2023 12 0.99
32785 75 GPLc 2023 moyenne 1.00
[32786 rows x 5 columns]
%% Cell type:code id:0ea9fe1d-4247-45a6-8c6a-26dd5dd8407a tags:
```python
df_prix_region_litre.query("mois == 'moyenne' and annee == 2022 and region == '82'")
```
%% Output
region carburant annee mois prix_moyen
207 82 Gazole 2022 moyenne NaN
428 82 SP95 2022 moyenne NaN
649 82 GPLc 2022 moyenne NaN
870 82 E85 2022 moyenne NaN
1065 82 E10 2022 moyenne NaN
1208 82 SP98 2022 moyenne NaN
%% Cell type:code id:8e5043a5-0519-46d4-a935-654ea6cae005 tags:
```python
df_prix_region_litre.query("mois == 'moyenne' and annee == 2022 and region == '75'")
```
%% Output
region carburant annee mois prix_moyen
32252 75 Gazole 2022 moyenne 1.87
32356 75 SP95 2022 moyenne 1.84
32460 75 E85 2022 moyenne 0.82
32564 75 SP98 2022 moyenne 1.89
32668 75 E10 2022 moyenne 1.79
32772 75 GPLc 2022 moyenne 0.86
%% Cell type:code id:2184e8e0-d785-4083-9591-87ec10e2d2f8 tags:
```python
df_liste_prix_mensuel_region = pd.DataFrame.from_dict(liste_prix_mensuel)
df_liste_prix_mensuel_region.tail(3)
```
%% Output
region carburant annee mois prix_moyen
32783 75 GPLc 2023 11 0.99
32784 75 GPLc 2023 12 0.99
32785 75 GPLc 2023 moyenne 1.00
%% Cell type:markdown id:7fae17f3-2188-486d-99f4-fe2140488168 tags:
## Converting JSON to CSV
%% Cell type:code id:238df7cb-b1bb-41f3-a447-05c625c46bc8 tags:
```python
with open("liste_prix_mensuel_region.json", "r") as file:
    liste_prix_mensuel_region = json.load(file)
df_prix_region_litre = pd.DataFrame(liste_prix_mensuel_region)


def fix_prix(row):
    # Before 2022, prices are expressed in thousandths of a euro (e.g. 1020.41 for 1.02041 €/L)
    if row["prix_moyen"] > 500:
        row["prix_moyen"] = row["prix_moyen"] / 1000
    return row


df_liste_prix_mensuel_region = df_prix_region_litre.apply(fix_prix, axis=1)
df_liste_prix_mensuel_region.describe()
```
%% Output
annee prix_moyen
count 32786.000000 21089.000000
mean 2016.101507 1.236566
std 4.662337 0.340319
min 2007.000000 0.605400
25% 2012.000000 0.891460
50% 2017.000000 1.311400
75% 2020.000000 1.501820
max 2023.000000 2.270000
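The row-wise `apply` above can also be written as a single column operation. This is a minimal sketch, assuming the same threshold of 500 separates the old unit from euros; the sample values are copied from the outputs shown earlier.

```python
import pandas as pd

df = pd.DataFrame({
    "annee": [2007, 2007, 2023],
    "prix_moyen": [1020.41, 1026.61, 0.99],
})

# Series.where keeps values where the condition holds and uses the
# second argument elsewhere, so only values above 500 are divided by 1000.
df["prix_moyen"] = df["prix_moyen"].where(df["prix_moyen"] <= 500, df["prix_moyen"] / 1000)
print(df)
```

Missing prices stay missing: a NaN fails the `<= 500` test and is replaced by NaN / 1000, which is still NaN.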
%% Cell type:code id:428ffee1-3ef4-4738-a447-7e5958435042 tags:
```
python
df_liste_prix_mensuel_region
```
%% Output
region carburant annee mois prix_moyen
0 82 Gazole 2007 1 1.02041
1 82 Gazole 2007 2 1.02661
2 82 Gazole 2007 3 1.04285
3 82 Gazole 2007 4 1.07066
4 82 Gazole 2007 5 1.07774
... ... ... ... ... ...
32781 75 GPLc 2023 5 1.01000
32782 75 GPLc 2023 10 0.99000
32783 75 GPLc 2023 11 0.99000
32784 75 GPLc 2023 12 0.99000
32785 75 GPLc 2023 moyenne 1.00000
[32786 rows x 5 columns]
%% Cell type:markdown id:130286e7-c44c-4dbc-b5d7-fb80c831f6a6 tags:
List of regions in 2022:
```
REG NCC
1 GUADELOUPE
2 MARTINIQUE
3 GUYANE
4 LA REUNION
6 MAYOTTE
11 ILE DE FRANCE
24 CENTRE VAL DE LOIRE
27 BOURGOGNE FRANCHE COMTE
28 NORMANDIE
32 HAUTS DE FRANCE
44 GRAND EST
52 PAYS DE LA LOIRE
53 BRETAGNE
75 NOUVELLE AQUITAINE
76 OCCITANIE
84 AUVERGNE RHONE ALPES
93 PROVENCE ALPES COTE D AZUR
94 CORSE
```
%% Cell type:code id:3985c3d7-f4d0-43b9-a9db-394cd83534ea tags:
```python
for year in range(START_DATE, END_DATE + 1):
    # "prix_moyen!=prix_moyen" is only true for NaN, i.e. regions with no yearly average
    df = df_liste_prix_mensuel_region.query("carburant=='Gazole' and annee==@year and mois=='moyenne' and prix_moyen!=prix_moyen")
    missing_dep = len(df)
    if missing_dep > 0:
        print(f"En {year} il manque {missing_dep} région : {df.region.values}")
```
%% Output
En 2015 il manque 1 région : ['02']
En 2016 il manque 17 région : ['82' '22' '83' '73' '21' '91' '25' '54' '74' '26' '72' '43' '23' '31'
'41' '42' '02']
En 2017 il manque 17 région : ['82' '22' '83' '73' '21' '91' '25' '54' '74' '26' '72' '43' '23' '31'
'41' '42' '02']
En 2018 il manque 18 région : ['82' '22' '83' '73' '21' '91' '25' '54' '74' '26' '72' '43' '23' '31'
'41' '42' '02' '04']
En 2019 il manque 18 région : ['82' '22' '83' '73' '21' '91' '25' '54' '74' '26' '72' '43' '23' '31'
'41' '42' '02' '04']
En 2020 il manque 18 région : ['82' '22' '83' '73' '21' '91' '25' '54' '74' '26' '72' '43' '23' '31'
'41' '42' '02' '04']
En 2021 il manque 18 région : ['82' '22' '83' '73' '21' '91' '25' '54' '74' '26' '72' '43' '23' '31'
'41' '42' '02' '04']
En 2022 il manque 18 région : ['82' '22' '83' '73' '21' '91' '25' '54' '74' '26' '72' '43' '23' '31'
'41' '42' '02' '04']
En 2023 il manque 18 région : ['82' '22' '83' '73' '21' '91' '25' '54' '74' '26' '72' '43' '23' '31'
'41' '42' '02' '04']
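The `prix_moyen!=prix_moyen` test in the query above relies on NaN being the only value that is not equal to itself. A minimal sketch on invented data, showing that it selects the same rows as the more explicit `isna()`:

```python
import pandas as pd

# Invented sample: two regions without a yearly average, one with.
df = pd.DataFrame({
    "region": ["82", "75", "02"],
    "prix_moyen": [float("nan"), 1.87, float("nan")],
})

via_query = df.query("prix_moyen != prix_moyen")   # NaN != NaN is True
via_isna = df[df["prix_moyen"].isna()]             # explicit missing-value test

print(list(via_query["region"]))
print(via_query.equals(via_isna))
```

Most of the codes reported as missing from 2016 onward (82, 22, 83, ...) do not appear in the 2022 region list given earlier, which is consistent with them being region codes that were no longer in use after that date.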
%% Cell type:code id:8a712431-90ff-42bb-9449-3f89bbaf2a15 tags:
```python
# Create the "prix_mensuel_carburants_par_regions_litre.csv" dataframe
indexNames = df_liste_prix_mensuel_region[df_liste_prix_mensuel_region['mois'] == 'moyenne'].index
df_prix_mensuel_carburants_par_regions_litre = df_liste_prix_mensuel_region.copy().drop(indexNames)
df_prix_mensuel_carburants_par_regions_litre.reset_index(drop=True, inplace=True)
df_prix_mensuel_carburants_par_regions_litre['prix_moyen'] = round(df_prix_mensuel_carburants_par_regions_litre['prix_moyen'] * 1, 2)
df_prix_mensuel_carburants_par_regions_litre.rename(columns={'prix_moyen': 'prix_moyen_by_litre'}, inplace=True)
df_prix_mensuel_carburants_par_regions_litre.to_csv(r'prix_mensuel_carburants_par_regions_litre.csv', index=False, header=True)
df_prix_mensuel_carburants_par_regions_litre.head(3)
df_prix_mensuel_carburants_par_regions_litre.tail(3)
```
%% Output
region carburant annee mois prix_moyen_by_litre
30261 75 GPLc 2023 10 0.99
30262 75 GPLc 2023 11 0.99
30263 75 GPLc 2023 12 0.99
%% Cell type:code id:0803570e-3b2c-4f0d-bc8b-aa34a3f6dfa6 tags:
```python
# Create the "prix_annuel_carburants_par_regions_litre.csv" dataframe
indexNames = df_liste_prix_mensuel_region[df_liste_prix_mensuel_region['mois'] != 'moyenne'].index
df_prix_annuel_carburants_par_regions_litre = df_liste_prix_mensuel_region.copy().drop(indexNames, inplace=False)
df_prix_annuel_carburants_par_regions_litre.reset_index(drop=True, inplace=True)
df_prix_annuel_carburants_par_regions_litre.drop(columns=['mois'], inplace=True)
df_prix_annuel_carburants_par_regions_litre['prix_moyen'] = round(df_prix_annuel_carburants_par_regions_litre['prix_moyen'] * 1, 2)
df_prix_annuel_carburants_par_regions_litre.rename(columns={'prix_moyen': 'prix_moyen_par_litre'}, inplace=True)
df_prix_annuel_carburants_par_regions_litre.to_csv(r'prix_annuel_carburants_par_regions_litre.csv', index=False, header=True)
df_prix_annuel_carburants_par_regions_litre.tail(3)
```
%% Output
region carburant annee prix_moyen_par_litre
2519 75 GPLc 2021 0.87
2520 75 GPLc 2022 0.86
2521 75 GPLc 2023 1.00
%% Cell type:code id:c48fc388-00cb-4c5d-a373-29be65b2559e tags:
```python
# Create the "prix_mensuel_carburants_par_regions_hectolitre.csv" dataframe
indexNames = df_liste_prix_mensuel_region[df_liste_prix_mensuel_region['mois'] == 'moyenne'].index
df_prix_mensuel_carburants_par_regions_hectolitre = df_liste_prix_mensuel_region.copy().drop(indexNames)
df_prix_mensuel_carburants_par_regions_hectolitre.reset_index(drop=True, inplace=True)
df_prix_mensuel_carburants_par_regions_hectolitre['prix_moyen'] = round(df_prix_mensuel_carburants_par_regions_hectolitre['prix_moyen'] * 100, 2)
df_prix_mensuel_carburants_par_regions_hectolitre.rename(columns={'prix_moyen': 'prix_moyen_par_hectolitre'}, inplace=True)
df_prix_mensuel_carburants_par_regions_hectolitre.to_csv(r'prix_mensuel_carburants_par_regions_hectolitre.csv', index=False, header=True)
df_prix_mensuel_carburants_par_regions_hectolitre.tail(3)
```
%% Output
region carburant annee mois prix_moyen_par_hectolitre
30261 75 GPLc 2023 10 99.0
30262 75 GPLc 2023 11 99.0
30263 75 GPLc 2023 12 99.0
%% Cell type:code id:6d54c6f0-0d79-4292-b13b-f1ada4a621a6 tags:
```python
# Create the "prix_annuel_carburants_par_regions_hectolitre.csv" dataframe
indexNames = df_liste_prix_mensuel_region[df_liste_prix_mensuel_region['mois'] != 'moyenne'].index
prix_annuel_carburants_par_regions_hectolitre = df_liste_prix_mensuel_region.copy().drop(indexNames)
prix_annuel_carburants_par_regions_hectolitre.reset_index(drop=True, inplace=True)
prix_annuel_carburants_par_regions_hectolitre.drop(columns=['mois'], inplace=True)
prix_annuel_carburants_par_regions_hectolitre['prix_moyen'] = round(prix_annuel_carburants_par_regions_hectolitre['prix_moyen'] * 100, 2)
prix_annuel_carburants_par_regions_hectolitre.rename(columns={'prix_moyen': 'prix_moyen_par_hectolitre'}, inplace=True)
prix_annuel_carburants_par_regions_hectolitre.to_csv(r'prix_annuel_carburants_par_regions_hectolitre.csv', index=False, header=True)
prix_annuel_carburants_par_regions_hectolitre.tail(3)
```
%% Output
region carburant annee prix_moyen_par_hectolitre
2519 75 GPLc 2021 87.1
2520 75 GPLc 2022 86.0
2521 75 GPLc 2023 100.0
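The four export cells above follow one pattern: keep either the monthly rows or the `'moyenne'` rows, rescale `prix_moyen` by 1 or 100, round to two decimals, and rename the price column. A sketch of that pattern as a single function, assuming the same column layout; `build_price_table`, its parameters, and the `sample` frame are hypothetical names introduced here and do not appear in the notebook:

```python
import pandas as pd


def build_price_table(df, annual, factor, price_column):
    """Hypothetical helper: filter monthly or annual rows, rescale, rename."""
    is_average = df["mois"] == "moyenne"
    # Annual tables keep only the 'moyenne' rows; monthly tables drop them.
    out = df[is_average if annual else ~is_average].reset_index(drop=True)
    if annual:
        out = out.drop(columns=["mois"])
    out["prix_moyen"] = (out["prix_moyen"] * factor).round(2)
    return out.rename(columns={"prix_moyen": price_column})


# Made-up rows for illustration only.
sample = pd.DataFrame(
    {
        "region": ["75", "75", "75"],
        "carburant": ["GPLc"] * 3,
        "annee": [2023] * 3,
        "mois": ["11", "12", "moyenne"],
        "prix_moyen": [0.99, 0.99, 1.0],
    }
)

annual_hl = build_price_table(
    sample, annual=True, factor=100, price_column="prix_moyen_par_hectolitre"
)
monthly_l = build_price_table(
    sample, annual=False, factor=1, price_column="prix_moyen_by_litre"
)
print(annual_hl)
print(monthly_l)
```

Each result could then be written with `to_csv(..., index=False)` as in the cells above. Whether a refactor like this is worthwhile depends on how often the notebook is re-run.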
%% Cell type:code id:9d0cc5e4-1054-4efc-aa17-b23b0a46b2e0 tags:
```python
# Aggregate prices at the national level so they can be checked against INSEE data for consistency.
df_ann = pd.read_csv("prix_annuel_carburants_par_regions_litre.csv", sep=",")
df_ann = df_ann.groupby(['carburant', 'annee'])[['prix_moyen_par_litre']].mean().reset_index().round(3)
df_ann.to_csv(r'prix_par_carburant_annee.csv', index=False, header=True)
df_ann.tail(3)
```
%% Output
carburant annee prix_moyen_par_litre
91 SP98 2021 1.621
92 SP98 2022 1.856
93 SP98 2023 1.972
%% Cell type:code id:11c180d5-6c67-4a13-8b75-7f13dfd80712 tags:
```python
# Aggregate prices at the national level so they can be checked against INSEE data for consistency.
df_mens = pd.read_csv("prix_mensuel_carburants_par_regions_litre.csv", sep=",")
df_mens = df_mens.groupby(['carburant', 'annee', 'mois'])[['prix_moyen_by_litre']].mean().reset_index().round(3)
df_mens.to_csv(r'prix_par_carburant_mois.csv', index=False, header=True)
df_mens.tail(3)
```
%% Output
carburant annee mois prix_moyen_by_litre
1125 SP98 2023 10 1.991
1126 SP98 2023 11 1.991
1127 SP98 2023 12 1.991
%% Cell type:code id:058ae67f-3dd9-40cc-8215-f7a633028329 tags:
```python
# Create prix_par_carburant_annee_hectolitre.csv
df_ann_hecto = pd.read_csv("prix_annuel_carburants_par_regions_litre.csv", sep=",")
df_ann_hecto["prix_moyen_par_hectolitre"] = round(df_ann_hecto['prix_moyen_par_litre'] * 100, 2)
df_ann_hecto.drop(["prix_moyen_par_litre"], inplace=True, axis=1)
df_ann_hecto.to_csv(r'prix_par_carburant_annee_hectolitre.csv', index=False, header=True)
df_ann_hecto.tail(3)
df_ann_hecto
```
%% Output
region carburant annee prix_moyen_par_hectolitre
0 82 Gazole 2007 110.0
1 82 Gazole 2008 128.0
2 82 Gazole 2009 101.0
3 82 Gazole 2010 116.0
4 82 Gazole 2011 135.0
... ... ... ... ...
2517 75 GPLc 2019 87.0
2518 75 GPLc 2020 87.0
2519 75 GPLc 2021 87.0
2520 75 GPLc 2022 86.0
2521 75 GPLc 2023 100.0
[2522 rows x 4 columns]
%% Cell type:code id:cf7934b5-ec63-41f6-b439-1eede448e103 tags:
```python
df_ann_hecto.columns.values
```
%% Output
array(['region', 'carburant', 'annee', 'prix_moyen_par_hectolitre'],
dtype=object)
%% Cell type:code id:581f9f82-9820-41a2-bed7-513c40626fb2 tags:
```python
# Remove the region
df_ann_hecto = df_ann_hecto.groupby(['annee', 'carburant']).agg({'prix_moyen_par_hectolitre': ['mean']}, as_index=False).round(2)
df_ann_hecto.reset_index(inplace=True)
df_ann_hecto.columns = [["annee", "carburant", "prix_moyen_par_hectolitre"]]
df_ann_hecto.to_csv(r'prix_par_carburant_annee_hectolitre.csv', index=False, header=True)
df_ann_hecto.tail(3)
```
%% Output
annee carburant prix_moyen_par_hectolitre
91 2023 Gazole 185.54
92 2023 SP95 193.00
93 2023 SP98 197.23
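The aggregation cell above averages the regional prices per year and fuel, then resets the index and reassigns the column labels by hand. A sketch of an alternative that yields flat column names directly, using `as_index=False` on `groupby` together with named aggregation; the `regional` frame holds made-up values for illustration and is not data from the notebook:

```python
import pandas as pd

# Made-up regional prices in euros per hectolitre, for illustration only.
regional = pd.DataFrame(
    {
        "region": ["11", "75", "11", "75"],
        "carburant": ["Gazole", "Gazole", "SP95", "SP95"],
        "annee": [2023] * 4,
        "prix_moyen_par_hectolitre": [185.0, 186.0, 192.0, 194.0],
    }
)

# as_index=False keeps the grouping keys as ordinary columns, and named
# aggregation gives the result column a flat name, so no reset_index or
# manual column assignment is needed afterwards.
national = (
    regional.groupby(["annee", "carburant"], as_index=False)
    .agg(prix_moyen_par_hectolitre=("prix_moyen_par_hectolitre", "mean"))
    .round(2)
)
print(national)
```

Note that this is an unweighted mean across regions, as in the cell above; it does not weight regions by sales volume.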
%% Cell type:code id:ad5b8499-9eaf-4a8f-84c2-2237b94ab818 tags:
```python
df_ann
df_ann_hecto
```
%% Output
carburant annee prix_moyen_par_litre
0 E10 2009 1.240
1 E10 2010 1.360
2 E10 2011 1.519
3 E10 2012 1.573
4 E10 2013 1.556
.. ... ... ...
89 SP98 2019 1.582
90 SP98 2020 1.436
91 SP98 2021 1.621
92 SP98 2022 1.856
93 SP98 2023 1.972
annee carburant prix_moyen_par_hectolitre
0 2007 E85 82.24
1 2007 GPLc 73.27
2 2007 Gazole 110.09
3 2007 SP95 129.48
4 2008 E85 85.82
.. ... ... ...
89 2023 E85 115.00
90 2023 GPLc 102.69
91 2023 Gazole 185.54
92 2023 SP95 193.00
93 2023 SP98 197.23
[94 rows x 3 columns]
%% Cell type:code id:40fd5c27-a827-47d7-912e-072dc71ca860 tags:
```python
```
notebook_gouv/prix_par_carburant_annee_hectolitre.csv (+95 −2523)