Vote utilisateur: 4 / 5

Etoiles activesEtoiles activesEtoiles activesEtoiles activesEtoiles inactives

I recently stumbled on this interesting post on RealPython (excellent website by the way!):

Fast, Flexible, Easy and Intuitive: How to Speed Up Your Pandas Projects

This post has different subjects related to Pandas: - creating a datetime column - looping over Pandas data - saving/loading HDF data stores - ...

I focused on the looping over Pandas data part. They compare different approaches for looping over a dataframe and applying a basic (piecewise linear) function: - a "crappy" loop with .iloc to access the data - iterrows() - apply() with a lambda function

But I was a little bit disapointed to see that they did not actually implement the following other approaches: - itertuples()`

While .itertuples() tends to be a bit faster, let’s stay in Pandas and use .iterrows() in this example, because some readers might not have run across nametuple. - Numpy vectorize - Numpy (just a loop over Numpy vectors) - Cython - Numba

So I just wanted to complete their post by adding the latter approaches to the performance comparison, using the same .csv file. In order to compare all the different implementations on the same computer, I also copied and re-ran their code.

Note: my laptop CPU is an Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz (with some DDDR4-2400 RAM).

Etoiles inactivesEtoiles inactivesEtoiles inactivesEtoiles inactivesEtoiles inactives

In this post we are simply going to retrieve the restaurants from the city of Lyon-France from Open Street Map, and then plot them with Bokeh.

Downloading the restaurants name and coordinates is done using a fork of the great OSMnx library. The OSM-POI feature of this fork will probably soon be added to OSMnx from what I understand (issue).

First we create a fresh conda env, install jupyterlab, bokeh (the following lines show the Linux way to do it but a similar thing could be done with Windows):

$ conda create -n restaurants python=3.6
$ source activate restaurants
$ conda install jupyterlab
$ conda install -c bokeh bokeh
$ jupyter labextension install jupyterlab_bokeh
$ jupyter lab osm_restaurants.ipynb
 

The jupyterlab extension allows the rendering of JS Bokeh content.

Then we need to install the POI fork of OSMnx:

$ git clone Cette adresse e-mail est protégée contre les robots spammeurs. Vous devez activer le JavaScript pour la visualiser.:HTenkanen/osmnx.git
$ cd osmnx/
osmnx $  git checkout 1-osm-poi-dev
osmnx $  pip install .
osmnx $  cd ..
 

And we are ready to run the notebook:

jupyter lab osm_restaurants.ipynb
 

Download OSM restaurants

In [1]:
import osmnx as ox
place = "Lyon, France"
restaurant_amenities = ['restaurant', 'cafe', 'fast_food']
restaurants = ox.pois_from_place(place=place, 
                                 amenities=restaurant_amenities)[['geometry', 
                                                                  'name', 
                                                                  'amenity', 
                                                                  'cuisine', 
                                                                  'element_type']]
 

We are looking for 3 kinds of amenity related to food: restaurants, cafés and fast-foods. The collected data is returned as a geodataframe, which is basically a Pandas dataframe associated with a geoserie of Shapely geometries. Along with the geometry, we are only keeping 4 columns:

  • restaurant name,
  • amenity type (restaurant, café or fast_food),
  • cuisine type and
  • element_type (OSM types: node, way relation).
In [2]:
restaurants.head()
 
Out[2]:
 

 geometrynameamenitycuisineelement_type
25733699 POINT (4.8634608 45.7439964) Le Petit Comptoir restaurant international node
25733700 POINT (4.8689407 45.7410332) L'Esprit Bistro restaurant NaN node
26641424 POINT (4.8346121 45.7569848) Comptoir des Marronniers restaurant NaN node
33065934 POINT (4.7732746 45.7393443) Auberge de la Vallée restaurant NaN node
35694312 POINT (4.8342288 45.7581985) McDonald's fast_food burger node

In [3]:
ax = restaurants.plot()
 

Vote utilisateur: 3 / 5

Etoiles activesEtoiles activesEtoiles activesEtoiles inactivesEtoiles inactives

A script for SQL Server to be run as sysadmin or a user that have enought priviledges on all databases to list all tables :

 

CREATE PROCEDURE [dbo].[sp_get_tables_sizes_all_dbs]


AS BEGIN

--sqlserver 2005 +
IF (SELECT count(*) FROM tempdb.sys.objects WHERE name = '##TABLESIZES_ALLDB')=1 BEGIN
DROP TABLE ##TABLESIZES_ALLDB;
END

CREATE TABLE ##TABLESIZES_ALLDB (
snapdate datetime,
srv nvarchar(1000),
sv nvarchar(1000),
_dbname nvarchar(1000),
nomTable nvarchar(1000),
"partition_id" bigint,
"partition_number" int,
lignes bigint,
"memory (kB)" bigint,
"data (kB)" bigint,
"indexes (kb)" bigint,
"data_compression" int,
data_compression_desc nvarchar(1000)
)

EXECUTE master.sys.sp_MSforeachdb
'USE [?];
insert into ##TABLESIZES_ALLDB
select getdate() as snapdate,cast(serverproperty(''MachineName'') as nvarchar(1000)) svr,cast(@@servicename as nvarchar(1000)) sv, ''?'' _dbname, nomTable= object_name(p.object_id),p.partition_id,p.partition_number,
lignes = sum(
CASE
When (p.index_id < 2) and (a.type = 1) Then p.rows
Else 0
END
),
''memory (kB)'' = cast(ltrim(str(sum(a.total_pages)* 8192 / 1024.,15,0)) as float),
''data (kB)'' = ltrim(str(sum(
CASE
When a.type <> 1 Then a.used_pages
When p.index_id < 2 Then a.data_pages
Else 0
END
) * 8192 / 1024.,15,0)),
''indexes (kb)'' = ltrim(str((sum(a.used_pages)-sum(
CASE
When a.type <> 1 Then a.used_pages
When p.index_id < 2 Then a.data_pages
Else 0
END) )* 8192 / 1024.,15,0)),p.data_compression,
p.data_compression_desc

from sys.partitions p, sys.allocation_units a ,sys.sysobjects s
where p.partition_id = a.container_id
and p.object_id = s.id and s.type = ''U'' -- User table type (system tables exclusion)
group by p.object_id,p.partition_id,p.partition_number,p.data_compression,p.data_compression_desc
order by 3 desc'
;

SELECT * FROM ##TABLESIZES_ALLDB

END
GO 
Commentaire (0) Clics: 6289

Vote utilisateur: 4 / 5

Etoiles activesEtoiles activesEtoiles activesEtoiles activesEtoiles inactives

Depuis la version 9i d'Oracle, la gestion de la mémoire peut se faire de manière automatique.

Le paramètre PGA_AGGREGATE_TARGET remplacant les paramètres SORT_AREA_SIZE et HASH_AREA_SIZE utilisé en 8i.

Il faut rappeler que la PGA est une zone mémoire privée où les processus allouent de la mémoire pour les opérations de tris, de hash ou de merge. De ce fait la zone de PGA est séparée de la SGA (System Global Area). Une troisième zone de mémoire, la UGA (User Global Area), maintient l'information sur l'état des sessions et des curseurs. En mode dédié, les processus alloue la zone UGA dans la PGA alors qu'en mode partagé la zone UGA est allouée dans la SGA (dans la LARGE POOL plus exactement).

Commentaire (0) Clics: 21205

Sous-catégories

Articles traitant de l'intégration de données

Des tutoriaux et cours gratuits sur Oracle

Tutoriaux sur Unix et les shells scripts