pandas-uuid#
API#
Base class for
Extension array for storing uuid data in a
Extension array for storing uuid data in a
Pandas extension dtype for UUIDs.
Supported storage backend for
Supported element types when creating a
Usage#
Use UuidDtype as a pandas extension dtype:
>>> from uuid import uuid4
>>> import pandas as pd
>>> from pandas_uuid import UuidDtype
>>>
>>> s = pd.Series([uuid4()], dtype=UuidDtype())
>>> s
0    cd072cd8-be6f-4f62-ac4c-09c28206e7e3
dtype: uuid
Use specific storage types by importing UuidArray / ArrowUuidArray,
or by using the storage parameter of UuidDtype:
>>> cls = UuidDtype("numpy").construct_array_type()
>>> cls
<class 'pandas_uuid.UuidArray'>
>>> cls.random(2, rng=42)
<UuidArray>
[8826d916-cdfb-21c6-c1ff-91a761565a70, 2416da6e-c212-cddb-8d88-00160eb686b2]
Length: 2, dtype: uuid
Note
There is probably no good reason to ever set rng
to a static seed apart from testing.
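To illustrate why a fixed seed only makes sense in tests, here is a minimal standard-library sketch. It is not how UuidArray.random is implemented; the helper name random_uuids is hypothetical, and it simply turns 128 random bits into a UUID.

```python
import random
import uuid


def random_uuids(n, seed=None):
    """Hypothetical helper: build n UUIDs from 128 random bits each."""
    rng = random.Random(seed)  # seed=None draws entropy from the OS
    return [uuid.UUID(int=rng.getrandbits(128)) for _ in range(n)]


# A fixed seed reproduces the same "random" UUIDs on every call,
# which is handy for reproducible test fixtures.
assert random_uuids(2, seed=42) == random_uuids(2, seed=42)

# Without a seed, each call draws fresh values.
fresh = random_uuids(2)
```

Identifiers generated with a static seed repeat across runs, so outside of tests they would defeat the purpose of using UUIDs.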
pyarrow integration#
Convert from pandas ExtensionArray
to pyarrow.UuidArray automatically …
>>> import pyarrow as pa
>>>
>>> arr = pa.array(s.array)
>>> arr
<pyarrow.lib.UuidArray object at 0x...>
[
  CD072CD8BE6F4F62AC4C09C28206E7E3
]
And back manually.
pyarrow.UuidType does not know about this package,
so we need to specifically use our UuidDtype:
>>> pd.Series(arr, dtype=UuidDtype())
0    cd072cd8-be6f-4f62-ac4c-09c28206e7e3
dtype: uuid
or (this would make more sense with a pyarrow.Table):
>>> arr.to_pandas(types_mapper={pa.uuid(): UuidDtype()}.get)
0    cd072cd8-be6f-4f62-ac4c-09c28206e7e3
dtype: uuid
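The pandas and pyarrow outputs above render the same 16 bytes in different ways. As a standard-library sketch (independent of pandas_uuid and pyarrow, and using only the UUID value shown above), the hyphenated form and the uppercase hex form can both be derived from a single uuid.UUID object:

```python
import uuid

u = uuid.UUID("cd072cd8-be6f-4f62-ac4c-09c28206e7e3")

raw = u.bytes                       # the 16 raw bytes of the UUID
hex_upper = raw.hex().upper()       # hex digits without hyphens
round_trip = uuid.UUID(bytes=raw)   # rebuild the UUID from its bytes

print(str(u))      # cd072cd8-be6f-4f62-ac4c-09c28206e7e3
print(hex_upper)   # CD072CD8BE6F4F62AC4C09C28206E7E3
```

Because the conversion between these forms is lossless, converting to pyarrow and back yields the same UUID values.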
pandas integration#
The ultimate goal is for this package to disappear and
for its classes to move into the pandas package.
This would have several advantages:
As mentioned before, pandas could automatically convert a pyarrow.UuidArray
passed to any pandas API into a UuidArray.
Currently this results in a generic {Numpy,Arrow}ExtensionArray instead:
>>> pd.Series(arr)
0    b'\xcd\x07,\xd8\xbeoOb\xacL\t\xc2\x82\x06\xe7\...
dtype: object
Specifying dtype="uuid" would work the same as dtype=UuidDtype().
Currently it fails:
>>> pd.Series([uuid4()], dtype="uuid")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: data type 'uuid' not understood