pandas-uuid

API

pandas_uuid.BaseUuidArray()

Base class for UuidArray and ArrowUuidArray.

pandas_uuid.UuidArray(values, *[, copy, dtype])

Extension array for storing uuid data in a numpy.ndarray.

pandas_uuid.ArrowUuidArray(values, *[, dtype])

Extension array for storing uuid data in a pyarrow.ChunkedArray.

pandas_uuid.UuidDtype(storage)

Pandas extension dtype for UUIDs.

pandas_uuid.UuidStorage

Supported storage backend for UuidDtype.

pandas_uuid.UuidLike

Supported element types when creating a BaseUuidArray from a sequence.

Usage

Use UuidDtype as a pandas extension dtype:

>>> from uuid import uuid4
>>> import pandas as pd
>>> from pandas_uuid import UuidDtype
>>>
>>> s = pd.Series([uuid4()], dtype=UuidDtype())
>>> s
0    cd072cd8-be6f-4f62-ac4c-09c28206e7e3
dtype: uuid

Select a specific storage backend by importing UuidArray or ArrowUuidArray directly, or by passing the storage parameter to UuidDtype:

>>> cls = UuidDtype("numpy").construct_array_type()
>>> cls
<class 'pandas_uuid.UuidArray'>
>>> cls.random(2, rng=42)
<UuidArray>
[8826d916-cdfb-21c6-c1ff-91a761565a70, 2416da6e-c212-cddb-8d88-00160eb686b2]
Length: 2, dtype: uuid

Note

There is probably no good reason to ever set rng to a static seed apart from testing.
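To illustrate why a static seed mainly suits tests, here is a minimal sketch using only the standard library. It does not show how random is implemented in pandas_uuid; seeded_uuids is a hypothetical helper introduced for this example. With a fixed seed, every run yields the same, fully predictable identifiers:

```python
import random
import uuid


def seeded_uuids(n: int, seed: int) -> list[uuid.UUID]:
    """Hypothetical helper: derive n version-4 UUIDs from a seeded generator."""
    rng = random.Random(seed)
    # version=4 overwrites the version and variant bits of the random 128-bit value
    return [uuid.UUID(int=rng.getrandbits(128), version=4) for _ in range(n)]


first = seeded_uuids(2, seed=42)
second = seeded_uuids(2, seed=42)

# Same seed, same "random" UUIDs: handy for reproducible tests,
# but it defeats the point of random identifiers in real data.
assert first == second
assert seeded_uuids(2, seed=43) != first
```

In a test suite this reproducibility is what makes expected outputs stable; outside of tests, leaving the seed unset is likely the safer default.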

pyarrow integration

Converting a pandas extension array to a pyarrow.UuidArray works automatically:

>>> import pyarrow as pa
>>>
>>> arr = pa.array(s.array)
>>> arr
<pyarrow.lib.UuidArray object at 0x...>
[
  CD072CD8BE6F4F62AC4C09C28206E7E3
]

Converting back has to be done manually. pyarrow.UuidType does not know about this package, so UuidDtype must be specified explicitly:

>>> pd.Series(arr, dtype=UuidDtype())
0    cd072cd8-be6f-4f62-ac4c-09c28206e7e3
dtype: uuid

Alternatively, pass a types_mapper (this is more useful with a pyarrow.Table):

>>> arr.to_pandas(types_mapper={pa.uuid(): UuidDtype()}.get)
0    cd072cd8-be6f-4f62-ac4c-09c28206e7e3
dtype: uuid
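The pandas and pyarrow outputs above print the same value in two notations: hyphenated lowercase and plain uppercase hex. As a small standard-library sketch (independent of pandas_uuid and pyarrow), the two forms can be checked against each other like this:

```python
import uuid

u = uuid.UUID("cd072cd8-be6f-4f62-ac4c-09c28206e7e3")

# Hyphenated lowercase form, as in the pandas output
pandas_style = str(u)

# 32 hex digits without hyphens, uppercased to match the pyarrow output
pyarrow_style = u.hex.upper()

assert pandas_style == "cd072cd8-be6f-4f62-ac4c-09c28206e7e3"
assert pyarrow_style == "CD072CD8BE6F4F62AC4C09C28206E7E3"

# uuid.UUID accepts either notation and yields the same value
assert uuid.UUID(pyarrow_style) == u
```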

pandas integration

The ultimate goal is for this package to disappear and for its classes to move into pandas itself. This would have several advantages:

  1. As mentioned before, pandas could automatically convert a pyarrow.UuidArray passed to any of its APIs into a UuidArray. Currently, this results in a generic {Numpy,Arrow}ExtensionArray instead:

    >>> pd.Series(arr)
    0    b'\xcd\x07,\xd8\xbeoOb\xacL\t\xc2\x82\x06\xe7\...
    dtype: object
    
  2. Specifying dtype="uuid" would work the same as dtype=UuidDtype(). Currently it fails:

    >>> pd.Series([uuid4()], dtype="uuid")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: data type 'uuid' not understood
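The object-dtype Series in the first item appears to hold each UUID as its 16 raw bytes. As a hedged, standard-library-only sketch (assuming that byte layout, and not using any pandas_uuid API), such values can be decoded back into uuid.UUID objects:

```python
import uuid

# 16 raw bytes per value, matching the truncated repr shown in the first item
raw_values = [uuid.UUID("cd072cd8-be6f-4f62-ac4c-09c28206e7e3").bytes]
assert raw_values[0].startswith(b"\xcd\x07,\xd8\xbeoOb")

# Rebuild UUID objects from the raw bytes
decoded = [uuid.UUID(bytes=raw) for raw in raw_values]

assert str(decoded[0]) == "cd072cd8-be6f-4f62-ac4c-09c28206e7e3"
```

The decoded objects could then be passed to pd.Series with dtype=UuidDtype(), as in the usage example at the top of this page.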