How to handle large files with HDF5

HDF5

HDF5 (Hierarchical Data Format version 5) is a file format for storing large volumes of data. Its main features can be summarized as follows.
• Easy sharing
• Cross platform
• Fast IO
• Big Data
• Heterogeneous data

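Data centers are being built everywhere. Data has become important enough that whole buildings are put up for it. Storing data efficiently matters, and the data must remain reusable. The same holds for everyday computer users and for people who run simulations: they need a better way to keep their own data.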

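HDF5 is well suited to scientific and engineering data. It was designed around a write-once, read-many usage pattern, so it is a poor fit for workloads that frequently update parts of the data. It is distributed under a BSD-style license, so it can be freely modified, redistributed, and used in commercial programs.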

Unformatted file formats are machine dependent. As a rule, such a file may become unusable once the computer changes, which causes problems when files are handed over and used elsewhere.
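To make the portability point concrete, here is a minimal sketch that is not part of the original post. Raw bytes stand in for an unformatted file, which is an assumption made purely for illustration, and the file name `portable.h5` is made up. The raw dump carries no type, shape, or byte order, so a reader that guesses wrong gets garbage, while the HDF5 file records all three.

```python
import os
import tempfile

import h5py
import numpy as np

# Big-endian 32-bit integers, as another machine might have produced them.
data = np.arange(6, dtype=">i4").reshape(2, 3)

# Raw dump: only the bytes survive; dtype, shape and byte order are lost.
raw = data.tobytes()
misread = np.frombuffer(raw, dtype="<i4")   # reader wrongly assumes little-endian

# HDF5: the file itself records dtype, shape and byte order.
path = os.path.join(tempfile.mkdtemp(), "portable.h5")  # hypothetical file name
with h5py.File(path, "w") as f:
    f["data"] = data
with h5py.File(path, "r") as f:
    restored = f["data"][:]

print(misread)    # values differ from the original because of the wrong byte order
print(restored)   # same values and shape as `data`
```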

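• Like XML, HDF5 is self-describing: the format of the data can be described inside the file itself (directories, datasets, and so on can be created).
• It can store large amounts of data.
• Lookups are fast.
• It supports parallel I/O.
• It allows random access to the data.
• It has been under development for more than 20 years and is stable.
• APIs are available for many programming languages and through open-source libraries.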
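Random access means that a slice of a dataset can be read without loading the whole array. The following is a minimal h5py sketch of that idea; the file name `random_access.h5` and the dataset name `values` are made up for the example.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file and dataset names, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "random_access.h5")

with h5py.File(path, "w") as f:
    f.create_dataset("values", data=np.arange(1_000_000, dtype=np.int64))

with h5py.File(path, "r") as f:
    dset = f["values"]         # a handle only; no array data is read yet
    window = dset[500:505]     # reads just these five elements
    total_shape = dset.shape   # shape comes from metadata stored in the file

print(window)
print(total_shape)
```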

Users can lay out the data they want to store in their own structure of groups and datasets.

At the top level there is a root node, written as the string "/". Below the root node are objects called groups and objects called datasets. A group can contain multiple child groups and datasets. The HDF5 data model distinguishes two main kinds of objects.

• Group: collects datasets and other groups into a structure. Supports metadata.
• Dataset: holds the data itself, such as multidimensional arrays. Supports metadata.
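As a rough illustration of the two object kinds and their metadata, the sketch below creates one group and one dataset and attaches an attribute to each through h5py's `attrs` interface. The names `model.h5`, `run1`, `temperature`, `description` and `units` are invented for the example.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "model.h5")  # hypothetical file name

with h5py.File(path, "w") as f:
    run = f.create_group("run1")                   # a group directly under the root "/"
    run.attrs["description"] = "test simulation"   # metadata attached to the group
    temp = run.create_dataset("temperature", data=np.linspace(280.0, 300.0, 5))
    temp.attrs["units"] = "K"                      # metadata attached to the dataset

with h5py.File(path, "r") as f:
    group_name = f["run1"].name
    dataset_name = f["run1/temperature"].name
    units = f["run1/temperature"].attrs["units"]
    description = f["run1"].attrs["description"]

print(group_name, dataset_name, units, description)
```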

Groups work like dictionaries, and datasets work like NumPy arrays.
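The analogy can be tried directly. In this small sketch (the file and object names are illustrative only) the group answers `keys()` and `in` like a dictionary, and the dataset answers `shape` and slicing like a NumPy array.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "analogy.h5")  # hypothetical file name

with h5py.File(path, "w") as f:
    grp = f.create_group("a")
    grp["x"] = np.arange(12).reshape(3, 4)   # dictionary-style assignment
    grp["y"] = np.zeros(3)

with h5py.File(path, "r") as f:
    grp = f["a"]
    names = sorted(grp.keys())      # dictionary-style listing
    has_x = "x" in grp              # dictionary-style membership test
    dset = grp["x"]
    shape = dset.shape              # array-style attribute
    row = dset[1, :]                # array-style slicing, returns an ndarray
    col_sum = dset[:, 0].sum()

print(names, has_x, shape, row, col_sum)
```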

Properties and attributes

One feature of HDF5 objects is their properties. Several properties have default values. One example is the storage layout of a dataset, which can be contiguous (the default), chunked, or chunked and compressed.
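The three layouts can be requested through `create_dataset` in h5py. The sketch below is one way to see the difference, with an arbitrary chunk shape of (100, 100) and gzip level 4 chosen only for illustration; suitable values depend on the access pattern.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "layouts.h5")  # hypothetical file name
data = np.zeros((1000, 1000))

with h5py.File(path, "w") as f:
    # Contiguous (default): one flat block on disk.
    f.create_dataset("contiguous", data=data)
    # Chunked: stored as independent 100x100 tiles.
    f.create_dataset("chunked", data=data, chunks=(100, 100))
    # Chunked and compressed: each tile is gzip-compressed.
    f.create_dataset("compressed", data=data, chunks=(100, 100),
                     compression="gzip", compression_opts=4)
    layouts = {name: (f[name].chunks, f[name].compression) for name in f}

for name, (chunks, compression) in layouts.items():
    print(name, chunks, compression)
```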

---------------------------------------------------------------------------------------------------------------------

import h5py
import numpy as np

# Always pass the mode explicitly: h5py 3.x opens files read-only by default.
f = h5py.File("name.hdf5", "a")
f.close()

# File modes (alternatives, not meant to be run one after another):
# h5py.File("name.hdf5", "w")     # New file, overwriting any existing file
# h5py.File("name.hdf5", "r")     # Open read-only (must exist)
# h5py.File("name.hdf5", "r+")    # Open read-write (must exist)
# h5py.File("name.hdf5", "a")     # Open read-write (create if it doesn't exist)

f = h5py.File("testfile.hdf5", "w")

arr = np.ones((5, 2))
f["my dataset"] = arr
dset = f["my dataset"]
print(dset)        # <HDF5 dataset "my dataset": shape (5, 2), type "<f8">
print(dset.dtype)  # float64
print(dset.shape)  # (5, 2)

f.close()

---------------------------------------------------------------------------------------------------------------------

import h5py
import numpy as np

# Create random data
data_matrix = np.random.uniform(-1, 1, size=(10, 3))

# Write data to HDF5 (despite its name, 'group_name' is a dataset here)
with h5py.File('file.hdf5', 'w') as data_file:
    data_file.create_dataset('group_name', data=data_matrix)

# Read it back
filename = 'file.hdf5'

with h5py.File(filename, 'r') as f:
    # List all top-level keys (groups and datasets)
    print("Keys: %s" % list(f.keys()))
    a_group_key = list(f.keys())[0]

    # Get the data as a list of rows
    data = list(f[a_group_key])

---------------------------------------------------------------------------------------------------------------------

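import h5py
import numpy as np

# Example array; `bigdata` is not defined in the original, so this is an assumption.
bigdata = np.ones((100, 1000))

with h5py.File('big1.hdf5', 'w') as f1:
    f1['big'] = bigdata   # stored with the array's own dtype

with h5py.File('big2.hdf5', 'w') as f2:
    f2.create_dataset('big', data=bigdata, dtype=np.float32)   # converted on write

f1 = h5py.File("big1.hdf5", "r")
f2 = h5py.File("big2.hdf5", "r")
print(f1['big'].dtype)   # float64
print(f2['big'].dtype)   # float32
f1.close()
f2.close()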

---------------------------------------------------------------------------------------------------------------------

import numpy as np
import h5py

data_to_write = np.random.random(size=(100, 20))

with h5py.File('name-of-file.h5', 'w') as hf:
    hf.create_dataset("name-of-dataset", data=data_to_write)

with h5py.File('name-of-file.h5', 'r') as hf:
    data = hf['name-of-dataset'][:]   # [:] copies the dataset into a NumPy array

---------------------------------------------------------------------------------------------------------------------

import numpy as np
import h5py

a = np.random.random(size=(100, 20))

h5f = h5py.File('data.h5', 'w')
h5f.create_dataset('dataset_1', data=a)
h5f.close()

h5f = h5py.File('data.h5', 'r')
b = h5f['dataset_1'][:]
h5f.close()

# Check that the round trip preserved the values
print(np.allclose(a, b))

---------------------------------------------------------------------------------------------------------------------

import numpy as np
import h5py

hf = h5py.File('/path/to/file', 'r')
data = hf['dataset_name'][()]   # `data` is now an ndarray (`.value` was removed in h5py 3.0)
hf.close()

hf = h5py.File('path/to/file.h5', 'r')
n1 = hf["dataset_name"][:]      # dataset_name is the same as the HDF5 object name
print(n1)
hf.close()

---------------------------------------------------------------------------------------------------------------------

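import numpy as np
import h5py

hf = h5py.File('path/to/file', 'r')
dset = hf['dataset_name']
n1 = np.zeros(dset.shape, dtype=dset.dtype)   # preallocate an array of matching shape and type
dset.read_direct(n1)                          # read straight into the preallocated array
hf.close()

with h5py.File('name-of-file.h5', 'r') as hf:
    data = hf['name-of-dataset'][:]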

---------------------------------------------------------------------------------------------------------------------

import numpy as np
import pandas as pd

frame = pd.DataFrame({'a': np.random.randn(100000)})

# pandas writes HDF5 through PyTables, which must be installed
with pd.HDFStore('mydata.h5') as store:
    store.put('obj1', frame, format='table')

---------------------------------------------------------------------------------------------------------------------

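import numpy as np

# Placeholder arrays; a, b and c are not defined in the original.
a, b, c = np.arange(3), np.arange(4), np.arange(5)

# Python 3: use open() instead of the Python 2 file() builtin
with open("tmp.bin", "wb") as f:
    np.save(f, a)
    np.save(f, b)
    np.save(f, c)

# Arrays come back in the order they were saved
with open("tmp.bin", "rb") as f:
    aa = np.load(f)
    bb = np.load(f)
    cc = np.load(f)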

---------------------------------------------------------------------------------------------------------------------

import numpy as np
import h5py

d1 = np.random.random(size=(1000, 20))
d2 = np.random.random(size=(1000, 200))

# Write two datasets at the top level
hf = h5py.File('data.h5', 'w')
hf.create_dataset('dataset_1', data=d1)
hf.create_dataset('dataset_2', data=d2)
hf.close()

# Read one of them back as a NumPy array
hf = h5py.File('data.h5', 'r')
print(list(hf.keys()))
n1 = hf.get('dataset_1')
n1 = np.array(n1)
hf.close()

# Groups and nested groups
d1 = np.random.random(size=(100, 33))
d2 = np.random.random(size=(100, 333))
d3 = np.random.random(size=(100, 3333))

hf = h5py.File('data.h5', 'w')

g1 = hf.create_group('group1')
g1.create_dataset('data1', data=d1)
g1.create_dataset('data2', data=d2)

g2 = hf.create_group('group2/subfolder')   # intermediate group 'group2' is created too
g2.create_dataset('data3', data=d3)

group2 = hf.get('group2/subfolder')
print(list(group2.items()))

group1 = hf.get('group1')
print(list(group1.items()))

n1 = group1.get('data1')
print(np.array(n1).shape)

hf.close()

# gzip compression; compression_opts is the gzip level (0-9)
hf = h5py.File('data.h5', 'w')
hf.create_dataset('dataset_1', data=d1, compression="gzip", compression_opts=9)
hf.create_dataset('dataset_2', data=d2, compression="gzip", compression_opts=9)
hf.close()

---------------------------------------------------------------------------------------------------------------------

from tempfile import TemporaryFile

import numpy as np

outfile = TemporaryFile()
x = np.arange(10)
np.save(outfile, x)

outfile.seek(0)   # only needed here to simulate closing & reopening file
print(np.load(outfile))   # [0 1 2 3 4 5 6 7 8 9]

---------------------------------------------------------------------------------------------------------------------

import h5py
import numpy as np

h5f = h5py.File('test.h5', 'w')
h5f.create_dataset('array1', data=np.array([1, 2, 3, 4]))
h5f.create_dataset('array2', data=np.array([5, 4, 3, 2]))
h5f.close()

# Now open it back up and read data
h5f = h5py.File('test.h5', 'r')
a = h5f['array1'][:]
b = h5f['array2'][:]
h5f.close()
print(a)
print(b)
# [1 2 3 4]
# [5 4 3 2]

---------------------------------------------------------------------------------------------------------------------

import h5py

filename = '../Results/someFileName.h5'

# Assumes every top-level entry is a group that contains datasets
with h5py.File(filename, 'r') as data:
    for group in data.keys():
        print(group)
        for dset in data[group].keys():
            print(dset)
            ds_data = data[group][dset]   # returns HDF5 dataset object
            print(ds_data)
            print(ds_data.shape, ds_data.dtype)
            arr = data[group][dset][:]    # adding [:] returns a numpy array
            print(arr.shape, arr.dtype)
            print(arr)

---------------------------------------------------------------------------------------------------------------------

import h5py
import numpy as np

x = np.arange(10)
y = np.array([100, 101, 102, 103, 104, 105, 106, 107])
z = {'X': x, 'Y': y}

with h5py.File('file.h5', 'w', libver='latest') as f:   # use 'latest' for performance
    for k, v in z.items():
        f.create_dataset('dict/' + str(k), data=v)

with h5py.File('file.h5', 'r', libver='latest') as f:
    x_read = f['dict']['X'][:]   # [:] syntax extracts numpy array into memory
    y_read = f['dict']['Y'][:]

print(x_read)
# [0 1 2 3 4 5 6 7 8 9]

---------------------------------------------------------------------------------------------------------------------

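h5ls file.h5          # list the top-level objects in a file

h5ls -vlr file.h5     # verbose, recursive listing of the whole file

h5dump file.h5        # dump the file contents as text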
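When the command-line tools are not at hand, a similar recursive listing can be produced from Python. This is a sketch built on h5py's `visititems`; the helper name `list_tree` and the demo file `tree_demo.h5` are made up for the example and are not part of h5py.

```python
import os
import tempfile

import h5py
import numpy as np


def list_tree(path):
    """Return (name, kind, shape, dtype) for every object below the root."""
    entries = []

    def visitor(name, obj):
        # Called once per group or dataset; returning None continues the walk.
        if isinstance(obj, h5py.Dataset):
            entries.append((name, "dataset", obj.shape, str(obj.dtype)))
        else:
            entries.append((name, "group", None, None))

    with h5py.File(path, "r") as f:
        f.visititems(visitor)
    return entries


# Small demo file (hypothetical name) with one group and two datasets.
demo_path = os.path.join(tempfile.mkdtemp(), "tree_demo.h5")
with h5py.File(demo_path, "w") as f:
    f.create_dataset("dict/X", data=np.arange(10, dtype=np.int64))
    f.create_dataset("dict/Y", data=np.arange(8, dtype=np.float32))

entries = list_tree(demo_path)
for name, kind, shape, dtype in entries:
    print(name, kind, shape, dtype)
```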

---------------------------------------------------------------------------------------------------------------------

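import h5py
import numpy as np

# Example file name; the original assumes `f` is an already open, writable file.
f = h5py.File("groups.hdf5", "w")

grp = f.create_group("bar")
print(grp.name)      # /bar
subgrp = grp.create_group("baz")
print(subgrp.name)   # /bar/baz

# Intermediate groups are created automatically
grp2 = f.create_group("/some/long/path")
print(grp2.name)     # /some/long/path
grp3 = f['/some/long']
print(grp3.name)     # /some/long

# Datasets can be created from a shape (and optional dtype) ...
dset = f.create_dataset("default", (100,))
dset = f.create_dataset("ints", (100,), dtype='i8')

# ... or initialized from an existing array
arr = np.arange(100)
dset = f.create_dataset("init", data=arr)

f.close()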

---------------------------------------------------------------------------------------------------------------------

def csavariables_write():
    # `info` is a shared state object defined elsewhere in the author's program.
    afile = open("dump.txt", 'w')
    # Header: problem dimensions, then population size
    astring = str(info.ndim) + ' ' + str(info.nx0) + ' ' + str(info.ny0) + '\n'
    afile.write(astring)
    astring = str(info.npop) + '\n'
    afile.write(astring)
    # Static population, one member per line, sorted by static objective
    qlist = []
    for i in range(info.npop):
        qlist.append(info.sobjective[i])
    jindx = sorted(range(len(qlist)), key=lambda i: qlist[i])
    for i0 in range(info.npop):
        i = jindx[i0]
        astring = ' '
        for j in range(info.ndim):
            astring = astring + str(info.staticlist[i][j]) + ' '
        astring = astring + '\n'
        afile.write(astring)
    # Dynamic population, one member per line, sorted by dynamic objective
    qlist = []
    for i in range(info.npop):
        qlist.append(info.dobjective[i])
    jdynamic = sorted(range(len(qlist)), key=lambda i: qlist[i])
    for i0 in range(info.npop):
        i = jdynamic[i0]
        astring = ' '
        for j in range(info.ndim):
            astring = astring + str(info.dynamiclist[i][j]) + ' '
        astring = astring + '\n'
        afile.write(astring)
    # Static objective values on a single line, in sorted order
    astring = ' '
    for i0 in range(info.npop):
        i = jindx[i0]
        astring = astring + str(info.sobjective[i]) + ' '
    astring = astring + '\n'
    afile.write(astring)
    # Dynamic objective values on a single line, in sorted order
    astring = ' '
    for i0 in range(info.npop):
        i = jdynamic[i0]
        astring = astring + str(info.dobjective[i]) + ' '
    astring = astring + '\n'
    afile.write(astring)
    afile.close()

import os
import random
import shutil

def csavariables_load():
    # `info` is a shared state object defined elsewhere in the author's program.
    if os.path.isfile("dump.txt"):
        load = 1
        # On the first call, move aside output files left by a previous run
        if os.path.isfile("objective.txt") and info.lfirst:
            shutil.copyfile("objective.txt", "objective_1.txt")
            os.remove("objective.txt")
        if os.path.isfile("best_model.txt") and info.lfirst:
            shutil.copyfile("best_model.txt", "best_model_1.txt")
            os.remove("best_model.txt")
        if os.path.isfile("csa.txt") and info.lfirst:
            shutil.copyfile("csa.txt", "csa_1.txt")
            os.remove("csa.txt")
        info.lfirst = False
    else:
        load = 0
        # No dump file: start from a random population
        info.staticlist = [[random.random()*0.02+0.49 for j in range(info.ndim)] for i in range(info.npop)]
        info.dynamiclist = [[random.random()*0.02+0.49 for j in range(info.ndim)] for i in range(info.npop)]
        info.sobjective = [random.random()*8e99 for i in range(info.npop)]
        info.dobjective = [random.random()*8e99 for i in range(info.npop)]
        if os.path.isfile("objective.txt") and info.lfirst:
            shutil.copyfile("objective.txt", "objective_old.txt")
            os.remove("objective.txt")
        if os.path.isfile("best_model.txt") and info.lfirst:
            shutil.copyfile("best_model.txt", "best_model_old.txt")
            os.remove("best_model.txt")
        if os.path.isfile("csa.txt") and info.lfirst:
            shutil.copyfile("csa.txt", "csa_1.txt")
            os.remove("csa.txt")
        info.lfirst = False
        return load
    # Restore the state written by csavariables_write()
    afile = open("dump.txt", 'r')
    line = afile.readline()
    info.ndim = int(line.split()[0])
    info.nx0 = int(line.split()[1])
    info.ny0 = int(line.split()[2])
    line = afile.readline()
    info.npop = int(line.split()[0])
    # Allocate the containers, then overwrite them with the saved values
    info.staticlist = [[random.random()*0.02+0.49 for j in range(info.ndim)] for i in range(info.npop)]
    info.dynamiclist = [[random.random()*0.02+0.49 for j in range(info.ndim)] for i in range(info.npop)]
    info.sobjective = [random.random()*8e99 for i in range(info.npop)]
    info.dobjective = [random.random()*8e99 for i in range(info.npop)]
    trialxvector = [random.random()*0.02+0.49 for j in range(info.ndim)]
    for i in range(info.npop):
        line = afile.readline()
        for j in range(info.ndim):
            trialxvector[j] = float(line.split()[j])
        info.staticlist[i] = [trialxvector[j] for j in range(info.ndim)]
    for i in range(info.npop):
        line = afile.readline()
        for j in range(info.ndim):
            trialxvector[j] = float(line.split()[j])
        info.dynamiclist[i] = [trialxvector[j] for j in range(info.ndim)]
    line = afile.readline()
    for i in range(info.npop):
        info.sobjective[i] = float(line.split()[i])
    line = afile.readline()
    for i in range(info.npop):
        info.dobjective[i] = float(line.split()[i])
    afile.close()
    return load

---------------------------------------------------------------------------------------------------------------------

A way to keep appending data to a file, given an output list and a file name:
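The original code for this step is not included in the post. One possible sketch, assuming the data should end up in HDF5, is a resizable dataset that grows along its first axis on every call. The helper name `append_rows`, the file name `log.h5` and the dataset name `results` are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np


def append_rows(filename, name, rows):
    """Append `rows` to dataset `name` in `filename`, creating both if needed."""
    rows = np.asarray(rows, dtype=np.float64)
    with h5py.File(filename, "a") as f:
        if name not in f:
            # maxshape=None on axis 0 makes the dataset extendable;
            # resizable datasets must be chunked.
            f.create_dataset(name, data=rows,
                             maxshape=(None,) + rows.shape[1:], chunks=True)
        else:
            dset = f[name]
            old = dset.shape[0]
            dset.resize(old + rows.shape[0], axis=0)
            dset[old:] = rows


# Usage: two calls, each appending an output list to the same file.
log_path = os.path.join(tempfile.mkdtemp(), "log.h5")
append_rows(log_path, "results", [[1.0, 2.0], [3.0, 4.0]])
append_rows(log_path, "results", [[5.0, 6.0]])

with h5py.File(log_path, "r") as f:
    stored = f["results"][:]
print(stored)
```

Each call reopens the file in append mode, so the data already written stays intact if a later step of a long run fails.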
