digest.py 1.3 KB

import pickle
from hashlib import sha1

# NOTE: `nsc` (the namespace collection that provides `premis`) is defined
# elsewhere in the project and must be imported alongside the modules above.


class Digest:
    '''
    Various digest functions. May be merged into something more generic later.
    '''
    @staticmethod
    def rdf_cksum(g):
        '''
        Generate a checksum for a graph.

        This is not straightforward because a graph is derived from an
        unordered data structure (RDF).

        This method sorts the triples by subject, predicate and object, then
        pickles the sorted list and computes a checksum of the result.

        N.B. The context of the triples is ignored, so isomorphic graphs would
        have the same checksum regardless of the context(s) they are found in.

        N.B. Any messageDigest triples are removed from `g` in place.

        @TODO This can be later reworked to use a custom hashing algorithm.

        @param rdflib.Graph g The graph to be hashed.

        @return string SHA1 checksum.
        '''
        # Remove the messageDigest property, which at this point is very
        # likely old. rdflib treats None, not Variable, as the wildcard in a
        # triple pattern.
        g.remove((None, nsc['premis'].messageDigest, None))

        ord_g = sorted(g, key=lambda x: (x[0], x[1], x[2]))
        # Avoid shadowing the built-in hash().
        cksum = sha1(pickle.dumps(ord_g)).hexdigest()

        return cksum

    @staticmethod
    def non_rdf_checksum(data):
        '''
        Generate a checksum of non-RDF content.

        @TODO This can be later reworked to use a custom hashing algorithm.

        @param bytes data The content to be hashed.

        @return string SHA1 checksum.
        '''
        return sha1(data).hexdigest()
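
The sort-then-hash idea described in the rdf_cksum docstring can be tried without rdflib. The following is only a minimal sketch under stated assumptions: triples are plain tuples of strings, `triples_cksum` and `DIGEST_PRED` are hypothetical names that do not exist in this module, and the sorted triples are serialized as delimited UTF-8 text instead of a pickle, since pickle output is tied to the protocol version in use.

```python
from hashlib import sha1

# Hypothetical stand-in for nsc['premis'].messageDigest; not the project's value.
DIGEST_PRED = 'premis:messageDigest'


def triples_cksum(triples, skip_pred=DIGEST_PRED):
    '''
    Return a SHA1 hex digest of an iterable of (s, p, o) string tuples that
    does not depend on iteration order. Triples whose predicate equals
    `skip_pred` are left out, and the input itself is not modified.
    '''
    digest = sha1()
    # Sorting tuples compares subject first, then predicate, then object.
    for t in sorted(t for t in triples if t[1] != skip_pred):
        # NUL separates the terms and a newline closes each triple.
        digest.update('\0'.join(t).encode('utf-8') + b'\n')
    return digest.hexdigest()


# The same statements in a different container and order, one copy carrying
# a stale digest triple.
a = {
    ('ex:res', 'dc:title', 'Hello'),
    ('ex:res', 'rdf:type', 'ldp:Resource'),
    ('ex:res', DIGEST_PRED, 'urn:sha1:stale'),
}
b = [
    ('ex:res', 'rdf:type', 'ldp:Resource'),
    ('ex:res', 'dc:title', 'Hello'),
]

print(triples_cksum(a) == triples_cksum(b))
print(triples_cksum(b) == triples_cksum(b + [('ex:res', 'dc:title', 'Bye')]))
```

Filtering the digest predicate while iterating, rather than removing it from the input, leaves the caller's data untouched. Whether that is preferable to the in-place removal used by rdf_cksum depends on how the graph is used afterwards.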