Environment
- Python 2.7.6 (see: Installing the Python Driver for MongoDB)
- MongoDB 2.6.7 (see: Installing MongoDB on Ubuntu 14.04)
- Ubuntu 14.04
JSON Support for Python
Official Documentation: Simplejson is a simple, fast, complete, correct and extensible JSON encoder and decoder for Python 2.5+ and Python 3.3+. It is pure Python code with no dependencies, but includes an optional C extension for a serious speed boost.
Install simplejson using pip (the package name has no hyphen):
sudo pip install simplejson
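Because simplejson mirrors the API of the standard library's json module, a script can prefer it when it is installed and fall back to the built-in module otherwise. A minimal sketch of that idiom follows; the sample record is made up for illustration and is not part of the import script.

```python
# Prefer simplejson (optional C speedups) and fall back to the standard library.
try:
    import simplejson as json
except ImportError:
    import json

# Hypothetical sample record, used only to show the round trip.
record = json.loads('{"name": "item1", "rank": 1}')
encoded = json.dumps(record, sort_keys=True)
print(encoded)
```

Either module exposes the same `loads` and `dumps` calls, so the rest of a script does not need to know which one was imported.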
Writing to MongoDB
# -*- coding: utf-8 -*-
import argparse
import fnmatch
import json
import os
import sys

import pymongo
from pymongo import MongoClient

## ARGPARSE USAGE
## <https://docs.python.org/2/howto/argparse.html>
parser = argparse.ArgumentParser(description="Import records into MongoDB")
group = parser.add_mutually_exclusive_group()
group.add_argument("-v", "--verbose", action="store_true")
group.add_argument("-q", "--quiet", action="store_true")
parser.add_argument("max", type=int, help="the maximum records to import", default=sys.maxint)
parser.add_argument("path", help="The input path for importing. This can be either a file or directory.")
parser.add_argument("db", help="The MongoDB name to import into.")
parser.add_argument("collection", help="The MongoDB collection to import into.")
args = parser.parse_args()

## RETRIEVE files from filesystem
def getfiles(path):
    if len(path) <= 1:
        print "!Please Supply an Input File"
        return []
    try:
        input_path = str(path).strip()
        if not os.path.exists(input_path):
            print "!Input Path does not exist (input_path = ", input_path, ")"
            return []
        # a single file is returned as-is
        if not os.path.isdir(input_path):
            if args.verbose:
                print "*Input Path is Valid (input_path = ", input_path, ")"
            return [input_path]
        # a directory is searched recursively for *.json files
        matches = []
        for root, dirnames, filenames in os.walk(input_path):
            for filename in fnmatch.filter(filenames, '*.json'):
                matches.append(os.path.join(root, filename))
        if len(matches) > 0:
            if args.verbose:
                print "*Found Files in Path (input_path = ", input_path, ", total-files = ", len(matches), ")"
            return matches
        print "!No Files Found in Path (input_path = ", input_path, ")"
    except ValueError:
        print "!Invalid Input (input_path, ", input_path, ")"
    return []

## IMPORT records into mongo
def read(jsonFiles):
    client = MongoClient('mongodb://localhost:27017/')
    db = client[args.db]
    counter = 0
    for jsonFile in jsonFiles:
        # stop before opening further files once the limit is reached
        if counter >= args.max:
            break
        with open(jsonFile, 'r') as f:
            for line in f:
                # load valid lines (should probably use rstrip)
                if len(line) < 10:
                    continue
                try:
                    db[args.collection].insert(json.loads(line))
                    counter += 1
                except pymongo.errors.DuplicateKeyError as dke:
                    if args.verbose:
                        print "Duplicate Key Error: ", dke
                except ValueError as e:
                    if args.verbose:
                        print "Value Error: ", e
                # friendly log message
                if 0 == counter % 100 and 0 != counter and args.verbose:
                    print "loaded line: ", counter
                if counter >= args.max:
                    break
    # the with block closes each file; close the client connection here
    client.close()
    if 0 == counter:
        print "Warning: No Records were Loaded"
    else:
        print "loaded a total of ", counter, " lines"

## EXECUTE
files = getfiles(args.path)
read(files)
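The script inserts every input line that parses as JSON into the given collection as one document; lines shorter than 10 characters are skipped.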
Command line usage is:
python import.py 1000 /media/data/records.json mydb mycollection -v
The -v flag is optional and will log in a verbose manner to the console.
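To see how the example command line maps onto the script's arguments, the parser can be exercised on its own by handing `parse_args` an explicit list instead of reading `sys.argv`. This is only a sketch: the `build_parser` helper is a hypothetical wrapper around the same argument definitions and does not appear in the script.

```python
import argparse


def build_parser():
    """Hypothetical helper that rebuilds the import script's argument parser."""
    parser = argparse.ArgumentParser(description="Import records into MongoDB")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-v", "--verbose", action="store_true")
    group.add_argument("-q", "--quiet", action="store_true")
    parser.add_argument("max", type=int, help="the maximum records to import")
    parser.add_argument("path", help="input file or directory")
    parser.add_argument("db", help="the MongoDB database name")
    parser.add_argument("collection", help="the MongoDB collection name")
    return parser


# Same tokens as the command line above, minus "python import.py".
args = build_parser().parse_args(
    ["1000", "/media/data/records.json", "mydb", "mycollection", "-v"]
)
print(args.max, args.path, args.db, args.collection, args.verbose, args.quiet)
```

Because `max` is a positional argument, it has to be supplied on every run, and a `default` value on it never takes effect. Declaring it with `nargs="?"` would be one way to make it optional.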
Other Considerations
I've noticed that Twitter data from the GNIP firehose can be imported directly into MongoDB.
On the other hand, Java objects serialized into JSON using the GSON package need to be restructured. For example, an array of objects serialized using GSON will look like this:
[
{ "name" : "item1" },
{ "name" : "item2" },
{ "name" : "item-n" }
]
If you use a web validator / formatter, such as JsonEditorOnline, this output will be parsed correctly.
However, this syntax cannot be imported as-is. The import script above reads one JSON document per line, so the data needs to look like this instead:
{ "name" : "item1" }
{ "name" : "item2" }
{ "name" : "item-n" }
Note the absence of the commas that separated the items and of the square brackets at the beginning and end of the structure.
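The restructuring described above can be sketched as a small conversion step that parses the array and writes each element on its own line. The function name `array_to_lines` and the sample input are assumptions made for illustration. Note that Python's json module only accepts strict JSON, so keys must be double-quoted.

```python
import json


def array_to_lines(array_text):
    """Convert a JSON array of objects into one JSON document per line."""
    items = json.loads(array_text)
    if not isinstance(items, list):
        raise ValueError("expected a top-level JSON array")
    # json.dumps never emits raw newlines, so each document stays on one line.
    return "".join(json.dumps(item) + "\n" for item in items)


# Hypothetical sample input in the array form shown above.
sample = '[{"name": "item1"}, {"name": "item2"}, {"name": "item-n"}]'
converted = array_to_lines(sample)
print(converted)
```

For a large export, reading the whole array into memory may not be practical. This sketch assumes the file fits comfortably in RAM.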
MacOS
The instructions don't vary greatly. I prefer to use a virtualenv in my local dev environment. Virtualenv is described in this blog post.
Set up the virtualenv on the terminal:
virtualenv --system-site-packages .
source bin/activate
Once inside the virtualenv, install pymongo:
(data-imdb-populate-mongo)~/workspaces/data-imdb-populate-mongo$ pip install pymongo
Collecting pymongo
  Downloading pymongo-3.2-cp27-none-macosx_10_8_intel.whl (263kB)
    100% |████████████████████████████████| 266kB 1.4MB/s
Installing collected packages: pymongo
Successfully installed pymongo-3.2
References
- Python Argparse
- The first part of this program uses argparse to access the command line arguments from the user to the program
- [Official Documentation] PyMongo Tutorial
- This tutorial is intended as an introduction to working with MongoDB and PyMongo
- Unix ULIMIT settings
- I've noticed the bulk insert with PyMongo has a tendency to run out of memory. This details a method for limiting and controlling the usage of system resources that might help.
- [StackOverflow] PyMongo Bulk Insert Runs out of memory
- [MongoDB JIRA] Bug Report (fixed)
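Besides limiting system resources, one way to keep memory bounded during a large import is to send records in fixed-size batches rather than building one huge list. The sketch below is an assumption about how that could look, not the approach taken in the linked posts. The helper names `batches` and `insert_in_batches` are hypothetical, and a plain list stands in for the database so the example runs without a server.

```python
def batches(records, size):
    """Yield lists of at most `size` records from any iterable."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Final, possibly shorter, batch.
        yield batch


def insert_in_batches(records, insert_batch, size=1000):
    """Call `insert_batch` once per batch and return the number of records sent."""
    total = 0
    for batch in batches(records, size):
        insert_batch(batch)
        total += len(batch)
    return total


# Stand-in for a real collection: a list whose extend() receives each batch.
stored = []
total = insert_in_batches(({"n": i} for i in range(2500)), stored.extend, size=1000)
print(total)
```

With PyMongo 3.0 or later, the `insert_many` method of a collection could be passed as `insert_batch`. Only one batch is held in memory at a time because the input is consumed lazily.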
Really nice post. Especially the exceptions you are catching in your try/except; they can be a pain.
I'm getting the following error message and my one json file (myfile.json) isn't importing:
TypeError: 'unicode' object does not support item assignment
It also says the following early in the error stream:
Value Error: Extra data: line 1 column 9 - line 2 column 1 (char 8 - 20)
Value Error: Extra data: line 1 column 13 - line 2 column 1 (char 12 - 16)
Value Error: Extra data: line 1 column 13 - line 2 column 1 (char 12 - 26)
Value Error: Extra data: line 1 column 11 - line 2 column 1 (char 10 - 24)
...etc.
And keeps going for about 20 lines.
Not sure what the problem may be.
Figured it out. The section in the read(jsonFiles) function needs to be...
-----------
for jsonFile in jsonFiles:
    with open(jsonFile) as f:
        data = f.read()
        jsondata = json.loads(data)
        try:
            db[args.collection].insert(jsondata)
            counter += 1
        etc.
-----------
I tried this with importing a single json document and it worked.