The RDF Import feature of RDFme lets you populate Drupal with data extracted from local or remote RDF datasets. This feature is still in beta. The main difference between it and the Node import module is the input data format: Node import is a mature and efficient module, whereas RDF/XML combined with RDFme's mappings system offers considerably more flexibility.
If configured properly, RDFme should be able to import even very large datasets quickly and painlessly. Nevertheless, there are still some known issues and simplifications. The following guide is a set of hints on how to prepare your mappings so that RDF imports run without problems.
Limitations and datatype-specific settings:
- [Mappings] In the current version (1.4 and earlier) there is no separate set of mappings for export and import. Therefore, first save your existing mappings to export_mappings.xml, then change them to fit the data being imported.
- [Post Authors] During import, the users (as in post authors) referenced in the imported data are ignored. The author of all the imported nodes/comments is set to Anonymous by default.
- [CCK fields] All CCK fields are handled normally, but there is special treatment for non-CCK fields added by popular modules (e.g. the comments module adds a “map_comment” field to every node). Here is a list of the special non-CCK fields that are handled (by module):
- [Taxonomy] If the nodes reference Drupal taxonomies, the vocabulary and terms have to be created manually in Drupal beforehand.
- [Comment] If a mapping exists for the comment id (cid), the import first tries to update the existing comment; if the update fails (e.g. because of an invalid id), the comment is saved to the database with a new id. (Warning: in RDFme 1.3 and lower, comments are not saved at all when the update fails; please update to RDFme v1.4.)
- [Workflow] In RDFme v1.4, workflow states are imported correctly only if the ‘workflow’ module is present and the given workflow state name exactly matches the name in the database (workflow_states.state). Adjust the state names accordingly before importing.
- [Voting API] If the value lies strictly between 0 and 1 (0 < value < 1), the rating is treated as a percentage; otherwise it is treated as points. Voting API is fairly simple, so the integration should work without many problems. Be advised, however, that voting widgets use the API in different ways (e.g. Fivestar recognises only percent, not points).
- [OPAL] All fields from OPAL are imported normally; however, the values are recalculated anyway on the first visit to each node.
- [Date] Date is a special CCK field whose value must be in the proper format in order to be imported correctly.
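As an illustration of two of the rules in the list above, here is a minimal Python sketch (RDFme itself is a Drupal/PHP module, so this is not its actual code). The function names `vote_value_type` and `save_comment`, and the in-memory dictionary standing in for the comments table, are hypothetical and exist only for this example.

```python
from typing import Dict, Optional


def vote_value_type(value: float) -> str:
    """Classify a rating as described for Voting API:
    strictly between 0 and 1 means percent, anything else means points."""
    return "percent" if 0 < value < 1 else "points"


def save_comment(store: Dict[int, str], body: str, cid: Optional[int] = None) -> int:
    """Mimic the v1.4 comment rule: try to update the comment with the
    mapped cid; if that is not possible, save it under a new id.

    `store` is a hypothetical stand-in for the comments table."""
    if cid is not None and cid in store:
        store[cid] = body          # update succeeded
        return cid
    new_cid = max(store, default=0) + 1
    store[new_cid] = body          # update failed or no cid mapped: insert
    return new_cid
```

For example, `vote_value_type(0.8)` is classified as percent while `vote_value_type(4)` is classified as points, and calling `save_comment` with an id that does not exist falls back to inserting a new comment instead of dropping it.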
Importing large datasets:
There are a number of limitations and platform-specific settings that you should take care of before starting to work with large datasets.
[php.ini] Execution time settings:
- max_execution_time
- max_input_time
[php.ini] Memory size settings:
- memory_limit
- upload_max_filesize
- post_max_size
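A php.ini fragment touching the directives listed above might look like the following. The directive names are standard PHP settings, but every value here is an illustrative assumption, not a recommendation from RDFme; choose values that match your server and the size of your dataset.

```ini
; Illustrative values only (assumptions for this example).

; Maximum time in seconds a script is allowed to run.
max_execution_time = 3600
; Maximum time in seconds a script may spend parsing input data such as uploads.
max_input_time = 600

; Maximum memory a script may allocate; should be larger than post_max_size.
memory_limit = 512M
; Maximum size of POST data; must be larger than upload_max_filesize.
post_max_size = 128M
; Maximum size of an uploaded file, e.g. the RDF/XML file to import.
upload_max_filesize = 100M
```

Depending on how PHP is deployed, the web server or PHP process usually has to be restarted before changed values take effect.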
Performance and statistics:
The performance of the import feature depends heavily on the characteristics of the data (primarily the size of the RDF/XML file, but also more specific metrics, e.g. how much text the ideas contain, how many comments and triples there are, how many attributes each idea has, etc.).
Furthermore, the results depend very much on the hardware capabilities. The table below is only meant to give a rough idea of how fast the plugin can handle different datasets.
Source | # Ideas | # Comments | # Triples | Total time | Time per idea (average) | Time per idea (max) | Time per idea (min)
ETSIT Ideas | 16 | 6 | 1,237 | 24.17s | 1.508s | 2.354s | 1.022s
Dell IdeaStorm | 9,851 | 65,222 | 520,330 | 12h 37m 20.39s | 3.194s | 35.837s | 1.947s
myStarbucks Ideas | 10,949 | 21,870 | 194,086 | 6h 2m 23.022s | 1.859s | 4.389s | 1.109s
Cisco I-Prize | 826 | 7,728 | 133,413 | 8m 3.99s | 0.341s | 0.787s | 0.239s
Acrobat Ideas | 579 | 767 | 17,859 | 1m 1.362s | 0.097s | 0.286s | 0.064s
*The above tests were run on a desktop computer with a 2 GHz Core 2 Duo and 2 GB of RAM.
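To get a rough feel for how long your own import might take, one option is to scale a reference run from the table by the number of ideas. The Python sketch below does this. The helpers `parse_duration` and `estimate_import_seconds` are hypothetical names introduced for this example, and the linear scaling is an assumption: as noted above, real timings vary strongly with the data and the hardware.

```python
import re

# Seconds represented by each unit suffix used in the table.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0}


def parse_duration(text: str) -> float:
    """Convert a duration such as '12h 37m 20.39s' into seconds."""
    total = 0.0
    for number, unit in re.findall(r"(\d+(?:\.\d+)?)\s*([hms])", text):
        total += float(number) * _UNIT_SECONDS[unit]
    return total


def estimate_import_seconds(reference_total: str,
                            reference_ideas: int,
                            planned_ideas: int) -> float:
    """Scale a reference run linearly by the number of ideas.

    This is a crude assumption: it ignores differences in comments,
    triples, text length and hardware."""
    seconds_per_idea = parse_duration(reference_total) / reference_ideas
    return seconds_per_idea * planned_ideas


if __name__ == "__main__":
    # Estimate a 2,000-idea import using the myStarbucks Ideas row as reference.
    seconds = estimate_import_seconds("6h 2m 23.022s", 10949, 2000)
    print(f"Estimated import time: {seconds / 60:.1f} minutes")
```

Using the total time rather than the per-idea average as the reference keeps any fixed overhead of the run in the estimate, which errs on the side of a longer prediction.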