orbiter
b0b7a4f9a5
- added function to the OAI-PMH reader that pulls all records from a server by evaluating the resumption token to build the URL for retrieving the remaining records
...
- added monitoring for retrieved records
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6444 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
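A minimal sketch of the resumption-token evaluation described in the commit above, not the YaCy implementation. It assumes the standard OAI-PMH convention that a `ListRecords` response carries a `<resumptionToken>` element and that the next page is requested with `verb=ListRecords&resumptionToken=...`. The class name `ResumptionTokenSketch`, its methods, and the injected `fetch` function are hypothetical; the regex is a simplification of real XML parsing.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class ResumptionTokenSketch {
    // Matches <resumptionToken ...>token</resumptionToken>; a self-closing
    // or empty element yields no token, which marks the end of the list.
    private static final Pattern TOKEN =
        Pattern.compile("<resumptionToken[^>]*>([^<]*)</resumptionToken>");

    // Returns the token in the response, or null if there is none.
    static String extractToken(String xml) {
        Matcher m = TOKEN.matcher(xml);
        if (!m.find()) return null;
        String token = m.group(1).trim();
        return token.isEmpty() ? null : token;
    }

    // Builds the URL of the next page, or null when the list is complete.
    static String nextUrl(String baseUrl, String xml) {
        String token = extractToken(xml);
        if (token == null) return null;
        return baseUrl + "?verb=ListRecords&resumptionToken="
            + URLEncoder.encode(token, StandardCharsets.UTF_8);
    }

    // Follows resumption URLs until no token is left. 'fetch' maps a URL to
    // the response body (an assumption, so the sketch runs without a network).
    static List<String> harvest(String baseUrl, String startUrl,
                                Function<String, String> fetch) {
        List<String> pages = new ArrayList<>();
        String url = startUrl;
        while (url != null) {
            String xml = fetch.apply(url);
            pages.add(xml);
            url = nextUrl(baseUrl, xml);
        }
        return pages;
    }
}
```

A real harvester would also want a page limit or a repeated-token check, since this loop trusts the server to eventually stop sending tokens.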
orbiter
350d13e153
very first working version of the OAI-PMH importer: given the right URL, the importer can read and index ListRecords XML files and calculate the correct resumption URL, which is then offered as the next default start point in the importer's URL input.
...
no automatic harvesting yet; this will be done later
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6443 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
58616d99e4
patch for yacy disk usage detection on lvm host
...
by Michael S.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6442 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
79251e6f60
configurable disk space hardlimit for dht
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6441 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
a0e891c63d
- some redesign in UI menu structure to make room for new 'Content Integration' main menu containing import servlets for Wikimedia Dumps, phpbb3 forum imports and OAI-PMH imports
...
- extended the OAI-PMH test applet and integrated it into the menu. It still does not import OAI-PMH records, but shows that it can read and parse this data
- some redesign in ZURL storage: refactoring of access methods, better concurrency, less synchronization
- limited the LURL metadata database table cache to 20 million entries: until now this cache was bounded only by the available RAM, which may have caused memory-leak-like behavior.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6440 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
4240785f20
added anti-alias function for line drawing
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6438 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
30f108f97d
added stub of oai-pmh importer (not working yet)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6437 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
77c99e500f
added more control over memory allocation
...
should avoid some of the OOMs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6436 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
52470d0de4
- fix for xls parser
...
- fix for image parser
- temporary integration of images as document types in the crawler and indexer for testing of the image parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6435 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
5e8038ac4d
- refactoring of blacklists
...
- refactoring of event origin encoding
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6434 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
26fafd85a5
- more refactoring
...
- fixed problem with parsers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6433 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
3528b970d6
- refactoring
...
- added new experimental (not-yet-working) image parser
- added new test image
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6431 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
a8ce192f63
- shifted main classes to new package net.yacy
...
- fixed some bugs in last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6427 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b79f4f062f
refactoring of yacy documents and parsers: they now depend only on the kelondro classes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6426 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
hermens
0fd9540866
Configuration of HTTPDProxyHandler logging
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6425 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
cee7a05ff2
- de-serialized the pdf parser
...
- added fail callback for file indexer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6415 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
9db928ce53
replaced fontbox 0.7.3 with fontbox 0.8.0
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6414 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c2272785c7
- fix for xlsx and pptx parsing
...
- less exception logging for swf parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6413 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c864901087
- moved httpd.mime to defaults path
...
- some documentation fixes
- adapted a default setting for the search window: moved the css setting to base.css
- some enhancements for the DocumentIndex class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6410 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
low012
8829ec5f18
*) made sure that is replaced with a space and not just deleted in CharacterCoding.java
...
*) added annotations and made minor changes to serverObjects.java
*) set subversion properties for several files
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6409 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
6c347a37eb
more options for DocumentIndex
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6408 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
6192205533
more final modifiers
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6407 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
0f6b011e1a
fix for new index location and better way to use own classes by reflection
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6406 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
7a3bbd950f
:-(
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6405 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b953f04f90
one more reflection fix
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6404 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
77d6604856
fix for npe, see http://forum.yacy-websuche.de/viewtopic.php?p=17727#p17727
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6403 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
2a7fe35f92
performance tuning using more final modifiers in the kelondro core
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6402 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
cb4de9ceee
fixed a bug in table iterator (did not recognize elements in write buffer)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6401 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
e7f18ba24b
refactoring
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6399 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
ce8dc575ca
refactoring
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6398 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
bea3b99aff
moved table and util classes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6397 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
bd876eb4b7
moved io classes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6396 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c0e0e1f422
moved blob classes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6395 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
1e4f8b56ed
accumulated classes from different packages into the new rwi package
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6394 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
194da25a2f
moved kelondro index
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6393 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
4446acc8cd
moved kelondro order
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6392 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
f677d534b1
start of a really extensive refactoring which will produce a hierarchical package structure with the domain yacy.net as package root
...
- moved here the logging classes as part of the new net.yacy.kelondro package
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6391 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
ea473e32b8
refactoring
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6390 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
735e2737e3
* added index segments
...
This is a major change in the organization of indexes.
Please consider a back-up of your data before you run this update.
All existing index files will be moved and renamed to a new position.
With this change, it will be possible to maintain different indexes for different purposes and to distinguish between DHT-in and DHT-out specific indexes. Tenants may also have their own index, and it may be possible to have histories and back-ups of indexes. This is just the beginning: many servlets must be adapted after this change, but all functions that had been there should still work.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6389 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
09de5da74a
once again a performance hack
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6388 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
2f6d88403e
uä
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6387 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d2615ea5a8
increased memory for scraper buffer to enhance parsing speed
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6386 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
4bbbb74ec4
removed unnecessary synchronization
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6385 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
hermens
67e5464cc2
Fix for SVN6380: x[] Arrays are unsuitable Keys for Maps without using a proper Comparator.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6384 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
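A minimal sketch of the problem named in the commit above, not the code from that change. Java arrays inherit identity-based `equals` and `hashCode`, so two arrays with the same content are different keys in a `HashMap`; a sorted map with a content-based comparator treats them as equal. The class name `ArrayKeySketch` and the choice of `byte[]` keys are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demonstration class, for illustration only.
class ArrayKeySketch {
    // HashMap compares array keys by identity, so a second array with
    // the same content is not found.
    static boolean hashMapFinds() {
        Map<byte[], String> map = new HashMap<>();
        map.put(new byte[] {1, 2, 3}, "chunk");
        return map.containsKey(new byte[] {1, 2, 3});
    }

    // TreeMap with a content-based comparator finds the equal-content key.
    static boolean treeMapFinds() {
        Map<byte[], String> map = new TreeMap<>((a, b) -> Arrays.compare(a, b));
        map.put(new byte[] {1, 2, 3}, "chunk");
        return map.containsKey(new byte[] {1, 2, 3});
    }
}
```

Here `hashMapFinds()` returns `false` and `treeMapFinds()` returns `true`.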
hermens
aeab8c7917
Prevent failed DHT attempts from overwriting newer peer info
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6382 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
hermens
9324b5b6c5
Enhancements to DHT
...
- speed up deletion of containers when selected from whole index
- correctly eliminate all references to unavailable URLs, not just the first encountered
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6381 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
hermens
e49e2d75fe
Limit the time Transmission.Chunks stay in the transmissionCloud by using a Map that iterates entries in insertion order.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6380 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
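A minimal sketch of the idea in the commit above, assuming a `LinkedHashMap` as the insertion-ordered map; it is not the actual `transmissionCloud` code. Because iteration starts at the oldest entry, expiry can stop at the first entry that is still young enough. The class name `ExpiringCloudSketch`, its methods, and the caller-supplied timestamps are hypothetical, and the timestamps are assumed to be non-decreasing.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical container, for illustration only.
class ExpiringCloudSketch {
    // Key -> insertion time; LinkedHashMap iterates in insertion order.
    private final Map<String, Long> insertedAt = new LinkedHashMap<>();
    private final long maxAgeMillis;

    ExpiringCloudSketch(long maxAgeMillis) {
        this.maxAgeMillis = maxAgeMillis;
    }

    // Remove first so a re-added key moves to the end of the order;
    // a plain put() on an existing key would keep its old position.
    void add(String key, long nowMillis) {
        insertedAt.remove(key);
        insertedAt.put(key, nowMillis);
    }

    // Drops entries older than maxAgeMillis, oldest first, and stops at
    // the first entry that has not expired. Returns the number removed.
    int evictExpired(long nowMillis) {
        int removed = 0;
        Iterator<Map.Entry<String, Long>> it = insertedAt.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().getValue() <= maxAgeMillis) break;
            it.remove();
            removed++;
        }
        return removed;
    }

    int size() {
        return insertedAt.size();
    }
}
```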
orbiter
92db7c5d07
increased timeout for index retrieval
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6379 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
386b9f35f6
activated resource observer for windows 7
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6378 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
6e0dc39a7d
- some fixes to prevent blocking situations
...
- better logging for the crawler
- better default values for the crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6377 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago