lotus
85ca96227f
fix for re-enable parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6643 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
lotus
38a3d55afd
added more possible php extensions for html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6621 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
69c29acb6e
no exception thread dump if parser cannot parse because that mime-type/extension is in the deny-set
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6611 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
56e0d9bd01
- tests with image parser
- added image size as part of parsed text in images
- avoid unnecessary error messages if parsing of documents failed but one succeeded
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6597 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
7d400b17d0
html parser support for .cfm files
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6590 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
f6731c6240
more logging etc.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6589 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
007f8297de
added php3 as extension type for html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6588 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
5df628a2a4
- added BEncoder class
- added BEncodedHeap class that encodes B data structures and stores that to a heap
- refactoring of MapView, this is now named MapHeap to fit into the naming scheme of the BEncodedHeap
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6579 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
82f57f79e5
more PMD enhancements
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6576 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
a06f7ddb33
more PMD recommendations
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6572 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
66c0a8e849
more PMD recommendations
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6567 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
2113fcd7e5
- fixed usage of isEmpty() which is not available in java 1.5
- increased visibility of some methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6564 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
dd459281c8
applied code changes that are recommended by PMD
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6563 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
3f771d2a16
fix for rss parser: be lazy when rss is not well-formed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6552 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
dff4f95c78
some patches to get the torrent parser working
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6551 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
fbd24c2d84
integrated the torrent parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6547 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
bd32f8b8cb
added a torrent metadata file parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6546 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
a37878b7d5
url parser regex performance hack
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6524 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
8281e29963
- more configuration for profiling graph (number of events)
- more logging for a shutdown: print reason and accessing IP into log
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6520 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
e34e63a039
preset of proper HashMap dimensions: should prevent re-hashing and increase performance
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6511 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
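The HashMap presizing described in commit e34e63a039 can be sketched roughly as follows. This is an illustration only, not the code from that commit: the class name `PresizedMapSketch` and the helper `capacityFor` are hypothetical, and the sketch assumes the default load factor of 0.75.

```java
import java.util.HashMap;
import java.util.Map;

public class PresizedMapSketch {

    // Hypothetical helper: an initial capacity large enough that `expected`
    // entries fit without a resize at the default load factor of 0.75.
    static int capacityFor(int expected) {
        return (int) (expected / 0.75f) + 1;
    }

    // Builds a map whose table is sized once, up front, instead of being
    // re-hashed several times while it grows from the default capacity.
    static Map<String, Integer> build(int expected) {
        Map<String, Integer> map = new HashMap<String, Integer>(capacityFor(expected));
        for (int i = 0; i < expected; i++) {
            map.put("key" + i, i);
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(build(1000).size());
    }
}
```

Whether presizing pays off depends on how well the expected entry count is known in advance; an overestimate trades re-hashing time for memory.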
orbiter
4a5100789f
replaced _all_ size() == 0 with isEmpty() and all size() > 0 with !isEmpty(). The isEmpty() method is much faster in some cases, especially when used to access badly balanced hashtables where a size() operation becomes a large iteration.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6510 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
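The replacement pattern from commit 4a5100789f looks roughly like this. The sketch uses `ConcurrentLinkedQueue` as the example collection because its `size()` is documented to traverse all elements while `isEmpty()` is not; that choice is an assumption made for illustration and is not taken from the commit, which refers to hashtables. The class and method names are hypothetical.

```java
import java.util.Collection;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class IsEmptySketch {

    // Old pattern: may count every element just to compare the result with zero.
    static boolean hasEntriesOld(Collection<?> c) {
        return c.size() > 0;
    }

    // New pattern: only asks whether at least one element exists.
    static boolean hasEntries(Collection<?> c) {
        return !c.isEmpty();
    }

    public static void main(String[] args) {
        Queue<Integer> queue = new ConcurrentLinkedQueue<Integer>();
        for (int i = 0; i < 100000; i++) {
            queue.add(i);
        }
        // Both calls give the same answer; only the amount of work differs.
        System.out.println(hasEntriesOld(queue) == hasEntries(queue));
    }
}
```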
orbiter
491ba6a1ba
- some refactoring in workflow
- some refactoring in search process
- fixed image search for json and rss output
- search navigation on bottom of search result page in cases where there are more than 6 results on page
- fixes for number of displayed documents
- disabled pseudostemming
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6504 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
969123385b
added json and rss output for image search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6503 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
d183f8d980
refactoring (moved code from ContentTransformer to TemplateEngine)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6498 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
4df88a4e7a
- fixes for missing or bad hashCode computation
- fixes for bad equals() methods that had not been used by hash maps, so some classes did not work as objects in hash maps.
- this may also affect some cases where double-checks should have taken place but did not work.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6495 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
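The kind of problem fixed in commit 4df88a4e7a can be illustrated with a minimal sketch. One common way an `equals()` method ends up unused by hash maps is declaring it with a parameter type other than `Object`, or omitting a matching `hashCode()`; whether that was the exact defect here is an assumption. The class `UrlKey` and its fields are hypothetical and do not come from the YaCy sources.

```java
import java.util.HashSet;
import java.util.Set;

public class EqualsHashCodeSketch {

    // Hypothetical key class with a consistent equals()/hashCode() pair.
    static final class UrlKey {
        private final String host;
        private final int port;

        UrlKey(String host, int port) {
            this.host = host;
            this.port = port;
        }

        // Must take Object: an equals(UrlKey) overload is never called by HashMap/HashSet.
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof UrlKey)) return false;
            UrlKey other = (UrlKey) o;
            return port == other.port && host.equals(other.host);
        }

        // Equal objects must produce equal hash codes, or lookups miss.
        @Override
        public int hashCode() {
            return 31 * host.hashCode() + port;
        }
    }

    // A double-check of the kind the commit mentions: true if the key was seen before.
    static boolean isDuplicate(Set<UrlKey> seen, UrlKey key) {
        return !seen.add(key);
    }

    public static void main(String[] args) {
        Set<UrlKey> seen = new HashSet<UrlKey>();
        System.out.println(isDuplicate(seen, new UrlKey("example.org", 80)));
        System.out.println(isDuplicate(seen, new UrlKey("example.org", 80)));
    }
}
```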
orbiter
dbdf2570ba
added comparator and more fixes for SortStack/SortStore
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6494 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
d2938c44a1
- added bmp parser to the document parsers
- image parsers that implement the document parser interface return the image itself in the document's list of images, so that parsed images contribute to the image search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6493 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
06d0dcde20
more enhancements to image search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6490 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
4c6312d103
enhanced image search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6489 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
2d8f3ee301
some performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6488 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
1a146b0d73
added a patch to ignore bad mime-ignore patterns
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6474 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
29fe436e36
- fixed post-ranking including prefer mask
- enhanced a core database access method / less wasted ram
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6473 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
a97fdb4566
catch for NPE in image parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6470 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
cd6745b292
accept rss feeds without channel descriptions
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6464 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
08f1cbb125
another update to the pdf parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6463 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
605e896d6c
more details for exception catching when parsing pdfs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6461 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
4431b9767e
added about 450 replacements for printStackTrace() methods to pipe such traces into the log at DATA/LOG/
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6458 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
19f31bb043
- moved OAI-PMH source list file from SETTINGS to DICTIONARIES/harvesting
- added convenience method for loading of files from the web in LoaderDispatcher
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6455 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
11f7da06ed
- fixes to csv parser
- automatic OAI-PMH import by just clicking on one link from the provided resource list
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6449 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
9b6762ec2e
- added a csv "comma separated values" parser to parse OAI-PMH sources from http://roar.eprints.org/index.php?action=csv
- integrated the csv parser into the crawler's parser list
- added an extension to the OAI-PMH import function to download and show the roar csv file using the csv parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6448 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
176e334aa4
fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6446 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
2fa6bf440b
workflow update to OAI-PMH importer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6445 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
b0b7a4f9a5
- added a function to the OAI-PMH reader that can pull all records from a server by evaluating the resumption token to get the URL for retrieving the remaining records
- added monitoring for retrieved records
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6444 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
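The resumption-token evaluation mentioned in commit b0b7a4f9a5 can be sketched as below. This is not the YaCy implementation: the class `ResumptionSketch` and the method `nextUrl` are hypothetical, and the regular expression is a simplification that does not decode XML entities inside the token. It relies only on the OAI-PMH rule that a follow-up request carries the verb and the `resumptionToken` argument, and that an empty token marks the end of the list.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResumptionSketch {

    // Simplified extraction; a real reader would use an XML parser.
    private static final Pattern TOKEN =
        Pattern.compile("<resumptionToken[^>]*>([^<]+)</resumptionToken>");

    // Returns the URL for the next batch of records, or null if the list is complete.
    static String nextUrl(String baseUrl, String responseXml) {
        Matcher m = TOKEN.matcher(responseXml);
        if (!m.find()) return null;
        String token = m.group(1).trim();
        if (token.length() == 0) return null; // empty token: no more records
        try {
            return baseUrl + "?verb=ListRecords&resumptionToken="
                + URLEncoder.encode(token, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        String xml = "<ListRecords><resumptionToken>abc123</resumptionToken></ListRecords>";
        System.out.println(nextUrl("http://example.org/oai", xml));
    }
}
```

A harvesting loop would call `nextUrl` after each response and stop once it returns null.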
orbiter
350d13e153
very first working version of oai-pmh importer: if given the right url, the importer can read and index listRecord xml files and calculate the right resumptionURL which is then given as next default start point for the importer url input.
no automatic harvesting yet, this will be done later
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6443 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
a0e891c63d
- some redesign in UI menu structure to make room for new 'Content Integration' main menu containing import servlets for Wikimedia Dumps, phpbb3 forum imports and OAI-PMH imports
- extended the OAI-PMH test applet and integrated it into the menu. It still does not import OAI-PMH records, but shows that it is able to read and parse this data
- some redesign in ZURL storage: refactoring of access methods, better concurrency, less synchronization
- limited the LURL metadata database table cache to 20 million entries: until now this cache was bounded only by the available RAM, which may have caused memory-leak-like behavior.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6440 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
30f108f97d
added stub of oai-pmh importer (not working yet)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6437 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
52470d0de4
- fix for xls parser
- fix for image parser
- temporary integration of images as document types in the crawler and indexer for testing of the image parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6435 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
26fafd85a5
- more refactoring
- fixed problem with parsers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6433 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
3528b970d6
- refactoring
- added new experimental (not-yet-working) image parser
- added new test image
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6431 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
a8ce192f63
- shifted main classes to new package net.yacy
- fixed some bugs in last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6427 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
b79f4f062f
refactoring of yacy documents and parsers: they depend now only on the kelondro classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6426 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago